summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/vgic
AgeCommit message (Collapse)Author
2025-05-22KVM: arm64: Fix documentation for vgic_its_iter_next()Marc Zyngier
As reported by the build robot, the documentation for vgic_its_iter_next() contains a typo. Fix it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505221421.KAuWlmSr-lkp@intel.com/ Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: vgic-its: Add debugfs interface to expose ITS tablesJing Zhang
This commit introduces a debugfs interface to display the contents of the VGIC Interrupt Translation Service (ITS) tables. The ITS tables map Device/Event IDs to Interrupt IDs and target processors. Exposing this information through debugfs allows for easier inspection and debugging of the interrupt routing configuration. The debugfs interface presents the ITS table data in a tabular format: Device ID: 0x0, Event ID Range: [0 - 31] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8192 0 0 0 0 1 8193 0 0 0 0 2 8194 0 2 2 0 Device ID: 0x18, Event ID Range: [0 - 3] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8225 0 0 0 0 1 8226 0 1 1 0 2 8227 0 3 3 0 Device ID: 0x10, Event ID Range: [0 - 7] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8229 0 3 3 1 1 8230 0 0 0 1 2 8231 0 1 1 1 3 8232 0 2 2 1 4 8233 0 3 3 1 The output is generated using the seq_file interface, allowing for efficient handling of potentially large ITS tables. This interface is read-only and does not allow modification of the ITS tables. It is intended for debugging and informational purposes only. Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20250220224247.2017205-1-jingzhangos@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-16KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1Wei-Lin Chang
In the case of ICH_LR<n>.HW == 1, bit 41 of LR is just a part of pINTID without EOI meaning, and bit 41 will be zeroed by the subsequent clearing of ICH_LR_PHYS_ID_MASK anyway. No functional changes intended. Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Link: https://lore.kernel.org/r/20250512133223.866999-1-r09922117@csie.ntu.edu.tw Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-03-19Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/nextOliver Upton
* kvm-arm64/pv-cpuid: : Paravirtualized implementation ID, courtesy of Shameer Kolothum : : Big-little has historically been a pain in the ass to virtualize. The : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim : of vCPU scheduling. This can be particularly annoying when the guest : needs to know the underlying implementation to mitigate errata. : : "Hyperscalers" face a similar scheduling problem, where VMs may freely : migrate between hosts in a pool of heterogenous hardware. And yes, our : server-class friends are equally riddled with errata too. : : In absence of an architected solution to this wart on the ecosystem, : introduce support for paravirtualizing the implementation exposed : to a VM, allowing the VMM to describe the pool of implementations that a : VM may be exposed to due to scheduling/migration. : : Userspace is expected to intercept and handle these hypercalls using the : SMCCC filter UAPI, should it choose to do so. smccc: kvm_guest: Fix kernel builds for 32 bit arm KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2 smccc/kvm_guest: Enable errata based on implementation CPUs arm64: Make  _midr_in_range_list() an exported function KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2 KVM: arm64: Specify hypercall ABI for retrieving target implementations arm64: Modify _midr_range() functions to read MIDR/REVIDR internally Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19Merge branch 'kvm-arm64/nv-vgic' into kvmarm/nextOliver Upton
* kvm-arm64/nv-vgic: : NV VGICv3 support, courtesy of Marc Zyngier : : Support for emulating the GIC hypervisor controls and managing shadow : VGICv3 state for the L1 hypervisor. As part of it, bring in support for : taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt. KVM: arm64: nv: Fail KVM init if asking for NV without GICv3 KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2 KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting KVM: arm64: nv: Add Maintenance Interrupt emulation KVM: arm64: nv: Handle L2->L1 transition on interrupt injection KVM: arm64: nv: Nested GICv3 emulation KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg KVM: arm64: nv: Load timer before the GIC arm64: sysreg: Add layout for ICH_MISR_EL2 arm64: sysreg: Add layout for ICH_VTR_EL2 arm64: sysreg: Add layout for ICH_HCR_EL2 Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Allow userland to set VGIC maintenance IRQAndre Przywara
The VGIC maintenance IRQ signals various conditions about the LRs, when the GIC's virtualization extension is used. So far we didn't need it, but nested virtualization needs to know about this interrupt, so add a userland interface to setup the IRQ number. The architecture mandates that it must be a PPI, on top of that this code only exports a per-device option, so the PPI is the same on all VCPUs. Signed-off-by: Andre Przywara <andre.przywara@arm.com> [added some bits of documentation] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setupMarc Zyngier
Popular HW that is able to use NV also has a broken vgic implementation that requires trapping. On such HW, propagate the host trap bits into the guest's shadow ICH_HCR_EL2 register, making sure we don't allow an L2 guest to bring the system down. This involves a bit of tweaking so that the emulation code correctly poicks up the shadow state as needed, and to only partially sync ICH_HCR_EL2 back with the guest state to capture EOIcount. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Propagate used_lrs between L1 and L0 contextsMarc Zyngier
We have so far made sure that L1 and L0 vgic contexts were totally independent. There is however one spot of bother with this approach, and that's in the GICv3 emulation code required by our fruity friends. The issue is that the emulation code needs to know how many LRs are in flight. And while it is easy to reach the L0 version through the vcpu pointer, doing so for the L1 is much more complicated, as these structures are private to the nested code. We could simply expose that structure and pick one or the other depending on the context, but this seems extra complexity for not much benefit. Instead, just propagate the number of used LRs from the nested code into the L0 context, and be done with it. Should this become a burden, it can be easily rectified. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2Oliver Upton
Running an L2 guest with GICv4 enabled goes absolutely nowhere, and gets into a vicious cycle of nested ERET followed by nested exception entry into the L1. When KVM does a put on a runnable vCPU, it marks the vPE as nonresident but does not request a doorbell IRQ. Behind the scenes in the ITS driver's view of the vCPU, its_vpe::pending_last gets set to true to indicate that context is still runnable. This comes to a head when doing the nested ERET into L2. The vPE doesn't get scheduled on the redistributor as it is exclusively part of the L1's VGIC context. kvm_vgic_vcpu_pending_irq() returns true because the vPE appears runnable, and KVM does a nested exception entry into the L1 before L2 ever gets off the ground. This issue can be papered over by requesting a doorbell IRQ when descheduling a vPE as part of a nested ERET. KVM needs this anyway to kick the vCPU out of the L2 when an IRQ becomes pending for the L1. Link: https://lore.kernel.org/r/20240823212703.3576061-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-13-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Add Maintenance Interrupt emulationMarc Zyngier
Emulating the vGIC means emulating the dreaded Maintenance Interrupt. This is a two-pronged problem: - while running L2, getting an MI translates into an MI injected in the L1 based on the state of the HW. - while running L1, we must accurately reflect the state of the MI line, based on the in-memory state. The MI INTID is added to the distributor, as expected on any virtualisation-capable implementation, and further patches will allow its configuration. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Handle L2->L1 transition on interrupt injectionMarc Zyngier
An interrupt being delivered to L1 while running L2 must result in the correct exception being delivered to L1. This means that if, on entry to L2, we found ourselves with pending interrupts in the L1 distributor, we need to take immediate action. This is done by posting a request which will prevent the entry in L2, and deliver an IRQ exception to L1, forcing the switch. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Nested GICv3 emulationMarc Zyngier
When entering a nested VM, we set up the hypervisor control interface based on what the guest hypervisor has set. Especially, we investigate each list register written by the guest hypervisor whether HW bit is set. If so, we translate hw irq number from the guest's point of view to the real hardware irq number if there is a mapping. Co-developed-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> [Christoffer: Redesigned execution flow around vcpu load/put] Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [maz: Rewritten to support GICv3 instead of GICv2, NV2 support] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03KVM: arm64: nv: Plumb handling of GICv3 EL2 accessesMarc Zyngier
Wire the handling of all GICv3 EL2 registers, and provide emulation for all the non memory-backed registers (ICC_SRE_EL2, ICH_VTR_EL2, ICH_MISR_EL2, ICH_ELRSR_EL2, and ICH_EISR_EL2). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03arm64: sysreg: Add layout for ICH_VTR_EL2Marc Zyngier
The ICH_VTR_EL2-related macros are missing a number of config bits that we are about to handle. Take this opportunity to fully describe the layout of that register as part of the automatic generation infrastructure. This results in a bit of churn to repaint constants that are now generated with a different format. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-03arm64: sysreg: Add layout for ICH_HCR_EL2Marc Zyngier
The ICH_HCR_EL2-related macros are missing a number of control bits that we are about to handle. Take this opportunity to fully describe the layout of that register as part of the automatic generation infrastructure. This results in a bit of churn, unfortunately. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250225172930.1850838-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26arm64: Modify _midr_range() functions to read MIDR/REVIDR internallyShameer Kolothum
These changes lay the groundwork for adding support for guest kernels, allowing them to leverage target CPU implementations provided by the VMM. No functional changes intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250221140229.12588-2-shameerali.kolothum.thodi@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26KVM: arm64: vgic-v4: Fall back to software irqbypass if LPI not foundOliver Upton
Continuing with the theme of broken VMMs and guests, irqbypass registration can fail if the virtual ITS lacks a translation for the MSI. Either the guest hasn't mapped it or userspace may have forgotten to restore the ITS. Exit silently and allow irqbypass configuration to succeed. As a reward for ingenuity, LPIs are demoted to software injection. Tested-by: Sudheer Dantuluri <dantuluris@google.com> Fixes: 196b136498b3 ("KVM: arm/arm64: GICv4: Wire mapping/unmapping of VLPIs in VFIO irq bypass") Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250226183124.82094-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26KVM: arm64: vgic-v4: Only WARN for HW IRQ mismatch when unmapping vLPIOliver Upton
The VMM or guest can easily screw up GICv4 vLPI injection by misconfiguring the MSI or the virtual ITS. Don't fuss over it; limit the WARN in vgic_v4_unset_forwarding() to fire in the worrying case where an unrelated HW IRQ was mapped to a vLPI. Reported-by: Sudheer Dantuluri <dantuluris@google.com> Tested-by: Sudheer Dantuluri <dantuluris@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250226183124.82094-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-13KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic()Marc Zyngier
If userspace creates vcpus, then a vgic, we end-up in a situation where irqchip_in_kernel() will return true, but no private interrupt has been allocated for these vcpus. This situation will continue until userspace initialises the vgic, at which point we fix the early vcpus. Should a vcpu run or be initialised in the interval, bad things may happen. An obvious solution is to move this fix-up phase to the point where the vgic is created. This ensures that from that point onwards, all vcpus have their private interrupts, as new vcpus will directly allocate them. With that, we have the invariant that when irqchip_in_kernel() is true, all vcpus have their private interrupts. Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250212182558.2865232-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17Merge branch kvm-arm64/misc-6.14 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.14: : . : Misc KVM/arm64 changes for 6.14 : : - Don't expose AArch32 EL0 capability when NV is enabled : : - Update documentation to reflect the full gamut of kvm-arm.mode : behaviours : : - Use the hypervisor VA bit width when dumping stacktraces : : - Decouple the hypervisor stack size from PAGE_SIZE, at least : on the surface... : : - Make use of str_enabled_disabled() when advertising GICv4.1 support : : - Explicitly handle BRBE traps as UNDEFINED : . KVM: arm64: Explicitly handle BRBE traps as UNDEFINED KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe() arm64: kvm: Introduce nvhe stack size constants KVM: arm64: Fix nVHE stacktrace VA bits mask Documentation: Update the behaviour of "kvm-arm.mode" KVM: arm64: nv: Advertise the lack of AArch32 EL0 support Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-11KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()Thorsten Blum
Remove hard-coded strings by using the str_enabled_disabled() helper function. Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250110225310.369980-2-thorsten.blum@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-20KVM: arm64: Introduce __pkvm_vcpu_{load,put}()Marc Zyngier
Rather than look-up the hyp vCPU on every run hypercall at EL2, introduce a per-CPU 'loaded_hyp_vcpu' tracking variable which is updated by a pair of load/put hypercalls called directly from kvm_arch_vcpu_{load,put}() when pKVM is enabled. Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20241218194059.3670226-10-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-12-03KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translationKeisuke Nishimura
The return value of xa_store() needs to be checked. This fix adds an error handling path that resolves the kref inconsistency on failure. As suggested by Oliver Upton, this function does not return the error code intentionally because the translation cache is best effort. Fixes: 8201d1028caa ("KVM: arm64: vgic-its: Maintain a translation cache per ITS") Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241130144952.23729-1-keisuke.nishimura@inria.fr Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizesMarc Zyngier
The ITS ABI infrastructure allows for some pretty lax code, where the size of the data doesn't have to match the size of the entry, potentially leading to a collection of interesting bugs. Commit 7fe28d7e68f9 ("KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*") added some checks, but starts by implicitly casting all writes to a 64bit value, hiding some of the issues. Instead, introduce macros that will check the data type actually used for dealing with the table entries. The macros are taking a symbolic entry type that is used to fetch the size of the entry type for the current ABI. This immediately catches a couple of low-impact gotchas (zero values that are implicitly 32bit), easy enough to fix. Given that we currently only have a single ABI, hardcode a couple of BUILD_BUG_ON()s that will fire if we use anything but a 64bit quantity, and some (currently unreachable) fallback code that may become useful one day. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definitionMarc Zyngier
VGIC_MAX_PRIVATE is a pretty useless definition, and is better replaced with VGIC_NR_PRIVATE_IRQS. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20KVM: arm64: vgic: Make vgic_get_irq() more robustMarc Zyngier
vgic_get_irq() has an awkward signature, as it takes both a kvm *and* a vcpu, where the vcpu is allowed to be NULL if the INTID being looked up is a global interrupt (SPI or LPI). This leads to potentially problematic situations where the INTID passed is a private interrupt, but that there is no vcpu. In order to make things less ambiguous, let have *two* helpers instead: - vgic_get_irq(struct kvm *kvm, u32 intid), which is only concerned with *global* interrupts, as indicated by the lack of vcpu. - vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid), which can return *any* interrupt class, but must have of course a non-NULL vcpu. Most of the code nicely falls under one or the other situations, except for a couple of cases (close to the UABI or in the debug code) where we have to distinguish between the two cases. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241117165757.247686-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-20KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIRMarc Zyngier
Make sure we filter out non-LPI invalidation when handling writes to GICR_INVLPIR. Fixes: 4645d11f4a553 ("KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241117165757.247686-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITEKunkun Jiang
When DISCARD frees an ITE, it does not invalidate the corresponding ITE. In the scenario of continuous saves and restores, there may be a situation where an ITE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding ITE when DISCARD frees an ITE. Cc: stable@vger.kernel.org Fixes: eff484e0298d ("KVM: arm64: vgic-its: ITT save and restore") Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-6-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a deviceKunkun Jiang
vgic_its_save_device_tables will traverse its->device_list to save DTE for each device. vgic_its_restore_device_tables will traverse each entry of device table and check if it is valid. Restore if valid. But when MAPD unmaps a device, it does not invalidate the corresponding DTE. In the scenario of continuous saves and restores, there may be a situation where a device's DTE is not saved but is restored. This is unreasonable and may cause restore to fail. This patch clears the corresponding DTE when MAPD unmaps a device. Cc: stable@vger.kernel.org Fixes: 57a9a117154c ("KVM: arm64: vgic-its: Device table save/restore") Co-developed-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Shusen Li <lishusen2@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with entry write helper] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-5-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-11-11KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*Jing Zhang
In all the vgic_its_save_*() functinos, they do not check whether the data length is 8 bytes before calling vgic_write_guest_lock. This patch adds the check. To prevent the kernel from being blown up when the fault occurs, KVM_BUG_ON() is used. And the other BUG_ON()s are replaced together. Cc: stable@vger.kernel.org Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> [Jing: Update with the new entry read/write helpers] Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-17KVM: arm64: Ensure vgic_ready() is ordered against MMIO registrationOliver Upton
kvm_vgic_map_resources() prematurely marks the distributor as 'ready', potentially allowing vCPUs to enter the guest before the distributor's MMIO registration has been made visible. Plug the race by marking the distributor as ready only after MMIO registration is completed. Rely on the implied ordering of synchronize_srcu() to ensure the MMIO registration is visible before vgic_dist::ready. This also means that writers to vgic_dist::ready are now serialized by the slots_lock, which was effectively the case already as all writers held the slots_lock in addition to the config_lock. Fixes: 59112e9c390b ("KVM: arm64: vgic: Fix a circular locking issue") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241017001947.2707312-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-17KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQSOliver Upton
KVM commits to a particular sizing of SPIs when the vgic is initialized, which is before the point a vgic becomes ready. On top of that, KVM supplies a default amount of SPIs should userspace not explicitly configure this. As such, the check for vgic_ready() in the handling of KVM_DEV_ARM_VGIC_GRP_NR_IRQS is completely wrong, and testing if nr_spis is nonzero is sufficient for preventing userspace from playing games with us. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241017001947.2707312-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-11KVM: arm64: Don't eagerly teardown the vgic on init errorMarc Zyngier
As there is very little ordering in the KVM API, userspace can instanciate a half-baked GIC (missing its memory map, for example) at almost any time. This means that, with the right timing, a thread running vcpu-0 can enter the kernel without a GIC configured and get a GIC created behind its back by another thread. Amusingly, it will pick up that GIC and start messing with the data structures without the GIC having been fully initialised. Similarly, a thread running vcpu-1 can enter the kernel, and try to init the GIC that was previously created. Since this GIC isn't properly configured (no memory map), it fails to correctly initialise. And that's the point where we decide to teardown the GIC, freeing all its resources. Behind vcpu-0's back. Things stop pretty abruptly, with a variety of symptoms. Clearly, this isn't good, we should be a bit more careful about this. It is obvious that this guest is not viable, as it is missing some important part of its configuration. So instead of trying to tear bits of it down, let's just mark it as *dead*. It means that any further interaction from userspace will result in -EIO. The memory will be released on the "normal" path, when userspace gives up. Cc: stable@vger.kernel.org Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241009183603.3221824-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-08KVM: arm64: Unregister redistributor for failed vCPU creationOliver Upton
Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: f616506754d3 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guestMarc Zyngier
In order to be consistent, we shouldn't advertise a GICv3 when none is actually usable by the guest. Wipe the feature when these conditions apply, and allow the field to be written from userspace. This now allows us to rewrite the kvm_has_gicv3 helper() in terms of kvm_has_feat(), given that it is always evaluated at runtime. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHEMarc Zyngier
On a VHE system, no GICv3 traps get configured when no irqchip is present. This is not quite matching the "no GICv3" semantics that we want to present. Force such traps to be configured in this case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Force SRE traps when SRE access is not enabledMarc Zyngier
We so far only write the ICH_HCR_EL2 config in two situations: - when we need to emulate the GICv3 CPU interface due to HW bugs - when we do direct injection, as the virtual CPU interface needs to be enabled This is all good. But it also means that we don't do anything special when we emulate a GICv2, or that there is no GIC at all. What happens in this case when the guest uses the GICv3 system registers? The *guest* gets a trap for a sysreg access (EC=0x18) while we'd really like it to get an UNDEF. Fixing this is a bit involved: - we need to set all the required trap bits (TC, TALL0, TALL1, TDIR) - for these traps to take effect, we need to (counter-intuitively) set ICC_SRE_EL1.SRE to 1 so that the above traps take priority. Note that doesn't fully work when GICv2 emulation is enabled, as we cannot set ICC_SRE_EL1.SRE to 1 (it breaks Group0 delivery as IRQ). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-27KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps()Marc Zyngier
Follow the pattern introduced with vcpu_set_hcr(), and introduce vcpu_set_ich_hcr(), which configures the GICv3 traps at the same point. This will allow future changes to introduce trap configuration on a per-VM basis. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240827152517.3909653-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-08-22KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3Marc Zyngier
On a system with a GICv3, if a guest hasn't been configured with GICv3 and that the host is not capable of GICv2 emulation, a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2. We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?). The obvious fix is to give the guest what it deserves, in the shape of a UNDEF exception. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-19KVM: arm64: vgic: Don't hold config_lock while unregistering redistributorsMarc Zyngier
We recently moved the teardown of the vgic part of a vcpu inside a critical section guarded by the config_lock. This teardown phase involves calling into kvm_io_bus_unregister_dev(), which takes the kvm->srcu lock. However, this violates the established order where kvm->srcu is taken on a memory fault (such as an MMIO access), possibly followed by taking the config_lock if the GIC emulation requires mutual exclusion from the other vcpus. It therefore results in a bad lockdep splat, as reported by Zenghui. Fix this by moving the call to kvm_io_bus_unregister_dev() outside of the config_lock critical section. At this stage, there shouln't be any need to hold the config_lock. As an additional bonus, document the ordering between kvm->slots_lock, kvm->srcu and kvm->arch.config_lock so that I cannot pretend I didn't know about those anymore. Fixes: 9eb18136af9f ("KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240819125045.3474845-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-19KVM: arm64: vgic-debug: Don't put unmarked LPIsZenghui Yu
If there were LPIs being mapped behind our back (i.e., between .start() and .stop()), we would put them at iter_unmark_lpis() without checking if they were actually *marked*, which is obviously not good. Switch to use the xa_for_each_marked() iterator to fix it. Cc: stable@vger.kernel.org Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-08KVM: arm64: vgic: Hold config_lock while tearing down a CPU interfaceMarc Zyngier
Tearing down a vcpu CPU interface involves freeing the private interrupt array. If we don't hold the lock, we may race against another thread trying to configure it. Yeah, fuzzers do wonderful things... Taking the lock early solves this particular problem. Fixes: 03b3d00a70b5 ("KVM: arm64: vgic: Allocate private interrupts on demand") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240808091546.3262111-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-07KVM: arm64: vgic-debug: Exit the iterator properly w/o LPIZenghui Yu
In case the guest doesn't have any LPI, we previously relied on the iterator setting 'intid = nr_spis + VGIC_NR_PRIVATE_IRQS' && 'lpi_idx = 1' to exit the iterator. But it was broken with commit 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") -- the intid remains at 'nr_spis + VGIC_NR_PRIVATE_IRQS - 1', and we end up endlessly printing the last SPI's state. Consider that it's meaningless to search the LPI xarray and populate lpi_idx when there is no LPI, let's just skip the process for that case. The result is that * If there's no LPI, we focus on the intid and exit the iterator when it runs out of the valid SPI range. * Otherwise we keep the current logic and let the xarray drive the iterator. Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240807052024.2084-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-02KVM: arm64: vgic: fix unexpected unlock sparse warningsSebastian Ott
Get rid of unexpected unlock sparse warnings in vgic code by adding an annotation to vgic_queue_irq_unlock(). arch/arm64/kvm/vgic/vgic.c:334:17: warning: context imbalance in 'vgic_queue_irq_unlock' - unexpected unlock arch/arm64/kvm/vgic/vgic.c:419:5: warning: context imbalance in 'kvm_vgic_inject_irq' - different lock contexts for basic block Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-4-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-08-02KVM: arm64: fix kdoc warnings in W=1 buildsSebastian Ott
Fix kdoc warnings by adding missing function parameter descriptions or by conversion to a normal comment. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240723101204.7356-3-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-06KVM: arm64: Disassociate vcpus from redistributor region on teardownMarc Zyngier
When tearing down a redistributor region, make sure we don't have any dangling pointer to that region stored in a vcpu. Fixes: e5a35635464b ("kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()") Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240605175637.1635653-1-maz@kernel.org Cc: stable@vger.kernel.org
2024-05-12Merge tag 'kvmarm-6.10-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.10 - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greattly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements.
2024-05-08Merge branch kvm-arm64/misc-6.10 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.10: : . : Misc fixes and updates targeting 6.10 : : - Improve boot-time diagnostics when the sysreg tables : are not correctly sorted : : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy : : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1 : writeable mask : : - Allocate PPIs and SGIs outside of the vcpu structure, allowing : for smaller EL2 mapping and some flexibility in implementing : more or less than 32 private IRQs. : : - Use bitmap_gather() instead of its open-coded equivalent : : - Make protected mode use hVHE if available : : - Purge stale mpidr_data if a vcpu is created after the MPIDR : map has been created : . KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather() KVM: arm64: vgic: Allocate private interrupts on demand KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist KVM: arm64: Improve out-of-order sysreg table diagnostics Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03Merge branch kvm-arm64/pkvm-6.10 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-6.10: (25 commits) : . : At last, a bunch of pKVM patches, courtesy of Fuad Tabba. : From the cover letter: : : "This series is a bit of a bombay-mix of patches we've been : carrying. There's no one overarching theme, but they do improve : the code by fixing existing bugs in pKVM, refactoring code to : make it more readable and easier to re-use for pKVM, or adding : functionality to the existing pKVM code upstream." : . KVM: arm64: Force injection of a data abort on NISV MMIO exit KVM: arm64: Restrict supported capabilities for protected VMs KVM: arm64: Refactor setting the return value in kvm_vm_ioctl_enable_cap() KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst KVM: arm64: Rename firmware pseudo-register documentation file KVM: arm64: Reformat/beautify PTP hypercall documentation KVM: arm64: Clarify rationale for ZCR_EL1 value restored on guest exit KVM: arm64: Introduce and use predicates that check for protected VMs KVM: arm64: Add is_pkvm_initialized() helper KVM: arm64: Simplify vgic-v3 hypercalls KVM: arm64: Move setting the page as dirty out of the critical section KVM: arm64: Change kvm_handle_mmio_return() return polarity KVM: arm64: Fix comment for __pkvm_vcpu_init_traps() KVM: arm64: Prevent kmemleak from accessing .hyp.data KVM: arm64: Do not map the host fpsimd state to hyp in pKVM KVM: arm64: Rename __tlb_switch_to_{guest,host}() in VHE KVM: arm64: Support TLB invalidation in guest context KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE KVM: arm64: Check for PTE validity when checking for executable/cacheable KVM: arm64: Avoid BUG-ing from the host abort path ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-03KVM: arm64: vgic: Allocate private interrupts on demandMarc Zyngier
Private interrupts are currently part of the CPU interface structure that is part of each and every vcpu we create. Currently, we have 32 of them per vcpu, resulting in a per-vcpu array that is just shy of 4kB. On its own, that's no big deal, but it gets in the way of other things: - each vcpu gets mapped at EL2 on nVHE/hVHE configurations. This requires memory that is physically contiguous. However, the EL2 code has no purpose looking at the interrupt structures and could do without them being mapped. - supporting features such as EPPIs, which extend the number of private interrupts past the 32 limit would make the array even larger, even for VMs that do not use the EPPI feature. Address these issues by moving the private interrupt array outside of the vcpu, and replace it with a simple pointer. We take this opportunity to make it obvious what gets initialised when, as that path was remarkably opaque, and tighten the locking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502154545.3012089-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>