summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm
AgeCommit message (Collapse)Author
2025-05-31Merge tag 'gcc-minimum-version-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull compiler version requirement update from Arnd Bergmann: "Require gcc-8 and binutils-2.30 x86 already uses gcc-8 as the minimum version, this changes all other architectures to the same version. gcc-8 is used is Debian 10 and Red Hat Enterprise Linux 8, both of which are still supported, and binutils 2.30 is the oldest corresponding version on those. Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as the system compiler but additionally include toolchains that remain supported. With the new minimum toolchain versions, a number of workarounds for older versions can be dropped, in particular on x86_64 and arm64. Importantly, the updated compiler version allows removing two of the five remaining gcc plugins, as support for sancov and structeak features is already included in modern compiler versions. I tried collecting the known changes that are possible based on the new toolchain version, but expect that more cleanups will be possible. Since this touches multiple architectures, I merged the patches through the asm-generic tree." * tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV Documentation: update binutils-2.30 version reference gcc-plugins: remove SANCOV gcc plugin Kbuild: remove structleak gcc plugin arm64: drop binutils version checks raid6: skip avx512 checks kbuild: require gcc-8 and binutils-2.30
2025-05-26Merge tag 'kvmarm-6.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups.
2025-05-23Merge branch kvm-arm64/misc-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/nv-nv into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/at-fixes-6.16: : . : Set of fixes for Address Translation (AT) instruction emulation, : which affect the (not yet upstream) NV support. : : From the cover letter: : : "Here's a small series of fixes for KVM's implementation of address : translation (aka the AT S1* instructions), addressing a number of : issues in increasing levels of severity: : : - We misreport PAR_EL1.PTW in a number of occasions, including state : that is not possible as per the architecture definition : : - We don't handle access faults at all, and that doesn't play very : well with the rest of the VNCR stuff : : - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely : take the host down, no questions asked" : . KVM: arm64: Don't feed uninitialised data to HCR_EL2 KVM: arm64: Teach address translation about access faults KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E* Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/fgt-masks into kvmarm-master/nextMarc Zyngier
* kvm-arm64/fgt-masks: (43 commits) : . : Large rework of the way KVM deals with trap bits in conjunction with : the CPU feature registers. It now draws a direct link between which : the feature set, the system registers that need to UNDEF to match : the configuration and bits that need to behave as RES0 or RES1 in : the trap registers that are visible to the guest. : : Best of all, these definitions are mostly automatically generated : from the JSON description published by ARM under a permissive : license. : . KVM: arm64: Handle TSB CSYNC traps KVM: arm64: Add FGT descriptors for FEAT_FGT2 KVM: arm64: Allow sysreg ranges for FGT descriptors KVM: arm64: Add context-switch for FEAT_FGT2 registers KVM: arm64: Add trap routing for FEAT_FGT2 registers KVM: arm64: Add sanitisation for FEAT_FGT2 registers KVM: arm64: Add FEAT_FGT2 registers to the VNCR page KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits KVM: arm64: Allow kvm_has_feat() to take variable arguments KVM: arm64: Use FGT feature maps to drive RES0 bits KVM: arm64: Validate FGT register descriptions against RES0 masks KVM: arm64: Switch to table-driven FGU configuration KVM: arm64: Handle PSB CSYNC traps KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask KVM: arm64: Remove hand-crafted masks for FGT registers KVM: arm64: Use computed FGT masks to setup FGT registers KVM: arm64: Propagate FGT masks to the nVHE hypervisor KVM: arm64: Unconditionally configure fine-grain traps KVM: arm64: Use computed masks as sanitisers for FGT registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/mte-frac into kvmarm-master/nextMarc Zyngier
* kvm-arm64/mte-frac: : . : Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest, : courtesy of Ben Horgan. From the cover letter: : : "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. : However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 : indicates that MTE_ASYNC is supported. On a host with : ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the : MTE capability enabled will incorrectly see MTE_ASYNC advertised as : supported. This series fixes that." : . KVM: selftests: Confirm exposing MTE_frac does not break migration KVM: arm64: Make MTE_frac masking conditional on MTE capability arm64/sysreg: Expose MTE_frac so that it is visible to KVM Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-np-thp-6.16: (21 commits) : . : Large mapping support for non-protected pKVM guests, courtesy of : Vincent Donnefort. From the cover letter: : : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM : np-guests, that is installing PMD-level mappings in the stage-2, : whenever the stage-1 is backed by either Hugetlbfs or THPs." : . KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-22KVM: arm64: Fix documentation for vgic_its_iter_next()Marc Zyngier
As reported by the build robot, the documentation for vgic_its_iter_next() contains a typo. Fix it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505221421.KAuWlmSr-lkp@intel.com/ Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: np-guest CMOs with PMD_SIZE fixmapVincent Donnefort
With the introduction of stage-2 huge mappings in the pKVM hypervisor, guest pages CMO is needed for PMD_SIZE size. Fixmap only supports PAGE_SIZE and iterating over the huge-page is time consuming (mostly due to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for 64KiB pages systems. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Stage-2 huge mappings for np-guestsVincent Donnefort
Now np-guests hypercalls with range are supported, we can let the hypervisor to install block mappings whenever the Stage-1 allows it, that is when backed by either Hugetlbfs or THPs. The size of those block mappings is limited to PMD_SIZE. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to pkvm_mappingsQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, add a nr_pages member for pkvm_mappings to allow EL1 to track the size of the stage-2 mapping. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Convert pkvm_mappings to interval treeQuentin Perret
In preparation for supporting stage-2 huge mappings for np-guest, let's convert pgt.pkvm_mappings to an interval tree. No functional change intended. Suggested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_unshare_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Add a range to __pkvm_host_share_guest()Vincent Donnefort
In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_share_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Introduce for_each_hyp_pageVincent Donnefort
Add a helper to iterate over the hypervisor vmemmap. This will be particularly handy with the introduction of huge mapping support for the np-guest stage-2. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: Handle huge mappings for np-guest CMOsVincent Donnefort
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a size as an argument. But they also rely on fixmap, which can only map a single PAGE_SIZE page. With the upcoming stage-2 huge mappings for pKVM np-guests, those callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis until the whole range is done. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16Marc Zyngier
* kvm-arm64/pkvm-selftest-6.16: : . : pKVM selftests covering the memory ownership transitions by : Quentin Perret. From the initial cover letter: : : "We have recently found a bug [1] in the pKVM memory ownership : transitions by code inspection, but it could have been caught with a : test. : : Introduce a boot-time selftest exercising all the known pKVM memory : transitions and importantly checks the rejection of illegal transitions. : : The new test is hidden behind a new Kconfig option separate from : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the : transition checks ([1] doesn't reproduce with EL2 debug enabled). : : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/" : . KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical sectionMarc Zyngier
The conversion to kvm_release_faultin_page() missed the requirement for this to be called within a critical section with mmu_lock held for write. Move this call up to satisfy this requirement. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock heldMarc Zyngier
Calling invalidate_vncr_va() without the mmu_lock held for write is a bad idea, and lockdep tells you about that. Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translatingMarc Zyngier
When translating a VNCR translation fault, we start by marking the current SW-managed TLB as invalid, so that we can populate it in place. This is, however, done without the mmu_lock held. A consequence of this is that another CPU dealing with TLBI emulation can observe a translation still flagged as valid, but with invalid walk results (such as pgshift being 0). Bad things can result from this, such as a BUG() in pgshift_level_to_ttl(). Fix it by taking the mmu_lock for write to perform this local invalidation, and use invalidate_vncr() instead of open-coding the write to the 'valid' flag. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: vgic-its: Add debugfs interface to expose ITS tablesJing Zhang
This commit introduces a debugfs interface to display the contents of the VGIC Interrupt Translation Service (ITS) tables. The ITS tables map Device/Event IDs to Interrupt IDs and target processors. Exposing this information through debugfs allows for easier inspection and debugging of the interrupt routing configuration. The debugfs interface presents the ITS table data in a tabular format: Device ID: 0x0, Event ID Range: [0 - 31] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8192 0 0 0 0 1 8193 0 0 0 0 2 8194 0 2 2 0 Device ID: 0x18, Event ID Range: [0 - 3] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8225 0 0 0 0 1 8226 0 1 1 0 2 8227 0 3 3 0 Device ID: 0x10, Event ID Range: [0 - 7] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8229 0 3 3 1 1 8230 0 0 0 1 2 8231 0 1 1 1 3 8232 0 2 2 1 4 8233 0 3 3 1 The output is generated using the seq_file interface, allowing for efficient handling of potentially large ITS tables. This interface is read-only and does not allow modification of the ITS tables. It is intended for debugging and informational purposes only. Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20250220224247.2017205-1-jingzhangos@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19arm64: errata: Work around AmpereOne's erratum AC04_CPU_23D Scott Phillips
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Handle TSB CSYNC trapsMarc Zyngier
The architecture introduces a trap for TSB CSYNC that fits in the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar way. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add FGT descriptors for FEAT_FGT2Marc Zyngier
Bulk addition of all the FGT2 traps reported with EC == 0x18, as described in the 2025-03 JSON drop. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow sysreg ranges for FGT descriptorsMarc Zyngier
Just like we allow sysreg ranges for Coarse Grained Trap descriptors, allow them for Fine Grain Traps as well. This comes with a warning that not all ranges are suitable for this particular definition of ranges. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add context-switch for FEAT_FGT2 registersMarc Zyngier
Just like the rest of the FGT registers, perform a switch of the FGT2 equivalent. This avoids the host configuration leaking into the guest... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add trap routing for FEAT_FGT2 registersMarc Zyngier
Similarly to the FEAT_FGT registers, pick the correct FEAT_FGT2 register when a sysreg trap indicates they could be responsible for the exception. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Add sanitisation for FEAT_FGT2 registersMarc Zyngier
Just like the FEAT_FGT registers, treat the FGT2 variant the same way. THis is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bitsMarc Zyngier
Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. An additional complexity stems from the status of some bits such as E2H and RW, which do not had a RESx status, but still take a fixed value due to implementation choices in KVM. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bitsMarc Zyngier
Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Use FGT feature maps to drive RES0 bitsMarc Zyngier
Another benefit of mapping bits to features is that it becomes trivial to define which bits should be handled as RES0. Let's apply this principle to the guest's view of the FGT registers. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*Marc Zyngier
Since we're (almost) feature complete, let's allow userspace to request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up. We also now advertise the features to userspace with new capabilities. It's going to be great... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Remove dead code from ERET handlingMarc Zyngier
Cleanly, this code cannot trigger, since we filter this from the caller. Drop it. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-16-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatchMarc Zyngier
Now that we have to handle TLBI S1E2 in the core code, plumb the sysinsn dispatch code into it, so that these instructions don't just UNDEF anymore. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-15-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2Marc Zyngier
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR and potentially knock the fixmap mapping. Even worse, that TLBI must be able to work cross-vcpu. For that, we track on a per-VM basis if any VNCR is mapped, using an atomic counter. Whenever a TLBI S1E2 occurs and that this counter is non-zero, we take the long road all the way back to the core code. There, we iterate over all vcpus and check whether this particular invalidation has any damaging effect. If it does, we nuke the pseudo TLB and the corresponding fixmap. Yes, this is costly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap addressMarc Zyngier
Since we now have a way to map the guest's VNCR_EL2 on the host, we can point the host's VNCR_EL2 to it and go full circle! Note that we unconditionally assign the fixmap to VNCR_EL2, irrespective of the guest's version being mapped or not. We want to take a fault on first access, so the fixmap either contains something guranteed to be either invalid or a guest mapping. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-13-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiersMarc Zyngier
During an invalidation triggered by an MMU notifier, we need to make sure we can drop the *host* mapping that would have been translated by the stage-2 mapping being invalidated. For the moment, the invalidation is pretty brutal, as we nuke the full IPA range, and therefore any VNCR_EL2 mapping. At some point, we'll be more light-weight, and the code is able to deal with something more targetted. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2Marc Zyngier
Now that we can handle faults triggered through VNCR_EL2, we need to map the corresponding page at EL2. But where, you'll ask? Since each CPU in the system can run a vcpu, we need a per-CPU mapping. For that, we carve a NR_CPUS range in the fixmap, giving us a per-CPU va at which to map the guest's VNCR's page. The mapping occurs both on vcpu load and on the back of a fault, both generating a request that will take care of the mapping. That mapping will also get dropped on vcpu put. Yes, this is a bit heavy handed, but it is simple. Eventually, we may want to have a per-VM, per-CPU mapping, which would avoid all the TLBI overhead. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Handle VNCR_EL2-triggered faultsMarc Zyngier
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults. These faults can have multiple source: - We haven't mapped anything on the host: we need to compute the resulting translation, populate a TLB, and eventually map the corresponding page - The permissions are out of whack: we need to tell the guest about this state of affairs Note that the kernel doesn't support S1POE for itself yet, so the particular case of a VNCR page mapped with no permissions or with write-only permissions is not correctly handled yet. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2Marc Zyngier
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits, and make it accessible to userspace when the VM is configured to support FEAT_NV2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2Marc Zyngier
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR is a virtual address in the EL2&0 (or EL2, but we thankfully ignore this) translation regime. As we need to replicate such mapping in the real EL2, it means that we need to remember that there is such a translation, and that any TLBI affecting EL2 can possibly affect this translation. It also means that any invalidation driven by an MMU notifier must be able to shoot down any such mapping. All in all, we need a data structure that represents this mapping, and that is extremely close to a TLB. Given that we can only use one of those per vcpu at any given time, we only allocate one. No effort is made to keep that structure small. If we need to start caching multiple of them, we may want to revisit that design point. But for now, it is kept simple so that we can reason about it. Oh, and add a braindump of how things are supposed to work, because I will definitely page this out at some point. Yes, pun intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nestingMarc Zyngier
We currently check for HCR_EL2.NV being set to decide whether we need to repaint PSTATE.M to say EL2 instead of EL1 on exit. However, this isn't correct when L2 is itself a hypervisor, and that L1 as set its own HCR_EL2.NV. That's because we "flatten" the state and inherit parts of the guest's own setup. In that case, we shouldn't adjust PSTATE.M, as this is really EL1 for both us and the guest. Instead of trying to try and work out how we ended-up with HCR_EL2.NV being set by introspecting both the host and guest states, use a per-CPU flag to remember the context (HYP or not), and use that information to decide whether PSTATE needs tweaking. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Move TLBI range decoding to a helperMarc Zyngier
As we are about to expand out TLB invalidation capabilities to support recursive virtualisation, move the decoding of a TLBI by range into a helper that returns the base, the range and the ASID. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Snapshot S1 ASID tagging information during walkMarc Zyngier
We currently completely ignore any sort of ASID tagging during a S1 walk, as AT doesn't care about it. However, such information is required if we are going to create anything that looks like a TLB from this walk. Let's capture it both the nG and ASID information while walking the page tables. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Extract translation helper from the AT codeMarc Zyngier
The address translation infrastructure is currently pretty tied to the AT emulation. However, we also need to features that require the use of VAs, such as VNCR_EL2 (and maybe one of these days SPE), meaning that we need a slightly more generic infrastructure. Start this by introducing a new helper (__kvm_translate_va()) that performs a S1 walk for a given translation regime, EL and PAN settings. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19KVM: arm64: nv: Allocate VNCR page when requiredMarc Zyngier
If running a NV guest on an ARMv8.4-NV capable system, let's allocate an additional page that will be used by the hypervisor to fulfill system register accesses. Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>