path: root/arch/arm64/kvm
Age  Commit message  Author
2025-05-16  KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1  (Wei-Lin Chang)
In the case of ICH_LR<n>.HW == 1, bit 41 of LR is just a part of pINTID without EOI meaning, and bit 41 will be zeroed by the subsequent clearing of ICH_LR_PHYS_ID_MASK anyway. No functional changes intended. Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Link: https://lore.kernel.org/r/20250512133223.866999-1-r09922117@csie.ntu.edu.tw Signed-off-by: Marc Zyngier <maz@kernel.org>
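A minimal sketch of why the extra clearing was redundant, using locally defined masks that follow the GICv3 ICH_LR<n>_EL2 layout (the kernel has its own definitions for these): when HW == 1, bit 41 sits inside pINTID and is wiped by the pINTID mask anyway.

    #include <stdint.h>

    /* Locally defined, following the GICv3 ICH_LR<n>_EL2 layout */
    #define ICH_LR_HW               (1ULL << 61)
    #define ICH_LR_EOI              (1ULL << 41)        /* only meaningful when HW == 0 */
    #define ICH_LR_PHYS_ID_MASK     (0x1fffULL << 32)   /* pINTID[44:32] */

    static uint64_t sanitise_lr(uint64_t lr)
    {
            if (lr & ICH_LR_HW) {
                    /* bit 41 is part of pINTID here, so this also clears it */
                    lr &= ~ICH_LR_PHYS_ID_MASK;
            }
            return lr;
    }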
2025-05-16  KVM: arm64: Make MTE_frac masking conditional on MTE capability  (Ben Horgan)
If MTE_frac is masked out unconditionally, the guest will always see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf (MTE_ASYNC unsupported), the guest would see MTE_ASYNC advertised as supported whilst the host does not support it. Hence, expose the sanitised value of MTE_frac to the guest and user-space. As MTE_frac was previously hidden (always 0), and KVM must accept the values it previously provided when user-space writes them back, allow user-space to set ID_AA64PFR1_EL1.MTE_frac to 0 when ID_AA64PFR1_EL1.MTE is 2. However, ignore it to avoid incorrectly claiming hardware support for MTE_ASYNC in the guest. Note that Linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and wrongly assumes that MTE async faults can be generated even on hardware that does not support them. This issue is not addressed here. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Link: https://lore.kernel.org/r/20250512114112.359087-3-ben.horgan@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
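The gist of the change, as a standalone sketch with hypothetical helper names and only the two relevant fields (ID_AA64PFR1_EL1.MTE at bits [11:8], MTE_frac at bits [43:40]): MTE_frac is only hidden together with MTE itself, so a host reporting MTE==2 with MTE_frac==0xf is no longer shown to the guest as supporting MTE_ASYNC.

    #include <stdbool.h>
    #include <stdint.h>

    #define PFR1_MTE_SHIFT          8       /* ID_AA64PFR1_EL1.MTE,      bits [11:8]  */
    #define PFR1_MTE_FRAC_SHIFT     40      /* ID_AA64PFR1_EL1.MTE_frac, bits [43:40] */
    #define PFR1_FIELD              0xfULL

    static uint64_t guest_view_of_pfr1(uint64_t host_pfr1, bool guest_has_mte)
    {
            uint64_t val = host_pfr1;

            if (!guest_has_mte) {
                    /* MTE hidden from the guest: hide MTE_frac along with it */
                    val &= ~(PFR1_FIELD << PFR1_MTE_SHIFT);
                    val &= ~(PFR1_FIELD << PFR1_MTE_FRAC_SHIFT);
            }

            /* MTE exposed: keep the host's sanitised MTE_frac as-is */
            return val;
    }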
2025-05-14  KVM: arm64: Don't feed uninitialised data to HCR_EL2  (Marc Zyngier)
When the guest executes an AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}=={1,1}, this is a pure S1 translation that doesn't involve a guest-supplied S2, and the full S1 context is already in place. This allows us to take a shortcut and avoid save/restoring a bunch of registers. However, we set HCR_EL2 to a value suitable for the use of AT in guest context. And we do so by using the value that we saved. Or not. In the case described above, we restore whatever junk was on the stack, and carry on with it until the next entry. Needless to say, this is completely broken. But this also triggers the realisation that saving HCR_EL2 is a bit pointless. We are always in host context at the point where we reach this code, and what we program to enter the guest is a known value (vcpu->arch.hcr_el2). Drop the pointless save/restore, and wrap the AT operations with writes that switch between guest and host values for HCR_EL2. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250422122612.2675672-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
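The shape of the fix, sketched with stand-in accessors rather than the real register write helpers: the AT operation runs between two writes of known values, so nothing is ever read back from a possibly-uninitialised save slot.

    #include <stdint.h>

    static uint64_t hcr_el2;                              /* stand-in for the HCR_EL2 register */

    static void write_hcr_el2(uint64_t v) { hcr_el2 = v; }
    static void issue_at_s1e1r(uint64_t va) { (void)va; } /* stand-in for the AT instruction */

    static void at_in_guest_context(uint64_t guest_hcr, uint64_t host_hcr, uint64_t va)
    {
            write_hcr_el2(guest_hcr);       /* known guest value (vcpu->arch.hcr_el2) */
            issue_at_s1e1r(va);
            write_hcr_el2(host_hcr);        /* back to the known host value */
    }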
2025-05-14  KVM: arm64: Teach address translation about access faults  (Marc Zyngier)
It appears that our S1 PTW is completely oblivious of access faults. Teach the S1 translation code about it. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250422122612.2675672-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-14  KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*  (Marc Zyngier)
When an AT S1E* operation fails, we need to report whether the translation failed at S2, and whether this was during an S1 PTW. But these two bits are not independent. PAR_EL1.PTW can only be set if PAR_EL1.S is also set, and PAR_EL1.S can only be set on its own when the full S1 PTW has succeeded but the access itself is reporting a fault at S2. As a result, it makes no sense to carry both ptw and s2 as parameters to fail_s1_walk(), and they should be unified. This fixes a number of cases where we were reporting PTW=1 *and* S=0, which makes no sense. Link: https://lore.kernel.org/r/20250422122612.2675672-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
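A sketch of the dependency being enforced (types and names here are illustrative, not fail_s1_walk()'s real signature): a single "how far did S2 intervene" value replaces the two flags, so PTW=1 with S=0 can no longer be produced.

    #include <stdbool.h>
    #include <stdint.h>

    enum s2_involvement {
            S2_NONE,        /* pure S1 fault:                      PTW=0, S=0 */
            S2_ON_ACCESS,   /* S1 walk succeeded, output faulted:  PTW=0, S=1 */
            S2_DURING_PTW,  /* S2 fault while walking S1 tables:   PTW=1, S=1 */
    };

    static uint64_t par_s2_bits(enum s2_involvement s2)
    {
            bool s   = (s2 != S2_NONE);
            bool ptw = (s2 == S2_DURING_PTW);

            return ((uint64_t)s << 9) | ((uint64_t)ptw << 8);  /* PAR_EL1.S, PAR_EL1.PTW */
    }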
2025-05-10  KVM: arm64: Validate FGT register descriptions against RES0 masks  (Marc Zyngier)
In order to point out to the unsuspecting KVM hacker that they are missing something somewhere, validate that the known FGT bits do not intersect with the corresponding RES0 mask, as computed at boot time. This check is also performed at boot time, ensuring that there is no runtime overhead. Signed-off-by: Marc Zyngier <maz@kernel.org>
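The kind of check being described, as a free-standing sketch (the real validation works on KVM's FGT tables and the boot-time RES0 masks): any overlap between the bits KVM describes and the RES0 mask is flagged.

    #include <stdint.h>
    #include <stdio.h>

    static int check_fgt_against_res0(const char *reg, uint64_t described_bits, uint64_t res0_mask)
    {
            uint64_t clash = described_bits & res0_mask;

            if (clash) {
                    fprintf(stderr, "%s: bits %#llx are described but RES0\n",
                            reg, (unsigned long long)clash);
                    return -1;
            }

            return 0;
    }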
2025-05-10  KVM: arm64: Switch to table-driven FGU configuration  (Marc Zyngier)
Defining the FGU behaviour is extremely tedious. It relies on matching each set of bits from FGT registers with an architectural feature, and adding them to the FGU list if the corresponding feature isn't advertised to the guest. It is however relatively easy to dump most of that information from the architecture JSON description, and use that to control the FGU bits. Let's introduce a new set of tables describing the mapping between FGT bits and features. Most of the time, this is only a lookup in an idreg field, with a few more complex exceptions. While this is obviously many more lines in a new file, this is mostly generated, and is pretty easy to maintain. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
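A rough, hypothetical shape for such a table entry and the FGU computation (structure and field names are illustrative, not the new file's actual layout): each FGT bit points at an ID register field, and the bit joins the FGU set when the guest doesn't see the feature.

    #include <stddef.h>
    #include <stdint.h>

    struct fgt_feat_map {
            uint64_t        fgt_bit;        /* trap bit in the FGT register */
            unsigned int    idreg;          /* which ID register to consult */
            unsigned int    shift;          /* field position in that register */
            unsigned int    min;            /* minimum field value for the feature */
    };

    static uint64_t compute_fgu(const struct fgt_feat_map *map, size_t nr,
                                uint64_t (*guest_idreg)(unsigned int))
    {
            uint64_t fgu = 0;

            for (size_t i = 0; i < nr; i++) {
                    uint64_t field = (guest_idreg(map[i].idreg) >> map[i].shift) & 0xf;

                    if (field < map[i].min)
                            fgu |= map[i].fgt_bit;  /* feature not advertised: keep trapping */
            }

            return fgu;
    }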
2025-05-10  KVM: arm64: Handle PSB CSYNC traps  (Marc Zyngier)
The architecture introduces a trap for PSB CSYNC that fits in the same EC as LS64. Let's deal with it in a similar way as LS64. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10  KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask  (Marc Zyngier)
We do not have a computed table for HCRX_EL2, so statically define the bits we know about. A warning will fire if the architecture grows bits that are not handled yet. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10  KVM: arm64: Remove hand-crafted masks for FGT registers  (Marc Zyngier)
These masks are now useless, and can be removed. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-10  KVM: arm64: Use computed FGT masks to setup FGT registers  (Marc Zyngier)
Flip the hypervisor FGT configuration over to the computed FGT masks. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-08  KVM: arm64: Drop sort_memblock_regions()  (Gavin Shan)
Drop sort_memblock_regions() and avoid sorting the copied memory regions into ascending order by their base addresses, because the source memory regions should already have been sorted when they were added by memblock_add() or its variants. This effectively reverts commit a14307f5310c ("KVM: arm64: Sort the hypervisor memblocks"). No functional changes intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250311043718.91004-1-gshan@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07  KVM: arm64: Handle UBSAN faults  (Mostafa Saleh)
As now UBSAN can be enabled, handle brk64 exits from UBSAN. Re-use the decoding code from the kernel, and panic with UBSAN message. Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-5-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07  KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2  (Mostafa Saleh)
Add a new Kconfig option, CONFIG_UBSAN_KVM_EL2, which enables UBSAN for KVM's EL2 code (in protected/nVHE/hVHE modes). This re-uses for the hypervisor the same checks enabled for the kernel. The only difference is that for EL2 it always emits a "brk" instead of implementing hooks, as the hypervisor can't print reports. The KVM code re-uses the kernel's report_ubsan_failure(), so the #ifdefs are changed to also build this code for CONFIG_UBSAN_KVM_EL2. Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-07  KVM: arm64: Fix memory check in host_stage2_set_owner_locked()  (Mostafa Saleh)
I found this simple bug while preparing some patches for pKVM. AFAICT, it should be harmless (besides crashing the kernel if it was misbehaving) Fixes: e94a7dea2972 ("KVM: arm64: Move host page ownership tracking to the hyp vmemmap") Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20250501162450.2784043-1-smostafa@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-07  KVM: arm64: Properly save/restore HCRX_EL2  (Marc Zyngier)
Rather than restoring HCRX_EL2 to a fixed value on vcpu exit, perform a full save/restore of the register, ensuring that we don't lose bits that would have been set at some point in the host kernel lifetime, such as the GCSEn bit. Fixes: ff5181d8a2a82 ("arm64/gcs: Provide basic EL2 setup to allow GCS usage at EL0 and EL1") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250430105916.3815157-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
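A minimal sketch of the pattern using stand-in accessors (not the real switch code): the exit path restores whatever was saved at entry instead of a fixed constant, so host-owned bits such as GCSEn survive.

    #include <stdint.h>

    static uint64_t hcrx_el2;                       /* stand-in for the HCRX_EL2 register */

    static uint64_t read_hcrx(void)    { return hcrx_el2; }
    static void write_hcrx(uint64_t v) { hcrx_el2 = v; }

    struct host_state { uint64_t hcrx; };

    static void on_guest_entry(struct host_state *host, uint64_t guest_hcrx)
    {
            host->hcrx = read_hcrx();       /* full save of the host value */
            write_hcrx(guest_hcrx);
    }

    static void on_guest_exit(struct host_state *host)
    {
            write_hcrx(host->hcrx);         /* restore it, rather than a fixed value */
    }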
2025-05-06  KVM: arm64: Propagate FGT masks to the nVHE hypervisor  (Marc Zyngier)
The nVHE hypervisor needs to have access to its own view of the FGT masks, which unfortunately results in a bit of data duplication. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Unconditionally configure fine-grain traps  (Mark Rutland)
... otherwise we can inherit the host configuration if this differs from the KVM configuration. Signed-off-by: Mark Rutland <mark.rutland@arm.com> [maz: simplified a couple of things] Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Use computed masks as sanitisers for FGT registers  (Marc Zyngier)
Now that we have computed RES0 bits, use them to sanitise the guest view of FGT registers. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Add description of FGT bits leading to EC!=0x18  (Marc Zyngier)
The current FGT tables are only concerned with the bits generating ESR_ELx.EC==0x18. However, we want an exhaustive view of what KVM really knows about. So let's add another small table that provides that extra information. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Compute FGT masks from KVM's own FGT tables  (Marc Zyngier)
In the process of decoupling KVM's view of the FGT bits from the wider architectural state, use KVM's own FGT tables to build a synthetic view of what is actually known. This allows for some checking along the way. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Plug FEAT_GCS handling  (Marc Zyngier)
We don't seem to be handling the GCS-specific exception class. Handle it by delivering an UNDEF to the guest, and populate the relevant trap bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Don't treat HCRX_EL2 as a FGT register  (Marc Zyngier)
Treating HCRX_EL2 as yet another FGT register seems excessive, and gets in the way of further improvements. It is actually simpler to just be explicit about the masking, so just do that. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabled  (Marc Zyngier)
We currently unconditionally make ACCDATA_EL1 accesses UNDEF. As we are about to support it, restrict the UNDEF behaviour to cases where FEAT_LS64_ACCDATA is not exposed to the guest. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Handle trapping of FEAT_LS64* instructions  (Marc Zyngier)
We generally don't expect FEAT_LS64* instructions to trap, unless they are trapped by a guest hypervisor. Otherwise, this is just the guest playing tricks on us by using an instruction that isn't advertised, which we handle with a well deserved UNDEF. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Simplify handling of negative FGT bits  (Marc Zyngier)
check_fgt_bit() and triage_sysreg_trap() implement the same thing twice for no good reason. We have to lookup the FGT register twice, as we don't communicate it. Similarly, we extract the register value at the wrong spot. Reorganise the code in a more logical way so that things are done at the correct location, removing a lot of duplication. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Tighten handling of unknown FGT groups  (Marc Zyngier)
triage_sysreg_trap() assumes that it knows all the possible values for FGT groups, which won't be the case as we start adding more FGT registers (unless we add everything in one go, which is obviously undesirable). At the same time, it doesn't offer much in terms of debugging info when things go wrong. Turn the "__NR_FGT_GROUP_IDS__" case into a default, covering any unhandled value, and give the kernel hacker a bit of a clue about what's wrong (system register and full trap descriptor). Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2  (Marc Zyngier)
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake. It makes things hard to reason about, has the potential to introduce bugs by giving a meaning to bits that are really reserved, and is in general a bad description of the architecture. Given that #defines are cheap, let's describe both registers as intended by the architecture, and repaint all the existing uses. Yes, this is painful. The registers themselves are generated from the JSON file in an automated way. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Extend pKVM selftest for np-guests  (Quentin Perret)
The pKVM selftest intends to test as many memory 'transitions' as possible, so extend it to cover sharing pages with non-protected guests, including in the case of multi-sharing. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Selftest for pKVM transitions  (Quentin Perret)
We have recently found a bug [1] in the pKVM memory ownership transitions by code inspection, but it could have been caught with a test. Introduce a boot-time selftest exercising all the known pKVM memory transitions and, importantly, checking the rejection of illegal transitions. The new test is hidden behind a new Kconfig option, separate from CONFIG_NVHE_EL2_DEBUG on purpose, as that option has side effects on the transition checks ([1] doesn't reproduce with EL2 debug enabled). [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Don't WARN from __pkvm_host_share_guest()  (Quentin Perret)
We currently WARN() if the host attempts to share with a guest a page that is not in an acceptable state. This isn't strictly necessary and makes testing much harder, so drop the WARN and make sure to propagate the error code instead. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Add .hyp.data section  (David Brazdil)
The hypervisor has not needed its own .data section because all globals were either .rodata or .bss. To avoid having to initialize future data-structures at run-time, let's add a .data section to the hypervisor. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode  (Marc Zyngier)
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This also has two problems:
- it prevents IRQs from being taken when these bits are cleared, if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0};
- it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36).
Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as it runs extremely early, with interrupts disabled, and is soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06  KVM: arm64: Replace ternary flags with str_on_off() helper  (Seongsu Park)
Replace repetitive ternary expressions with the str_on_off() helper function. This change improves code readability and ensures consistency in tracepoint string formatting Signed-off-by: Seongsu Park <sgsu.park@samsung.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1891546521.01744691102904.JavaMail.epsvc@epcpadp1new Signed-off-by: Marc Zyngier <maz@kernel.org>
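The pattern in question, sketched standalone (str_on_off() is defined locally here only so the snippet compiles; in the kernel it comes from the string-choices helpers, and the flag names below are made up for illustration):

    #include <stdbool.h>
    #include <stdio.h>

    static inline const char *str_on_off(bool v)
    {
            return v ? "on" : "off";
    }

    static void trace_example(bool dirty_log, bool irq_pending)
    {
            /* Before: dirty_log ? "on" : "off" repeated for every flag */
            printf("dirty-log %s, irq-pending %s\n",
                   str_on_off(dirty_log), str_on_off(irq_pending));
    }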
2025-05-05  KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL  (Marc Zyngier)
A sorry excuse for a selftest is trying to disable AArch64 support. And yes, this goes as well as you can imagine. Let's forbid this sort of thing. Normal userspace shouldn't get caught doing that. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Link: https://lore.kernel.org/r/20250429114117.3618800-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05  KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode  (Marc Zyngier)
We keep setting and clearing these bits depending on the role of the host kernel, mimicking what we do for nVHE. But that's actually pretty pointless, as we always want physical interrupts to make it to the host, at EL2. This also has two problems:
- it prevents IRQs from being taken when these bits are cleared, if the implementation has chosen to implement these bits as masks when HCR_EL2.{TGE,xMO}=={0,0};
- it triggers a bad erratum on the AmpereOne HW, which catches fire on clearing these bits while an interrupt is being taken (AC03_CPU_36).
Let's kill these two birds with a single stone, and permanently set the xMO bits when running VHE. This involves a bit of surgery on code paths that rely on flipping these bits on and off for other purposes. Note that the earliest setting of hcr_el2 (in the init_hcr_el2 macro) is left untouched as it runs extremely early, with interrupts disabled, and is soon enough overwritten with the final value containing the xMO bits. Reported-by: D Scott Phillips <scott@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-05-05  KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()  (Sebastian Ott)
Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the initialization of the local memcache variable in user_mem_abort() conditional, leaving a codepath where it is used uninitialized via kvm_pgtable_stage2_map(). This can fail on any path that requires a stage-2 allocation without transition via a permission fault or dirty logging. Fix this by making sure that memcache is always valid. Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvmarm/3f5db4c7-ccce-fb95-595c-692fa7aad227@redhat.com/ Link: https://lore.kernel.org/r/20250505173148.33900-1-sebott@redhat.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
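The shape of the bug, reduced to a hypothetical standalone form (none of these names are the real ones): a local only assigned on some paths but consumed on all of them; the fix is to pick a valid value unconditionally before it can be used.

    #include <stdbool.h>
    #include <stddef.h>

    struct memcache { int entries; };

    static int map_with_memcache(struct memcache *vcpu_mc, struct memcache *pkvm_mc,
                                 bool is_protected_vm, bool needs_alloc)
    {
            struct memcache *mc;

            /* Always initialised, whichever MMU flavour is in use */
            mc = is_protected_vm ? pkvm_mc : vcpu_mc;

            if (needs_alloc && !mc)
                    return -1;      /* would previously dereference garbage here */

            return 0;
    }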
2025-04-30  arm64: drop binutils version checks  (Arnd Bergmann)
Now that gcc-8 and binutils-2.30 are the minimum versions, a lot of the individual feature checks can go away for simplification. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-04-29  Merge branch kvm-arm64/nv-pmu-fixes into kvmarm-master/next  (Marc Zyngier)
* kvm-arm64/nv-pmu-fixes:
Fixes for NV PMU emulation. From the cover letter:
"Joey reports that some of his PMU tests do not behave quite as expected:
- MDCR_EL2.HPMN is set to 0 out of reset
- PMCR_EL0.P should reset all the counters when written from EL2
Oliver points out that setting PMCR_EL0.N from userspace by writing to the register is silly with NV, and that we need a new PMU attribute instead.
On top of that, I figured out that we had a number of little gotchas:
- It is possible for a guest to write an HPMN value that is out of bound, and it seems valuable to limit it
- PMCR_EL0.N should be the maximum number of counters when read from EL2, and MDCR_EL2.HPMN when read from EL0/EL1
- Prevent userspace from updating PMCR_EL0.N when EL2 is available"
KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N
KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN
KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2
KVM: arm64: Allow userspace to limit the number of PMU counters for EL2 VMs
KVM: arm64: Contextualise the handling of PMCR_EL0.P writes
KVM: arm64: Fix MDCR_EL2.HPMN reset value
KVM: arm64: Repaint pmcr_n into nr_pmu_counters
Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Unconditionally cross check hyp state  (Quentin Perret)
Now that the hypervisor's state is stored in the hyp_vmemmap, we no longer need an expensive page-table walk to read it. This means we can now afford to cross check the hyp-state during all memory ownership transitions where the hyp is involved unconditionally, hence avoiding problems such as [1]. [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/ Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Defer EL2 stage-1 mapping on share  (Quentin Perret)
We currently blindly map into EL2 stage-1 *any* page passed to the __pkvm_host_share_hyp() HVC. This is less than ideal from a security perspective as it makes exploitation of potential hypervisor gadgets easier than it should be. But interestingly, pKVM should never need to access SHARED_BORROWED pages that it hasn't previously pinned, so there is no need to map the page before that. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Move hyp state to hyp_vmemmap  (Quentin Perret)
Tracking the hypervisor's ownership state in struct hyp_page has several benefits, including allowing far more efficient lookups (no page-table walk needed) and de-correlating the state from the presence of a mapping. This will later allow pages to be mapped into EL2 stage-1 less proactively, which is generally a good thing for security. And in the future this will help with tracking the state of pages mapped into the hypervisor's private range without requiring an alias into the 'linear map' range. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Introduce {get,set}_host_state() helpers  (Quentin Perret)
Instead of directly accessing the host_state member in struct hyp_page, introduce static inline accessors to do it. The future hyp_state member will follow the same pattern as it will need some logic in the accessors. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
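A sketch of the accessor pattern being introduced (the layout and enum values here are placeholders, not the real struct hyp_page encoding): callers go through static inlines, so later changes such as packing a hyp state into the same storage stay local to the helpers.

    #include <stdint.h>

    enum pkvm_page_state { PAGE_OWNED, PAGE_SHARED_OWNED, PAGE_SHARED_BORROWED, PAGE_NOPAGE };

    struct hyp_page {
            uint8_t host_state;     /* placeholder layout */
    };

    static inline enum pkvm_page_state get_host_state(struct hyp_page *p)
    {
            return (enum pkvm_page_state)p->host_state;
    }

    static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state)
    {
            p->host_state = (uint8_t)state;
    }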
2025-04-28  KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE  (Quentin Perret)
The page ownership state encoded as 0b11 is currently considered reserved for future use, and PKVM_NOPAGE uses bit 2. In order to simplify the relocation of the hyp ownership state into the vmemmap in later patches, let's use the 'reserved' encoding for the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed stable at all, so there is no real reason to have 'reserved' encodings. No functional changes intended. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Fix pKVM page-tracking comments  (Quentin Perret)
Most of the comments relating to pKVM page-tracking in nvhe/memory.h are now either slightly outdated or outright wrong. Fix the comments. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-28  KVM: arm64: Track SVE state in the hypervisor vcpu structure  (Fuad Tabba)
When dealing with a guest with SVE enabled, make sure the host SVE state is pinned at EL2 S1, and that the hypervisor vCPU state is correctly initialised (and then unpinned on teardown). Co-authored-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-04-24  KVM: arm64, x86: make kvm_arch_has_irq_bypass() inline  (Paolo Bonzini)
kvm_arch_has_irq_bypass() is a small function and even though it does not appear in any *really* hot paths, it's also not entirely rare. Make it inline---it also works out nicely in preparation for using it in kvm-intel.ko and kvm-amd.ko, since the function is not currently exported. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-11  KVM: arm64: Let kvm_vcpu_read_pmcr() return an EL-dependent value for PMCR_EL0.N  (Marc Zyngier)
When EL2 is present, PMCR_EL0.N is the effective value of MDCR_EL2.HPMN when accessed from EL1 or EL0. Make sure we honor this requirement. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
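The rule being implemented, in a stripped-down hypothetical form (returning just the PMCR_EL0.N field value, which architecturally sits in bits [15:11]): reads from EL2 see the full counter count, reads from EL1/EL0 see the effective MDCR_EL2.HPMN value.

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t pmcr_n_as_seen(bool read_from_el2, uint64_t nr_counters, uint64_t hpmn)
    {
            /* From EL2: the real number of counters; from EL1/EL0: MDCR_EL2.HPMN */
            return read_from_el2 ? nr_counters : hpmn;
    }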
2025-04-11  KVM: arm64: Handle out-of-bound write to MDCR_EL2.HPMN  (Marc Zyngier)
We don't really pay attention to what gets written to MDCR_EL2.HPMN, and funky guests could play ugly games on us. Restrict what gets written there, and limit the number of counters to what the PMU is allowed to have. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
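A sketch of the sanitisation being described, with a hypothetical helper (MDCR_EL2.HPMN lives in bits [4:0]): out-of-range writes are clamped to the number of counters the vcpu's PMU actually has.

    #include <stdint.h>

    #define MDCR_EL2_HPMN_MASK      0x1fULL         /* MDCR_EL2.HPMN, bits [4:0] */

    static uint64_t sanitise_hpmn_write(uint64_t written, uint64_t nr_pmu_counters)
    {
            uint64_t hpmn = written & MDCR_EL2_HPMN_MASK;

            if (hpmn > nr_pmu_counters)
                    hpmn = nr_pmu_counters; /* clamp out-of-bound guest writes */

            return hpmn;
    }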
2025-04-11  KVM: arm64: Don't let userspace write to PMCR_EL0.N when the vcpu has EL2  (Marc Zyngier)
Now that userspace can provide its limit for the maximum number of counters, prevent it from writing to PMCR_EL0.N, as the value should be derived from MDCR_EL2.HPMN in that case. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>