Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.16
* New features:
- Add large stage-2 mapping support for non-protected pKVM guests,
clawing back some performance.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and
protected modes.
- Enable nested virtualisation support on systems that support it
(yes, it has been a long time coming), though it is disabled by
default.
* Improvements, fixes and cleanups:
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. This ensures correctness of
emulation (the data is automatically extracted from the published
JSON files), and helps dealing with the evolution of the
architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
|
|
* kvm-arm64/misc-6.16:
: .
: Misc changes and improvements for 6.16:
:
: - Add a new selftest for the SVE host state being corrupted by a guest
:
: - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2,
: ensuring that the window for interrupts is slightly bigger, and avoiding
: a pretty bad erratum on the AmpereOne HW
:
: - Replace a couple of open-coded on/off strings with str_on_off()
:
: - Get rid of the pKVM memblock sorting, which now appears to be superflous
:
: - Drop superflous clearing of ICH_LR_EOI in the LR when nesting
:
: - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from
: a pretty bad case of TLB corruption unless accesses to HCR_EL2 are
: heavily synchronised
:
: - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables
: in a human-friendly fashion
: .
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
KVM: arm64: Drop sort_memblock_regions()
KVM: arm64: selftests: Add test for SVE host corruption
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Replace ternary flags with str_on_off() helper
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-nv:
: .
: Flick the switch on the NV support by adding the missing piece
: in the form of the VNCR page management. From the cover letter:
:
: "This is probably the most interesting bit of the whole NV adventure.
: So far, everything else has been a walk in the park, but this one is
: where the real fun takes place.
:
: With FEAT_NV2, most of the NV support revolves around tricking a guest
: into accessing memory while it tries to access system registers. The
: hypervisor's job is to handle the context switch of the actual
: registers with the state in memory as needed."
: .
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
KVM: arm64: Document NV caps and vcpu flags
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
KVM: arm64: nv: Remove dead code from ERET handling
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
KVM: arm64: nv: Move TLBI range decoding to a helper
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
KVM: arm64: nv: Extract translation helper from the AT code
KVM: arm64: nv: Allocate VNCR page when required
arm64: sysreg: Add layout for VNCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/fgt-masks: (43 commits)
: .
: Large rework of the way KVM deals with trap bits in conjunction with
: the CPU feature registers. It now draws a direct link between which
: the feature set, the system registers that need to UNDEF to match
: the configuration and bits that need to behave as RES0 or RES1 in
: the trap registers that are visible to the guest.
:
: Best of all, these definitions are mostly automatically generated
: from the JSON description published by ARM under a permissive
: license.
: .
KVM: arm64: Handle TSB CSYNC traps
KVM: arm64: Add FGT descriptors for FEAT_FGT2
KVM: arm64: Allow sysreg ranges for FGT descriptors
KVM: arm64: Add context-switch for FEAT_FGT2 registers
KVM: arm64: Add trap routing for FEAT_FGT2 registers
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
KVM: arm64: Allow kvm_has_feat() to take variable arguments
KVM: arm64: Use FGT feature maps to drive RES0 bits
KVM: arm64: Validate FGT register descriptions against RES0 masks
KVM: arm64: Switch to table-driven FGU configuration
KVM: arm64: Handle PSB CSYNC traps
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
KVM: arm64: Remove hand-crafted masks for FGT registers
KVM: arm64: Use computed FGT masks to setup FGT registers
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
KVM: arm64: Unconditionally configure fine-grain traps
KVM: arm64: Use computed masks as sanitisers for FGT registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_IBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
With the introduction of stage-2 huge mappings in the pKVM hypervisor,
guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.
On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.
Because of the horrendous private range allocation that would be
necessary, this is disabled for 64KiB pages systems.
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now np-guests hypercalls with range are supported, we can let the
hypervisor to install block mappings whenever the Stage-1 allows it,
that is when backed by either Hugetlbfs or THPs. The size of those block
mappings is limited to PMD_SIZE.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall.
This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is
512 on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This
range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512
on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_share_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on fixmap, which can only map a
single PAGE_SIZE page.
With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
until the whole range is done.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checks the rejection of illegal transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous
translations for data addresses initiated by load/store instructions.
Only instruction initiated translations are vulnerable, not translations
from prefetches for example. A DSB before the store to HCR_EL2 is
sufficient to prevent older instructions from hitting the window for
corruption, and an ISB after is sufficient to prevent younger
instructions from hitting the window for corruption.
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like the rest of the FGT registers, perform a switch of the
FGT2 equivalent. This avoids the host configuration leaking into
the guest...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. THis is a large update, but a fairly mechanical one.
The config dependencies are extracted from the 2025-03 JSON drop.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR
and potentially knock the fixmap mapping. Even worse, that TLBI
must be able to work cross-vcpu.
For that, we track on a per-VM basis if any VNCR is mapped, using
an atomic counter. Whenever a TLBI S1E2 occurs and that this counter
is non-zero, we take the long road all the way back to the core code.
There, we iterate over all vcpus and check whether this particular
invalidation has any damaging effect. If it does, we nuke the pseudo
TLB and the corresponding fixmap.
Yes, this is costly.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Since we now have a way to map the guest's VNCR_EL2 on the host,
we can point the host's VNCR_EL2 to it and go full circle!
Note that we unconditionally assign the fixmap to VNCR_EL2,
irrespective of the guest's version being mapped or not. We want
to take a fault on first access, so the fixmap either contains
something guranteed to be either invalid or a guest mapping.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-13-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We currently check for HCR_EL2.NV being set to decide whether we
need to repaint PSTATE.M to say EL2 instead of EL1 on exit.
However, this isn't correct when L2 is itself a hypervisor, and
that L1 as set its own HCR_EL2.NV. That's because we "flatten"
the state and inherit parts of the guest's own setup. In that case,
we shouldn't adjust PSTATE.M, as this is really EL1 for both us
and the guest.
Instead of trying to try and work out how we ended-up with HCR_EL2.NV
being set by introspecting both the host and guest states, use
a per-CPU flag to remember the context (HYP or not), and use that
information to decide whether PSTATE needs tweaking.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
These masks are now useless, and can be removed.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Flip the hyervisor FGT configuration over to the computed FGT
masks.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a new Kconfig CONFIG_UBSAN_KVM_EL2 for KVM which enables
UBSAN for EL2 code (in protected/nvhe/hvhe) modes.
This will re-use the same checks enabled for the kernel for
the hypervisor. The only difference is that for EL2 it always
emits a "brk" instead of implementing hooks as the hypervisor
can't print reports.
The KVM code will re-use the same code for the kernel
"report_ubsan_failure()" so #ifdefs are changed to also have this
code for CONFIG_UBSAN_KVM_EL2
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
I found this simple bug while preparing some patches for pKVM.
AFAICT, it should be harmless (besides crashing the kernel if it
was misbehaving)
Fixes: e94a7dea2972 ("KVM: arm64: Move host page ownership tracking to the hyp vmemmap")
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Link: https://lore.kernel.org/r/20250501162450.2784043-1-smostafa@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Rather than restoring HCRX_EL2 to a fixed value on vcpu exit,
perform a full save/restore of the register, ensuring that
we don't lose bits that would have been set at some point in
the host kernel lifetime, such as the GCSEn bit.
Fixes: ff5181d8a2a82 ("arm64/gcs: Provide basic EL2 setup to allow GCS usage at EL0 and EL1")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250430105916.3815157-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The nVHE hypervisor needs to have access to its own view of the FGT
masks, which unfortunately results in a bit of data duplication.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
... otherwise we can inherit the host configuration if this differs from
the KVM configuration.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[maz: simplified a couple of things]
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Treating HCRX_EL2 as yet another FGT register seems excessive, and
gets in a way of further improvements. It is actually simpler to
just be explicit about the masking, so just to that.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.
Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.
Yes, this is painful.
The registers themselves are generated from the JSON file in
an automated way.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The pKVM selftest intends to test as many memory 'transitions' as
possible, so extend it to cover sharing pages with non-protected guests,
including in the case of multi-sharing.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We have recently found a bug [1] in the pKVM memory ownership
transitions by code inspection, but it could have been caught with a
test.
Introduce a boot-time selftest exercising all the known pKVM memory
transitions and importantly checks the rejection of illegal transitions.
The new test is hidden behind a new Kconfig option separate from
CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
transition checks ([1] doesn't reproduce with EL2 debug enabled).
[1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We currently WARN() if the host attempts to share a page that is not in
an acceptable state with a guest. This isn't strictly necessary and
makes testing much harder, so drop the WARN and make sure to propage the
error code instead.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The hypervisor has not needed its own .data section because all globals
were either .rodata or .bss. To avoid having to initialize future
data-structures at run-time, let's introduce add a .data section to the
hypervisor.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We keep setting and clearing these bits depending on the role of
the host kernel, mimicking what we do for nVHE. But that's actually
pretty pointless, as we always want physical interrupts to make it
to the host, at EL2.
This has also two problems:
- it prevents IRQs from being taken when these bits are cleared
if the implementation has chosen to implement these bits as
masks when HCR_EL2.{TGE,xMO}=={0,0}
- it triggers a bad erratum on the AmpereOne HW, which catches
fire on clearing these bits while an interrupt is being taken
(AC03_CPU_36).
Let's kill these two birds with a single stone, and permanently
set the xMO bits when running VHE. This involves a bit of surgery
on code paths that rely on flipping these bits on and off for
other purposes.
Note that the earliest setting of hcr_el2 (in the init_hcr_el2
macro) is left untouched as is runs extremely early, with interrupts
disabled, and soon enough overwritten with the final value containing
the xMO bits.
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We keep setting and clearing these bits depending on the role of
the host kernel, mimicking what we do for nVHE. But that's actually
pretty pointless, as we always want physical interrupts to make it
to the host, at EL2.
This has also two problems:
- it prevents IRQs from being taken when these bits are cleared
if the implementation has chosen to implement these bits as
masks when HCR_EL2.{TGE,xMO}=={0,0}
- it triggers a bad erratum on the AmpereOne HW, which catches
fire on clearing these bits while an interrupt is being taken
(AC03_CPU_36).
Let's kill these two birds with a single stone, and permanently
set the xMO bits when running VHE. This involves a bit of surgery
on code paths that rely on flipping these bits on and off for
other purposes.
Note that the earliest setting of hcr_el2 (in the init_hcr_el2
macro) is left untouched as is runs extremely early, with interrupts
disabled, and soon enough overwritten with the final value containing
the xMO bits.
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250429114326.3618875-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Now that the hypervisor's state is stored in the hyp_vmemmap, we no
longer need an expensive page-table walk to read it. This means we can
now afford to cross check the hyp-state during all memory ownership
transitions where the hyp is involved unconditionally, hence avoiding
problems such as [1].
[1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-8-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We currently blindly map into EL2 stage-1 *any* page passed to the
__pkvm_host_share_hyp() HVC. This is less than ideal from a security
perspective as it makes exploitation of potential hypervisor gadgets
easier than it should be. But interestingly, pKVM should never need to
access SHARED_BORROWED pages that it hasn't previously pinned, so there
is no need to map the page before that.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-7-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Tracking the hypervisor's ownership state into struct hyp_page has
several benefits, including allowing far more efficient lookups (no
page-table walk needed) and de-corelating the state from the presence
of a mapping. This will later allow to map pages into EL2 stage-1 less
proactively which is generally a good thing for security. And in the
future this will help with tracking the state of pages mapped into the
hypervisor's private range without requiring an alias into the 'linear
map' range.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-6-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Instead of directly accessing the host_state member in struct hyp_page,
introduce static inline accessors to do it. The future hyp_state member
will follow the same pattern as it will need some logic in the accessors.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The page ownership state encoded as 0b11 is currently considered
reserved for future use, and PKVM_NOPAGE uses bit 2. In order to
simplify the relocation of the hyp ownership state into the
vmemmap in later patches, let's use the 'reserved' encoding for
the PKVM_NOPAGE state. The struct hyp_page layout isn't guaranteed
stable at all, so there is no real reason to have 'reserved' encodings.
No functional changes intended.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Most of the comments relating to pKVM page-tracking in nvhe/memory.h are
now either slightly outdated or outright wrong. Fix the comments.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When dealing with a guest with SVE enabled, make sure the host SVE
state is pinned at EL2 S1, and that the hypervisor vCPU state is
correctly initialised (and then unpinned on teardown).
Co-authored-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416152648.2982950-2-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The pKVM FF-A proxy rejects FF-A requests other than FFA_VERSION until
version negotiation is complete, which is signalled by setting the
global 'has_version_negotiated' variable.
To avoid excessive locking, this variable is checked directly from
kvm_host_ffa_handler() in response to an FF-A call, but this can race
against another CPU performing the negotiation and potentially lead to
reading a torn value (incredibly unlikely for a 'bool') or problematic
re-ordering of the accesses to 'has_version_negotiated' and
'hyp_ffa_version' whereby a stale version number could be read by
__do_ffa_mem_xfer().
Use acquire/release primitives when writing 'has_version_negotiated'
with the version lock held and when reading without the lock held.
Cc: Sebastian Ene <sebastianene@google.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>
Fixes: c9c012625e12 ("KVM: arm64: Trap FFA_VERSION host call in pKVM")
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250407152755.1041-1-will@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Don't re-walk the page tables if an SEA occurred during the faulting
page table walk to avoid taking a fatal exception in the hyp.
Additionally, check that FAR_EL2 is valid for SEAs not taken on PTW
as the architecture doesn't guarantee it contains the fault VA.
Finally, fix up the rest of the abort path by checking for SEAs early
and bugging the VM if we get further along with an UNKNOWN fault IPA.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250402201725.2963645-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Switch over to the typical sysreg table for HPFAR_EL2 as we're about to
start using more fields in the register.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250402201725.2963645-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
KVM's logic for deciding when HPFAR_EL2 is UNKNOWN doesn't align with
the architecture. Most notably, KVM assumes HPFAR_EL2 contains the
faulting IPA even in the case of an SEA.
Align the logic with the architecture rather than attempting to
paraphrase it. Additionally, take the opportunity to improve the
language around ARM erratum #834220 such that it actually describes the
bug.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250402201725.2963645-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/pkvm-6.15:
: pKVM updates for 6.15
:
: - SecPageTable stats for stage-2 table pages allocated by the protected
: hypervisor (Vincent Donnefort)
:
: - HCRX_EL2 trap + vCPU initialization fixes for pKVM (Fuad Tabba)
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: arm64: Count pKVM stage-2 usage in secondary pagetable stats
KVM: arm64: Distinct pKVM teardown memcache for stage-2
KVM: arm64: Add flags to kvm_hyp_memcache
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/writable-midr:
: Writable implementation ID registers, courtesy of Sebastian Ott
:
: Introduce a new capability that allows userspace to set the
: ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
: and AIDR_EL1. Also plug a hole in KVM's trap configuration where
: SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
: support SME.
KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM: arm64: Copy guest CTR_EL0 into hyp VM
KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
KVM: arm64: Allow userspace to change the implementation ID registers
KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
KVM: arm64: Maintain per-VM copy of implementation ID regs
KVM: arm64: Set HCR_EL2.TID1 unconditionally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/pmuv3-asahi:
: Support PMUv3 for KVM guests on Apple silicon
:
: Take advantage of some IMPLEMENTATION DEFINED traps available on Apple
: parts to trap-and-emulate the PMUv3 registers on behalf of a KVM guest.
: Constrain the vPMU to a cycle counter and single event counter, as the
: Apple PMU has events that cannot be counted on every counter.
:
: There is a small new interface between the ARM PMU driver and KVM, where
: the PMU driver owns the PMUv3 -> hardware event mappings.
arm64: Enable IMP DEF PMUv3 traps on Apple M*
KVM: arm64: Provide 1 event counter on IMPDEF hardware
drivers/perf: apple_m1: Provide helper for mapping PMUv3 events
KVM: arm64: Remap PMUv3 events onto hardware
KVM: arm64: Advertise PMUv3 if IMPDEF traps are present
KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 traps
KVM: arm64: Move PMUVer filtering into KVM code
KVM: arm64: Use guard() to cleanup usage of arm_pmus_lock
KVM: arm64: Drop kvm_arm_pmu_available static key
KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3
KVM: arm64: Always support SW_INCR PMU event
KVM: arm64: Compute PMCEID from arm_pmu's event bitmaps
drivers/perf: apple_m1: Support host/guest event filtering
drivers/perf: apple_m1: Refactor event select/filter configuration
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/nv-vgic:
: NV VGICv3 support, courtesy of Marc Zyngier
:
: Support for emulating the GIC hypervisor controls and managing shadow
: VGICv3 state for the L1 hypervisor. As part of it, bring in support for
: taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt.
KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts
KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2
KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting
KVM: arm64: nv: Add Maintenance Interrupt emulation
KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
KVM: arm64: nv: Nested GICv3 emulation
KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses
KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg
KVM: arm64: nv: Load timer before the GIC
arm64: sysreg: Add layout for ICH_MISR_EL2
arm64: sysreg: Add layout for ICH_VTR_EL2
arm64: sysreg: Add layout for ICH_HCR_EL2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|