path: root/tools/testing/selftests/kvm
Age  Commit message  Author
2025-03-19  Merge branch 'kvm-arm64/writable-midr' into kvmarm/next  (Oliver Upton)

* kvm-arm64/writable-midr:
  : Writable implementation ID registers, courtesy of Sebastian Ott
  :
  : Introduce a new capability that allows userspace to set the
  : ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
  : and AIDR_EL1. Also plug a hole in KVM's trap configuration where
  : SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
  : support SME.
  KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
  KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
  KVM: arm64: Copy guest CTR_EL0 into hyp VM
  KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
  KVM: arm64: Allow userspace to change the implementation ID registers
  KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
  KVM: arm64: Maintain per-VM copy of implementation ID regs
  KVM: arm64: Set HCR_EL2.TID1 unconditionally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19  Merge branch 'kvm-arm64/pv-cpuid' into kvmarm/next  (Oliver Upton)

* kvm-arm64/pv-cpuid:
  : Paravirtualized implementation ID, courtesy of Shameer Kolothum
  :
  : Big-little has historically been a pain in the ass to virtualize. The
  : implementation ID (MIDR, REVIDR, AIDR) of a vCPU can change at the whim
  : of vCPU scheduling. This can be particularly annoying when the guest
  : needs to know the underlying implementation to mitigate errata.
  :
  : "Hyperscalers" face a similar scheduling problem, where VMs may freely
  : migrate between hosts in a pool of heterogeneous hardware. And yes, our
  : server-class friends are equally riddled with errata too.
  :
  : In absence of an architected solution to this wart on the ecosystem,
  : introduce support for paravirtualizing the implementation exposed
  : to a VM, allowing the VMM to describe the pool of implementations that a
  : VM may be exposed to due to scheduling/migration.
  :
  : Userspace is expected to intercept and handle these hypercalls using the
  : SMCCC filter UAPI, should it choose to do so.
  smccc: kvm_guest: Fix kernel builds for 32 bit arm
  KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2
  smccc/kvm_guest: Enable errata based on implementation CPUs
  arm64: Make _midr_in_range_list() an exported function
  KVM: arm64: Introduce KVM_REG_ARM_VENDOR_HYP_BMAP_2
  KVM: arm64: Specify hypercall ABI for retrieving target implementations
  arm64: Modify _midr_range() functions to read MIDR/REVIDR internally

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19  Merge branch 'kvm-arm64/nv-idregs' into kvmarm/next  (Oliver Upton)

* kvm-arm64/nv-idregs:
  : Changes to exposure of NV features, courtesy of Marc Zyngier
  :
  : Apply NV-specific feature restrictions at reset rather than at the point
  : of KVM_RUN. This makes the true feature set visible to userspace, a
  : necessary step towards save/restore support for NV VMs.
  :
  : Add an additional vCPU feature flag for selecting the E2H0 flavor of NV,
  : such that the VHE-ness of the VM can be applied to the feature set.
  KVM: arm64: selftests: Test that TGRAN*_2 fields are writable
  KVM: arm64: Allow userspace to write ID_AA64MMFR0_EL1.TGRAN*_2
  KVM: arm64: Advertise FEAT_ECV when possible
  KVM: arm64: Make ID_AA64MMFR4_EL1.NV_frac writable
  KVM: arm64: Allow userspace to limit NV support to nVHE
  KVM: arm64: Move NV-specific capping to idreg sanitisation
  KVM: arm64: Enforce NV limits on a per-idregs basis
  KVM: arm64: Make ID_REG_LIMIT_FIELD_ENUM() more widely available
  KVM: arm64: Consolidate idreg callbacks
  KVM: arm64: Advertise NV2 in the boot messages
  KVM: arm64: Mark HCR.EL2.{NV*,AT} RES0 when ID_AA64MMFR4_EL1.NV_frac is 0
  KVM: arm64: Mark HCR.EL2.E2H RES0 when ID_AA64MMFR1_EL1.VH is zero
  KVM: arm64: Hide ID_AA64MMFR2_EL1.NV from guest and userspace
  arm64: cpufeature: Handle NV_frac as a synonym of NV2

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-19  Merge tag 'kvm-x86-selftests-6.15' of https://github.com/kvm-x86/linux into HEAD  (Paolo Bonzini)

KVM selftests changes for 6.15, part 2

 - Fix a variety of flaws, bugs, and false failures/passes in dirty_log_test,
   and improve its coverage by collecting all dirty entries on each iteration.

 - Fix a few minor bugs related to handling of stats FDs.

 - Add infrastructure to make vCPU and VM stats FDs available to tests by
   default (open the FDs during VM/vCPU creation).

 - Relax an assertion on the number of HLT exits in the xAPIC IPI test when
   running on a CPU that supports AMD's Idle HLT (which elides interception
   of HLT if a virtual IRQ is pending and unmasked).

 - Misc cleanups and fixes.
2025-03-19  Merge tag 'kvm-x86-selftests_6.15-1' of https://github.com/kvm-x86/linux into HEAD  (Paolo Bonzini)

KVM selftests changes for 6.15, part 1

 - Misc cleanups and prep work.

 - Annotate _no_printf() with "printf" so that pr_debug() statements are
   checked by the compiler for default builds (and pr_info() when QUIET).

 - Attempt to whack the last LLC references/misses mole in the Intel PMU
   counters test by adding a data load and doing CLFLUSH{OPT} on the data
   instead of the code being executed. The theory is that modern Intel CPUs
   have learned new code prefetching tricks that bypass the PMU counters.

 - Fix a flaw in the Intel PMU counters test where it asserts that an event
   is counting correctly without actually knowing what the event counts on
   the underlying hardware.
2025-03-19  Merge tag 'kvm-x86-misc-6.15' of https://github.com/kvm-x86/linux into HEAD  (Paolo Bonzini)

KVM x86 misc changes for 6.15:

 - Fix a bug in PIC emulation that caused KVM to emit a spurious KVM_REQ_EVENT.

 - Add a helper to consolidate handling of mp_state transitions, and use it to
   clear pv_unhalted whenever a vCPU is made RUNNABLE.

 - Defer runtime CPUID updates until KVM emulates a CPUID instruction, to
   coalesce updates when multiple pieces of vCPU state are changing, e.g. as
   part of a nested transition.

 - Fix a variety of nested emulation bugs, and add VMX support for synthesizing
   nested VM-Exit on interception (instead of injecting #UD into L2).

 - Drop "support" for PV Async #PF with protected guests without SEND_ALWAYS,
   as KVM can't get the current CPL.

 - Misc cleanups
2025-03-19  KVM: riscv: selftests: Add Zaamo/Zalrsc extensions to get-reg-list test  (Clément Léger)

KVM RISC-V allows the Zaamo/Zalrsc extensions for the Guest/VM, so add these extensions to the get-reg-list test.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240619153913.867263-6-cleger@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-03-12  KVM: arm64: selftests: Test that TGRAN*_2 fields are writable  (Sebastian Ott)

Userspace can write to these fields for non-NV guests; add a test that does just that.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/kvmarm/20250306184013.30008-1-sebott@redhat.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-06  KVM: riscv: selftests: Allow number of interrupts to be configurable  (Atish Patra)

It is helpful to vary the number of LCOFI interrupts generated by the overflow test. Allow an additional argument for the overflow test to accommodate that. It can be easily cross-validated with the /proc/interrupts output on the host.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-4-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-06  KVM: riscv: selftests: Change command line option  (Atish Patra)

The PMU test command-line option takes an argument to disable a certain test. The initial assumption behind this was that the common use case is to run all the tests most of the time. However, running a single test seems more useful instead. In particular, the overflow test has been helpful for validating PMU virtualization interrupt changes.

Switching the command line option to run a single test instead of disabling a single test also allows additional test-specific arguments to be provided. The default without any options remains unchanged and continues to run all the tests.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-3-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-06  KVM: riscv: selftests: Do not start the counter in the overflow handler  (Atish Patra)

There is no need to start the counter in the overflow handler as we intend to trigger a precise number of LCOFI interrupts through these tests. The overflow irq handler has already stopped the counter. As a result, the stop call from the test function may return an "already stopped" error, which is fine as well.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-2-41d177e45929@rivosinc.com
Signed-off-by: Anup Patel <anup@brainfault.org>
2025-03-03  KVM: selftests: Fix printf() format goof in SEV smoke test  (Sean Christopherson)

Print out the index of mismatching XSAVE bytes using unsigned decimal format. Some versions of clang complain about trying to print an integer as an unsigned char.

  x86/sev_smoke_test.c:55:51: error: format specifies type 'unsigned char'
  but the argument has type 'int' [-Werror,-Wformat]

Fixes: 8c53183dbaa2 ("selftests: kvm: add test for transferring FPU state into VMSA")
Link: https://lore.kernel.org/r/20250228233852.3855676-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
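For illustration only, a minimal sketch of the class of fix involved; the helper below is hypothetical, not the actual sev_smoke_test code:

    #include <stdio.h>

    /* Hypothetical helper: print the mismatching byte index in unsigned
     * decimal, and cast the byte values up explicitly so that clang's
     * -Wformat is satisfied for both the index and the values. */
    static void report_xsave_mismatch(unsigned int idx, unsigned char want,
                                      unsigned char got)
    {
            printf("XSAVE mismatch at byte %u: expected 0x%02x, got 0x%02x\n",
                   idx, (unsigned int)want, (unsigned int)got);
    }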
2025-03-03  KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage  (Sean Christopherson)

During the initial mprotect(RO) stage of mmu_stress_test, keep vCPUs spinning until all vCPUs have hit -EFAULT, i.e. until all vCPUs have tried to write to a read-only page. If a vCPU manages to complete an entire iteration of the loop without hitting a read-only page, *and* the vCPU observes mprotect_ro_done before starting a second iteration, then the vCPU will prematurely fall through to GUEST_SYNC(3) (on x86 and arm64) and get out of sequence.

Replace the "do-while (!r)" loop around the associated _vcpu_run() with a single invocation, as barring a KVM bug, the vCPU is guaranteed to hit -EFAULT, and retrying on success is super confusing, hides KVM bugs, and complicates this fix. The do-while loop was semi-unintentionally added specifically to fudge around a KVM x86 bug, and said bug is unhittable without modifying the test to force x86 down the !(x86||arm64) path.

On x86, if forced emulation is enabled, vcpu_arch_put_guest() may trigger emulation of the store to memory. Due to a (very, very) longstanding bug in KVM x86's emulator, emulated writes to guest memory that fail during __kvm_write_guest_page() unconditionally return KVM_EXIT_MMIO. While that is desirable in the !memslot case, it's wrong in this case as the failure happens due to __copy_to_user() hitting a read-only page, not an emulated MMIO region. But as above, x86 only uses vcpu_arch_put_guest() if the __x86_64__ guards are clobbered to force x86 down the common path, and of course the unexpected MMIO is a KVM bug, i.e. *should* cause a test failure.

Fixes: b6c304aec648 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/20250208105318.16861-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20250228230804.3845860-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
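A rough sketch of the synchronization idea, assuming the selftests harness headers and a shared atomic counter; names such as nr_vcpus_faulted and run_ro_stage() are illustrative, not the actual mmu_stress_test code:

    #include <errno.h>
    #include <stdatomic.h>

    static atomic_int nr_vcpus_faulted;
    static int nr_vcpus;

    /* Per-vCPU worker for the mprotect(RO) stage: run the vCPU exactly once
     * (no retry-on-success loop, which would hide KVM bugs), assert that it
     * faulted, then spin until every vCPU has faulted so that no vCPU can
     * race ahead to the next sync point. */
    static void run_ro_stage(struct kvm_vcpu *vcpu)
    {
            int r = _vcpu_run(vcpu);

            TEST_ASSERT(r == -1 && errno == EFAULT,
                        "Expected -EFAULT from the write to read-only memory");

            atomic_fetch_add(&nr_vcpus_faulted, 1);
            while (atomic_load(&nr_vcpus_faulted) < nr_vcpus)
                    ;   /* spin until all vCPUs have hit -EFAULT */
    }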
2025-02-28  KVM: selftests: Relax assertion on HLT exits if CPU supports Idle HLT  (Sean Christopherson)

If the CPU supports Idle HLT, which elides HLT VM-Exits if the vCPU has an unmasked pending IRQ or NMI, relax the xAPIC IPI test's assertion on the number of HLT exits to only require that the number of exits is less than or equal to the number of HLT instructions that were executed. I.e. don't fail the test if Idle HLT does what it's supposed to do.

Note, unfortunately there's no way to determine if *KVM* supports Idle HLT, as this_cpu_has() checks raw CPU support, and kvm_cpu_has() checks what can be exposed to L1, i.e. the latter would check if KVM supports nested Idle HLT. But, since the assert is purely bonus coverage, checking for CPU support is good enough.

Cc: Manali Shukla <Manali.Shukla@amd.com>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Link: https://lore.kernel.org/r/20250226231809.3183093-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
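A hedged sketch of what the relaxed assertion amounts to; the counter names and the exact feature-flag spelling (X86_FEATURE_IDLE_HLT) are assumptions for illustration:

    /* With Idle HLT, a HLT executed while an unmasked IRQ/NMI is pending
     * doesn't exit, so only require exits <= executed HLTs; without Idle
     * HLT, the two counts must match exactly. */
    if (this_cpu_has(X86_FEATURE_IDLE_HLT))
            TEST_ASSERT(hlt_exits <= halts_executed,
                        "HLT exits (%lu) should not exceed HLTs executed (%lu)",
                        hlt_exits, halts_executed);
    else
            TEST_ASSERT_EQ(hlt_exits, halts_executed);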
2025-02-28  KVM: selftests: Assert that STI blocking isn't set after event injection  (Sean Christopherson)

Add an L1 (guest) assert to the nested exceptions test to verify that KVM doesn't put VMRUN in an STI shadow (AMD CPUs bleed the shadow into the guest's int_state if a #VMEXIT occurs before VMRUN fully completes). Add a similar assert to the VMX side as well, because why not.

Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-28  KVM: selftests: Fix spelling mistake "UFFDIO_CONINUE" -> "UFFDIO_CONTINUE"  (Colin Ian King)

There is a spelling mistake in a PER_PAGE_DEBUG debug message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20250227220819.656780-1-colin.i.king@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-26  KVM: selftests: Add test for KVM_REG_ARM_VENDOR_HYP_BMAP_2  (Shameer Kolothum)

One difference from the other pseudo-firmware bitmap registers is that the default/reset value for the supported hypercall function-ids is 0 at present. Hence, modify the test accordingly.

Reviewed-by: Sebastian Ott <sebott@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20250221140229.12588-7-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-26  KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR  (Sebastian Ott)

Assert that MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace, that the changed values are visible to guests, and that they are preserved across a vCPU reset.

Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250225005401.679536-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-02-24  KVM: selftests: Add a nested (forced) emulation intercept test for x86  (Sean Christopherson)

Add a rudimentary test for validating KVM's handling of L1 hypervisor intercepts during instruction emulation on behalf of L2. To minimize complexity and avoid overlap with other tests, only validate KVM's handling of instructions that L1 wants to intercept, i.e. that generate a nested VM-Exit. Full testing of emulation on behalf of L2 is better achieved by running existing (forced) emulation tests in a VM (although on VMX, getting L0 to emulate on #UD requires modifying either L1 KVM to not intercept #UD, or L0 KVM to prioritize L0's exception intercepts over L1's intercepts, as is done by KVM for SVM).

Since emulation should never be successful, i.e. L2 always exits to L1, dynamically generate the L2 code stream instead of adding a helper for each instruction. Doing so requires hand coding instruction opcodes, but makes it significantly easier for the test to compute the expected "next RIP" and instruction length.

Link: https://lore.kernel.org/r/20250201015518.689704-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
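A loose sketch of the dynamic code-stream idea; the buffer layout, the helper name, and the choice of CPUID as the intercepted instruction are illustrative assumptions:

    #include <stdint.h>
    #include <string.h>

    /* Emit a hand-coded CPUID (0x0f 0xa2) into the L2 code buffer and
     * return the expected "next RIP", i.e. the address immediately after
     * the instruction.  Hand coding the opcode makes the instruction
     * length, and thus the expected next RIP, trivially computable. */
    static uint64_t emit_cpuid(uint8_t *buf, uint64_t rip)
    {
            const uint8_t insn[] = { 0x0f, 0xa2 };

            memcpy(buf, insn, sizeof(insn));
            return rip + sizeof(insn);
    }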
2025-02-14  KVM: selftests: Add infrastructure for getting vCPU binary stats  (Sean Christopherson)

Now that the binary stats cache infrastructure is largely scope agnostic, add support for vCPU-scoped stats. Like VM stats, open and cache the stats FD when the vCPU is created so that it's guaranteed to be valid when vcpu_get_stats() is invoked.

Account for the extra per-vCPU file descriptor in kvm_set_files_rlimit(), so that tests that create large VMs don't run afoul of resource limits.

To sanity check that the infrastructure actually works, and to get a bit of bonus coverage, add an assert in x86's xapic_ipi_test to verify that the number of HLTs executed by the test matches the number of HLT exits observed by KVM.

Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Link: https://lore.kernel.org/r/20250111005049.1247555-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14  KVM: selftests: Adjust number of files rlimit for all "standard" VMs  (Sean Christopherson)

Move the max vCPUs test's RLIMIT_NOFILE adjustments to common code, and use the new helper to adjust the resource limit for non-barebones VMs by default. x86's recalc_apic_map_test creates 512 vCPUs, and a future change will open the binary stats fd for all vCPUs, which will put the recalc APIC test above some distros' default limit of 1024.

Link: https://lore.kernel.org/r/20250111005049.1247555-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
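The underlying rlimit adjustment is standard POSIX; a generic sketch, not the exact kvm_set_files_rlimit() implementation:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    /* Raise the soft limit on open files so that per-VM and per-vCPU stats
     * FDs for large VMs don't trip a distro default of 1024. */
    static void set_files_rlimit(rlim_t nr_needed)
    {
            struct rlimit rl;

            if (getrlimit(RLIMIT_NOFILE, &rl)) {
                    perror("getrlimit(RLIMIT_NOFILE)");
                    exit(1);
            }

            if (rl.rlim_cur < nr_needed) {
                    rl.rlim_cur = nr_needed < rl.rlim_max ? nr_needed : rl.rlim_max;
                    if (setrlimit(RLIMIT_NOFILE, &rl))
                            perror("setrlimit(RLIMIT_NOFILE)");
            }
    }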
2025-02-14  KVM: selftests: Get VM's binary stats FD when opening VM  (Sean Christopherson)

Get and cache a VM's binary stats FD when the VM is opened, as opposed to waiting until the stats are first used. Opening the stats FD outside of __vm_get_stat() will allow converting it to a scope-agnostic helper.

Note, this doesn't interfere with kvm_binary_stats_test's testcase that verifies a stats FD can be used after its own VM's FD is closed, as the cached FD is also closed during kvm_vm_free().

Link: https://lore.kernel.org/r/20250111005049.1247555-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-14  KVM: selftests: Add struct and helpers to wrap binary stats cache  (Sean Christopherson)

Add a struct and helpers to manage the binary stats cache, which is currently used only for VM-scoped stats. This will allow expanding the selftests infrastructure to provide support for vCPU-scoped binary stats, which, except for the ioctl to get the stats FD, are identical to VM-scoped stats.

Defer converting __vm_get_stat() to a scope-agnostic helper to a future patch, as getting the stats FD from KVM needs to be moved elsewhere before it can be made completely scope-agnostic.

Link: https://lore.kernel.org/r/20250111005049.1247555-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
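A rough sketch of what such a cache wrapper might look like; the struct layout and name are assumptions, while kvm_stats_header, kvm_stats_desc, and KVM_GET_STATS_FD are the real KVM UAPI types/ioctl:

    #include <linux/kvm.h>

    /* Illustrative cache of a binary stats FD plus the metadata needed to
     * read the stats, usable for either a VM or a vCPU scope. */
    struct kvm_binary_stats {
            int fd;                          /* from KVM_GET_STATS_FD */
            struct kvm_stats_header header;  /* sizes/offsets of desc and data */
            struct kvm_stats_desc *desc;     /* cached stat descriptors */
    };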
2025-02-14  KVM: selftests: Macrofy vm_get_stat() to auto-generate stat name string  (Sean Christopherson)

Turn vm_get_stat() into a macro that generates a string for the stat name, as opposed to taking a string. This will allow hardening stat usage in the future to generate errors on unknown stats at compile time.

No functional change intended.

Link: https://lore.kernel.org/r/20250111005049.1247555-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
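The stringification trick at the heart of this is standard C; a minimal sketch, with a simplified (assumed) __vm_get_stat() signature:

    /* Generate the stat-name string at the call site so a typo'd name can
     * eventually be turned into a compile-time error rather than a silent
     * string mismatch at run time. */
    #define vm_get_stat(vm, stat)  __vm_get_stat(vm, #stat)

    /* e.g. vm_get_stat(vm, pages_4k) expands to __vm_get_stat(vm, "pages_4k") */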
2025-02-14  KVM: selftests: Assert that __vm_get_stat() actually finds a stat  (Sean Christopherson)

Fail the test if it attempts to read a stat that doesn't exist, e.g. due to a typo (hooray, strings), or because the test tried to get a stat for the wrong scope. As is, there's no indication of failure and @data is left untouched, e.g. holds '0' or random stack data in most cases.

Fixes: 8448ec5993be ("KVM: selftests: Add NX huge pages test")
Link: https://lore.kernel.org/r/20250111005049.1247555-4-seanjc@google.com
[sean: fixup spelling mistake, courtesy of Colin Ian King]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Close VM's binary stats FD when releasing VM  (Sean Christopherson)

Close/free a VM's binary stats cache when the VM is released, not when the VM is fully freed. When a VM is re-created, e.g. for state save/restore tests, the stats FD and descriptor point at the old, defunct VM. The FD is still valid, in that the underlying stats file won't be freed until the FD is closed, but reading stats will always pull information from the old VM.

Note, this is a benign bug in the current code base as none of the tests that recreate VMs use binary stats.

Fixes: 83f6e109f562 ("KVM: selftests: Cache binary stats metadata for duration of test")
Link: https://lore.kernel.org/r/20250111005049.1247555-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Fix mostly theoretical leak of VM's binary stats FD  (Sean Christopherson)

When allocating and freeing a VM's cached binary stats info, check for a NULL descriptor, not a '0' file descriptor, as '0' is a legal FD. E.g. in the unlikely scenario the kernel installs the stats FD at entry '0', selftests would reallocate on the next __vm_get_stat() and/or fail to free the stats in kvm_vm_free().

Fixes: 83f6e109f562 ("KVM: selftests: Cache binary stats metadata for duration of test")
Link: https://lore.kernel.org/r/20250111005049.1247555-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
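The underlying pattern, sketched generically on top of the cache struct shown earlier; the helper and field names are illustrative:

    #include <stdlib.h>
    #include <unistd.h>

    /* FD 0 is a legal file descriptor, so key the "is the cache populated?"
     * check off the descriptor pointer, never off the FD value. */
    static void stats_cache_free(struct kvm_binary_stats *stats)
    {
            if (!stats->desc)
                    return;

            free(stats->desc);
            stats->desc = NULL;
            close(stats->fd);
            stats->fd = -1;
    }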
2025-02-12  KVM: selftests: Allow running a single iteration of dirty_log_test  (Sean Christopherson)

Now that dirty_log_test doesn't require running multiple iterations to verify dirty pages, and actually runs the requested number of iterations, drop the requirement that the test run at least "3" (which was really "2" at the time the test was written) iterations.

Link: https://lore.kernel.org/r/20250111003004.1235645-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Fix an off-by-one in the number of dirty_log_test iterations  (Sean Christopherson)

Actually run all requested iterations, instead of iterations-1 (the count starts at '1' due to the need to avoid '0' as an in-memory value for a dirty page).

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
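The shape of the off-by-one, sketched with illustrative names: since the counter starts at '1', a strict "less than" bound runs one iteration short.

    /* iteration starts at 1 because 0 can't be used as an in-memory dirty
     * value, so "<" would run only iterations - 1 passes; use "<=". */
    for (iteration = 1; iteration <= nr_iterations; iteration++)
            run_and_verify_iteration(vm, vcpu, iteration);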
2025-02-12  KVM: selftests: Set per-iteration variables at the start of each iteration  (Sean Christopherson)

Set the per-iteration variables at the start of each iteration instead of setting them before the loop and at the end of each iteration. To ensure the vCPU doesn't race ahead before the first iteration, simply have the vCPU worker wait for sem_vcpu_cont, which conveniently avoids the need to special case posting sem_vcpu_cont from the loop.

Link: https://lore.kernel.org/r/20250111003004.1235645-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Tighten checks around prev iter's last dirty page in ring  (Sean Christopherson)

Now that each iteration collects all dirty entries and ensures the guest *completes* at least one write, tighten the exemptions for the last dirty page of the previous iteration. Specifically, the only legal value (other than the current iteration) is N-1. Unlike the last page for the current iteration, the in-progress write from the previous iteration is guaranteed to have completed, otherwise the test would have hung.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Ensure guest writes min number of pages in dirty_log_test  (Sean Christopherson)

Ensure the vCPU fully completes at least one write in each dirty_log_test iteration, as failure to dirty any pages complicates verification and forces the test to be overly conservative about possible values. E.g. verification needs to allow the last dirty page from a previous iteration to have *any* value, because the vCPU could get stuck for multiple iterations, which is unlikely but can happen in heavily overloaded and/or nested virtualization setups.

Somewhat arbitrarily set the minimum to 0x100/256; high enough to be interesting, but not so high as to lead to pointlessly long runtimes.

Opportunistically report the number of writes per iteration for debug purposes, and so that a human can sanity check the test. Due to each write targeting a random page, the number of dirty pages will likely be lower than the number of total writes, but it shouldn't be absurdly lower (which would suggest the pRNG is broken).

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
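A hedged sketch of the guest-side idea; apart from guest_random_u64() and READ_ONCE(), every symbol below is an illustrative stand-in rather than the actual dirty_log_test code:

    uint64_t nr_writes = 0;

    /* Keep dirtying random pages until at least the minimum number of
     * writes (0x100 in the test) have fully completed *and* the host has
     * asked the vCPU to stop. */
    while (nr_writes < 0x100 || !READ_ONCE(vcpu_stop)) {
            uint64_t page = guest_random_u64(&rng) % nr_guest_pages;

            *(volatile uint64_t *)(test_mem_base + page * page_size) = iteration;
            nr_writes++;
    }

    /* ... then report nr_writes back to the host so a human can sanity
     * check that the write count roughly tracks the dirty page count. */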
2025-02-12  KVM: selftests: Verify value of dirty_log_test last page isn't bogus  (Sean Christopherson)

Add a sanity check that a completely garbage value wasn't written to the last dirty page in the ring, e.g. that it doesn't contain the *next* iteration's value.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Collect *all* dirty entries in each dirty_log_test iteration  (Sean Christopherson)

Collect all dirty entries during each iteration of dirty_log_test by doing a final collection after the vCPU has been stopped. To deal with KVM's destructive approach to getting the dirty bitmaps, use a second bitmap for the post-stop collection.

Collecting all entries that were dirtied during an iteration simplifies the verification logic *and* improves test coverage.

 - If a page is written during iteration X, but not seen as dirty until X+1, the test can get a false pass if the page is also written during X+1.

 - If a dirty page used a stale value from a previous iteration, the test would grant a false pass.

 - If a missed dirty log occurs in the last iteration, the test would fail to detect the issue.

E.g. modifying mark_page_dirty_in_slot() to dirty an unwritten gfn:

	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
		unsigned long rel_gfn = gfn - memslot->base_gfn;
		u32 slot = (memslot->as_id << 16) | memslot->id;

		if (!vcpu->extra_dirty &&
		    gfn_to_memslot(kvm, gfn + 1) == memslot) {
			vcpu->extra_dirty = true;
			mark_page_dirty_in_slot(kvm, memslot, gfn + 1);
		}
		if (kvm->dirty_ring_size && vcpu)
			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
		else if (memslot->dirty_bitmap)
			set_bit_le(rel_gfn, memslot->dirty_bitmap);
	}

isn't detected with the current approach, even with an interval of 1ms (when running nested in a VM; bare metal would be even *less* likely to detect the bug due to the vCPU being able to dirty more memory). Whereas collecting all dirty entries consistently detects failures with an interval of 700ms or more (the longer interval means a higher probability of an actual write to the prematurely-dirtied page).

Link: https://lore.kernel.org/r/20250111003004.1235645-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Print (previous) last_page on dirty page value mismatch  (Sean Christopherson)

Print out the last dirty pages from the current and previous iteration on verification failures. In many cases, bugs (especially test bugs) occur on the edges, i.e. on or near the last pages, and being able to correlate failures with the last pages can aid in debug.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Use continue to handle all "pass" scenarios in dirty_log_test  (Sean Christopherson)

When verifying pages in dirty_log_test, immediately continue on all "pass" scenarios to make the logic consistent in how it handles pass vs. fail.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-13-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Post to sem_vcpu_stop if and only if vcpu_stop is true  (Sean Christopherson)

When running dirty_log_test using the dirty ring, post to sem_vcpu_stop only when the main thread has explicitly requested that the vCPU stop. Synchronizing the vCPU and main thread whenever the dirty ring happens to be full is unnecessary, as KVM's ABI is to actively prevent the vCPU from running until the ring is no longer full. I.e. attempting to run the vCPU will simply result in KVM_EXIT_DIRTY_RING_FULL without ever entering the guest. And if KVM doesn't exit, e.g. lets the vCPU dirty more pages, then that's a KVM bug worth finding.

Posting to sem_vcpu_stop on ring full also makes it difficult to get the test logic right, e.g. it's easy to let the vCPU keep running when it shouldn't, as a ring full can essentially happen at any given time.

Opportunistically rework the handling of dirty_ring_vcpu_ring_full to leave it set for the remainder of the iteration in order to simplify the surrounding logic.

Link: https://lore.kernel.org/r/20250111003004.1235645-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
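A sketch of the resulting flow in the dirty-ring vCPU worker; the variable names follow the description above, but treat the details as illustrative:

    /* "Ring full" is not a reason to hand control to the main thread: KVM
     * refuses to enter the guest until the ring is reaped, so just record
     * the event and keep going.  Only post sem_vcpu_stop when the main
     * thread has explicitly asked the vCPU to stop. */
    if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL)
            WRITE_ONCE(dirty_ring_vcpu_ring_full, true);

    if (READ_ONCE(vcpu_stop))
            sem_post(&sem_vcpu_stop);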
2025-02-12  KVM: selftests: Keep dirty_log_test vCPU in guest until it needs to stop  (Sean Christopherson)

In the dirty_log_test guest code, exit to userspace only when the vCPU is explicitly told to stop. Periodically exiting just to check if a flag has been set is unnecessary, weirdly complex, and wastes time handling exits that could be used to dirty memory.

Opportunistically convert 'i' to a uint64_t to guard against the unlikely scenario that guest_num_pages exceeds the storage of an int.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Honor "stop" request in dirty ring test  (Sean Christopherson)

Now that the vCPU doesn't dirty every page on the first iteration for architectures that support the dirty ring, honor vcpu_stop in the dirty ring's vCPU worker, i.e. stop when the main thread says "stop". This will allow plumbing vcpu_stop into the guest so that the vCPU doesn't need to periodically exit to userspace just to see if it should stop.

Add a comment explaining that marking all pages as dirty is problematic for the dirty ring, as it results in the guest getting stuck on "ring full". This could be addressed by adding a GUEST_SYNC() in that initial loop, but it's not clear how that would interact with s390's behavior.

Link: https://lore.kernel.org/r/20250111003004.1235645-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Limit dirty_log_test's s390x workaround to s390x  (Maxim Levitsky)

The s390-specific workaround causes the dirty-log mode of the test to dirty all guest memory on the first iteration, which is very slow when the test is run in a nested VM. Limit this workaround to s390x.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Continuously reap dirty ring while vCPU is running  (Sean Christopherson)

Continue collecting entries from the dirty ring for the entire time the vCPU is running. Collecting exactly once all but guarantees the vCPU will encounter a "ring full" event and stop. While testing ring full is interesting, stopping and doing nothing is not, especially for larger intervals as the test effectively does nothing for a much longer time.

To balance continuous collection with letting the guest make forward progress, chunk the interval waiting into 1ms loops (which also makes the math dead simple).

To maintain coverage for "ring full", collect entries on subsequent iterations if and only if the ring has been filled at least once. I.e. let the ring fill up (if the interval allows), but after that continuously empty it so that the vCPU can keep running.

Opportunistically drop unnecessary zero-initialization of "count".

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
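A hedged sketch of the chunked wait-and-reap loop on the main-thread side; the helper name is an approximation, and the real test additionally defers draining until the ring has filled at least once, to preserve "ring full" coverage:

    /* Reap the dirty ring continuously, in 1ms chunks, for the whole
     * interval, instead of collecting exactly once and leaving the vCPU
     * parked on "ring full" for most of a long interval. */
    for (int i = 0; i < interval_ms; i++) {
            usleep(1000);
            count += collect_dirty_pages(vcpu);    /* illustrative helper */
    }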
2025-02-12  KVM: selftests: Read per-page value into local var when verifying dirty_log_test  (Sean Christopherson)

Cache the page's value during verification in a local variable; re-reading from the pointer is ugly and error prone, e.g. allows for bugs like checking the pointer itself instead of the value.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Precisely track number of dirty/clear pages for each iteration  (Sean Christopherson)

Track and print the number of dirty and clear pages for each iteration. This provides parity between all log modes, and will allow collecting the dirty ring multiple times per iteration without spamming the console.

Opportunistically drop the "Dirtied N pages" print, which is redundant and wrong. For the dirty ring testcase, the vCPU isn't guaranteed to complete a loop. And when the vCPU does complete a loop, there are no guarantees that it has *dirtied* that many pages; because the writes are to random addresses, the vCPU may have written the same page over and over, i.e. only dirtied one page.

While the number of writes performed by the vCPU is also interesting, e.g. the pr_info() could be tweaked to use different verbiage, pages_count doesn't correctly track the number of writes either (because loops aren't guaranteed to complete). Delete the print for now, as a future patch will precisely track the number of writes, at which point the verification phase can report the number of writes performed by each iteration.

Link: https://lore.kernel.org/r/20250111003004.1235645-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Drop stale srandom() initialization from dirty_log_test  (Sean Christopherson)

Drop an srandom() initialization that was leftover from the conversion to use selftests' guest_random_xxx() APIs.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Drop signal/kick from dirty ring testcase  (Sean Christopherson)

Drop the signal/kick from dirty_log_test's dirty ring handling, as kicking the vCPU adds marginal value, at the cost of adding significant complexity to the test. Asynchronously interrupting the vCPU isn't novel; unless the kernel is fully tickless, the vCPU will be interrupted by IRQs for any decently large interval. And exiting to userspace mode in the middle of a sequence isn't novel either, as the vCPU will do so every time the ring becomes full.

Link: https://lore.kernel.org/r/20250111003004.1235645-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Sync dirty_log_test iteration to guest *before* resuming  (Sean Christopherson)

Sync the new iteration to the guest prior to restarting the vCPU, otherwise it's possible for the vCPU to dirty memory for the next iteration using the current iteration's value. Note, because the guest can be interrupted between the vCPU's load of the iteration and its write to memory, it's still possible for the guest to store the previous iteration to memory as the previous iteration may be cached in a CPU register (which the test accounts for).

Note #2, the test's current approach of collecting dirty entries *before* stopping the vCPU also results in dirty memory having the previous iteration's value. E.g. if a page is dirtied in the previous iteration, but not the current iteration, the verification phase will observe the previous iteration's value in memory. That wart will be remedied in the near future, at which point synchronizing the iteration before restarting the vCPU will guarantee the only way for verification to observe stale iterations is due to the CPU register caching case, or due to a dirty entry being collected before the store retires.

Link: https://lore.kernel.org/r/20250111003004.1235645-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Support multiple write retries in dirty_log_test  (Maxim Levitsky)

If dirty_log_test is run nested, it is possible for entries in the emulated PML log to appear before the actual memory write is committed to RAM, due to the way KVM retries memory writes in response to an MMU fault.

In addition, in some very rare cases the retry can happen more than once, which will lead to a test failure because, once the write is finally committed, it may have a very outdated iteration value.

Detect and avoid this case.

Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20250111003004.1235645-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Actually emit forced emulation prefix for kvm_asm_safe_fep()  (Sean Christopherson)

Use KVM_ASM_SAFE_FEP, not simply KVM_ASM_SAFE, for kvm_asm_safe_fep(), as the non-FEP version doesn't force emulation (stating the obvious). Note, there are currently no users of kvm_asm_safe_fep().

Fixes: ab3b6a7de8df ("KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()")
Link: https://lore.kernel.org/r/20250130163135.270770-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC  (Sean Christopherson)

Add testcases to x86's Hyper-V CPUID test to verify that KVM advertises support for features that require an in-kernel local APIC appropriately, i.e. that KVM hides support from the vCPU-scoped ioctl if the VM doesn't have an in-kernel local APIC.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-02-12  KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper  (Sean Christopherson)

Allocate, get, and free the CPUID array in the Hyper-V CPUID test in the test's core helper, instead of copy+pasting code at each call site. In addition to deduplicating a small amount of code, restricting visibility of the array to a single invocation of the core test prevents "leaking" an array across test cases. Passing in @vcpu to the helper will also allow pivoting on VM-scoped information without needing to pass more booleans, e.g. to conditionally assert on features that require an in-kernel APIC.

To avoid use-after-free bugs due to overzealous and careless developers, opportunistically add a comment to explain that the system-scoped helper caches the Hyper-V CPUID entries, i.e. that the caller is not responsible for freeing the memory.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20250118003454.2619573-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>