summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)Author
2024-12-22Merge tag 'kvm-x86-fixes-6.13-rcN' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 6.13: - Disable AVIC on SNP-enabled systems that don't allow writes to the virtual APIC page, as such hosts will hit unexpected RMP #PFs in the host when running VMs of any flavor. - Fix a WARN in the hypercall completion path due to KVM trying to determine if a guest with protected register state is in 64-bit mode (KVM's ABI is to assume such guests only make hypercalls in 64-bit mode). - Allow the guest to write to supported bits in MSR_AMD64_DE_CFG to fix a regression with Windows guests, and because KVM's read-only behavior appears to be entirely made up. - Treat TDP MMU faults as spurious if the faulting access is allowed given the existing SPTE. This fixes a benign WARN (other than the WARN itself) due to unexpectedly replacing a writable SPTE with a read-only SPTE.
2024-12-22KVM: x86: let it be known that ignore_msrs is a bad ideaPaolo Bonzini
When running KVM with ignore_msrs=1 and report_ignored_msrs=0, the user has no clue that that the guest is being lied to. This may cause bug reports such as https://gitlab.com/qemu-project/qemu/-/issues/2571, where enabling a CPUID bit in QEMU caused Linux guests to try reading MSR_CU_DEF_ERR; and being lied about the existence of MSR_CU_DEF_ERR caused the guest to assume other things about the local APIC which were not true: Sep 14 12:02:53 kernel: mce: [Firmware Bug]: Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly. Sep 14 12:02:53 kernel: unchecked MSR access error: RDMSR from 0x852 at rIP: 0xffffffffb548ffa7 (native_read_msr+0x7/0x40) Sep 14 12:02:53 kernel: Call Trace: ... Sep 14 12:02:53 kernel: native_apic_msr_read+0x20/0x30 Sep 14 12:02:53 kernel: setup_APIC_eilvt+0x47/0x110 Sep 14 12:02:53 kernel: mce_amd_feature_init+0x485/0x4e0 ... Sep 14 12:02:53 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu Without reported_ignored_msrs=0 at least the host kernel log will contain enough information to avoid going on a wild goose chase. But if reports about individual MSR accesses are being silenced too, at least complain loudly the first time a VM is started. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-22KVM: VMX: don't include '<linux/find.h>' directlyWolfram Sang
The header clearly states that it does not want to be included directly, only via '<linux/bitmap.h>'. Replace the include accordingly. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-ID: <20241217070539.2433-2-wsa+renesas@sang-engineering.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19KVM: x86/mmu: Treat TDP MMU faults as spurious if access is already allowedSean Christopherson
Treat slow-path TDP MMU faults as spurious if the access is allowed given the existing SPTE to fix a benign warning (other than the WARN itself) due to replacing a writable SPTE with a read-only SPTE, and to avoid the unnecessary LOCK CMPXCHG and subsequent TLB flush. If a read fault races with a write fault, fast GUP fails for any reason when trying to "promote" the read fault to a writable mapping, and KVM resolves the write fault first, then KVM will end up trying to install a read-only SPTE (for a !map_writable fault) overtop a writable SPTE. Note, it's not entirely clear why fast GUP fails, or if that's even how KVM ends up with a !map_writable fault with a writable SPTE. If something else is going awry, e.g. due to a bug in mmu_notifiers, then treating read faults as spurious in this scenario could effectively mask the underlying problem. However, retrying the faulting access instead of overwriting an existing SPTE is functionally correct and desirable irrespective of the WARN, and fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed bit in primary MMU's PTE is toggled and causes a PTE value mismatch. The WARN was also recently added, specifically to track down scenarios where KVM is unnecessarily overwrites SPTEs, i.e. treating the fault as spurious doesn't regress KVM's bug-finding capabilities in any way. In short, letting the WARN linger because there's a tiny chance it's due to a bug elsewhere would be excessively paranoid. Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable") Reported-by: Lei Yang <leiyang@redhat.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588 Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218213611.3181643-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: SVM: Allow guest writes to set MSR_AMD64_DE_CFG bitsSean Christopherson
Drop KVM's arbitrary behavior of making DE_CFG.LFENCE_SERIALIZE read-only for the guest, as rejecting writes can lead to guest crashes, e.g. Windows in particular doesn't gracefully handle unexpected #GPs on the WRMSR, and nothing in the AMD manuals suggests that LFENCE_SERIALIZE is read-only _if it exists_. KVM only allows LFENCE_SERIALIZE to be set, by the guest or host, if the underlying CPU has X86_FEATURE_LFENCE_RDTSC, i.e. if LFENCE is guaranteed to be serializing. So if the guest sets LFENCE_SERIALIZE, KVM will provide the desired/correct behavior without any additional action (the guest's value is never stuffed into hardware). And having LFENCE be serializing even when it's not _required_ to be is a-ok from a functional perspective. Fixes: 74a0e79df68a ("KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value") Fixes: d1d93fa90f1a ("KVM: SVM: Add MSR-based feature support for serializing LFENCE") Reported-by: Simon Pilkington <simonp.git@mailbox.org> Closes: https://lore.kernel.org/all/52914da7-a97b-45ad-86a0-affdf8266c61@mailbox.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241211172952.1477605-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: x86: Play nice with protected guests in complete_hypercall_exit()Sean Christopherson
Use is_64_bit_hypercall() instead of is_64_bit_mode() to detect a 64-bit hypercall when completing said hypercall. For guests with protected state, e.g. SEV-ES and SEV-SNP, KVM must assume the hypercall was made in 64-bit mode as the vCPU state needed to detect 64-bit mode is unavailable. Hacking the sev_smoke_test selftest to generate a KVM_HC_MAP_GPA_RANGE hypercall via VMGEXIT trips the WARN: ------------[ cut here ]------------ WARNING: CPU: 273 PID: 326626 at arch/x86/kvm/x86.h:180 complete_hypercall_exit+0x44/0xe0 [kvm] Modules linked in: kvm_amd kvm ... [last unloaded: kvm] CPU: 273 UID: 0 PID: 326626 Comm: sev_smoke_test Not tainted 6.12.0-smp--392e932fa0f3-feat #470 Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024 RIP: 0010:complete_hypercall_exit+0x44/0xe0 [kvm] Call Trace: <TASK> kvm_arch_vcpu_ioctl_run+0x2400/0x2720 [kvm] kvm_vcpu_ioctl+0x54f/0x630 [kvm] __se_sys_ioctl+0x6b/0xc0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- Fixes: b5aead0064f3 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state") Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241128004344.4072099-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: SVM: Disable AVIC on SNP-enabled system without HvInUseWrAllowed featureSuravee Suthikulpanit
On SNP-enabled system, VMRUN marks AVIC Backing Page as in-use while the guest is running for both secure and non-secure guest. Any hypervisor write to the in-use vCPU's AVIC backing page (e.g. to inject an interrupt) will generate unexpected #PF in the host. Currently, attempt to run AVIC guest would result in the following error: BUG: unable to handle page fault for address: ff3a442e549cc270 #PF: supervisor write access in kernel mode #PF: error_code(0x80000003) - RMP violation PGD b6ee01067 P4D b6ee02067 PUD 10096d063 PMD 11c540063 PTE 80000001149cc163 SEV-SNP: PFN 0x1149cc unassigned, dumping non-zero entries in 2M PFN region: [0x114800 - 0x114a00] ... Newer AMD system is enhanced to allow hypervisor to modify the backing page for non-secure guest on SNP-enabled system. This enhancement is available when the CPUID Fn8000_001F_EAX bit 30 is set (HvInUseWrAllowed). This table describes AVIC support matrix w.r.t. SNP enablement: | Non-SNP system | SNP system ----------------------------------------------------- Non-SNP guest | AVIC Activate | AVIC Activate iff | | HvInuseWrAllowed=1 ----------------------------------------------------- SNP guest | N/A | Secure AVIC Therefore, check and disable AVIC in kvm_amd driver when the feature is not available on SNP-enabled system. See the AMD64 Architecture Programmer’s Manual (APM) Volume 2 for detail. (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/ programmer-references/40332.pdf) Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20241104075845.7583-1-suravee.suthikulpanit@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: x86: Remove hwapic_irr_update() from kvm_x86_opsChao Gao
Remove the redundant .hwapic_irr_update() ops. If a vCPU has APICv enabled, KVM updates its RVI before VM-enter to L1 in vmx_sync_pir_to_irr(). This guarantees RVI is up-to-date and aligned with the vIRR in the virtual APIC. So, no need to update RVI every time the vIRR changes. Note that KVM never updates vmcs02 RVI in .hwapic_irr_update() or vmx_sync_pir_to_irr(). So, removing .hwapic_irr_update() has no impact to the nested case. Signed-off-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241111085947.432645-1-chao.gao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: nVMX: Honor event priority when emulating PI delivery during VM-EnterSean Christopherson
Move the handling of a nested posted interrupt notification that is unblocked by nested VM-Enter (unblocks L1 IRQs when ack-on-exit is enabled by L1) from VM-Enter emulation to vmx_check_nested_events(). To avoid a pointless forced immediate exit, i.e. to not regress IRQ delivery latency when a nested posted interrupt is pending at VM-Enter, block processing of the notification IRQ if and only if KVM must block _all_ events. Unlike injected events, KVM doesn't need to actually enter L2 before updating the vIRR and vmcs02.GUEST_INTR_STATUS, as the resulting L2 IRQ will be blocked by hardware itself, until VM-Enter to L2 completes. Note, very strictly speaking, moving the IRQ from L2's PIR to IRR before entering L2 is still technically wrong. But, practically speaking, only an L1 hypervisor or an L0 userspace that is deliberately checking event priority against PIR=>IRR processing can even notice; L2 will see architecturally correct behavior, as KVM ensures the VM-Enter is finished before doing anything that would effectively preempt the PIR=>IRR movement. Reported-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241101191447.1807602-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: nVMX: Use vmcs01's controls shadow to check for IRQ/NMI windows at VM-EnterSean Christopherson
Use vmcs01's execution controls shadow to check for IRQ/NMI windows after a successful nested VM-Enter, instead of snapshotting the information prior to emulating VM-Enter. It's quite difficult to see that the entire reason controls are snapshot prior nested VM-Enter is to read them from vmcs01 (vmcs02 is loaded if nested VM-Enter is successful). That could be solved with a comment, but explicitly using vmcs01's shadow makes the code self-documenting to a certain extent. No functional change intended (vmcs01's execution controls must not be modified during emulation of nested VM-Enter). Link: https://lore.kernel.org/r/20241101191447.1807602-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: nVMX: Drop manual vmcs01.GUEST_INTERRUPT_STATUS.RVI check at VM-EnterSean Christopherson
Drop the manual check for a pending IRQ in vmcs01's RVI field during nested VM-Enter, as the recently added call to kvm_apic_has_interrupt() when checking for pending events after successful VM-Enter is a superset of the RVI check (IRQs that are pending in RVI are also pending in L1's IRR). Link: https://lore.kernel.org/r/20241101191447.1807602-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: nVMX: Check for pending INIT/SIPI after entering non-root modeSean Christopherson
Explicitly check for a pending INIT or SIPI after entering non-root mode during nested VM-Enter emulation, as no VMCS information is quered as part of the check, i.e. there is no need to check for INIT/SIPI while vmcs01 is still loaded. Link: https://lore.kernel.org/r/20241101191447.1807602-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: nVMX: Explicitly update vPPR on successful nested VM-EnterSean Christopherson
Always request pending event evaluation after successful nested VM-Enter if L1 has a pending IRQ, as KVM will effectively do so anyways when APICv is enabled, by way of vmx_has_apicv_interrupt(). This will allow dropping the aforementioned APICv check, and will also allow handling nested Posted Interrupt processing entirely within vmx_check_nested_events(), which is necessary to honor priority between concurrent events. Note, checking for pending IRQs has a subtle side effect, as it results in a PPR update for L1's vAPIC (PPR virtualization does happen at VM-Enter, but for nested VM-Enter that affects L2's vAPIC, not L1's vAPIC). However, KVM updates PPR _constantly_, even when PPR technically shouldn't be refreshed, e.g. kvm_vcpu_has_events() re-evaluates PPR if IRQs are unblocked, by way of the same kvm_apic_has_interrupt() check. Ditto for nested VM-Enter itself, when nested posted interrupts are enabled. Thus, trying to avoid a PPR update on VM-Enter just to be pedantically accurate is ridiculous, given the behavior elsewhere in KVM. Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com Signed-off-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241101191447.1807602-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: VMX: Allow toggling bits in MSR_IA32_RTIT_CTL when enable bit is clearedAdrian Hunter
Allow toggling other bits in MSR_IA32_RTIT_CTL if the enable bit is being cleared, the existing logic simply ignores the enable bit. E.g. KVM will incorrectly reject a write of '0' to stop tracing. Fixes: bf8c55d8dc09 ("KVM: x86: Implement Intel PT MSRs read/write emulation") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> [sean: rework changelog, drop stable@] Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241101185031.1799556-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-19KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VIDChao Gao
If KVM emulates an EOI for L1's virtual APIC while L2 is active, defer updating GUEST_INTERUPT_STATUS.SVI, i.e. the VMCS's cache of the highest in-service IRQ, until L1 is active, as vmcs01, not vmcs02, needs to track vISR. The missed SVI update for vmcs01 can result in L1 interrupts being incorrectly blocked, e.g. if there is a pending interrupt with lower priority than the interrupt that was EOI'd. This bug only affects use cases where L1's vAPIC is effectively passed through to L2, e.g. in a pKVM scenario where L2 is L1's depriveleged host, as KVM will only emulate an EOI for L1's vAPIC if Virtual Interrupt Delivery (VID) is disabled in vmc12, and L1 isn't intercepting L2 accesses to its (virtual) APIC page (or if x2APIC is enabled, the EOI MSR). WARN() if KVM updates L1's ISR while L2 is active with VID enabled, as an EOI from L2 is supposed to affect L2's vAPIC, but still defer the update, to try to keep L1 alive. Specifically, KVM forwards all APICv-related VM-Exits to L1 via nested_vmx_l1_wants_exit(): case EXIT_REASON_APIC_ACCESS: case EXIT_REASON_APIC_WRITE: case EXIT_REASON_EOI_INDUCED: /* * The controls for "virtualize APIC accesses," "APIC- * register virtualization," and "virtual-interrupt * delivery" only come from vmcs12. */ return true; Fixes: c7c9c56ca26f ("x86, apicv: add virtual interrupt delivery support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/kvm/20230312180048.1778187-1-jason.cj.chen@intel.com Reported-by: Markku Ahvenjärvi <mankku@gmail.com> Closes: https://lore.kernel.org/all/20240920080012.74405-1-mankku@gmail.com Cc: Janne Karhunen <janne.karhunen@gmail.com> Signed-off-by: Chao Gao <chao.gao@intel.com> [sean: drop request, handle in VMX, write changelog] Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241128000010.4051275-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Add information about pending requests to kvm_exit tracepointMaxim Levitsky
Print pending requests in the kvm_exit tracepoint, which allows userspace to gather information on how often KVM interrupts vCPUs due to specific requests. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240910200350.264245-3-mlevitsk@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Add interrupt injection information to the kvm_entry tracepointMaxim Levitsky
Add VMX/SVM specific interrupt injection info the kvm_entry tracepoint. As is done with kvm_exit, gather the information via a kvm_x86_ops hook to avoid the moderately costly VMREADs on VMX when the tracepoint isn't enabled. Opportunistically rename the parameters in the get_exit_info() declaration to match the names used by both SVM and VMX. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20240910200350.264245-2-mlevitsk@redhat.com [sean: drop is_guest_mode() change, use intr_info/error_code for names] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: SVM: Handle event vectoring error in check_emulate_instruction()Ivan Orlov
Detect unhandleable vectoring in check_emulate_instruction() to prevent infinite retry loops on SVM, and to eliminate the main differences in how VM-Exits during event vectoring are handled on SVM versus VMX. E.g. if the vCPU puts its IDT in emulated MMIO memory and generates an event, without the check_emulate_instruction() change, SVM will re-inject the event and resume the guest, and effectively put the vCPU into an infinite loop. Signed-off-by: Ivan Orlov <iorlov@amazon.com> Link: https://lore.kernel.org/r/20241217181458.68690-6-iorlov@amazon.com [sean: grab "svm" locally, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: VMX: Handle event vectoring error in check_emulate_instruction()Ivan Orlov
Move handling of emulation during event vectoring, which KVM doesn't support, into VMX's check_emulate_instruction(), so that KVM detects all unsupported emulation, not just cached emulated MMIO (EPT misconfig). E.g. on emulated MMIO that isn't cached (EPT Violation) or occurs with legacy shadow paging (#PF). Rejecting emulation on other sources of emulation also fixes a largely theoretical flaw (thanks to the "unprotect and retry" logic), where KVM could incorrectly inject a #DF: 1. CPU executes an instruction and hits a #GP 2. While vectoring the #GP, a shadow #PF occurs 3. On the #PF VM-Exit, KVM re-injects #GP 4. KVM emulates because of the write-protected page 5. KVM "successfully" emulates and also detects the #GP 6. KVM synthesizes a #GP, and since #GP has already been injected, incorrectly escalates to a #DF. Fix the comment about EMULTYPE_PF as this flag doesn't necessarily mean MMIO anymore: it can also be set due to the write protection violation. Note, handle_ept_misconfig() checks vmx_check_emulate_instruction() before attempting emulation of any kind. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ivan Orlov <iorlov@amazon.com> Link: https://lore.kernel.org/r/20241217181458.68690-5-iorlov@amazon.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Try to unprotect and retry on unhandleable emulation failureIvan Orlov
If emulation is "rejected" by check_emulate_instruction(), try to unprotect and retry instruction execution before reporting the error to userspace. Currently, check_emulate_instruction() never signals failure when "unprotect and retry" is possible, but that will change in the future as both VMX and SVM will reject emulation due to coincident exception vectoring. E.g. if there is a write to a shadowed page table when vectoring an event, then unprotecting the gfn and retrying the instruction will allow the guest to make forward progress in most cases, i.e. will allow the vCPU to keep running instead of returning an error to userspace. This ensures that the subsequent patches won't make KVM exit to userspace when handling an intercepted #PF during vectoring without checking whether unprotect and retry is possible. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ivan Orlov <iorlov@amazon.com> Link: https://lore.kernel.org/r/20241217181458.68690-4-iorlov@amazon.com [sean: massage changelog to clarify this is a nop for the current code] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Add emulation status for unhandleable exception vectoringIvan Orlov
Add emulation status for unhandleable vectoring, i.e. when KVM can't emulate an instruction because emulation was triggered on an exit that occurred while the CPU was vectoring an event. Such a situation can occur if guest sets the IDT descriptor base to point to MMIO region, and triggers an exception after that. Exit to userspace with event delivery error when KVM can't emulate an instruction when vectoring an event. Signed-off-by: Ivan Orlov <iorlov@amazon.com> Link: https://lore.kernel.org/r/20241217181458.68690-3-iorlov@amazon.com [sean: massage changelog and X86EMUL_UNHANDLEABLE_VECTORING comment] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Add function for vectoring error generationIvan Orlov
Extract VMX code for unhandleable VM-Exit during vectoring into vendor-agnostic function so that boiler-plate code can be shared by SVM. To avoid unnecessarily complexity in the helper, unconditionally report a GPA to userspace instead of having a conditional entry. For exits that don't report a GPA, i.e. everything except EPT Misconfig, simply report KVM's "invalid GPA". Signed-off-by: Ivan Orlov <iorlov@amazon.com> Link: https://lore.kernel.org/r/20241217181458.68690-2-iorlov@amazon.com [sean: clarify that the INVALID_GPA logic is new] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Use only local variables (no bitmask) to init kvm_cpu_capsSean Christopherson
Refactor the kvm_cpu_cap_init() macro magic to collect supported features in a local variable instead of passing them to the macro as a "mask". As pointed out by Maxim, relying on macros to "return" a value and set local variables is surprising, as the bitwise-OR logic suggests the macros are pure, i.e. have no side effects. Ideally, the feature initializers would have zero side effects, e.g. would take local variables as params, but there isn't a sane way to do so without either sacrificing the various compile-time assertions (basically a non-starter), or passing at least one variable, e.g. a struct, to each macro usage (adds a lot of noise and boilerplate code). Opportunistically force callers to emit a trailing comma by intentionally omitting a semicolon after invoking the feature initializers. Forcing a trailing comma isotales futures changes to a single line, i.e. doesn't cause churn for unrelated features/lines when adding/removing/modifying a feature. No functional change intended. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-58-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Explicitly track feature flags that are enabled at runtimeSean Christopherson
Add one last (hopefully) CPUID feature macro, RUNTIME_F(), and use it to track features that KVM supports, but that are only set at runtime (in response to other state), and aren't advertised to userspace via KVM_GET_SUPPORTED_CPUID. Currently, RUNTIME_F() is mostly just documentation, but tracking all KVM-supported features will allow for asserting, at build time, take), that all features that are set, cleared, *or* checked by KVM are known to kvm_set_cpu_caps(). No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-57-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Explicitly track feature flags that require vendor enablingSean Christopherson
Add another CPUID feature macro, VENDOR_F(), and use it to track features that KVM supports, but that need additional vendor support and so are conditionally enabled in vendor code. Currently, VENDOR_F() is mostly just documentation, but tracking all KVM-supported features will allow for asserting, at build time, take), that all features that are set, cleared, *or* checked by KVM are known to kvm_set_cpu_caps(). To fudge around a macro collision on 32-bit kernels, #undef DS to be able to get at X86_FEATURE_DS. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-56-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Rename "SF" macro to "SCATTERED_F"Sean Christopherson
Now that each feature flag is on its own line, i.e. brevity isn't a major concern, drop the "SF" acronym and use the (almost) full name, SCATTERED_F. No functional change intended. Link: https://lore.kernel.org/r/20241128013424.4096668-55-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Pull CPUID capabilities from boot_cpu_data only as neededSean Christopherson
Don't memcpy() all of boot_cpu_data.x86_capability, and instead explicitly fill each kvm_cpu_cap_init leaf during kvm_cpu_cap_init(). While clever, copying all kernel capabilities risks over-reporting KVM capabilities, e.g. if KVM added support in __do_cpuid_func(), but neglected to init the supported set of capabilities. Note, explicitly grabbing leafs deliberately keeps Linux-defined leafs as 0! KVM should never advertise Linux-defined leafs; any relevant features that are "real", but scattered, must be gathered in their correct hardware- defined leaf. Link: https://lore.kernel.org/r/20241128013424.4096668-54-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Add a macro for features that are synthesized into boot_cpu_dataSean Christopherson
Add yet another CPUID macro, this time for features that the host kernel synthesizes into boot_cpu_data, i.e. that the kernel force sets even in situations where the feature isn't reported by CPUID. Thanks to the macro shenanigans of kvm_cpu_cap_init(), such features can now be handled in the core CPUID framework, i.e. don't need to be handled out-of-band and thus without as many guardrails. Adding a dedicated macro also helps document what's going on, e.g. the calls to kvm_cpu_cap_check_and_set() are very confusing unless the reader knows exactly how kvm_cpu_cap_init() generates kvm_cpu_caps (and even then, it's far from obvious). Link: https://lore.kernel.org/r/20241128013424.4096668-53-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES capsSean Christopherson
Drop the manual boot_cpu_has() checks on XSAVE when adjusting the guest's XSAVES capabilities now that guest cpu_caps incorporates KVM's support. The guest's cpu_caps are initialized from kvm_cpu_caps, which are in turn initialized from boot_cpu_data, i.e. checking guest_cpu_cap_has() also checks host/KVM capabilities (which is the entire point of cpu_caps). Cc: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-52-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_capsSean Christopherson
Switch all queries (except XSAVES) of guest features from guest CPUID to guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls to guest_cpu_cap_has(). Keep guest_cpuid_has() around for XSAVES, but subsume its helper guest_cpuid_get_register() and add a compile-time assertion to prevent using guest_cpuid_has() for any other feature. Add yet another comment for XSAVE to explain why KVM is allowed to query its raw guest CPUID. Opportunistically drop the unused guest_cpuid_clear(), as there should be no circumstance in which KVM needs to _clear_ a guest CPUID feature now that everything is tracked via cpu_caps. E.g. KVM may need to _change_ a feature to emulate dynamic CPUID flags, but KVM should never need to clear a feature in guest CPUID to prevent it from being used by the guest. Delete the last remnants of the governed features framework, as the lone holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for governed vs. ungoverned features. Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when computing reserved CR4 bits is a nop when viewed as a whole, as KVM's capabilities are already incorporated into the calculation, i.e. if a feature is present in guest CPUID but unsupported by KVM, its CR4 bit was already being marked as reserved, checking guest_cpu_cap_has() simply double-stamps that it's a reserved bit. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-51-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has()Sean Christopherson
Move the implementations of guest_has_{spec_ctrl,pred_cmd}_msr() down below guest_cpu_cap_has() so that their use of guest_cpuid_has() can be replaced with calls to guest_cpu_cap_has(). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-50-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based featuresSean Christopherson
When updating guest CPUID entries to emulate runtime behavior, e.g. when the guest enables a CR4-based feature that is tied to a CPUID flag, also update the vCPU's cpu_caps accordingly. This will allow replacing all usage of guest_cpuid_has() with guest_cpu_cap_has(). Note, this relies on kvm_set_cpuid() taking a snapshot of cpu_caps before invoking kvm_update_cpuid_runtime(), i.e. when KVM is updating CPUID entries that *may* become the vCPU's CPUID, so that unwinding to the old cpu_caps is possible if userspace tries to set bogus CPUID information. Note #2, none of the features in question use guest_cpu_cap_has() at this time, i.e. aside from settings bits in cpu_caps, this is a glorified nop. Cc: Yang Weijiang <weijiang.yang@intel.com> Cc: Robert Hoo <robert.hoo.linux@gmail.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-49-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host supportSean Christopherson
When making runtime CPUID updates, change OSXSAVE and OSPKE even if their respective base features (XSAVE, PKU) are not supported by the host. KVM already incorporates host support in the vCPU's effective reserved CR4 bits. I.e. OSXSAVE and OSPKE can be set if and only if the host supports them. And conversely, since KVM's ABI is that KVM owns the dynamic OS feature flags, clearing them when they obviously aren't supported and thus can't be enabled is arguably a fix. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-48-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leafSean Christopherson
Drop an unnecessary check that kvm_find_cpuid_entry_index(), i.e. cpuid_entry2_find(), returns the correct leaf when getting CPUID.0x7.0x0 to update X86_FEATURE_OSPKE. cpuid_entry2_find() never returns an entry for the wrong function. And not that it matters, but cpuid_entry2_find() will always return a precise match for CPUID.0x7.0x0 since the index is significant. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-47-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtimeSean Christopherson
Move the handling of X86_FEATURE_MWAIT during CPUID runtime updates to utilize the lookup done for other CPUID.0x1 features. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-46-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Initialize guest cpu_caps based on KVM supportSean Christopherson
Constrain all guest cpu_caps based on KVM support instead of constraining only the few features that KVM _currently_ needs to verify are actually supported by KVM. The intent of cpu_caps is to track what the guest is actually capable of using, not the raw, unfiltered CPUID values that the guest sees. I.e. KVM should always consult it's only support when making decisions based on guest CPUID, and the only reason KVM has historically made the checks opt-in was due to lack of centralized tracking. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-45-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Treat MONTIOR/MWAIT as a "partially emulated" featureSean Christopherson
Enumerate MWAIT in cpuid_func_emulated(), but only if the caller wants to include "partially emulated" features, i.e. features that KVM kinda sorta emulates, but with major caveats. This will allow initializing the guest cpu_caps based on the set of features that KVM virtualizes and/or emulates, without needing to handle things like MONITOR/MWAIT as one-off exceptions. Adding one-off handling for individual features is quite painful, especially when considering future hardening. It's very doable to verify, at compile time, that every CPUID-based feature that KVM queries when emulating guest behavior is actually known to KVM, e.g. to prevent KVM bugs where KVM emulates some feature but fails to advertise support to userspace. In other words, any features that are special cased, i.e. not handled generically in the CPUID framework, would also need to be special cased for any hardening efforts that build on said framework. Link: https://lore.kernel.org/r/20241128013424.4096668-44-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Extract code for generating per-entry emulated CPUID informationSean Christopherson
Extract the meat of __do_cpuid_func_emulated() into a separate helper, cpuid_func_emulated(), so that cpuid_func_emulated() can be used with a single CPUID entry. This will allow marking emulated features as fully supported in the guest cpu_caps without needing to hardcode the set of emulated features in multiple locations. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-43-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Initialize guest cpu_caps based on guest CPUIDSean Christopherson
Initialize a vCPU's capabilities based on the guest CPUID provided by userspace instead of simply zeroing the entire array. This is the first step toward using cpu_caps to query *all* CPUID-based guest capabilities, i.e. will allow converting all usage of guest_cpuid_has() to guest_cpu_cap_has(). Zeroing the array was the logical choice when using cpu_caps was opt-in, e.g. "unsupported" was generally a safer default, and the whole point of governed features is that KVM would need to check host and guest support, i.e. making everything unsupported by default didn't require more code. But requiring KVM to manually "enable" every CPUID-based feature in cpu_caps would require an absurd amount of boilerplate code. Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for the change() and clear() APIs. Replace check_and_set() with constrain() to try and capture that KVM is constraining userspace's desired guest feature set based on KVM's capabilities. This is intended to be gigantic nop, i.e. should not have any impact on guest or KVM functionality. This is also an intermediate step; a future commit will also incorporate KVM support into the vCPU's cpu_caps before converting guest_cpuid_has() to guest_cpu_cap_has(). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-42-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Replace guts of "governed" features with comprehensive cpu_capsSean Christopherson
Replace the internals of the governed features framework with a more comprehensive "guest CPU capabilities" implementation, i.e. with a guest version of kvm_cpu_caps. Keep the skeleton of governed features around for now as vmx_adjust_sec_exec_control() relies on detecting governed features to do the right thing for XSAVES, and switching all guest feature queries to guest_cpu_cap_has() requires subtle and non-trivial changes, i.e. is best done as a standalone change. Tracking *all* guest capabilities that KVM cares will allow excising the poorly named "governed features" framework, and effectively optimizes all KVM queries of guest capabilities, i.e. doesn't require making a subjective decision as to whether or not a feature is worth "governing", and doesn't require adding the code to do so. The cost of tracking all features is currently 92 bytes per vCPU on 64-bit kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features. That cost is well worth paying even if the only benefit was eliminating the "governed features" terminology. And practically speaking, the real cost is zero unless those 92 bytes pushes the size of vcpu_vmx or vcpu_svm into a new order-N allocation, and if that happens there are better ways to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR state separate allocations. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-41-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"Sean Christopherson
As the first step toward replacing KVM's so-called "governed features" framework with a more comprehensive, less poorly named implementation, replace the "kvm_governed_feature" function prefix with "guest_cpu_cap" and rename guest_can_use() to guest_cpu_cap_has(). The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and provides a more clear distinction between guest capabilities, which are KVM controlled (heh, or one might say "governed"), and guest CPUID, which with few exceptions is fully userspace controlled. Opportunistically rewrite the comment about XSS passthrough for SEV-ES guests to avoid referencing so many functions, as such comments are prone to becoming stale (case in point...). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-40-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUIDSean Christopherson
Unconditionally advertise "support" for the HYPERVISOR feature in CPUID, as the flag simply communicates to the guest that's it's running under a hypervisor. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-39-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUIDSean Christopherson
Unconditionally advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID, as KVM always emulates deadline mode, *if* the VM has an in-kernel local APIC. The odds of a VMM emulating the local APIC in userspace, not emulating the TSC deadline timer, _and_ reflecting KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2, i.e. the risk of over-advertising and breaking any setups, is extremely low. KVM has _unconditionally_ advertised X2APIC via CPUID since commit 0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it is completely impossible for userspace to emulate X2APIC as KVM doesn't support forwarding the MSR accesses to userspace. I.e. KVM has relied on userspace VMMs to not misreport local APIC capabilities for nearly 13 years. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-38-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Remove all direct usage of cpuid_entry2_find()Sean Christopherson
Convert all use of cpuid_entry2_find() to kvm_find_cpuid_entry{,index}() now that cpuid_entry2_find() operates on the vCPU state, i.e. now that there is no need to use cpuid_entry2_find() directly in order to pass in non-vCPU state. To help prevent unwanted usage of cpuid_entry2_find(), #undef KVM_CPUID_INDEX_NOT_SIGNIFICANT, i.e. force KVM to use kvm_find_cpuid_entry(). No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-37-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find()Sean Christopherson
Move kvm_find_cpuid_entry{,_index}() "up" in cpuid.c so that they are colocated with cpuid_entry2_find(), e.g. to make it easier to see the effective guts of the helpers without having to bounce around cpuid.c. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-36-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()Sean Christopherson
Now that KVM sets vcpu->arch.cpuid_{entries,nent} before processing the incoming CPUID entries during KVM_SET_CPUID{,2}, drop the @entries and @nent params from cpuid_entry2_find() and unconditionally operate on the vCPU state. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-35-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Remove unnecessary caching of KVM's PV CPUID baseSean Christopherson
Now that KVM only searches for KVM's PV CPUID base when userspace sets guest CPUID, drop the cache and simply do the search every time. Practically speaking, this is a nop except for situations where userspace sets CPUID _after_ running the vCPU, which is anything but a hot path, e.g. QEMU does so only when hotplugging a vCPU. And on the flip side, caching guest CPUID information, especially information that is used to query/modify _other_ CPUID state, is inherently dangerous as it's all too easy to use stale information, i.e. KVM should only cache CPUID state when the performance and/or programming benefits justify it. Link: https://lore.kernel.org/r/20241128013424.4096668-34-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUIDSean Christopherson
Now that KVM disallows disabling HLT-exiting after vCPUs have been created, i.e. now that it's impossible for kvm_hlt_in_guest() to change while vCPUs are running, apply KVM's PV_UNHALT quirk only when userspace is setting guest CPUID. Opportunistically rename the helper to make it clear that KVM's behavior is a quirk that should never have been added. KVM's documentation explicitly states that userspace should not advertise PV_UNHALT if HLT-exiting is disabled, but for unknown reasons, commit caa057a2cad6 ("KVM: X86: Provide a capability to disable HLT intercepts") didn't stop at documenting the requirement and also massaged the incoming guest CPUID. Unfortunately, it's quite likely that userspace has come to rely on KVM's behavior, i.e. the code can't simply be deleted. The only reason KVM doesn't have an "official" quirk is that there is no known use case where disabling the quirk would make sense, i.e. letting userspace disable the quirk would further increase KVM's burden without any benefit. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-33-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2Sean Christopherson
When handling KVM_SET_CPUID{,2}, swap the old and new CPUID arrays and lengths before processing the new CPUID, and simply undo the swap if setting the new CPUID fails for whatever reason. To keep the diff reasonable, continue passing the entry array and length to most helpers, and defer the more complete cleanup to future commits. For any sane VMM, setting "bad" CPUID state is not a hot path (or even something that is surviable), and setting guest CPUID before it's known good will allow removing all of KVM's infrastructure for processing CPUID entries directly (as opposed to operating on vcpu->arch.cpuid_entries). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20241128013424.4096668-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-18KVM: x86: Add a macro to init CPUID features that KVM emulates in softwareSean Christopherson
Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to OR-in features that KVM emulates in software, i.e. that don't depend on the feature being available in hardware. The contained scope of kvm_cpu_cap_init() allows using a local variable to track the set of emulated leaves, which in addition to avoiding confusing and/or unnecessary variables, helps prevent misuse of EMUL_F(). Link: https://lore.kernel.org/r/20241128013424.4096668-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>