path: root/arch/x86
2025-05-04x86/fpu: Check TIF_NEED_FPU_LOAD instead of PF_KTHREAD|PF_USER_WORKER in fpu__drop()Oleg Nesterov
PF_KTHREAD|PF_USER_WORKER tasks should never clear TIF_NEED_FPU_LOAD, so the TIF_NEED_FPU_LOAD check filters them out equally well. This way an exiting userspace task can also avoid the unnecessary "fwait" if it does context_switch() at least once on its way to exit_thread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143856.GA9009@redhat.com
2025-05-04x86/fpu: Always use memcpy_and_pad() in arch_dup_task_struct()Oleg Nesterov
It makes no sense to copy the bytes after sizeof(struct task_struct), FPU state will be initialized in fpu_clone(). A plain memcpy(dst, src, sizeof(struct task_struct)) should work too, but "_and_pad" looks safer. [ mingo: Simplify it a bit more. ] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143850.GA8997@redhat.com
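A minimal sketch of the resulting copy, assuming arch_task_struct_size covers struct task_struct plus the dynamically sized FPU area appended to it (argument order per memcpy_and_pad(dest, dest_len, src, count, pad)):

  /* Copy only the task_struct proper from src and zero-pad the trailing
   * FPU area, which fpu_clone() will initialize later anyway. */
  memcpy_and_pad(dst, arch_task_struct_size, src, sizeof(struct task_struct), 0);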
2025-05-04x86/fpu: Remove DEFINE_EVENT(x86_fpu, x86_fpu_copy_src)Oleg Nesterov
trace_x86_fpu_copy_src() has no users after: 22aafe3bcb67 ("x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks") Remove the event. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143843.GA8989@redhat.com
2025-05-04x86/fpu: Remove x86_init_fpuOleg Nesterov
It is not actually used after: 55bc30f2e34d ("x86/fpu: Remove the thread::fpu pointer") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143837.GA8985@redhat.com
2025-05-04x86/fpu: Simplify the switch_fpu_prepare() + switch_fpu_finish() logicOleg Nesterov
Now that switch_fpu_finish() doesn't load the FPU state, it makes more sense to fold it into switch_fpu_prepare() renamed to switch_fpu(), and more importantly, use the "prev_p" task as a target for TIF_NEED_FPU_LOAD. It doesn't make any sense to delay set_tsk_thread_flag(TIF_NEED_FPU_LOAD) until "prev_p" is scheduled again. There is no worry about the very first context switch, fpu_clone() must always set TIF_NEED_FPU_LOAD. Also, shift the test_tsk_thread_flag(TIF_NEED_FPU_LOAD) from the callers to switch_fpu(). Note that the "PF_KTHREAD | PF_USER_WORKER" check can be removed but this deserves a separate patch which can change more functions, say, kernel_fpu_begin_mask(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chang S . Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250503143830.GA8982@redhat.com
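A hedged sketch of the folded helper; the body and the x86_task_fpu() accessor are approximations of the described result, not the literal patch:

  /* Mark the outgoing task as needing an FPU reload and save its registers
   * now, rather than deferring the flag until it is scheduled back in. */
  static inline void switch_fpu(struct task_struct *prev_p, int cpu)
  {
          if (!(prev_p->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
              !test_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD)) {
                  set_tsk_thread_flag(prev_p, TIF_NEED_FPU_LOAD);
                  save_fpregs_to_fpstate(x86_task_fpu(prev_p));
          }
  }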
2025-05-04Merge tag 'v6.15-rc4' into x86/fpu, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-04x86/boot/sev: Support memory acceptance in the EFI stub under SVSMArd Biesheuvel
Commit: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") provided a fix for SEV-SNP memory acceptance from the EFI stub when running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP guests running at VMPL >0, as those rely on a SVSM calling area, which is a shared buffer whose address is programmed into a SEV-SNP MSR, and the SEV init code that sets up this calling area executes much later during the boot. Given that booting via the EFI stub at VMPL >0 implies that the firmware has configured this calling area already, reuse it for performing memory acceptance in the EFI stub. Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com
2025-05-02KVM: x86/mmu: Prevent installing hugepages when mem attributes are changingSean Christopherson
When changing memory attributes on a subset of a potential hugepage, add the hugepage to the invalidation range tracking to prevent installing a hugepage until the attributes are fully updated. Like the actual hugepage tracking updates in kvm_arch_post_set_memory_attributes(), process only the head and tail pages, as any potential hugepages that are entirely covered by the range will already be tracked. Note, only hugepage chunks whose current attributes are NOT mixed need to be added to the invalidation set, as mixed attributes already prevent installing a hugepage, and it's perfectly safe to install a smaller mapping for a gfn whose attributes aren't changing. Fixes: 8dd2eee9d526 ("KVM: x86/mmu: Handle page fault for private memory") Cc: stable@vger.kernel.org Reported-by: Michael Roth <michael.roth@amd.com> Tested-by: Michael Roth <michael.roth@amd.com> Link: https://lore.kernel.org/r/20250430220954.522672-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fieldsTom Lendacky
Commit 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it") updated the SEV code to take a snapshot of the GHCB before using it. But the dump_ghcb() function wasn't updated to use the snapshot locations. This results in incorrect output from dump_ghcb() for the "is_valid" and "valid_bitmap" fields. Update dump_ghcb() to use the proper locations. Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/8f03878443681496008b1b37b7c4bf77a342b459.1745866531.git.thomas.lendacky@amd.com [sean: add comment and snapshot qualifier] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02KVM: VMX: Clean up and macrofy x86_opsVishal Verma
Eliminate a lot of stub definitions by using macros to define the TDX vs non-TDX versions of various x86_ops. Moving the x86_ops wrappers under CONFIG_KVM_INTEL_TDX also allows nearly all of vmx/main.c to go under a single #ifdef, eliminating trampolines in the generated code, and almost all of the stubs.

For example, with CONFIG_KVM_INTEL_TDX=n, before this cleanup, vt_refresh_apicv_exec_ctrl() would produce:

  0000000000036490 <vt_refresh_apicv_exec_ctrl>:
     36490:  f3 0f 1e fa             endbr64
     36494:  e8 00 00 00 00          call   36499 <vt_refresh_apicv_exec_ctrl+0x9>
                     36495: R_X86_64_PLT32  __fentry__-0x4
     36499:  e9 00 00 00 00          jmp    3649e <vt_refresh_apicv_exec_ctrl+0xe>
                     3649a: R_X86_64_PLT32  vmx_refresh_apicv_exec_ctrl-0x4
     3649e:  66 90                   xchg   %ax,%ax

After this patch, this is completely eliminated.

Based on a patch by Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/kvm/Z6v9yjWLNTU6X90d@google.com/ Cc: Sean Christopherson <seanjc@google.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20250318-vverma7-cleanup_x86_ops-v2-4-701e82d6b779@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02KVM: VMX: Define a VMX glue macro for kvm_complete_insn_gp()Vishal Verma
Define kvm_complete_insn_gp() as vmx_complete_emulated_msr() and use the glue wrapper in vt_complete_emulated_msr() so that VT's .complete_emulated_msr() implementation follows the soon-to-be-standard pattern of: vt_abc: if (is_td()) return tdx_abc(); return vmx_abc(); This will allow generating such wrappers via a macro, which in turn will make it trivially easy to skip the wrappers entirely when KVM_INTEL_TDX=n. Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/kvm/Z6v9yjWLNTU6X90d@google.com/ Cc: Sean Christopherson <seanjc@google.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20250318-vverma7-cleanup_x86_ops-v2-3-701e82d6b779@intel.com [sean: massage shortlog+changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
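A minimal sketch of the pattern being described; the alias form and the is_td_vcpu()/tdx_complete_emulated_msr() names are illustrative assumptions, not the exact patch:

  /* VMX reuses the generic helper directly... */
  #define vmx_complete_emulated_msr       kvm_complete_insn_gp

  /* ...and the VT glue wrapper routes by VM type. */
  static int vt_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
  {
          if (is_td_vcpu(vcpu))
                  return tdx_complete_emulated_msr(vcpu, err);

          return vmx_complete_emulated_msr(vcpu, err);
  }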
2025-05-02KVM: VMX: Move vt_apicv_pre_state_restore() to posted_intr.c and tweak nameVishal Verma
In preparation for a cleanup of the kvm_x86_ops struct for TDX, all vt_* functions are expected to act as glue functions that route to either tdx_* or vmx_* based on the VM type. Specifically, the pattern is: vt_abc: if (is_td()) return tdx_abc(); return vmx_abc(); But vt_apicv_pre_state_restore() does not follow this pattern. To facilitate that cleanup, rename and move vt_apicv_pre_state_restore() into posted_intr.c. Opportunistically turn vcpu_to_pi_desc() back into a static function, as the only reason it was exposed outside of posted_intr.c was for vt_apicv_pre_state_restore(). No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/kvm/Z6v9yjWLNTU6X90d@google.com/ Cc: Sean Christopherson <seanjc@google.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linxu.intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20250318-vverma7-cleanup_x86_ops-v2-2-701e82d6b779@intel.com [sean: apply Chao's suggestions, massage shortlog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02KVM: x86: Revert kvm_x86_ops.mem_enc_ioctl() back to an OPTIONAL hookSean Christopherson
Restore KVM's handling of a NULL kvm_x86_ops.mem_enc_ioctl, as the hook is NULL on SVM when CONFIG_KVM_AMD_SEV=n, and TDX will soon follow suit.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1 at arch/x86/include/asm/kvm-x86-ops.h:130 kvm_x86_vendor_init+0x178b/0x18e0
  Modules linked in:
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-rc2-dc1aead1a985-sink-vm #2 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:kvm_x86_vendor_init+0x178b/0x18e0
  Call Trace:
   <TASK>
   svm_init+0x2e/0x60
   do_one_initcall+0x56/0x290
   kernel_init_freeable+0x192/0x1e0
   kernel_init+0x16/0x130
   ret_from_fork+0x30/0x50
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  ---[ end trace 0000000000000000 ]---

Opportunistically drop the superfluous curly braces.

Link: https://lore.kernel.org/all/20250318-vverma7-cleanup_x86_ops-v2-4-701e82d6b779@intel.com Fixes: b2aaf38ced69 ("KVM: TDX: Add place holder for TDX VM specific mem_enc_op ioctl") Link: https://lore.kernel.org/r/20250502203421.865686-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
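A hedged sketch of what "OPTIONAL hook" means in kvm-x86-ops.h terms; the exact declaration line is an assumption based on the existing KVM_X86_OP_OPTIONAL() convention:

  /* An optional hook may legitimately be NULL (SVM with CONFIG_KVM_AMD_SEV=n,
   * and soon TDX), so the sanity check at vendor-module init no longer fires. */
  KVM_X86_OP_OPTIONAL(mem_enc_ioctl)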
2025-05-02KVM: selftests: Add library support for interacting with SNPPratik R. Sampat
Extend the SEV library to include support for SNP ioctl() wrappers, which aid in launching and interacting with a SEV-SNP guest. Signed-off-by: Pratik R. Sampat <prsampat@amd.com> Link: https://lore.kernel.org/r/20250305230000.231025-8-prsampat@amd.com [sean: use BIT()] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-02x86/msr: Change the function type of native_read_msr_safe()Xin Li (Intel)
Modify the function type of native_read_msr_safe() to: int native_read_msr_safe(u32 msr, u64 *val) This change makes the function return an error code instead of the MSR value, aligning it with the type of native_write_msr_safe(). Consequently, their callers can check the results in the same way. While at it, convert leftover MSR data type "unsigned int" to u32. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-16-xin@zytor.com
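A minimal usage sketch of the new calling convention (the MSR choice and messages are illustrative):

  u64 val;

  /* The error code is now the return value; the MSR contents come back
   * through the pointer, mirroring native_write_msr_safe(). */
  if (native_read_msr_safe(MSR_IA32_TSC, &val))
          pr_warn("RDMSR of MSR_IA32_TSC faulted\n");
  else
          pr_info("TSC: %llu\n", val);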
2025-05-02x86/msr: Replace wrmsr(msr, low, 0) with wrmsrq(msr, low)Xin Li (Intel)
The third argument in wrmsr(msr, low, 0) is unnecessary. Instead, use wrmsrq(msr, low), which automatically sets the higher 32 bits of the MSR value to 0. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-15-xin@zytor.com
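An illustrative before/after at a hypothetical call site:

  /* Before: the upper 32 bits are passed explicitly as zero. */
  wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);

  /* After: a single 64-bit value; the upper 32 bits are implicitly zero. */
  wrmsrq(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);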
2025-05-02x86/pvops/msr: Refactor pv_cpu_ops.write_msr{,_safe}()Xin Li (Intel)
An MSR value is represented as a 64-bit unsigned integer, with existing MSR instructions storing it in EDX:EAX as two 32-bit segments. The new immediate form MSR instructions, however, utilize a 64-bit general-purpose register to store the MSR value. To unify the usage of all MSR instructions, let the default MSR access APIs accept an MSR value as a single 64-bit argument instead of two 32-bit segments.

The dual 32-bit APIs are still available as convenient wrappers over the APIs that handle an MSR value as a single 64-bit argument.

The following illustrates the updated derivation of the MSR write APIs:

               __wrmsrq(u32 msr, u64 val)
                 /                    \
                /                      \
    native_wrmsrq(msr, val)    native_wrmsr(msr, low, high)
               |
               |
    native_write_msr(msr, val)
              /            \
             /              \
    wrmsrq(msr, val)    wrmsr(msr, low, high)

When CONFIG_PARAVIRT is enabled, wrmsrq() and wrmsr() are defined on top of paravirt_write_msr():

    paravirt_write_msr(u32 msr, u64 val)
              /            \
             /              \
    wrmsrq(msr, val)    wrmsr(msr, low, high)

paravirt_write_msr() invokes cpu.write_msr(msr, val), an indirect layer of pv_ops MSR write call:

    If on native:

            cpu.write_msr = native_write_msr

    If on Xen:

            cpu.write_msr = xen_write_msr

Therefore, refactor pv_cpu_ops.write_msr{_safe}() to accept an MSR value in a single u64 argument, replacing the current dual u32 arguments.

No functional change intended.

Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-14-xin@zytor.com
2025-05-02x86/xen/msr: Remove the error pointer argument from set_seg()Xin Li (Intel)
set_seg() is used to write the following MSRs on Xen: MSR_FS_BASE MSR_KERNEL_GS_BASE MSR_GS_BASE But none of these MSRs are written using any MSR write safe API. Therefore there is no need to pass an error pointer argument to set_seg() for returning an error code to be used in MSR safe APIs. Remove the error pointer argument. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-13-xin@zytor.com
2025-05-02x86/xen/msr: Remove pmu_msr_{read,write}()Xin Li (Intel)
As pmu_msr_{read,write}() are now wrappers of pmu_msr_chk_emulated(), remove them and use pmu_msr_chk_emulated() directly. As pmu_msr_chk_emulated() could easily return false in the cases where it would set *emul to false, remove the "emul" argument and use the return value instead. While at it, convert the data type of MSR index to u32 in functions called in pmu_msr_chk_emulated(). Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-12-xin@zytor.com
2025-05-02x86/xen/msr: Remove calling native_{read,write}_msr{,_safe}() in pmu_msr_{read,write}()Xin Li (Intel)
hpa found that pmu_msr_write() is actually a completely pointless function:

  https://lore.kernel.org/lkml/0ec48b84-d158-47c6-b14c-3563fd14bcc4@zytor.com/

All it does is shuffle some arguments, then call pmu_msr_chk_emulated(), and if that returns true AND the emulated flag is clear, do *exactly the same thing* that the calling code would have done if pmu_msr_write() itself had returned true. And pmu_msr_read() does the equivalent.

Remove the calls to native_{read,write}_msr{,_safe}() within pmu_msr_{read,write}(). Instead, reuse the existing calling code that decides whether to call native_{read,write}_msr{,_safe}() based on the return value from pmu_msr_{read,write}(). Consequently, eliminate the need to pass an error pointer to pmu_msr_{read,write}().

While at it, refactor pmu_msr_write() to take the MSR value as a u64 argument, replacing the current dual u32 arguments, because the dual u32 arguments were only used to call native_write_msr{,_safe}(), which has now been removed.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-11-xin@zytor.com
2025-05-02x86/msr: Convert __rdmsr() uses to native_rdmsrq() usesXin Li (Intel)
__rdmsr() is the lowest-level MSR read API, with native_rdmsr() and native_rdmsrq() serving as higher-level wrappers around it:

  #define native_rdmsr(msr, val1, val2)                 \
  do {                                                  \
          u64 __val = __rdmsr((msr));                   \
          (void)((val1) = (u32)__val);                  \
          (void)((val2) = (u32)(__val >> 32));          \
  } while (0)

  static __always_inline u64 native_rdmsrq(u32 msr)
  {
          return __rdmsr(msr);
  }

However, __rdmsr() continues to be utilized in various locations.

MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations.

To pave the way for improving MSR API names, convert __rdmsr() uses to native_rdmsrq() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe.

No functional change intended.

Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-10-xin@zytor.com
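An illustrative conversion at a hypothetical call site:

  /* Before: the low-level primitive is used directly. */
  u64 val = __rdmsr(MSR_IA32_SPEC_CTRL);

  /* After: the explicitly named native, untraced wrapper is used instead. */
  u64 val = native_rdmsrq(MSR_IA32_SPEC_CTRL);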
2025-05-02x86/msr: Add the native_rdmsrq() helperXin Li (Intel)
__rdmsr() is the lowest-level primitive MSR read API, implemented in assembly code and returning an MSR value in a u64 integer, on top of which a convenience wrapper native_rdmsr() is defined to return an MSR value in two u32 integers. For some reason, native_rdmsrq() is not defined and __rdmsr() is directly used when it needs to return an MSR value in a u64 integer. Add the native_rdmsrq() helper, which is simply an alias of __rdmsr(), to make native_rdmsr() and native_rdmsrq() a pair of MSR read APIs. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-9-xin@zytor.com
2025-05-02x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() usesXin Li (Intel)
__wrmsr() is the lowest-level MSR write API, with native_wrmsr() and native_wrmsrq() serving as higher-level wrappers around it:

  #define native_wrmsr(msr, low, high)                  \
          __wrmsr(msr, low, high)

  #define native_wrmsrq(msr, val)                       \
          __wrmsr((msr), (u32)((u64)(val)),             \
                         (u32)((u64)(val) >> 32))

However, __wrmsr() continues to be utilized in various locations.

MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations.

To pave the way for improving MSR API names, convert __wrmsr() uses to native_wrmsr{,q}() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe.

No functional change intended.

Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-8-xin@zytor.com
2025-05-02x86/xen/msr: Return u64 consistently in Xen PMC xen_*_read functionsXin Li (Intel)
The pv_ops PMC read API is defined as: u64 (*read_pmc)(int counter); But Xen PMC read functions return 'unsigned long long', make them return u64 consistently. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-7-xin@zytor.com
2025-05-02x86/msr: Convert the rdpmc() macro to an __always_inline functionXin Li (Intel)
Functions offer type safety and better readability compared to macros. Additionally, always inline functions can match the performance of macros. Converting the rdpmc() macro into an always inline function is simple and straightforward, so just make the change. Moreover, the read result is now the returned value, further enhancing readability. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-6-xin@zytor.com
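A hedged sketch of what the resulting non-paravirt helper plausibly looks like, built on the EAX_EDX_*() wrappers covered further down this log; treat the exact body as an assumption:

  static __always_inline u64 rdpmc(int counter)
  {
          EAX_EDX_DECLARE_ARGS(val, low, high);

          /* RDPMC takes the counter index in ECX and returns the value in EDX:EAX. */
          asm volatile("rdpmc" : EAX_EDX_RET(val, low, high) : "c" (counter));

          return EAX_EDX_VAL(val, low, high);
  }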
2025-05-02x86/msr: Rename rdpmcl() to rdpmc()Xin Li (Intel)
Now that rdpmc() is gone, rdpmcl() is the sole PMC read helper, simply rename rdpmcl() to rdpmc(). Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-5-xin@zytor.com
2025-05-02x86/msr: Remove the unused rdpmc() methodXin Li (Intel)
rdpmc() is not used anywhere anymore, remove it. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-4-xin@zytor.com
2025-05-02x86/msr: Move rdtsc{,_ordered}() to <asm/tsc.h>Xin Li (Intel)
Relocate rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h>. [ mingo: Do not remove the <asm/tsc.h> inclusion from <asm/msr.h> just yet, to reduce -next breakages. We can do this later on, separately, shortly before the next -rc1. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250427092027.1598740-3-xin@zytor.com
2025-05-02x86/msr: Add explicit includes of <asm/msr.h>Xin Li (Intel)
For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-05-02x86/msr: Move the EAX_EDX_*() methods from <asm/msr.h> to <asm/asm.h>Ingo Molnar
We are going to use them from multiple headers, and in any case, such register access wrapper macros are better in <asm/asm.h> anyway. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: linux-kernel@vger.kernel.org
2025-05-02x86/msr: Rename DECLARE_ARGS() to EAX_EDX_DECLARE_ARGSIngo Molnar
DECLARE_ARGS() is way too generic of a name that says very little about why these args are declared in that fashion - use the EAX_EDX_ prefix to create a common prefix between the three helper methods: EAX_EDX_DECLARE_ARGS() EAX_EDX_VAL() EAX_EDX_RET() Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: linux-kernel@vger.kernel.org
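For reference, a sketch of the 64-bit flavors of these helpers as they have historically appeared in <asm/msr.h> (the exact post-rename definitions may differ slightly):

  /* Declare the output registers of RDMSR/RDTSC/RDPMC-style instructions. */
  #define EAX_EDX_DECLARE_ARGS(val, low, high)  unsigned long low, high

  /* Combine the two 32-bit halves into the 64-bit result. */
  #define EAX_EDX_VAL(val, low, high)           ((low) | (high) << 32)

  /* Output constraints: EAX and EDX. */
  #define EAX_EDX_RET(val, low, high)           "=a" (low), "=d" (high)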
2025-05-02x86/msr: Improve the comments of the DECLARE_ARGS()/EAX_EDX_VAL()/EAX_EDX_RET() facilityIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: linux-kernel@vger.kernel.org
2025-05-02Merge tag 'v6.15-rc4' into x86/msr, to pick up fixes and resolve conflictsIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-01x86/devmem: Remove duplicate range_is_allowed() definitionDan Williams
17 years ago, Venki suggested [1] "A future improvement would be to avoid the range_is_allowed duplication".

The only thing preventing a common implementation is that phys_mem_access_prot_allowed() expects the range check to exit immediately when PAT is disabled [2], i.e. there is no cache conflict to manage in that case.

This cleanup was noticed on the path to considering changing the range_is_allowed() policy to blanket-deny /dev/mem for private (confidential computing) memory.

Note, however, that phys_mem_access_prot_allowed() has long since stopped being relevant for managing cache-type validation due to [3] and [4].

Commit 0124cecfc85a ("x86, PAT: disable /dev/mem mmap RAM with PAT") [1]
Commit 9e41bff2708e ("x86: fix /dev/mem mmap breakage when PAT is disabled") [2]
Commit 1886297ce0c8 ("x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386") [3]
Commit 0c3c8a18361a ("x86, PAT: Remove duplicate memtype reserve in devmem mmap") [4]

Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/all/20250430024622.1134277-2-dan.j.williams%40intel.com
2025-04-30x86/CPU/AMD: Replace strcpy() with strscpy()Ruben Wauters
strcpy() is deprecated due to issues with bounds checking and overflows. Replace it with strscpy(). Signed-off-by: Ruben Wauters <rubenru09@aol.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250429230710.54014-1-rubenru09@aol.com
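An illustrative before/after with a hypothetical buffer and source string:

  char model[64];

  /* Before: no bounds checking, can overflow 'model'. */
  strcpy(model, name);

  /* After: the copy is bounded by the destination size and always NUL-terminated. */
  strscpy(model, name, sizeof(model));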
2025-04-30x86/microcode/AMD: Do not return error when microcode update is not necessaryAnnie Li
After commit 6f059e634dcd ("x86/microcode: Clarify the late load logic"), if the loaded microcode is already up-to-date, the AMD side returns UCODE_OK, which leads to load_late_locked() returning -EBADFD.

Handle UCODE_OK in the switch case to avoid this error.

[ bp: Massage commit message. ]

Fixes: 6f059e634dcd ("x86/microcode: Clarify the late load logic") Signed-off-by: Annie Li <jiayanli@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250430053424.77438-1-jiayanli@google.com
2025-04-30perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.Sean Christopherson
When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE value. Consulting only the host kernel's host vs. guest masks results in running the guest with PEBS enabled even when the guest doesn't want to use PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply looking at exclude_host can't differentiate between events created by host userspace and events created by KVM on behalf of the guest.

Running the guest with PEBS unexpectedly enabled typically manifests as crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when trying to record PEBS events.

The issue is most easily reproduced by running `perf kvm top` from before commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after which, `perf kvm top` effectively stopped using PEBS). The userspace side of perf creates a guest-only PEBS event, which intel_guest_get_msrs() misconstrues as a guest-*owned* PEBS event.

Arguably, this is a userspace bug, as enabling PEBS on guest-only events simply cannot work, and userspace can kill VMs in many other ways (there is no danger to the host). However, even if this is considered to be bad userspace behavior, there's zero downside to perf/KVM restricting PEBS to guest-owned events.

Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") fixed the case where host userspace is profiling KVM *and* userspace, but missed the case where userspace is profiling only KVM.

Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon Reported-by: Seth Forshee <sforshee@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250426001355.1026530-1-seanjc@google.com
2025-04-30x86/bugs: Restructure SRSO mitigationDavid Kaplan
Restructure SRSO to use select/update/apply functions to create consistent vulnerability handling. Like with retbleed, the command line options directly select mitigations which can later be modified. While at it, remove a comment which doesn't apply anymore due to the changed mitigation detection flow. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-17-david.kaplan@amd.com
2025-04-29x86/bugs: Restructure L1TF mitigationDavid Kaplan
Restructure L1TF to use select/apply functions to create consistent vulnerability handling. Define new AUTO mitigation for L1TF. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-16-david.kaplan@amd.com
2025-04-29x86/bugs: Restructure SSB mitigationDavid Kaplan
Restructure SSB to use select/apply functions to create consistent vulnerability handling. Remove __ssb_select_mitigation() and split the functionality between the select/apply functions. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-15-david.kaplan@amd.com
2025-04-29x86/bugs: Restructure spectre_v2 mitigationDavid Kaplan
Restructure spectre_v2 to use select/update/apply functions to create consistent vulnerability handling. The spectre_v2 mitigation may be updated based on the selected retbleed mitigation. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-14-david.kaplan@amd.com
2025-04-29x86/bugs: Restructure BHI mitigationDavid Kaplan
Restructure BHI mitigation to use select/update/apply functions to create consistent vulnerability handling. BHI mitigation was previously selected from within spectre_v2_select_mitigation() and now is selected from cpu_select_mitigation() like with all others. Define new AUTO mitigation for BHI. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-13-david.kaplan@amd.com
2025-04-29x86/bugs: Restructure spectre_v2_user mitigationDavid Kaplan
Restructure spectre_v2_user to use select/update/apply functions to create consistent vulnerability handling. The IBPB/STIBP choices are first decided based on the spectre_v2_user command line but can be modified by the spectre_v2 command line option as well. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-12-david.kaplan@amd.com
2025-04-29KVM: x86: Unify cross-vCPU IBPBSean Christopherson
Both SVM and VMX have similar implementations for executing an IBPB between running different vCPUs on the same CPU, to create separate prediction domains for different vCPUs.

For VMX, when the currently loaded VMCS is changed in vmx_vcpu_load_vmcs(), an IBPB is executed if there is no 'buddy', which is the case on vCPU load. The intention is to execute an IBPB when switching vCPUs, but not when switching the VMCS within the same vCPU. Executing an IBPB on nested transitions within the same vCPU is handled separately and conditionally in nested_vmx_vmexit().

For SVM, the current VMCB is tracked on vCPU load and an IBPB is executed when it is changed. The intention is also to execute an IBPB when switching vCPUs, although it is possible that in some cases an IBPB is executed when switching VMCBs for the same vCPU. Executing an IBPB on nested transitions should be handled separately, and is proposed at [1].

Unify the logic by tracking the last loaded vCPU and executing the IBPB on vCPU change in kvm_arch_vcpu_load() instead. When a vCPU is destroyed, make sure all references to it are removed from any CPU. This is similar to how SVM clears the current_vmcb tracking on vCPU destruction. Remove the current VMCB tracking in SVM as it is no longer required, as well as the 'buddy' parameter to vmx_vcpu_load_vmcs().

[1] https://lore.kernel.org/lkml/20250221163352.3818347-4-yosry.ahmed@linux.dev

Link: https://lore.kernel.org/all/20250320013759.3965869-1-yosry.ahmed@linux.dev Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: tweak comment to stay at/under 80 columns] Signed-off-by: Sean Christopherson <seanjc@google.com>
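A hedged sketch of the unified logic; the per-CPU variable and helper names are approximations of the described approach, not the literal patch:

  /* Track the last vCPU loaded on each physical CPU and emit an IBPB only
   * when a different vCPU is being loaded, i.e. on a prediction-domain switch. */
  static DEFINE_PER_CPU(struct kvm_vcpu *, last_loaded_vcpu);

  static void kvm_vcpu_ibpb_on_switch(struct kvm_vcpu *vcpu)
  {
          if (__this_cpu_read(last_loaded_vcpu) != vcpu) {
                  indirect_branch_prediction_barrier();
                  __this_cpu_write(last_loaded_vcpu, vcpu);
          }
  }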
2025-04-29KVM: SVM: Clear current_vmcb during vCPU free for all *possible* CPUsYosry Ahmed
When freeing a vCPU and thus its VMCB, clear current_vmcb for all possible CPUs, not just online CPUs, as it's theoretically possible a CPU could go offline and come back online in conjunction with KVM reusing the page for a new VMCB. Link: https://lore.kernel.org/all/20250320013759.3965869-1-yosry.ahmed@linux.dev Fixes: fd65d3142f73 ("kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb") Cc: stable@vger.kernel.org Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
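A hedged sketch of the adjusted cleanup, assuming the existing per-CPU current_vmcb tracking in svm_data (field names approximate):

  static void svm_clear_current_vmcb(struct vmcb *vmcb)
  {
          int cpu;

          /* Walk all *possible* CPUs: a CPU may be offline right now yet come
           * back online after KVM has reused this VMCB page for a new vCPU. */
          for_each_possible_cpu(cpu)
                  cmpxchg(per_cpu_ptr(&svm_data.current_vmcb, cpu), vmcb, NULL);
  }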
2025-04-29x86/bugs: Restructure retbleed mitigationDavid Kaplan
Restructure retbleed mitigation to use select/update/apply functions to create consistent vulnerability handling. The retbleed_update_mitigation() simplifies the dependency between spectre_v2 and retbleed. The command line options now directly select a preferred mitigation which simplifies the logic. Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/20250418161721.1855190-11-david.kaplan@amd.com
2025-04-28x86/sgx: Use SHA-256 library API instead of crypto_shash APIEric Biggers
This user of SHA-256 does not support any other algorithm, so the crypto_shash abstraction provides no value. Just use the SHA-256 library API instead, which is much simpler and easier to use. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250428183838.799333-1-ebiggers%40kernel.org
2025-04-28x86/microcode/AMD: Use sha256() instead of init/update/finalEric Biggers
Just call sha256() instead of doing the init/update/final sequence. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250428183006.782501-1-ebiggers@kernel.org
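An illustrative before/after of the pattern being removed, using the lib/crypto SHA-256 interface from <crypto/sha2.h> (the data/len inputs are hypothetical):

  u8 digest[SHA256_DIGEST_SIZE];

  /* Before: explicit init/update/final sequence. */
  struct sha256_state state;

  sha256_init(&state);
  sha256_update(&state, data, len);
  sha256_final(&state, digest);

  /* After: a single one-shot library call. */
  sha256(data, len, digest);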
2025-04-28x86/sev: Remove unnecessary GFP_KERNEL_ACCOUNT for temporary variablesPeng Hao
Some variables allocated in sev_send_update_data are released when the function exits, so there is no need to set GFP_KERNEL_ACCOUNT. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20250428063013.62311-1-flyingpeng@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-04-28KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()Yan Zhao
Check request KVM_REQ_MMU_FREE_OBSOLETE_ROOTS to free obsolete roots in kvm_mmu_reload() to prevent kvm_mmu_reload() from seeing a stale obsolete root. Since kvm_mmu_reload() can be called outside the vcpu_enter_guest() path (e.g., kvm_arch_vcpu_pre_fault_memory()), it may be invoked after a root has been marked obsolete and before vcpu_enter_guest() is invoked to process KVM_REQ_MMU_FREE_OBSOLETE_ROOTS and set root.hpa to invalid. This causes kvm_mmu_reload() to fail to load a new root, which can lead to kvm_arch_vcpu_pre_fault_memory() being stuck in the while loop in kvm_tdp_map_page() since RET_PF_RETRY is always returned due to is_page_fault_stale(). Keep the existing check of KVM_REQ_MMU_FREE_OBSOLETE_ROOTS in vcpu_enter_guest() since the cost of kvm_check_request() is negligible, especially a check that's guarded by kvm_request_pending(). Export symbol of kvm_mmu_free_obsolete_roots() as kvm_mmu_reload() is inline and may be called outside of kvm.ko. Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20250318013333.5817-1-yan.y.zhao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
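A hedged sketch of the reworked inline helper, close to but not necessarily identical to the actual patch:

  static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
  {
          /* Consume a pending "free obsolete roots" request here as well, so
           * callers outside vcpu_enter_guest() never see a stale obsolete root. */
          if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
                  kvm_mmu_free_obsolete_roots(vcpu);

          if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
                  return 0;

          return kvm_mmu_load(vcpu);
  }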