path: root/arch/x86
Age  Commit message  Author
2025-04-16  x86/fpu/apx: Disallow conflicting MPX presence  [Chang S. Bae]
XSTATE components are architecturally independent. There is no rule requiring their offsets in the non-compacted format to be strictly ascending or mutually non-overlapping. However, in practice, such overlaps have not occurred -- until now. APX is introduced as xstate component 19, following AMX. In the non-compacted XSAVE format, its offset overlaps with the space previously occupied by the now-deprecated MPX feature: 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86") To prevent conflicts, the kernel must ensure that the CPU never exposes both features at the same time. If it does, the hardware is unreliable. In such cases, XSAVE should be disabled entirely as a precautionary measure. Add a sanity check to detect this condition and disable XSAVE if an invalid hardware configuration is identified. Note: MPX state components remain enabled on legacy systems solely for KVM guest support. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-4-chang.seok.bae@intel.com
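As an illustration only, the check described above could look roughly like the sketch below. fpu_kernel_cfg.max_features and the MPX masks exist in the current kernel; XFEATURE_MASK_APX and the exact disable path are assumptions about this series, not the actual patch:

    /*
     * Sketch: if the CPU enumerates both the deprecated MPX components and
     * the new APX component (19), their non-compacted offsets would overlap,
     * so treat the hardware as unreliable and disable XSAVE entirely.
     */
    if ((fpu_kernel_cfg.max_features & XFEATURE_MASK_APX) &&
        (fpu_kernel_cfg.max_features & (XFEATURE_MASK_BNDREGS |
                                        XFEATURE_MASK_BNDCSR))) {
            pr_err("x86/fpu: Both APX and MPX enumerated, disabling XSAVE\n");
            goto out_disable;   /* assumed: existing error path in fpu__init_system_xstate() */
    }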
2025-04-16  x86/fpu/apx: Define APX state component  [Chang S. Bae]
Advanced Performance Extensions (APX) is associated with a new state component number 19. To support saving and restoring of the corresponding registers via the XSAVE mechanism, introduce the component definition along with the necessary sanity checks. Define the new component number, state name, and the corresponding register data types. Then, extend the size checker to validate the register data type and explicitly list the APX feature flag as a dependency for the new component in xsave_cpuid_features[]. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-3-chang.seok.bae@intel.com
2025-04-16  x86/cpufeatures: Add X86_FEATURE_APX  [Chang S. Bae]
Intel Advanced Performance Extensions (APX) introduce a new set of general-purpose registers, managed as an extended state component via the xstate management facility. Before enabling this new xstate, define a feature flag to clarify the dependency in xsave_cpuid_features[]. APX is enumerated under CPUID leaf 7 (subleaf 1), in EDX. Since the kernel has not yet allocated a feature word for this CPUID register, place the flag in a scattered feature word. As this feature is intended only for userspace, exposing it via /proc/cpuinfo is unnecessary. Instead, the existing arch_prctl(2) mechanism with the ARCH_GET_XCOMP_SUPP option can be used to query the feature availability. Finally, clarify that APX depends on XSAVE. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250416021720.12305-2-chang.seok.bae@intel.com
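Because the flag is not exported in /proc/cpuinfo, a userspace query would go through arch_prctl(2), roughly as sketched below. ARCH_GET_XCOMP_SUPP is the existing option named above, and the component number 19 comes from this series; error handling is kept minimal:

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/prctl.h>          /* ARCH_GET_XCOMP_SUPP */

    int main(void)
    {
            uint64_t features = 0;

            /* Bitmask of xstate components the kernel allows enabling. */
            if (syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features))
                    return 1;

            /* APX is xstate component 19 in this series. */
            printf("APX %ssupported\n", (features & (1ULL << 19)) ? "" : "not ");
            return 0;
    }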
2025-04-16  crypto: x86/poly1305 - don't select CRYPTO_LIB_POLY1305_GENERIC  [Eric Biggers]
The x86 Poly1305 code never falls back to the generic code, so selecting CRYPTO_LIB_POLY1305_GENERIC is unnecessary. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16  crypto: x86/poly1305 - remove redundant shash algorithm  [Eric Biggers]
Since crypto/poly1305.c now registers a poly1305-$(ARCH) shash algorithm that uses the architecture's Poly1305 library functions, individual architectures no longer need to do the same. Therefore, remove the redundant shash algorithm from the arch-specific code and leave just the library functions there. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16  crypto: poly1305 - centralize the shash wrappers for arch code  [Eric Biggers]
Following the example of the crc32, crc32c, and chacha code, make the crypto subsystem register both generic and architecture-optimized poly1305 shash algorithms, both implemented on top of the appropriate library functions. This eliminates the need for every architecture to implement the same shash glue code. Note that the poly1305 shash requires that the key be prepended to the data, which differs from the library functions where the key is simply a parameter to poly1305_init(). Previously this was handled at a fairly low level, polluting the library code with shash-specific code. Reorganize things so that the shash code handles this quirk itself. Also, to register the architecture-optimized shashes only when architecture-optimized code is actually being used, add a function poly1305_is_arch_optimized() and make each arch implement it. Change each architecture's Poly1305 module_init function to arch_initcall so that the CPU feature detection is guaranteed to run before poly1305_is_arch_optimized() gets called by crypto/poly1305.c. (In cases where poly1305_is_arch_optimized() just returns true unconditionally, using arch_initcall is not strictly needed, but it's still good to be consistent across architectures.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
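For reference, the per-arch hook described above can be a one-liner reporting whether the accelerated path was enabled during CPU feature detection. A hedged sketch for x86, where the static key name (poly1305_use_avx) is an assumption rather than something stated in this log:

    /* Sketch only: true once the SIMD Poly1305 path has been enabled at init. */
    bool poly1305_is_arch_optimized(void)
    {
            return static_key_enabled(&poly1305_use_avx);
    }
    EXPORT_SYMBOL(poly1305_is_arch_optimized);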
2025-04-16  crypto: lib/sm3 - Move sm3 library into lib/crypto  [Herbert Xu]
Move the sm3 library code into lib/crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-16  x86: Make simd.h more resilient  [Herbert Xu]
Add missing header inclusions and protect against double inclusion. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
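The hardening described amounts to the standard pattern below; the guard name and the specific include are illustrative, not taken from the patch:

    /* SPDX-License-Identifier: GPL-2.0 */
    #ifndef _ASM_X86_SIMD_H
    #define _ASM_X86_SIMD_H

    #include <linux/types.h>    /* pull in what the declarations below rely on */

    /* ... existing declarations ... */

    #endif /* _ASM_X86_SIMD_H */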
2025-04-16  Merge branch 'x86/cpu' into x86/fpu, to pick up dependent commits  [Ingo Molnar]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-16  x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores  [Pi Xiange]
Bartlett Lake has a P-core only product with Raptor Cove. [ mingo: Switch around the define as pointed out by Christian Ludloff: Raptor Cove is the core, Bartlett Lake is the product. ] Signed-off-by: Pi Xiange <xiange.pi@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Ludloff <ludloff@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: "Ahmed S. Darwish" <darwi@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250414032839.5368-1-xiange.pi@intel.com
2025-04-16  Merge branch 'linus' into x86/cpu, to resolve conflicts  [Ingo Molnar]
Conflicts: tools/arch/x86/include/asm/cpufeatures.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-15  x86/cpufeatures: Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES  [Xin Li (Intel)]
Shorten X86_FEATURE_AMD_HETEROGENEOUS_CORES to X86_FEATURE_AMD_HTR_CORES to make the last column aligned consistently in the whole file. No functional changes. Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250415175410.2944032-4-xin@zytor.com
2025-04-15  x86/cpufeatures: Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT  [Xin Li (Intel)]
Shorten X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT to X86_FEATURE_CLEAR_BHB_VMEXIT to make the last column aligned consistently in the whole file. There's no need to explain in the name what the mitigation does. No functional changes. Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250415175410.2944032-3-xin@zytor.com
2025-04-15  x86/cpufeatures: Clean up formatting  [Borislav Petkov (AMD)]
It is a special file with special formatting, so remove one instance of whitespace damage and format newer defines like the rest. No functional changes. [ Xin: Do the same to tools/arch/x86/include/asm/cpufeatures.h. ] Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250415175410.2944032-2-xin@zytor.com
2025-04-14  x86/bugs: Remove X86_BUG_MMIO_UNKNOWN  [Borislav Petkov (AMD)]
Whack this thing because: - the "unknown" handling is done only for this vuln and not for the others - it doesn't do anything besides reporting things differently. It doesn't apply any mitigations - it is simply causing unnecessary complications to the code which don't bring anything besides maintenance overhead to what is already a very nasty spaghetti pile - all the currently unaffected CPUs can also be in "unknown" status so there's no need for special handling here so get rid of it. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: David Kaplan <david.kaplan@amd.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://lore.kernel.org/r/20250414150951.5345-1-bp@kernel.org
2025-04-14  x86/cpuid: Align macro linebreaks vertically  [Borislav Petkov (AMD)]
Align the backslashes (macro line-continuation characters) vertically again, after recent cleanups. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ahmed S. Darwish <darwi@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250414094130.6768-1-bp@kernel.org
2025-04-14  x86/platform/amd: Move the <asm/amd_node.h> header to <asm/amd/node.h>  [Ingo Molnar]
Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org
2025-04-14  x86/platform/amd: Clean up the <asm/amd/hsmp.h> header guards a bit  [Ingo Molnar]
- There's no need for a newline after the SPDX line - But there's a need for one before the closing header guard. Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Cc: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Link: https://lore.kernel.org/r/20250413084144.3746608-6-mingo@kernel.org
2025-04-14  x86/platform/amd: Move the <asm/amd_hsmp.h> header to <asm/amd/hsmp.h>  [Ingo Molnar]
Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Cc: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Link: https://lore.kernel.org/r/20250413084144.3746608-5-mingo@kernel.org
2025-04-14  x86/platform/amd: Move the <asm/amd_nb.h> header to <asm/amd/nb.h>  [Ingo Molnar]
Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org
2025-04-14  x86/platform/amd: Add standard header guards to <asm/amd/ibs.h>  [Ingo Molnar]
Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-3-mingo@kernel.org
2025-04-14  x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>  [Ingo Molnar]
Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-2-mingo@kernel.org
2025-04-14  x86/fpu: Clarify FPU context cacheline alignment  [Ingo Molnar]
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/Z_ejggklB5-IWB5W@gmail.com
2025-04-14  x86/fpu: Use 'fpstate' variable names consistently  [Ingo Molnar]
A few uses of 'fps' snuck in, which is rather confusing (to me) as it suggests frames-per-second. ;-) Rename them to the canonical 'fpstate' name. No change in functionality. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-9-mingo@kernel.org
2025-04-14  x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks  [Ingo Molnar]
init_task's FPU state initialization was a bit of a hack: __x86_init_fpu_begin = .; . = __x86_init_fpu_begin + 128*PAGE_SIZE; __x86_init_fpu_end = .; But the init task isn't supposed to be using the FPU context in any case, so remove the hack and add in some debug warnings. As Linus noted in the discussion, the init task (and other PF_KTHREAD tasks) *can* use the FPU via kernel_fpu_begin()/_end(), but they don't need the context area because their FPU use is not preemptible or reentrant, and they don't return to user-space. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250409211127.3544993-8-mingo@kernel.org
2025-04-14  x86/fpu: Make sure x86_task_fpu() doesn't get called for PF_KTHREAD|PF_USER_WORKER tasks during exit  [Ingo Molnar]
fpu__drop() and arch_release_task_struct() call x86_task_fpu() unconditionally, while the FPU context area will not be present if it's the init task, and should not be in use when it's some other type of kthread. Return early for PF_KTHREAD or PF_USER_WORKER tasks. The debug warning in x86_task_fpu() will catch any kthreads attempting to use the FPU save area. Fixed-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-7-mingo@kernel.org
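The early return amounts to something like the following at the top of fpu__drop(); a sketch, assuming the function has already been converted to take a task pointer (see the next entry in this log):

    void fpu__drop(struct task_struct *tsk)
    {
            /* Kernel threads and user workers have no FPU context area to drop. */
            if (tsk->flags & (PF_KTHREAD | PF_USER_WORKER))
                    return;

            /* ... existing logic, now safe to call x86_task_fpu(tsk) ... */
    }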
2025-04-14  x86/fpu: Push 'fpu' pointer calculation into the fpu__drop() call  [Ingo Molnar]
This encapsulates the fpu__drop() functionality better, and it will also enable other changes that want to check a task for PF_KTHREAD before calling x86_task_fpu(). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-6-mingo@kernel.org
2025-04-14  x86/fpu: Remove the thread::fpu pointer  [Ingo Molnar]
As suggested by Oleg, remove the thread::fpu pointer, as we can calculate it via x86_task_fpu() at compile-time. This improves code generation a bit: kepler:~/tip> size vmlinux.before vmlinux.after text data bss dec hex filename 26475405 10435342 1740804 38651551 24dc69f vmlinux.before 26475339 10959630 1216516 38651485 24dc65d vmlinux.after Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250409211127.3544993-5-mingo@kernel.org
2025-04-14  x86/fpu: Make task_struct::thread constant size  [Ingo Molnar]
Turn thread.fpu into a pointer. Since most FPU code internals work by passing around the FPU pointer already, the code generation impact is small. This allows us to remove the old kludge of task_struct being variable size: struct task_struct { ... /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. */ randomized_struct_fields_end /* CPU-specific state of this task: */ struct thread_struct thread; /* * WARNING: on x86, 'thread_struct' contains a variable-sized * structure. It *MUST* be at the end of 'task_struct'. * * Do not put anything below here! */ }; ... which creates a number of problems, such as requiring thread_struct to be the last member of the struct - not allowing it to be struct-randomized, etc. But the primary motivation is to allow the decoupling of task_struct from hardware details (<asm/processor.h> in particular), and to eventually allow the per-task infrastructure: DECLARE_PER_TASK(type, name); ... per_task(current, name) = val; ... which requires task_struct to be a constant size struct. The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed, now that the FPU structure is not embedded in the task struct anymore, which reduces text footprint a bit. Fixed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org
2025-04-14  x86/fpu: Convert task_struct::thread.fpu accesses to use x86_task_fpu()  [Ingo Molnar]
This will make the removal of the task_struct::thread.fpu array easier. No change in functionality - code generated before and after this commit is identical on x86-defconfig: kepler:~/tip> diff -up vmlinux.before.asm vmlinux.after.asm kepler:~/tip> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250409211127.3544993-3-mingo@kernel.org
2025-04-14  x86/fpu: Introduce the x86_task_fpu() helper method  [Ingo Molnar]
The per-task FPU context/save area is allocated right next to task_struct, currently in a variable-size array via task_struct::thread.fpu[], but we plan to fully hide it from the C type scope. Introduce the x86_task_fpu() accessor that gets to the FPU context pointer explicitly from the task pointer. Right now this is a simple (task)->thread.fpu wrapper. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chang S. Bae <chang.seok.bae@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250409211127.3544993-2-mingo@kernel.org
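At this stage the accessor is purely mechanical; a sketch of what the '(task)->thread.fpu wrapper' mentioned above looks like (later patches in the series turn it into a pointer calculation next to task_struct):

    /* Sketch: simple accessor while thread.fpu is still an embedded struct. */
    #define x86_task_fpu(task)  (&(task)->thread.fpu)

    /* Typical call site after the tree-wide conversion: */
    struct fpu *fpu = x86_task_fpu(current);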
2025-04-14  x86/fpu/xstate: Adjust xstate copying logic for user ABI  [Chang S. Bae]
== Background == As feature positions in the userspace XSAVE buffer do not always align with their feature numbers, the XSAVE format conversion needs to be reconsidered to align with the revised xstate size calculation logic. * For signal handling, XSAVE and XRSTOR are used directly to save and restore extended registers. * For ptrace, KVM, and signal returns (for 32-bit frame), the kernel copies data between its internal buffer and the userspace XSAVE buffer. If memcpy() were used for these cases, existing offset helpers — such as __raw_xsave_addr() or xstate_offsets[] — would be sufficient to handle the format conversion. == Problem == When copying data from the compacted in-kernel buffer to the non-compacted userspace buffer, the function follows the user_regset_get2_fn() prototype. This means it utilizes struct membuf helpers for the destination buffer. As defined in regset.h, these helpers update the memory pointer during the copy process, enforcing sequential writes within the loop. Since xstate components are processed sequentially, any component whose buffer position does not align with its feature number has an issue. == Solution == Replace for_each_extended_xfeature() with the newly introduced for_each_extended_xfeature_in_order(). This macro ensures xstate components are handled in the correct order based on their actual positions in the destination buffer, rather than their feature numbers. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-5-chang.seok.bae@intel.com
2025-04-14  x86/fpu/xstate: Adjust XSAVE buffer size calculation  [Chang S. Bae]
The current xstate size calculation assumes that the highest-numbered xstate feature has the highest offset in the buffer, determining the size based on the topmost bit in the feature mask. However, this assumption is not architecturally guaranteed -- higher-numbered features may have lower offsets. With the introduction of the xfeature order table and its helper macro, xstate components can now be traversed in their positional order. Update the non-compacted format handling to iterate through the table to determine the last-positioned feature. Then, set the offset accordingly. Since size calculation primarily occurs during initialization or in non-critical paths, looping to find the last feature is not expected to have a meaningful performance impact. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-4-chang.seok.bae@intel.com
2025-04-14  x86/fpu/xstate: Introduce xfeature order table and accessor macro  [Chang S. Bae]
The kernel has largely assumed that higher xstate component numbers correspond to later offsets in the buffer. However, this assumption no longer holds for the non-compacted format, where a newer state component may have a lower offset. When iterating over xstate components in offset order, using the feature number as an index may be misleading. At the same time, the CPU exposes each component’s size and offset based on its feature number, making it a key for state information. To provide flexibility in handling xstate ordering, introduce a mapping table: feature order -> feature number. The table is dynamically populated based on the CPU-exposed features and is sorted in offset order at boot time. Additionally, add an accessor macro to facilitate sequential traversal of xstate components based on their actual buffer positions, given a feature bitmask. This accessor macro will be particularly useful for computing custom non-compacted format sizes and iterating over xstate offsets in non-compacted buffers. Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-3-chang.seok.bae@intel.com
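Conceptually, the table and accessor could look like the sketch below; the array and macro shapes are assumptions based on the description, not the actual patch:

    /* Extended-feature numbers, sorted by their non-compacted buffer offset. */
    static unsigned int xfeature_uncompact_order[XFEATURE_MAX] __ro_after_init;

    /*
     * Walk the extended components in buffer-offset order; 'i' is the order
     * index, and xfeature_uncompact_order[i] gives the feature number to use
     * with the usual per-feature helpers.
     */
    #define for_each_extended_xfeature_in_order(i, mask)                \
            for ((i) = 0; (i) < XFEATURE_MAX; (i)++)                    \
                    if ((mask) & BIT_ULL(xfeature_uncompact_order[(i)]))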
2025-04-14  x86/fpu/xstate: Remove xstate offset check  [Chang S. Bae]
Traditionally, new xstate components have been assigned sequentially, aligning feature numbers with their offsets in the XSAVE buffer. However, this ordering is not architecturally mandated in the non-compacted format, where a component's offset may not correspond to its feature number. The kernel caches CPUID-reported xstate component details, including size and offset in the non-compacted format. As part of this process, a sanity check is also conducted to ensure alignment between feature numbers and offsets. This check was likely intended as a general guideline rather than a strict requirement. Upcoming changes will support out-of-order offsets. Remove the check, as it has become obsolete. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250320234301.8342-2-chang.seok.bae@intel.com
2025-04-13  x86/uaccess: Use asm_inline() instead of asm() in __untagged_addr()  [Uros Bizjak]
Use asm_inline() to instruct the compiler that the size of asm() is the minimum size of one instruction, ignoring how many instructions the compiler thinks it is. ALTERNATIVE macro that expands to several pseudo directives causes instruction length estimate to count more than 20 instructions. bloat-o-meter reports minimal code size increase (x86_64 defconfig with CONFIG_ADDRESS_MASKING, gcc-14.2.1): add/remove: 2/2 grow/shrink: 5/1 up/down: 2365/-1995 (370) Function old new delta ----------------------------------------------------- do_get_mempolicy - 1449 +1449 copy_nodes_to_user - 226 +226 __x64_sys_get_mempolicy 35 213 +178 syscall_user_dispatch_set_config 157 332 +175 __ia32_sys_get_mempolicy 31 206 +175 set_syscall_user_dispatch 29 181 +152 __do_sys_mremap 2073 2083 +10 sp_insert 133 117 -16 task_set_syscall_user_dispatch 172 - -172 kernel_get_mempolicy 1807 - -1807 Total: Before=21423151, After=21423521, chg +0.00% The code size increase is due to the compiler inlining more functions that inline untagged_addr(), e.g: task_set_syscall_user_dispatch() is now fully inlined in set_syscall_user_dispatch(): 000000000010b7e0 <set_syscall_user_dispatch>: 10b7e0: f3 0f 1e fa endbr64 10b7e4: 49 89 c8 mov %rcx,%r8 10b7e7: 48 89 d1 mov %rdx,%rcx 10b7ea: 48 89 f2 mov %rsi,%rdx 10b7ed: 48 89 fe mov %rdi,%rsi 10b7f0: 65 48 8b 3d 00 00 00 mov %gs:0x0(%rip),%rdi 10b7f7: 00 10b7f8: e9 03 fe ff ff jmp 10b600 <task_set_syscall_user_dispatch> that after inlining becomes: 000000000010b730 <set_syscall_user_dispatch>: 10b730: f3 0f 1e fa endbr64 10b734: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax 10b73b: 00 10b73c: 48 85 ff test %rdi,%rdi 10b73f: 74 54 je 10b795 <set_syscall_user_dispatch+0x65> 10b741: 48 83 ff 01 cmp $0x1,%rdi 10b745: 74 06 je 10b74d <set_syscall_user_dispatch+0x1d> 10b747: b8 ea ff ff ff mov $0xffffffea,%eax 10b74c: c3 ret 10b74d: 48 85 f6 test %rsi,%rsi 10b750: 75 7b jne 10b7cd <set_syscall_user_dispatch+0x9d> 10b752: 48 85 c9 test %rcx,%rcx 10b755: 74 1a je 10b771 <set_syscall_user_dispatch+0x41> 10b757: 48 89 cf mov %rcx,%rdi 10b75a: 49 b8 ef cd ab 89 67 movabs $0x123456789abcdef,%r8 10b761: 45 23 01 10b764: 90 nop 10b765: 90 nop 10b766: 90 nop 10b767: 90 nop 10b768: 90 nop 10b769: 90 nop 10b76a: 90 nop 10b76b: 90 nop 10b76c: 49 39 f8 cmp %rdi,%r8 10b76f: 72 6e jb 10b7df <set_syscall_user_dispatch+0xaf> 10b771: 48 89 88 48 08 00 00 mov %rcx,0x848(%rax) 10b778: 48 89 b0 50 08 00 00 mov %rsi,0x850(%rax) 10b77f: 48 89 90 58 08 00 00 mov %rdx,0x858(%rax) 10b786: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax) 10b78d: f0 80 48 08 20 lock orb $0x20,0x8(%rax) 10b792: 31 c0 xor %eax,%eax 10b794: c3 ret 10b795: 48 09 d1 or %rdx,%rcx 10b798: 48 09 f1 or %rsi,%rcx 10b79b: 75 aa jne 10b747 <set_syscall_user_dispatch+0x17> 10b79d: 48 c7 80 48 08 00 00 movq $0x0,0x848(%rax) 10b7a4: 00 00 00 00 10b7a8: 48 c7 80 50 08 00 00 movq $0x0,0x850(%rax) 10b7af: 00 00 00 00 10b7b3: 48 c7 80 58 08 00 00 movq $0x0,0x858(%rax) 10b7ba: 00 00 00 00 10b7be: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax) 10b7c5: f0 80 60 08 df lock andb $0xdf,0x8(%rax) 10b7ca: 31 c0 xor %eax,%eax 10b7cc: c3 ret 10b7cd: 48 8d 3c 16 lea (%rsi,%rdx,1),%rdi 10b7d1: 48 39 fe cmp %rdi,%rsi 10b7d4: 0f 82 78 ff ff ff jb 10b752 <set_syscall_user_dispatch+0x22> 10b7da: e9 68 ff ff ff jmp 10b747 <set_syscall_user_dispatch+0x17> 10b7df: b8 f2 ff ff ff mov $0xfffffff2,%eax 10b7e4: c3 ret Please note a series of NOPs that get replaced with an alternative: 11f0: 65 48 23 05 00 00 00 and %gs:0x0(%rip),%rax 11f7: 00 Signed-off-by: Uros Bizjak 
<ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250407072129.33440-1-ubizjak@gmail.com
2025-04-13  perf/x86/intel/bts: Replace offsetof() with struct_size()  [Thorsten Blum]
Use struct_size() to calculate the number of bytes to allocate for a new bts_buffer. Compared to offsetof(), struct_size() provides additional compile-time checks (e.g., __must_be_array()). Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250413104108.49142-2-thorsten.blum@linux.dev
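The conversion follows the usual <linux/overflow.h> pattern; illustrated below on a simplified flexible-array struct (the struct and field names are illustrative, not the exact bts_buffer layout):

    #include <linux/overflow.h>
    #include <linux/slab.h>

    struct example_buffer {
            int             nr_bufs;
            struct page     *pages[];       /* flexible array member */
    };

    static struct example_buffer *alloc_example(int nr_pages)
    {
            struct example_buffer *buf;

            /* Before: buf = kzalloc(offsetof(struct example_buffer, pages[nr_pages]), GFP_KERNEL); */

            /* After: struct_size() adds overflow and __must_be_array() checks. */
            buf = kzalloc(struct_size(buf, pages, nr_pages), GFP_KERNEL);
            return buf;
    }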
2025-04-13  x86/msr: Add compatibility wrappers for rdmsrl()/wrmsrl()  [Ingo Molnar]
To reduce the impact of the API renames in -next, add compatibility wrappers for the two most popular MSR access APIs: rdmsrl() and wrmsrl(). Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org
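Such compatibility wrappers are typically thin aliases; a sketch, assuming the renamed primitives are rdmsrq()/wrmsrq() (the new names are an assumption here, not stated in this log):

    /* Transitional aliases so not-yet-converted callers keep building. */
    #define rdmsrl(msr, val)    rdmsrq(msr, val)
    #define wrmsrl(msr, val)    wrmsrq(msr, val)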
2025-04-13  objtool, x86/hweight: Remove ANNOTATE_IGNORE_ALTERNATIVE  [Josh Poimboeuf]
Since objtool's inception, frame pointer warnings have been manually silenced for __arch_hweight*() to allow those functions' inline asm to avoid using ASM_CALL_CONSTRAINT. The potentially dubious reasoning for that decision over nine years ago was that since !X86_FEATURE_POPCNT is exceedingly rare, it's not worth hurting the code layout for a function call that will never happen on the vast majority of systems. However, those functions actually started using ASM_CALL_CONSTRAINT with the following commit: 194a613088a8 ("x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()") And rightfully so, as it makes the code correct. ASM_CALL_CONSTRAINT will soon have no effect for non-FP configs anyway. With ASM_CALL_CONSTRAINT in place, ANNOTATE_IGNORE_ALTERNATIVE no longer has a purpose for the hweight functions. Remove it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/e7070dba3278c90f1a836b16157dcd34ccd21e21.1744318586.git.jpoimboe@kernel.org
2025-04-13  x86/percpu: Refer __percpu_prefix to __force_percpu_prefix  [Uros Bizjak]
Refer __percpu_prefix to __force_percpu_prefix to avoid duplicate definition. While there, slightly reorder definitions to a more logical sequence, remove unneeded double quotes and move misplaced comment to the right place. No functional changes intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lore.kernel.org/r/20250411093130.81389-1-ubizjak@gmail.com
2025-04-12  x86/microcode/AMD: Extend the SHA check to Zen5, block loading of any unreleased standalone Zen5 microcode patches  [Borislav Petkov (AMD)]
All Zen5 machines out there should get BIOS updates which update to the correct microcode patches addressing the microcode signature issue. However, silly people carve out random microcode blobs from BIOS packages and think they are doing other people a service this way... Block loading of any unreleased standalone Zen5 microcode patches. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Cc: Nikolay Borisov <nik.borisov@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20250410114222.32523-1-bp@kernel.org
2025-04-12  x86/sev: Prepare for splitting off early SEV code  [Ard Biesheuvel]
Prepare for splitting off parts of the SEV core.c source file into a file that carries code that must tolerate being called from the early 1:1 mapping. This will allow special build-time handling of this code, to ensure that it gets generated in a way that is compatible with the early execution context. So create a de-facto internal SEV API and put the definitions into sev-internal.h. No attempt is made to allow this header file to be included in arbitrary other sources - this is explicitly not the intent. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-20-ardb+git@google.com
2025-04-12  x86/boot: Drop RIP_REL_REF() uses from SME startup code  [Ard Biesheuvel]
RIP_REL_REF() has no effect on code residing in arch/x86/boot/startup, as it is built with -fPIC. So remove any occurrences from the SME startup code. Note that the SME startup code is the only caller of cc_set_mask() that requires this, so drop it from there as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-19-ardb+git@google.com
2025-04-12  x86/boot: Move early SME init code into startup/  [Ard Biesheuvel]
Move the SME initialization code, which runs from the 1:1 mapping of memory as it operates on the kernel virtual mapping, into the new sub-directory arch/x86/boot/startup/ where all startup code will reside that needs to tolerate executing from the 1:1 mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-18-ardb+git@google.com
2025-04-12  x86/boot: Drop RIP_REL_REF() uses from early mapping code  [Ard Biesheuvel]
Now that __startup_64() is built using -fPIC, RIP_REL_REF() has become a NOP and can be removed. Only some occurrences of rip_rel_ptr() will remain, to explicitly take the address of certain global structures in the 1:1 mapping of memory. While at it, update the code comment to describe why this is needed. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-17-ardb+git@google.com
2025-04-12  x86/boot: Move early kernel mapping code into startup/  [Ard Biesheuvel]
The startup code that constructs the kernel virtual mapping runs from the 1:1 mapping of memory itself, and therefore, cannot use absolute symbol references. Before making changes in subsequent patches, move this code into a separate source file under arch/x86/boot/startup/ where all such code will be kept from now on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-16-ardb+git@google.com
2025-04-12  x86/boot: Move the early GDT/IDT setup code into startup/  [Ard Biesheuvel]
Move the early GDT/IDT setup code that runs long before the kernel virtual mapping is up into arch/x86/boot/startup/, and build it in a way that ensures that the code tolerates being called from the 1:1 mapping of memory. The code itself is left unchanged by this patch. Also tweak the sed symbol matching pattern in the decompressor to match on lower case 't' or 'b', as these will be emitted by Clang for symbols with hidden linkage. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-15-ardb+git@google.com
2025-04-12  x86/asm: Make rip_rel_ptr() usable from fPIC code  [Ard Biesheuvel]
RIP_REL_REF() is used in non-PIC C code that is called very early, before the kernel virtual mapping is up, which is the mapping that the linker expects. It is currently used in two different ways: - to refer to the value of a global variable, including as an lvalue in assignments; - to take the address of a global variable via the mapping that the code currently executes at. The former case is only needed in non-PIC code, as PIC code will never use absolute symbol references when the address of the symbol is not being used. But taking the address of a variable in PIC code may still require extra care, as a stack allocated struct assignment may be emitted as a memcpy() from a statically allocated copy in .rodata. For instance, this void startup_64_setup_gdt_idt(void) { struct desc_ptr startup_gdt_descr = { .address = (__force unsigned long)gdt_page.gdt, .size = GDT_SIZE - 1, }; may result in an absolute symbol reference in PIC code, even though the struct is allocated on the stack and populated at runtime. To address this case, make rip_rel_ptr() accessible in PIC code, and update any existing uses where the address of a global variable is taken using RIP_REL_REF. Once all code of this nature has been moved into arch/x86/boot/startup and built with -fPIC, RIP_REL_REF() can be retired, and only rip_rel_ptr() will remain. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250410134117.3713574-14-ardb+git@google.com
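The address-taking conversion described in the last paragraph follows this pattern. This is a constructed before/after sketch, not a hunk from the patch; the variable is borrowed from the commit's own example:

    /* Before: the lvalue macro (ab)used only to obtain a 1:1-mapping address. */
    startup_gdt_descr.address = (unsigned long)&RIP_REL_REF(gdt_page.gdt);

    /* After: take the address explicitly via the RIP-relative helper. */
    startup_gdt_descr.address = (unsigned long)rip_rel_ptr(&gdt_page.gdt);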
2025-04-12  x86/mm: Opt-in to IRQs-off activate_mm()  [Andy Lutomirski]
We gain nothing by having the core code enable IRQs right before calling activate_mm() only for us to turn them right back off again in switch_mm(). This will save a few cycles, so execve() should be blazingly fast with this patch applied! Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-8-mingo@kernel.org
2025-04-12  x86/efi: Make efi_enter/leave_mm() use the use_/unuse_temporary_mm() machinery  [Andy Lutomirski]
This should be considerably more robust. It's also necessary for optimized for_each_possible_lazymm_cpu() on x86 -- without this patch, EFI calls in lazy context would remove the lazy mm from mm_cpumask(). [ mingo: Merged it on top of x86/alternatives ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20250402094540.3586683-7-mingo@kernel.org