summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2025-05-17x86/bugs: Fix indentation due to ITS mergeBorislav Petkov (AMD)
No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-16KVM: x86/mmu: Use kvm_x86_call() instead of manual static_call()Sean Christopherson
Use KVM's preferred kvm_x86_call() wrapper to invoke static calls related to mirror page tables. No functional change intended. Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables") Fixes: 94faba8999b9 ("KVM: x86/tdp_mmu: Propagate tearing down mirror page tables") Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20250331182703.725214-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16x86/io_apic: Switch to of_fwnode_handle()Jiri Slaby (SUSE)
of_node_to_fwnode() is irqdomain's reimplementation of the "officially" defined of_fwnode_handle(). The former is in the process of being removed, so use the latter instead. [ tglx: Fix up subject prefix ] Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-11-jirislaby@kernel.org
2025-05-16KVM: SVM: Add architectural definitions/assets for Bus Lock ThresholdNikunj A Dadhania
Virtual machines can exploit bus locks to degrade the performance of the system. Bus locks can be caused by Non-WB(Write back) and misaligned locked RMW (Read-modify-Write) instructions and require systemwide synchronization among all processors which can result into significant performance penalties. To address this issue, the Bus Lock Threshold feature is introduced to provide ability to hypervisor to restrict guests' capability of initiating mulitple buslocks, thereby preventing system slowdowns. Support for the buslock threshold is indicated via CPUID function 0x8000000A_EDX[29]. On the processors that support the Bus Lock Threshold feature, the VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock threshold count. VMCB intercept bit VMCB Offset Bits Function 14h 5 Intercept bus lock operations Bus lock threshold count VMCB Offset Bits Function 120h 15:0 Bus lock counter When a VMRUN instruction is executed, the bus lock threshold count is loaded into an internal count register. Before the processor executes a bus lock in the guest, it checks the value of this register: - If the value is greater than '0', the processor successfully executes the bus lock and decrements the count. - If the value is '0', the bus lock is not executed, and a #VMEXIT to the VMM is taken. The bus lock threshold #VMEXIT is reported to the VMM with the VMEXIT code A5h, SVM_EXIT_BUS_LOCK. Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Co-developed-by: Manali Shukla <manali.shukla@amd.com> Signed-off-by: Manali Shukla <manali.shukla@amd.com> Link: https://lore.kernel.org/r/20250502050346.14274-4-manali.shukla@amd.com [sean: rewrite shortlog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16x86/cpufeatures: Add CPUID feature bit for the Bus Lock ThresholdManali Shukla
Misbehaving guests can cause bus locks to degrade the performance of the system. The Bus Lock Threshold feature can be used to address this issue by providing capability to the hypervisor to limit guest's ability to generate bus lock, thereby preventing system slowdown due to performance penalities. When the Bus Lock Threshold feature is enabled, the processor checks the bus lock threshold count before executing the buslock and decides whether to trigger bus lock exit or not. The value of the bus lock threshold count '0' generates bus lock exits, and if the value is greater than '0', the bus lock is executed successfully and the bus lock threshold count is decremented. Presence of the Bus Lock threshold feature is indicated via CPUID function 0x8000000A_EDX[29]. Signed-off-by: Manali Shukla <manali.shukla@amd.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250502050346.14274-3-manali.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16KVM: x86: Make kvm_pio_request.linear_rip a common field for user exitsManali Shukla
Move and rename kvm_pio_request.linear_rip to kvm_vcpu_arch.cui_linear_rip so that the field can be used by other userspace exit completion flows that need to take action if and only if userspace has not modified RIP. No functional changes intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Manali Shukla <manali.shukla@amd.com> Link: https://lore.kernel.org/r/20250502050346.14274-2-manali.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-16perf/x86/intel/ds: Remove redundant assignments to sample.periodChangbin Du
The perf_sample_data_init() has already set the period of sample, so no need to do it again. Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250506094907.2724-1-changbin.du@huawei.com
2025-05-16x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrlJames Morse
Resctrl is a filesystem interface to hardware that provides cache allocation policy and bandwidth control for groups of tasks or CPUs. To support more than one architecture, resctrl needs to live in /fs/. Move the code that is concerned with the filesystem interface to /fs/resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
2025-05-16x86/resctrl: Always initialise rid field in rdt_resources_all[]James Morse
x86 has an array, rdt_resources_all[], of all possible resources. The for-each-resource walkers depend on the rid field of all resources being initialised. If the array ever grows due to another architecture adding a resource type that is not defined on x86, the for-each-resources walkers will loop forever. Initialise all the rid values in resctrl_arch_late_init() before any for-each-resource walker can be called. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-24-james.morse@arm.com
2025-05-16x86/resctrl: Relax some asm #includesDave Martin
checkpatch.pl identifies some direct #includes of asm headers that can be satisfied by including the corresponding <linux/...> header instead. Fix them. No intentional functional change. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-23-james.morse@arm.com
2025-05-16x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()Dave Martin
rdt_init_fs_context() sizes a typed allocation using an explicit sizeof(type) expression, which checkpatch.pl complains about. Since this code is about to be factored out and made generic, this is a good opportunity to fix the code to size the allocation based on the target pointer instead, to reduce the chance of future mis- maintenance. Fix it. No functional change. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-22-james.morse@arm.com
2025-05-16x86/resctrl: Squelch whitespace anomalies in resctrl core codeDave Martin
checkpatch.pl complains about some whitespace anomalies in the resctrl core code. This doesn't matter, but since this code is about to be factored out and made generic, this is a good opportunity to fix these issues and so reduce future checkpatch fuzz. Fix them. No functional change. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-21-james.morse@arm.com
2025-05-16x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.hJames Morse
The resctrl pseudo-lock feature allows an architecture to allocate data into particular cache portions, which are then treated as reserved to avoid that data ever being evicted. Setting this up is deeply architecture specific as it involves disabling prefetchers etc. It is not possible to support this kind of feature on arm64. Risc-V is assumed to be the same. The prototypes for the architecture code were added to x86's asm/resctrl.h, with other architectures able to provide stubs for their architecture. This forces other architectures to provide identical stubs. Move the prototypes and stubs to linux/resctrl.h, and switch between them using the existing Kconfig symbol. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-20-james.morse@arm.com
2025-05-16x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubsJames Morse
resctrl_arch_mon_ctx_alloc() and resctrl_arch_mon_ctx_free() take an enum resctrl_event_id that is already defined in resctrl_types.h to be accessible to asm/resctrl.h. The x86 stubs take an int. Fix that. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-19-james.morse@arm.com
2025-05-16x86/resctrl: Move the filesystem bits to headers visible to fs/resctrlJames Morse
Once the filesystem parts of resctrl move to fs/resctrl, it cannot rely on definitions in x86's internal.h. Move definitions in internal.h that need to be shared between the filesystem and architecture code to header files that fs/resctrl can include. Doing this separately means the filesystem code only moves between files of the same name, instead of having these changes mixed in too. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-17-james.morse@arm.com
2025-05-16x86/mm: Remove duplicated word in warning messageLukas Bulwahn
Commit bbeb69ce3013 ("x86/mm: Remove CONFIG_HIGHMEM64G support") introduces a new warning message MSG_HIGHMEM_TRIMMED, which accidentally introduces a duplicated 'for for' in the warning message. Remove this duplicated word. This was noticed while reviewing for references to obsolete kernel build config options. Fixes: bbeb69ce3013 ("x86/mm: Remove CONFIG_HIGHMEM64G support") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: https://lore.kernel.org/r/20250516090810.556623-1-lukas.bulwahn@redhat.com
2025-05-16fs/resctrl: Add boiler plate for external resctrl codeJames Morse
Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL for the common parts of the resctrl interface and make X86_CPU_RESCTRL select this. Adding an include of asm/resctrl.h to linux/resctrl.h allows the /fs/resctrl files to switch over to using this header instead. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com
2025-05-16x86/cpuid: Rename ↵Ahmed S. Darwish
hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() In order to let all the APIs under <cpuid/api.h> have a shared "cpuid_" namespace, rename hypervisor_cpuid_base() to cpuid_base_hypervisor(). To align with the new style, also rename: for_each_possible_hypervisor_cpuid_base(function) to: for_each_possible_cpuid_base_hypervisor(function) Adjust call-sites accordingly. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/aCZOi0Oohc7DpgTo@lx-t490
2025-05-16x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameterAhmed S. Darwish
The CPUID(0x2) descriptors iterator has been renamed from: for_each_leaf_0x2_entry() to: for_each_cpuid_0x2_desc() since it iterates over CPUID(0x2) cache and TLB "descriptors", not "entries". In the macro's x86/cpu call-site, rename the parameter denoting the parsed descriptor at each iteration from 'entry' to 'desc'. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-8-darwi@linutronix.de
2025-05-16x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameterAhmed S. Darwish
The CPUID(0x2) descriptors iterator has been renamed from: for_each_leaf_0x2_entry() to: for_each_cpuid_0x2_desc() since it iterates over CPUID(0x2) cache and TLB "descriptors", not "entries". In the macro's x86/cacheinfo call-site, rename the parameter denoting the parsed descriptor at each iteration from 'entry' to 'desc'. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-7-darwi@linutronix.de
2025-05-16x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()Ahmed S. Darwish
Rename the CPUID(0x2) register accessor function: cpuid_get_leaf_0x2_regs(regs) to: cpuid_leaf_0x2(regs) for consistency with other <cpuid/api.h> accessors that return full CPUID registers outputs like: cpuid_leaf(regs) cpuid_subleaf(regs) In the same vein, rename the CPUID(0x2) iteration macro: for_each_leaf_0x2_entry() to: for_each_cpuid_0x2_desc() to include "cpuid" in the macro name, and since what is iterated upon is CPUID(0x2) cache and TLB "descriptos", not "entries". Prefix an underscore to that iterator macro parameters, so that the newly renamed 'desc' parameter do not get mixed with "union leaf_0x2_regs :: desc[]" in the macro's implementation. Adjust all the affected call-sites accordingly. While at it, use "CPUID(0x2)" instead of "CPUID leaf 0x2" as this is the recommended style. No change in functionality intended. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-6-darwi@linutronix.de
2025-05-16x86/resctrl: Split trace.hJames Morse
trace.h contains all the tracepoints. After the move to /fs/resctrl, some of these will be left behind. All the pseudo_lock tracepoints remain part of the architecture. The lone tracepoint in monitor.c moves to /fs/resctrl. Split trace.h so that each C file includes a different trace header file. This means the trace header files are not modified when they are moved. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-14-james.morse@arm.com
2025-05-16x86/resctrl: Expand the width of domid by replacing mon_data_bitsJames Morse
MPAM platforms retrieve the cache-id property from the ACPI PPTT table. The cache-id field is 32 bits wide. Under resctrl, the cache-id becomes the domain-id, and is packed into the mon_data_bits union bitfield. The width of cache-id in this field is 14 bits. Expanding the union would break 32bit x86 platforms as this union is stored as the kernfs kn->priv pointer. This saved allocating memory for the priv data storage. The firmware on MPAM platforms have used the PPTT cache-id field to expose the interconnect's id for the cache, which is sparse and uses more than 14 bits. Use of this id is to enable PCIe direct cache injection hints. Using this feature with VFIO means the value provided by the ACPI table should be exposed to user-space. To support cache-id values greater than 14 bits, convert the mon_data_bits union to a structure. These are shared between control and monitor groups, and are allocated on first use. The list of allocated struct mon_data is free'd when the filesystem is umount()ed. Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-13-james.morse@arm.com
2025-05-16x86/resctrl: Add end-marker to the resctrl_event_id enumJames Morse
The resctrl_event_id enum gives names to the counter event numbers on x86. These are used directly by resctrl. To allow the MPAM driver to keep an array of these the size of the enum needs to be known. Add a 'num_events' enum entry which can be used to size an array. This is added to the enum to reduce conflicts with another series, which in turn requires get_arch_mbm_state() to have a default case. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-12-james.morse@arm.com
2025-05-16x86/tracing, x86/mm: Move page fault tracepoints to genericNam Cao
Page fault tracepoints are interesting for other architectures as well. Move them to be generic. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/r/89c2f284adf9b4c933f0e65811c50cef900a5a95.1747046848.git.namcao@linutronix.de
2025-05-16x86/tracing, x86/mm: Remove redundant trace_pagefault_keyNam Cao
trace_pagefault_key is used to optimize the pagefault tracepoints when it is disabled. However, tracepoints already have built-in static_key for this exact purpose. Remove this redundant key. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/r/827c7666d2989f08742a4fb869b1ed5bfaaf1dbf.1747046848.git.namcao@linutronix.de
2025-05-16x86/resctrl: Move is_mba_sc() out of core.cJames Morse
is_mba_sc() is defined in core.c, but has no callers there. It does not access any architecture private structures. Move this to rdtgroup.c where the majority of callers are. This makes the move of the filesystem code to /fs/ cleaner. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-11-james.morse@arm.com
2025-05-15x86/resctrl: Drop __init/__exit on assorted symbolsJames Morse
Because ARM's MPAM controls are probed using MMIO, resctrl can't be initialised until enough CPUs are online to have determined the system-wide supported num_closid. Arm64 also supports 'late onlined secondaries', where only a subset of CPUs are online during boot. These two combine to mean the MPAM driver may not be able to initialise resctrl until user-space has brought 'enough' CPUs online. To allow MPAM to initialise resctrl after __init text has been free'd, remove all the __init markings from resctrl. The existing __exit markings cause these functions to be removed by the linker as it has never been possible to build resctrl as a module. MPAM has an error interrupt which causes the driver to reset and disable itself. Remove the __exit markings to allow the MPAM driver to tear down resctrl when an error occurs. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-10-james.morse@arm.com
2025-05-15x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount pointJames Morse
resctrl_exit() was intended for use when the 'resctrl' module was unloaded. resctrl can't be built as a module, and the kernfs helpers are not exported so this is unlikely to change. MPAM has an error interrupt which indicates the MPAM driver has gone haywire. Should this occur tasks could run with the wrong control values, leading to bad performance for important tasks. In this scenario the MPAM driver will reset the hardware, but it needs a way to tell resctrl that no further configuration should be attempted. In particular, moving tasks between control or monitor groups does not interact with the architecture code, so there is no opportunity for the arch code to indicate that the hardware is no-longer functioning. Using resctrl_exit() for this leaves the system in a funny state as resctrl is still mounted, but cannot be un-mounted because the sysfs directory that is typically used has been removed. Dave Martin suggests this may cause systemd trouble in the future as not all filesystems can be unmounted. Add calls to remove all the files and directories in resctrl, and remove the sysfs_remove_mount_point() call that leaves the system in a funny state. When triggered, this causes all the resctrl files to disappear. resctrl can be unmounted, but not mounted again. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-9-james.morse@arm.com
2025-05-15x86/resctrl: Check all domains are offline in resctrl_exit()James Morse
resctrl_exit() removes things like the resctrl mount point directory and unregisters the filesystem prior to freeing data structures that were allocated during resctrl_init(). This assumes that there are no online domains when resctrl_exit() is called. If any domain were online, the limbo or overflow handler could be scheduled to run. Add a check for any online control or monitor domains, and document that the architecture code is required to offline all monitor and control domains before calling resctrl_exit(). Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-8-james.morse@arm.com
2025-05-15x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"James Morse
resctrl_sched_in() loads the architecture specific CPU MSRs with the CLOSID and RMID values. This function was named before resctrl was split to have architecture specific code, and generic filesystem code. This function is obviously architecture specific, but does not begin with 'resctrl_arch_', making it the odd one out in the functions an architecture needs to support to enable resctrl. Rename it for consistency. This is purely cosmetic. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-7-james.morse@arm.com
2025-05-15x86/resctrl: Remove the limit on the number of CLOSIDAmit Singh Tomar
Resctrl allocates and finds free CLOSID values using the bits of a u32. This restricts the number of control groups that can be created by user-space. MPAM has an architectural limit of 2^16 CLOSID values, Intel x86 could be extended beyond 32 values. There is at least one MPAM platform which supports more than 32 CLOSID values. Replace the fixed size bitmap with calls to the bitmap API to allocate an array of a sufficient size. ffs() returns '1' for bit 0, hence the existing code subtracts 1 from the index to get the CLOSID value. find_first_bit() returns the bit number which does not need adjusting. [ morse: fixed the off-by-one in the allocator and the wrong not-found value. Removed the limit. Rephrase the commit message. ] Signed-off-by: Amit Singh Tomar <amitsinght@marvell.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-6-james.morse@arm.com
2025-05-15x86/resctrl: Optimize cpumask_any_housekeeping()Yury Norov [NVIDIA]
With the lack of cpumask_any_andnot_but(), cpumask_any_housekeeping() has to abuse cpumask_nth() functions. Update cpumask_any_housekeeping() to use the new cpumask_any_but() and cpumask_any_andnot_but(). These two functions understand RESCTRL_PICK_ANY_CPU, which simplifies cpumask_any_housekeeping() significantly. Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: James Morse <james.morse@arm.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-5-james.morse@arm.com
2025-05-15x86/sgx: Prevent attempts to reclaim poisoned pagesAndrew Zaborowski
TL;DR: SGX page reclaim touches the page to copy its contents to secondary storage. SGX instructions do not gracefully handle machine checks. Despite this, the existing SGX code will try to reclaim pages that it _knows_ are poisoned. Avoid even trying to reclaim poisoned pages. The longer story: Pages used by an enclave only get epc_page->poison set in arch_memory_failure() but they currently stay on sgx_active_page_list until sgx_encl_release(), with the SGX_EPC_PAGE_RECLAIMER_TRACKED flag untouched. epc_page->poison is not checked in the reclaimer logic meaning that, if other conditions are met, an attempt will be made to reclaim an EPC page that was poisoned. This is bad because 1. we don't want that page to end up added to another enclave and 2. it is likely to cause one core to shut down and the kernel to panic. Specifically, reclaiming uses microcode operations including "EWB" which accesses the EPC page contents to encrypt and write them out to non-SGX memory. Those operations cannot handle MCEs in their accesses other than by putting the executing core into a special shutdown state (affecting both threads with HT.) The kernel will subsequently panic on the remaining cores seeing the core didn't enter MCE handler(s) in time. Call sgx_unmark_page_reclaimable() to remove the affected EPC page from sgx_active_page_list on memory error to stop it being considered for reclaiming. Testing epc_page->poison in sgx_reclaim_pages() would also work but I assume it's better to add code in the less likely paths. The affected EPC page is not added to &node->sgx_poison_page_list until later in sgx_encl_release()->sgx_free_epc_page() when it is EREMOVEd. Membership on other lists doesn't change to avoid changing any of the lists' semantics except for sgx_active_page_list. There's a "TBD" comment in arch_memory_failure() about pre-emptive actions, the goal here is not to address everything that it may imply. This also doesn't completely close the time window when a memory error notification will be fatal (for a not previously poisoned EPC page) -- the MCE can happen after sgx_reclaim_pages() has selected its candidates or even *inside* a microcode operation (actually easy to trigger due to the amount of time spent in them.) The spinlock in sgx_unmark_page_reclaimable() is safe because memory_failure() runs in process context and no spinlocks are held, explicitly noted in a mm/memory-failure.c comment. Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: balrogg@gmail.com Cc: linux-sgx@vger.kernel.org Link: https://lore.kernel.org/r/20250508230429.456271-1-andrew.zaborowski@intel.com
2025-05-15x86/cpuid: Rename have_cpuid_p() to cpuid_feature()Ahmed S. Darwish
In order to let all the APIs under <cpuid/api.h> have a shared "cpuid_" namespace, rename have_cpuid_p() to cpuid_feature(). Adjust all call-sites accordingly. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-4-darwi@linutronix.de
2025-05-15x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID headerAhmed S. Darwish
The main CPUID header <asm/cpuid.h> was originally a storefront for the headers: <asm/cpuid/api.h> <asm/cpuid/leaf_0x2_api.h> Now that the latter CPUID(0x2) header has been merged into the former, there is no practical difference between <asm/cpuid.h> and <asm/cpuid/api.h>. Migrate all users to the <asm/cpuid/api.h> header, in preparation of the removal of <asm/cpuid.h>. Don't remove <asm/cpuid.h> just yet, in case some new code in -next started using it. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-3-darwi@linutronix.de
2025-05-15x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>Ahmed S. Darwish
Move all of the CPUID(0x2) APIs at <cpuid/leaf_0x2_api.h> into <cpuid/api.h>, in order centralize all CPUID APIs into the latter. While at it, separate the different CPUID leaf parsing APIs using header comments like "CPUID(0xN) parsing: ". Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: x86-cpuid@lists.linux.dev Link: https://lore.kernel.org/r/20250508150240.172915-2-darwi@linutronix.de
2025-05-15perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freqAdrian Hunter
Currently, using PEBS-via-PT with a sample frequency instead of a sample period, causes a segfault. For example: BUG: kernel NULL pointer dereference, address: 0000000000000195 <NMI> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0xca/0x290 ? exc_page_fault+0x7e/0x1b0 ? asm_exc_page_fault+0x26/0x30 ? intel_pmu_pebs_event_update_no_drain+0x40/0x60 ? intel_pmu_pebs_event_update_no_drain+0x32/0x60 intel_pmu_drain_pebs_icl+0x333/0x350 handle_pmi_common+0x272/0x3c0 intel_pmu_handle_irq+0x10a/0x2e0 perf_event_nmi_handler+0x2a/0x50 That happens because intel_pmu_pebs_event_update_no_drain() assumes all the pebs_enabled bits represent counter indexes, which is not always the case. In this particular case, bits 60 and 61 are set for PEBS-via-PT purposes. The behaviour of PEBS-via-PT with sample frequency is questionable because although a PMI is generated (PEBS_PMI_AFTER_EACH_RECORD), the period is not adjusted anyway. Putting that aside, fix intel_pmu_pebs_event_update_no_drain() by passing the mask of counter bits instead of 'size'. Note, prior to the Fixes commit, 'size' would be limited to the maximum counter index, so the issue was not hit. Fixes: 722e42e45c2f1 ("perf/x86: Support counter mask") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: linux-perf-users@vger.kernel.org Link: https://lore.kernel.org/r/20250508134452.73960-1-adrian.hunter@intel.com
2025-05-15perf/aux: Allocate non-contiguous AUX pages by defaultYabin Cui
perf always allocates contiguous AUX pages based on aux_watermark. However, this contiguous allocation doesn't benefit all PMUs. For instance, ARM SPE and TRBE operate with virtual pages, and Coresight ETR allocates a separate buffer. For these PMUs, allocating contiguous AUX pages unnecessarily exacerbates memory fragmentation. This fragmentation can prevent their use on long-running devices. This patch modifies the perf driver to be memory-friendly by default, by allocating non-contiguous AUX pages. For PMUs requiring contiguous pages (Intel BTS and some Intel PT), the existing PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't require but can benefit from contiguous pages (some Intel PT), a new capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their existing behavior. Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250508232642.148767-1-yabinc@google.com
2025-05-15x86/msr: Add rdmsrl_on_cpu() compatibility wrapperIngo Molnar
Add a simple rdmsrl_on_cpu() compatibility wrapper for rdmsrq_on_cpu(), to make life in -next easier, where the PM tree recently grew more uses of the old API. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mario Limonciello <mario.limonciello@amd.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Link: https://lore.kernel.org/r/20250512145517.6e0666e3@canb.auug.org.au
2025-05-15x86/mm: Fix kernel-doc descriptions of various pgtable methodsShivank Garg
So 'make W=1' complains about a couple of kernel-doc descriptions in our MM primitives in pgtable.c: arch/x86/mm/pgtable.c:623: warning: Function parameter or struct member 'reserve' not described in 'reserve_top_address' arch/x86/mm/pgtable.c:672: warning: Function parameter or struct member 'p4d' not described in 'p4d_set_huge' arch/x86/mm/pgtable.c:672: warning: Function parameter or struct member 'addr' not described in 'p4d_set_huge' ... so on Fix them all up, add missing parameter documentation, and fix various spelling inconsistencies while at it. [ mingo: Harmonize kernel-doc annotations some more. ] Signed-off-by: Shivank Garg <shivankg@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20250514062637.3287779-1-shivankg@amd.com
2025-05-15x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm ↵Ard Biesheuvel
use too Expose certain 'struct cpuinfo_x86' fields via asm-offsets for x86_64 too, so that it will be possible to set CPU capabilities from 64-bit asm code. 32-bit already used these fields, so simply move those offset exports into the unified asm-offsets.c file. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250514104242.1275040-12-ardb+git@google.com
2025-05-14x86/boot: Defer initialization of VM space related global variablesArd Biesheuvel
The global pseudo-constants 'page_offset_base', 'vmalloc_base' and 'vmemmap_base' are not used extremely early during the boot, and cannot be used safely until after the KASLR memory randomization code in kernel_randomize_memory() executes, which may update their values. So there is no point in setting these variables extremely early, and it can wait until after the kernel itself is mapped and running from its permanent virtual mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250513111157.717727-9-ardb+git@google.com
2025-05-14x86/power: hibernate: Fix W=1 build kernel-doc warningsShivank Garg
Warnings generated with 'make W=1': arch/x86/power/hibernate.c:47: warning: Function parameter or struct member 'pfn' not described in 'pfn_is_nosave' arch/x86/power/hibernate.c:92: warning: Function parameter or struct member 'max_size' not described in 'arch_hibernation_header_save' Add missing parameter documentation in hibernate functions to fix kernel-doc warnings. Signed-off-by: Shivank Garg <shivankg@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250514062637.3287779-2-shivankg@amd.com
2025-05-14x86/mm/pat: Fix W=1 build kernel-doc warningShivank Garg
Building the kernel with W=1 generates the following warning: arch/x86/mm/pat/memtype.c:692: warning: Function parameter or struct member 'pfn' not described in 'pat_pfn_immune_to_uc_mtrr' Add missing parameter documentation to fix the kernel-doc warning. Signed-off-by: Shivank Garg <shivankg@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250514062637.3287779-3-shivankg@amd.com
2025-05-13x86/its: Fix build errors when CONFIG_MODULES=nEric Biggers
Fix several build errors when CONFIG_MODULES=n, including the following: ../arch/x86/kernel/alternative.c:195:25: error: incomplete definition of type 'struct module' 195 | for (int i = 0; i < mod->its_num_pages; i++) { Fixes: 872df34d7c51 ("x86/its: Use dynamic thunks for indirect branches") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-13x86/CPU/AMD: Add X86_FEATURE_ZEN6Yazen Ghannam
Add a synthetic feature flag for Zen6. [ bp: Move the feature flag to a free slot and avoid future merge conflicts from incoming stuff. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250513204857.3376577-1-yazen.ghannam@amd.com
2025-05-13x86/bugs: Fix SRSO reporting on Zen1/2 with SMT disabledBorislav Petkov (AMD)
1f4bb068b498 ("x86/bugs: Restructure SRSO mitigation") does this: if (boot_cpu_data.x86 < 0x19 && !cpu_smt_possible()) { setup_force_cpu_cap(X86_FEATURE_SRSO_NO); srso_mitigation = SRSO_MITIGATION_NONE; return; } and, in particular, sets srso_mitigation to NONE. This leads to reporting Speculative Return Stack Overflow: Vulnerable on Zen2 machines. There's a far bigger confusion with what SRSO_NO means and how it is used in the code but this will be a matter of future fixes and restructuring to how the SRSO mitigation gets determined. Fix the reporting issue for now. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: David Kaplan <david.kaplan@amd.com> Link: https://lore.kernel.org/20250513110405.15872-1-bp@kernel.org
2025-05-13x86/sev: Make sure pages are not skipped during kdumpAshish Kalra
When shared pages are being converted to private during kdump, additional checks are performed. They include handling the case of a GHCB page being contained within a huge page. Currently, this check incorrectly skips a page just below the GHCB page from being transitioned back to private during kdump preparation. This skipped page causes a 0x404 #VC exception when it is accessed later while dumping guest memory for vmcore generation. Correct the range to be checked for GHCB contained in a huge page. Also, ensure that the skipped huge page containing the GHCB page is transitioned back to private by applying the correct address mask later when changing GHCBs to private at end of kdump preparation. [ bp: Massage commit message. ] Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250506183529.289549-1-Ashish.Kalra@amd.com
2025-05-13x86/sev: Do not touch VMSA pages during SNP guest memory kdumpAshish Kalra
When kdump is running makedumpfile to generate vmcore and dump SNP guest memory it touches the VMSA page of the vCPU executing kdump. It then results in unrecoverable #NPF/RMP faults as the VMSA page is marked busy/in-use when the vCPU is running and subsequently a causes guest softlockup/hang. Additionally, other APs may be halted in guest mode and their VMSA pages are marked busy and touching these VMSA pages during guest memory dump will also cause #NPF. Issue AP_DESTROY GHCB calls on other APs to ensure they are kicked out of guest mode and then clear the VMSA bit on their VMSA pages. If the vCPU running kdump is an AP, mark it's VMSA page as offline to ensure that makedumpfile excludes that page while dumping guest memory. Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec") Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250428214151.155464-1-Ashish.Kalra@amd.com