summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
4 daysMerge tag 'x86_urgent_for_v6.16_rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure DR6 and DR7 are initialized to their architectural values and not accidentally cleared, leading to misconfigurations * tag 'x86_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/traps: Initialize DR7 by writing its architectural reset value x86/traps: Initialize DR6 by writing its architectural reset value
8 daysum: Use correct data source in fpregs_legacy_set()Tiwei Bie
Read from the buffer pointed to by 'from' instead of '&buf', as 'buf' contains no valid data when 'ubuf' is NULL. Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250606124428.148164-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
9 daysx86/traps: Initialize DR7 by writing its architectural reset valueXin Li (Intel)
Initialize DR7 by writing its architectural reset value to always set bit 10, which is reserved to '1', when "clearing" DR7 so as not to trigger unanticipated behavior if said bit is ever unreserved, e.g. as a feature enabling flag with inverted polarity. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Sean Christopherson <seanjc@google.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com
9 daysx86/traps: Initialize DR6 by writing its architectural reset valueXin Li (Intel)
Initialize DR6 by writing its architectural reset value to avoid incorrectly zeroing DR6 to clear DR6.BLD at boot time, which leads to a false bus lock detected warning. The Intel SDM says: 1) Certain debug exceptions may clear bits 0-3 of DR6. 2) BLD induced #DB clears DR6.BLD and any other debug exception doesn't modify DR6.BLD. 3) RTM induced #DB clears DR6.RTM and any other debug exception sets DR6.RTM. To avoid confusion in identifying debug exceptions, debug handlers should set DR6.BLD and DR6.RTM, and clear other DR6 bits before returning. The DR6 architectural reset value 0xFFFF0FF0, already defined as macro DR6_RESERVED, satisfies these requirements, so just use it to reinitialize DR6 whenever needed. Since clear_all_debug_regs() no longer zeros all debug registers, rename it to initialize_debug_regs() to better reflect its current behavior. Since debug_read_clear_dr6() no longer clears DR6, rename it to debug_read_reset_dr6() to better reflect its current behavior. Fixes: ebb1064e7c2e9 ("x86/traps: Handle #DB for bus lock") Reported-by: Sohil Mehta <sohil.mehta@intel.com> Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/lkml/06e68373-a92b-472e-8fd9-ba548119770c@intel.com/ Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250620231504.2676902-2-xin%40zytor.com
11 daysMerge tag 'x86_urgent_for_v6.16_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure the array tracking which kernel text positions need to be alternatives-patched doesn't get mishandled by out-of-order modifications, leading to it overflowing and causing page faults when patching - Avoid an infinite loop when early code does a ranged TLB invalidation before the broadcast TLB invalidation count of how many pages it can flush, has been read from CPUID - Fix a CONFIG_MODULES typo - Disable broadcast TLB invalidation when PTI is enabled to avoid an overflow of the bitmap tracking dynamic ASIDs which need to be flushed when the kernel switches between the user and kernel address space - Handle the case of a CPU going offline and thus reporting zeroes when reading top-level events in the resctrl code * tag 'x86_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Fix int3 handling failure from broken text_poke array x86/mm: Fix early boot use of INVPLGB x86/its: Fix an ifdef typo in its_alloc() x86/mm: Disable INVLPGB when PTI is enabled x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
11 daysMerge tag 'perf_urgent_for_v6.16_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Avoid a crash on a heterogeneous machine where not all cores support the same hw events features - Avoid a deadlock when throttling events - Document the perf event states more - Make sure a number of perf paths switching off or rescheduling events call perf_cgroup_event_disable() - Make sure perf does task sampling before its userspace mapping is torn down, and not after * tag 'perf_urgent_for_v6.16_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix crash in icl_update_topdown_event() perf: Fix the throttle error of some clock events perf: Add comment to enum perf_event_state perf/core: Fix WARN in perf_cgroup_switch() perf: Fix dangling cgroup pointer in cpuctx perf: Fix cgroup state vs ERROR perf: Fix sample vs do_exit()
11 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some missing synchronisation - A small fix for the irqbypass hook fixes, tightening the check and ensuring that we only deal with MSI for both the old and the new route entry - Rework the way the shadow LRs are addressed in a nesting configuration, plugging an embarrassing bug as well as simplifying the whole process - Add yet another fix for the dreaded arch_timer_edge_cases selftest RISC-V: - Fix the size parameter check in SBI SFENCE calls - Don't treat SBI HFENCE calls as NOPs x86 TDX: - Complete API for handling complex TDVMCALLs in userspace. This was delayed because the spec lacked a way for userspace to deny supporting these calls; the new exit code is now approved" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: TDX: Exit to userspace for GetTdVmCallInfo KVM: TDX: Handle TDG.VP.VMCALL<GetQuote> KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs KVM: arm64: VHE: Centralize ISBs when returning to host KVM: arm64: Remove cpacr_clear_set() KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd() KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync() KVM: arm64: Reorganise CPTR trap manipulation KVM: arm64: VHE: Synchronize CPTR trap deactivation KVM: arm64: VHE: Synchronize restore of host debug registers KVM: arm64: selftests: Close the GIC FD in arch_timer_edge_cases KVM: arm64: Explicitly treat routing entry type changes as changes KVM: arm64: nv: Fix tracking of shadow list registers RISC-V: KVM: Don't treat SBI HFENCE calls as NOPs RISC-V: KVM: Fix the size parameter check in SBI SFENCE calls
13 daysKVM: TDX: Exit to userspace for GetTdVmCallInfoBinbin Wu
Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX, to allow userspace to provide information about the support of TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API. GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>, <ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>, <Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and <Instruction.WRMSR>. They must be supported by VMM to support TDX guests. For GetTdVmCallInfo - When leaf (r12) to enumerate TDVMCALL functionality is set to 0, successful execution indicates all GHCI base TDVMCALLs listed above are supported. Update the KVM TDX document with the set of the GHCI base APIs. - When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it indicates the TDX guest is querying the supported TDVMCALLs beyond the GHCI base TDVMCALLs. Exit to userspace to let userspace set the TDVMCALL sub-function bit(s) accordingly to the leaf outputs. KVM could set the TDVMCALL bit(s) supported by itself when the TDVMCALLs don't need support from userspace after returning from userspace and before entering guest. Currently, no such TDVMCALLs implemented, KVM just sets the values returned from userspace. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
13 daysKVM: TDX: Handle TDG.VP.VMCALL<GetQuote>Binbin Wu
Handle TDVMCALL for GetQuote to generate a TD-Quote. GetQuote is a doorbell-like interface used by TDX guests to request VMM to generate a TD-Quote signed by a service hosting TD-Quoting Enclave operating on the host. A TDX guest passes a TD Report (TDREPORT_STRUCT) in a shared-memory area as parameter. Host VMM can access it and queue the operation for a service hosting TD-Quoting enclave. When completed, the Quote is returned via the same shared-memory area. KVM only checks the GPA from the TDX guest has the shared-bit set and drops the shared-bit before exiting to userspace to avoid bleeding the shared-bit into KVM's exit ABI. KVM forwards the request to userspace VMM (e.g. QEMU) and userspace VMM queues the operation asynchronously. KVM sets the return code according to the 'ret' field set by userspace to notify the TDX guest whether the request has been queued successfully or not. When the request has been queued successfully, the TDX guest can poll the status field in the shared-memory area to check whether the Quote generation is completed or not. When completed, the generated Quote is returned via the same buffer. Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is required to handle the KVM exit reason as the initial support for TDX, by reentering KVM to ensure that the TDVMCALL is complete. While at it, add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
13 daysKVM: TDX: Add new TDVMCALL status code for unsupported subfuncsBinbin Wu
Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and return it for unimplemented TDVMCALL subfunctions. Returning TDVMCALL_STATUS_INVALID_OPERAND when a subfunction is not implemented is vague because TDX guests can't tell the error is due to the subfunction is not supported or an invalid input of the subfunction. New GHCI spec adds TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED to avoid the ambiguity. Use it instead of TDVMCALL_STATUS_INVALID_OPERAND. Before the change, for common guest implementations, when a TDX guest receives TDVMCALL_STATUS_INVALID_OPERAND, it has two cases: 1. Some operand is invalid. It could change the operand to another value retry. 2. The subfunction is not supported. For case 1, an invalid operand usually means the guest implementation bug. Since the TDX guest can't tell which case is, the best practice for handling TDVMCALL_STATUS_INVALID_OPERAND is stopping calling such leaf, treating the failure as fatal if the TDVMCALL is essential or ignoring it if the TDVMCALL is optional. With this change, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED could be sent to old TDX guest that do not know about it, but it is expected that the guest will make the same action as TDVMCALL_STATUS_INVALID_OPERAND. Currently, no known TDX guest checks TDVMCALL_STATUS_INVALID_OPERAND specifically; for example Linux just checks for success. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> [Return it for untrapped KVM_HC_MAP_GPA_RANGE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-18x86/alternatives: Fix int3 handling failure from broken text_poke arrayMasami Hiramatsu (Google)
Since smp_text_poke_single() does not expect there is another text_poke request is queued, it can make text_poke_array not sorted or cause a buffer overflow on the text_poke_array.vec[]. This will cause an Oops in int3 because of bsearch failing; CPU 0 CPU 1 CPU 2 ----- ----- ----- smp_text_poke_batch_add() smp_text_poke_single() <<-- Adds out of order <int3> [Fails o find address in text_poke_array ] OOPS! Or unhandled page fault because of a buffer overflow; CPU 0 CPU 1 ----- ----- smp_text_poke_batch_add() <<+ ... | smp_text_poke_batch_add() <<-- Adds TEXT_POKE_ARRAY_MAX times. smp_text_poke_single() { __smp_text_poke_batch_add() <<-- Adds entry at TEXT_POKE_ARRAY_MAX + 1 smp_text_poke_batch_finish() [Unhandled page fault because text_poke_array.nr_entries is overwritten] BUG! } Use smp_text_poke_batch_add() instead of __smp_text_poke_batch_add() so that it correctly flush the queue if needed. Closes: https://lore.kernel.org/all/CA+G9fYsLu0roY3DV=tKyqP7FEKbOEETRvTDhnpPxJGbA=Cg+4w@mail.gmail.com/ Fixes: c8976ade0c1b ("x86/alternatives: Simplify smp_text_poke_single() by using tp_vec and existing APIs") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lkml.kernel.org/r/\ 175020512308.3582717.13631440385506146631.stgit@mhiramat.tok.corp.google.com
2025-06-17x86/mm: Fix early boot use of INVPLGBRik van Riel
The INVLPGB instruction has limits on how many pages it can invalidate at once. That limit is enumerated in CPUID, read by the kernel, and stored in 'invpgb_count_max'. Ranged invalidation, like invlpgb_kernel_range_flush() break up their invalidations so that they do not exceed the limit. However, early boot code currently attempts to do ranged invalidation before populating 'invlpgb_count_max'. There is a for loop which is basically: for (...; addr < end; addr += invlpgb_count_max*PAGE_SIZE) If invlpgb_kernel_range_flush is called before the kernel has read the value of invlpgb_count_max from the hardware, the normally bounded loop can become an infinite loop if invlpgb_count_max is initialized to zero. Fix that issue by initializing invlpgb_count_max to 1. This way INVPLGB at early boot time will be a little bit slower than normal (with initialized invplgb_count_max), and not an instant hang at bootup time. Fixes: b7aa05cbdc52 ("x86/mm: Add INVLPGB support code") Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250606171112.4013261-3-riel%40surriel.com
2025-06-17x86/its: Fix an ifdef typo in its_alloc()Lukas Bulwahn
Commit a82b26451de1 ("x86/its: explicitly manage permissions for ITS pages") reworks its_alloc() and introduces a typo in an ifdef conditional, referring to CONFIG_MODULE instead of CONFIG_MODULES. Fix this typo in its_alloc(). Fixes: a82b26451de1 ("x86/its: explicitly manage permissions for ITS pages") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20250616100432.22941-1-lukas.bulwahn%40redhat.com
2025-06-17x86/mm: Disable INVLPGB when PTI is enabledDave Hansen
PTI uses separate ASIDs (aka. PCIDs) for kernel and user address spaces. When the kernel needs to flush the user address space, it just sets a bit in a bitmap and then flushes the entire PCID on the next switch to userspace. This bitmap is a single 'unsigned long' which is plenty for all 6 dynamic ASIDs. But, unfortunately, the INVLPGB support brings along a bunch more user ASIDs, as many as ~2k more. The bitmap can't address that many. Fortunately, the bitmap is only needed for PTI and all the CPUs with INVLPGB are AMD CPUs that aren't vulnerable to Meltdown and don't need PTI. The only way someone can run into an issue in practice is by booting with pti=on on a newer AMD CPU. Disable INVLPGB if PTI is enabled. Avoid overrunning the small bitmap. Note: this will be fixed up properly by making the bitmap bigger. For now, just avoid the mostly theoretical bug. Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Rik van Riel <riel@surriel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250610222420.E8CBF472%40davehans-spike.ostc.intel.com
2025-06-16x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl ↵Qinyun Tan
subsystem In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain structure representing a NUMA node relies on the cacheinfo interface (rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map) for monitoring. The L3 cache information of a SNC NUMA node determines which domains are summed for the "top level" L3-scoped events. rdt_mon_domain::ci is initialized using the first online CPU of a NUMA node. When this CPU goes offline, its shared_cpu_map is cleared to contain only the offline CPU itself. Subsequently, attempting to read counters via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning zero values for "top-level events" without any error indication. Replace the cacheinfo references in struct rdt_mon_domain and struct rmid_read with the cacheinfo ID (a unique identifier for the L3 cache). rdt_domain_hdr::cpu_mask contains the online CPUs associated with that domain. When reading "top-level events", select a CPU from rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine valid CPUs for reading RMID counter via the MSR interface. Considering all CPUs associated with the L3 cache improves the chances of picking a housekeeping CPU on which the counter reading work can be queued, avoiding an unnecessary IPI. Fixes: 328ea68874642 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files") Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
2025-06-16Merge tag 'x86_urgent_for_6.16-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Dave Hansen: "This is a pretty scattered set of fixes. The majority of them are further fixups around the recent ITS mitigations. The rest don't really have a coherent story: - Some flavors of Xen PV guests don't support large pages, but the set_memory.c code assumes all CPUs support them. Avoid problems with a quick CPU feature check. - The TDX code has some wrappers to help retry calls to the TDX module. They use function pointers to assembly functions and the compiler usually generates direct CALLs. But some new compilers, plus -Os turned them in to indirect CALLs and the assembly code was not annotated for indirect calls. Force inlining of the helper to fix it up. - Last, a FRED issue showed up when single-stepping. It's fine when using an external debugger, but was getting stuck returning from a SIGTRAP handler otherwise. Clear the FRED 'swevent' bit to ensure that forward progress is made" * tag 'x86_urgent_for_6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "mm/execmem: Unify early execmem_cache behaviour" x86/its: explicitly manage permissions for ITS pages x86/its: move its_pages array to struct mod_arch_specific x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set x86/mm/pat: don't collapse pages without PSE set x86/virt/tdx: Avoid indirect calls to TDX assembly functions selftests/x86: Add a test to detect infinite SIGTRAP handler loop x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
2025-06-13Merge tag 'pm-6.16-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the cpupower utility installation, fix up the recently added Rust abstractions for cpufreq and OPP, restore the x86 update eliminating mwait_play_dead_cpuid_hint() that has been reverted during the 6.16 merge window along with preventing the failure caused by it from happening, and clean up mwait_idle_with_hints() usage in intel_idle: - Implement CpuId Rust abstraction and use it to fix doctest failure related to the recently introduced cpumask abstraction (Viresh Kumar) - Do minor cleanups in the `# Safety` sections for cpufreq abstractions added recently (Viresh Kumar) - Unbreak cpupower systemd service units installation on some systems by adding a unitdir variable for specifying the location to install them (Francesco Poli) - Eliminate mwait_play_dead_cpuid_hint() again after reverting its elimination during the 6.16 merge window due to a problem with handling "dead" SMT siblings, but this time prevent leaving them in C1 after initialization by taking them online and back offline when a proper cpuidle driver for the platform has been registered (Rafael Wysocki) - Update data types of variables passed as arguments to mwait_idle_with_hints() to match the function definition after recent changes (Uros Bizjak)" * tag 'pm-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: rust: cpu: Add CpuId::current() to retrieve current CPU ID rust: Use CpuId in place of raw CPU numbers rust: cpu: Introduce CpuId abstraction intel_idle: Update arguments of mwait_idle_with_hints() cpufreq: Convert `/// SAFETY` lines to `# Safety` sections cpupower: split unitdir from libdir in Makefile Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()" ACPI: processor: Rescan "dead" SMT siblings during initialization intel_idle: Rescan "dead" SMT siblings during initialization x86/smp: PM/hibernate: Split arch_resume_nosmt() intel_idle: Use subsys_initcall_sync() for initialization
2025-06-13Merge branch 'pm-cpuidle'Rafael J. Wysocki
Merge cpuidle updates for 6.16-rc2: - Update data types of variables passed as arguments to mwait_idle_with_hints() to match the function definition after recent changes (Uros Bizjak). - Eliminate mwait_play_dead_cpuid_hint() again after reverting its elimination during the merge window due to a problem with handling "dead" SMT siblings, but this time prevent leaving them in C1 after initialization by taking them online and back offline when a proper cpuidle driver for the platform has been registered (Rafael Wysocki). * pm-cpuidle: intel_idle: Update arguments of mwait_idle_with_hints() Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()" ACPI: processor: Rescan "dead" SMT siblings during initialization intel_idle: Rescan "dead" SMT siblings during initialization x86/smp: PM/hibernate: Split arch_resume_nosmt() intel_idle: Use subsys_initcall_sync() for initialization
2025-06-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken. x86: - Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to pass only the "untouched" addresses and flipping the shared/private bit in the implementation. - Disable SEV-SNP support on initialization failure * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY KVM: SEV: Disable SEV-SNP support on initialization failure KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases KVM: arm64: selftests: Fix help text for arch_timer_edge_cases KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg KVM: arm64: Add RMW specific sysreg accessor KVM: arm64: Add assignment-specific sysreg accessor
2025-06-13perf/x86/intel: Fix crash in icl_update_topdown_event()Kan Liang
The perf_fuzzer found a hard-lockup crash on a RaptorLake machine: Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000 CPU: 23 UID: 0 PID: 0 Comm: swapper/23 Tainted: [W]=WARN Hardware name: Dell Inc. Precision 9660/0VJ762 RIP: 0010:native_read_pmc+0x7/0x40 Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ... RSP: 000:fffb03100273de8 EFLAGS: 00010046 .... Call Trace: <TASK> icl_update_topdown_event+0x165/0x190 ? ktime_get+0x38/0xd0 intel_pmu_read_event+0xf9/0x210 __perf_event_read+0xf9/0x210 CPUs 16-23 are E-core CPUs that don't support the perf metrics feature. The icl_update_topdown_event() should not be invoked on these CPUs. It's a regression of commit: f9bdf1f95339 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read") The bug introduced by that commit is that the is_topdown_event() function is mistakenly used to replace the is_topdown_count() call to check if the topdown functions for the perf metrics feature should be invoked. Fix it. Fixes: f9bdf1f95339 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read") Closes: https://lore.kernel.org/lkml/352f0709-f026-cd45-e60c-60dfd97f73f3@maine.edu/ Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org # v6.15+ Link: https://lore.kernel.org/r/20250612143818.2889040-1-kan.liang@linux.intel.com
2025-06-12KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORYPaolo Bonzini
Only let userspace pass the same addresses that were used in KVM_SET_USER_MEMORY_REGION (or KVM_SET_USER_MEMORY_REGION2); gpas in the the upper half of the address space are an implementation detail of TDX and KVM. Extracted from a patch by Sean Christopherson <seanjc@google.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-12KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORYPaolo Bonzini
Bug[*] reported for TDX case when enabling KVM_PRE_FAULT_MEMORY in QEMU. It turns out that @gpa passed to kvm_mmu_do_page_fault() doesn't have shared bit set when the memory attribute of it is shared, and it leads to wrong root in tdp_mmu_get_root_for_fault(). Fix it by embedding the direct bits in the gpa that is passed to kvm_tdp_map_page(), when the memory of the gpa is not private. [*] https://lore.kernel.org/qemu-devel/4a757796-11c2-47f1-ae0d-335626e818fd@intel.com/ Reported-by: Xiaoyao Li <xiaoyao.li@intel.com> Closes: https://lore.kernel.org/qemu-devel/4a757796-11c2-47f1-ae0d-335626e818fd@intel.com/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20250611001018.2179964-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-11mm: pgtable: fix pte_swp_exclusiveMagnus Lindholm
Make pte_swp_exclusive return bool instead of int. This will better reflect how pte_swp_exclusive is actually used in the code. This fixes swap/swapoff problems on Alpha due pte_swp_exclusive not returning correct values when _PAGE_SWP_EXCLUSIVE bit resides in upper 32-bits of PTE (like on alpha). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/ Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/ [ Applied as the 'sed' script Al suggested - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-11KVM: SEV: Disable SEV-SNP support on initialization failureAshish Kalra
During platform init, SNP initialization may fail for several reasons, such as firmware command failures and incompatible versions. However, the KVM capability may continue to advertise support for it. The platform may have SNP enabled but if SNP_INIT fails then SNP is not supported by KVM. During KVM module initialization query the SNP platform status to obtain the SNP initialization state and use it as an additional condition to determine support for SEV-SNP. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Pratik R. Sampat <prsampat@amd.com> Signed-off-by: Pratik R. Sampat <prsampat@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Pavan Kumar Paluri <papaluri@amd.com> Message-ID: <20250512221634.12045-1-Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-11Revert "mm/execmem: Unify early execmem_cache behaviour"Mike Rapoport (Microsoft)
The commit d6d1e3e6580c ("mm/execmem: Unify early execmem_cache behaviour") changed early behaviour of execemem ROX cache to allow its usage in early x86 code that allocates text pages when CONFIG_MITGATION_ITS is enabled. The permission management of the pages allocated from execmem for ITS mitigation is now completely contained in arch/x86/kernel/alternatives.c and therefore there is no need to special case early allocations in execmem. This reverts commit d6d1e3e6580ca35071ad474381f053cbf1fb6414. Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250603111446.2609381-6-rppt@kernel.org
2025-06-11x86/its: explicitly manage permissions for ITS pagesPeter Zijlstra (Intel)
execmem_alloc() sets permissions differently depending on the kernel configuration, CPU support for PSE and whether a page is allocated before or after mark_rodata_ro(). Add tracking for pages allocated for ITS when patching the core kernel and make sure the permissions for ITS pages are explicitly managed for both kernel and module allocations. Fixes: 872df34d7c51 ("x86/its: Use dynamic thunks for indirect branches") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250603111446.2609381-5-rppt@kernel.org
2025-06-11x86/its: move its_pages array to struct mod_arch_specificMike Rapoport (Microsoft)
The of pages with ITS thunks allocated for modules are tracked by an array in 'struct module'. Since this is very architecture specific data structure, move it to 'struct mod_arch_specific'. No functional changes. Fixes: 872df34d7c51 ("x86/its: Use dynamic thunks for indirect branches") Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250603111446.2609381-4-rppt@kernel.org
2025-06-11x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is setMike Rapoport (Microsoft)
Currently ROX cache in execmem is enabled regardless of STRICT_MODULE_RWX setting. This breaks an assumption that module memory is writable when STRICT_MODULE_RWX is disabled, for instance for kernel debuggin. Only enable ROX cache in execmem when STRICT_MODULE_RWX is set to restore the original behaviour of module text permissions. Fixes: 64f6a4e10c05 ("x86: re-enable EXECMEM_ROX support") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250603111446.2609381-3-rppt@kernel.org
2025-06-11x86/mm/pat: don't collapse pages without PSE setJuergen Gross
Collapsing pages to a leaf PMD or PUD should be done only if X86_FEATURE_PSE is available, which is not the case when running e.g. as a Xen PV guest. Fixes: 41d88484c71c ("x86/mm/pat: restore large ROX pages after fragmentation") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250528123557.12847-3-jgross@suse.com
2025-06-10x86/virt/tdx: Avoid indirect calls to TDX assembly functionsKai Huang
Two 'static inline' TDX helper functions (sc_retry() and sc_retry_prerr()) take function pointer arguments which refer to assembly functions. Normally, the compiler inlines the TDX helper, realizes that the function pointer targets are completely static -- thus can be resolved at compile time -- and generates direct call instructions. But, other times (like when CONFIG_CC_OPTIMIZE_FOR_SIZE=y), the compiler declines to inline the helpers and will instead generate indirect call instructions. Indirect calls to assembly functions require special annotation (for various Control Flow Integrity mechanisms). But TDX assembly functions lack the special annotations and can only be called directly. Annotate both the helpers as '__always_inline' to prod the compiler into maintaining the direct calls. There is no guarantee here, but Peter has volunteered to report the compiler bug if this assumption ever breaks[1]. Fixes: 1e66a7e27539 ("x86/virt/tdx: Handle SEAMCALL no entropy error in common code") Fixes: df01f5ae07dd ("x86/virt/tdx: Add SEAMCALL error printing for module initialization") Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/20250605145914.GW39944@noisy.programming.kicks-ass.net/ [1] Link: https://lore.kernel.org/all/20250606130737.30713-1-kai.huang%40intel.com
2025-06-09x86/fred/signal: Prevent immediate repeat of single step trap on return from ↵Xin Li (Intel)
SIGTRAP handler Clear the software event flag in the augmented SS to prevent immediate repeat of single step trap on return from SIGTRAP handler if the trap flag (TF) is set without an external debugger attached. Following is a typical single-stepping flow for a user process: 1) The user process is prepared for single-stepping by setting RFLAGS.TF = 1. 2) When any instruction in user space completes, a #DB is triggered. 3) The kernel handles the #DB and returns to user space, invoking the SIGTRAP handler with RFLAGS.TF = 0. 4) After the SIGTRAP handler finishes, the user process performs a sigreturn syscall, restoring the original state, including RFLAGS.TF = 1. 5) Goto step 2. According to the FRED specification: A) Bit 17 in the augmented SS is designated as the software event flag, which is set to 1 for FRED event delivery of SYSCALL, SYSENTER, or INT n. B) If bit 17 of the augmented SS is 1 and ERETU would result in RFLAGS.TF = 1, a single-step trap will be pending upon completion of ERETU. In step 4) above, the software event flag is set upon the sigreturn syscall, and its corresponding ERETU would restore RFLAGS.TF = 1. This combination causes a pending single-step trap upon completion of ERETU. Therefore, another #DB is triggered before any user space instruction is executed, which leads to an infinite loop in which the SIGTRAP handler keeps being invoked on the same user space IP. Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code") Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20250609084054.2083189-2-xin%40zytor.com
2025-06-08Merge tag 'timers-cleanups-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanup from Thomas Gleixner: "The delayed from_timer() API cleanup: The renaming to the timer_*() namespace was delayed due massive conflicts against Linux-next. Now that everything is upstream finish the conversion" * tag 'timers-cleanups-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename from_timer() to timer_container_of()
2025-06-08Merge tag 'x86-urgent-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - Cure IO bitmap inconsistencies A failed fork cleans up all resources of the newly created thread via exit_thread(). exit_thread() invokes io_bitmap_exit() which does the IO bitmap cleanups, which unfortunately assume that the cleanup is related to the current task, which is obviously bogus. Make it work correctly - A lockdep fix in the resctrl code removed the clearing of the command buffer in two places, which keeps stale error messages around. Bring them back. - Remove unused trace events" * tag 'x86-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Restore the rdt_last_cmd_clear() calls after acquiring rdtgroup_mutex x86/iopl: Cure TIF_IO_BITMAP inconsistencies x86/fpu: Remove unused trace events
2025-06-08Merge tag 'perf-urgent-2025-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fix from Thomas Gleixner: "A single fix for the x86 performance counters on Intel CPUs: The MSR offset calculations for fixed performance counters are stored at the wrong index in the configuration array causing the general purpose counter MSR offset to be overwritten, so both the general purpose and the fixed counters offsets are incorrect. Correct the array index calculation to fix that" * tag 'perf-urgent-2025-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix incorrect MSR index calculations in intel_pmu_config_acr()
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-06-07Merge tag 'kbuild-v6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which exports a symbol only to specified modules - Improve ABI handling in gendwarfksyms - Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n - Add checkers for redundant or missing <linux/export.h> inclusion - Deprecate the extra-y syntax - Fix a genksyms bug when including enum constants from *.symref files * tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits) genksyms: Fix enum consts from a reference affecting new values arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES} efi/libstub: use 'targets' instead of extra-y in Makefile module: make __mod_device_table__* symbols static scripts/misc-check: check unnecessary #include <linux/export.h> when W=1 scripts/misc-check: check missing #include <linux/export.h> when W=1 scripts/misc-check: add double-quotes to satisfy shellcheck kbuild: move W=1 check for scripts/misc-check to top-level Makefile scripts/tags.sh: allow to use alternative ctags implementation kconfig: introduce menu type enum docs: symbol-namespaces: fix reST warning with literal block kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION docs/core-api/symbol-namespaces: drop table of contents and section numbering modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time kbuild: move kbuild syntax processing to scripts/Makefile.build Makefile: remove dependency on archscripts for header installation Documentation/kbuild: Add new gendwarfksyms kABI rules Documentation/kbuild: Drop section numbers ...
2025-06-07Reapply "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"Rafael J. Wysocki
Revert commit 70523f335734 ("Revert "x86/smp: Eliminate mwait_play_dead_cpuid_hint()"") to reapply the changes from commit 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()") reverted by it. Previously, these changes caused idle power to rise on systems booting with "nosmt" in the kernel command line because they effectively caused "dead" SMT siblings to remain in idle state C1 after executing the HLT instruction, which prevented the processor from reaching package idle states deeper than PC2 going forward. Now, the "dead" SMT siblings are rescanned after initializing a proper cpuidle driver for the processor (either intel_idle or ACPI idle), at which point they are able to enter a sufficiently deep idle state in native_play_dead() via cpuidle, so the code changes in question can be reapplied. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Link: https://patch.msgid.link/7813065.EvYhyI6sBW@rjwysocki.net
2025-06-07x86/smp: PM/hibernate: Split arch_resume_nosmt()Rafael J. Wysocki
Move the inner part of the arch_resume_nosmt() code into a separate function called arch_cpu_rescan_dead_smt_siblings(), so it can be used in other places where "dead" SMT siblings may need to be taken online and offline again in order to get into deep idle states. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Link: https://patch.msgid.link/3361688.44csPzL39Z@rjwysocki.net [ rjw: Prevent build issues with CONFIG_SMP unset ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-06-07arch: use always-$(KBUILD_BUILTIN) for vmlinux.ldsMasahiro Yamada
The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN), which behaves equivalently. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2025-06-05Merge tag 'uml-for-linux-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Johannes Berg: "The only really new thing is the long-standing seccomp work (originally from 2021!). Wven if it still isn't enabled by default due to security concerns it can still be used e.g. for tests. - remove obsolete network transports - remove PCI IO port support - start adding seccomp-based process handling instead of ptrace" * tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (29 commits) um: remove "extern" from implementation of sigchld_handler um: fix unused variable warning um: fix SECCOMP 32bit xstate register restore um: pass FD for memory operations when needed um: Add SECCOMP support detection and initialization um: Implement kernel side of SECCOMP based process handling um: Track userspace children dying in SECCOMP mode um: Add helper functions to get/set state for SECCOMP um: Add stub side of SECCOMP/futex based process handling um: Move faultinfo extraction into userspace routine um: vector: Use mac_pton() for MAC address parsing um: vector: Clean up and modernize log messages um: chan_kern: use raw spinlock for irqs_to_free_lock MAINTAINERS: remove obsolete file entry in TUN/TAP DRIVER um: Fix tgkill compile error on old host OSes um: stop using PCI port I/O um: Remove legacy network transport infrastructure um: vector: Eliminate the dependency on uml_net um: Remove obsolete legacy network transports um/asm: Replace "REP; NOP" with PAUSE mnemonic ...
2025-06-05um: fix unused variable warningBenjamin Berg
The code was updated to access the PID of the userspace stub process in a different way, making the local cpu variable obsolete. Remove it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506050008.AwXLNxQX-lkp@intel.com/ Fixes: 406d17c6c370 ("um: Implement kernel side of SECCOMP based process handling") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250605050325.1077208-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-04Merge tag 'pci-v6.16-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Print the actual delay time in pci_bridge_wait_for_secondary_bus() instead of assuming it was 1000ms (Wilfred Mallawa) - Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices', which broke resume from system sleep on AMD platforms and has been fixed by other commits (Lukas Wunner) Resource management: - Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and unnecessary (Philipp Stanner) - Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and related flags since all uses have been removed (Philipp Stanner) - Rework devres 'request' functions so they are no longer 'hybrid', i.e., their behavior no longer depends on whether pcim_enable_device or pci_enable_device() was used, and remove related code (Philipp Stanner) - Warn (not BUG()) about failure to assign optional resources (Ilpo Järvinen) Error handling: - Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL or ERR_NONFATAL was received from a downstream device) and decode into bus/device/function (Bjorn Helgaas) - Determine AER log level once and save it so all related messages use the same level (Karolina Stolarek) - Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors (Karolina Stolarek) - Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs controls on interval and burst count, to avoid flooding logs and RCU stall warnings (Jon Pan-Doh) Power management: - Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson) - Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello) Power control: - Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the filename paths. Retain old deprecated symbols for compatibility, except for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold) - When unregistering pwrctrl, cancel outstanding rescan work before cleaning up data structures to avoid use-after-free issues (Brian Norris) Bandwidth control: - Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen) - Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen) PCIe native device hotplug: - Ignore Presence Detect Changed caused by DPC. pciehp already ignores Link Down/Up events caused by DPC, but on slots using in-band presence detect, DPC causes a spurious Presence Detect Changed event (Lukas Wunner) - Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports using in-band presence detect, the reset causes a Presence Detect Changed event, which mistakenly caused teardown and re-enumeration of the device. Drivers may need to annotate code that resets their device (Lukas Wunner) Virtualization: - Add an ACS quirk for Loongson Root Ports that don't advertise ACS but don't allow peer-to-peer transactions between Root Ports; the quirk allows each Root Port to be in a separate IOMMU group (Huacai Chen) Endpoint framework: - For fixed-size BARs, retain both the actual size and the possibly larger size allocated to accommodate iATU alignment requirements (Jerome Brunet) - Simplify ctrl/SPAD space allocation and avoid allocating more space than needed (Jerome Brunet) - Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint controllers (Niklas Cassel) - Align the return value (number of interrupts) encoding for pci_epc_get_msi()/pci_epc_ops::get_msi() and pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel) - Align the nr_irqs parameter encoding for pci_epc_set_msi()/pci_epc_ops::set_msi() and pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel) Common host controller library: - Convert pci-host-common to a library so platforms that don't need native host controller drivers don't need to include these helper functions (Manivannan Sadhasivam) Apple PCIe controller driver: - Extract ECAM bridge creation helper from pci_host_common_probe() to separate driver-specific things like MSI from PCI things (Marc Zyngier) - Dynamically allocate RID-to_SID bitmap to prepare for SoCs with varying capabilities (Marc Zyngier) - Skip ports disabled in DT when setting up ports (Janne Grunau) - Add t6020 compatible string (Alyssa Rosenzweig) - Add T602x PCIe support (Hector Martin) - Directly set/clear INTx mask bits because T602x dropped the accessors that could do this without locking (Marc Zyngier) - Move port PHY registers to their own reg items to accommodate T602x, which moves them around; retain default offsets for existing DTs that lack phy%d entries with the reg offsets (Hector Martin) - Stop polling for core refclk, which doesn't work on T602x and the bootloader has already done anyway (Hector Martin) - Use gpiod_set_value_cansleep() when asserting PERST# in probe because we're allowed to sleep there (Hector Martin) Cadence PCIe controller driver: - Drop a runtime PM 'put' to resolve a runtime atomic count underflow (Hans Zhang) - Make the cadence core buildable as a module (Kishon Vijay Abraham I) - Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by loadable drivers when they are removed (Siddharth Vadapalli) Freescale i.MX6 PCIe controller driver: - Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard Zhu) - Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link(); since the DWC core does this, imx6 only needs it when retraining for a faster link speed (Richard Zhu) - Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu) - Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some cases, the controller can't exit 'L23 Ready' through Beacon or PERST# deassertion (Richard Zhu) - Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum: controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s, causing timeouts in L1 (Richard Zhu) - Wait for i.MX95 PLL lock before enabling controller (Richard Zhu) - Save/restore i.MX95 LUT for suspend/resume (Richard Zhu) Mobiveil PCIe controller driver: - Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans Zhang) NVIDIA Tegra194 PCIe controller driver: - Create debugfs directory for 'aspm_state_cnt' only when CONFIG_PCIEASPM is enabled, since there are no other entries (Hans Zhang) Qualcomm PCIe controller driver: - Add OF support for parsing DT 'eq-presets-<N>gts' property for lane equalization presets (Krishna Chaitanya Chundru) - Read Maximum Link Width from the Link Capabilities register if DT lacks 'num-lanes' property (Krishna Chaitanya Chundru) - Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru) - Add generic dwc support for configuring lane equalization presets (Krishna Chaitanya Chundru) - Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar) Renesas R-Car PCIe controller driver: - Describe endpoint BAR 4 as being fixed size (Jerome Brunet) - Document how to obtain R-Car V4H (r8a779g0) controller firmware (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Reorder rockchip_pci_core_rsts because reset_control_bulk_deassert() deasserts in reverse order, to fix a link training regression (Jensen Huang) - Mark RK3399 as being capable of raising INTx interrupts (Niklas Cassel) Rockchip DesignWare PCIe controller driver: - Check only PCIE_LINKUP, not LTSSM status, to determine whether the link is up (Shawn Lin) - Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root Complex and Endpoint modes (Shawn Lin) - Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets non-sticky registers (Shawn Lin) - Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit() (Diederik de Haas) Synopsys DesignWare PCIe controller driver: - Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more robust; this will not affect the intended link width if all lanes are functional (Wenbin Yao) - Return bool (not int) for link-up check in dw_pcie_ops.link_up() and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti (Hans Zhang) - Add debugfs support for exposing DWC device-specific PTM context (Manivannan Sadhasivam) TI J721E PCIe driver: - Make j721e buildable as a loadable and removable module (Siddharth Vadapalli) - Fix j721e host/endpoint dependencies that result in link failures in some configs (Arnd Bergmann) Device tree bindings: - Add qcom DT binding for 'global' interrupt (PCIe controller and link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p, sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan Sadhasivam) - Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074, ipq8074-gen3, ipq6018 (Manivannan Sadhasivam) - Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang) - Correct indentation and style of examples in brcm,stb-pcie, cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie, microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm (Krzysztof Kozlowski) - Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and armada8k from text to schema DT bindings (Rob Herring) - Remove obsolete .txt DT bindings for content that has been moved to schemas (Rob Herring) - Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074 and IPQ9574 (Varadarajan Narayanan) - Convert v3,v360epc-pci from text to DT schema binding (Rob Herring) - Change microchip,pcie-host DT binding to be 'dma-noncoherent' since PolarFire may be configured that way (Conor Dooley) Miscellaneous: - Drop 'pci' suffix from intel_mid_pci.c filename to match similar files (Andy Shevchenko) - All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU to simplify build testing and avoid inadvertent build regressions (Arnd Bergmann) - Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof Wilczyński) - Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan Sadhasivam)" * tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits) MAINTAINERS: Update Manivannan Sadhasivam email address PCI: j721e: Fix host/endpoint dependencies PCI: j721e: Add support to build as a loadable module PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup PCI: cadence: Add support to build pcie-cadence library as a kernel module MAINTAINERS: Update Krzysztof Wilczyński email address PCI: Remove unnecessary linesplit in __pci_setup_bridge() PCI: WARN (not BUG()) when we fail to assign optional resources PCI: Remove unused pci_printk() PCI: qcom: Replace PERST# sleep time with proper macro PCI: dw-rockchip: Replace PERST# sleep time with proper macro PCI: host-common: Convert to library for host controller drivers PCI/ERR: Remove misleading TODO regarding kernel panic PCI: cadence: Remove duplicate message code definitions PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding PCI: cadence-ep: Correct PBA offset in .set_msix() callback ...
2025-06-04um: fix SECCOMP 32bit xstate register restoreBenjamin Berg
There was a typo that caused the extended FP state to be copied into the wrong location on 32 bit. On 32 bit we only store the xstate internally as that already contains everything. However, for compatibility, the mcontext on 32 bit first contains the legacy FP state and then the xstate. The code copied the xstate on top of the legacy FP state instead of using the correct offset. This offset was already calculated in the xstate_* variables, so simply switch to those to fix the problem. With this SECCOMP mode works on 32 bit, so lift the restriction. Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250604081705.934112-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-03Merge tag 'hyperv-next-signed-20250602' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel) - Fixes for Hyper-V UIO driver (Long Li) - Fixes for Hyper-V PCI driver (Michael Kelley) - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley) - Documentation updates for Hyper-V VMBus (Michael Kelley) - Enhance logging for hv_kvp_daemon (Shradha Gupta) * tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits) Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir Documentation: hyperv: Update VMBus doc with new features and info PCI: hv: Remove unnecessary flex array in struct pci_packet Drivers: hv: Remove hv_alloc/free_* helpers Drivers: hv: Use kzalloc for panic page allocation uio_hv_generic: Align ring size to system page uio_hv_generic: Use correct size for interrupt and monitor pages Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary arch/x86: Provide the CPU number in the wakeup AP callback x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap() PCI: hv: Get vPCI MSI IRQ domain from DeviceTree ACPI: irq: Introduce acpi_get_gsi_dispatcher() Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device() Drivers: hv: vmbus: Get the IRQ number from DeviceTree dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties arm64, x86: hyperv: Report the VTL the system boots in arm64: hyperv: Initialize the Virtual Trust Level field Drivers: hv: Provide arch-neutral implementation of get_vtl() Drivers: hv: Enable VTL mode for arm64 ...
2025-06-03x86/iopl: Cure TIF_IO_BITMAP inconsistenciesThomas Gleixner
io_bitmap_exit() is invoked from exit_thread() when a task exists or when a fork fails. In the latter case the exit_thread() cleans up resources which were allocated during fork(). io_bitmap_exit() invokes task_update_io_bitmap(), which in turn ends up in tss_update_io_bitmap(). tss_update_io_bitmap() operates on the current task. If current has TIF_IO_BITMAP set, but no bitmap installed, tss_update_io_bitmap() crashes with a NULL pointer dereference. There are two issues, which lead to that problem: 1) io_bitmap_exit() should not invoke task_update_io_bitmap() when the task, which is cleaned up, is not the current task. That's a clear indicator for a cleanup after a failed fork(). 2) A task should not have TIF_IO_BITMAP set and neither a bitmap installed nor IOPL emulation level 3 activated. This happens when a kernel thread is created in the context of a user space thread, which has TIF_IO_BITMAP set as the thread flags are copied and the IO bitmap pointer is cleared. Other than in the failed fork() case this has no impact because kernel threads including IO workers never return to user space and therefore never invoke tss_update_io_bitmap(). Cure this by adding the missing cleanups and checks: 1) Prevent io_bitmap_exit() to invoke task_update_io_bitmap() if the to be cleaned up task is not the current task. 2) Clear TIF_IO_BITMAP in copy_thread() unconditionally. For user space forks it is set later, when the IO bitmap is inherited in io_bitmap_share(). For paranoia sake, add a warning into tss_update_io_bitmap() to catch the case, when that code is invoked with inconsistent state. Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped") Reported-by: syzbot+e2b1803445d236442e54@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/87wmdceom2.ffs@tglx
2025-06-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more kvm updates from Paolo Bonzini: Generic: - Clean up locking of all vCPUs for a VM by using the *_nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra - Add MGLRU support to the access tracking perf test ARM fixes: - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet s390: - Fix interaction between some filesystems and Secure Execution - Some cleanups and refactorings, preparing for an upcoming big series x86: - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM - Refine and harden handling of spurious faults - Add support for ALLOWED_SEV_FEATURES - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits - Don't account temporary allocations in sev_send_update_data() - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX - Advertise support to userspace for WRMSRNS and PREFETCHI - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing - Add a module param to control and enumerate support for device posted interrupts - Fix a potential overflow with nested virt on Intel systems running 32-bit kernels - Flush shadow VMCSes on emergency reboot - Add support for SNP to the various SEV selftests - Add a selftest to verify fastops instructions via forced emulation - Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) rtmutex_api: provide correct extern functions KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race KVM: arm64: Unmap vLPIs affected by changes to GSI routing information KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock KVM: arm64: Use lock guard in vgic_v4_set_forwarding() KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue KVM: s390: Simplify and move pv code KVM: s390: Refactor and split some gmap helpers KVM: s390: Remove unneeded srcu lock s390: Remove unneeded includes s390/uv: Improve splitting of large folios that cannot be split while dirty s390/uv: Always return 0 from s390_wiggle_split_folio() if successful s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful rust: add helper for mutex_trylock RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation ...
2025-06-02um: Implement kernel side of SECCOMP based process handlingBenjamin Berg
This adds the kernel side of the seccomp based process handling. Co-authored-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-6-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-02um: Add helper functions to get/set state for SECCOMPBenjamin Berg
When not using ptrace, we need to both save and restore registers through the mcontext as provided by the host kernel to our signal handlers. Add corresponding functions to store the state to an mcontext and helpers to access the mcontext of the subprocess through the stub data. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-4-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-06-02um: Add stub side of SECCOMP/futex based process handlingBenjamin Berg
This adds the stub side for the new seccomp process management code. In this case we do register save/restore through the signal handler mcontext. Add special code for handling TLS, which for x86_64 means setting the FS_BASE/GS_BASE registers while for i386 it means calling the set_thread_area syscall. Co-authored-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-31Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...