Age | Commit message (Collapse) | Author |
|
Complete the AMD CPUID(4) emulation logic, which uses CPUID(0x80000006)
for L2/L3 cache info and an assocs[] associativity mapping array, by
adding entries for 3-way caches and 6-way caches.
Properly handle the case where CPUID(0x80000006) returns an L2/L3
associativity of 9. This is not real associativity, but a marker to
indicate that the respective L2/L3 cache information should be retrieved
from CPUID(0x8000001d) instead. If such a marker is encountered, return
early from legacy_amd_cpuid4(), thus effectively emulating an "invalid
index" CPUID(4) response with a cache type of zero.
When checking if CPUID(0x80000006) L2/L3 cache info output is valid, and
given the associtivity marker 9 above, do not just check if the whole
ECX/EDX register is zero. Rather, check if the associativity is zero or
9. An associativity of zero implies no L2/L3 cache, which make it the
more correct check anyway vs. a zero check of the whole output register.
Fixes: a326e948c538 ("x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250409122233.1058601-3-darwi@linutronix.de
|
|
For the AMD CPUID(4) emulation cache info logic, the same associativity
mapping array, assocs[], is used for both CPUID(0x80000005) and
CPUID(0x80000006).
This is incorrect since per the AMD manuals, the mappings for
CPUID(0x80000005) L1d/L1i associativity is:
n = 0x1 -> 0xfe n
n = 0xff fully associative
while assocs[] maps these values to:
n = 0x1, 0x2, 0x4 n
n = 0x3, 0x7, 0x9 0
n = 0x6 8
n = 0x8 16
n = 0xa 32
n = 0xb 48
n = 0xc 64
n = 0xd 96
n = 0xe 128
n = 0xf fully associative
which is only valid for CPUID(0x80000006).
Parse CPUID(0x80000005) L1d/L1i associativity values as shown in the AMD
manuals. Since the 0xffff literal is used to denote full associativity
at the AMD CPUID(4)-emulation logic, define AMD_CPUID4_FULLY_ASSOCIATIVE
for it instead of spreading that literal in more places.
Mark the assocs[] mapping array as only valid for CPUID(0x80000006) L2/L3
cache information.
Fixes: a326e948c538 ("x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250409122233.1058601-2-darwi@linutronix.de
|
|
The NULL array terminator at the end of erratum_1386_microcode was
removed during the switch from x86_cpu_desc to x86_cpu_id. This
causes readers to run off the end of the array.
Replace the NULL.
Fixes: f3f325152673 ("x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
|
|
Create a document to summarize hard-earned knowledge about RSB-related
mitigations, with references, and replace the overly verbose yet
incomplete comments with a reference to the document.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/ab73f4659ba697a974759f07befd41ae605e33dd.1744148254.git.jpoimboe@kernel.org
|
|
User->user Spectre v2 attacks (including RSB) across context switches
are already mitigated by IBPB in cond_mitigation(), if enabled globally
or if either the prev or the next task has opted in to protection. RSB
filling without IBPB serves no purpose for protecting user space, as
indirect branches are still vulnerable.
User->kernel RSB attacks are mitigated by eIBRS. In which case the RSB
filling on context switch isn't needed, so remove it.
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Amit Shah <amit.shah@amd.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/98cdefe42180358efebf78e3b80752850c7a3e1b.1744148254.git.jpoimboe@kernel.org
|
|
eIBRS protects against guest->host RSB underflow/poisoning attacks.
Adding retpoline to the mix doesn't change that. Retpoline has a
balanced CALL/RET anyway.
So the current full RSB filling on VMEXIT with eIBRS+retpoline is
overkill. Disable it or do the VMEXIT_LITE mitigation if needed.
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Amit Shah <amit.shah@amd.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Link: https://lore.kernel.org/r/84a1226e5c9e2698eae1b5ade861f1b8bf3677dc.1744148254.git.jpoimboe@kernel.org
|
|
IBPB is expected to clear the RSB. However, if X86_BUG_IBPB_NO_RET is
set, that doesn't happen. Make indirect_branch_prediction_barrier()
take that into account by calling write_ibpb() which clears RSB on
X86_BUG_IBPB_NO_RET:
/* Make sure IBPB clears return stack preductions too. */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_BUG_IBPB_NO_RET
Note that, as of the previous patch, write_ibpb() also reads
'x86_pred_cmd' in order to use SBPB when applicable:
movl _ASM_RIP(x86_pred_cmd), %eax
Therefore that existing behavior in indirect_branch_prediction_barrier()
is not lost.
Fixes: 50e4b3b94090 ("x86/entry: Have entry_ibpb() invalidate return predictions")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/bba68888c511743d4cd65564d1fc41438907523f.1744148254.git.jpoimboe@kernel.org
|
|
There's nothing entry-specific about entry_ibpb(). In preparation for
calling it from elsewhere, rename it to write_ibpb().
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/1e54ace131e79b760de3fe828264e26d0896e3ac.1744148254.git.jpoimboe@kernel.org
|
|
First of all, using 'mmio' prevents proper implementation of 8-bit accessors.
Second, it's simply inconsistent with uart8250 set of options. Rename it to
'mmio32'. While at it, remove rather misleading comment in the documentation.
From now on mmio32 is self-explanatory and pciserial supports not only 32-bit
MMIO accessors.
Also, while at it, fix the comment for the "pciserial" case. The comment
seems to be a copy'n'paste error when mentioning "serial" instead of
"pciserial" (with double quotes). Fix this.
With that, move it upper, so we don't calculate 'buf' twice.
Fixes: 3181424aeac2 ("x86/early_printk: Add support for MMIO-based UARTs")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Denis Mukhin <dmukhin@ford.com>
Link: https://lore.kernel.org/r/20250407172214.792745-1-andriy.shevchenko@linux.intel.com
|
|
Since
741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.")
a helper rdt_kn_name() that checks that rdtgroup_mutex is held has been used
for all accesses to the kernfs node name.
rdtgroup_mkdir() uses the name to determine if a valid monitor group is being
created by checking the parent name is "mon_groups". This is done without
holding rdtgroup_mutex, and now triggers the following warning:
| WARNING: suspicious RCU usage
| 6.15.0-rc1 #4465 Tainted: G E
| -----------------------------
| arch/x86/kernel/cpu/resctrl/internal.h:408 suspicious rcu_dereference_check() usage!
[...]
| Call Trace:
| <TASK>
| dump_stack_lvl
| lockdep_rcu_suspicious.cold
| is_mon_groups
| rdtgroup_mkdir
| kernfs_iop_mkdir
| vfs_mkdir
| do_mkdirat
| __x64_sys_mkdir
| do_syscall_64
| entry_SYSCALL_64_after_hwframe
Creating a control or monitor group calls mkdir_rdt_prepare(), which uses
rdtgroup_kn_lock_live() to take the rdtgroup_mutex.
To avoid taking and dropping the lock, move the check for the monitor group
name and position into mkdir_rdt_prepare() so that it occurs under
rdtgroup_mutex. Hoist is_mon_groups() earlier in the file.
[ bp: Massage. ]
Fixes: 741c10b096bc ("kernfs: Use RCU to access kernfs_node::name.")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250407124637.2433230-1-james.morse@arm.com
|
|
e820__register_nosave_regions()
While debugging kexec/hibernation hangs and crashes, it turned out that
the current implementation of e820__register_nosave_regions() suffers from
multiple serious issues:
- The end of last region is tracked by PFN, causing it to find holes
that aren't there if two consecutive subpage regions are present
- The nosave PFN ranges derived from holes are rounded out (instead of
rounded in) which makes it inconsistent with how explicitly reserved
regions are handled
Fix this by:
- Treating reserved regions as if they were holes, to ensure consistent
handling (rounding out nosave PFN ranges is more correct as the
kernel does not use partial pages)
- Tracking the end of the last RAM region by address instead of pages
to detect holes more precisely
These bugs appear to have been introduced about ~18 years ago with the very
first version of e820_mark_nosave_regions(), and its flawed assumptions were
carried forward uninterrupted through various waves of rewrites and renames.
[ mingo: Added Git archeology details, for kicks and giggles. ]
Fixes: e8eff5ac294e ("[PATCH] Make swsusp avoid memory holes and reserved memory regions on x86_64")
Reported-by: Roberto Ricci <io@r-ricci.it>
Tested-by: Roberto Ricci <io@r-ricci.it>
Signed-off-by: Myrrh Periwinkle <myrrhperiwinkle@qtmlabs.xyz>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250406-fix-e820-nosave-v3-1-f3787bc1ee1d@qtmlabs.xyz
Closes: https://lore.kernel.org/all/Z4WFjBVHpndct7br@desktop0a/
|
|
Xen disables ACPI for PV guests in DomU, which causes acpi_mps_check() to
return 1 when CONFIG_X86_MPPARSE is not set. As a result, the local APIC is
disabled and the guest is later limited to a single vCPU, despite being
configured with more.
This regression was introduced in version 6.9 in commit 7c0edad3643f
("x86/cpu/topology: Rework possible CPU management"), which added an
early check that limits CPUs to 1 if apic_is_disabled.
Update the acpi_mps_check() logic to return 0 early when running as a Xen
PV guest in DomU, preventing APIC from being disabled in this specific case
and restoring correct multi-vCPU behaviour.
Fixes: 7c0edad3643f ("x86/cpu/topology: Rework possible CPU management")
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250407132445.6732-2-arkamar@atlas.cz
|
|
If microcode did not get loaded there is no reason to keep it in the cache.
Moreover, if loading failed it will not be possible to load an earlier version
of microcode since the failed revision will always be selected from the cache
on the next reload attempt.
Since the failed revisions is not easily available at this point just clean the
whole cache. It will be rebuilt later if needed.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250327230503.1850368-3-boris.ostrovsky@oracle.com
|
|
Add some missing dependencies to the CPUID dependency table:
- All the AMX features depend on AMX_TILE
- All the SPEC_CTRL features depend on SPEC_CTRL
[ mingo: Keep the AMX part of the table grouped ... ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Link: https://lore.kernel.org/r/20240924170128.2611854-1-ak@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A set of final cleanups for the timer subsystem:
- Convert all del_timer[_sync]() instances over to the new
timer_delete[_sync]() API and remove the legacy wrappers.
Conversion was done with coccinelle plus some manual fixups as
coccinelle chokes on scoped_guard().
- The final cleanup of the hrtimer_init() to hrtimer_setup()
conversion.
This has been delayed to the end of the merge window, so that all
patches which have been merged through other trees are in mainline
and all new users are catched.
Doing this right before rc1 ensures that new code which is merged post
rc1 is not introducing new instances of the original functionality"
* tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tracing/timers: Rename the hrtimer_init event to hrtimer_setup
hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack()
hrtimers: Rename debug_init() to debug_setup()
hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper()
hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns()
hrtimers: Make callback function pointer private
hrtimers: Merge __hrtimer_init() into __hrtimer_setup()
hrtimers: Switch to use __htimer_setup()
hrtimers: Delete hrtimer_init()
treewide: Convert new and leftover hrtimer_init() users
treewide: Switch/rename to timer_delete[_sync]()
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Naming interrupt domains host is confusing at best and the irqdomain code
uses both domain and host inconsistently.
Therefore rename irq_set_default_host() to irq_set_default_domain().
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250319092951.37667-3-jirislaby@kernel.org
|
|
mwait_idle_with_hints() and prefer_mwait_c1_over_halt()
The following commit, 12 years ago:
7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
added barriers around the CLFLUSH in mwait_idle_with_hints(), justified with:
... and add memory barriers around it since the documentation is explicit
that CLFLUSH is only ordered with respect to MFENCE.
This also triggered, 11 years ago, the same adjustment in:
f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
during development, although it failed to get the static_cpu_has_bug() treatment.
X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs,
and the SDM currently states:
Executions of the CLFLUSH instruction are ordered with respect to each
other and with respect to writes, locked read-modify-write instructions,
and fence instructions[1].
With footnote 1 reading:
Earlier versions of this manual specified that executions of the CLFLUSH
instruction were ordered only by the MFENCE instruction. All processors
implementing the CLFLUSH instruction also order it relative to the other
operations enumerated above.
i.e. The SDM was incorrect at the time, and barriers should not have been
inserted. Double checking the original AAI65 errata (not available from
intel.com any more) shows no mention of barriers either.
Note: If this were a general codepath, the MFENCEs would be needed, because
AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
Remove the unnecessary barriers. Furthermore, use a plain alternative(),
rather than static_cpu_has_bug() and/or no optimisation. The workaround
is a single instruction.
Use an explicit %rax pointer rather than a general memory operand, because
MONITOR takes the pointer implicitly in the same way.
[ mingo: Cleaned up the commit a bit. ]
Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250402172458.1378112-1-andrew.cooper3@citrix.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
"Avoid direct HLT instruction execution in TDX guests.
TDX guests aren't expected to use the HLT instruction directly. It
causes a virtualization exception (#VE). While the #VE _can_ be
handled, the current handling is slow and buggy and the easiest thing
is just to avoid HLT in the first place. Plus, the kernel already has
paravirt infrastructure that makes it relatively painless.
Make TDX guests require paravirt and add some TDX-specific paravirt
handlers which avoid HLT in the normal halt routines. Also add a
warning in case another HLT sneaks in.
There was a report that this leads to a "major performance
improvement" on specjbb2015, probably because of the extra #VE
overhead or missed wakeups from the buggy HLT handling"
* tag 'x86_tdx_for_6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Emit warning if IRQs are enabled during HLT #VE handling
x86/tdx: Fix arch_safe_halt() execution for TDX VMs
x86/paravirt: Move halt paravirt calls under CONFIG_PARAVIRT
|
|
Convert the last remaining printk() in nmi.c to pr_info(). Along with
it, use timespec macros to calculate the NMI handler duration.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-10-sohil.mehta@intel.com
|
|
The expected_testcase_failures variable in the NMI selftest has never
been set since its introduction. Remove this unused variable along with
the related checks to simplify the code.
While at it, replace printk() with the corresponding pr_{cont,info}()
calls. Also, get rid of the superfluous testname wrapper and the
redundant file path comment.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-9-sohil.mehta@intel.com
|
|
Some of the comments in the default NMI handling code are out of place
or inadequate. Move them to the appropriate locations and update them as
needed.
Move the comment related to CPU-specific NMIs closer to the actual code.
Also, add more details about how back-to-back NMIs are detected since
that isn't immediately obvious.
Opportunistically, replace an #ifdef section in the vicinity with an
IS_ENABLED() check to make the code easier to read.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-7-sohil.mehta@intel.com
|
|
The comment in unknown_nmi_error() is incorrect and misleading. There
is no longer a restriction on having a single Unknown NMI handler. Also,
nmi_handle() never used the 'b2b' parameter.
The commits that made the comment outdated are:
0d443b70cc92 ("x86/platform: Remove warning message for duplicate NMI handlers")
bf9f2ee28d47 ("x86/nmi: Remove the 'b2b' parameter from nmi_handle()")
Remove the old comment and update it to reflect the current logic.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-6-sohil.mehta@intel.com
|
|
Commit:
feb6cd6a0f9f ("thermal/intel_powerclamp: stop sched tick in forced idle")
got rid of the last exported user of local_touch_nmi() a while back.
Remove the unnecessary export.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-5-sohil.mehta@intel.com
|
|
The NMI descriptors for each NMI type are stored in an array. However,
they are currently initialized using raw numbers, which makes it
difficult to understand the code.
Introduce a macro to initialize the NMI descriptors using the NMI type
enum values to make the code more readable.
No functional change intended.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-4-sohil.mehta@intel.com
|
|
Commit:
c305a4e98378 ("x86: Move sysctls into arch/x86")
recently moved the sysctl handling of panic_on_unrecovered_nmi and
panic_on_io_nmi to x86-specific code. These variables no longer need to
be declared in the generic header file.
Relocate the variable definitions and declarations closer to where they
are used. This makes all the NMI panic options consistent and easier to
track.
[ mingo: Fixed up the SHA1 of the commit reference. ]
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Cc: Joel Granados <joel.granados@kernel.org>
Link: https://lore.kernel.org/r/20250327234629.3953536-3-sohil.mehta@intel.com
|
|
The unknown_nmi_panic variable is used to control whether the kernel
should panic on unknown NMIs. There is a sysctl entry under:
/proc/sys/kernel/unknown_nmi_panic
which can be used to change the behavior at runtime.
However, it seems that in some places, the option unnecessarily depends
on CONFIG_X86_LOCAL_APIC. Other code in nmi.c uses unknown_nmi_panic
without such a dependency. This results in a few messy #ifdefs
splattered across the code. The dependency was likely introduce due to a
potential build bug reported a long time ago:
https://lore.kernel.org/lkml/40BC67F9.3000609@myrealbox.com/
This build bug no longer exists.
Also, similar NMI panic options, such as panic_on_unrecovered_nmi and
panic_on_io_nmi, do not have an explicit dependency on the local APIC
either.
Though, it's hard to imagine a production system without the local APIC
configuration, making a specific NMI sysctl option dependent on it
doesn't make sense.
Remove the explicit dependency between unknown NMI handling and the
local APIC to make the code cleaner and more consistent.
While at it, reorder the header includes to maintain alphabetical order.
[ mingo: Cleaned up the changelog a bit, truly ordered the headers ... ]
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250327234629.3953536-2-sohil.mehta@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in
6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates"
* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
rust: platform: require Send for Driver trait implementers
rust: pci: require Send for Driver trait implementers
rust: platform: impl Send + Sync for platform::Device
rust: pci: impl Send + Sync for pci::Device
rust: platform: fix unrestricted &mut platform::Device
rust: pci: fix unrestricted &mut pci::Device
rust: device: implement device context marker
rust: pci: use to_result() in enable_device_mem()
MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
rust/kernel/faux: mark Registration methods inline
driver core: faux: only create the device if probe() succeeds
rust/faux: Add missing parent argument to Registration::new()
rust/faux: Drop #[repr(transparent)] from faux::Registration
rust: io: fix devres test with new io accessor functions
rust: io: rename `io::Io` accessors
kernfs: Move dput() outside of the RCU section.
efi: rci2: mark bin_attribute as __ro_after_init
rapidio: constify 'struct bin_attribute'
firmware: qemu_fw_cfg: constify 'struct bin_attribute'
powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "powerpc/crash: use generic crashkernel reservation" from
Sourabh Jain changes powerpc's kexec code to use more of the generic
layers.
- The series "get_maintainer: report subsystem status separately" from
Vlastimil Babka makes some long-requested improvements to the
get_maintainer output.
- The series "ucount: Simplify refcounting with rcuref_t" from
Sebastian Siewior cleans up and optimizing the refcounting in the
ucount code.
- The series "reboot: support runtime configuration of emergency
hw_protection action" from Ahmad Fatoum improves the ability for a
driver to perform an emergency system shutdown or reboot.
- The series "Converge on using secs_to_jiffies() part two" from Easwar
Hariharan performs further migrations from msecs_to_jiffies() to
secs_to_jiffies().
- The series "lib/interval_tree: add some test cases and cleanup" from
Wei Yang permits more userspace testing of kernel library code, adds
some more tests and performs some cleanups.
- The series "hung_task: Dump the blocking task stacktrace" from Masami
Hiramatsu arranges for the hung_task detector to dump the stack of
the blocking task and not just that of the blocked task.
- The series "resource: Split and use DEFINE_RES*() macros" from Andy
Shevchenko provides some cleanups to the resource definition macros.
- Plus the usual shower of singleton patches - please see the
individual changelogs for details.
* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
mailmap: consolidate email addresses of Alexander Sverdlin
fs/procfs: fix the comment above proc_pid_wchan()
relay: use kasprintf() instead of fixed buffer formatting
resource: replace open coded variant of DEFINE_RES()
resource: replace open coded variants of DEFINE_RES_*_NAMED()
resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
samples: add hung_task detector mutex blocking sample
hung_task: show the blocker task if the task is hung on mutex
kexec_core: accept unaccepted kexec segments' destination addresses
watchdog/perf: optimize bytes copied and remove manual NUL-termination
lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
lib/interval_tree: skip the check before go to the right subtree
lib/interval_tree: add test case for span iteration
lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
lib/rbtree: add random seed
lib/rbtree: split tests
lib/rbtree: enable userland test suite for rbtree related data structure
checkpatch: describe --min-conf-desc-length
scripts/gdb/symbols: determine KASLR offset on s390
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "Enable strict percpu address space checks" from Uros
Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.
This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling code was found to be incorrect.
- The series "Some cleanup for memcg" from Chen Ridong implements some
relatively monir cleanups for the memcontrol code.
- The series "mm: fixes for device-exclusive entries (hmm)" from David
Hildenbrand fixes a boatload of issues which David found then using
device-exclusive PTE entries when THP is enabled. More work is
needed, but this makes thins better - our own HMM selftests now
succeed.
- The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
remove the z3fold and zbud implementations. They have been deprecated
for half a year and nobody has complained.
- The series "mm: further simplify VMA merge operation" from Lorenzo
Stoakes implements numerous simplifications in this area. No runtime
effects are anticipated.
- The series "mm/madvise: remove redundant mmap_lock operations from
process_madvise()" from SeongJae Park rationalizes the locking in the
madvise() implementation. Performance gains of 20-25% were observed
in one MADV_DONTNEED microbenchmark.
- The series "Tiny cleanup and improvements about SWAP code" from
Baoquan He contains a number of touchups to issues which Baoquan
noticed when working on the swap code.
- The series "mm: kmemleak: Usability improvements" from Catalin
Marinas implements a couple of improvements to the kmemleak
user-visible output.
- The series "mm/damon/paddr: fix large folios access and schemes
handling" from Usama Arif provides a couple of fixes for DAMON's
handling of large folios.
- The series "mm/damon/core: fix wrong and/or useless damos_walk()
behaviors" from SeongJae Park fixes a few issues with the accuracy of
kdamond's walking of DAMON regions.
- The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
Stoakes changes the interaction between framebuffer deferred-io and
core MM. No functional changes are anticipated - this is preparatory
work for the future removal of page structure fields.
- The series "mm/damon: add support for hugepage_size DAMOS filter"
from Usama Arif adds a DAMOS filter which permits the filtering by
huge page sizes.
- The series "mm: permit guard regions for file-backed/shmem mappings"
from Lorenzo Stoakes extends the guard region feature from its
present "anon mappings only" state. The feature now covers shmem and
file-backed mappings.
- The series "mm: batched unmap lazyfree large folios during
reclamation" from Barry Song cleans up and speeds up the unmapping
for pte-mapped large folios.
- The series "reimplement per-vma lock as a refcount" from Suren
Baghdasaryan puts the vm_lock back into the vma. Our reasons for
pulling it out were largely bogus and that change made the code more
messy. This patchset provides small (0-10%) improvements on one
microbenchmark.
- The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
improves" from SeongJae Park does some maintenance work on the DAMON
docs.
- The series "hugetlb/CMA improvements for large systems" from Frank
van der Linden addresses a pile of issues which have been observed
when using CMA on large machines.
- The series "mm/damon: introduce DAMOS filter type for unmapped pages"
from SeongJae Park enables users of DMAON/DAMOS to filter my the
page's mapped/unmapped status.
- The series "zsmalloc/zram: there be preemption" from Sergey
Senozhatsky teaches zram to run its compression and decompression
operations preemptibly.
- The series "selftests/mm: Some cleanups from trying to run them" from
Brendan Jackman fixes a pile of unrelated issues which Brendan
encountered while runnimg our selftests.
- The series "fs/proc/task_mmu: add guard region bit to pagemap" from
Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
determine whether a particular page is a guard page.
- The series "mm, swap: remove swap slot cache" from Kairui Song
removes the swap slot cache from the allocation path - it simply
wasn't being effective.
- The series "mm: cleanups for device-exclusive entries (hmm)" from
David Hildenbrand implements a number of unrelated cleanups in this
code.
- The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
implements a number of preparatoty cleanups to the GENERIC_PTDUMP
Kconfig logic.
- The series "mm/damon: auto-tune aggregation interval" from SeongJae
Park implements a feedback-driven automatic tuning feature for
DAMON's aggregation interval tuning.
- The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
preparation for implementing lazy mmu mode for arm64 to optimize
vmalloc.
- The series "mm/page_alloc: Some clarifications for migratetype
fallback" from Brendan Jackman reworks some commentary to make the
code easier to follow.
- The series "page_counter cleanup and size reduction" from Shakeel
Butt cleans up the page_counter code and fixes a size increase which
we accidentally added late last year.
- The series "Add a command line option that enables control of how
many threads should be used to allocate huge pages" from Thomas
Prescher does that. It allows the careful operator to significantly
reduce boot time by tuning the parallalization of huge page
initialization.
- The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
from Tang Yizhou fixes the tracing output from the dirty page
balancing code.
- The series "mm/damon: make allow filters after reject filters useful
and intuitive" from SeongJae Park improves the handling of allow and
reject filters. Behaviour is made more consistent and the documention
is updated accordingly.
- The series "Switch zswap to object read/write APIs" from Yosry Ahmed
updates zswap to the new object read/write APIs and thus permits the
removal of some legacy code from zpool and zsmalloc.
- The series "Some trivial cleanups for shmem" from Baolin Wang does as
it claims.
- The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
Alistair Popple regularizes the weird ZONE_DEVICE page refcount
handling in DAX, permittig the removal of a number of special-case
checks.
- The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
preparatoty refactoring and cleanup of the mremap() code.
- The series "mm: MM owner tracking for large folios (!hugetlb) +
CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
which we determine whether a large folio is known to be mapped
exclusively into a single MM.
- The series "mm/damon: add sysfs dirs for managing DAMOS filters based
on handling layers" from SeongJae Park adds a couple of new sysfs
directories to ease the management of DAMON/DAMOS filters.
- The series "arch, mm: reduce code duplication in mem_init()" from
Mike Rapoport consolidates many per-arch implementations of
mem_init() into code generic code, where that is practical.
- The series "mm/damon/sysfs: commit parameters online via
damon_call()" from SeongJae Park continues the cleaning up of sysfs
access to DAMON internal data.
- The series "mm: page_ext: Introduce new iteration API" from Luiz
Capitulino reworks the page_ext initialization to fix a boot-time
crash which was observed with an unusual combination of compile and
cmdline options.
- The series "Buddy allocator like (or non-uniform) folio split" from
Zi Yan reworks the code to split a folio into smaller folios. The
main benefit is lessened memory consumption: fewer post-split folios
are generated.
- The series "Minimize xa_node allocation during xarry split" from Zi
Yan reduces the number of xarray xa_nodes which are generated during
an xarray split.
- The series "drivers/base/memory: Two cleanups" from Gavin Shan
performs some maintenance work on the drivers/base/memory code.
- The series "Add tracepoints for lowmem reserves, watermarks and
totalreserve_pages" from Martin Liu adds some more tracepoints to the
page allocator code.
- The series "mm/madvise: cleanup requests validations and
classifications" from SeongJae Park cleans up some warts which
SeongJae observed during his earlier madvise work.
- The series "mm/hwpoison: Fix regressions in memory failure handling"
from Shuai Xue addresses two quite serious regressions which Shuai
has observed in the memory-failure implementation.
- The series "mm: reliable huge page allocator" from Johannes Weiner
makes huge page allocations cheaper and more reliable by reducing
fragmentation.
- The series "Minor memcg cleanups & prep for memdescs" from Matthew
Wilcox is preparatory work for the future implementation of memdescs.
- The series "track memory used by balloon drivers" from Nico Pache
introduces a way to track memory used by our various balloon drivers.
- The series "mm/damon: introduce DAMOS filter type for active pages"
from Nhat Pham permits users to filter for active/inactive pages,
separately for file and anon pages.
- The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
separates the proactive reclaim statistics from the direct reclaim
statistics.
- The series "mm/vmscan: don't try to reclaim hwpoison folio" from
Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
code.
* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
x86/mm: restore early initialization of high_memory for 32-bits
mm/vmscan: don't try to reclaim hwpoison folio
mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
selftests/mm: speed up split_huge_page_test
selftests/mm: uffd-unit-tests support for hugepages > 2M
docs/mm/damon/design: document active DAMOS filter type
mm/damon: implement a new DAMOS filter type for active pages
fs/dax: don't disassociate zero page entries
MM documentation: add "Unaccepted" meminfo entry
selftests/mm: add commentary about 9pfs bugs
fork: use __vmalloc_node() for stack allocation
docs/mm: Physical Memory: Populate the "Zones" section
xen: balloon: update the NR_BALLOON_PAGES state
hv_balloon: update the NR_BALLOON_PAGES state
balloon_compaction: update the NR_BALLOON_PAGES state
meminfo: add a per node counter for balloon drivers
mm: remove references to folio in __memcg_kmem_uncharge_page()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules updates from Petr Pavlu:
- Use RCU instead of RCU-sched
The mix of rcu_read_lock(), rcu_read_lock_sched() and
preempt_disable() in the module code and its users has
been replaced with just rcu_read_lock()
- The rest of changes are smaller fixes and updates
* tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits)
MAINTAINERS: Update the MODULE SUPPORT section
module: Remove unnecessary size argument when calling strscpy()
module: Replace deprecated strncpy() with strscpy()
params: Annotate struct module_param_attrs with __counted_by()
bug: Use RCU instead RCU-sched to protect module_bug_list.
static_call: Use RCU in all users of __module_text_address().
kprobes: Use RCU in all users of __module_text_address().
bpf: Use RCU in all users of __module_text_address().
jump_label: Use RCU in all users of __module_text_address().
jump_label: Use RCU in all users of __module_address().
x86: Use RCU in all users of __module_address().
cfi: Use RCU while invoking __module_address().
powerpc/ftrace: Use RCU in all users of __module_text_address().
LoongArch: ftrace: Use RCU in all users of __module_text_address().
LoongArch/orc: Use RCU in all users of __module_address().
arm64: module: Use RCU in all users of __module_text_address().
ARM: module: Use RCU in all users of __module_text_address().
module: Use RCU in all users of __module_text_address().
module: Use RCU in all users of __module_address().
module: Use RCU in search_module_extables().
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes and updates from Ingo Molnar:
- Fix a large number of x86 Kconfig dependency and help text accuracy
bugs/problems, by Mateusz Jończyk and David Heideberg
- Fix a VM_PAT interaction with fork() crash. This also touches core
kernel code
- Fix an ORC unwinder bug for interrupt entries
- Fixes and cleanups
- Fix an AMD microcode loader bug that can promote verification
failures into success
- Add early-printk support for MMIO based UARTs on an x86 board that
had no other serial debugging facility and also experienced early
boot crashes
* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
x86/fpu: Update the outdated comment above fpstate_init_user()
x86/early_printk: Add support for MMIO-based UARTs
x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
x86/Kconfig: Correct X86_X2APIC help text
x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
x86/Kconfig: Document release year of glibc 2.3.3
x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
x86/Kconfig: Document CONFIG_PCI_MMCONFIG
x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
x86/Kconfig: Enable X86_X2APIC by default and improve help text
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc locking fixes and updates from Ingo Molnar:
- Fix a locking self-test FAIL on PREEMPT_RT kernels
- Fix nr_unused_locks accounting bug
- Simplify the split-lock debugging feature's fast-path
* tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class()
lockdep: Fix wait context check on softirq for PREEMPT_RT
x86/split_lock: Simplify reenabling
|
|
When verify_sha256_digest() fails, __apply_microcode_amd() should propagate
the failure by returning false (and not -1 which is promoted to true).
Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250327230503.1850368-2-boris.ostrovsky@oracle.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- Remove support for IBM Cell Blades
- SMP support for microwatt platform
- Support for inline static calls on PPC32
- Enable pmu selftests for power11 platform
- Enable hardware trace macro (HTM) hcall support
- Support for limited address mode capability
- Changes to RMA size from 512 MB to 768 MB to handle fadump
- Misc fixes and cleanups
Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann,
Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom,
Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook,
Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani
(IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav
Jain, and Venkat Rao Bagalkote.
* tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits)
powerpc/kexec: fix physical address calculation in clear_utlb_entry()
crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD
powerpc: Fix 'intra_function_call not a direct call' warning
powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'
KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests
powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7
powerpc/microwatt: Add SMP support
powerpc: Define config option for processors with broadcast TLBIE
powerpc/microwatt: Define an idle power-save function
powerpc/microwatt: Device-tree updates
powerpc/microwatt: Select COMMON_CLK in order to get the clock framework
net: toshiba: Remove reference to PPC_IBM_CELL_BLADE
net: spider_net: Remove powerpc Cell driver
cpufreq: ppc_cbe: Remove powerpc Cell driver
genirq: Remove IRQ_EDGE_EOI_HANDLER
docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR
powerpc: Remove UDBG_RTAS_CONSOLE
powerpc/io: Use standard barrier macros in io.c
powerpc/io: Rename _insw_ns() etc.
powerpc/io: Use generic raw accessors
...
|
|
Direct HLT instruction execution causes #VEs for TDX VMs which is routed
to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE
handler will enable interrupts before TDCALL is routed to hypervisor
leading to missed wakeup events, as current TDX spec doesn't expose
interruptibility state information to allow #VE handler to selectively
enable interrupts.
Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from executing HLT instruction in STI-shadow.
But it missed the paravirt routine which can be reached via this path
as an example:
kvm_wait() =>
safe_halt() =>
raw_safe_halt() =>
arch_safe_halt() =>
irq.safe_halt() =>
pv_native_safe_halt()
To reliably handle arch_safe_halt() for TDX VMs, introduce explicit
dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt()
routines with TDX-safe versions that execute direct TDCALL and needed
interrupt flag updates. Executing direct TDCALL brings in additional
benefit of avoiding HLT related #VEs altogether.
As tested by Ryan Afranji:
"Tested with the specjbb2015 benchmark. It has heavy lock contention which leads
to many halt calls. TDX VMs suffered a poor score before this patchset.
Verified the major performance improvement with this patchset applied."
Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Ryan Afranji <afranji@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250228014416.3925664-3-vannapurve@google.com
|
|
CONFIG_PARAVIRT_XXL is mainly defined/used by XEN PV guests. For
other VM guest types, features supported under CONFIG_PARAVIRT
are self sufficient. CONFIG_PARAVIRT mainly provides support for
TLB flush operations and time related operations.
For TDX guest as well, paravirt calls under CONFIG_PARVIRT meets
most of its requirement except the need of HLT and SAFE_HLT
paravirt calls, which is currently defined under
CONFIG_PARAVIRT_XXL.
Since enabling CONFIG_PARAVIRT_XXL is too bloated for TDX guest
like platforms, move HLT and SAFE_HLT paravirt calls under
CONFIG_PARAVIRT.
Moving HLT and SAFE_HLT paravirt calls are not fatal and should not
break any functionality for current users of CONFIG_PARAVIRT.
Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Ryan Afranji <afranji@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20250228014416.3925664-2-vannapurve@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Another set of improvements to the kernel's CRC (cyclic redundancy
check) code:
- Rework the CRC64 library functions to be directly optimized, like
what I did last cycle for the CRC32 and CRC-T10DIF library
functions
- Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ
support and acceleration for crc64_be and crc64_nvme
- Rewrite the riscv Zbc-optimized CRC code, and add acceleration for
crc_t10dif, crc64_be, and crc64_nvme
- Remove crc_t10dif and crc64_rocksoft from the crypto API, since
they are no longer needed there
- Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect
- Add kunit test cases for crc64_nvme and crc7
- Eliminate redundant functions for calculating the Castagnoli CRC32,
settling on just crc32c()
- Remove unnecessary prompts from some of the CRC kconfig options
- Further optimize the x86 crc32c code"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits)
x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512
lib/crc: remove unnecessary prompt for CONFIG_CRC64
lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C
lib/crc: remove unnecessary prompt for CONFIG_CRC8
lib/crc: remove unnecessary prompt for CONFIG_CRC7
lib/crc: remove unnecessary prompt for CONFIG_CRC4
lib/crc7: unexport crc7_be_syndrome_table
lib/crc_kunit.c: update comment in crc_benchmark()
lib/crc_kunit.c: add test and benchmark for crc7_be()
x86/crc32: optimize tail handling for crc32c short inputs
riscv/crc64: add Zbc optimized CRC64 functions
riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function
riscv/crc32: reimplement the CRC32 functions using new template
riscv/crc: add "template" for Zbc optimized CRC functions
x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings
x86/crc32: improve crc32c_arch() code generation with clang
x86/crc64: implement crc64_be and crc64_nvme using new template
x86/crc-t10dif: implement crc_t10dif using new template
x86/crc32: implement crc32_le using new template
x86/crc: add "template" for [V]PCLMULQDQ based CRC functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are dominated by cpufreq updates which in turn are dominated by
updates related to boost support in the core and drivers and
amd-pstate driver optimizations.
Apart from the above, there are some cpuidle updates including a
rework of the most recent idle intervals handling in the venerable
menu governor that leads to significant improvements in some
performance benchmarks, as the governor is now more likely to predict
a shorter idle duration in some cases, and there are updates of the
core device power management code, mostly related to system suspend
and resume, that should help to avoid potential issues arising when
the drivers of devices depending on one another want to use different
optimizations.
There is also a usual collection of assorted fixes and cleanups,
including removal of some unused code.
Specifics:
- Manage sysfs attributes and boost frequencies efficiently from
cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)
- Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
Dhananjay Ugwekar, Imran Shaik, zuoqian)
- Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
Bai)
- cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)
- Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)
- Optimize the amd-pstate driver to avoid cases where call paths end
up calling the same writes multiple times and needlessly caching
variables through code reorganization, locking overhaul and tracing
adjustments (Mario Limonciello, Dhananjay Ugwekar)
- Make it possible to avoid enabling capacity-aware scheduling (CAS)
in the intel_pstate driver and relocate a check for out-of-band
(OOB) platform handling in it to make it detect OOB before checking
HWP availability (Rafael Wysocki)
- Fix dbs_update() to avoid inadvertent conversions of negative
integer values to unsigned int which causes CPU frequency selection
to be inaccurate in some cases when the "conservative" cpufreq
governor is in use (Jie Zhan)
- Update the handling of the most recent idle intervals in the menu
cpuidle governor to prevent useful information from being discarded
by it in some cases and improve the prediction accuracy (Rafael
Wysocki)
- Make it possible to tell the intel_idle driver to ignore its
built-in table of idle states for the given processor, clean up the
handling of auto-demotion disabling on Baytrail and Cherrytrail
chips in it, and update its MAINTAINERS entry (David Arcari, Artem
Bityutskiy, Rafael Wysocki)
- Make some cpuidle drivers use for_each_present_cpu() instead of
for_each_possible_cpu() during initialization to avoid issues
occurring when nosmp or maxcpus=0 are used (Jacky Bai)
- Clean up the Energy Model handling code somewhat (Rafael Wysocki)
- Use kfree_rcu() to simplify the handling of runtime Energy Model
updates (Li RongQing)
- Add an entry for the Energy Model framework to MAINTAINERS as
properly maintained (Lukasz Luba)
- Address RCU-related sparse warnings in the Energy Model code
(Rafael Wysocki)
- Remove ENERGY_MODEL dependency on SMP and allow it to be selected
when DEVFREQ is set without CPUFREQ so it can be used on a wider
range of systems (Jeson Gao)
- Unify error handling during runtime suspend and runtime resume in
the core to help drivers to implement more consistent runtime PM
error handling (Rafael Wysocki)
- Drop a redundant check from pm_runtime_force_resume() and rearrange
documentation related to __pm_runtime_disable() (Rafael Wysocki)
- Rework the handling of the "smart suspend" driver flag in the PM
core to avoid issues hat may occur when drivers using it depend on
some other drivers and clean up the related PM core code (Rafael
Wysocki, Colin Ian King)
- Fix the handling of devices with the power.direct_complete flag set
if device_suspend() returns an error for at least one device to
avoid situations in which some of them may not be resumed (Rafael
Wysocki)
- Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
possible deadlock that may occur if the "compressor" hibernation
module parameter is accessed during the registration of a new
ieee80211 device (Lizhi Xu)
- Suppress sleeping parent warning in device_pm_add() in the case
when new children are added under a device with the
power.direct_complete set after it has been processed by
device_resume() (Xu Yang)
- Remove needless return in three void functions related to system
wakeup (Zijun Hu)
- Replace deprecated kmap_atomic() with kmap_local_page() in the
hibernation core code (David Reaver)
- Remove unused helper functions related to system sleep (David Alan
Gilbert)
- Clean up s2idle_enter() so it does not lock and unlock CPU offline
in vain and update comments in it (Ulf Hansson)
- Clean up broken white space in dpm_wait_for_children() (Geert
Uytterhoeven)
- Update the cpupower utility to fix lib version-ing in it and memory
leaks in error legs, remove hard-coded values, and implement CPU
physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
Khan, Yiwei Lin, Zhongqiu Han)"
* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
PM: sleep: Fix bit masking operation
dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
cpufreq: Init cpufreq only for present CPUs
PM: sleep: Fix handling devices with direct_complete set on errors
cpuidle: Init cpuidle only for present CPUs
PM: clk: Remove unused pm_clk_remove()
PM: sleep: core: Fix indentation in dpm_wait_for_children()
PM: s2idle: Extend comment in s2idle_enter()
PM: s2idle: Drop redundant locks when entering s2idle
PM: sleep: Remove unused pm_generic_ wrappers
cpufreq: tegra186: Share policy per cluster
cpupower: Make lib versioning scheme more obvious and fix version link
PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
PM: EM: Address RCU-related sparse warnings
cpupower: Implement CPU physical core querying
pm: cpupower: remove hard-coded topology depth values
pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"From the functional perspective, the most significant changes here are
the ACPI fan driver update allowing it to handle fans with
fine-grained state checking supported, but without fine-grained
control, and the ACPI button driver update making it subscribe to
system event notifications (in addition to device notifications) which
on some systems is requisite for waking up the system from sleep.
The rest is fixes and cleanups including removal of some dead code.
Specifics:
- Use the str_on_off() helper function instead of hard-coded strings
in the ACPI power resources handling code (Thorsten Blum)
- Add fan speed reporting for ACPI fans that have _FST, but otherwise
do not support the entire ACPI 4 fan interface (Joshua Grisham)
- Fix a stale comment regarding trip points in acpi_thermal_add()
that diverged from the commented code after removing _CRT
evaluation from acpi_thermal_get_trip_points() (xueqin Luo)
- Make ACPI button driver also subscribe to system events (Mario
Limonciello)
- Use the str_yes_no() helper function instead of hard-coded strings
in the ACPI backlight (video) driver (Thorsten Blum)
- Add a missing header file include to the x86 arch CPPC code (Mario
Limonciello)
- Rework the sysfs attributes implementation in the ACPI
platform-profile driver and improve the unregistration code in it
(Nathan Chancellor, Kurt Borja)
- Prevent the ACPI HED driver from being built as a module and change
its initcall level to subsys_initcall to avoid initialization
ordering issues related to it (Xiaofei Tan)
- Update a maintainer email address in the ACPI PMIC entry in
MAINTAINERS (Mika Westerberg)
- Address a GCC 15's -Wunterminated-string-initialization warning in
the core PNP subsystem code and remove some dead code from it (Kees
Cook, David Alan Gilbert)"
* tag 'acpi-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PNP: Expand length of fixup id string
PNP: Remove prehistoric deadcode
ACPI: button: Install notifier for system events as well
ACPI: fan: Add fan speed reporting for fans with only _FST
ACPI: HED: Always initialize before evged
x86/ACPI: CPPC: Add missing include
ACPI: video: Use str_yes_no() helper in acpi_video_bus_add()
ACPI: platform_profile: Improve platform_profile_unregister()
ACPI: platform-profile: Fix CFI violation when accessing sysfs files
ACPI: power: Use str_on_off() helper function
ACPI: thermal: Fix stale comment regarding trip points
MAINTAINERS: Use my kernel.org address for ACPI PMIC work
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Add support for running as the root partition in Hyper-V (Microsoft
Hypervisor) by exposing /dev/mshv (Nuno and various people)
- Add support for CPU offlining in Hyper-V (Hamza Mahfooz)
- Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael
Kelley, Thorsten Blum)
* tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
x86/hyperv: fix an indentation issue in mshyperv.h
x86/hyperv: Add comments about hv_vpset and var size hypercall input args
Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs
hyperv: Add definitions for root partition driver to hv headers
x86: hyperv: Add mshv_handler() irq handler and setup function
Drivers: hv: Introduce per-cpu event ring tail
Drivers: hv: Export some functions for use by root partition module
acpi: numa: Export node_to_pxm()
hyperv: Introduce hv_recommend_using_aeoi()
arm64/hyperv: Add some missing functions to arm64
x86/mshyperv: Add support for extended Hyper-V features
hyperv: Log hypercall status codes as strings
x86/hyperv: Fix check of return value from snp_set_vmsa()
x86/hyperv: Add VTL mode callback for restarting the system
x86/hyperv: Add VTL mode emergency restart callback
hyperv: Remove unused union and structs
hyperv: Add CONFIG_MSHV_ROOT to gate root partition support
hyperv: Change hv_root_partition into a function
hyperv: Convert hypercall statuses to linux error codes
drivers/hv: add CPU offlining support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS update from Borislav Petkov:
- A cleanup to the MCE notification machinery
* tag 'ras_core_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/inject: Remove call to mce_notify_irq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- First part of the MPAM work: split the architectural part of resctrl
from the filesystem part so that ARM's MPAM varian of resource
control can be added later while sharing the user interface with x86
(James Morse)
* tag 'x86_cache_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86/resctrl: Move get_{mon,ctrl}_domain_from_cpu() to live with their callers
x86/resctrl: Move get_config_index() to a header
x86/resctrl: Handle throttle_mode for SMBA resources
x86/resctrl: Move RFTYPE flags to be managed by resctrl
x86/resctrl: Make resctrl_arch_pseudo_lock_fn() take a plr
x86/resctrl: Make prefetch_disable_bits belong to the arch code
x86/resctrl: Allow an architecture to disable pseudo lock
x86/resctrl: Add resctrl_arch_ prefix to pseudo lock functions
x86/resctrl: Move mbm_cfg_mask to struct rdt_resource
x86/resctrl: Move mba_mbps_default_event init to filesystem code
x86/resctrl: Change mon_event_config_{read,write}() to be arch helpers
x86/resctrl: Add resctrl_arch_is_evt_configurable() to abstract BMEC
x86/resctrl: Move the is_mbm_*_enabled() helpers to asm/resctrl.h
x86/resctrl: Rewrite and move the for_each_*_rdt_resource() walkers
x86/resctrl: Move monitor init work to a resctrl init call
x86/resctrl: Move monitor exit work to a resctrl exit call
x86/resctrl: Add an arch helper to reset one resource
x86/resctrl: Move resctrl types to a separate header
x86/resctrl: Move rdt_find_domain() to be visible to arch and fs code
x86/resctrl: Expose resctrl fs's init function to the rest of the kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 speculation mitigation updates from Borislav Petkov:
- Some preparatory work to convert the mitigations machinery to
mitigating attack vectors instead of single vulnerabilities
- Untangle and remove a now unneeded X86_FEATURE_USE_IBPB flag
- Add support for a Zen5-specific SRSO mitigation
- Cleanups and minor improvements
* tag 'x86_bugs_for_v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2
x86/bugs: Use the cpu_smt_possible() helper instead of open-coded code
x86/bugs: Add AUTO mitigations for mds/taa/mmio/rfds
x86/bugs: Relocate mds/taa/mmio/rfds defines
x86/bugs: Add X86_BUG_SPECTRE_V2_USER
x86/bugs: Remove X86_FEATURE_USE_IBPB
KVM: nVMX: Always use IBPB to properly virtualize IBRS
x86/bugs: Use a static branch to guard IBPB on vCPU switch
x86/bugs: Remove the X86_FEATURE_USE_IBPB check in ib_prctl_set()
x86/mm: Remove X86_FEATURE_USE_IBPB checks in cond_mitigation()
x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers
x86/bugs: KVM: Add support for SRSO_MSR_FIX
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Nothing major this time around.
Apart from the usual perf/PMU updates, some page table cleanups, the
notable features are average CPU frequency based on the AMUv1
counters, CONFIG_HOTPLUG_SMT and MOPS instructions (memcpy/memset) in
the uaccess routines.
Perf and PMUs:
- Support for the 'Rainier' CPU PMU from Arm
- Preparatory driver changes and cleanups that pave the way for BRBE
support
- Support for partial virtualisation of the Apple-M1 PMU
- Support for the second event filter in Arm CSPMU designs
- Minor fixes and cleanups (CMN and DWC PMUs)
- Enable EL2 requirements for FEAT_PMUv3p9
Power, CPU topology:
- Support for AMUv1-based average CPU frequency
- Run-time SMT control wired up for arm64 (CONFIG_HOTPLUG_SMT). It
adds a generic topology_is_primary_thread() function overridden by
x86 and powerpc
New(ish) features:
- MOPS (memcpy/memset) support for the uaccess routines
Security/confidential compute:
- Fix the DMA address for devices used in Realms with Arm CCA. The
CCA architecture uses the address bit to differentiate between
shared and private addresses
- Spectre-BHB: assume CPUs Linux doesn't know about vulnerable by
default
Memory management clean-ups:
- Drop the P*D_TABLE_BIT definition in preparation for 128-bit PTEs
- Some minor page table accessor clean-ups
- PIE/POE (permission indirection/overlay) helpers clean-up
Kselftests:
- MTE: skip hugetlb tests if MTE is not supported on such mappings
and user correct naming for sync/async tag checking modes
Miscellaneous:
- Add a PKEY_UNRESTRICTED definition as 0 to uapi (toolchain people
request)
- Sysreg updates for new register fields
- CPU type info for some Qualcomm Kryo cores"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (72 commits)
arm64: mm: Don't use %pK through printk
perf/arm_cspmu: Fix missing io.h include
arm64: errata: Add newer ARM cores to the spectre_bhb_loop_affected() lists
arm64: cputype: Add MIDR_CORTEX_A76AE
arm64: errata: Add KRYO 2XX/3XX/4XX silver cores to Spectre BHB safe list
arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB
arm64: errata: Add QCOM_KRYO_4XX_GOLD to the spectre_bhb_k24_list
arm64/sysreg: Enforce whole word match for open/close tokens
arm64/sysreg: Fix unbalanced closing block
arm64: Kconfig: Enable HOTPLUG_SMT
arm64: topology: Support SMT control on ACPI based system
arch_topology: Support SMT control for OF based system
cpu/SMT: Provide a default topology_is_primary_thread()
arm64/mm: Define PTDESC_ORDER
perf/arm_cspmu: Add PMEVFILT2R support
perf/arm_cspmu: Generalise event filtering
perf/arm_cspmu: Move register definitons to header
arm64/kernel: Always use level 2 or higher for early mappings
arm64/mm: Drop PXD_TABLE_BIT
arm64/mm: Check pmd_table() in pmd_trans_huge()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq driver updates from Thomas Gleixner:
- Support for hard indices on RISC-V. The hart index identifies a hart
(core) within a specific interrupt domain in RISC-V's Priviledged
Architecture.
- Rework of the RISC-V MSI driver
This moves the driver over to the generic MSI library and solves the
affinity problem of unmaskable PCI/MSI controllers. Unmaskable
PCI/MSI controllers are prone to lose interrupts when the MSI message
is updated to change the affinity because the message write consists
of three 32-bit subsequent writes, which update address and data. As
these writes are non-atomic versus the device raising an interrupt,
the device can observe a half written update and issue an interrupt
on the wrong vector. This is mitiated by a carefully orchestrated
step by step update and the observation of an eventually pending
interrupt on the CPU which issues the update. The algorithm follows
the well established method of the X86 MSI driver.
- A new driver for the RISC-V Sophgo SG2042 MSI controller
- Overhaul of the Renesas RZQ2L driver
Simplification of the probe function by using devm_*() mechanisms,
which avoid the endless list of error prone gotos in the failure
paths.
- Expand the Renesas RZV2H driver to support RZ/G3E SoCs
- A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to
ensure that the addressing is limited to the lower 32-bit of the
physical address space.
- Add support for the Allwinner AS23 NMI controller
- Expand the IMX irqsteer driver to handle up to 960 input interrupts
- The usual small updates, cleanups and device tree changes
* tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
irqchip/imx-irqsteer: Support up to 960 input interrupts
irqchip/sunxi-nmi: Support Allwinner A523 NMI controller
dt-bindings: irq: sun7i-nmi: Document the Allwinner A523 NMI controller
irqchip/davinci-cp-intc: Remove public header
irqchip/renesas-rzv2h: Add RZ/G3E support
irqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP}
irqchip/renesas-rzv2h: Update TSSR_TIEN macro
irqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info
irqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info
irqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable
irqchip/renesas-rzv2h: Use devm_pm_runtime_enable()
irqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted()
irqchip/renesas-rzv2h: Simplify rzv2h_icu_init()
irqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv
irqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type()
dt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC
riscv: sophgo: dts: Add msi controller for SG2042
irqchip: Add the Sophgo SG2042 MSI interrupt controller
dt-bindings: interrupt-controller: Add Sophgo SG2042 MSI
arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI
...
|
|
The actual serial output function is a no-op for now.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20250314173226.3062535-3-dwmw2@infradead.org
|
|
[ mingo: Minor readability edits ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20250314173226.3062535-2-dwmw2@infradead.org
|
|
The x86/cacheinfo code has been heavily refactored and fleshed out at
parent commits, where any necessary coding style fixes were also done
in place.
Apply Documentation/process/maintainer-tip.rst coding style fixes to the
rest of the code, and align its assignment expressions for readability.
Standardize on CPUID(n) when mentioning leaf queries.
Avoid breaking long lines when doing so helps readability.
At cacheinfo_amd_init_llc_id(), rename variable 'msb' to 'index_msb' as
this is how it's called at the rest of cacheinfo.c code.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250324133324.23458-30-darwi@linutronix.de
|
|
Multiple code paths at cacheinfo.c and amd_nb.c check for AMD/Hygon CPUs
L3 cache presensce by directly checking leaf 0x80000006 EDX output.
Extract that logic into its own function. While at it, rework the
AMD/Hygon LLC topology ID caclculation comments for clarity.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250324133324.23458-29-darwi@linutronix.de
|