Age | Commit message (Collapse) | Author |
|
Under VMware hypervisors, SEV-SNP enabled VMs are fundamentally able to boot
without UEFI, but this regressed a year ago due to:
0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests")
In this case, mpparse_find_mptable() has to be called to parse MP
tables which contains the necessary boot information.
[ mingo: Updated the changelog. ]
Fixes: 0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests")
Co-developed-by: Ye Li <ye.li@broadcom.com>
Signed-off-by: Ye Li <ye.li@broadcom.com>
Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ye Li <ye.li@broadcom.com>
Reviewed-by: Kevin Loughlin <kevinloughlin@google.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250313173111.10918-1-ajay.kaher@broadcom.com
|
|
Currently, load_microcode_amd() iterates over all NUMA nodes, retrieves their
CPU masks and unconditionally accesses per-CPU data for the first CPU of each
mask.
According to Documentation/admin-guide/mm/numaperf.rst:
"Some memory may share the same node as a CPU, and others are provided as
memory only nodes."
Therefore, some node CPU masks may be empty and wouldn't have a "first CPU".
On a machine with far memory (and therefore CPU-less NUMA nodes):
- cpumask_of_node(nid) is 0
- cpumask_first(0) is CONFIG_NR_CPUS
- cpu_data(CONFIG_NR_CPUS) accesses the cpu_info per-CPU array at an
index that is 1 out of bounds
This does not have any security implications since flashing microcode is
a privileged operation but I believe this has reliability implications by
potentially corrupting memory while flashing a microcode update.
When booting with CONFIG_UBSAN_BOUNDS=y on an AMD machine that flashes
a microcode update. I get the following splat:
UBSAN: array-index-out-of-bounds in arch/x86/kernel/cpu/microcode/amd.c:X:Y
index 512 is out of range for type 'unsigned long[512]'
[...]
Call Trace:
dump_stack
__ubsan_handle_out_of_bounds
load_microcode_amd
request_microcode_amd
reload_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
entry_SYSCALL_64_after_hwframe
Change the loop to go over only NUMA nodes which have CPUs before determining
whether the first CPU on the respective node needs microcode update.
[ bp: Massage commit message, fix typo. ]
Fixes: 7ff6edf4fef3 ("x86/microcode/AMD: Fix mixed steppings support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250310144243.861978-1-revest@chromium.org
|
|
The kernel requires X86_FEATURE_SGX_LC to be able to create SGX enclaves,
not just X86_FEATURE_SGX.
There is quite a number of hardware which has X86_FEATURE_SGX but not
X86_FEATURE_SGX_LC. A kernel running on such hardware does not create
the /dev/sgx_enclave file and does so silently.
Explicitly warn if X86_FEATURE_SGX_LC is not enabled to properly notify
users that the kernel disabled the SGX driver.
The X86_FEATURE_SGX_LC, a.k.a. SGX Launch Control, is a CPU feature
that enables LE (Launch Enclave) hash MSRs to be writable (with
additional opt-in required in the 'feature control' MSR) when running
enclaves, i.e. using a custom root key rather than the Intel proprietary
key for enclave signing.
I've hit this issue myself and have spent some time researching where
my /dev/sgx_enclave file went on SGX-enabled hardware.
Related links:
https://github.com/intel/linux-sgx/issues/837
https://patchwork.kernel.org/project/platform-driver-x86/patch/20180827185507.17087-3-jarkko.sakkinen@linux.intel.com/
[ mingo: Made the error message a bit more verbose, and added other cases
where the kernel fails to create the /dev/sgx_enclave device node. ]
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250309172215.21777-2-vdronov@redhat.com
|
|
Add some more forgotten models to the SHA check.
Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix CPUID leaf 0x2 parsing bugs
- Sanitize very early boot parameters to avoid crash
- Fix size overflows in the SGX code
- Make CALL_NOSPEC use consistent
* tag 'x86-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Sanitize boot params before parsing command line
x86/sgx: Fix size overflows in sgx_encl_create()
x86/cpu: Properly parse CPUID leaf 0x2 TLB descriptor 0x63
x86/cpu: Validate CPUID leaf 0x2 EDX output
x86/cacheinfo: Validate CPUID leaf 0x2 EDX output
x86/speculation: Add a conditional CS prefix to CALL_NOSPEC
x86/speculation: Simplify and make CALL_NOSPEC consistent
|
|
Xen doesn't offer MSR_FAM10H_MMIO_CONF_BASE to all guests. This results
in the following warning:
unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
Call Trace:
xen_read_msr+0x1e/0x30
amd_get_mmconfig_range+0x2b/0x80
quirk_amd_mmconfig_area+0x28/0x100
pnp_fixup_device+0x39/0x50
__pnp_add_device+0xf/0x150
pnp_add_device+0x3d/0x100
pnpacpi_add_device_handler+0x1f9/0x280
acpi_ns_get_device_callback+0x104/0x1c0
acpi_ns_walk_namespace+0x1d0/0x260
acpi_get_devices+0x8a/0xb0
pnpacpi_init+0x50/0x80
do_one_initcall+0x46/0x2e0
kernel_init_freeable+0x1da/0x2f0
kernel_init+0x16/0x1b0
ret_from_fork+0x30/0x50
ret_from_fork_asm+0x1b/0x30
based on quirks for a "PNP0c01" device. Treating MMCFG as disabled is the
right course of action, so no change is needed there.
This was most likely exposed by fixing the Xen MSR accessors to not be
silently-safe.
Fixes: 3fac3734c43a ("xen/pv: support selecting safe/unsafe msr accesses")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250307002846.3026685-1-andrew.cooper3@citrix.com
|
|
The total size calculated for EPC can overflow u64 given the added up page
for SECS. Further, the total size calculated for shmem can overflow even
when the EPC size stays within limits of u64, given that it adds the extra
space for 128 byte PCMD structures (one for each page).
Address this by pre-evaluating the micro-architectural requirement of
SGX: the address space size must be power of two. This is eventually
checked up by ECREATE but the pre-check has the additional benefit of
making sure that there is some space for additional data.
Fixes: 888d24911787 ("x86/sgx: Add SGX_IOC_ENCLAVE_CREATE")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250305050006.43896-1-jarkko@kernel.org
Closes: https://lore.kernel.org/linux-sgx/c87e01a0-e7dd-4749-a348-0980d3444f04@stanley.mountain/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull AMD microcode loading fixes from Borislav Petkov:
- Load only sha256-signed microcode patch blobs
- Other good cleanups
* tag 'x86_microcode_for_v6.14_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Load only SHA256-checksummed patches
x86/microcode/AMD: Add get_patch_level()
x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration
x86/microcode/AMD: Merge early_apply_microcode() into its single callsite
x86/microcode/AMD: Remove unused save_microcode_in_initrd_amd() declarations
x86/microcode/AMD: Remove ugly linebreak in __verify_patch_section() signature
|
|
CPUID leaf 0x2's one-byte TLB descriptors report the number of entries
for specific TLB types, among other properties.
Typically, each emitted descriptor implies the same number of entries
for its respective TLB type(s). An emitted 0x63 descriptor is an
exception: it implies 4 data TLB entries for 1GB pages and 32 data TLB
entries for 2MB or 4MB pages.
For the TLB descriptors parsing code, the entry count for 1GB pages is
encoded at the intel_tlb_table[] mapping, but the 2MB/4MB entry count is
totally ignored.
Update leaf 0x2's parsing logic 0x2 to account for 32 data TLB entries
for 2MB/4MB pages implied by the 0x63 descriptor.
Fixes: e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of CPU")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250304085152.51092-4-darwi@linutronix.de
|
|
CPUID leaf 0x2 emits one-byte descriptors in its four output registers
EAX, EBX, ECX, and EDX. For these descriptors to be valid, the most
significant bit (MSB) of each register must be clear.
Leaf 0x2 parsing at intel.c only validated the MSBs of EAX, EBX, and
ECX, but left EDX unchecked.
Validate EDX's most-significant bit as well.
Fixes: e0ba94f14f74 ("x86/tlb_info: get last level TLB entry number of CPU")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250304085152.51092-3-darwi@linutronix.de
|
|
CPUID leaf 0x2 emits one-byte descriptors in its four output registers
EAX, EBX, ECX, and EDX. For these descriptors to be valid, the most
significant bit (MSB) of each register must be clear.
The historical Git commit:
019361a20f016 ("- pre6: Intel: start to add Pentium IV specific stuff (128-byte cacheline etc)...")
introduced leaf 0x2 output parsing. It only validated the MSBs of EAX,
EBX, and ECX, but left EDX unchecked.
Validate EDX's most-significant bit.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250304085152.51092-2-darwi@linutronix.de
|
|
When both of X86_LOCAL_APIC and X86_THERMAL_VECTOR are disabled,
the irq tracing produces a W=1 build warning for the tracing
definitions:
In file included from include/trace/trace_events.h:27,
from include/trace/define_trace.h:113,
from arch/x86/include/asm/trace/irq_vectors.h:383,
from arch/x86/kernel/irq.c:29:
include/trace/stages/init.h:2:23: error: 'str__irq_vectors__trace_system_name' defined but not used [-Werror=unused-const-variable=]
Make the tracepoints conditional on the same symbosl that guard
their usage.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250225213236.3141752-1-arnd@kernel.org
|
|
I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time. I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.
A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:
[ 0.081615] Working around Cyrix MediaGX virtual DMA bugs.
If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:
[ 0.081615] Working around Cyrix MediaGX virtual DMA bugs.
[ 0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
[ 0.100000] Enable Memory access reorder on Cyrix/NSC processor.
[ 0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
[...]
In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).
Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.
https://github.com/openwrt/openwrt/commit/07aaa7e3d62ad32767d7067107db64b6ade81537
Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5
So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.
Presumably, the warm boot hang is due to some register not getting
cleared in the same way that a loss of power does. That is
approximately as much as I understand about the problem.
More poking/prodding and coaching from Jonas Gorski, it looks
like this test patch fixes the problem on my board: Tested against
v6.6.67 and v4.14.113.
Fixes: 18fb053f9b82 ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
There are cases when it is useful to use both ACPI and DTB provided by
the bootloader, however in such cases we should make sure to prevent
conflicts between the two. Namely, don't try to use DTB for SMP setup
if ACPI is enabled.
Precisely, this prevents at least:
- incorrectly calling register_lapic_address(APIC_DEFAULT_PHYS_BASE)
after the LAPIC was already successfully enumerated via ACPI, causing
noisy kernel warnings and probably potential real issues as well
- failed IOAPIC setup in the case when IOAPIC is enumerated via mptable
instead of ACPI (e.g. with acpi=noirq), due to
mpparse_parse_smp_config() overridden by x86_dtb_parse_smp_config()
Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250105172741.3476758-2-dmaluka@chromium.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix AVX-VNNI CPU feature dependency bug triggered via the 'noxsave'
boot option
- Fix typos in the SVA documentation
- Add Tony Luck as RDT co-maintainer and remove Fenghua Yu
* tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
docs: arch/x86/sva: Fix two grammar errors under Background and FAQ
x86/cpufeatures: Make AVX-VNNI depend on AVX
MAINTAINERS: Change maintainer for RDT
|
|
Load patches for which the driver carries a SHA256 checksum of the patch
blob.
This can be disabled by adding "microcode.amd_sha_check=off" on the
kernel cmdline. But it is highly NOT recommended.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The 'noxsave' boot option disables support for AVX, but support for the
AVX-VNNI feature was still declared on CPUs that support it. Fix this.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250220060124.89622-1-ebiggers@kernel.org
|
|
Put the MSR_AMD64_PATCH_LEVEL reading of the current microcode revision
the hw has, into a separate function.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250211163648.30531-6-bp@kernel.org
|
|
Simply move save_microcode_in_initrd() down.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250211163648.30531-5-bp@kernel.org
|
|
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250211163648.30531-4-bp@kernel.org
|
|
Commit
a7939f016720 ("x86/microcode/amd: Cache builtin/initrd microcode early")
renamed it to save_microcode_in_initrd() and made it static. Zap the
forgotten declarations.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250211163648.30531-3-bp@kernel.org
|
|
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250211163648.30531-2-bp@kernel.org
|
|
In [1] the meaning of the synthetic IBPB flags has been redefined for a
better separation of concerns:
- ENTRY_IBPB -- issue IBPB on entry only
- IBPB_ON_VMEXIT -- issue IBPB on VM-Exit only
and the Retbleed mitigations have been updated to match this new
semantics.
Commit [2] was merged shortly before [1], and their interaction was not
handled properly. This resulted in IBPB not being triggered on VM-Exit
in all SRSO mitigation configs requesting an IBPB there.
Specifically, an IBPB on VM-Exit is triggered only when
X86_FEATURE_IBPB_ON_VMEXIT is set. However:
- X86_FEATURE_IBPB_ON_VMEXIT is not set for "spec_rstack_overflow=ibpb",
because before [1] having X86_FEATURE_ENTRY_IBPB was enough. Hence,
an IBPB is triggered on entry but the expected IBPB on VM-exit is
not.
- X86_FEATURE_IBPB_ON_VMEXIT is not set also when
"spec_rstack_overflow=ibpb-vmexit" if X86_FEATURE_ENTRY_IBPB is
already set.
That's because before [1] this was effectively redundant. Hence, e.g.
a "retbleed=ibpb spec_rstack_overflow=bpb-vmexit" config mistakenly
reports the machine still vulnerable to SRSO, despite an IBPB being
triggered both on entry and VM-Exit, because of the Retbleed selected
mitigation config.
- UNTRAIN_RET_VM won't still actually do anything unless
CONFIG_MITIGATION_IBPB_ENTRY is set.
For "spec_rstack_overflow=ibpb", enable IBPB on both entry and VM-Exit
and clear X86_FEATURE_RSB_VMEXIT which is made superfluous by
X86_FEATURE_IBPB_ON_VMEXIT. This effectively makes this mitigation
option similar to the one for 'retbleed=ibpb', thus re-order the code
for the RETBLEED_MITIGATION_IBPB option to be less confusing by having
all features enabling before the disabling of the not needed ones.
For "spec_rstack_overflow=ibpb-vmexit", guard this mitigation setting
with CONFIG_MITIGATION_IBPB_ENTRY to ensure UNTRAIN_RET_VM sequence is
effectively compiled in. Drop instead the CONFIG_MITIGATION_SRSO guard,
since none of the SRSO compile cruft is required in this configuration.
Also, check only that the required microcode is present to effectively
enabled the IBPB on VM-Exit.
Finally, update the KConfig description for CONFIG_MITIGATION_IBPB_ENTRY
to list also all SRSO config settings enabled by this guard.
Fixes: 864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM") [1]
Fixes: d893832d0e1e ("x86/srso: Add IBPB on VMEXIT") [2]
Reported-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Patrick Bellasi <derkling@google.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
- The biggest changes are the TLB flushing scalability optimizations,
to update the mm_cpumask lazily and related changes.
This feature has both a track record and a continued risk of
performance regressions, so it was already delayed by a cycle - but
it's all 100% perfect now™ (Rik van Riel)
- Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill
Shutemov, Sebastian Andrzej Siewior)
* tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove unnecessary include of <linux/extable.h>
x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
x86/mm/selftests: Fix typo in lam.c
x86/mm/tlb: Only trim the mm_cpumask once a second
x86/mm/tlb: Also remove local CPU from mm_cpumask if stale
x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
x86/mm/tlb: Update mm_cpumask lazily
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Not much this cycle, there are multiple small fixes.
Core:
- use boolean values with device_init_wakeup()
Drivers:
- pcf2127: add BSM support
- pcf85063: fix possible out of bounds write"
* tag 'rtc-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: pcf2127: add BSM support
rtc: Remove hpet_rtc_dropped_irq()
dt-bindings: rtc: mxc: Document fsl,imx31-rtc
rtc: stm32: Use syscon_regmap_lookup_by_phandle_args
rtc: zynqmp: Fix optional clock name property
rtc: loongson: clear TOY_MATCH0_REG in loongson_rtc_isr()
rtc: pcf85063: fix potential OOB write in PCF85063 NVMEM read
rtc: tps6594: Fix integer overflow on 32bit systems
rtc: use boolean values with device_init_wakeup()
rtc: RTC_DRV_SPEAR should not default to y when compile-testing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Add a new ACPI-related quirk for Vexia EDU ATLA 10 tablet 5V (Hans de
Goede) and fix the MADT parsing code so that CPUs with different entry
types (LAPIC and x2APIC) are initialized in the order in which they
appear in the MADT as required by the ACPI specification (Zhang Rui)"
* tag 'acpi-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/acpi: Fix LAPIC/x2APIC parsing order
ACPI: x86: Add skip i2c clients quirk for Vexia EDU ATLA 10 tablet 5V
|
|
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.
Created this by running an spatch followed by a sed command:
Spatch:
virtual patch
@
depends on !(file in "net")
disable optional_qualifier
@
identifier table_name != {
watchdog_hardlockup_sysctl,
iwcm_ctl_table,
ucma_ctl_table,
memory_allocation_profiling_sysctls,
loadpin_sysctl_table
};
@@
+ const
struct ctl_table table_name [] = { ... };
sed:
sed --in-place \
-e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
kernel/utsname_sysctl.c
Reviewed-by: Song Liu <song@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for kernel/trace/
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> # SCSI
Reviewed-by: Darrick J. Wong <djwong@kernel.org> # xfs
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes
the page allocator so we end up with the ability to allocate and
free zero-refcount pages. So that callers (ie, slab) can avoid a
refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to
use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance
and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part
of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a
few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and
a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo
Stoakes continues the work of moving vma-related code into the
(relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David
Hildenbrand cleans up and rationalizes handling of gfp flags in the
page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan
Kara is a second attempt at fixing a readahead window sizing issue.
It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng
addresses an issue where "huge" amounts of pte pagetables are
accumulated:
https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
Qi's series addresses this windup by synchronously freeing PTE
memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from
Muhammad Usama Anjum fixes some build warnings in the selftests
code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from
David Hildenbrand tightens the allocator's observance of
__GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements
various fixes and cleanups in the MM selftests code, mainly
pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to
estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn
provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song
removes the global swap cgroup lock. A speedup of 10% for a
tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky
has several fixes and cleaups for zram in the area of
zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin
Brodsky cleans up the pagetable destructor implementations. A rare
use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes
simplifies and cleans up the debugging code in the VMA merging
logic
- "Account page tables at all levels" from Kevin Brodsky cleans up
and regularizes the pagetable ctor/dtor handling. This results in
improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new
core functions" from SeongJae Park cleans up and generalizes
DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from
SeongJae Park increases the amount of information which is
presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park
removes DAMON's long-deprecated debugfs interfaces. Thus the
migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from
Peter Xu cleans up and generalizes the hugetlb reservation
accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino
removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park
extends DAMOS filters to support not only exclusion (rejecting),
but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi
introduces a new memory descriptor for zswap.zpool that currently
overlaps with struct page for now. This is part of the effort to
reduce the size of struct page and to enable dynamic allocation of
memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes
and simplifies the swap allocator locking. A speedup of 400% was
demonstrated for one workload. As was a 35% reduction for kernel
build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal"
from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that
mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few
MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae
Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David
Hildenbrand provides various cleanups in the areas of hugetlb
folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new
RWF_DONTCACHE flag which provides synchronous dropbehind for
pagecache reading and writing. To permite userspace to address
issues with massive buildup of useless pagecache when
reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas
Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm/compaction: fix UBSAN shift-out-of-bounds warning
s390/mm: add missing ctor/dtor on page table upgrade
kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
tools: add VM_WARN_ON_VMG definition
mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
seqlock: add missing parameter documentation for raw_seqcount_try_begin()
mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
mm/page_alloc: remove the incorrect and misleading comment
zram: remove zcomp_stream_put() from write_incompressible_page()
mm: separate move/undo parts from migrate_pages_batch()
mm/kfence: use str_write_read() helper in get_access_type()
selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
selftests/mm: vm_util: split up /proc/self/smaps parsing
selftests/mm: virtual_address_range: unmap chunks after validation
selftests/mm: virtual_address_range: mmap() without PROT_WRITE
selftests/memfd/memfd_test: fix possible NULL pointer dereference
mm: add FGP_DONTCACHE folio creation flag
mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series
in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation"
from Kuan-Wei Chiu provides various tightenings to the min_heap
library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms
some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven
fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses
the new secs_to_jiffies() in various places where that is
appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen
switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang
Shao removes now-unneeded calls to get_task_comm() in various
places
- "squashfs: reduce memory usage and update docs" from Phillip
Lougher implements some memory savings in squashfs and performs
some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu
tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from
Ryusuke Konishi fixes an issues in nlifs when the fs is presented
with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from
Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi
addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does
some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance
work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
ocfs2: use str_yes_no() and str_no_yes() helper functions
include/linux/lz4.h: add some missing macros
Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
Xarray: remove repeat check in xas_squash_marks()
Xarray: distinguish large entries correctly in xas_split_alloc()
Xarray: move forward index correctly in xas_pause()
Xarray: do not return sibling entries from xas_find_marked()
ipc/util.c: complete the kernel-doc function descriptions
gcov: clang: use correct function param names
latencytop: use correct kernel-doc format for func params
minmax.h: remove some #defines that are only expanded once
minmax.h: simplify the variants of clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: update some comments
minmax.h: add whitespace around operators and after commas
nilfs2: do not update mtime of renamed directory that is not moved
nilfs2: handle errors that nilfs_prepare_chunk() may return
CREDITS: fix spelling mistake
...
|
|
Before SLUB initialization, various subsystems used memblock_alloc to
allocate memory. In most cases, when memory allocation fails, an
immediate panic is required. To simplify this behavior and reduce
repetitive checks, introduce `memblock_alloc_or_panic`. This function
ensures that memory allocation failures result in a panic automatically,
improving code readability and consistency across subsystems that require
this behavior.
[guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic]
Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com
Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/
Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com
Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390]
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
table pages can be freed together (regardless of whether RCU is used).
This prevents the use-after-free problem where the ptlock is freed
immediately but the page table pages is freed later via RCU.
Link: https://lkml.kernel.org/r/27b3cdc8786bebd4f748380bf82f796482718504.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Convert __tlb_remove_table() to use struct ptdesc, which will help to move
pagetable_dtor() to __tlb_remove_table().
And page tables shouldn't have swap cache, so use pagetable_free() instead
of free_page_and_swap_cache() to free page table pages.
Link: https://lkml.kernel.org/r/39f60f93143ff77cf5d6b3c3e75af0ffc1480adb.1736317725.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Introduce a new set of Hyper-V headers in include/hyperv and replace
the old hyperv-tlfs.h with the new headers (Nuno Das Neves)
- Fixes for the Hyper-V VTL mode (Roman Kisel)
- Fixes for cpu mask usage in Hyper-V code (Michael Kelley)
- Document the guest VM hibernation behaviour (Michael Kelley)
- Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain)
* tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Documentation: hyperv: Add overview of guest VM hibernation
hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()
hyperv: Do not overlap the hvcall IO areas in get_vtl()
hyperv: Enable the hypercall output page for the VTL mode
hv_balloon: Fallback to generic_online_page() for non-HV hot added mem
Drivers: hv: vmbus: Log on missing offers if any
Drivers: hv: vmbus: Wait for boot-time offers during boot and resume
uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup
iommu/hyper-v: Don't assume cpu_possible_mask is dense
Drivers: hv: Don't assume cpu_possible_mask is dense
x86/hyperv: Don't assume cpu_possible_mask is dense
hyperv: Remove the now unused hyperv-tlfs.h files
hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h
hyperv: Add new Hyper-V headers in include/hyperv
hyperv: Clean up unnecessary #includes
hyperv: Move hv_connection_id to hyperv-tlfs.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
- A large and involved preparatory series to pave the way to add
exception handling for relocate_kernel - which will be a debugging
facility that has aided in the field to debug an exceptionally hard
to debug early boot bug. Plus assorted cleanups and fixes that were
discovered along the way, by David Woodhouse:
- Clean up and document register use in relocate_kernel_64.S
- Use named labels in swap_pages in relocate_kernel_64.S
- Only swap pages for ::preserve_context mode
- Allocate PGD for x86_64 transition page tables separately
- Copy control page into place in machine_kexec_prepare()
- Invoke copy of relocate_kernel() instead of the original
- Move relocate_kernel to kernel .data section
- Add data section to relocate_kernel
- Drop page_list argument from relocate_kernel()
- Eliminate writes through kernel mapping of relocate_kernel page
- Clean up register usage in relocate_kernel()
- Mark relocate_kernel page as ROX instead of RWX
- Disable global pages before writing to control page
- Ensure preserve_context flag is set on return to kernel
- Use correct swap page in swap_pages function
- Fix stack and handling of re-entry point for ::preserve_context
- Mark machine_kexec() with __nocfi
- Cope with relocate_kernel() not being at the start of the page
- Use typedef for relocate_kernel_fn function prototype
- Fix location of relocate_kernel with -ffunction-sections (fix by Nathan Chancellor)
- A series to remove the last remaining absolute symbol references from
.head.text, and enforce this at build time, by Ard Biesheuvel:
- Avoid WARN()s and panic()s in early boot code
- Don't hang but terminate on failure to remap SVSM CA
- Determine VA/PA offset before entering C code
- Avoid intentional absolute symbol references in .head.text
- Disable UBSAN in early boot code
- Move ENTRY_TEXT to the start of the image
- Move .head.text into its own output section
- Reject absolute references in .head.text
- The above build-time enforcement uncovered a handful of bugs of
essentially non-working code, and a wrokaround for a toolchain bug,
fixed by Ard Biesheuvel as well:
- Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
- Disable UBSAN on SEV code that may execute very early
- Disable ftrace branch profiling in SEV startup code
- And miscellaneous cleanups:
- kexec_core: Add and update comments regarding the KEXEC_JUMP flow (Rafael J. Wysocki)
- x86/sysfs: Constify 'struct bin_attribute' (Thomas Weißschuh)"
* tag 'x86-boot-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
x86/sev: Disable ftrace branch profiling in SEV startup code
x86/kexec: Use typedef for relocate_kernel_fn function prototype
x86/kexec: Cope with relocate_kernel() not being at the start of the page
kexec_core: Add and update comments regarding the KEXEC_JUMP flow
x86/kexec: Mark machine_kexec() with __nocfi
x86/kexec: Fix location of relocate_kernel with -ffunction-sections
x86/kexec: Fix stack and handling of re-entry point for ::preserve_context
x86/kexec: Use correct swap page in swap_pages function
x86/kexec: Ensure preserve_context flag is set on return to kernel
x86/kexec: Disable global pages before writing to control page
x86/sev: Don't hang but terminate on failure to remap SVSM CA
x86/sev: Disable UBSAN on SEV code that may execute very early
x86/boot/64: Fix spurious undefined reference when CONFIG_X86_5LEVEL=n, on GCC-12
x86/sysfs: Constify 'struct bin_attribute'
x86/kexec: Mark relocate_kernel page as ROX instead of RWX
x86/kexec: Clean up register usage in relocate_kernel()
x86/kexec: Eliminate writes through kernel mapping of relocate_kernel page
x86/kexec: Drop page_list argument from relocate_kernel()
x86/kexec: Add data section to relocate_kernel
x86/kexec: Move relocate_kernel to kernel .data section
...
|
|
On some systems, the same CPU (with the same APIC ID) is assigned a
different logical CPU id after commit ec9aedb2aa1a ("x86/acpi: Ignore
invalid x2APIC entries").
This means that Linux enumerates the CPUs in a different order, which
violates ACPI specification[1] that states:
"OSPM should initialize processors in the order that they appear in
the MADT"
The problematic commit parses all LAPIC entries before any x2APIC
entries, aiming to ignore x2APIC entries with APIC ID < 255 when valid
LAPIC entries exist. However, it disrupts the CPU enumeration order on
systems where x2APIC entries precede LAPIC entries in the MADT.
Fix this problem by:
1) Parsing LAPIC entries first without registering them in the
topology to evaluate whether valid LAPIC entries exist.
2) Restoring the MADT in order parser which invokes either the LAPIC
or the X2APIC parser function depending on the entry type.
The X2APIC parser still ignores entries < 0xff in case that #1 found
valid LAPIC entries independent of their position in the MADT table.
Link: https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#madt-processor-local-apic-sapic-structure-entry-order
Cc: All applicable <stable@vger.kernel.org>
Reported-by: Jim Mattson <jmattson@google.com>
Closes: https://lore.kernel.org/all/20241010213136.668672-1-jmattson@google.com/
Fixes: ec9aedb2aa1a ("x86/acpi: Ignore invalid x2APIC entries")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20250117081420.4046737-1-rui.zhang@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace updates from Steven Rostedt:
- Have fprobes built on top of function graph infrastructure
The fprobe logic is an optimized kprobe that uses ftrace to attach to
functions when a probe is needed at the start or end of the function.
The fprobe and kretprobe logic implements a similar method as the
function graph tracer to trace the end of the function. That is to
hijack the return address and jump to a trampoline to do the trace
when the function exits. To do this, a shadow stack needs to be
created to store the original return address. Fprobes and function
graph do this slightly differently. Fprobes (and kretprobes) has
slots per callsite that are reserved to save the return address. This
is fine when just a few points are traced. But users of fprobes, such
as BPF programs, are starting to add many more locations, and this
method does not scale.
The function graph tracer was created to trace all functions in the
kernel. In order to do this, when function graph tracing is started,
every task gets its own shadow stack to hold the return address that
is going to be traced. The function graph tracer has been updated to
allow multiple users to use its infrastructure. Now have fprobes be
one of those users. This will also allow for the fprobe and kretprobe
methods to trace the return address to become obsolete. With new
technologies like CFI that need to know about these methods of
hijacking the return address, going toward a solution that has only
one method of doing this will make the kernel less complex.
- Cleanup with guard() and free() helpers
There were several places in the code that had a lot of "goto out" in
the error paths to either unlock a lock or free some memory that was
allocated. But this is error prone. Convert the code over to use the
guard() and free() helpers that let the compiler unlock locks or free
memory when the function exits.
- Remove disabling of interrupts in the function graph tracer
When function graph tracer was first introduced, it could race with
interrupts and NMIs. To prevent that race, it would disable
interrupts and not trace NMIs. But the code has changed to allow NMIs
and also interrupts. This change was done a long time ago, but the
disabling of interrupts was never removed. Remove the disabling of
interrupts in the function graph tracer is it is not needed. This
greatly improves its performance.
- Allow the :mod: command to enable tracing module functions on the
kernel command line.
The function tracer already has a way to enable functions to be
traced in modules by writing ":mod:<module>" into set_ftrace_filter.
That will enable either all the functions for the module if it is
loaded, or if it is not, it will cache that command, and when the
module is loaded that matches <module>, its functions will be
enabled. This also allows init functions to be traced. But currently
events do not have that feature.
Because enabling function tracing can be done very early at boot up
(before scheduling is enabled), the commands that can be done when
function tracing is started is limited. Having the ":mod:" command to
trace module functions as they are loaded is very useful. Update the
kernel command line function filtering to allow it.
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
ftrace: Implement :mod: cache filtering on kernel command line
tracing: Adopt __free() and guard() for trace_fprobe.c
bpf: Use ftrace_get_symaddr() for kprobe_multi probes
ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr
Documentation: probes: Update fprobe on function-graph tracer
selftests/ftrace: Add a test case for repeating register/unregister fprobe
selftests: ftrace: Remove obsolate maxactive syntax check
tracing/fprobe: Remove nr_maxactive from fprobe
fprobe: Add fprobe_header encoding feature
fprobe: Rewrite fprobe on function-graph tracer
s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
tracing: Add ftrace_fill_perf_regs() for perf event
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
fprobe: Use ftrace_regs in fprobe exit handler
fprobe: Use ftrace_regs in fprobe entry handler
fgraph: Pass ftrace_regs to retfunc
fgraph: Replace fgraph_ret_regs with ftrace_regs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt subsystem updates from Thomas Gleixner:
- Consolidate the machine_kexec_mask_interrupts() by providing a
generic implementation and replacing the copy & pasta orgy in the
relevant architectures.
- Prevent unconditional operations on interrupt chips during kexec
shutdown, which can trigger warnings in certain cases when the
underlying interrupt has been shut down before.
- Make the enforcement of interrupt handling in interrupt context
unconditionally available, so that it actually works for non x86
related interrupt chips. The earlier enablement for ARM GIC chips set
the required chip flag, but did not notice that the check was hidden
behind a config switch which is not selected by ARM[64].
- Decrapify the handling of deferred interrupt affinity setting.
Some interrupt chips require that affinity changes are made from the
context of handling an interrupt to avoid certain race conditions.
For x86 this was the default, but with interrupt remapping this
requirement was lifted and a flag was introduced which tells the core
code that affinity changes can be done in any context. Unrestricted
affinity changes are the default for the majority of interrupt chips.
RISCV has the requirement to add the deferred mode to one of it's
interrupt controllers, but with the original implementation this
would require to add the any context flag to all other RISC-V
interrupt chips. That's backwards, so reverse the logic and require
that chips, which need the deferred mode have to be marked
accordingly. That avoids chasing the 'sane' chips and marking them.
- Add multi-node support to the Loongarch AVEC interrupt controller
driver.
- The usual tiny cleanups, fixes and improvements all over the place.
* tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set()
genirq/timings: Add kernel-doc for a function parameter
genirq: Remove IRQ_MOVE_PCNTXT and related code
x86/apic: Convert to IRQCHIP_MOVE_DEFERRED
genirq: Provide IRQCHIP_MOVE_DEFERRED
hexagon: Remove GENERIC_PENDING_IRQ leftover
ARC: Remove GENERIC_PENDING_IRQ
genirq: Remove handle_enforce_irqctx() wrapper
genirq: Make handle_enforce_irqctx() unconditionally available
irqchip/loongarch-avec: Add multi-nodes topology support
irqchip/ts4800: Replace seq_printf() by seq_puts()
irqchip/ti-sci-inta : Add module build support
irqchip/ti-sci-intr: Add module build support
irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function
irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args
genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown
kexec: Consolidate machine_kexec_mask_interrupts() implementation
genirq: Reuse irq_thread_fn() for forced thread case
genirq: Move irq_thread_fn() further up in the code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Fair scheduler (SCHED_FAIR) enhancements:
- Behavioral improvements:
- Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)
- Delayed-dequeue enhancements & fixes: (Vincent Guittot)
- Rename h_nr_running into h_nr_queued
- Add new cfs_rq.h_nr_runnable
- Use the new cfs_rq.h_nr_runnable
- Removed unsued cfs_rq.h_nr_delayed
- Rename cfs_rq.idle_h_nr_running into h_nr_idle
- Remove unused cfs_rq.idle_nr_running
- Rename cfs_rq.nr_running into nr_queued
- Do not try to migrate delayed dequeue task
- Fix variable declaration position
- Encapsulate set custom slice in a __setparam_fair() function
- Fixes:
- Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
- Fix CPU bandwidth limit bypass during CPU hotplug (Vishal
Chourasia)
- Cleanups:
- Clean up in migrate_degrades_locality() to improve readability
(Peter Zijlstra)
- Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
- Update comments after sched_tick() rename (Sebastian Andrzej
Siewior)
- Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
(Valentin Schneider)
Deadline scheduler (SCHED_DL) enhancements:
- Restore dl_server bandwidth on non-destructive root domain changes
(Juri Lelli)
- Correctly account for allocated bandwidth during hotplug (Juri
Lelli)
- Check bandwidth overflow earlier for hotplug (Juri Lelli)
- Clean up goto label in pick_earliest_pushable_dl_task() (John
Stultz)
- Consolidate timer cancellation (Wander Lairson Costa)
Load-balancer enhancements:
- Improve performance by prioritizing migrating eligible tasks in
sched_balance_rq() (Hao Jia)
- Do not compute NUMA Balancing stats unnecessarily during
load-balancing (K Prateek Nayak)
- Do not compute overloaded status unnecessarily during
load-balancing (K Prateek Nayak)
Generic scheduling code enhancements:
- Use READ_ONCE() in task_on_rq_queued(), to consistently use the
WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)
Isolated CPUs support enhancements: (Waiman Long)
- Make "isolcpus=nohz" equivalent to "nohz_full"
- Consolidate housekeeping cpumasks that are always identical
- Remove HK_TYPE_SCHED
- Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE
RSEQ enhancements:
- Validate read-only fields under DEBUG_RSEQ config (Mathieu
Desnoyers)
PSI enhancements:
- Fix race when task wakes up before psi_sched_switch() adjusts flags
(Chengming Zhou)
IRQ time accounting performance enhancements: (Yafang Shao)
- Define sched_clock_irqtime as static key
- Don't account irq time if sched_clock_irqtime is disabled
Virtual machine scheduling enhancements:
- Don't try to catch up excess steal time (Suleiman Souhlal)
Heterogenous x86 CPU scheduling enhancements: (K Prateek Nayak)
- Convert "sysctl_sched_itmt_enabled" to boolean
- Use guard() for itmt_update_mutex
- Move the "sched_itmt_enabled" sysctl to debugfs
- Remove x86_smt_flags and use cpu_smt_flags directly
- Use x86_sched_itmt_flags for PKG domain unconditionally
Debugging code & instrumentation enhancements:
- Change need_resched warnings to pr_err() (David Rientjes)
- Print domain name in /proc/schedstat (K Prateek Nayak)
- Fix value reported by hot tasks pulled in /proc/schedstat (Peter
Zijlstra)
- Report the different kinds of imbalances in /proc/schedstat
(Swapnil Sapkal)
- Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
- Update Schedstat version to 17 (Swapnil Sapkal)"
* tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rseq: Fix rseq unregistration regression
psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
sched, psi: Don't account irq time if sched_clock_irqtime is disabled
sched: Don't account irq time if sched_clock_irqtime is disabled
sched: Define sched_clock_irqtime as static key
sched/fair: Do not compute overloaded status unnecessarily during lb
sched/fair: Do not compute NUMA Balancing stats unnecessarily during lb
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
sched/core: Prioritize migrating eligible tasks in sched_balance_rq()
sched/debug: Change need_resched warnings to pr_err
sched/fair: Encapsulate set custom slice in a __setparam_fair() function
sched: Fix race between yield_to() and try_to_wake_up()
docs: Update Schedstat version to 17
sched/stats: Print domain name in /proc/schedstat
sched: Move sched domain name out of CONFIG_SCHED_DEBUG
sched: Report the different kinds of imbalances in /proc/schedstat
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Miscellaneous x86 cleanups and typo fixes, and also the removal of
the 'disablelapic' boot parameter"
* tag 'x86-cleanups-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Remove a stray tab in the IO-APIC type string
x86/cpufeatures: Remove "AMD" from the comments to the AMD-specific leaf
Documentation/kernel-parameters: Fix a typo in kvm.enable_virt_at_load text
x86/cpu: Fix typo in x86_match_cpu()'s doc
x86/apic: Remove "disablelapic" cmdline option
Documentation: Merge x86-specific boot options doc into kernel-parameters.txt
x86/ioremap: Remove unused size parameter in remapping functions
x86/ioremap: Simplify setup_data mapping variants
x86/boot/compressed: Remove unused header includes from kaslr.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Seqlock optimizations that arose in a perf context and were merged
into the perf tree:
- seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan)
- mm: Convert mm_lock_seq to a proper seqcount (Suren Baghdasaryan)
- mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren
Baghdasaryan)
- mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra)
Core perf enhancements:
- Reduce 'struct page' footprint of perf by mapping pages in advance
(Lorenzo Stoakes)
- Save raw sample data conditionally based on sample type (Yabin Cui)
- Reduce sampling overhead by checking sample_type in
perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin
Cui)
- Export perf_exclude_event() (Namhyung Kim)
Uprobes scalability enhancements: (Andrii Nakryiko)
- Simplify find_active_uprobe_rcu() VMA checks
- Add speculative lockless VMA-to-inode-to-uprobe resolution
- Simplify session consumer tracking
- Decouple return_instance list traversal and freeing
- Ensure return_instance is detached from the list before freeing
- Reuse return_instances between multiple uretprobes within task
- Guard against kmemdup() failing in dup_return_instance()
AMD core PMU driver enhancements:
- Relax privilege filter restriction on AMD IBS (Namhyung Kim)
AMD RAPL energy counters support: (Dhananjay Ugwekar)
- Introduce topology_logical_core_id() (K Prateek Nayak)
- Remove the unused get_rapl_pmu_cpumask() function
- Remove the cpu_to_rapl_pmu() function
- Rename rapl_pmu variables
- Make rapl_model struct global
- Add arguments to the init and cleanup functions
- Modify the generic variable names to *_pkg*
- Remove the global variable rapl_msrs
- Move the cntr_mask to rapl_pmus struct
- Add core energy counter support for AMD CPUs
Intel core PMU driver enhancements:
- Support RDPMC 'metrics clear mode' feature (Kan Liang)
- Clarify adaptive PEBS processing (Kan Liang)
- Factor out functions for PEBS records processing (Kan Liang)
- Simplify the PEBS records processing for adaptive PEBS (Kan Liang)
Intel uncore driver enhancements: (Kan Liang)
- Convert buggy pmu->func_id use to pmu->registered
- Support more units on Granite Rapids"
* tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
perf: map pages in advance
perf/x86/intel/uncore: Support more units on Granite Rapids
perf/x86/intel/uncore: Clean up func_id
perf/x86/intel: Support RDPMC metrics clear mode
uprobes: Guard against kmemdup() failing in dup_return_instance()
perf/x86: Relax privilege filter restriction on AMD IBS
perf/core: Export perf_exclude_event()
uprobes: Reuse return_instances between multiple uretprobes within task
uprobes: Ensure return_instance is detached from the list before freeing
uprobes: Decouple return_instance list traversal and freeing
uprobes: Simplify session consumer tracking
uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
uprobes: simplify find_active_uprobe_rcu() VMA checks
mm: introduce mmap_lock_speculate_{try_begin|retry}
mm: convert mm_lock_seq to a proper seqcount
mm/gup: Use raw_seqcount_try_begin()
seqlock: add raw_seqcount_try_begin
perf/x86/rapl: Add core energy counter support for AMD CPUs
perf/x86/rapl: Move the cntr_mask to rapl_pmus struct
perf/x86/rapl: Remove the global variable rapl_msrs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Introduce the generic section-based annotation infrastructure a.k.a.
ASM_ANNOTATE/ANNOTATE (Peter Zijlstra)
- Convert various facilities to ASM_ANNOTATE/ANNOTATE: (Peter Zijlstra)
- ANNOTATE_NOENDBR
- ANNOTATE_RETPOLINE_SAFE
- instrumentation_{begin,end}()
- VALIDATE_UNRET_BEGIN
- ANNOTATE_IGNORE_ALTERNATIVE
- ANNOTATE_INTRA_FUNCTION_CALL
- {.UN}REACHABLE
- Optimize the annotation-sections parsing code (Peter Zijlstra)
- Centralize annotation definitions in <linux/objtool.h>
- Unify & simplify the barrier_before_unreachable()/unreachable()
definitions (Peter Zijlstra)
- Convert unreachable() calls to BUG() in x86 code, as unreachable()
has unreliable code generation (Peter Zijlstra)
- Remove annotate_reachable() and annotate_unreachable(), as it's
unreliable against compiler optimizations (Peter Zijlstra)
- Fix non-standard ANNOTATE_REACHABLE annotation order (Peter Zijlstra)
- Robustify the annotation code by warning about unknown annotation
types (Peter Zijlstra)
- Allow arch code to discover jump table size, in preparation of
annotated jump table support (Ard Biesheuvel)
* tag 'objtool-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Convert unreachable() to BUG()
objtool: Allow arch code to discover jump table size
objtool: Warn about unknown annotation types
objtool: Fix ANNOTATE_REACHABLE to be a normal annotation
objtool: Convert {.UN}REACHABLE to ANNOTATE
objtool: Remove annotate_{,un}reachable()
loongarch: Use ASM_REACHABLE
x86: Convert unreachable() to BUG()
unreachable: Unify
objtool: Collect more annotations in objtool.h
objtool: Collapse annotate sequences
objtool: Convert ANNOTATE_INTRA_FUNCTION_CALL to ANNOTATE
objtool: Convert ANNOTATE_IGNORE_ALTERNATIVE to ANNOTATE
objtool: Convert VALIDATE_UNRET_BEGIN to ANNOTATE
objtool: Convert instrumentation_{begin,end}() to ANNOTATE
objtool: Convert ANNOTATE_RETPOLINE_SAFE to ANNOTATE
objtool: Convert ANNOTATE_NOENDBR to ANNOTATE
objtool: Generic annotation infrastructure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Borislav Petkov:
- The first part of a restructuring of AMD's representation of a
northbridge which is legacy now, and the creation of the new AMD node
concept which represents the Zen architecture of having a collection
of I/O devices within an SoC. Those nodes comprise the so-called data
fabric on Zen.
This has at least one practical advantage of not having to add a PCI
ID each time a new data fabric PCI device releases. Eventually, the
lot more uniform provider of data fabric functionality amd_node.c
will be used by all the drivers which need it
- Smaller cleanups
* tag 'x86_misc_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd_node: Use defines for SMN register offsets
x86/amd_node: Remove dependency on AMD_NB
x86/amd_node: Update __amd_smn_rw() error paths
x86/amd_nb: Move SMN access code to a new amd_node driver
x86/amd_nb, hwmon: (k10temp): Simplify amd_pci_dev_to_node_id()
x86/amd_nb: Simplify function 3 search
x86/amd_nb: Use topology info to get AMD node count
x86/amd_nb: Simplify root device search
x86/amd_nb: Simplify function 4 search
x86: Start moving AMD node functionality out of AMD_NB
x86/amd_nb: Clean up early_is_amd_nb()
x86/amd_nb: Restrict init function to AMD-based systems
x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Remove the less generic CPU matching infra around struct x86_cpu_desc
and use the generic struct x86_cpu_id thing
- Remove magic naked numbers for CPUID functions and use proper defines
of the prefix CPUID_LEAF_*. Consolidate some of the crazy use around
the tree
- Smaller cleanups and improvements
* tag 'x86_cpu_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Make all all CPUID leaf names consistent
x86/fpu: Remove unnecessary CPUID level check
x86/fpu: Move CPUID leaf definitions to common code
x86/tsc: Remove CPUID "frequency" leaf magic numbers.
x86/tsc: Move away from TSC leaf magic numbers
x86/cpu: Move TSC CPUID leaf definition
x86/cpu: Refresh DCA leaf reading code
x86/cpu: Remove unnecessary MwAIT leaf checks
x86/cpu: Use MWAIT leaf definition
x86/cpu: Move MWAIT leaf definition to common header
x86/cpu: Remove 'x86_cpu_desc' infrastructure
x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'
x86/cpu: Replace PEBS use of 'x86_cpu_desc' use with 'x86_cpu_id'
x86/cpu: Expose only stepping min/max interface
x86/cpu: Introduce new microcode matching helper
x86/cpufeature: Document cpu_feature_enabled() as the default to use
x86/paravirt: Remove the WBINVD callback
x86/cpufeatures: Free up unused feature bits
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- A segmented Reverse Map table (RMP) is a across-nodes distributed
table of sorts which contains per-node descriptors of each node-local
4K page, denoting its ownership (hypervisor, guest, etc) in the realm
of confidential computing. Add support for such a table in order to
improve referential locality when accessing or modifying RMP table
entries
- Add support for reading the TSC in SNP guests by removing any
interference or influence the hypervisor might have, with the goal of
making a confidential guest even more independent from the hypervisor
* tag 'x86_sev_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Add the Secure TSC feature for SNP guests
x86/tsc: Init the TSC for Secure TSC guests
x86/sev: Mark the TSC in a secure TSC guest as reliable
x86/sev: Prevent RDTSC/RDTSCP interception for Secure TSC enabled guests
x86/sev: Prevent GUEST_TSC_FREQ MSR interception for Secure TSC enabled guests
x86/sev: Change TSC MSR behavior for Secure TSC enabled guests
x86/sev: Add Secure TSC support for SNP guests
x86/sev: Relocate SNP guest messaging routines to common code
x86/sev: Carve out and export SNP guest messaging init routines
virt: sev-guest: Replace GFP_KERNEL_ACCOUNT with GFP_KERNEL
virt: sev-guest: Remove is_vmpck_empty() helper
x86/sev/docs: Document the SNP Reverse Map Table (RMP)
x86/sev: Add full support for a segmented RMP table
x86/sev: Treat the contiguous RMP table as a single RMP segment
x86/sev: Map only the RMP table entries instead of the full RMP range
x86/sev: Move the SNP probe routine out of the way
x86/sev: Require the RMPREAD instruction after Zen4
x86/sev: Add support for the RMPREAD instruction
x86/sev: Prepare for using the RMPREAD instruction to access the RMP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader updates from Borislav Petkov:
- A bunch of minor cleanups
* tag 'x86_microcode_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Remove ret local var in early_apply_microcode()
x86/microcode/AMD: Have __apply_microcode_amd() return bool
x86/microcode/AMD: Make __verify_patch_size() return bool
x86/microcode/AMD: Remove bogus comment from parse_container()
x86/microcode/AMD: Return bool from find_blobs_in_containers()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Extend resctrl with the capability of total memory bandwidth
monitoring, thus accomodating systems which support only total but
not local memory bandwidth monitoring. Add the respective new mount
options
- The usual cleanups
* tag 'x86_cache_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Document the new "mba_MBps_event" file
x86/resctrl: Add write option to "mba_MBps_event" file
x86/resctrl: Add "mba_MBps_event" file to CTRL_MON directories
x86/resctrl: Make mba_sc use total bandwidth if local is not supported
x86/resctrl: Compute memory bandwidth for all supported events
x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event
x86/resctrl: Prepare for per-CTRL_MON group mba_MBps control
x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags
x86/resctrl: Use kthread_run_on_cpu()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU speculation update from Borislav Petkov:
- Add support for AMD hardware which is not affected by SRSO on the
user/kernel attack vector and advertise it to guest userspace
* tag 'x86_bugs_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM: x86: Advertise SRSO_USER_KERNEL_NO to userspace
x86/bugs: Add SRSO_USER_KERNEL_NO support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Remove the shared threshold bank hack on AMD and streamline and
simplify it
- Cleanup and sanitize MCA code
* tag 'ras_core_for_v6.14_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/amd: Remove shared threshold bank plumbing
x86/mce: Remove the redundant mce_hygon_feature_init()
x86/mce: Convert family/model mixed checks to VFM-based checks
x86/mce: Break up __mcheck_cpu_apply_quirks()
x86/mce: Make four functions return bool
x86/mce/threshold: Remove the redundant this_cpu_dec_return()
x86/mce: Make several functions return bool
|
|
Instead of marking individual interrupts as safe to be migrated in
arbitrary contexts, mark the interrupt chips, which require the interrupt
to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED
flag. This makes more sense because this is a per interrupt chip property
and not restricted to individual interrupts.
That flips the logic from the historical opt-out to a opt-in model. This is
simpler to handle for other architectures, which default to unrestricted
affinity setting. It also allows to cleanup the redundant core logic
significantly.
All interrupt chips, which belong to a top-level domain sitting directly on
top of the x86 vector domain are marked accordingly, unless the related
setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
|