Age | Commit message (Collapse) | Author |
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-11-mingo@kernel.org
|
|
Put it into the text_poke_* namespace of <asm/text-patching.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-10-mingo@kernel.org
|
|
Put it into the text_poke_* namespace of <asm/text-patching.h>.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-9-mingo@kernel.org
|
|
All related functions in this subsystem already have a
text_poke_int3_ prefix - add it to the trap handler
as well.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-8-mingo@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-7-mingo@kernel.org
|
|
'smp_text_poke_batch_process()'
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-6-mingo@kernel.org
|
|
Make it clear that these reference counts lock access
to text_poke_array.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-5-mingo@kernel.org
|
|
text_poke_int3_vec'
Follow the INT3 text-poking nomenclature, and also adopt the
'vector' name for the entire object, instead of the rather
opaque 'descriptor' naming.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250411054105.2341982-4-mingo@kernel.org
|
|
bit more
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250411054105.2341982-3-mingo@kernel.org
|
|
sharing in poke_int3_handler()
eBPF programs can be run 50,000,000 times per second on busy servers.
Whenever /proc/sys/kernel/bpf_stats_enabled is turned off,
hundreds of calls sites are patched from text_poke_bp_batch()
and we see a huge loss of performance due to false sharing
on bp_desc.refs lasting up to three seconds.
51.30% server_bin [kernel.kallsyms] [k] poke_int3_handler
|
|--46.45%--poke_int3_handler
| exc_int3
| asm_exc_int3
| |
| |--24.26%--cls_bpf_classify
| | tcf_classify
| | __dev_queue_xmit
| | ip6_finish_output2
| | ip6_output
| | ip6_xmit
| | inet6_csk_xmit
| | __tcp_transmit_skb
Fix this by replacing bp_desc.refs with a per-cpu bp_refs.
Before the patch, on a host with 240 cores (480 threads):
$ sysctl -wq kernel.bpf_stats_enabled=0
text_poke_bp_batch(nr_entries=164) : Took 2655300 usec
$ bpftool prog | grep run_time_ns
...
105: sched_cls name hn_egress tag 699fc5eea64144e3 gpl run_time_ns
3009063719 run_cnt 82757845 : average cost is 36 nsec per call
After this patch:
$ sysctl -wq kernel.bpf_stats_enabled=0
text_poke_bp_batch(nr_entries=164) : Took 702 usec
$ bpftool prog | grep run_time_ns
...
105: sched_cls name hn_egress tag 699fc5eea64144e3 gpl run_time_ns
1928223019 run_cnt 67682728 : average cost is 28 nsec per call
Ie. text-patching performance improved 3700x: from 2.65 seconds
to 0.0007 seconds.
Since the atomic_cond_read_acquire(refs, !VAL) spin-loop was not triggered
even once in my tests, add an unlikely() annotation, because this appears
to be the common case.
[ mingo: Improved the changelog some more. ]
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250411054105.2341982-2-mingo@kernel.org
|
|
Initializing a percpu variable with the address of a struct tagged as
.initdata is breaking the build with CONFIG_SECTION_MISMATCH_WARN_ONLY
not set to "y".
Fix that by using an access function instead returning the .initdata
struct address if the percpu space of the struct hasn't been
allocated yet.
Fixes: 368990a7fe30 ("xen: fix multicall debug data referencing")
Reported-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: "Borislav Petkov (AMD)" <bp@alien8.de>
Tested-by: "Borislav Petkov (AMD)" <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20250327190602.26015-1-jgross@suse.com>
|
|
There's a lockdep false positive warning related to i8253_lock:
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
...
systemd-sleep/3324 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffffffffb2c23398 (i8253_lock){+.+.}-{2:2}, at: pcspkr_event+0x3f/0xe0 [pcspkr]
...
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0xd0/0x2f0
_raw_spin_lock+0x30/0x40
clockevent_i8253_disable+0x1c/0x60
pit_timer_init+0x25/0x50
hpet_time_init+0x46/0x50
x86_late_time_init+0x1b/0x40
start_kernel+0x962/0xa00
x86_64_start_reservations+0x24/0x30
x86_64_start_kernel+0xed/0xf0
common_startup_64+0x13e/0x141
...
Lockdep complains due pit_timer_init() using the lock in an IRQ-unsafe
fashion, but it's a false positive, because there is no deadlock
possible at that point due to init ordering: at the point where
pit_timer_init() is called there is no other possible usage of
i8253_lock because the system is still in the very early boot stage
with no interrupts.
But in any case, pit_timer_init() should disable interrupts before
calling clockevent_i8253_disable() out of general principle, and to
keep lockdep working even in this scenario.
Use scoped_guard() for that, as suggested by Thomas Gleixner.
[ mingo: Cleaned up the changelog. ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/Z-uwd4Bnn7FcCShX@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix CPU topology related regression that limited Xen PV guests to a
single CPU
- Fix ancient e820__register_nosave_regions() bugs that were causing
problems with kexec's artificial memory maps
- Fix an S4 hibernation crash caused by two missing ENDBR's that were
mistakenly removed in a recent commit
- Fix a resctrl serialization bug
- Fix early_printk documentation and comments
- Fix RSB bugs, combined with preparatory updates to better match the
code to vendor recommendations.
- Add RSB mitigation document
- Fix/update documentation
- Fix the erratum_1386_microcode[] table to be NULL terminated
* tag 'x86-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ibt: Fix hibernate
x86/cpu: Avoid running off the end of an AMD erratum table
Documentation/x86: Zap the subsection letters
Documentation/x86: Update the naming of CPU features for /proc/cpuinfo
x86/bugs: Add RSB mitigation document
x86/bugs: Don't fill RSB on context switch with eIBRS
x86/bugs: Don't fill RSB on VMEXIT with eIBRS+retpoline
x86/bugs: Fix RSB clearing in indirect_branch_prediction_barrier()
x86/bugs: Use SBPB in write_ibpb() if applicable
x86/bugs: Rename entry_ibpb() to write_ibpb()
x86/early_printk: Use 'mmio32' for consistency, fix comments
x86/resctrl: Fix rdtgroup_mkdir()'s unlocked use of kernfs_node::name
x86/e820: Fix handling of subpage regions when calculating nosave ranges in e820__register_nosave_regions()
x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc objtool fixes from Ingo Molnar:
- Remove the recently introduced ANNOTATE_IGNORE_ALTERNATIVE noise from
clac()/stac() code to make .s files more readable
- Fix INSN_SYSCALL / INSN_SYSRET semantics
- Fix various false-positive warnings
* tag 'objtool-urgent-2025-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix false-positive "ignoring unreachables" warning
objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC
objtool, xen: Fix INSN_SYSCALL / INSN_SYSRET semantics
objtool: Stop UNRET validation on UD2
objtool: Split INSN_CONTEXT_SWITCH into INSN_SYSCALL and INSN_SYSRET
objtool: Fix INSN_CONTEXT_SWITCH handling in validate_unret()
|
|
SNP platform can provide a vTPM device emulated by SVSM.
The "tpm-svsm" device can be handled by the platform driver registered by the
x86/sev core code.
Register the platform device only when SVSM is available and it supports vTPM
commands as checked by snp_svsm_vtpm_probe().
[ bp: Massage commit message. ]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250410135118.133240-5-sgarzare@redhat.com
|
|
Add two new functions to probe and send commands to the SVSM vTPM. They
leverage the two calls defined by the AMD SVSM specification [1] for the vTPM
protocol: SVSM_VTPM_QUERY and SVSM_VTPM_CMD.
Expose snp_svsm_vtpm_send_command() to be used by a TPM driver.
[1] "Secure VM Service Module for SEV-SNP Guests"
Publication # 58019 Revision: 1.00
[ bp: Some doc touchups. ]
Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Co-developed-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250403100943.120738-2-sgarzare@redhat.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A simple fix adding the module description of the Xenbus frontend
module
- A fix correcting the xen-acpi-processor Kconfig dependency for PVH
Dom0 support
- A fix for the Xen balloon driver when running as Xen Dom0 in PVH mode
- A fix for PVH Dom0 in order to avoid problems with CPU idle and
frequency drivers conflicting with Xen
* tag 'for-linus-6.15a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: disable CPU idle and frequency drivers for PVH dom0
x86/xen: fix balloon target initialization for PVH dom0
xen: Change xen-acpi-processor dom0 dependency
xenbus: add module description
|
|
Reduce the window during which exceptions are unhandled, by leaving the
GDT/IDT in place all the way into the relocate_kernel() function, until
the moment that %cr3 gets replaced.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250326142404.256980-4-dwmw2@infradead.org
|
|
This supports the same 32-bit MMIO-mapped 8250 as the early_printk code.
It's not clear why the early_printk code supports this form and only this
form; the actual runtime 8250_pci doesn't seem to support it. But having
hacked up QEMU to expose such a device, early_printk does work with it,
and now so does the kexec debug code.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250326142404.256980-3-dwmw2@infradead.org
|
|
If a serial port was configured for early_printk, use it for debug output
from the relocate_kernel exception handler too.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250326142404.256980-2-dwmw2@infradead.org
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the customary type used for hardware ABIs.
Suggested-by: Xin Li <xin@zytor.com>
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In <asm/msr.h> the first parameter of do_trace_rdpmc() is named 'msr':
extern void do_trace_rdpmc(unsigned int msr, u64 val, int failed);
But in the definition it's 'counter':
void do_trace_rdpmc(unsigned counter, u64 val, int failed)
Use 'msr' in both cases, and change the type to u32.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The paravirt_read_pmc() result is in fact only loaded into an u64 variable.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Also fix some nearby whitespace damage while at it.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There's 9 uses of 'unsigned long long' in <asm/msr.h>, which is
really the same as 'u64', which is used 34 times.
Standardize on u64.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
GCC PR82602 that caused invalid scheduling of volatile asms:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82602
was fixed for gcc-8.1.0, the current minimum version of the
compiler required to compile the kernel.
Remove workaround that prevented invalid scheduling for
compilers, affected by PR82602.
There were no differences between old and new kernel object file
when compiled for x86_64 defconfig with gcc-8.1.0.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20250407112316.378347-1-ubizjak@gmail.com
|
|
There are 4 wariants of initmem_init(), for 32 and 64 bits and for
CONFIG_NUMA enabled and disabled.
After commit bbeb69ce3013 ("x86/mm: Remove CONFIG_HIGHMEM64G support")
NUMA is not supported on 32 bit kernels anymore, and
arch/x86/mm/numa_32.c can be just deleted and setup_bootmem_allocator()
with completely misleading name can be folded into initmem_init().
For 64 bits the NUMA variant calls x86_numa_init() and !NUMA variant
sets all memory to node 0. The later can be split out into inline helper
called x86_numa_init() and then both initmem_init() functions become the
same.
Split out memblock_set_node() from initmem_init() for !NUMA on 64 bit
into x86_numa_init() helper and remove arch/x86/mm/numa_*.c that only
contained initmem_init() variants for NUMA configs.
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <len.brown@intel.com>
Link: https://lore.kernel.org/r/20250409122815.420041-1-rppt@kernel.org
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This works around what seems to be an optimization bug in GCC (at least
13.3.0), where it predicts access_ok() to fail despite the hint to the
contrary.
_copy_to_user() contains:
if (access_ok(to, n)) {
instrument_copy_to_user(to, from, n);
n = raw_copy_to_user(to, from, n);
}
Where access_ok() is likely(__access_ok(addr, size)), yet the compiler
emits conditional jumps forward for the case where it succeeds:
<+0>: endbr64
<+4>: mov %rdx,%rcx
<+7>: mov %rdx,%rax
<+10>: xor %edx,%edx
<+12>: add %rdi,%rcx
<+15>: setb %dl
<+18>: movabs $0x123456789abcdef,%r8
<+28>: test %rdx,%rdx
<+31>: jne 0xffffffff81b3b7c6 <_copy_to_user+38>
<+33>: cmp %rcx,%r8
<+36>: jae 0xffffffff81b3b7cb <_copy_to_user+43>
<+38>: jmp 0xffffffff822673e0 <__x86_return_thunk>
<+43>: nop
<+44>: nop
<+45>: nop
<+46>: mov %rax,%rcx
<+49>: rep movsb %ds:(%rsi),%es:(%rdi)
<+51>: nop
<+52>: nop
<+53>: nop
<+54>: mov %rcx,%rax
<+57>: nop
<+58>: nop
<+59>: nop
<+60>: jmp 0xffffffff822673e0 <__x86_return_thunk>
Patching _copy_to_user() to likely() around the access_ok() use does
not change the asm.
However, spelling out the prediction *within* valid_user_address() does the
trick:
<+0>: endbr64
<+4>: xor %eax,%eax
<+6>: mov %rdx,%rcx
<+9>: add %rdi,%rdx
<+12>: setb %al
<+15>: movabs $0x123456789abcdef,%r8
<+25>: test %rax,%rax
<+28>: jne 0xffffffff81b315e6 <_copy_to_user+54>
<+30>: cmp %rdx,%r8
<+33>: jb 0xffffffff81b315e6 <_copy_to_user+54>
<+35>: nop
<+36>: nop
<+37>: nop
<+38>: rep movsb %ds:(%rsi),%es:(%rdi)
<+40>: nop
<+41>: nop
<+42>: nop
<+43>: nop
<+44>: nop
<+45>: nop
<+46>: mov %rcx,%rax
<+49>: jmp 0xffffffff82255ba0 <__x86_return_thunk>
<+54>: mov %rcx,%rax
<+57>: jmp 0xffffffff82255ba0 <__x86_return_thunk>
Since we kinda expect valid_user_address() to be likely anyway,
add the likely() annotation that also happens to work around
this compiler bug.
[ mingo: Moved the unlikely() branch into valid_user_address() & updated the changelog ]
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250401203029.1132135-1-mjguzik@gmail.com
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Todd reported, and Len confirmed, that commit 582077c94052 ("x86/cfi:
Clean up linkage") broke S4 hiberate on a fair number of machines.
Turns out these machines trip #CP when trying to restore the image.
As it happens, the commit in question removes two ENDBR instructions
in the hibernate code, and clearly got it wrong.
Notably restore_image() does an indirect jump to
relocated_restore_code(), which is a relocated copy of
core_restore_code().
In turn, core_restore_code(), will at the end do an indirect jump to
restore_jump_address (r8), which is pointing at a relocated
restore_registers().
So both sites do indeed need to be ENDBR.
Fixes: 582077c94052 ("x86/cfi: Clean up linkage")
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Tested-by: Len Brown <len.brown@intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219998
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219998
|
|
Complete the AMD CPUID(4) emulation logic, which uses CPUID(0x80000006)
for L2/L3 cache info and an assocs[] associativity mapping array, by
adding entries for 3-way caches and 6-way caches.
Properly handle the case where CPUID(0x80000006) returns an L2/L3
associativity of 9. This is not real associativity, but a marker to
indicate that the respective L2/L3 cache information should be retrieved
from CPUID(0x8000001d) instead. If such a marker is encountered, return
early from legacy_amd_cpuid4(), thus effectively emulating an "invalid
index" CPUID(4) response with a cache type of zero.
When checking if CPUID(0x80000006) L2/L3 cache info output is valid, and
given the associtivity marker 9 above, do not just check if the whole
ECX/EDX register is zero. Rather, check if the associativity is zero or
9. An associativity of zero implies no L2/L3 cache, which make it the
more correct check anyway vs. a zero check of the whole output register.
Fixes: a326e948c538 ("x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250409122233.1058601-3-darwi@linutronix.de
|
|
For the AMD CPUID(4) emulation cache info logic, the same associativity
mapping array, assocs[], is used for both CPUID(0x80000005) and
CPUID(0x80000006).
This is incorrect since per the AMD manuals, the mappings for
CPUID(0x80000005) L1d/L1i associativity is:
n = 0x1 -> 0xfe n
n = 0xff fully associative
while assocs[] maps these values to:
n = 0x1, 0x2, 0x4 n
n = 0x3, 0x7, 0x9 0
n = 0x6 8
n = 0x8 16
n = 0xa 32
n = 0xb 48
n = 0xc 64
n = 0xd 96
n = 0xe 128
n = 0xf fully associative
which is only valid for CPUID(0x80000006).
Parse CPUID(0x80000005) L1d/L1i associativity values as shown in the AMD
manuals. Since the 0xffff literal is used to denote full associativity
at the AMD CPUID(4)-emulation logic, define AMD_CPUID4_FULLY_ASSOCIATIVE
for it instead of spreading that literal in more places.
Mark the assocs[] mapping array as only valid for CPUID(0x80000006) L2/L3
cache information.
Fixes: a326e948c538 ("x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: x86-cpuid@lists.linux.dev
Link: https://lore.kernel.org/r/20250409122233.1058601-2-darwi@linutronix.de
|
|
The NULL array terminator at the end of erratum_1386_microcode was
removed during the switch from x86_cpu_desc to x86_cpu_id. This
causes readers to run off the end of the array.
Replace the NULL.
Fixes: f3f325152673 ("x86/cpu: Move AMD erratum 1386 table over to 'x86_cpu_id'")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
|
|
The chacha_use_simd static branch is required for x86 machines that
lack SSSE3 support. Restore it and the generic fallback code.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: 9b4400215e0e ("crypto: x86/chacha - Remove SIMD fallback path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|