|
When decoding an instruction or handling a perf event that references an
LDT segment, if we don't have a valid user context, trying to access the
LDT by any means other than SLDT is racy. Certainly, using
current->active_mm is wrong, as active_mm can point to a real user mm when
CR3 and LDTR no longer reference that mm.
Clean up the code. If nmi_uaccess_okay() says we don't have a valid
context, just fail. Otherwise use current->mm.
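A hedged sketch of the resulting pattern (not the exact diff; the LDT dereference is illustrative):

        /* Bail out unless CR3/LDTR still reference current->mm. */
        if (!nmi_uaccess_okay())
                return 0;

        /* current->mm is now known to be valid, so its LDT is safe to read. */
        ldt = READ_ONCE(current->mm->context.ldt);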
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20250402094540.3586683-3-mingo@kernel.org
|
|
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Xin Li <xin@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
|
|
platforms
This change alters the PowerPC and x86 driver implementations to record
the last sample period before the event is updated for the next period.
A common pattern in PMU driver implementations is to have a
"*_event_set_period" function which takes care of updating the various
period-related fields in a perf_event structure. In most cases, the
drivers choose to call this function after initializing a sample data
structure with perf_sample_data_init. The x86 and PowerPC drivers
deviate from this, choosing to update the period before initializing the
sample data. When using an event with an alternate sample period, this
causes an incorrect period to be written to the sample data that gets
reported to userspace.
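A hedged sketch of the reordering on the x86 side (the surrounding handler loop is elided; function names follow the common driver pattern described above):

        /* Capture the period that produced this sample before
         * x86_perf_event_set_period() advances it for the next one. */
        u64 last_period = event->hw.last_period;

        if (!x86_perf_event_set_period(event))
                continue;

        perf_sample_data_init(&data, 0, last_period);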
Signed-off-by: Mark Barnett <mark.barnett@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250408171530.140858-2-mark.barnett@arm.com
|
|
Rename struct bts_buffer objects from 'buf' to 'bb' to improve the
readability when accessing the structure's 'buf' member. For example,
'buf->buf[]' becomes 'bb->buf[]'.
Indent line 327 using tabs to silence a checkpatch warning.
No functional changes intended.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250407085253.742834-2-thorsten.blum@linux.dev
|
|
The relative rates among two or more events are useful for performance
analysis, e.g., a high branch miss rate may indicate a performance
issue. Usually, the samples whose relative rate exceeds some threshold
are the more useful ones. However, traditional sampling takes samples
of each event separately. To get the relative rates among two or more
events, a high sample rate is required, which can bring high overhead.
Many samples taken in non-hotspot areas are also dropped as useless
during post-processing.
The auto counter reload (ACR) feature takes samples when the relative
rate of two or more events exceeds some threshold, which provides the
fine-grained information at a low cost.
To support the feature, two sets of MSRs are introduced. For a given
counter IA32_PMC_GPn_CTR/IA32_PMC_FXm_CTR, bit fields in the
IA32_PMC_GPn_CFG_B/IA32_PMC_FXm_CFG_B MSR indicate which counter(s)
can cause a reload of that counter. The reload value is stored in the
IA32_PMC_GPn_CFG_C/IA32_PMC_FXm_CFG_C.
The details can be found at Intel SDM (085), Volume 3, 21.9.11 Auto
Counter Reload.
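A minimal sketch of programming the pair of MSRs described above; the MSR_PMC_GPn_CFG_B/MSR_PMC_GPn_CFG_C address macros are placeholders for illustration, not the architectural addresses:

        /* Let any counter in cause_mask trigger a reload of counter n,
         * and set the value reloaded into it on such a trigger. */
        wrmsrl(MSR_PMC_GPn_CFG_B(n), cause_mask);       /* reload trigger mask */
        wrmsrl(MSR_PMC_GPn_CFG_C(n), reload_value);     /* reload value */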
In hw_config(), an ACR event is specially configured, because the
cause/reloadable counter mask has to be applied to the dyn_constraint.
Besides the HW limits (e.g., no support for perf metrics, PDist, etc.),
a SW limit is applied as well: ACR events in a group must be contiguous.
This facilitates the later conversion from the event idx to the counter
idx; otherwise, intel_pmu_acr_late_setup() would have to traverse the
whole event list again to find the "cause" event.
Also, add a new flag, PERF_X86_EVENT_ACR, to indicate an ACR group; it
is set on the group leader.
The late setup() is also required for an ACR group, to convert the
event idx to the counter idx and save it in hw.config1.
The ACR configuration MSRs are only updated in enable_event();
disable_event() doesn't clear the ACR CFG registers.
Add acr_cfg_b/acr_cfg_c to struct cpu_hw_events to cache the MSR
values, which avoids an MSR write if the value is unchanged.
Expose an acr_mask format to sysfs. The perf tool can utilize the new
format to configure the relationship of events in the group. The bit
sequence of the acr_mask follows the order in which the events of the
group were enabled.
Example:
Here is a snippet of mispredict.c. Since the array contains random
numbers, the jumps are random and often mispredicted.
The misprediction rate depends on the compared value.
For Loop 1, ~11% of all branches are mispredicted.
For Loop 2, ~21% of all branches are mispredicted.
main()
{
        ...
        for (i = 0; i < N; i++)
                data[i] = rand() % 256;
        ...
        /* Loop 1 */
        for (k = 0; k < 50; k++)
                for (i = 0; i < N; i++)
                        if (data[i] >= 64)
                                sum += data[i];
        ...

        ...
        /* Loop 2 */
        for (k = 0; k < 50; k++)
                for (i = 0; i < N; i++)
                        if (data[i] >= 128)
                                sum += data[i];
        ...
}
Usually, code with a high branch miss rate performs badly. To
understand the branch miss rate of the code, the traditional method
samples both the branches and branch-misses events. E.g.,
perf record -e "{cpu_atom/branch-misses/ppu, cpu_atom/branch-instructions/u}"
-c 1000000 -- ./mispredict
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.925 MB perf.data (5106 samples) ]
The 5106 samples are from both events and spread across both loops.
In the post-processing stage, a user can see that Loop 2 has a 21%
branch miss rate, and can then focus on the samples of the
branch-misses event for Loop 2.
With this patch, the user can generate samples only when the branch
miss rate > 20%. For example,
perf record -e "{cpu_atom/branch-misses,period=200000,acr_mask=0x2/ppu,
cpu_atom/branch-instructions,period=1000000,acr_mask=0x3/u}"
-- ./mispredict
(Two different periods are applied to branch-misses and
branch-instructions. The ratio is set to 20%.
If the branch-instructions counter overflows first, the branch miss
rate is < 20%. No samples should be generated, and all counters
should be automatically reloaded.
If the branch-misses counter overflows first, the branch miss rate
is > 20%. A sample triggered by the branch-misses event should be
generated, and just the counter of the branch-instructions event
should be automatically reloaded.
The branch-misses event should only be automatically reloaded when
branch-instructions overflows, so its "cause" event is the
branch-instructions event. Its acr_mask is set to 0x2, since the
index of branch-instructions in the group is 1.
The branch-instructions event is automatically reloaded no matter
which event overflows, so its "cause" events are both the
branch-misses and branch-instructions events, and its acr_mask
should be set to 0x3.)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.098 MB perf.data (2498 samples) ]
$ perf report
Percent │154: movl $0x0,-0x14(%rbp)
│ ↓ jmp 1af
│ for (i = j; i < N; i++)
│15d: mov -0x10(%rbp),%eax
│ mov %eax,-0x18(%rbp)
│ ↓ jmp 1a2
│ if (data[i] >= 128)
│165: mov -0x18(%rbp),%eax
│ cltq
│ lea 0x0(,%rax,4),%rdx
│ mov -0x8(%rbp),%rax
│ add %rdx,%rax
│ mov (%rax),%eax
│ ┌──cmp $0x7f,%eax
100.00 0.00 │ ├──jle 19e
│ │sum += data[i];
The 2498 samples are all from the branch-misses event for Loop 2.
The number of samples and the overhead are significantly reduced
without losing any information.
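For reference, a hedged illustration of how the acr_mask values above are derived (bit i corresponds to the i-th event in the group's enabled order):

        /* branch-misses reloads only when branch-instructions (index 1)
         * overflows: */
        u64 miss_acr_mask = 1ULL << 1;                  /* 0x2 */
        /* branch-instructions reloads when either event overflows: */
        u64 inst_acr_mask = (1ULL << 0) | (1ULL << 1);  /* 0x3 */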
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-6-kan.liang@linux.intel.com
|
|
The counters that support the auto counter reload feature are
enumerated in CPUID leaf 0x23, sub-leaf 0x2.
Add acr_cntr_mask to store the mask of counters which are reloadable.
Add acr_cause_mask to store the mask of counters which can cause a reload.
Since the e-core and p-core may have different numbers of counters,
track the masks in struct x86_hybrid_pmu as well.
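A hedged sketch of the enumeration; the register assignment is an assumption based on the description above, not a restatement of the SDM:

        unsigned int eax, ebx, ecx, edx;

        cpuid_count(0x23, 2, &eax, &ebx, &ecx, &edx);
        pmu->acr_cntr_mask  = eax;      /* counters that can be reloaded */
        pmu->acr_cause_mask = ebx;      /* counters that can cause a reload */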
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-5-kan.liang@linux.intel.com
|
|
The auto counter reload feature requires an event flag to indicate an
auto counter reload group, which can only be scheduled on the specific
counters enumerated in CPUID. However, hw_perf_event.flags has run
out of bits on x86.
Two solutions were considered to address the issue:
- Currently, 20 bits are reserved for the architecture-specific flags,
  and only bit 31 is used for a generic flag. There is still plenty
  of space left. Reserve 8 more bits for the arch-specific flags.
- Add a new x86-specific hw_perf_event.flags1 to support more flags.
The former is implemented. Enough room is still left in the generic
flag space.
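A hedged illustration of the chosen layout; the exact mask values are assumptions for illustration:

        /* Low 28 bits: arch-specific flags (was 20); bit 31: generic. */
        #define PERF_EVENT_FLAG_ARCH            0x0fffffff
        #define PERF_EVENT_FLAG_USER_READ_CNT   0x80000000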
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-4-kan.liang@linux.intel.com
|
|
When a machine supports PEBS v6, perf unconditionally searches
cpuc->event_list[] for every event and checks whether the late setup
is required, which is unnecessary.
The late setup is only required for special events, e.g., events that
support the counters-snapshotting feature. Add n_late_setup to track
the number of events that need the late setup.
Other features, e.g., the auto counter reload feature, require the
late setup as well. Add a wrapper, intel_pmu_pebs_late_setup(), for
the events that support the counters-snapshotting feature.
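A hedged sketch of the gating this enables (a sketch, not the exact patch):

        static void intel_pmu_late_setup(void)
        {
                struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

                /* Skip the event-list walk when no event needs late setup. */
                if (!cpuc->n_late_setup)
                        return;

                intel_pmu_pebs_late_setup(cpuc);
        }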
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-3-kan.liang@linux.intel.com
|
|
More and more features require a dynamic event constraint, e.g., branch
counter logging, auto counter reload, Arch PEBS, etc.
Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate such cases. It
avoids having to add an individual flag for each feature in
intel_cpuc_prepare().
Add a variable, dyn_constraint, in struct hw_perf_event to track the
dynamic constraint of the event, and apply it whenever it has been
updated.
Apply the generic dynamic constraint for branch counter logging.
Many features on and after V6 require a dynamic constraint, so
unconditionally set the flag for V6+.
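A hedged sketch of how such a dynamic constraint is applied on top of the statically chosen one (a sketch under the assumptions above, not the exact diff):

        if (event->hw.dyn_constraint != ~0ULL) {
                c2 = dyn_constraint(cpuc, c2, idx);     /* per-CPU scratch copy */
                c2->idxmsk64 &= event->hw.dyn_constraint;
                c2->weight = hweight64(c2->idxmsk64);
        }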
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Various minor updates to the LSM Rust bindings
Changes include marking trivial Rust bindings as inlines and comment
tweaks to better reflect the LSM hooks.
- Add LSM/SELinux access controls to io_uring_allowed()
Similar to the io_uring_disabled sysctl, add an LSM hook to
io_uring_allowed() to give LSMs a simple way to enforce security
policy on the use of io_uring. This pull request includes SELinux
support for this new control using the io_uring/allowed permission.
- Remove an unused parameter from the security_perf_event_open() hook
The perf_event_attr struct parameter was not used by any currently
supported LSMs, remove it from the hook.
- Add an explicit MAINTAINERS entry for the credentials code
We've seen problems in the past where patches to the credentials code
sent by non-maintainers would often languish on the lists for
multiple months as there was no one explicitly tasked with the
responsibility of reviewing and/or merging credentials related code.
Considering that most of the code under security/ has a vested
interest in ensuring that the credentials code is well maintained,
I'm volunteering to look after the credentials code and Serge Hallyn
has also volunteered to step up as an official reviewer. I posted the
MAINTAINERS update as an RFC to LKML in hopes that someone else would
jump up with an "I'll do it!", but beyond Serge it was all crickets.
- Update Stephen Smalley's old email address to prevent confusion
This includes a corresponding update to the mailmap file.
* tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
mailmap: map Stephen Smalley's old email addresses
lsm: remove old email address for Stephen Smalley
MAINTAINERS: add Serge Hallyn as a credentials reviewer
MAINTAINERS: add an explicit credentials entry
cred,rust: mark Credential methods inline
lsm,rust: reword "destroy" -> "release" in SecurityCtx
lsm,rust: mark SecurityCtx methods inline
perf: Remove unnecessary parameter of security check
lsm: fix a missing security_uring_allowed() prototype
io_uring,lsm,selinux: add LSM hooks for io_uring_setup()
io_uring: refactor io_uring_allowed()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
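For illustration, the shape of the conversion ('t' and 'my_timer_fn' are placeholders):

        /* Before: two-step init with an open-coded callback store. */
        hrtimer_init(&t->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        t->timer.function = my_timer_fn;

        /* After: one call initializes the timer and the callback. */
        hrtimer_setup(&t->timer, my_timer_fn, CLOCK_MONOTONIC, HRTIMER_MODE_REL);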
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"x86 CPU features support:
- Generate the <asm/cpufeaturemasks.h> header based on build config
(H. Peter Anvin, Xin Li)
- x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
- Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
- Enable modifying CPU bug flags with '{clear,set}cpuid=' (Brendan
Jackman)
- Utilize CPU-type for CPU matching (Pawan Gupta)
- Warn about unmet CPU feature dependencies (Sohil Mehta)
- Prepare for new Intel Family numbers (Sohil Mehta)
Percpu code:
- Standardize & reorganize the x86 percpu layout and related cleanups
(Brian Gerst)
- Convert the stackprotector canary to a regular percpu variable
(Brian Gerst)
- Add a percpu subsection for cache hot data (Brian Gerst)
- Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
- Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
MM:
- Add support for broadcast TLB invalidation using AMD's INVLPGB
instruction (Rik van Riel)
- Rework ROX cache to avoid writable copy (Mike Rapoport)
- PAT: restore large ROX pages after fragmentation (Kirill A.
Shutemov, Mike Rapoport)
- Make memremap(MEMREMAP_WB) map memory as encrypted by default
(Kirill A. Shutemov)
- Robustify page table initialization (Kirill A. Shutemov)
- Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
- Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
(Matthew Wilcox)
KASLR:
- x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
Singh)
CPU bugs:
- Implement FineIBT-BHI mitigation (Peter Zijlstra)
- speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
- speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
Gupta)
- RFDS: Exclude P-only parts from the RFDS affected list (Pawan
Gupta)
System calls:
- Break up entry/common.c (Brian Gerst)
- Move sysctls into arch/x86 (Joel Granados)
Intel LAM support updates: (Maciej Wieczor-Retman)
- selftests/lam: Move cpu_has_la57() to use cpuinfo flag
- selftests/lam: Skip test if LAM is disabled
- selftests/lam: Test get_user() LAM pointer handling
AMD SMN access updates:
- Add SMN offsets to exclusive region access (Mario Limonciello)
- Add support for debugfs access to SMN registers (Mario Limonciello)
- Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
Power management updates: (Patryk Wlazlyn)
- Allow calling mwait_play_dead with an arbitrary hint
- ACPI/processor_idle: Add FFH state handling
- intel_idle: Provide the default enter_dead() handler
- Eliminate mwait_play_dead_cpuid_hint()
Build system:
- Raise the minimum GCC version to 8.1 (Brian Gerst)
- Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)
Kconfig: (Arnd Bergmann)
- Add cmpxchg8b support back to Geode CPUs
- Drop 32-bit "bigsmp" machine support
- Rework CONFIG_GENERIC_CPU compiler flags
- Drop configuration options for early 64-bit CPUs
- Remove CONFIG_HIGHMEM64G support
- Drop CONFIG_SWIOTLB for PAE
- Drop support for CONFIG_HIGHPTE
- Document CONFIG_X86_INTEL_MID as 64-bit-only
- Remove old STA2x11 support
- Only allow CONFIG_EISA for 32-bit
Headers:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
headers (Thomas Huth)
Assembly code & machine code patching:
- x86/alternatives: Simplify alternative_call() interface (Josh
Poimboeuf)
- x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
- KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
- x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
- x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
- x86/kexec: Merge x86_32 and x86_64 code using macros from
<asm/asm.h> (Uros Bizjak)
- Use named operands in inline asm (Uros Bizjak)
- Improve performance by using asm_inline() for atomic locking
instructions (Uros Bizjak)
Earlyprintk:
- Harden early_serial (Peter Zijlstra)
NMI handler:
- Add an emergency handler in nmi_desc & use it in
nmi_shootdown_cpus() (Waiman Long)
Miscellaneous fixes and cleanups:
- by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
Kuznetsov, Xin Li, liuye"
* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
x86/asm: Make asm export of __ref_stack_chk_guard unconditional
x86/mm: Only do broadcast flush from reclaim if pages were unmapped
perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
x86/asm: Use asm_inline() instead of asm() in clwb()
x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
x86/hweight: Use asm_inline() instead of asm()
x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
x86/hweight: Use named operands in inline asm()
x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
x86/head/64: Avoid Clang < 17 stack protector in startup code
x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
x86/cpu/intel: Limit the non-architectural constant_tsc model checks
x86/mm/pat: Replace Intel x86_model checks with VFM ones
x86/cpu/intel: Fix fast string initialization for extended Families
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Core:
- Move perf_event sysctls into kernel/events/ (Joel Granados)
- Use POLLHUP for pinned events in error (Namhyung Kim)
- Avoid the read if the count is already updated (Peter Zijlstra)
- Allow the EPOLLRDNORM flag for poll (Tao Chen)
- locking/percpu-rwsem: Add guard support [ NOTE: this got
(mis-)merged into the perf tree due to related work ] (Peter
Zijlstra)
perf_pmu_unregister() related improvements: (Peter Zijlstra)
- Simplify the perf_event_alloc() error path
- Simplify the perf_pmu_register() error path
- Simplify perf_pmu_register()
- Simplify perf_init_event()
- Simplify perf_event_alloc()
- Merge struct pmu::pmu_disable_count into struct
perf_cpu_pmu_context::pmu_disable_count
- Add this_cpc() helper
- Introduce perf_free_addr_filters()
- Robustify perf_event_free_bpf_prog()
- Simplify the perf_mmap() control flow
- Further simplify perf_mmap()
- Remove retry loop from perf_mmap()
- Lift event->mmap_mutex in perf_mmap()
- Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes
- Fix perf_mmap() failure path
Uprobes:
- Harden x86 uretprobe syscall trampoline check (Jiri Olsa)
- Remove redundant spinlock in uprobe_deny_signal() (Liao Chang)
- Remove the spinlock within handle_singlestep() (Liao Chang)
x86 Intel PMU enhancements:
- Support PEBS counters snapshotting (Kan Liang)
- Fix intel_pmu_read_event() (Kan Liang)
- Extend per event callchain limit to branch stack (Kan Liang)
- Fix system-wide LBR profiling (Kan Liang)
- Allocate bts_ctx only if necessary (Li RongQing)
- Apply static call for drain_pebs (Peter Zijlstra)
x86 AMD PMU enhancements: (Ravi Bangoria)
- Remove pointless sample period check
- Fix ->config to sample period calculation for OP PMU
- Fix perf_ibs_op.cnt_mask for CurCnt
- Don't allow freq mode event creation through ->config interface
- Add PMU specific minimum period
- Add ->check_period() callback
- Ceil sample_period to min_period
- Add support for OP Load Latency Filtering
- Update DTLB/PageSize decode logic
Hardware breakpoints:
- Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar
Bhaskar)
Hardlockup detector improvements: (Li Huafei)
- perf_event memory leak
- Warn if watchdog_ev is leaked
Fixes and cleanups:
- Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter
Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)"
* tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
perf: Fix __percpu annotation
perf: Clean up pmu specific data
perf/x86: Remove swap_task_ctx()
perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode
perf: Supply task information to sched_task()
perf: attach/detach PMU specific data
locking/percpu-rwsem: Add guard support
perf: Save PMU specific data in task_struct
perf: Extend per event callchain limit to branch stack
perf/ring_buffer: Allow the EPOLLRDNORM flag for poll
perf/core: Use POLLHUP for pinned events in error
perf/core: Use sysfs_emit() instead of scnprintf()
perf/core: Remove optional 'size' arguments from strscpy() calls
perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions
uprobes/x86: Harden uretprobe syscall trampoline check
watchdog/hardlockup/perf: Warn if watchdog_ev is leaked
watchdog/hardlockup/perf: Fix perf_event memory leak
perf/x86: Annotate struct bts_buffer::buf with __counted_by()
perf/core: Clean up perf_try_init_event()
perf/core: Fix perf_mmap() failure path
...
|
|
Although IBS "swfilt" can prevent leaking samples with a kernel RIP to
userspace, there are a few subtle cases where a 'data' address and/or
a 'branch target' address can fall in the kernel address range even
though the RIP is from userspace. Prevent leaking kernel 'data'
addresses by discarding such samples when {exclude_kernel=1,swfilt=1}.
IBS can now be invoked by unprivileged users with the introduction of
"swfilt". However, this creates a loophole in the interface where an
unprivileged user can get the physical addresses of userspace virtual
addresses through the IBS register raw dump (PERF_SAMPLE_RAW). Prevent
this as well.
This upstream commit fixed the most obvious leak:
65a99264f5e5 perf/x86: Check data address for IBS software filter
Follow that up with a more complete fix.
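A hedged sketch of the filtering idea ('br_target' is an illustrative name, not the patch's variable):

        /* With {exclude_kernel=1, swfilt=1}, drop samples whose data or
         * branch target address is in the kernel range, even when the
         * RIP itself is from user space. */
        if (kernel_ip(data.addr) || kernel_ip(br_target))
                return;         /* discard instead of leaking */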
Fixes: d29e744c7167 ("perf/x86: Relax privilege filter restriction on AMD IBS")
Suggested-by: Matteo Rizzo <matteorizzo@google.com>
Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250321161251.1033-1-ravi.bangoria@amd.com
|
|
If init_rapl_pmu() fails while allocating memory for "rapl_pmu" objects,
we miss freeing the "rapl_pmus" object in the error path. Fix that.
Fixes: 9b99d65c0bb4 ("perf/x86/rapl: Move the pmu allocation out of CPU hotplug")
Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250320100617.4480-1-dhananjay.ugwekar@amd.com
|
|
Introduce a name for an old Pentium 4 model and replace the x86_model
checks with VFM ones. This gets rid of one of the last remaining
Intel-specific x86_model checks.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318223828.2945651-3-sohil.mehta@intel.com
|
|
Architectural Perfmon was introduced on the Family 6 "Core" processors
starting with Yonah. Processors before Yonah need their own customized
PMU initialization.
p6_pmu_init() is expected to provide that initialization for early
Family 6 processors. But, currently, it could get called for any Family
6 processor if the architectural perfmon feature is disabled on that
processor. To simplify, restrict the P6 PMU initialization to early
Family 6 processors that do not have architectural perfmon support and
truly need the special handling.
As a result, the "unsupported" console print becomes practically
unreachable because all the released P6 processors are covered by the
switch cases. Move the console print to a common location where it can
cover all modern processors (including Family >15) that may not have
architectural perfmon support enumerated.
Also, use this opportunity to get rid of the unnecessary switch cases in
P6 initialization. Only the Pentium Pro processor needs a quirk, and the
rest of the processors do not need any special handling. The gaps in the
case numbers are only due to no processor with those model numbers being
released.
Use decimal numbers to represent Intel Family numbers. Also, convert one
of the last few Intel x86_model comparisons to a VFM-based one.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318223828.2945651-2-sohil.mehta@intel.com
|
|
The IBS software filter is filtering kernel samples for regular users in
the PMI handler. It checks the instruction address in the IBS register to
determine if it was in kernel mode or not.
But it turns out that it's possible to report a kernel data address even
if the instruction address belongs to user-space. Matteo Rizzo
found that when an instruction raises an exception, IBS can report some
kernel data addresses like IDT while holding the faulting instruction's
RIP. To prevent an information leak, it should double check if the data
address in PERF_SAMPLE_DATA is in the kernel space as well.
[ mingo: Clarified the changelog ]
Suggested-by: Matteo Rizzo <matteorizzo@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250317163755.1842589-1-namhyung@kernel.org
|
|
The PMU-specific data is saved in task_struct now. It doesn't need to
be swapped between contexts.
Remove swap_task_ctx() support.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250314172700.438923-6-kan.liang@linux.intel.com
|
|
In system-wide mode, LBR callstacks are shorter in comparison to
per-process mode, because LBR MSRs are reset during a context switch
in system-wide mode. For the LBR call stack, the LBRs should always
be saved/restored across a context switch.
Use the space in task_struct to save/restore the LBR call stack data.
For a system-wide event, it's unnecessary to update
lbr_callstack_users for each thread. Add a variable in x86_pmu to
indicate whether a system-wide event is active.
Fixes: 76cb2c617f12 ("perf/x86/intel: Save/restore LBR stack during context switch")
Reported-by: Andi Kleen <ak@linux.intel.com>
Reported-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Debugged-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250314172700.438923-5-kan.liang@linux.intel.com
|
|
To save/restore LBR call stack data in system-wide mode, the task_struct
information is required.
Extend the parameters of sched_task() to supply task_struct
information. On schedule-in, the LBR call stack data for the new task
is restored; on schedule-out, the LBR call stack data for the old task
is saved. Only the required task_struct information needs to be passed.
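A hedged before/after sketch of the callback signature (close to, but not guaranteed to be, the final prototype):

        /* Before: */
        void (*sched_task)(struct perf_event_pmu_context *pmu_ctx,
                           bool sched_in);
        /* After: the task is supplied so LBR data can be saved/restored. */
        void (*sched_task)(struct perf_event_pmu_context *pmu_ctx,
                           struct task_struct *task, bool sched_in);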
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250314172700.438923-4-kan.liang@linux.intel.com
|
|
bts_ctx might not be allocated, for example if the CPU has X86_FEATURE_PTI,
but intel_bts_disable/enable_local() and intel_bts_interrupt() are called
unconditionally from intel_pmu_handle_irq() and crash on a NULL bts_ctx.
So check that bts_ctx is allocated before calling the BTS functions.
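A minimal sketch of the added guard (one of the entry points shown; the others follow the same pattern):

        int intel_bts_interrupt(void)
        {
                if (!bts_ctx)
                        return 0;
                ...
        }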
Fixes: 3acfcefa795c ("perf/x86/intel/bts: Allocate bts_ctx only if necessary")
Reported-by: Jiri Olsa <olsajiri@gmail.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250306051102.2642-1-lirongqing@baidu.com
|
|
Add the __counted_by() compiler attribute to the flexible array member
buf to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
No functional changes intended.
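A sketch of the annotation, assuming the counter member is nr_bufs (other members elided):

        struct bts_buffer {
                ...
                unsigned int    nr_bufs;
                ...
                struct bts_phys buf[] __counted_by(nr_bufs);
        };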
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250305123134.215577-2-thorsten.blum@linux.dev
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
get_this_hybrid_cpu_type() misses a case when cpu-type is populated
regardless of X86_FEATURE_HYBRID_CPU. This is particularly true for hybrid
variants that have P or E cores fused off.
Instead, use the cpu-type cached in struct x86_topology, as it does not
rely on the hybrid feature to enumerate the cpu-type. This can also help
avoid the model-specific fixup get_hybrid_cpu_type(). Also replace
get_this_hybrid_cpu_native_id() with its cached value in struct
x86_topology.
While at it, remove enum hybrid_cpu_type as it serves no purpose when we
have the exact cpu-types defined in enum intel_cpu_type. Also rename
atom_native_id to intel_native_id and move it to intel-family.h where
intel_cpu_type lives.
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20241211-add-cpu-type-v5-3-2ae010f50370@linux.intel.com
|
|
We are going to apply a new series that conflicts with pending
work in x86/mm, so merge in x86/mm to avoid it, and also to
refresh the x86/cpu branch with fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It seems that the attr parameter has never been used in security
checks since it was first introduced by:
commit da97e18458fb ("perf_event: Add support for LSM and SELinux checks")
so remove it.
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Add Arrow Lake U model for RAPL:
$ ls -1 /sys/devices/power/events/
energy-cores
energy-cores.scale
energy-cores.unit
energy-gpu
energy-gpu.scale
energy-gpu.unit
energy-pkg
energy-pkg.scale
energy-pkg.unit
energy-psys
energy-psys.scale
energy-psys.unit
The same output as ArrowLake:
$ perf stat -a -I 1000 --per-socket -e power/energy-pkg/
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20241224145516.349028-1-aaron.ma@canonical.com
|
|
Frequency mode is the current default mode of Linux perf. A period of 1
is used as the starting period. The period is auto-adjusted on each
tick or overflow to meet the frequency target.
The start period of 1 is too low and may trigger some issues:
- Many HWs do not support period 1 well.
https://lore.kernel.org/lkml/875xs2oh69.ffs@tglx/
- For an event that occurs frequently, period 1 is too far away from the
real period. Lots of samples are generated at the beginning.
The distribution of samples may not be even.
- A low starting period for frequently occurring events also challenges
virtualization, which has a longer path to handle a PMI.
The limit_period value only checks the minimum acceptable value for HW.
It cannot be used to set the start period, because some events may
need a very low period. The limit_period cannot be set too high. It
doesn't help with the events that occur frequently.
It's hard to find a universal starting period for all events. The idea
implemented by this patch is to only give an estimate for the popular
HW and HW cache events. For the rest of the events, start from the lowest
possible recommended value.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250117151913.3043942-3-kan.liang@linux.intel.com
|
|
Perf doesn't work at low frequencies:
$ perf record -e cpu_core/instructions/ppp -F 120
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cpu_core/instructions/ppp).
"dmesg | grep -i perf" may provide additional information.
The limit_period() check avoids a low sampling period on a counter. It
doesn't intend to limit the frequency.
The check in x86_pmu_hw_config() should be limited to non-freq mode.
attr.sample_period and attr.sample_freq share a union, so
attr.sample_period must not be used to indicate frequency mode.
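A hedged sketch of the narrowed check in x86_pmu_hw_config() (close to the described fix, but a sketch nonetheless):

        if (!event->attr.freq && x86_pmu.limit_period) {
                s64 left = event->attr.sample_period;

                x86_pmu.limit_period(event, &left);
                if (left > event->attr.sample_period)
                        return -EINVAL;
        }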
Fixes: c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250117151913.3043942-1-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
|
|
Avoid unnecessary per-CPU memory allocation on unsupported CPUs; this
can save 12K of memory for each CPU.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250122074103.3091-1-lirongqing@baidu.com
|
|
new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
According to the latest event list, update the event constraint tables
for the Lion Cove core.
The general rule (event codes < 0x90 are restricted to counters 0-3)
has been removed. There is no restriction for most of the performance
monitoring events.
Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support")
Reported-by: Amiri Khalil <amiri.khalil@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250219141005.2446823-1-kan.liang@linux.intel.com
|
|
Pick up upstream x86 fixes before applying new patches.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.
The patch was created using Coccinelle.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/e84ad5a660b8e10867e547db8e64f7e99c48ebba.1738746821.git.namcao@linutronix.de
|
|
The IBS Op PMU on Zen5 reports DTLB and page size information
differently compared to prior generations. The change is enumerated
by CPUID_Fn8000001B_EAX[19].
  IBS_OP_DATA3     Zen3/4                 Zen5
  ----------------------------------------------------------------
  19               IbsDcL2TlbHit1G        Reserved
  ----------------------------------------------------------------
   6               IbsDcL2tlbHit2M        Reserved
  ----------------------------------------------------------------
   5               IbsDcL1TlbHit1G        PageSize:
   4               IbsDcL1TlbHit2M          0 - 4K
                                            1 - 2M
                                            2 - 1G
                                            3 - Reserved
                                          Valid only if
                                            IbsDcPhyAddrValid = 1
  ----------------------------------------------------------------
   3               IbsDcL2TlbMiss         IbsDcL2TlbMiss
                                          Valid only if
                                            IbsDcPhyAddrValid = 1
  ----------------------------------------------------------------
   2               IbsDcL1tlbMiss         IbsDcL1tlbMiss
                                          Valid only if
                                            IbsDcPhyAddrValid = 1
  ----------------------------------------------------------------
o Currently, only bits 2 and 3 are interpreted by the IBS NMI handler
  for PERF_SAMPLE_DATA_SRC. Add a dependency on IbsDcPhyAddrValid for
  those bits.
o Introduce a new IBS Op PMU capability and expose it to userspace via
  the PMU's sysfs directory.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250205060547.1337-3-ravi.bangoria@amd.com
|
|
The IBS Op PMU on the Zen5 uarch adds a new Load Latency filtering
capability, advertised by CPUID_Fn8000001B_EAX bit 12. When enabled,
IBS HW will raise an interrupt only for samples whose IbsDcMissLat
value is greater than N cycles, where N is a programmable value defined
in multiples of 128 (i.e. 128, 256, 384, etc.) from 128-2048 cycles.
Similar to L3MissOnly, IBS HW internally drops the sample and restarts
if the sample does not meet the filtering criteria.
Add support for LdLat filtering in the IBS Op PMU. Since the hardware
supports thresholds in multiples of 128, add a software filter on top
to support a latency threshold with a granularity of 1 cycle between
[128-2048].
Example usage:
# perf record -a -e ibs_op/ldlat=128/ -- sleep 5
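A hedged sketch of the two-level filter ('ldlat' and 'dc_miss_lat' are illustrative names, not the patch's variables):

        /* HW filters at multiples of 128 cycles... */
        hw_thresh = (ldlat / 128) * 128;
        /* ...and a SW check tops it up to 1-cycle granularity: */
        if (dc_miss_lat < ldlat)
                return;         /* drop the sample, as the HW does */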
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250205060547.1337-2-ravi.bangoria@amd.com
|
|
Pretty much every caller of is_endbr() actually wants to test something at an
address and ends up doing get_kernel_nofault(). Fold the lot into a more
convenient helper.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250207122546.181367417@infradead.org
|
|
init_one_iommu() takes an unsigned int argument that can't be checked
for boundaries at compile time, and GCC complains about that when
building with `make W=1`:
arch/x86/events/amd/iommu.c:441:53: note: directive argument in the range [0, 4294967294]
arch/x86/events/amd/iommu.c:441:9: note: ‘snprintf’ output between 12 and 21 bytes into a destination of size 16
Increase the size to cover all possible cases.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250210193412.483233-1-andriy.shevchenko@linux.intel.com
|
|
Explicitly clear DEBUGCTL.LBR when a CPU is starting, prior to purging the
LBR MSRs themselves, as at least one system has been found to transfer
control to the kernel with LBRs enabled (it's unclear whether it's a BIOS
flaw or a CPU goof). Because the kernel preserves the original DEBUGCTL,
even when toggling LBRs, leaving DEBUGCTL.LBR as is results in running
with LBRs enabled at all times.
Closes: https://lore.kernel.org/all/c9d8269bff69f6359731d758e3b1135dedd7cc61.camel@redhat.com
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250131010721.470503-1-seanjc@google.com
|
|
The EAX of CPUID leaf 023H enumerates the mask of valid sub-leaves.
To tell the availability of sub-leaf 1 (which enumerates the counter
mask), perf should check bit 1 (0x2) of EAX, rather than bit 0 (0x1).
The error is not user-visible on bare metal, because sub-leaf 0 and
sub-leaf 1 are always available there. However, it may bring issues in
a virtualization environment when a VMM only enumerates sub-leaf 0.
Introduce the cpuid35_e?x unions to replace the macros, which makes
the implementation style consistent.
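A hedged sketch of the union-based check (field names follow the description above; treat them as assumptions, not the patch's exact layout):

        union cpuid35_eax {
                struct {
                        unsigned int    leaf0:1;        /* sub-leaf 0 */
                        unsigned int    cntr_subleaf:1; /* sub-leaf 1 */
                        unsigned int    reserved:30;
                } split;
                unsigned int    full;
        };

        if (eax.split.cntr_subleaf)     /* bit 1 (0x2), not bit 0 */
                /* parse the counter mask from sub-leaf 1 */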
Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20250129154820.3755948-3-kan.liang@linux.intel.com
|
|
The PEBS-via-PT feature is exposed for the e-core of some hybrid
platforms, e.g., ADL and MTL. But it never works.
$ dmesg | grep PEBS
[ 1.793888] core: cpu_atom PMU driver: PEBS-via-PT
$ perf record -c 1000 -e '{intel_pt/branch=0/,
cpu_atom/cpu-cycles,aux-output/pp}' -C8
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cpu_atom/cpu-cycles,aux-output/pp).
"dmesg | grep -i perf" may provide additional information.
The "PEBS-via-PT" is printed if the corresponding bit of per-PMU
capabilities is set. Since the feature is supported by the e-core HW,
perf sets the bit for e-core. However, for Intel PT, if a feature is not
supported on all CPUs, it is not supported at all. The PEBS-via-PT event
cannot be created successfully.
The PEBS-via-PT is no longer enumerated on the latest hybrid platform. It
will be deprecated on future platforms with Arch PEBS. Let's remove it
from the existing hybrid platforms.
Fixes: d9977c43bff8 ("perf/x86: Register hybrid PMUs")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250129154820.3755948-2-kan.liang@linux.intel.com
|