summaryrefslogtreecommitdiff
path: root/arch/x86/events
AgeCommit message (Collapse)Author
2024-06-17perf/x86/uncore: Apply the unit control RB tree to PCI uncore unitsKan Liang
The unit control RB tree has the unit control and unit ID information for all the PCI units. Use them to replace the box_ctls/pci_offsets to get an accurate unit control address for PCI uncore units. The UPI/M3UPI units in the discovery table are ignored. Please see the commit 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR"). Manually allocate a unit control RB tree for UPI/M3UPI. Add cleanup_extra_boxes to release such manual allocation. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-7-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Apply the unit control RB tree to MSR uncore unitsKan Liang
The unit control RB tree has the unit control and unit ID information for all the MSR units. Use them to replace the box_ctl and uncore_msr_box_ctl() to get an accurate unit control address for MSR uncore units. Add intel_generic_uncore_assign_hw_event(), which utilizes the accurate unit control address from the unit control RB tree to calculate the config_base and event_base. The unit id related information should be retrieved from the unit control RB tree as well. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-6-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Apply the unit control RB tree to MMIO uncore unitsKan Liang
The unit control RB tree has the unit control and unit ID information for all the units. Use it to replace the box_ctls/mmio_offsets to get an accurate unit control address for MMIO uncore units. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-5-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Retrieve the unit ID from the unit control RB treeKan Liang
The box_ids only save the unit ID for the first die. If a unit, e.g., a CXL unit, doesn't exist in the first die. The unit ID cannot be retrieved. The unit control RB tree also stores the unit ID information. Retrieve the unit ID from the unit control RB tree Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-4-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Support per PMU cpumaskKan Liang
The cpumask of some uncore units, e.g., CXL uncore units, may be wrong under some configurations. Perf may access an uncore counter of a non-existent uncore unit. The uncore driver assumes that all uncore units are symmetric among dies. A global cpumask is shared among all uncore PMUs. However, some CXL uncore units may only be available on some dies. A per PMU cpumask is introduced to track the CPU mask of this PMU. The driver searches the unit control RB tree to check whether the PMU is available on a given die, and updates the per PMU cpumask accordingly. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-3-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Save the unit control address of all unitsKan Liang
The unit control address of some CXL units may be wrongly calculated under some configuration on a EMR machine. The current implementation only saves the unit control address of the units from the first die, and the first unit of the rest of dies. Perf assumed that the units from the other dies have the same offset as the first die. So the unit control address of the rest of the units can be calculated. However, the assumption is wrong, especially for the CXL units. Introduce an RB tree for each uncore type to save the unit control address and three kinds of ID information (unit ID, PMU ID, and die ID) for all units. The unit ID is a physical ID of a unit. The PMU ID is a logical ID assigned to a unit. The logical IDs start from 0 and must be contiguous. The physical ID and the logical ID are 1:1 mapping. The units with the same physical ID in different dies share the same PMU. The die ID indicates which die a unit belongs to. The RB tree can be searched by two different keys (unit ID or PMU ID + die ID). During the RB tree setup, the unit ID is used as a key to look up the RB tree. The perf can create/assign a proper PMU ID to the unit. Later, after the RB tree is setup, PMU ID + die ID is used as a key to look up the RB tree to fill the cpumask of a PMU. It's used more frequently, so PMU ID + die ID is compared in the unit_less(). The uncore_find_unit() has to be O(N). But the RB tree setup only occurs once during the driver load time. It should be acceptable. Compared with the current implementation, more space is required to save the information of all units. The extra size should be acceptable. For example, on EMR, there are 221 units at most. For a 2-socket machine, the extra space is ~6KB at most. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240614134631.1092359-2-kan.liang@linux.intel.com
2024-06-11perf/x86: Serialize set_attr_rdpmc()Thomas Gleixner
Yue and Xingwei reported a jump label failure. It's caused by the lack of serialization in set_attr_rdpmc(): CPU0 CPU1 Assume: x86_pmu.attr_rdpmc == 0 if (val != x86_pmu.attr_rdpmc) { if (val == 0) ... else if (x86_pmu.attr_rdpmc == 0) static_branch_dec(&rdpmc_never_available_key); if (val != x86_pmu.attr_rdpmc) { if (val == 0) ... else if (x86_pmu.attr_rdpmc == 0) FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key); The reported BUG() is a consequence of the above and of another bug in the jump label core code. The core code needs a separate fix, but that cannot prevent the imbalance problem caused by set_attr_rdpmc(). Prevent this by serializing set_attr_rdpmc() locally. Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks") Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com Reported-by: Yue Sun <samsun1006219@gmail.com> Reported-by: Xingwei Lee <xrivendell7@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240610124406.359476013@linutronix.de
2024-05-31perf/x86/intel: Add missing MODULE_DESCRIPTION() linesJeff Johnson
Fix the 'make W=1 C=1' warnings: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-uncore.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-cstate.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-intel-v1-1-8252194ed20a@quicinc.com
2024-05-31perf/x86/rapl: Add missing MODULE_DESCRIPTION() lineJeff Johnson
Fix the warning from 'make C=1 W=1': WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/rapl.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-v1-1-e45ffa8af99f@quicinc.com
2024-05-28perf/x86/rapl: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240520224620.9480-44-tony.luck%40intel.com
2024-05-28x86/cpu: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Update INTEL_CPU_DESC() to work with vendor/family/model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240520224620.9480-34-tony.luck%40intel.com
2024-05-28perf/x86/intel: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240520224620.9480-32-tony.luck%40intel.com
2024-05-22Merge tag 'driver-core-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: device property: Fix a typo in the description of device_get_child_node_count() kernfs: mount: Remove unnecessary ‘NULL’ values from knparent scsi: Use device_show_string() helper for sysfs attributes platform/x86: Use device_show_string() helper for sysfs attributes perf: Use device_show_string() helper for sysfs attributes IB/qib: Use device_show_string() helper for sysfs attributes hwmon: Use device_show_string() helper for sysfs attributes driver core: Add device_show_string() helper for sysfs attributes treewide: Use sysfs_bin_attr_simple_read() helper sysfs: Add sysfs_bin_attr_simple_read() helper module: don't ignore sysfs_create_link() failures driver core: Remove unused platform_notify, platform_notify_remove
2024-05-18perf/x86/amd: Use try_cmpxchg() in events/amd/{un,}core.cUros Bizjak
Replace this pattern in events/amd/{un,}core.c: cmpxchg(*ptr, old, new) == old ... with the simpler and faster: try_cmpxchg(*ptr, &old, new) The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after the CMPXCHG. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240425101708.5025-1-ubizjak@gmail.com
2024-05-08perf/x86/cstate: Remove unused 'struct perf_cstate_msr'Ingo Molnar
Use of this structure was removed in: 8f2a28c5859b ("perf/x86/cstate: Use new probe function") Remove the now stale type as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-04perf: Use device_show_string() helper for sysfs attributesLukas Wunner
Deduplicate sysfs ->show() callbacks which expose a string at a static memory location. Use the newly introduced device_show_string() helper in the driver core instead by declaring those sysfs attributes with DEVICE_STRING_ATTR_RO(). No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/r/3a297850312b4ecb62d6872121de04496900f502.1713608122.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02perf/x86/rapl: Rename 'maxdie' to nr_rapl_pmu and 'dieid' to rapl_pmu_idxDhananjay Ugwekar
AMD CPUs have the scope of RAPL energy-pkg event as package, whereas Intel Cascade Lake CPUs have the scope as die. To account for the difference in the energy-pkg event scope between AMD and Intel CPUs, give more generic and semantically correct names to the maxdie and dieid variables. No functional change. Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240502095115.177713-2-Dhananjay.Ugwekar@amd.com
2024-05-02Merge branch 'x86/cpu' into perf/core, to pick up dependent commitsIngo Molnar
We are going to fix perf-events fallout of changes in tip:x86/cpu, so merge in that branch first. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-02Merge tag 'v6.9-rc6' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-29perf/x86/msr: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181503.41614-1-tony.luck%40intel.com
2024-04-29perf/x86/intel/uncore: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. [ bp: Squash *three* uncore patches into one. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181501.41557-1-tony.luck%40intel.com
2024-04-25perf/x86/intel/pt: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181500.41538-1-tony.luck%40intel.com
2024-04-25perf/x86/lbr: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181500.41519-1-tony.luck%40intel.com
2024-04-25perf/x86/intel/cstate: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181459.41500-1-tony.luck%40intel.com
2024-04-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "This is a bit on the large side, mostly due to two changes: - Changes to disable some broken PMU virtualization (see below for details under "x86 PMU") - Clean up SVM's enter/exit assembly code so that it can be compiled without OBJECT_FILES_NON_STANDARD. This fixes a warning "Unpatched return thunk in use. This should not happen!" when running KVM selftests. Everything else is small bugfixes and selftest changes: - Fix a mostly benign bug in the gfn_to_pfn_cache infrastructure where KVM would allow userspace to refresh the cache with a bogus GPA. The bug has existed for quite some time, but was exposed by a new sanity check added in 6.9 (to ensure a cache is either GPA-based or HVA-based). - Drop an unused param from gfn_to_pfn_cache_invalidate_start() that got left behind during a 6.9 cleanup. - Fix a math goof in x86's hugepage logic for KVM_SET_MEMORY_ATTRIBUTES that results in an array overflow (detected by KASAN). - Fix a bug where KVM incorrectly clears root_role.direct when userspace sets guest CPUID. - Fix a dirty logging bug in the where KVM fails to write-protect SPTEs used by a nested guest, if KVM is using Page-Modification Logging and the nested hypervisor is NOT using EPT. x86 PMU: - Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. - Set the enable bits for general purpose counters in PERF_GLOBAL_CTRL at RESET time, as done by both Intel and AMD processors. - Disable LBR virtualization on CPUs that don't support LBR callstacks, as KVM unconditionally uses PERF_SAMPLE_BRANCH_CALL_STACK when creating the perf event, and would fail on such CPUs. Tests: - Fix a flaw in the max_guest_memory selftest that results in it exhausting the supply of ucall structures when run with more than 256 vCPUs. - Mark KVM_MEM_READONLY as supported for RISC-V in set_memory_region_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits) KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start() KVM: selftests: Add coverage of EPT-disabled to vmx_dirty_log_test KVM: x86/mmu: Fix and clarify comments about clearing D-bit vs. write-protecting KVM: x86/mmu: Remove function comments above clear_dirty_{gfn_range,pt_masked}() KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status KVM: x86/mmu: Precisely invalidate MMU root_role during CPUID update KVM: VMX: Disable LBR virtualization if the CPU doesn't support LBR callstacks perf/x86/intel: Expose existence of callback support to KVM KVM: VMX: Snapshot LBR capabilities during module initialization KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel compatible KVM: x86: Stop compiling vmenter.S with OBJECT_FILES_NON_STANDARD KVM: SVM: Create a stack frame in __svm_sev_es_vcpu_run() KVM: SVM: Save/restore args across SEV-ES VMRUN via host save area KVM: SVM: Save/restore non-volatile GPRs in SEV-ES VMRUN via host save area KVM: SVM: Clobber RAX instead of RBX when discarding spec_ctrl_intercepted KVM: SVM: Drop 32-bit "support" from __svm_sev_es_vcpu_run() KVM: SVM: Wrap __svm_sev_es_vcpu_run() with #ifdef CONFIG_KVM_AMD_SEV KVM: SVM: Create a stack frame in __svm_vcpu_run() for unwinding KVM: SVM: Remove a useless zeroing of allocated memory ...
2024-04-14Merge branch 'linus' into perf/core, to pick up perf/urgent fixesIngo Molnar
Pick up perf/urgent fixes that are upstream already, but not yet in the perf/core development branch. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-11perf/x86/intel: Expose existence of callback support to KVMSean Christopherson
Add a "has_callstack" field to the x86_pmu_lbr structure used to pass information to KVM, and set it accordingly in x86_perf_get_lbr(). KVM will use has_callstack to avoid trying to create perf LBR events with PERF_SAMPLE_BRANCH_CALL_STACK on CPUs that don't support callstacks. Reviewed-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240307011344.835640-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-10perf/x86/rapl: Add support for Intel Lunar LakeZhang Rui
Lunar Lake RAPL support is the same as previous Sky Lake. Add Lunar Lake model for RAPL. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240410124554.448987-2-rui.zhang@intel.com
2024-04-10perf/x86/rapl: Add support for Intel Arrow LakeZhang Rui
Arrow Lake RAPL support is the same as previous Sky Lake. Add Arrow Lake model for RAPL. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240410124554.448987-1-rui.zhang@intel.com
2024-04-10perf/x86: Fix out of range dataNamhyung Kim
On x86 each struct cpu_hw_events maintains a table for counter assignment but it missed to update one for the deleted event in x86_pmu_del(). This can make perf_clear_dirty_counters() reset used counter if it's called before event scheduling or enabling. Then it would return out of range data which doesn't make sense. The following code can reproduce the problem. $ cat repro.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <linux/perf_event.h> #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/syscall.h> struct perf_event_attr attr = { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES, .disabled = 1, }; void *worker(void *arg) { int cpu = (long)arg; int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0); void *p; do { ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0); p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0); ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0); ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0); munmap(p, 4096); ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0); } while (1); return NULL; } int main(void) { int i; int n = sysconf(_SC_NPROCESSORS_ONLN); pthread_t *th = calloc(n, sizeof(*th)); for (i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, (void *)(long)i); for (i = 0; i < n; i++) pthread_join(th[i], NULL); free(th); return 0; } And you can see the out of range data using perf stat like this. Probably it'd be easier to see on a large machine. $ gcc -o repro repro.c -pthread $ ./repro & $ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }' 1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%) 1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%) 1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%) 2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%) 2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%) 2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%) 2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%) 3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%) 3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%) 3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%) 3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle It should move the contents in the cpuc->assign as well. Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240306061003.1894224-1-namhyung@kernel.org
2024-04-03perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS eventKan Liang
The MSR_PEBS_DATA_CFG MSR register is used to configure which data groups should be generated into a PEBS record, and it's shared among all counters. If there are different configurations among counters, perf combines all the configurations. The first perf command as below requires a complete PEBS record (including memory info, GPRs, XMMs, and LBRs). The second perf command only requires a basic group. However, after the second perf command is running, the MSR_PEBS_DATA_CFG register is cleared. Only a basic group is generated in a PEBS record, which is wrong. The required information for the first perf command is missed. $ perf record --intr-regs=AX,SP,XMM0 -a -C 8 -b -W -d -c 100000003 -o /dev/null -e cpu/event=0xd0,umask=0x81/upp & $ sleep 5 $ perf record --per-thread -c 1 -e cycles:pp --no-timestamp --no-tid taskset -c 8 ./noploop 1000 The first PEBS event is a system-wide PEBS event. The second PEBS event is a per-thread event. When the thread is scheduled out, the intel_pmu_pebs_del() function is invoked to update the PEBS state. Since the system-wide event is still available, the cpuc->n_pebs is 1. The cpuc->pebs_data_cfg is cleared. The data configuration for the system-wide PEBS event is lost. The (cpuc->n_pebs == 1) check was introduced in commit: b6a32f023fcc ("perf/x86: Fix PEBS threshold initialization") At that time, it indeed didn't hurt whether the state was updated during the removal, because only the threshold is updated. The calculation of the threshold takes the last PEBS event into account. However, since commit: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") we delay the threshold update, and clear the PEBS data config, which triggers the bug. The PEBS data config update scope should not be shrunk during removal. [ mingo: Improved the changelog & comments. ] Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240401133320.703971-1-kan.liang@linux.intel.com
2024-04-03perf/x86/amd: Don't reject non-sampling events with configured LBRAndrii Nakryiko
Now that it's possible to capture LBR on AMD CPU from BPF at arbitrary point, there is no reason to artificially limit this feature to just sampling events. So corresponding check is removed. AFAIU, there is no correctness implications of doing this (and it was possible to bypass this check by just setting perf_event's sample_period to 1 anyways, so it doesn't guard all that much). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240402022118.1046049-5-andrii@kernel.org
2024-04-03perf/x86/amd: Support capturing LBR from software eventsAndrii Nakryiko
Upstream commit c22ac2a3d4bd ("perf: Enable branch record for software events") added ability to capture LBR (Last Branch Records) on Intel CPUs from inside BPF program at pretty much any arbitrary point. This is extremely useful capability that allows to figure out otherwise hard to debug problems, because LBR is now available based on some application-defined conditions, not just hardware-supported events. 'retsnoop' is one such tool that takes a huge advantage of this functionality and has proved to be an extremely useful tool in practice: https://github.com/anakryiko/retsnoop Now, AMD Zen4 CPUs got support for similar LBR functionality, but necessary wiring inside the kernel is not yet setup. This patch seeks to rectify this and follows a similar approach to the original patch for Intel CPUs. We implement an AMD-specific callback set to be called through perf_snapshot_branch_stack static call. Previous preparatory patches ensured that amd_pmu_core_disable_all() and __amd_pmu_lbr_disable() will be completely inlined and will have no branches, so LBR snapshot contamination will be minimized. This was tested on AMD Bergamo CPU and worked well when utilized from the aforementioned retsnoop tool. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240402022118.1046049-4-andrii@kernel.org
2024-04-03perf/x86/amd: Avoid taking branches before disabling LBRAndrii Nakryiko
In the following patches we will enable LBR capture on AMD CPUs at arbitrary point in time, which means that LBR recording won't be frozen by hardware automatically as part of hardware overflow event. So we need to take care to minimize amount of branches and function calls/returns on the path to freezing LBR, minimizing LBR snapshot altering as much as possible. As such, split out LBR disabling logic from the sanity checking logic inside amd_pmu_lbr_disable_all(). This will ensure that no branches are taken before LBR is frozen in the functionality added in the next patch. Use __always_inline to also eliminate any possible function calls. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240402022118.1046049-3-andrii@kernel.org
2024-04-03perf/x86/amd: Ensure amd_pmu_core_disable_all() is always inlinedAndrii Nakryiko
In the following patches we will enable LBR capture on AMD CPUs at arbitrary point in time, which means that LBR recording won't be frozen by hardware automatically as part of hardware overflow event. So we need to take care to minimize amount of branches and function calls/returns on the path to freezing LBR, minimizing LBR snapshot altering as much as possible. amd_pmu_core_disable_all() is one of the functions on this path, and is already marked as __always_inline. But it calls amd_pmu_set_global_ctl() which is marked as just inline. So to guarantee no function call will be generated thoughout mark amd_pmu_set_global_ctl() as __always_inline as well. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240402022118.1046049-2-andrii@kernel.org
2024-04-03Merge tag 'v6.9-rc2' into perf/core, to pick up dependent commitsIngo Molnar
Pick up fixes that followup patches are going to depend on. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-03-26perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and laterSandipan Das
Add the "ref-cycles" event for AMD processors based on Zen 4 and later microarchitectures. The backing event is based on PMCx120 which counts cycles not in halt state in P0 frequency (same as MPERF). Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/089155f19f7c7e65aeb1caa727a882e2ca9b8b04.1711352180.git.sandipan.das@amd.com
2024-03-26perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and laterSandipan Das
AMD processors based on Zen 2 and later microarchitectures do not support PMCx087 (instruction pipe stalls) which is used as the backing event for "stalled-cycles-frontend" and "stalled-cycles-backend". Use PMCx0A9 (cycles where micro-op queue is empty) instead to count frontend stalls and remove the entry for backend stalls since there is no direct replacement. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Link: https://lore.kernel.org/r/03d7fc8fa2a28f9be732116009025bdec1b3ec97.1711352180.git.sandipan.das@amd.com
2024-03-25perf/x86/amd/lbr: Use freeze based on availabilitySandipan Das
Currently, the LBR code assumes that LBR Freeze is supported on all processors when X86_FEATURE_AMD_LBR_V2 is available i.e. CPUID leaf 0x80000022[EAX] bit 1 is set. This is incorrect as the availability of the feature is additionally dependent on CPUID leaf 0x80000022[EAX] bit 2 being set, which may not be set for all Zen 4 processors. Define a new feature bit for LBR and PMC freeze and set the freeze enable bit (FLBRI) in DebugCtl (MSR 0x1d9) conditionally. It should still be possible to use LBR without freeze for profile-guided optimization of user programs by using an user-only branch filter during profiling. When the user-only filter is enabled, branches are no longer recorded after the transition to CPL 0 upon PMI arrival. When branch entries are read in the PMI handler, the branch stack does not change. E.g. $ perf record -j any,u -e ex_ret_brn_tkn ./workload Since the feature bit is visible under flags in /proc/cpuinfo, it can be used to determine the feasibility of use-cases which require LBR Freeze to be supported by the hardware such as profile-guided optimization of kernels. Fixes: ca5b7c0d9621 ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/69a453c97cfd11c6f2584b19f937fe6df741510f.1711091584.git.sandipan.das@amd.com
2024-03-21perf/x86/rapl: Prefer struct_size() over open coded arithmeticErick Archer
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments https://github.com/KSPP/linux/issues/160 As the "rapl_pmus" variable is a pointer to "struct rapl_pmus" and this structure ends in a flexible array: struct rapl_pmus { [...] struct rapl_pmu *pmus[] __counted_by(maxdie); }; the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + count * size" in the kzalloc() function. This way, the code is more readable and safer. Signed-off-by: Erick Archer <erick.archer@gmx.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240317164442.6729-1-erick.archer@gmx.com
2024-03-13perf/x86/amd/core: Avoid register reset when CPU is deadSandipan Das
When bringing a CPU online, some of the PMC and LBR related registers are reset. The same is done when a CPU is taken offline although that is unnecessary. This currently happens in the "cpu_dead" callback which is also incorrect as the callback runs on a control CPU instead of the one that is being taken offline. This also affects hibernation and suspend to RAM on some platforms as reported in the link below. Fixes: 21d59e3e2c40 ("perf/x86/amd/core: Detect PerfMonV2 support") Reported-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/550a026764342cf7e5812680e3e2b91fe662b5ac.1706526029.git.sandipan.das@amd.com
2024-03-13perf/x86/amd/lbr: Discard erroneous branch entriesSandipan Das
The Revision Guide for AMD Family 19h Model 10-1Fh processors declares Erratum 1452 which states that non-branch entries may erroneously be recorded in the Last Branch Record (LBR) stack with the valid and spec bits set. Such entries can be recognized by inspecting bit 61 of the corresponding LastBranchStackToIp register. This bit is currently reserved but if found to be set, the associated branch entry should be discarded. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://bugzilla.kernel.org/attachment.cgi?id=305518 Link: https://lore.kernel.org/r/3ad2aa305f7396d41a40e3f054f740d464b16b7f.1706526029.git.sandipan.das@amd.com
2024-03-11Merge tag 'x86-cleanups-2024-03-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups, including a large series from Thomas Gleixner to cure sparse warnings" * tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/nmi: Drop unused declaration of proc_nmi_enabled() x86/callthunks: Use EXPORT_PER_CPU_SYMBOL_GPL() for per CPU variables x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation x86/cpu: Use EXPORT_PER_CPU_SYMBOL_GPL() for x86_spec_ctrl_current x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address() x86/percpu: Cure per CPU madness on UP smp: Consolidate smp_prepare_boot_cpu() x86/msr: Add missing __percpu annotations x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h> perf/x86/amd/uncore: Fix __percpu annotation x86/nmi: Remove an unnecessary IS_ENABLED(CONFIG_SMP) x86/apm_32: Remove dead function apm_get_battery_status() x86/insn-eval: Fix function param name in get_eff_addr_sib()
2024-03-04x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>Thomas Gleixner
To clean up the per CPU insanity of UP which causes sparse to be rightfully unhappy and prevents the usage of the generic per CPU accessors on cpu_info it is necessary to include <linux/percpu.h> into <asm/msr.h>. Including <linux/percpu.h> into <asm/msr.h> is impossible because it ends up in header dependency hell. The problem is that <asm/processor.h> includes <asm/msr.h>. The inclusion of <linux/percpu.h> results in a compile fail where the compiler cannot longer handle an include in <asm/cpufeature.h> which references boot_cpu_data which is defined in <asm/processor.h>. The only reason why <asm/msr.h> is included in <asm/processor.h> are the set/get_debugctlmsr() inlines. They are defined there because <asm/processor.h> is such a nice dump ground for everything. In fact they belong obviously into <asm/debugreg.h>. Move them to <asm/debugreg.h> and fix up the resulting damage which is just exposing the reliance on random include chains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240304005104.454678686@linutronix.de
2024-03-04perf/x86/amd/uncore: Fix __percpu annotationThomas Gleixner
The __percpu annotation in struct amd_uncore is confusing Sparse: uncore.c:649:10: sparse: warning: incorrect type in initializer (different address spaces) uncore.c:649:10: sparse: expected void const [noderef] __percpu *__vpp_verify uncore.c:649:10: sparse: got union amd_uncore_info * The reason is that the __percpu annotation sits between the '*' dereferencing operator and the member name. Move it before the dereferencing operator to cure this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240304005104.394845326@linutronix.de
2024-02-16x86/cpu/topology: Get rid of cpuinfo::x86_max_coresThomas Gleixner
Now that __num_cores_per_package and __num_threads_per_package are available, cpuinfo::x86_max_cores and the related math all over the place can be replaced with the ready to consume data. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210253.176147806@linutronix.de
2024-02-15x86/cpu/topology: Rename topology_max_die_per_package()Thomas Gleixner
The plural of die is dies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
2024-02-15x86/cpu/amd: Provide a separate accessor for Node IDThomas Gleixner
AMD (ab)uses topology_die_id() to store the Node ID information and topology_max_dies_per_pkg to store the number of nodes per package. This collides with the proper processor die level enumeration which is coming on AMD with CPUID 8000_0026, unless there is a correlation between the two. There is zero documentation about that. So provide new storage and new accessors which for now still access die_id and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de
2024-01-08Merge tag 'perf-core-2024-01-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Add branch stack counters ABI extension to better capture the growing amount of information the PMU exposes via branch stack sampling. There's matching tooling support. - Fix race when creating the nr_addr_filters sysfs file - Add Intel Sierra Forest and Grand Ridge intel/cstate PMU support - Add Intel Granite Rapids, Sierra Forest and Grand Ridge uncore PMU support - Misc cleanups & fixes * tag 'perf-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Factor out topology_gidnid_map() perf/x86/intel/uncore: Fix NULL pointer dereference issue in upi_fill_topology() perf/x86/amd: Reject branch stack for IBS events perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge perf/x86/intel/uncore: Support IIO free-running counters on GNR perf/x86/intel/uncore: Support Granite Rapids perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR perf: Fix the nr_addr_filters fix perf/x86/intel/cstate: Add Grand Ridge support perf/x86/intel/cstate: Add Sierra Forest support x86/smp: Export symbol cpu_clustergroup_mask() perf/x86/intel/cstate: Cleanup duplicate attr_groups perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file perf/x86/intel: Support branch counters logging perf/x86/intel: Reorganize attrs and is_visible perf: Add branch_sample_call_stack perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag perf: Add branch stack counters
2024-01-08Merge tag 'x86-cleanups-2024-01-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: - Change global variables to local - Add missing kernel-doc function parameter descriptions - Remove unused parameter from a macro - Remove obsolete Kconfig entry - Fix comments - Fix typos, mostly scripted, manually reviewed and a micro-optimization got misplaced as a cleanup: - Micro-optimize the asm code in secondary_startup_64_no_verify() * tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86: Fix typos x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify() x86/docs: Remove reference to syscall trampoline in PTI x86/Kconfig: Remove obsolete config X86_32_SMP x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro x86/mtrr: Document missing function parameters in kernel-doc x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()