summaryrefslogtreecommitdiff
path: root/tools/perf/util/amd-sample-raw.c
AgeCommit message (Collapse)Author
2025-06-03Merge tag 'perf-tools-for-v6.16-1-2025-06-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "perf report/top/annotate TUI: - Accept the left arrow key as a Zoom out if done on the first column - Show if source code toggle status in title, to help spotting bugs with the various disassemblers (capstone, llvm, objdump) - Provide feedback on unhandled hotkeys Build: - Better inform when certain features are not available with warnings in the build process and in 'perf version --build-options' or 'perf -vv' perf record: - Improve the --off-cpu code by synthesizing events for switch-out -> switch-in intervals using a BPF program. This can be fine tuned using a --off-cpu-thresh knob perf report: - Add 'tgid' sort key perf mem/c2c: - Add 'op', 'cache', 'snoop', 'dtlb' output fields - Add support for 'ldlat' on AMD IBS (Instruction Based Sampling) perf ftrace: - Use process/session specific trace settings instead of messing with the global ftrace knobs perf trace: - Implement syscall summary in BPF - Support --summary-mode=cgroup - Always print return value for syscalls returning a pid - The rseq and set_robust_list don't return a pid, just -errno perf lock contention: - Symbolize zone->lock using BTF - Add -J/--inject-delay option to estimate impact on application performance by optimization of kernel locking behavior perf stat: - Improve hybrid support for the NMI watchdog warning Symbol resolution: - Handle 'u' and 'l' symbols in /proc/kallsyms, resolving some Rust symbols - Improve Rust demangler Hardware tracing: Intel PT: - Fix PEBS-via-PT data_src - Do not default to recording all switch events - Fix pattern matching with python3 on the SQL viewer script arm64: - Fixups for the hip08 hha PMU Vendor events: - Update Intel events/metrics files for alderlake, alderlaken, arrowlake, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, clearwaterforest, elkhartlake, emeraldrapids, grandridge, graniterapids, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, lunarlake, meteorlake, nehalemep, nehalemex, rocketlake, sandybridge, sapphirerapids, sierraforest, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp, westmereep-sx python support: - Add support for event counts in the python binding, add a counting.py example perf list: - Display the PMU name associated with a perf metric in JSON perf test: - Hybrid improvements for metric value validation test - Fix LBR test by ignoring idle task - Add AMD IBS sw filter ana d'ldlat' tests - Add 'perf trace --summary-mode=cgroup' test - Add tests for the various language symbol demanglers Miscellaneous: - Allow specifying the cpu an event will be tied using '-e event/cpu=N/' - Sync various headers with the kernel sources - Add annotations to use clang's -Wthread-safety and fix some problems it detected - Make dump_stack() use perf's symbol resolution to provide better backtraces - Intel TPEBS support cleanups and fixes. TPEBS stands for Timed PEBS (Precision Event-Based Sampling), that adds timing info, the retirement latency of instructions - Various memory allocation (some detected by ASAN) and reference counting fixes - Add a 8-byte aligned PERF_RECORD_COMPRESSED2 to replace PERF_RECORD_COMPRESSED - Skip unsupported event types in perf.data files, don't stop when finding one - Improve lookups using hashmaps and binary searches" * tag 'perf-tools-for-v6.16-1-2025-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (206 commits) perf callchain: Always populate the addr_location map when adding IP perf lock contention: Reject more than 10ms delays for safety perf trace: Set errpid to false for rseq and set_robust_list perf symbol: Move demangling code out of symbol-elf.c perf trace: Always print return value for syscalls returning a pid perf script: Print PERF_AUX_FLAG_COLLISION flag perf mem: Show absolute percent in mem_stat output perf mem: Display sort order only if it's available perf mem: Describe overhead calculation in brief perf record: Fix incorrect --user-regs comments Revert "perf thread: Ensure comm_lock held for comm_list" perf test trace_summary: Skip --bpf-summary tests if no libbpf perf test intel-pt: Skip jitdump test if no libelf perf intel-tpebs: Avoid race when evlist is being deleted perf test demangle-java: Don't segv if demangling fails perf symbol: Fix use-after-free in filename__read_build_id perf pmu: Avoid segv for missing name/alias_name in wildcarding perf machine: Factor creating a "live" machine out of dwarf-unwind perf test: Add AMD IBS sw filter test perf mem: Count L2 HITM for c2c statistic ...
2025-04-29perf amd ibs: Incorporate Zen5 DTLB and PageSize informationRavi Bangoria
IBS Op PMU on Zen5 reports DTLB and page size information differently compared to prior generation. IBS_OP_DATA3 Zen3/4 Zen5 ---------------------------------------------------------------- 19 IbsDcL2TlbHit1G Reserved ---------------------------------------------------------------- 6 IbsDcL2tlbHit2M Reserved ---------------------------------------------------------------- 5 IbsDcL1TlbHit1G PageSize: 4 IbsDcL1TlbHit2M 0 - 4K 1 - 2M 2 - 1G 3 - Reserved Valid only if IbsDcPhyAddrValid = 1 ---------------------------------------------------------------- 3 IbsDcL2TlbMiss IbsDcL2TlbMiss Valid only if IbsDcPhyAddrValid = 1 ---------------------------------------------------------------- 2 IbsDcL1tlbMiss IbsDcL1tlbMiss Valid only if IbsDcPhyAddrValid = 1 ---------------------------------------------------------------- Kernel expose this change as "dtlb_pgsize" capability in PMU sysfs. Change IBS register raw-dump logic according to new bit definitions. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-3-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-29perf amd ibs: Add Load Latency bits in raw dumpRavi Bangoria
IBS OP PMU on Zen5 supports Load Latency filtering. Decode and dump Load Latency filtering related bits into perf script raw dump. Also add oneliner example in the perf-amd-ibs man page. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20250429035938.1301-2-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-04-14x86/platform/amd: Move the <asm/amd-ibs.h> header to <asm/amd/ibs.h>Ingo Molnar
Collect AMD specific platform header files in <asm/amd/*.h>. Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mario Limonciello <superm1@kernel.org> Link: https://lore.kernel.org/r/20250413084144.3746608-2-mingo@kernel.org
2023-08-23perf script ibs: Remove unused includeIan Rogers
Done to reduce dependencies on pmu-events.h. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230823080828.1460376-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-10perf script ibs: Change bit description according to latest AMD PPR ↵Ravi Bangoria
("Processor Programming Reference") Some of the IBS_OP_DATA2 bit descriptions were stale (taken from old version of PPR). Change it according to latest PPR. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230407112459.548-5-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf tools: Move 'struct perf_sample' to a separate header file to ↵Arnaldo Carvalho de Melo
disentangle headers Some places were including event.h just to get 'struct perf_sample', move it to a separate place so that we speed up a bit the build. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-24perf script ibs: Support new IBS bits in raw trace dumpRavi Bangoria
Interpret Additional set of IBS register bits while doing perf report/script raw dump. IBS op PMU ex: $ sudo ./perf record -c 130 -a -e ibs_op/l3missonly=1/ --raw-samples $ sudo ./perf report -D ... ibs_op_ctl: 0000004500070008 MaxCnt 128 L3MissOnly 1 En 1 Val 1 CntCtl 0=cycles CurCnt 69 ibs_op_data: 0000000000710002 CompToRetCtr 2 TagToRetCtr 113 BrnRet 0 RipInvalid 0 BrnFuse 0 Microcode 0 ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=A peer cache in a near CCX ibs_op_data3: 000000681d1700a1 LdOp 1 StOp 0 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 1 DcL2TlbHit2M 0 DcMiss 1 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 1 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 1 SwPf 0 OpMemWidth 8 bytes OpDcMissOpenMemReqs 7 DcMissLat 104 TlbRefillLat 0 IBS Fetch PMU ex: $ sudo ./perf record -c 130 -a -e ibs_fetch/l3missonly=1/ --raw-samples $ sudo ./perf report -D ... ibs_fetch_ctl: 3c1f00c700080008 MaxCnt 128 Cnt 128 Lat 199 En 1 Val 1 Comp 1 IcMiss 1 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 0 L2Miss 1 L3MissOnly 1 FetchOcMiss 1 FetchL3Miss 1 With the DataSrc extensions, the source of data can be decoded among: - Local L3 or other L1/L2 in CCX. - A peer cache in a near CCX. - Data returned from DRAM. - A peer cache in a far CCX. - DRAM address map with "long latency" bit set. - Data returned from MMIO/Config/PCI/APIC. - Extension Memory (S-Link, GenZ, etc - identified by the CS target and/or address map at DF's choice). - Peer Agent Memory. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rrichter@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: like.xu.linux@gmail.com Cc: x86@kernel.org Link: https://lore.kernel.org/r/20220604044519.594-9-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-24perf tool ibs: Sync AMD IBS header fileRavi Bangoria
IBS support has been enhanced with two new features in upcoming uarch: 1. DataSrc extension 2. L3 miss filtering. Additional set of bits has been introduced in IBS registers to exploit these features. New bits are already defining in arch/x86/ header. Sync it with tools header file. Also rename existing ibs_op_data field 'data_src' to 'data_src_lo'. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rrichter@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: like.xu.linux@gmail.com Cc: x86@kernel.org Link: https://lore.kernel.org/r/20220604044519.594-8-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-10perf report: Add support to print a textual representation of IBS raw sample ↵Kim Phillips
data Perf records IBS (Instruction Based Sampling) extra sample data when 'perf record --raw-samples' is used with an IBS-compatible event, on a machine that supports IBS. IBS support is indicated in CPUID_Fn80000001_ECX bit #10. Up until now, users have been able to see the extra sample data solely in raw hex format using 'perf report --dump-raw-trace'. From there, users could decode the data either manually, or by using an external script. Enable the built-in 'perf report --dump-raw-trace' to do the decoding of the extra sample data bits, so manual or external script decoding isn't necessary. Example usage: $ sudo perf record -c 10000001 -a --raw-samples -e ibs_fetch/rand_en=1/,ibs_op/cnt_ctl=1/ -C 0,1 taskset -c 0,1 7za b -mmt2 | perf report --dump-raw-trace Stdout contains IBS Fetch samples, e.g.: ibs_fetch_ctl: 02170007ffffffff MaxCnt 1048560 Cnt 1048560 Lat 7 En 1 Val 1 Comp 1 IcMiss 0 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 1 L2Miss 0 IbsFetchLinAd: 000056016b2ead40 IbsFetchPhysAd: 000000115cedfd40 c_ibs_ext_ctl: 0000000000000000 IbsItlbRefillLat 0 ..and IBS Op samples, e.g.: ibs_op_ctl: 0000009e009e8968 MaxCnt 10000000 En 1 Val 1 CntCtl 1=uOps CurCnt 158 IbsOpRip: 000056016b2ea73d ibs_op_data: 00000000000b0002 CompToRetCtr 2 TagToRetCtr 11 BrnRet 0 RipInvalid 0 BrnFuse 0 Microcode 0 ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=Local node cache ibs_op_data3: 0000000000c60002 LdOp 0 StOp 1 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 0 DcL2TlbHit2M 0 DcMiss 0 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 0 SwPf 0 OpMemWidth 4 bytes OpDcMissOpenMemReqs 0 DcMissLat 0 TlbRefillLat 0 IbsDCLinAd: 00007f133c319ce0 IbsDCPhysAd: 0000000270485ce0 Committer notes: Fixed up this: util/amd-sample-raw.c: In function ‘evlist__amd_sample_raw’: util/amd-sample-raw.c:125:42: error: ‘ bytes’ directive output may be truncated writing 6 bytes into a region of size between 4 and 7 [-Werror=format-truncation=] 125 | " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1)); | ^~~~~~ In file included from /usr/include/stdio.h:866, from util/amd-sample-raw.c:7: /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 21 and 24 bytes into a destination of size 21 71 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 72 | __glibc_objsize (__s), __fmt, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 73 | __va_arg_pack ()); | ~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors As that %2d won't limit the number of chars to 2, just state that 2 is the minimal width: $ cat printf.c #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { char bf[64]; int len = snprintf(bf, sizeof(bf), "%2d", atoi(argv[1])); printf("strlen(%s): %u\n", bf, len); return 0; } $ ./printf 1 strlen( 1): 2 $ ./printf 12 strlen(12): 2 $ ./printf 123 strlen(123): 3 $ ./printf 1234 strlen(1234): 4 $ ./printf 12345 strlen(12345): 5 $ ./printf 123456 strlen(123456): 6 $ And since we probably don't want that output to be truncated, just assume the worst case, as the compiler did, and add a few more chars to that buffer. Also use sizeof(var) instead of sizeof(dup-of-wanted-format-string) to avoid bugs when changing one but not the other. I also had to change this: -#include <asm/amd-ibs.h> +#include "../../arch/x86/include/asm/amd-ibs.h" To make it build on other architectures, just like intel-pt does. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: https //lore.kernel.org/r/20210817221509.88391-4-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>