path: root/tools/perf
Age | Commit message | Author
2025-03-19 | perf: intel-tpebs: Fix incorrect usage of zfree() | James Clark
zfree() requires an address otherwise it frees what's in name, rather than name itself. Pass the address of name to fix it. This was the only incorrect occurrence in Perf found using a search. Fixes: 8db5cabcf1b6 ("perf stat: Fork and launch 'perf record' when 'perf stat' needs to get retire latency value for a metric.") Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250319101614.190922-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
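The point about passing an address can be sketched as follows. This is a minimal illustration only: `demo_zfree()` is a simplified stand-in written for this example (an assumption, not perf's actual definition), and `release_name()` is a hypothetical caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified stand-in for a zfree()-style helper (assumed for this sketch):
 * free what *ptr points at, then reset the caller's pointer to NULL.
 */
#define demo_zfree(ptr) do { free(*(ptr)); *(ptr) = NULL; } while (0)

/* Hypothetical caller: returns 1 when name was freed and cleared. */
static int release_name(void)
{
	char *name = malloc(16);

	if (!name)
		return -1;
	strcpy(name, "example");

	/* Pass &name so that name itself is freed and then set to NULL. */
	demo_zfree(&name);
	assert(name == NULL);
	return name == NULL;
}
```

Passing `name` instead of `&name` to a helper of this shape would dereference the string's contents as if they were a pointer, which is the kind of mistake the commit describes.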
2025-03-19 | perf cpumap: Increment reference count for online cpumap | Ian Rogers
Thomas Richter <tmricht@linux.ibm.com> reported a double put on the cpumap for the placeholder core PMU: https://lore.kernel.org/lkml/20250318095132.1502654-3-tmricht@linux.ibm.com/ Requiring the caller to get the cpumap is not how these things are usually done, switch cpu_map__online to do the get and then fix up any use cases where a put is needed. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20250318171914.145616-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
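The "accessor does the get, caller does the put" convention described above might look like the sketch below. All `demo_` names and the struct layout are hypothetical and much simpler than perf's real cpumap code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified reference-counted map (not perf's struct). */
struct demo_cpu_map {
	int refcnt;
	int nr;
};

static struct demo_cpu_map *demo_cpu_map__get(struct demo_cpu_map *map)
{
	if (map)
		map->refcnt++;
	return map;
}

/* Drop one reference, free on the last one; returns the remaining count. */
static int demo_cpu_map__put(struct demo_cpu_map *map)
{
	int left;

	if (!map)
		return 0;
	assert(map->refcnt > 0);
	left = --map->refcnt;
	if (left == 0)
		free(map);
	return left;
}

static struct demo_cpu_map *demo_online; /* cached singleton */

/* The accessor takes the reference on behalf of the caller. */
static struct demo_cpu_map *demo_cpu_map__online(void)
{
	if (!demo_online) {
		demo_online = calloc(1, sizeof(*demo_online));
		if (!demo_online)
			return NULL;
		demo_online->refcnt = 1; /* reference owned by the cache */
		demo_online->nr = 4;     /* arbitrary example value */
	}
	return demo_cpu_map__get(demo_online);
}
```

With this shape every caller pairs one `demo_cpu_map__online()` with exactly one `demo_cpu_map__put()`, so a put can no longer drop the cache's own reference.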
2025-03-19 | perf dso: fix dso__is_kallsyms() check | Stephen Brennan
Kernel modules for which we cannot find a file on-disk will have a dso->long_name that looks like "[module_name]". Prior to the commit listed in the fixes, the dso->kernel field would be zero (for user space), so dso__is_kallsyms() would return false. After the commit, kernel module DSOs are correctly labeled, but the result is that dso__is_kallsyms() erroneously returns true for those modules without a filesystem path. Later, build_id_cache__add() consults this value of is_kallsyms, and when true, it copies /proc/kallsyms into the cache. Users with many kernel modules without a filesystem path (e.g. ksplice or possibly kernel live patch modules) have reported excessive disk space usage in the build ID cache directory due to this behavior. To reproduce the issue, it's enough to build a trivial out-of-tree hello world kernel module, load it using insmod, and then use: perf record -ag -- sleep 1 In the build ID directory, there will be a directory for your module name containing a kallsyms file. Fix this up by changing dso__is_kallsyms() to consult the dso_binary_type enumeration, which is also symmetric to the above checks for dso__is_vmlinux() and dso__is_kcore(). With this change, kallsyms is not cached in the build-id cache for out-of-tree modules. Fixes: 02213cec64bbe ("perf maps: Mark module DSOs with kernel type") Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Link: https://lore.kernel.org/r/20250318230012.2038790-1-stephen.s.brennan@oracle.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
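The difference between a name-based and a type-based check can be sketched as below. The types are hypothetical reductions, the name-based function is an assumed form of the old check inferred from the description above, and the binary type chosen for the pathless module is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced versions of the types named in the message. */
enum demo_binary_type {
	DEMO_BINARY_TYPE__KALLSYMS,
	DEMO_BINARY_TYPE__GUEST_KALLSYMS,
	DEMO_BINARY_TYPE__SYSTEM_PATH_KMODULE,
	DEMO_BINARY_TYPE__NOT_FOUND,
};

struct demo_dso {
	bool kernel;
	const char *long_name;
	enum demo_binary_type binary_type;
};

/* Assumed old heuristic: any kernel DSO whose name is not an absolute path. */
static bool demo_is_kallsyms_by_name(const struct demo_dso *dso)
{
	return dso->kernel && dso->long_name[0] != '/';
}

/* Type-based check: only DSOs explicitly typed as kallsyms qualify. */
static bool demo_is_kallsyms_by_type(const struct demo_dso *dso)
{
	return dso->binary_type == DEMO_BINARY_TYPE__KALLSYMS ||
	       dso->binary_type == DEMO_BINARY_TYPE__GUEST_KALLSYMS;
}
```

A module DSO named "[hello]" with the kernel flag set passes the name-based heuristic but not the type-based check, which matches the behavior change the commit aims for.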
2025-03-19 | x86/cpufeatures: Remove {disabled,required}-features.h | Xin Li (Intel)
The functionalities of {disabled,required}-features.h have been replaced with the auto-generated generated/<asm/cpufeaturemasks.h> header. Thus they are no longer needed and can be removed. None of the macros defined in {disabled,required}-features.h is used in tools, delete them too. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250305184725.3341760-4-xin@zytor.com
2025-03-18 | perf kwork: Remove unreachable judgments | Feng Yang
When s2[i] = '\0', if s1[i] != '\0', it will be judged by ret, and if s1[i] = '\0', it will be judged by !s1[i]. So in reality, the s2[i] check will never make a judgment. Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250314031013.94480-1-yangfeng59949@163.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-18 | perf python: Check if there is space to copy all the event | Arnaldo Carvalho de Melo
The pyrf_event__new() method copies the event obtained from the perf ring buffer to a structure that will then be turned into a python object for further consumption, so it copies perf_event.header.size bytes to its 'event' member: $ pahole -C pyrf_event /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so struct pyrf_event { PyObject ob_base; /* 0 16 */ struct evsel * evsel; /* 16 8 */ struct perf_sample sample; /* 24 312 */ /* XXX last struct has 7 bytes of padding, 2 holes */ /* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */ union perf_event event; /* 336 4168 */ /* size: 4504, cachelines: 71, members: 4 */ /* member types with holes: 1, total: 2 */ /* paddings: 1, sum paddings: 7 */ /* last cacheline: 24 bytes */ }; $ It was doing so without checking if the event just obtained has more than that space, fix it. This isn't a proper, final solution, as we need to support larger events, but for the time being we at least bounds check and document it. Fixes: 877108e42b1b9ba6 ("perf tools: Initial python binding") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-7-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
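A bounds-checked copy of the kind described above could be sketched like this. The header layout, the 64-byte destination and the `demo_` names are assumptions made for the example, not the binding's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced event header. */
struct demo_event_header {
	unsigned int type;
	unsigned short misc;
	unsigned short size; /* total size of the event in bytes */
};

/* Fixed-size destination, standing in for the 'event' member. */
union demo_event {
	struct demo_event_header header;
	unsigned char raw[64];
};

/* Copy src into dst only if it fits; return 0 on success, -1 otherwise. */
static int demo_event_copy(union demo_event *dst, const void *src)
{
	struct demo_event_header hdr;

	assert(dst && src);
	memcpy(&hdr, src, sizeof(hdr));

	/* Reject events smaller than a header or larger than the storage. */
	if (hdr.size < sizeof(hdr) || hdr.size > sizeof(*dst))
		return -1;

	memcpy(dst, src, hdr.size);
	return 0;
}
```

Rejecting oversized events keeps the copy within the destination, at the cost of dropping events that do not fit, which is the limitation the commit message acknowledges.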
2025-03-18 | perf python: Don't keep a raw_data pointer to consumed ring buffer space | Arnaldo Carvalho de Melo
When processing tracepoints the perf python binding was parsing the event before calling perf_mmap__consume(&md->core) in pyrf_evlist__read_on_cpu().

But part of this event parsing was to set the perf_sample->raw_data pointer to the payload of the event, which could then be overwritten by another event before tracepoint fields were asked for via event.prev_comm in a python program, for instance.

This also happened with other fields, but strings were where problems were surfacing, as there is UTF-8 validation for the potentially garbled data.

This ended up showing up as (with some added debugging messages):

  ( field 'prev_comm' ret=0x7f7c31f65110, raw_size=68 )
  ( field 'prev_pid' ret=0x7f7c23b1bed0, raw_size=68 )
  ( field 'prev_prio' ret=0x7f7c239c0030, raw_size=68 )
  ( field 'prev_state' ret=0x7f7c239c0250, raw_size=68 )
  time 14771421785867 prev_comm= prev_pid=1919907691 prev_prio=796026219 prev_state=0x303a32313175 ==>
  ( XXX '��' len=16, raw_size=68)
  ( field 'next_comm' ret=(nil), raw_size=68 )
  Traceback (most recent call last):
    File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 51, in <module>
      main()
    File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 46, in main
      event.next_comm,
      ^^^^^^^^^^^^^^^
  AttributeError: 'perf.sample_event' object has no attribute 'next_comm'

When event.next_comm was asked for, the PyUnicode_FromString() python API would fail and that tracepoint field wouldn't be available, stopping the tools/perf/python/tracepoint.py test tool.

But, since we already do a copy of the whole event in pyrf_event__new, just use it and while at it remove what was done in e8968e654191390a ("perf python: Fix pyrf_evlist__read_on_cpu event consuming") because we don't really need to wait for parsing the sample before declaring the event as consumed.
This copy is questionable as it is now, as it limits the maximum event + sample_type and tracepoint payload to sizeof(union perf_event). This has all been "working" because 'struct perf_record_mmap2', the largest entry in 'union perf_event', is:

  $ pahole -C perf_event ~/bin/perf | grep mmap2
  struct perf_record_mmap2 mmap2; /* 0 4168 */
  $

Fixes: bae57e3825a3dded ("perf python: Add support to resolve tracepoint fields")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-6-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-18 | perf python: Decrement the refcount of just created event on failure | Arnaldo Carvalho de Melo
To avoid a leak if we have the python object but then something happens and we need to abort the operation, decrement the reference count of the newly created object. Fixes: 377f698db12150a1 ("perf python: Add struct evsel into struct pyrf_event") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-5-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-18 | perf python tracepoint.py: Change the COMM using setproctitle if available | Arnaldo Carvalho de Melo
Otherwise when debugging we see just "python" in perf, top, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-4-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-18 | perf python: Remove some unused macros (_PyUnicode_FromString(arg), etc) | Arnaldo Carvalho de Melo
When python2 support was removed in e7e9943c87d857da ("perf python: Remove python 2 scripting support"), all uses of the _PyUnicode_FromString(arg), _PyUnicode_FromFormat(...), and _PyLong_FromLong(arg) macros were removed as well, so remove the macros too. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-18 | perf python: Fixup description of sample.id event member | Arnaldo Carvalho de Melo
Some old cut'n'paste error: it's "ip", so the description should be "event ip", not "event type". Fixes: 877108e42b1b9ba6 ("perf tools: Initial python binding") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312203141.285263-2-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-17 | perf test dso-data: Correctly free test file in read test | Ian Rogers
The DSO data read test opens a file, but because dsos__exit is used the test file isn't closed. This causes the subsequent subtests in don't-fork (-F) mode to fail, as one more file descriptor than expected is open. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250318043151.137973-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-17 | perf dso: Use lock annotations to fix asan deadlock | Ian Rogers
dso__list_del with address sanitizer and/or reference count checking will call dso__put that can call dso__data_close reentrantly trying to lock the dso__data_open_lock and deadlocking. Switch from pthread mutexes to perf's mutex so that lock checking is performed in debug builds. Add lock annotations that diagnosed the problem. Release the dso__data_open_lock around the dso__put to avoid the deadlock. Change the declaration of dso__data_get_fd to return a boolean, indicating the fd is valid and the lock is held, to make it compatible with the thread safety annotations as a try lock. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250318043151.137973-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-17 | perf mutex: Add annotations for LOCKS_EXCLUDED and LOCKS_RETURNED | Ian Rogers
Used to annotate when locks shouldn't be held for a function or if a function returns a lock that's used by later mutex lock unlock operations. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250318043151.137973-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14 | perf test: Add pipe output testing for annotate | Ian Rogers
Parameterize the basic testing to generate directly a perf.data file or to generate/use one from pipe input or output. To simplify the refactor move some of the head/grep logic around. Use "-q" with grep to make the test output cleaner. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250311211635.541090-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14 | perf test: Fixes to variable expansion and stdout for diff test | Ian Rogers
When make_data fails its error message needs to go to stderr rather than stdout and the stdout value is captured in a variable. Quote the $err value so that it is always a valid input for test. This error is commonly encountered if no sample data is gathered by the test. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250312001841.1515779-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14 | perf libunwind: Fixup conversion perf_sample->user_regs to a pointer | Arnaldo Carvalho de Melo
The dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional") misses the changes to a file, resulting in this problem: $ make LIBUNWIND=1 -C tools/perf O=/tmp/build/perf-tools-next install-bin <SNIP> CC /tmp/build/perf-tools-next/util/unwind-libunwind-local.o CC /tmp/build/perf-tools-next/util/unwind-libunwind.o <SNIP> util/unwind-libunwind-local.c: In function ‘access_mem’: util/unwind-libunwind-local.c:582:56: error: ‘ui->sample->user_regs’ is a pointer; did you mean to use ‘->’? 582 | if (__write || !stack || !ui->sample->user_regs.regs) { | ^ | -> util/unwind-libunwind-local.c:587:38: error: passing argument 2 of ‘perf_reg_value’ from incompatible pointer type [-Wincompatible-pointer-types] 587 | ret = perf_reg_value(&start, &ui->sample->user_regs, | ^~~~~~~~~~~~~~~~~~~~~~ | | | struct regs_dump ** <SNIP> ⬢ [acme@toolbox perf-tools-next]$ git bisect bad dc6d2bc2d893a878e7b58578ff01b4738708deb4 is the first bad commit commit dc6d2bc2d893a878e7b58578ff01b4738708deb4 (HEAD) Author: Ian Rogers <irogers@google.com> Date: Mon Jan 13 11:43:45 2025 -0800 perf sample: Make user_regs and intr_regs optional Detected using: make -C tools/perf build-test Fixes: dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250313033121.758978-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14 | perf test stat_all_pmu.sh: Correctly check 'perf stat' result | Veronika Molnarova
Test case "stat_all_pmu.sh" is not correctly checking 'perf stat' output due to a poor design. Firstly, having the 'set -e' option with a trap catching the sigexit causes the shell to exit immediately if 'perf stat' ends with any non-zero value, which is then caught by the trap reporting an unexpected signal. This causes events that should be parsed by the if-else statement to be caught by the trap handler and reported as errors: $ perf test -vv "perf all pmu" Testing i915/actual-frequency/ Unexpected signal in main Error: Access to performance monitoring and observability operations is limited. Secondly, the if-else branches are not exclusive, as the check for whether the event is present in the output log also covers the "<not supported>" events, which should be accepted, and the "Bad name events", which should be rejected. Remove the "set -e" option from the test case, correctly parse the "perf stat" output log and check its return value. Add the missing outputs for the 'perf stat' result and also add log messages to report the branch that parsed the event for more info. Fixes: 7e73ea40295620e7 ("perf test: Ignore security failures in all PMU test") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Qiao Zhao <qzhao@redhat.com> Link: https://lore.kernel.org/r/20241122231233.79509-1-vmolnaro@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-14 | perf script: Update brstack syntax documentation | Yujie Liu
The following commits added new fields/flags to the branch stack field list: commit 1f48989cdc7d ("perf script: Output branch sample type") commit 6ade6c646035 ("perf script: Show branch speculation info") commit 1e66dcff7b9b ("perf script: Add not taken event for branch stack") Update brstack syntax documentation to be consistent with the latest branch stack field list. Improve the descriptions to help users interpret the fields accurately. Signed-off-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20250312072329.419020-1-yujie.liu@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf script: Fix typo in branch event mask | Yujie Liu
BRACH -> BRANCH Fixes: 88b1473135e4 ("perf script: Separate events from branch types") Signed-off-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250312075636.429127-1-yujie.liu@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf hist stdio: Do bounds check when printing callchains to avoid UB with new gcc versions | Arnaldo Carvalho de Melo
Do a simple bounds check to avoid this on new gcc versions: 31 15.81 fedora:rawhide : FAIL gcc version 15.0.1 20250225 (Red Hat 15.0.1-0) (GCC) In function 'callchain__fprintf_left_margin', inlined from 'callchain__fprintf_graph.constprop' at ui/stdio/hist.c:246:12: ui/stdio/hist.c:27:39: error: iteration 2147483647 invokes undefined behavior [-Werror=aggressive-loop-optimizations] 27 | for (i = 0; i < left_margin; i++) | ~^~ ui/stdio/hist.c:27:23: note: within this loop 27 | for (i = 0; i < left_margin; i++) | ~~^~~~~~~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250310194534.265487-4-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
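One way to bound such a loop is to clamp the margin before iterating, as in the sketch below. The exact bound used by the patch is not shown in this log, so `DEMO_MAX_MARGIN` and the function name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_MAX_MARGIN 1024 /* hypothetical upper bound for this sketch */

/* Print left_margin spaces (after clamping) and return how many were written. */
static int demo_fprintf_left_margin(FILE *fp, int left_margin)
{
	int i, ret = 0;

	/* Bounds check: keep the loop count within a small, known range. */
	if (left_margin < 0)
		left_margin = 0;
	if (left_margin > DEMO_MAX_MARGIN)
		left_margin = DEMO_MAX_MARGIN;

	for (i = 0; i < left_margin; i++)
		ret += fprintf(fp, " ");

	return ret;
}
```

Clamping gives the compiler an explicit range for the loop counter, which is the sort of information an aggressive-loop-optimizations diagnostic is looking for.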
2025-03-13 | perf units: Fix insufficient array space | Arnaldo Carvalho de Melo
No need to specify the array size, let the compiler figure that out. This addresses this compiler warning that was noticed while build testing on fedora rawhide: 31 15.81 fedora:rawhide : FAIL gcc version 15.0.1 20250225 (Red Hat 15.0.1-0) (GCC) util/units.c: In function 'unit_number__scnprintf': util/units.c:67:24: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-initialization] 67 | char unit[4] = "BKMG"; | ^~~~~~ cc1: all warnings being treated as errors Fixes: 9808143ba2e54818 ("perf tools: Add unit_number__scnprintf function") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250310194534.265487-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
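The effect of dropping the explicit size can be sketched as follows. The function is a hypothetical illustration loosely modeled on the one named in the warning; the scaling loop is an assumption made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Format n with a B/K/M/G suffix into buf; returns snprintf()'s result. */
static int demo_unit_number_snprintf(char *buf, size_t size, unsigned long long n)
{
	/* No explicit size: the compiler reserves 5 bytes, NUL included. */
	static const char unit[] = "BKMG";
	int i = 0;

	assert(sizeof(unit) == 5);

	while ((n / 1024) > 1 && i < 3) {
		n /= 1024;
		i++;
	}

	return snprintf(buf, size, "%llu%c", n, unit[i]);
}
```

With `char unit[4] = "BKMG"` there is no room for the terminating NUL, which is what the new gcc diagnostic reports; letting the compiler size the array avoids that.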
2025-03-13 | perf annotate: Add --code-with-type option. | Namhyung Kim
This option is to show data type info in the regular (code) annotation. It tries to find data type for each (memory) instruction in the function. It'd be useful to see function-level memory access pattern and also to debug the data type profiling result.

The output would be added at the end of the line and have "# data-type:" prefix. For now, it only works with --stdio mode for simplicity. I can work on enabling it for TUI later.

  $ perf annotate --stdio --code-with-type
   Percent | Source code & Disassembly of vmlinux for cpu/mem-loads/ppk (253 samples, percent: local period)
  ---------------------------------------------------------------------------------------------------------------
   : 0 0xffffffff81baa000 <check_preemption_disabled>:
   0.00 : ffffffff81baa000: pushq %r12 # data-type: (stack operation)
   0.00 : ffffffff81baa002: pushq %rbp # data-type: (stack operation)
   0.00 : ffffffff81baa003: pushq %rbx # data-type: (stack operation)
   0.00 : ffffffff81baa004: subq $0x8, %rsp
   18.00 : ffffffff81baa008: movl %gs:0x7e48893d(%rip), %ebx # 0x3294c <pcpu_hot+0xc> # data-type: struct pcpu_hot +0xc (cpu_number)
   12.58 : ffffffff81baa00f: movl %gs:0x7e488932(%rip), %eax # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
   0.00 : ffffffff81baa016: testl $0x7fffffff, %eax
   0.00 : ffffffff81baa01b: je 0xffffffff81baa02c <check_preemption_disabled+0x2c>
   0.00 : ffffffff81baa01d: addq $0x8, %rsp
   0.00 : ffffffff81baa021: movl %ebx, %eax
   14.19 : ffffffff81baa023: popq %rbx # data-type: (stack operation)
   18.86 : ffffffff81baa024: popq %rbp # data-type: (stack operation)
   12.10 : ffffffff81baa025: popq %r12 # data-type: (stack operation)
   17.78 : ffffffff81baa027: jmp 0xffffffff81bc1170 <__x86_return_thunk>
   6.49 : ffffffff81baa02c: callq *0xc9139e(%rip) # 0xffffffff8283b3d0 <pv_ops+0xf0> # data-type: (stack operation)
   0.00 : ffffffff81baa032: testb $0x2, %ah
   0.00 : ffffffff81baa035: je 0xffffffff81baa01d <check_preemption_disabled+0x1d>
   0.00 : ffffffff81baa037: movq %rdi, %rbp
   0.00 : ffffffff81baa03a: movq %gs:0x32940, %rax # data-type: struct pcpu_hot +0 (current_task)
   0.00 : ffffffff81baa043: testb $0x4, 0x2f(%rax) # data-type: struct task_struct +0x2f (flags)
   0.00 : ffffffff81baa047: je 0xffffffff81baa052 <check_preemption_disabled+0x52>
   0.00 : ffffffff81baa049: cmpl $0x1, 0x3d0(%rax) # data-type: struct task_struct +0x3d0 (nr_cpus_allowed)
   0.00 : ffffffff81baa050: je 0xffffffff81baa01d <check_preemption_disabled+0x1d>
   0.00 : ffffffff81baa052: movq %gs:0x32940, %r12 # data-type: struct pcpu_hot +0 (current_task)
   0.00 : ffffffff81baa05b: cmpw $0x0, 0x7f0(%r12) # data-type: struct task_struct +0x7f0 (migration_disabled)
   0.00 : ffffffff81baa065: movq %rsi, (%rsp)
   0.00 : ffffffff81baa069: jne 0xffffffff81baa01d <check_preemption_disabled+0x1d>
   0.00 : ffffffff81baa06b: movl 0xe8dd13(%rip), %eax # 0xffffffff82a37d84 <system_state> # data-type: enum system_states +0
   0.00 : ffffffff81baa071: testl %eax, %eax
   0.00 : ffffffff81baa073: je 0xffffffff81baa01d <check_preemption_disabled+0x1d>
   0.00 : ffffffff81baa075: incl %gs:0x7e4888cc(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
   0.00 : ffffffff81baa07c: movq $-0x7e14a100, %rdi
   0.00 : ffffffff81baa083: callq 0xffffffff81148c40 <__printk_ratelimit> # data-type: (stack operation)
   0.00 : ffffffff81baa088: testl %eax, %eax
   0.00 : ffffffff81baa08a: je 0xffffffff81baa0d5 <check_preemption_disabled+0xd5>
   0.00 : ffffffff81baa08c: movl 0x958(%r12), %r9d # data-type: struct task_struct +0x958 (pid)
   0.00 : ffffffff81baa094: movq (%rsp), %rdx # data-type: char* +0
   0.00 : ffffffff81baa098: movq %rbp, %rsi
   0.00 : ffffffff81baa09b: leaq 0xb88(%r12), %r8 # data-type: struct task_struct +0xb88 (comm)
   0.00 : ffffffff81baa0a3: movl %gs:0x7e48889e(%rip), %ecx # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
   0.00 : ffffffff81baa0aa: andl $0x7fffffff, %ecx
   0.00 : ffffffff81baa0b0: movq $-0x7dd3cdf0, %rdi
   0.00 : ffffffff81baa0b7: subl $0x1, %ecx
   0.00 : ffffffff81baa0ba: callq 0xffffffff81149340 <_printk> # data-type: (stack operation)
   0.00 : ffffffff81baa0bf: movq 0x20(%rsp), %rsi
   0.00 : ffffffff81baa0c4: movq $-0x7ddb8c7e, %rdi
   0.00 : ffffffff81baa0cb: callq 0xffffffff81149340 <_printk> # data-type: (stack operation)
   0.00 : ffffffff81baa0d0: callq 0xffffffff81b7ab60 <dump_stack> # data-type: (stack operation)
   0.00 : ffffffff81baa0d5: decl %gs:0x7e48886c(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
   0.00 : ffffffff81baa0dc: jmp 0xffffffff81baa01d <check_preemption_disabled+0x1d>

Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-8-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf annotate: Implement code + data type annotation | Namhyung Kim
Sometimes it's useful to see both instructions and their data type together. Let's extend the annotate code to use data type profiling functions. To make it easy to pass more arguments, introduce a struct to carry necessary information together. Also add a new annotation_option called 'code_with_type' to control the behavior. This is not enabled yet but it'll be set later from the command line. For simplicity, this is implemented for --stdio only. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf annotate: Factor out __hist_entry__get_data_type() | Namhyung Kim
So that it only handles a single disasm_line, which hopefully makes the code simpler. This is also a preparation for calling it from different places later. The NO_TYPE macro was added to distinguish when it failed or needs retry. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf annotate: Pass hist_entry to annotate functions | Namhyung Kim
It's a preparation to support code annotation and data type annotation at the same time. Data type annotation needs more information in the hist_entry so it needs to be passed deeper. Also rename a function with the same name in builtin-annotate.c to hist_entry__stdio_annotate since it better matches the command line option. And change the condition inside to be simpler. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf annotate: Pass annotation_options to annotation_line__print() | Namhyung Kim
The annotation_line__print() function has many arguments. But min_percent, max_lines and percent_type are from struct annotation_options. So let's pass a pointer to the option instead of passing them separately to reduce the number of function arguments. Actually it has a recursive call if 'queue' is set. Add a new option instance to pass different values for that case. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf annotate: Remove unused len parameter from annotation_line__print() | Namhyung Kim
It's not used anywhere, let's get rid of it. Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf annotate-data: Add annotated_data_type__get_member_name() | Namhyung Kim
Factor out a function to get the name of member field at the given offset. This will be used in other places. Also update the output of typeoff sort key a little bit. As we know that some special types like (stack operation), (stack canary) and (unknown) won't have fields, skip printing the offset and field. For example, the following change is expected. "(stack operation) +0 (no field)" ==> "(stack operation)" Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250310224925.799005-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf ftrace: Use atomic inc to update histogram in BPF | Namhyung Kim
It should use an atomic instruction to update even if the histogram is keyed by delta as it's also used for stats. Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250227191223.1288473-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
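An atomic histogram update of the kind mentioned above can be sketched in plain C with the `__sync_fetch_and_add()` compiler builtin. The bucket count, array and function name are hypothetical, and this is ordinary user-space C rather than the actual BPF program.

```c
#include <assert.h>

#define DEMO_NUM_BUCKETS 22 /* hypothetical bucket count for this sketch */

static unsigned long long demo_hist[DEMO_NUM_BUCKETS];

/* Atomically bump one bucket; out-of-range keys land in the last bucket. */
static void demo_hist_inc(unsigned int key)
{
	if (key >= DEMO_NUM_BUCKETS)
		key = DEMO_NUM_BUCKETS - 1;

	/* Atomic read-modify-write, safe against concurrent updaters. */
	__sync_fetch_and_add(&demo_hist[key], 1);
}
```

A plain `demo_hist[key]++` can lose updates when several CPUs hit the same bucket at once, which is why an atomic increment is preferred for shared counters.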
2025-03-13 | perf ftrace: Remove an unnecessary condition check in BPF | Namhyung Kim
The bucket_num is set based on the {max,min}_latency already in cmd_ftrace(), so no need to check it again in BPF. Also I found that it didn't pass the max_latency to BPF. :) No functional changes intended. Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250227191223.1288473-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-13 | perf ftrace: Fix latency stats with BPF | Namhyung Kim
When BPF collects the stats for the latency in usec, it first divides the time by 1000. But that means it would have 0 if the delta is small and won't update the total time properly. Let's keep the stats in nsec always and adjust to usec before printing. Before: $ sudo ./perf ftrace latency -ab -T mutex_lock --hide-empty -- sleep 0.1 # DURATION | COUNT | GRAPH | 0 - 1 us | 765 | ############################################# | 1 - 2 us | 10 | | 2 - 4 us | 2 | | 4 - 8 us | 5 | | # statistics (in usec) total time: 0 <<<--- (here) avg time: 0 max time: 6 min time: 0 count: 782 After: $ sudo ./perf ftrace latency -ab -T mutex_lock --hide-empty -- sleep 0.1 # DURATION | COUNT | GRAPH | 0 - 1 us | 880 | ############################################ | 1 - 2 us | 13 | | 2 - 4 us | 8 | | 4 - 8 us | 3 | | # statistics (in usec) total time: 268 <<<--- (here) avg time: 0 max time: 6 min time: 0 count: 904 Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250227191223.1288473-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
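The truncation problem described above is plain integer arithmetic and can be sketched as below. The struct and function names are hypothetical; the example only contrasts dividing each delta by 1000 with summing in nsec and converting once.

```c
#include <assert.h>
#include <stddef.h>

struct demo_stats {
	unsigned long long total_usec_lossy; /* each delta divided before summing */
	unsigned long long total_nsec;       /* deltas summed in nsec */
};

static struct demo_stats demo_accumulate(const unsigned long long *delta_nsec,
					 size_t n)
{
	struct demo_stats s = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		/* Integer division truncates every sub-usec delta to 0. */
		s.total_usec_lossy += delta_nsec[i] / 1000;
		s.total_nsec += delta_nsec[i];
	}
	return s;
}

/* Convert once, just before printing. */
static unsigned long long demo_nsec_to_usec(unsigned long long nsec)
{
	return nsec / 1000;
}
```

For deltas of 400, 700 and 900 nsec, the per-sample division sums to 0 usec, while keeping nsec and converting at the end yields 2 usec.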
2025-03-11 | perf test stat: Additional topdown grouping tests | Ian Rogers
Add a loop and helper function to avoid repetition, the loop uses arrays so switch the shell to bash. Add additional topdown group tests where a topdown event needs to be moved beyond others and the slots event isn't first in the target group. This replicates issues that occur on hybrid systems where the other events are for the cpu_atom PMU. Test with both PMU and software events. Place the slots event later in the event list. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307023906.1135613-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 | perf x86 evlist: Update comments on topdown regrouping | Dapeng Mi
Update to remove comments about groupings not working and with the:
```
perf stat -e "{instructions,slots},{cycles,topdown-retiring}"
```
case that now works.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250307023906.1135613-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 | perf parse-events: Corrections to topdown sorting | Ian Rogers
In the case of '{instructions,slots},faults,topdown-retiring' the first event that must be grouped, slots, is ignored causing the topdown-retiring event not to be adjacent to the group it needs to be inserted into. Don't ignore the group members when computing the force_grouped_index. Make the force_grouped_index be for the leader of the group it is within and always use it first rather than a group leader index so that topdown events may be sorted from one group into another. As the PMU name comparison applies to moving events in the same group ensure the name ordering is always respected. Change the group splitting logic to not group if there are no other topdown events and to fix cases where the force group leader wasn't being grouped with the other members of its group. Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Closes: https://lore.kernel.org/lkml/20250224083306.71813-2-dapeng1.mi@linux.intel.com/ Closes: https://lore.kernel.org/lkml/f7e4f7e8-748c-4ec7-9088-0e844392c11a@linux.intel.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20250307023906.1135613-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 | perf x86/topdown: Fix topdown leader sampling test error on hybrid | Dapeng Mi
When running the topdown leader sampling test on Intel hybrid platforms, such as LNL/ARL, we see the below error.

  Topdown leader sampling test
  Topdown leader sampling [Failed topdown events not reordered correctly]

It indicates the below command fails.

  perf record -o "${perfdata}" -e "{instructions,slots,topdown-retiring}:S" true

The root cause is that the perf tool creates a perf event for each PMU type if it can. For this command, 5 perf events would be created:

  cpu_atom/instructions/,cpu_atom/topdown_retiring/, cpu_core/slots/,cpu_core/instructions/,cpu_core/topdown-retiring/

For these 5 events, the 2 cpu_atom events are in one group and the other 3 cpu_core events are in another group. When arch_topdown_sample_read() traverses all 5 events, cpu_atom/instructions/ and cpu_core/slots/ don't have the same group leader, so it returns false directly, which leads to the cpu_core/slots/ event being used to sample, and this is not allowed by the PMU driver.

It's overkill to return false directly if "evsel->core.leader != leader->core.leader" since there could be multiple groups in the event list. Just "continue" instead of "return false" to fix this issue.

Fixes: 1e53e9d1787b ("perf x86/topdown: Correct leader selection with sample_read enabled")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250307023906.1135613-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
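The "continue instead of return false" change can be illustrated with a reduced model of the event list. The struct, field names and function are hypothetical; the real arch_topdown_sample_read() works on perf's evsel/evlist types and performs additional checks not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced event record. */
struct demo_evsel {
	const char *name;
	int leader;          /* index of this event's group leader */
	bool topdown_metric; /* e.g. a topdown-retiring event */
};

/*
 * Return true if the group led by events[leader] contains a topdown metric
 * event other than the leader. Events from other groups are skipped rather
 * than ending the scan.
 */
static bool demo_group_has_topdown_metric(const struct demo_evsel *events,
					  size_t nr, int leader)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (events[i].leader != leader)
			continue; /* different group: keep looking */
		if ((int)i != leader && events[i].topdown_metric)
			return true;
	}
	return false;
}
```

With the five events listed in the message, a scan that returned false on the first cpu_atom event would never reach the cpu_core group; skipping foreign groups lets the cpu_core/slots/ leader find its topdown-retiring sibling.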
2025-03-11 perf tools: Improve handling of hybrid PMUs in perf_event_attr__fprintf (Ian Rogers)
Support the PMU name from the legacy hardware and hw_cache PMU extended types. Remove some macros and make variables more intention revealing, rather than just being called "value".

Before:
```
$ perf stat -vv -e instructions true
...
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0xa00000001
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 181636 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0x400000001
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 181636 cpu -1 group_fd -1 flags 0x8 = 6
...
```

After:
```
$ perf stat -vv -e instructions true
...
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0xa00000001 (cpu_atom/PERF_COUNT_HW_INSTRUCTIONS/)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 181724 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0x400000001 (cpu_core/PERF_COUNT_HW_INSTRUCTIONS/)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 181724 cpu -1 group_fd -1 flags 0x8 = 6
...
```

Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Tested-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250307023906.1135613-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
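For readers decoding the config values by hand: with the extended hardware type encoding, the upper 32 bits of config carry the PMU type and the lower 32 bits carry the legacy hardware event ID. The sketch below splits the two values printed above. The shift and mask mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK from the perf_event UAPI header; the helper names are made up for illustration and are not perf functions. Mapping the PMU type number to a name such as cpu_atom is a separate lookup that is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror include/uapi/linux/perf_event.h. */
#define TOY_PMU_TYPE_SHIFT	32		/* PERF_PMU_TYPE_SHIFT */
#define TOY_HW_EVENT_MASK	0xffffffffULL	/* PERF_HW_EVENT_MASK */

/* Hypothetical helper: PMU type stored in the upper 32 bits of config. */
static uint32_t config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> TOY_PMU_TYPE_SHIFT);
}

/* Hypothetical helper: legacy hardware event ID in the lower 32 bits. */
static uint32_t config_hw_event(uint64_t config)
{
	return (uint32_t)(config & TOY_HW_EVENT_MASK);
}

/* Decode the two config values shown in the verbose output above. */
static void decode_examples(void)
{
	/* Event ID 1 is PERF_COUNT_HW_INSTRUCTIONS. */
	assert(config_pmu_type(0xa00000001ULL) == 10);
	assert(config_hw_event(0xa00000001ULL) == 1);

	assert(config_pmu_type(0x400000001ULL) == 4);
	assert(config_hw_event(0x400000001ULL) == 1);
}
```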
2025-03-11 perf python tracepoint: Switch to using parse_events (Ian Rogers)
Rather than manually configuring an evsel, switch to using parse_events for greater commonality with the rest of the perf code. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Add evlist.config to set up record options (Ian Rogers)
Add access to evlist__config that is used to configure an evlist with record options. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Add evlist all_cpus accessor (Ian Rogers)
Add a means to get the reference counted all_cpus CPU map from an evlist in its python form. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Avoid duplicated code in get_tracepoint_field (Ian Rogers)
The code replicates computations done in evsel__tp_format. Reuse evsel__tp_format to simplify the python C code. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Update ungrouped evsel leader in clone (Ian Rogers)
evsels are cloned in the python code as they form part of the Python object pyrf_evsel. The cloning doesn't update the evsel's leader. Update it for the case of an evsel being ungrouped. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Add optional cpus and threads arguments to parse_events (Ian Rogers)
The optional cpus and threads arguments are used for the evlist initialization. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Add member access to a number of evsel variables (Ian Rogers)
Most variables are part of the perf_event_attr, so that they may be queried and modified. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf python: Add evlist enable and disable methods (Ian Rogers)
By default the evsels from parse_events will be disabled. Add access to the evlist functions so they can be enabled/disabled. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf evsel: tp_format accessing improvements (Ian Rogers)
Ensure evsel__clone copies the tp_sys and tp_name variables. In evsel__tp_format, if tp_sys isn't set, use the config value to find the tp_format. This succeeds in python code where pyrf__tracepoint has already found the format. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-4-irogers@google.com Fixes: 6c8310e8380d472c ("perf evsel: Allow evsel__newtp without libtraceevent") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-11 perf evlist: Add success path to evlist__create_syswide_maps (Ian Rogers)
Over various refactorings evlist__create_syswide_maps has been made to only ever return with -ENOMEM. Fix this so that when perf_evlist__set_maps is successfully called, 0 is returned. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-3-irogers@google.com Fixes: 8c0498b6891d7ca5 ("perf evlist: Fix create_syswide_maps() not propagating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
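One common shape of this kind of bug is an error variable that is initialized to -ENOMEM and shared with the cleanup path, so the success case never reports 0. The sketch below shows the corrected shape in a stand-alone model. It is an assumption-laden illustration: struct toy_maps, toy_create_maps() and toy_release_maps() are hypothetical names, and plain calloc() stands in for the cpu and thread map constructors.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cpu and thread maps held by an evlist. */
struct toy_maps {
	int *cpus;
	int *threads;
};

/* Return 0 on success and -ENOMEM on allocation failure. */
static int toy_create_maps(struct toy_maps *maps, size_t nr_cpus)
{
	int err = -ENOMEM;
	int *cpus, *threads;

	assert(maps != NULL && nr_cpus > 0);

	cpus = calloc(nr_cpus, sizeof(*cpus));
	if (!cpus)
		return err;

	threads = calloc(1, sizeof(*threads));
	if (!threads)
		goto out_free_cpus;

	maps->cpus = cpus;
	maps->threads = threads;
	return 0;	/* explicit success path; do not fall into the error exit */

out_free_cpus:
	free(cpus);
	return err;
}

/* Free what toy_create_maps() installed. */
static void toy_release_maps(struct toy_maps *maps)
{
	free(maps->cpus);
	free(maps->threads);
	maps->cpus = NULL;
	maps->threads = NULL;
}
```

Callers can then distinguish success from allocation failure by checking for a zero return.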
2025-03-11 perf debug: Avoid stack overflow in recursive error message (Ian Rogers)
In debug_file, pr_warning_once is called on error. Because pr_warning_once itself calls debug_file, the call recurses until the stack overflows. Switch the location of the call so the recursion is avoided. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250228222308.626803-2-irogers@google.com Fixes: ec49230cf6dda704 ("perf debug: Expose debug file") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
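The ordering issue can be modeled in a few lines. This is a sketch under stated assumptions: toy_debug_file() and toy_warn_once() are hypothetical stand-ins, not the perf functions, and falling back to stderr is an assumption of the model. The point is only that the fallback stream is installed before the warning is emitted, so the warning's own lookup of the stream finds it already set.

```c
#include <assert.h>
#include <stdio.h>

static FILE *toy_debug_out;	/* hypothetical debug stream, initially unset */
static int toy_warn_count;	/* number of warnings actually emitted */

static FILE *toy_debug_file(void);

/* Emit 'msg' at most once; note that it looks up the debug stream itself. */
static void toy_warn_once(const char *msg)
{
	static int warned;

	if (warned)
		return;
	warned = 1;
	toy_warn_count++;
	fprintf(toy_debug_file(), "warning: %s\n", msg);
}

/*
 * Return the debug stream, falling back to stderr when none is set.
 * The fallback is installed BEFORE warning: warning first would re-enter
 * this function while the stream is still unset.
 */
static FILE *toy_debug_file(void)
{
	if (!toy_debug_out) {
		toy_debug_out = stderr;
		toy_warn_once("debug file not set, using stderr");
	}
	return toy_debug_out;
}

/* Repeated lookups return the fallback and warn exactly once. */
static void demo_debug_file(void)
{
	assert(toy_debug_file() == stderr);
	assert(toy_debug_file() == stderr);
	assert(toy_warn_count == 1);
}
```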
2025-03-10 perf symbol: Support .gnu_debugdata for symbols (Stephen Brennan)
Fedora introduced a "MiniDebuginfo" feature, in which an LZMA-compressed ELF file is placed inside a section named ".gnu_debugdata". This file contains nothing but a symbol table, which can be used to supplement the .dynsym section which only contains required symbols for runtime. It is supported by GDB for stack traces, but it should be useful for tracing as well. Implement support for loading symbols from .gnu_debugdata. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250307232206.2102440-4-stephen.s.brennan@oracle.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-10 perf tools: Add LZMA decompression from FILE (Stephen Brennan)
Internally lzma_decompress_to_file() creates a FILE from the filename. Add an API that takes an existing FILE directly. This allows decompressing already-open files and even buffers opened by fmemopen(). It is necessary for supporting .gnu_debugdata in the next patch. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250307232206.2102440-3-stephen.s.brennan@oracle.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
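The benefit of a FILE-based entry point can be sketched without any LZMA code. In the model below a plain stream copy stands in for the decompressor, and all function names are hypothetical. It shows the refactoring shape only: the filename variant becomes a thin wrapper around the stream variant, and the same stream variant can read from a memory buffer wrapped with POSIX fmemopen().

```c
#define _POSIX_C_SOURCE 200809L	/* for fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical consumer standing in for a decompressor: copy 'in' to 'out'. */
static long toy_copy_stream(FILE *in, FILE *out)
{
	char buf[256];
	size_t n;
	long total = 0;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return -1;
		total += (long)n;
	}
	return ferror(in) ? -1 : total;
}

/* Filename variant: opens the file, then defers to the stream variant. */
static long toy_copy_path(const char *path, FILE *out)
{
	FILE *in = fopen(path, "rb");
	long ret;

	if (!in)
		return -1;
	ret = toy_copy_stream(in, out);
	fclose(in);
	return ret;
}

/* Buffer variant: wrap the bytes in a FILE and reuse the stream variant. */
static long toy_copy_buffer(const void *data, size_t len, FILE *out)
{
	/* Opened read-only, so the cast away from const is never written through. */
	FILE *in = fmemopen((void *)data, len, "r");
	long ret;

	if (!in)
		return -1;
	ret = toy_copy_stream(in, out);
	fclose(in);
	return ret;
}

/* Copy an in-memory payload into a second in-memory stream and compare. */
static void demo_buffer_copy(void)
{
	char outbuf[16] = { 0 };
	FILE *out = fmemopen(outbuf, sizeof(outbuf), "w");

	assert(out != NULL);
	assert(toy_copy_buffer("hello", 5, out) == 5);
	fclose(out);	/* flushes the written bytes into outbuf */
	assert(memcmp(outbuf, "hello", 5) == 0);
}
```

In this model only the stream variant contains the processing loop. The path and buffer variants just obtain a FILE and hand it over.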