path: root/tools/perf
Age | Commit message | Author
2025-03-07perf report: Use map_symbol__copy() when copying callchainsNamhyung Kim
It seems there are places that miss updating the refcount of maps. Let's use the map_symbol__copy() helper to properly copy them with the refcounts updated. Link: https://lore.kernel.org/r/20250307061250.320849-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
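A minimal sketch of the refcount-aware copy described above, using simplified stand-in types rather than perf's real map/map_symbol definitions:

	struct map { int refcnt; };
	struct map_symbol { struct map *map; /* plus maps and sym in the real struct */ };

	static struct map *map__get(struct map *map)
	{
		if (map)
			map->refcnt++;		/* the copy takes its own reference */
		return map;
	}

	static void map_symbol__copy(struct map_symbol *dst, const struct map_symbol *src)
	{
		*dst = *src;
		dst->map = map__get(dst->map);	/* bump the refcount instead of silently sharing */
	}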
2025-03-06perf annotate: Return errors from disasm_line__parse_powerpc()Athira Rajeev
In disasm_line__parse_powerpc(), the return code from disasm_line__parse() is ignored. This will produce bad results if disasm_line__parse() fails to disassemble the line. Use the return code to fix this. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250304154114.62093-2-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-06perf annotate: Add annotation_options.disassembler_usedAthira Rajeev
When doing "perf annotate", perf tool provides option to use specific disassembler like llvm/objdump/capstone. The order picked is to use llvm first and if that fails fallback to objdump ie to use PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE and PERF_DISASM_OBJDUMP In powerpc, when using "data type" sort keys, first preferred approach is to read the raw instruction from the DSO. In objdump is specified in "--objdump" option, it picks the symbol disassemble using objdump. Currently disasm_line__parse_powerpc() function uses length of the "line" to determine if objdump is used. But there are few cases, where if objdump doesn't recognise the instruction, the disassembled string will be empty. Example: 134cdc: c4 05 82 41 beq 1352a0 <getcwd+0x6e0> 134ce0: ac 00 8e 40 bne cr3,134d8c <getcwd+0x1cc> 134ce4: 0f 00 10 04 pld r9,1028308 ====>134ce8: d4 b0 20 e5 134cec: 16 00 40 39 li r10,22 134cf0: 48 01 21 ea ld r17,328(r1) So depending on length of line will give bad results. Add a new filed to annotation options structure, "struct annotation_options" to save the disassembler used. Use this info to determine if disassembly is done while parsing the disasm line. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20250304154114.62093-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-06perf report: Do not process non-JIT BPF ksymbol eventsNamhyung Kim
The length of PERF_RECORD_KSYMBOL for BPF is the size of the JITed code, so it'd be 0 when it's not JITed. The ksymbol is needed to symbolize the code when it gets samples in the region, but non-JITed code cannot get samples. Thus it's ok to ignore them. Actually this caused a performance issue in the perf tools on old ARM kernels which can refuse to JIT some BPF programs. It ended up splitting the existing kernel map (kallsyms), and a later lookup for a kernel symbol would create a new kernel map from kallsyms and then split it again and again. :( Probably there's a bug in the kernel map/symbol handling in perf tools, but I think we need to fix this anyway. Reported-by: Kevin Nomura <nomurak@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250305232838.128692-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
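A sketch of the skip logic, using a simplified stand-in for the PERF_RECORD_KSYMBOL payload (field names assumed, not libperf's exact struct):

	#include <stdbool.h>
	#include <stdint.h>

	struct ksymbol_event {
		uint64_t addr;
		uint32_t len;		/* size of the JITed code, 0 if the BPF program was not JITed */
		uint16_t ksym_type;
	};

	/* With len == 0 there is no code region that can ever get samples, so
	 * creating a kernel map for it only splits the kallsyms map for nothing. */
	static bool ksymbol_event__should_process(const struct ksymbol_event *ev)
	{
		return ev->len != 0;
	}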
2025-03-06perf test: Fix leak in "Synthesize attr update" testIan Rogers
The own_cpus map variable may be non-NULL and hold a reference, in particular on hybrid machines. Do a put before overwriting the variable to avoid a memory leak. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250305191931.604764-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
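The leak pattern and the fix in miniature; the real code uses libperf's perf_cpu_map with perf_cpu_map__put(), the refcounted type below is only a stand-in:

	#include <stdlib.h>

	struct cpu_map { int refcnt; };

	static struct cpu_map *cpu_map__new(void)
	{
		struct cpu_map *map = calloc(1, sizeof(*map));

		if (map)
			map->refcnt = 1;
		return map;
	}

	static void cpu_map__put(struct cpu_map *map)
	{
		if (map && --map->refcnt == 0)
			free(map);
	}

	/* The fix: release whatever the field already holds before overwriting it,
	 * otherwise the old map (non-NULL on hybrid machines) is leaked. */
	static void replace_own_cpus(struct cpu_map **own_cpus, struct cpu_map *new_map)
	{
		cpu_map__put(*own_cpus);
		*own_cpus = new_map;
	}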
2025-03-05perf machine: Fix insertion of PERF_RECORD_KSYMBOL related kernel mapsNamhyung Kim
This was detected at the end of a 'perf record' session when build-id collection was enabled: the BPF programs put in place while the session was running (some even put in place by perf itself) were processed and inserted, and some overlaps related to BPF trampolines and programs took place. Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes" the problem, in the sense that overlaps will be dealt with and consistency will be kept, but it would be interesting to fully understand why such overlaps take place and how to deal with them when doing symbol resolution. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com Link: https://lore.kernel.org/r/20250228211734.33781-7-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf maps: Add missing map__set_kmap_maps() when replacing a kernel mapArnaldo Carvalho de Melo
In this case __maps__insert_sorted() is not called and thus doesn't have the opportunity to do the needed map__set_kmap_maps() calls on the new map, so do them explicitly when replacing the kernel map. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z7-May5w9VQd5QD0@x1 Link: https://lore.kernel.org/r/20250228211734.33781-6-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf maps: Fixup maps_by_name when modifying maps_by_addressNamhyung Kim
We can't just replace the map in maps_by_address and not touch maps_by_name; that would leave the refcount as 1 and thus trip another consistency check, this one: perf: util/maps.c:110: check_invariants: Assertion `refcount_read(map__refcnt(map)) > 1' failed. 106 /* 107 * Maps by name maps should be in maps_by_address, so 108 * the reference count should be higher. 109 */ 110 assert(refcount_read(map__refcnt(map)) > 1); Committer notice: Initialize the newly added 'ni' variable; although it really can't be accessed uninitialized, it trips some gcc versions, like: 12 20.00 archlinux:base : FAIL gcc version 13.2.1 20230801 (GCC) util/maps.c: In function ‘__maps__fixup_overlap_and_insert’: util/maps.c:896:54: error: ‘ni’ may be used uninitialized [-Werror=maybe-uninitialized] 896 | map__put(maps_by_name[ni]); | ^ util/maps.c:816:25: note: ‘ni’ was declared here 816 | unsigned int i, ni; | ^~ cc1: all warnings being treated as errors make[3]: *** [/git/perf-6.14.0-rc1/tools/build/Makefile.build:138: util] Error 2 Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z79std66tPq-nqsD@google.com Link: https://lore.kernel.org/r/20250228211734.33781-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf machine: Fixup kernel maps ends after adding extra mapsNamhyung Kim
I just noticed it would add extra kernel maps after modules. It should fix up the end addresses of the kernel maps after adding all the maps first. Fixes: 876e80cf83d10585 ("perf tools: Fixup end address of modules") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z7TvZGjVix2asYWI@x1 Link: https://lore.kernel.org/lkml/Z712hzvv22Ni63f1@google.com Link: https://lore.kernel.org/r/20250228211734.33781-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf maps: Set the kmaps for newly created/added kernel mapsArnaldo Carvalho de Melo
When using __maps__insert_sorted() the map kmaps field needs to be initialized, as we need kernel maps to work with map__kmap(). Fix it by using the newly introduced map__set_kmap_maps() method. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1 Link: https://lore.kernel.org/r/20250228211734.33781-3-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf maps: Introduce map__set_kmap_maps() for kernel mapsArnaldo Carvalho de Melo
We need to set it in other places than __maps__insert(), so that we can have access to the 'struct maps' from a kernel 'struct map'. When building perf with 'DEBUG=1' we can notice it failing a consistency check done in the check_invariants() function: root@number:~# perf record -- perf test -w offcpu [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.040 MB perf.data (23 samples) ] perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed. Aborted (core dumped) root@number:~# The investigation into this was bisected to 876e80cf83d10585 ("perf tools: Fixup end address of modules"), and the following patches will plug the problems found; this patch is just legwork in that direction. Use the map__set_kmap_maps() name as per a review comment from Ian Rogers; there are further suggestions from him on getting rid of the kmaps variable, see the thread referenced in the Link below. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1 Link: https://lore.kernel.org/r/20250228211734.33781-2-acme@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf script: Fix output type for dynamically allocated core PMUsThomas Falcon
This patch was originally posted here: https://lore.kernel.org/all/20241213215421.661139-1-thomas.falcon@intel.com/ I have rebased on top of Arnaldo's patch here: https://lore.kernel.org/all/Z2XCi3PgstSrV0SE@x1/ The original commit message: " perf script output may show different fields on different core PMU's that exist on heterogeneous platforms. For example, perf record -e "{cpu_core/mem-loads-aux/,cpu_core/event=0xcd,\ umask=0x01,ldlat=3,name=MEM_UOPS_RETIRED.LOAD_LATENCY/}:upp"\ -c10000 -W -d -a -- sleep 1 perf script: chromium-browse 46572 [002] 544966.882384: 10000 cpu_core/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7ffdf1391b0c 10268100142 \ |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 7 0 7fad7c47425d [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3) perf record -e cpu_atom/event=0xd0,umask=0x05,ldlat=3,\ name=MEM_UOPS_RETIRED.LOAD_LATENCY/upp -c10000 -W -d -a -- sleep 1 perf script: gnome-control-c 534224 [023] 544951.816227: 10000 cpu_atom/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7f0aaaa0aae0 [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3) Some fields, such as data_src, are not included by default. The cause is that while one PMU may be assigned a type such as PERF_TYPE_RAW, other core PMU's are dynamically allocated at boot time. If this value does not match an existing PERF_TYPE_X value, output_type(perf_event_attr.type) will return OUTPUT_TYPE_OTHER. Instead search for a core PMU with a matching perf_event_attr type and, if one is found, return PERF_TYPE_RAW to match output of other core PMU's. " Suggested-by: Kan Liang <kan.liang@intel.com> Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250305163935.1605312-1-thomas.falcon@intel.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf bench: Fix perf bench syscall loop countThomas Richter
Command 'perf bench syscall fork -l 100000' offers option -l to run for a specified number of iterations. However this option is not always observed. The number is silently limited to 10000 iterations, as can be seen: Output before: # perf bench syscall fork -l 100000 # Running 'syscall/fork' benchmark: # Executed 10,000 fork() calls Total time: 23.388 [sec] 2338.809800 usecs/op 427 ops/sec # When explicitly specified with option -l or --loops, also observe the higher number of iterations: Output after: # perf bench syscall fork -l 100000 # Running 'syscall/fork' benchmark: # Executed 100,000 fork() calls Total time: 716.982 [sec] 7169.829510 usecs/op 139 ops/sec # This patch fixes the issue for the basic, execve, fork and getpgid benchmarks. Fixes: ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/20250304092349.2618082-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
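Roughly, the fix amounts to honoring the parsed loop count in the benchmark loop; the variable and function names below are illustrative, not the exact perf bench code:

	#include <unistd.h>
	#include <sys/wait.h>

	static unsigned int loops = 10000;	/* default, overwritten by -l/--loops parsing */

	static void bench_syscall_fork(void)
	{
		unsigned int i;

		for (i = 0; i < loops; i++) {	/* honor the requested count, not a fixed 10000 */
			pid_t pid = fork();

			if (pid == 0)
				_exit(0);
			else if (pid > 0)
				waitpid(pid, NULL, 0);
		}
	}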
2025-03-05perf test: Simplify data symbol testNamhyung Kim
Now the workload will end after 1 second. Just run it with perf instead of waiting for the background process. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304022837.1877845-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf test: Add timeout to datasym workloadNamhyung Kim
Unlike the others it has an infinite loop that makes it annoying to call. Make it finish after 1 second and handle a command-line argument to change the setting. Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304022837.1877845-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
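A sketch of such a time-bounded workload, assuming a simple seconds argument (the actual workload body is omitted):

	#include <stdlib.h>
	#include <time.h>

	int main(int argc, char **argv)
	{
		int seconds = argc > 1 ? atoi(argv[1]) : 1;	/* default: finish after 1 second */
		time_t end = time(NULL) + seconds;
		volatile int data = 0;

		while (time(NULL) < end)
			data++;			/* touch test data so memory samples are generated */

		return 0;
	}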
2025-03-05perf test: Add trace record and replay testNamhyung Kim
It just checks that trace record and replay can display the correct output. It uses a 'sleep' process and checks that there's a clock_nanosleep syscall. $ sudo perf test -vv replay 108: perf trace record and replay: --- start --- test child forked, pid 1563219 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.077 MB /tmp/temporary_file.w1ApA (242 samples) ] 0.686 (1000.068 ms): sleep/1563226 clock_nanosleep(rqtp: 0x7ffc20ffee10, rmtp: 0x7ffc20ffee50) = 0 ---- end(0) ---- 108: perf trace record and replay : Ok Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250304022837.1877845-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf test: Skip perf trace tests when running as non-rootNamhyung Kim
perf trace requires root because it needs to use tracepoints and BPF. Skip those test when it's not run as root. Before: $ perf test trace 15: Parse sched tracepoints fields : Skip (permissions) 80: perf ftrace tests : Skip 105: perf trace enum augmentation tests : FAILED! 106: perf trace BTF general tests : FAILED! 107: perf trace exit race : FAILED! 118: probe libc's inet_pton & backtrace it with ping : Skip 125: Check Arm CoreSight trace data recording and synthesized samples: Skip 127: Check Arm SPE trace data recording and synthesized samples : Skip 132: Check open filename arg using perf trace + vfs_getname : FAILED! After: $ perf test trace 15: Parse sched tracepoints fields : Skip (permissions) 80: perf ftrace tests : Skip 105: perf trace enum augmentation tests : Skip 106: perf trace BTF general tests : Skip 107: perf trace exit race : Skip 118: probe libc's inet_pton & backtrace it with ping : Skip 125: Check Arm CoreSight trace data recording and synthesized samples: Skip 127: Check Arm SPE trace data recording and synthesized samples : Skip 132: Check open filename arg using perf trace + vfs_getname : Skip Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Link: https://lore.kernel.org/r/20250304022837.1877845-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf test: Skip perf probe tests when running as non-rootNamhyung Kim
perf probe requires root because it needs to use [ku]probes. Skip those tests when not run as root. Before: $ perf test probe 47: Probe SDT events : Ok 104: test perf probe of function from different CU : FAILED! 115: perftool-testsuite_probe : FAILED! 117: Add vfs_getname probe to get syscall args filenames : FAILED! 118: probe libc's inet_pton & backtrace it with ping : FAILED! 119: Use vfs_getname probe to get syscall args filenames : FAILED! After: $ perf test probe 47: Probe SDT events : Ok 104: test perf probe of function from different CU : Skip 115: perftool-testsuite_probe : Skip 117: Add vfs_getname probe to get syscall args filenames : Skip 118: probe libc's inet_pton & backtrace it with ping : Skip 119: Use vfs_getname probe to get syscall args filenames : Skip Tested-by: Thomas Falcon <thomas.falcon@intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250304022837.1877845-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf test: Add --metric-only to perf stat output testsNamhyung Kim
Add a test case for --metric-only for std, csv, json output mode using shadow IPC metric from instructions and cycles events. It should produce 'insn per cycle' metric. But currently JSON output has (none) 'GHz' as well. It looks like a bug but I don't have enough time to debug it for now so I made it pass. :( $ perf stat --metric-only -e instructions,cycles true Performance counter stats for 'true': 0.56 0.002127319 seconds time elapsed 0.002077000 seconds user 0.000000000 seconds sys $ perf stat -x, --metric-only -e instructions,cycles true 0.55,, $ perf stat -j --metric-only -e instructions,cycles true {"insn per cycle" : "0.53", "GHz" : "none"} $ perf test output -v 5: Test data source output : Ok 31: Sort output of hist entries : Ok 88: perf stat CSV output linter : Ok 90: perf stat JSON output linter : Ok 92: perf stat STD output linter : Ok Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250304022837.1877845-2-namhyung@kernel.org Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Support previous branch target (PBT) addressLeo Yan
When FEAT_SPE_PBT is implemented, the previous branch target address (named as PBT) before the sampled operation, will be recorded. This commit first introduces a 'prev_br_tgt' field in the record for saving the PBT address in the decoder. If the current operation is a branch instruction, by combining with PBT, it can create a chain with two consecutive branches. As the branch stack stores branches in descending order, meaning a newer branch is stored in a lower entry in the stack. Arm SPE stores the latest branch in the first entry of branch stack, and the previous branch coming from PBT is stored into the second entry. Otherwise, if current operation is not a branch, the last branch will be saved for PBT only. PBT lacks associated information such as branch source address, branch type, and events. The branch entry fills zeros for the corresponding fields and only set its target address. After: perf script -f --itrace=bl -F flags,addr,brstack jcc ffff800080187914 0xffff8000801878fc/0xffff800080187914/P/-/-/1/COND/- 0x0/0xffff8000801878f8/-/-/-/0//- jcc ffff8000802d12d8 0xffff8000802d12f8/0xffff8000802d12d8/P/-/-/1/COND/- 0x0/0xffff8000802d12ec/-/-/-/0//- jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//- jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//- jmp ffff800081410980 0xffff800081419108/0xffff800081410980/P/-/-/1//- 0x0/0xffff800081419104/-/-/-/0//- return ffff80008036e064 0xffff80008141ba84/0xffff80008036e064/P/-/-/1/RET/- 0x0/0xffff80008141ba60/-/-/-/0//- jcc ffff8000803d54f0 0xffff8000803d54e8/0xffff8000803d54f0/P/-/-/1/COND/- 0x0/0xffff8000803d54e0/-/-/-/0//- jmp ffff80008015e468 0xffff8000803d46dc/0xffff80008015e468/P/-/-/1//- 0x0/0xffff8000803d46c8/-/-/-/0//- jmp ffff8000806e2d50 0xffff80008040f710/0xffff8000806e2d50/P/-/-/1//- 0x0/0xffff80008040f6e8/-/-/-/0//- jcc ffff800080721704 0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//- Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-13-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Add branch stackLeo Yan
Although Arm SPE cannot generate continuous branch records, this commit creates a branch stack with only one branch entry. A single branch info can be used for performance optimization. A branch stack structure is dynamically allocated in the decode queue. The branch stack and stack flags are synthesized based on branch types and associated events. After: # perf script --itrace=bl1 -F flags,addr,brstack jcc ffffc0fad9c6b214 0xffffc0fad9c6b234/0xffffc0fad9c6b214/P/-/-/7/COND/- jcc/miss,not_taken/ ffffc0fadaaebb30 0xffffc0fadaaebb2c/0xffffc0fadaaebb30/MN/-/-/7/COND/- jmp ffffc0fadaaea358 0xffffc0fadaaea5ec/0xffffc0fadaaea358/P/-/-/5//- jcc/not_taken/ ffffc0fadaae6494 0xffffc0fadaae6490/0xffffc0fadaae6494/PN/-/-/11/COND/- jcc/not_taken/ ffff7f83ab54 0xffff7f83ab50/0xffff7f83ab54/PN/-/-/13/COND/- jcc/not_taken/ ffff7f83ab08 0xffff7f83ab04/0xffff7f83ab08/PN/-/-/8/COND/- jcc ffff7f83aa80 0xffff7f83aa58/0xffff7f83aa80/P/-/-/10/COND/- jcc ffff7f9a45d0 0xffff7f9a43f0/0xffff7f9a45d0/P/-/-/29/COND/- jcc/not_taken/ ffffc0fad9ba6db4 0xffffc0fad9ba6db0/0xffffc0fad9ba6db4/PN/-/-/44/COND/- jcc ffffc0fadaac2964 0xffffc0fadaac2970/0xffffc0fadaac2964/P/-/-/6/COND/- jcc ffffc0fad99ddc10 0xffffc0fad99ddc04/0xffffc0fad99ddc10/P/-/-/72/COND/- jcc/not_taken/ ffffc0fad9b3f21c 0xffffc0fad9b3f218/0xffffc0fad9b3f21c/PN/-/-/64/COND/- jcc ffffc0fad9c3b604 0xffffc0fad9c3b5f8/0xffffc0fad9c3b604/P/-/-/13/COND/- jcc ffffc0fadaad6048 0xffffc0fadaad5f8c/0xffffc0fadaad6048/P/-/-/5/COND/- return/miss/ ffff7f84e614 0xffffc0fad98a2274/0xffff7f84e614/M/-/-/13/RET/- jcc/not_taken/ ffffc0fadaac4eb4 0xffffc0fadaac4eb0/0xffffc0fadaac4eb4/PN/-/-/5/COND/- jmp ffff7f8e3130 0xffff7f87555c/0xffff7f8e3130/P/-/-/5//- jcc/not_taken/ ffffc0fad9b3d9b0 0xffffc0fad9b3d9ac/0xffffc0fad9b3d9b0/PN/-/-/14/COND/- return ffffc0fad9b91950 0xffffc0fad98c3e28/0xffffc0fad9b91950/P/-/-/12/RET/- Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-12-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Set sample flags with supplement infoLeo Yan
Based on the supplementary information in the record, this commit sets the sample flags for conditional branch, function call and return. It also sets events in the flags, such as mispredict, not taken, and in transaction. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-11-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Fill branch operations and events to recordLeo Yan
The newly added branch operations and events are filled into the record; the information will be consumed when synthesizing samples. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-10-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Decode transactional eventLeo Yan
The bit[16] in an event payload indicates an operation is in transactional state. Decode the bit. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-9-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Extend branch operationsLeo Yan
In the Arm ARM (ARM DDI 0487, L.a), section "D18.2.7 Operation Type packet", the branch subclass is extended for Call Return (CR) and Guarded control stack data access (GCS). This commit adds support for CR and GCS operations. The IND (indirect) operation is defined only in bit [1], so its macro is updated accordingly. Move the COND (Conditional) macro into the same group as the other operations for better maintenance. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-8-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Fix load-store operation checkingLeo Yan
The ARM_SPE_OP_LD and ARM_SPE_OP_ST operations are secondary operation types; they overlap with other second-level operation types belonging to SVE and branch operations. As a result, a non-load-store operation can be parsed for data source and memory samples. To fix the issue, this commit introduces an is_ldst_op() macro for checking LDST operations, and applies the check when synthesizing data source and memory samples. Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250304111240.3378214-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf script: Add not taken event for branch stackLeo Yan
The branch stack has an existing field for printing mispredict; extend the field for printing events and add support for the not-taken event. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf script: Add not taken event for branchesLeo Yan
Some hardware (e.g., Arm SPE) can trace the not taken event for branches. Add a flag for this event and support printing it. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf script: Separate events from branch typesLeo Yan
Branch types and events are two different things. A branch type can be a conditional branch, an indirect branch, a procedure call, a return, or an exception taken, etc. The extra event information is provided for what happens during a branch, e.g. if a branch is mispredicted or not taken (specific to conditional branches). To deliver information about branches, this commit separates events from branch types. It parses branch types first, then appends event strings embraced by the '/' character. If multiple events occur, the events are separated with a comma (,). Also add a minor improvement by adding the char 'm' to the char array for the branch mispredict event. Below are extracted sample flags. Before: branch: br miss instructions: br miss After: branch: jmp/miss/ instructions: jmp/miss/ Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
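A small, self-contained sketch of the string layout described above (the branch type followed by events wrapped in '/' and separated by commas); the flag bits here are hypothetical, the real code uses perf's sample flag definitions:

	#include <stdio.h>

	#define FLAG_MISS	(1u << 0)	/* hypothetical bits for the example */
	#define FLAG_NOT_TAKEN	(1u << 1)

	static void flags_to_name(unsigned int events, const char *type, char *buf, size_t sz)
	{
		int n = snprintf(buf, sz, "%s", type);

		if (events)			/* events go inside '/.../', comma separated */
			snprintf(buf + n, sz - n, "/%s%s%s/",
				 events & FLAG_MISS ? "miss" : "",
				 (events & FLAG_MISS) && (events & FLAG_NOT_TAKEN) ? "," : "",
				 events & FLAG_NOT_TAKEN ? "not_taken" : "");
	}

	int main(void)
	{
		char buf[64];

		flags_to_name(FLAG_MISS, "jmp", buf, sizeof(buf));
		printf("%s\n", buf);		/* prints: jmp/miss/ */
		return 0;
	}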
2025-03-05perf script: Refactor sample_flags_to_name() functionLeo Yan
When generating a string for sample flags, the sample_flags_to_name() function lacks the ability to parse the trace start bit or trace end bit. Therefore, the function is invoked multiple times after clearing its unsupported bits. This commit improves the sample_flags_to_name() function to parse sample flags in one go for three kinds of information: - The prefix info for trace start, trace end, etc. - Branch types. - Extra info for transaction and interrupt related info. As a result, the code is simplified to call the sample_flags_to_name() only once. No expectation for any changes in the perf script output. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf script: Make printing flags reliableLeo Yan
Add a check for the generated string of flags. Print out the raw number if the string generation fails. Use the SAMPLE_FLAGS_STR_ALIGNED_SIZE macro to replace the value '21'. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20250304111240.3378214-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-03perf stat: Fix non-uniquified hybrid legacy eventsJames Clark
Legacy hybrid events have attr.type == PERF_TYPE_HARDWARE, so they look like plain legacy events if we only look at attr.type. But legacy events should still be uniquified if they were opened on a non-legacy PMU. Fix it by checking if the evsel is hybrid and forcing needs_uniquify before looking at the attr.type. This restores PMU names on hybrid systems and also changes "perf stat metrics (shadow stat) test" from a FAIL back to a SKIP (on hybrid). The test was gated on "cycles" appearing alone which doesn't happen on here. Before: $ perf stat -- true ... <not counted> instructions:u (0.00%) 162,536 instructions:u # 0.58 insn per cycle ... After: $ perf stat -- true ... <not counted> cpu_atom/instructions/u (0.00%) 162,541 cpu_core/instructions/u # 0.62 insn per cycle ... Fixes: 357b965deba9 ("perf stat: Changes to event name uniquification") Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://lore.kernel.org/r/20250226145526.632380-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-02perf tools: Skip BPF sideband event for userspace profilingNamhyung Kim
The BPF sideband information is tracked using a separate thread and evlist. But it's only useful for profiling the kernel and we can skip it when users profile their application only. It seems to already fail to open the sideband event in that case; let's remove the noise in the verbose output anyway. Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250226203039.1099131-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf test: Fix spelling mistake "sythesizing" -> "synthesizing"Colin Ian King
There are spelling mistakes in TEST_ASSERT_VAL messages. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20250228090941.680226-1-colin.i.king@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf build: Fix in-tree build due to symbolic linkLuca Ceresoli
Building perf in-tree is broken after commit 890a1961c812 ("perf tools: Create source symlink in perf object dir") which added a 'source' symlink in the output dir pointing to the source dir. With in-tree builds, the added 'SOURCE = ...' line is executed multiple times (I observed 2 during the build plus 2 during installation). This is a minor inefficiency, in theory not harmful because symlink creation is assumed to be idempotent. But it is not. Considering with in-tree builds: srctree=/absolute/path/to/linux OUTPUT=/absolute/path/to/linux/tools/perf here's what happens: 1. ln -sf $(srctree)/tools/perf $(OUTPUT)/source -> creates /absolute/path/to/linux/tools/perf/source link to /absolute/path/to/linux/tools/perf => OK, that's what was intended 2. ln -sf $(srctree)/tools/perf $(OUTPUT)/source # same command as 1 -> creates /absolute/path/to/linux/tools/perf/perf link to /absolute/path/to/linux/tools/perf => Not what was intended, not idempotent 3. Now the build _should_ create the 'perf' executable, but it fails The reason is the tricky 'ln' command line. At the first invocation 'ln' uses the 1st form: ln [OPTION]... [-T] TARGET LINK_NAME and creates a link to TARGET *called LINK_NAME*. At the second invocation $(OUTPUT)/source exists, so 'ln' uses the 3rd form: ln [OPTION]... TARGET... DIRECTORY and creates a link to TARGET *called TARGET* inside DIRECTORY. Fix by adding -n/--no-dereference to "treat LINK_NAME as a normal file if it is a symbolic link to a directory", as the manpage says. Closes: https://lore.kernel.org/all/20241125182506.38af9907@booty/ Fixes: 890a1961c812 ("perf tools: Create source symlink in perf object dir") Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20250124-perf-fix-intree-build-v1-1-485dd7a855e4@bootlin.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf arm-spe: Report error if set frequencyLeo Yan
When users set the parameter '-F' to specify a frequency for Arm SPE, the tool reports an error: perf record -F 1000 -e arm_spe_0// -- sleep 1 Error: Invalid event (arm_spe_0//) in per-thread mode, enable system wide with '-a'. The output log is confusing and does not give the correct reminder. Arm SPE does not support frequency setting given it adopts a statistically based approach. Alternatively, Arm SPE supports setting a period. This commit adds a check for frequency setting. It reports an error and reminds users to set a period instead. After: perf record -F 1000 -e arm_spe_0// -- sleep 1 Arm SPE: Frequency is not supported. Set period with -c option or PMU parameter (-e arm_spe_0/period=NUM/). Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250227085544.2154136-1-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
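The shape of the added check, with simplified stand-in options rather than perf's real record_opts:

	#include <stdbool.h>
	#include <stdio.h>

	struct spe_opts {
		bool freq_set;		/* -F was given */
		bool period_set;	/* -c or period=NUM was given */
	};

	static int arm_spe_check_freq(const struct spe_opts *opts)
	{
		if (opts->freq_set && !opts->period_set) {
			fprintf(stderr, "Arm SPE: Frequency is not supported. "
				"Set period with -c option or PMU parameter (-e arm_spe_0/period=NUM/).\n");
			return -1;	/* fail early with an actionable message */
		}
		return 0;
	}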
2025-02-28perf lock: Report owner stack in usermodeChun-Tse Shao
This patch parses `owner_lock_stat` into a RB tree, enabling ordered reporting of owner lock statistics with stack traces. It also updates the documentation for the `-o` option in contention mode, decouples `-o` from `-t`, and issues a warning to inform users about the new behavior of `-ov`. Example output: $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe ... contended total wait max wait avg wait type caller 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 0xffffffffac6318e7 pipe_read+0x57 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 0xffffffffac631ee0 pipe_write+0x50 0xffffffffac6241db vfs_write+0x3bb 0xffffffffac6244ab ksys_write+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d 0xffffffffac691f0d do_epoll_wait+0x24d 0xffffffffac69249b do_epoll_pwait.part.0+0xb 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 === owner stack trace === 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 0xffffffffac631bd8 pipe_read+0x348 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 ... Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf lock: Make rb_tree helper functions genericChun-Tse Shao
The rb_tree helper functions can be reused for parsing `owner_lock_stat` into rb tree for sorting. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-4-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf lock: Retrieve owner callstack in bpf programChun-Tse Shao
This implements per-callstack aggregation of lock owners in addition to per-thread. The owner callstack is captured using `bpf_get_task_stack()` at `contention_begin()` and it also adds a custom stackid function for the owner stacks to be compared easily. The owner info is kept in a hash map using lock addr as a key to handle multiple waiters for the same lock. At `contention_end()`, it updates the owner lock stat based on the info that was saved at `contention_begin()`. If there are more waiters, it'd update the owner pid to itself as `contention_end()` means it gets the lock now. But it also needs to check the return value of the lock function in case task was killed by a signal or something. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-3-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf lock: Add bpf maps for owner stack tracingChun-Tse Shao
Add a struct and a few BPF maps in order to trace the owner stack. `struct owner_tracing_data`: Contains the owner's pid, stack id, a timestamp for when the owner acquires the lock, and the count of lock waiters. `stack_buf`: Percpu buffer for retrieving the owner stacktrace. `owner_stacks`: For mapping an owner stacktrace to a customized owner stack id. `owner_data`: For mapping a lock_address to `struct owner_tracing_data` in the BPF program. `owner_stat`: For reporting owner stacktraces in usermode. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-27perf cpumap: Reduce cpu size from int to int16_tIan Rogers
Fewer than 32k logical CPUs are currently supported by perf. A cpumap is indexed by an integer (see perf_cpu_map__cpu) yielding a perf_cpu that wraps a 4-byte int for the logical CPU - the wrapping is done deliberately to avoid confusing a logical CPU with an index into a cpumap. Using a 4-byte int within the perf_cpu is larger than required so this patch reduces it to the 2-byte int16_t. For a cpumap containing 16 entries this will reduce the array size from 64 to 32 bytes. For very large servers with lots of logical CPUs the size savings will be greater. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250210191231.156294-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
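In miniature, the change looks like this (the map struct below is only a sketch of the layout, not libperf's actual definition):

	#include <stdint.h>

	/* A CPU number is deliberately wrapped in a struct so it cannot be
	 * confused with an index into the cpumap. */
	struct perf_cpu {
		int16_t cpu;		/* was a 4-byte int; fewer than 32k CPUs are supported */
	};

	struct cpu_map_sketch {
		int nr;
		struct perf_cpu map[];	/* 16 entries now take 32 bytes instead of 64 */
	};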
2025-02-27perf trace: Add missing perf_tool__init()Athira Rajeev
Perf trace on perf.data fails as below: ./perf trace record -- sleep 1 ./perf trace -i perf.data perf: Segmentation fault Segmentation fault (core dumped) Backtrace pointed to : ?? () perf_session.process_user_event () reader.read_event () perf_session.process_events () cmd_trace () run_builtin () handle_internal_command () main () Further debug pointed that, segmentation fault happens when trying to access id_index. Code snippet: case PERF_RECORD_ID_INDEX: err = tool->id_index(session, event); Since 'commit 15d4a6f41d72 ("perf tool: Remove perf_tool__fill_defaults()")', perf_tool__fill_defaults is removed. All tools are initialized using perf_tool__init() prior to use. But in builtin-trace, perf_tool__init is not used and hence the defaults are not initialized. Use perf_tool__init() in perf trace to handle the initialization. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250225113157.28836-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-26perf list: Document -v option deduplication featureJames Clark
-v disables deduplication of similarly suffixed PMUs so add it to the help and doc strings. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-4-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-26perf pmu: Don't double count common sysfs and json eventsJames Clark
After pmu_add_cpu_aliases() is called, perf_pmu__num_events() returns an incorrect value that double counts common events and doesn't match the actual count of events in the alias list. This is because after 'cpu_aliases_added == true', the number of events returned is 'sysfs_aliases + cpu_json_aliases'. But when adding 'case EVENT_SRC_SYSFS' events, 'sysfs_aliases' and 'cpu_json_aliases' are both incremented together, failing to account that these ones overlap and only add a single item to the list. Fix it by adding another counter for overlapping events which doesn't influence 'cpu_json_aliases'. There doesn't seem to be a current issue because it's used in perf list before pmu_add_cpu_aliases() so the correct value is returned. Other uses in tests may also miss it for other reasons like only looking at uncore events. However it's marked as a fixes commit in case any new fix with new uses of perf_pmu__num_events() is backported. Fixes: d9c5f5f94c2d ("perf pmu: Count sys and cpuid JSON events separately") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-3-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
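The counting invariant can be sketched as below; the field names are illustrative, not perf's actual pmu members:

	struct pmu_alias_counts {
		unsigned int sysfs_aliases;		/* events found in sysfs */
		unsigned int cpu_json_aliases;		/* events found in the JSON tables */
		unsigned int sysfs_json_aliases;	/* present in both, counted once */
	};

	/* Events present in both sysfs and JSON must not be counted twice. */
	static unsigned int pmu_num_events(const struct pmu_alias_counts *c)
	{
		return c->sysfs_aliases + c->cpu_json_aliases - c->sysfs_json_aliases;
	}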
2025-02-26perf pmu: Dynamically allocate tool PMUJames Clark
perf_pmus__destroy() treats all PMUs as allocated and free's them so we can't have any static PMUs that are added to the PMU lists. Fix it by allocating the tool PMU in the same way as the others. Current users of the tool PMU already use find_pmu() and not perf_pmus__tool_pmu(), so rename the function to add 'new' to avoid it being misused in the future. perf_pmus__fake_pmu() can remain as static as it's not added to the PMU lists. Fixes the following error: $ perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times munmap_chunk(): invalid pointer Aborted (core dumped) Fixes: 240505b2d0ad ("perf tool_pmu: Factor tool events into their own PMU") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250226104111.564443-2-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
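Why a static PMU breaks here, in miniature: the destroy path free()s every list entry, so each entry must be heap-allocated. The types and names below are stand-ins, not perf's real pmu structures:

	#include <stdlib.h>
	#include <string.h>

	struct pmu_sketch { char *name; };

	static struct pmu_sketch *tool_pmu__new(void)
	{
		struct pmu_sketch *pmu = calloc(1, sizeof(*pmu));

		if (pmu)
			pmu->name = strdup("tool");
		return pmu;		/* heap-allocated, safe to free on destroy */
	}

	static void pmu__delete(struct pmu_sketch *pmu)
	{
		if (pmu) {
			free(pmu->name);
			free(pmu);
		}
	}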
2025-02-26perf probe: Pick the correct dwarf die while adding probe pointsAthira Rajeev
Perf probe on vfs_fstatat fails as below on a powerpc system $ ./perf probe -nf --max-probes=512 -a 'vfs_fstatat $params' Segmentation fault (core dumped) This is observed while running perftool-testsuite_probe testcase. While running with verbose, its observed that segfault happens at: synthesize_probe_trace_arg () synthesize_probe_trace_command () probe_file.add_event () apply_perf_probe_events () __cmd_probe () cmd_probe () run_builtin () handle_internal_command () main () Code in synthesize_probe_trace_arg() access a null value and results in segfault. Data structure which is null: struct probe_trace_arg arg->value We are hitting a case where arg->value is null in probe point: "vfs_fstatat $params". This is happening since 'commit e896474fe485 ("getname_maybe_null() - the third variant of pathname copy-in")' Before the commit, probe point for vfs_fstatat was getting added only for one location: Writing event: p:probe/vfs_fstatat _text+6345404 dfd=%gpr3:s32 filename=%gpr4:x64 stat=%gpr5:x64 flags=%gpr6:s32 With this change, vfs_fstatat code is inlined for other locations in the code: Probe point found: __do_sys_lstat64+48 Probe point found: __do_sys_stat64+48 Probe point found: __do_sys_newlstat+48 Probe point found: __do_sys_newstat+48 Probe point found: vfs_fstatat+0 When trying to find matching dwarf information entry (DIE) from the debuginfo, the code incorrectly picks DIE which is not referring to vfs_fstatat. Snippet from dwarf entry in vmlinux debuginfo file. The main abstract die is: <1><4214883>: Abbrev Number: 147 (DW_TAG_subprogram) <4214885> DW_AT_external : 1 <4214885> DW_AT_name : (indirect string, offset: 0x17b9f3): vfs_fstatat With formal parameters: <2><4214896>: Abbrev Number: 51 (DW_TAG_formal_parameter) <4214897> DW_AT_name : dfd <2><42148a3>: Abbrev Number: 23 (DW_TAG_formal_parameter) <42148a4> DW_AT_name : (indirect string, offset: 0x8fda9): filename <2><42148b0>: Abbrev Number: 23 (DW_TAG_formal_parameter) <42148b1> DW_AT_name : (indirect string, offset: 0x16bd9c): stat <2><42148bd>: Abbrev Number: 23 (DW_TAG_formal_parameter) <42148be> DW_AT_name : (indirect string, offset: 0x39832b): flags While collecting variables/parameters for a probe point, the function copy_variables_cb() also looks at dwarf debug entries based on the instruction address. Snippet if (dwarf_haspc(die_mem, vf->pf->addr)) return DIE_FIND_CB_CONTINUE; else return DIE_FIND_CB_SIBLING; But incase of inlined function instance for vfs_fstatat, there are two entries which has the instruction address entry point as same. Instance 1: which is for vfs_fstatat and DW_AT_abstract_origin points to 0x4214883 (reference above for main abstract die) <3><42131fa>: Abbrev Number: 59 (DW_TAG_inlined_subroutine) <42131fb> DW_AT_abstract_origin: <0x4214883> <42131ff> DW_AT_entry_pc : 0xc00000000062b1e0 Instance 2: which is not for vfs_fstatat but for getname <5><4213270>: Abbrev Number: 39 (DW_TAG_inlined_subroutine) <4213271> DW_AT_abstract_origin: <0x4215b6b> <4213275> DW_AT_entry_pc : 0xc00000000062b1e0 But the copy_variables_cb() continues to add parameters from second instance also based on the dwarf_haspc() check. This results in formal parameters for getname also appended to params. But while filling in the args->value for these parameters, since these args are not part of dwarf with offset "42131fa". Hence value will be null. This incorrect args results in segfault when value field is accessed. Save the dwarf dieoffset of the actual DW_TAG_subprogram as part of "struct probe_finder". 
In copy_variables_cb(), include a check to make sure the DW_AT_abstract_origin points to the correct entry if the dwarf_haspc() matches the instruction address. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250225123042.37263-1-atrajeev@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-26perf ftrace latency: allow hiding empty bucketsGabriele Monaco
Especially while using several buckets, it isn't uncommon to have some of them empty and reading the histogram may be a bit more complex: # perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency 200 # DURATION | COUNT | GRAPH | 0 - 5 us | 14816 | ###################################### | 5 - 10 us | 1228 | ### | 10 - 15 us | 438 | # | 15 - 20 us | 106 | | 20 - 25 us | 21 | | 25 - 30 us | 11 | | 30 - 35 us | 1 | | 35 - 40 us | 2 | | 40 - 45 us | 4 | | 45 - 50 us | 0 | | 50 - 55 us | 1 | | 55 - 60 us | 0 | | 60 - 65 us | 1 | | 65 - 70 us | 1 | | 70 - 75 us | 1 | | 75 - 80 us | 2 | | 80 - 85 us | 0 | | 85 - 90 us | 1 | | 90 - 95 us | 0 | | 95 - 100 us | 1 | | 100 - 105 us | 0 | | 105 - 110 us | 0 | | 110 - 115 us | 0 | | 115 - 120 us | 0 | | 120 - 125 us | 1 | | 125 - 130 us | 0 | | 130 - 135 us | 0 | | 135 - 140 us | 1 | | 140 - 145 us | 0 | | 145 - 150 us | 0 | | 150 - 155 us | 0 | | 155 - 160 us | 0 | | 160 - 165 us | 0 | | 165 - 170 us | 0 | | 170 - 175 us | 0 | | 175 - 180 us | 0 | | 180 - 185 us | 0 | | 185 - 190 us | 0 | | 190 - 195 us | 0 | | 195 - 200 us | 0 | | 200 - ... us | 2 | | Allow the optional flag --hide-empty to remove buckets with no element and produce a more compact graph. This feature could be misleading since there is no clear indication for missing buckets, for this reason it's disabled by default. # perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency --hide-empty 200 # DURATION | COUNT | GRAPH | 0 - 5 us | 14816 | ###################################### | 5 - 10 us | 1228 | ### | 10 - 15 us | 438 | # | 15 - 20 us | 106 | | 20 - 25 us | 21 | | 25 - 30 us | 11 | | 30 - 35 us | 1 | | 35 - 40 us | 2 | | 40 - 45 us | 4 | | 50 - 55 us | 1 | | 60 - 65 us | 1 | | 65 - 70 us | 1 | | 70 - 75 us | 1 | | 75 - 80 us | 2 | | 85 - 90 us | 1 | | 95 - 100 us | 1 | | 120 - 125 us | 1 | | 135 - 140 us | 1 | | 200 - ... us | 2 | | Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250207080446.77630-2-gmonaco@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
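The gist of --hide-empty is just skipping zero-count buckets when printing; a simplified sketch follows (the real code also draws the bar graph and handles the open-ended last bucket):

	#include <stdio.h>

	static void print_histogram(const int *buckets, int nr, int range, int hide_empty)
	{
		int i;

		for (i = 0; i < nr; i++) {
			if (hide_empty && buckets[i] == 0)
				continue;	/* --hide-empty: drop zero-count rows */
			printf("%4d - %4d us | %6d |\n",
			       i * range, (i + 1) * range, buckets[i]);
		}
	}

	int main(void)
	{
		int buckets[] = { 14816, 1228, 438, 0, 0, 1 };

		print_histogram(buckets, 6, 5, /*hide_empty=*/1);
		return 0;
	}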
2025-02-26perf ftrace latency: variable histogram bucketsGabriele Monaco
The max-latency value can make the histogram smaller but not larger: we have a maximum of 22 buckets, and specifying a max-latency that would require more buckets has no effect. Dynamically allocate the buckets and compute the bucket number from the max latency as (max - min) / range + 2. If the maximum is not specified, we still set the bucket number to 22 and compute the maximum accordingly. Fail if the maximum is smaller than min + range; this way we make sure we always have 3 buckets: those below min, those above max and one in the middle. Since max-latency is not available in log2 mode, always use 22 buckets. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250207080446.77630-1-gmonaco@redhat.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
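The bucket-count arithmetic described above, as a tiny worked example (the function name is illustrative):

	#include <stdio.h>

	static int latency_bucket_num(int min, int max, int range)
	{
		if (max < min + range)
			return -1;			/* need at least the 3 guaranteed buckets */
		return (max - min) / range + 2;		/* one extra below min, one above max */
	}

	int main(void)
	{
		printf("%d\n", latency_bucket_num(0, 200, 5));	/* prints 42 */
		return 0;
	}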
2025-02-26perf annotate-data: Handle direct use of stack pointer without fbregNamhyung Kim
Sometimes the compiler generates code that uses the stack pointer register without a frame pointer. As we know RSP is the stack register on x86, let's treat it the same as fbreg. But the offset would be in the opposite direction, so update the debug message accordingly. Reported-by: Blake Jones <blakejones@google.com> Link: https://lore.kernel.org/r/20250126210242.1181225-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-25Merge tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-toolsLinus Torvalds
Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix the tools/ quiet build Makefile infrastructure that was broken when working on tools/perf/ without testing on other utilities living in tools/. * tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: tools: Remove redundant quiet setup tools: Unify top-level quiet infrastructure