Age | Commit message (Collapse) | Author |
|
It seems there are places to miss updating refcount of maps.
Let's use map_symbol__copy() helper to properly copy them with
refcounts updated.
Link: https://lore.kernel.org/r/20250307061250.320849-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In disasm_line__parse_powerpc() , return code from function
disasm_line__parse() is ignored. This will result in bad results
if the disasm_line__parse() fails to disasm the line. Use
the return code to fix this.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20250304154114.62093-2-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When doing "perf annotate", perf tool provides option to
use specific disassembler like llvm/objdump/capstone. The
order picked is to use llvm first and if that fails fallback
to objdump ie to use PERF_DISASM_LLVM, PERF_DISASM_CAPSTONE
and PERF_DISASM_OBJDUMP
In powerpc, when using "data type" sort keys, first preferred
approach is to read the raw instruction from the DSO. In objdump
is specified in "--objdump" option, it picks the symbol disassemble
using objdump. Currently disasm_line__parse_powerpc() function
uses length of the "line" to determine if objdump is used.
But there are few cases, where if objdump doesn't recognise the
instruction, the disassembled string will be empty.
Example:
134cdc: c4 05 82 41 beq 1352a0 <getcwd+0x6e0>
134ce0: ac 00 8e 40 bne cr3,134d8c <getcwd+0x1cc>
134ce4: 0f 00 10 04 pld r9,1028308
====>134ce8: d4 b0 20 e5
134cec: 16 00 40 39 li r10,22
134cf0: 48 01 21 ea ld r17,328(r1)
So depending on length of line will give bad results.
Add a new filed to annotation options structure,
"struct annotation_options" to save the disassembler used.
Use this info to determine if disassembly is done while
parsing the disasm line.
Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Link: https://lore.kernel.org/r/20250304154114.62093-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The length of PERF_RECORD_KSYMBOL for BPF is a size of JITed code so
it'd be 0 when it's not JITed. The ksymbol is needed to symbolize the
code when it gets samples in the region but non-JITed code cannot get
samples. Thus it'd be ok to ignore them.
Actually it caused a performance issue in the perf tools on old ARM
kernels where it can refuse to JIT some BPF codes. It ended up
splitting the existing kernel map (kallsyms). And later lookup for a
kernel symbol would create a new kernel map from kallsyms and then
split it again and again. :(
Probably there's a bug in the kernel map/symbol handling in perf tools.
But I think we need to fix this anyway.
Reported-by: Kevin Nomura <nomurak@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250305232838.128692-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The own_cpus map variable may be non-NULL and hold a reference, in
particular on hybrid machines. Do a put before overwriting the
variable to avoid a memory leak.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250305191931.604764-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This was detected at the end of a 'perf record' session when build-id
collection was enabled and thus the BPF programs put in place while the
session was running, some even put in place by perf itself were
processed and inserted, with some overlaps related to BPF trampolines
and programs took place.
Using maps__fixup_overlap_and_insert() instead of maps__insert() "fixes"
the problem, in the sense that overlaps will be dealt with and then the
consistency will be kept, but it would be interesting to fully
understand why such overlaps take place and how to deal with them when
doing symbol resolution.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Suggested-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com
Link: https://lore.kernel.org/r/20250228211734.33781-7-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Since in this case __maps__insert_sorted() is not called and thus
doesn't have the opportunity to do the needed map__set_kmap_maps() calls on
the new map.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z7-May5w9VQd5QD0@x1
Link: https://lore.kernel.org/r/20250228211734.33781-6-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
We can't just replacing the map in the maps_by_address and not touching
on the maps_by_name, that would leave the refcount as 1 and thus trip
another consistency check, this one:
perf: util/maps.c:110: check_invariants:
Assertion `refcount_read(map__refcnt(map)) > 1' failed.
106 /*
107 * Maps by name maps should be in maps_by_address, so
108 * the reference count should be higher.
109 */
110 assert(refcount_read(map__refcnt(map)) > 1);
Committer notice:
Initialize the newly added 'ni' variable, that really can't be
accessed unitialized trips some gcc versions, like:
12 20.00 archlinux:base : FAIL gcc version 13.2.1 20230801 (GCC)
util/maps.c: In function ‘__maps__fixup_overlap_and_insert’:
util/maps.c:896:54: error: ‘ni’ may be used uninitialized [-Werror=maybe-uninitialized]
896 | map__put(maps_by_name[ni]);
| ^
util/maps.c:816:25: note: ‘ni’ was declared here
816 | unsigned int i, ni;
| ^~
cc1: all warnings being treated as errors
make[3]: *** [/git/perf-6.14.0-rc1/tools/build/Makefile.build:138: util] Error 2
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z79std66tPq-nqsD@google.com
Link: https://lore.kernel.org/r/20250228211734.33781-5-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
I just noticed it would add extra kernel maps after modules. I think it
should fixup end address of the kernel maps after adding all maps first.
Fixes: 876e80cf83d10585 ("perf tools: Fixup end address of modules")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z7TvZGjVix2asYWI@x1
Link: https://lore.kernel.org/lkml/Z712hzvv22Ni63f1@google.com
Link: https://lore.kernel.org/r/20250228211734.33781-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When using __maps__insert_sorted() the map kmaps field needs to be
initialized, as we need kernel maps to work with map__kmap().
Fix it by using the newly introduced map__set_kmap() method.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1
Link: https://lore.kernel.org/r/20250228211734.33781-3-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
We need to set it in other places than __maps__insert(), so that we can
have access to the 'struct maps' from a kernel 'struct map'.
When building perf with 'DEBUG=1' we can notice it failing a consistency
check done in the check_invariants() function:
root@number:~# perf record -- perf test -w offcpu
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.040 MB perf.data (23 samples) ]
perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
Aborted (core dumped)
root@number:~#
The investigation on that was happening bisected to 876e80cf83d10585
("perf tools: Fixup end address of modules"), and the following patches
will plug the problems found, this patch is just legwork on that
direction.
Use the map__set_kmap_maps() name as per a review comment from Ian
Rogers, later there are further suggestions from him on getting rid of
the kmaps variable, see the thread referenced in the Link below.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/lkml/Z74V0hZXrTLM6VIJ@x1
Link: https://lore.kernel.org/r/20250228211734.33781-2-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This patch was originally posted here:
https://lore.kernel.org/all/20241213215421.661139-1-thomas.falcon@intel.com/
I have rebased on top of Arnaldo's patch here:
https://lore.kernel.org/all/Z2XCi3PgstSrV0SE@x1/
The original commit message:
"
perf script output may show different fields on different core PMU's
that exist on heterogeneous platforms. For example,
perf record -e "{cpu_core/mem-loads-aux/,cpu_core/event=0xcd,\
umask=0x01,ldlat=3,name=MEM_UOPS_RETIRED.LOAD_LATENCY/}:upp"\
-c10000 -W -d -a -- sleep 1
perf script:
chromium-browse 46572 [002] 544966.882384: 10000 cpu_core/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7ffdf1391b0c 10268100142 \
|OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 5 7 0 7fad7c47425d [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3)
perf record -e cpu_atom/event=0xd0,umask=0x05,ldlat=3,\
name=MEM_UOPS_RETIRED.LOAD_LATENCY/upp -c10000 -W -d -a -- sleep 1
perf script:
gnome-control-c 534224 [023] 544951.816227: 10000 cpu_atom/MEM_UOPS_RETIRED.LOAD_LATENCY/: 7f0aaaa0aae0 [unknown] (/usr/lib64/libglib-2.0.so.0.8000.3)
Some fields, such as data_src, are not included by default.
The cause is that while one PMU may be assigned a type such as
PERF_TYPE_RAW, other core PMU's are dynamically allocated at boot time.
If this value does not match an existing PERF_TYPE_X value,
output_type(perf_event_attr.type) will return OUTPUT_TYPE_OTHER.
Instead search for a core PMU with a matching perf_event_attr type
and, if one is found, return PERF_TYPE_RAW to match output of other
core PMU's.
"
Suggested-by: Kan Liang <kan.liang@intel.com>
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250305163935.1605312-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Command 'perf bench syscall fork -l 100000' offers option -l to run for
a specified number of iterations. However this option is not always
observed. The number is silently limited to 10000 iterations as can be
seen:
Output before:
# perf bench syscall fork -l 100000
# Running 'syscall/fork' benchmark:
# Executed 10,000 fork() calls
Total time: 23.388 [sec]
2338.809800 usecs/op
427 ops/sec
#
When explicitly specified with option -l or --loops, also observe
higher number of iterations:
Output after:
# perf bench syscall fork -l 100000
# Running 'syscall/fork' benchmark:
# Executed 100,000 fork() calls
Total time: 716.982 [sec]
7169.829510 usecs/op
139 ops/sec
#
This patch fixes the issue for basic execve fork and getpgid.
Fixes: ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20250304092349.2618082-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now the workload will end after 1 second. Just run it with perf instead
of waiting for the background process.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Unlike others it has an infinite loop that make it annoying to call.
Make it finish after 1 second and handle command-line argument to change
the setting.
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It just check trace record and replay could display correct output.
It uses 'sleep' process and sees there's a clock_nanosleep syscall.
$ sudo perf test -vv replay
108: perf trace record and replay:
--- start ---
test child forked, pid 1563219
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.077 MB /tmp/temporary_file.w1ApA (242 samples) ]
0.686 (1000.068 ms): sleep/1563226 clock_nanosleep(rqtp: 0x7ffc20ffee10, rmtp: 0x7ffc20ffee50) = 0
---- end(0) ----
108: perf trace record and replay : Ok
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf trace requires root because it needs to use tracepoints and BPF.
Skip those test when it's not run as root.
Before:
$ perf test trace
15: Parse sched tracepoints fields : Skip (permissions)
80: perf ftrace tests : Skip
105: perf trace enum augmentation tests : FAILED!
106: perf trace BTF general tests : FAILED!
107: perf trace exit race : FAILED!
118: probe libc's inet_pton & backtrace it with ping : Skip
125: Check Arm CoreSight trace data recording and synthesized samples: Skip
127: Check Arm SPE trace data recording and synthesized samples : Skip
132: Check open filename arg using perf trace + vfs_getname : FAILED!
After:
$ perf test trace
15: Parse sched tracepoints fields : Skip (permissions)
80: perf ftrace tests : Skip
105: perf trace enum augmentation tests : Skip
106: perf trace BTF general tests : Skip
107: perf trace exit race : Skip
118: probe libc's inet_pton & backtrace it with ping : Skip
125: Check Arm CoreSight trace data recording and synthesized samples: Skip
127: Check Arm SPE trace data recording and synthesized samples : Skip
132: Check open filename arg using perf trace + vfs_getname : Skip
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Howard Chu <howardchu95@gmail.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf trace requires root because it needs to use [ku]probes.
Skip those test when it's not run as root.
Before:
$ perf test probe
47: Probe SDT events : Ok
104: test perf probe of function from different CU : FAILED!
115: perftool-testsuite_probe : FAILED!
117: Add vfs_getname probe to get syscall args filenames : FAILED!
118: probe libc's inet_pton & backtrace it with ping : FAILED!
119: Use vfs_getname probe to get syscall args filenames : FAILED!
After:
$ perf test probe
47: Probe SDT events : Ok
104: test perf probe of function from different CU : Skip
115: perftool-testsuite_probe : Skip
117: Add vfs_getname probe to get syscall args filenames : Skip
118: probe libc's inet_pton & backtrace it with ping : Skip
119: Use vfs_getname probe to get syscall args filenames : Skip
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250304022837.1877845-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a test case for --metric-only for std, csv, json output mode using
shadow IPC metric from instructions and cycles events. It should
produce 'insn per cycle' metric.
But currently JSON output has (none) 'GHz' as well. It looks like a bug
but I don't have enough time to debug it for now so I made it pass. :(
$ perf stat --metric-only -e instructions,cycles true
Performance counter stats for 'true':
0.56
0.002127319 seconds time elapsed
0.002077000 seconds user
0.000000000 seconds sys
$ perf stat -x, --metric-only -e instructions,cycles true
0.55,,
$ perf stat -j --metric-only -e instructions,cycles true
{"insn per cycle" : "0.53", "GHz" : "none"}
$ perf test output -v
5: Test data source output : Ok
31: Sort output of hist entries : Ok
88: perf stat CSV output linter : Ok
90: perf stat JSON output linter : Ok
92: perf stat STD output linter : Ok
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250304022837.1877845-2-namhyung@kernel.org
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When FEAT_SPE_PBT is implemented, the previous branch target address
(named as PBT) before the sampled operation, will be recorded.
This commit first introduces a 'prev_br_tgt' field in the record for
saving the PBT address in the decoder.
If the current operation is a branch instruction, by combining with PBT,
it can create a chain with two consecutive branches. As the branch
stack stores branches in descending order, meaning a newer branch is
stored in a lower entry in the stack. Arm SPE stores the latest branch
in the first entry of branch stack, and the previous branch coming from
PBT is stored into the second entry.
Otherwise, if current operation is not a branch, the last branch will be
saved for PBT only. PBT lacks associated information such as branch
source address, branch type, and events. The branch entry fills zeros
for the corresponding fields and only set its target address.
After:
perf script -f --itrace=bl -F flags,addr,brstack
jcc ffff800080187914 0xffff8000801878fc/0xffff800080187914/P/-/-/1/COND/- 0x0/0xffff8000801878f8/-/-/-/0//-
jcc ffff8000802d12d8 0xffff8000802d12f8/0xffff8000802d12d8/P/-/-/1/COND/- 0x0/0xffff8000802d12ec/-/-/-/0//-
jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//-
jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//-
jmp ffff800081410980 0xffff800081419108/0xffff800081410980/P/-/-/1//- 0x0/0xffff800081419104/-/-/-/0//-
return ffff80008036e064 0xffff80008141ba84/0xffff80008036e064/P/-/-/1/RET/- 0x0/0xffff80008141ba60/-/-/-/0//-
jcc ffff8000803d54f0 0xffff8000803d54e8/0xffff8000803d54f0/P/-/-/1/COND/- 0x0/0xffff8000803d54e0/-/-/-/0//-
jmp ffff80008015e468 0xffff8000803d46dc/0xffff80008015e468/P/-/-/1//- 0x0/0xffff8000803d46c8/-/-/-/0//-
jmp ffff8000806e2d50 0xffff80008040f710/0xffff8000806e2d50/P/-/-/1//- 0x0/0xffff80008040f6e8/-/-/-/0//-
jcc ffff800080721704 0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//-
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-13-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Although Arm SPE cannot generate continuous branch records, this commit
creates a branch stack with only one branch entry. A single branch info
can be used for performance optimization.
A branch stack structure is dynamically allocated in the decode queue.
The branch stack and stack flags are synthesized based on branch types
and associated events.
After:
# perf script --itrace=bl1 -F flags,addr,brstack
jcc ffffc0fad9c6b214 0xffffc0fad9c6b234/0xffffc0fad9c6b214/P/-/-/7/COND/-
jcc/miss,not_taken/ ffffc0fadaaebb30 0xffffc0fadaaebb2c/0xffffc0fadaaebb30/MN/-/-/7/COND/-
jmp ffffc0fadaaea358 0xffffc0fadaaea5ec/0xffffc0fadaaea358/P/-/-/5//-
jcc/not_taken/ ffffc0fadaae6494 0xffffc0fadaae6490/0xffffc0fadaae6494/PN/-/-/11/COND/-
jcc/not_taken/ ffff7f83ab54 0xffff7f83ab50/0xffff7f83ab54/PN/-/-/13/COND/-
jcc/not_taken/ ffff7f83ab08 0xffff7f83ab04/0xffff7f83ab08/PN/-/-/8/COND/-
jcc ffff7f83aa80 0xffff7f83aa58/0xffff7f83aa80/P/-/-/10/COND/-
jcc ffff7f9a45d0 0xffff7f9a43f0/0xffff7f9a45d0/P/-/-/29/COND/-
jcc/not_taken/ ffffc0fad9ba6db4 0xffffc0fad9ba6db0/0xffffc0fad9ba6db4/PN/-/-/44/COND/-
jcc ffffc0fadaac2964 0xffffc0fadaac2970/0xffffc0fadaac2964/P/-/-/6/COND/-
jcc ffffc0fad99ddc10 0xffffc0fad99ddc04/0xffffc0fad99ddc10/P/-/-/72/COND/-
jcc/not_taken/ ffffc0fad9b3f21c 0xffffc0fad9b3f218/0xffffc0fad9b3f21c/PN/-/-/64/COND/-
jcc ffffc0fad9c3b604 0xffffc0fad9c3b5f8/0xffffc0fad9c3b604/P/-/-/13/COND/-
jcc ffffc0fadaad6048 0xffffc0fadaad5f8c/0xffffc0fadaad6048/P/-/-/5/COND/-
return/miss/ ffff7f84e614 0xffffc0fad98a2274/0xffff7f84e614/M/-/-/13/RET/-
jcc/not_taken/ ffffc0fadaac4eb4 0xffffc0fadaac4eb0/0xffffc0fadaac4eb4/PN/-/-/5/COND/-
jmp ffff7f8e3130 0xffff7f87555c/0xffff7f8e3130/P/-/-/5//-
jcc/not_taken/ ffffc0fad9b3d9b0 0xffffc0fad9b3d9ac/0xffffc0fad9b3d9b0/PN/-/-/14/COND/-
return ffffc0fad9b91950 0xffffc0fad98c3e28/0xffffc0fad9b91950/P/-/-/12/RET/-
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-12-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Based on the supplement information in the record, this commit sets the
sample flags for conditional branch, function call, return. It also
sets events in flags, such as mispredict, not taken, and in transaction.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-11-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The new added branch operations and events are filled into record, the
information will be consumed when synthesizing samples.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-10-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The bit[16] in an event payload indicates an operation is in
transactional state. Decode the bit.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-9-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In Arm ARM (ARM DDI 0487, L.a), the section "D18.2.7 Operation Type
packet", the branch subclass is extended for Call Return (CR), Guarded
control stack data access (GCS).
This commit adds support CR and GCS operations. The IND (indirect)
operation is defined only in bit [1], its macro is updated accordingly.
Move the COND (Conditional) macro into the same group with other
operations for better maintenance.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-8-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The ARM_SPE_OP_LD and ARM_SPE_OP_ST operations are secondary operation
type, they are overlapping with other second level's operation types
belonging to SVE and branch operations. As a result, a non load-store
operation can be parsed for data source and memory sample.
To fix the issue, this commit introduces a is_ldst_op() macro for
checking LDST operation, and apply the checking when synthesize data
source and memory samples.
Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250304111240.3378214-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The branch stack has an existed field for printing mispredict, extend
the field for printing events and add support not-taken event.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Some hardware (e.g., Arm SPE) can trace the not taken event for
branches. Add a flag for this event and support printing it.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Branch types and events are two different things. A branch type can be
a conditional branch, an indirect branch, a procedure call, a return, or
an exception taken, etc. The extra event information is provided for
what happens during a branch, e.g. if a branch is mispredicted or not
taken (specific to conditional branches).
To deliver information about branches, this commit separates events from
branch types. It parses branch types first, then appends event strings
embraced by the '/' character. If multiple events occur, the events is
separated with a comma (,).
Also add a minor improvement by adding char 'm' in char array for branch
mispredict event.
Below are extracted sample flags.
Before:
branch: br miss
instructions: br miss
After:
branch: jmp/miss/
instructions: jmp/miss/
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When generating a string for sample flags, the sample_flags_to_name()
function lacks the ability to parse the trace start bit or trace end bit.
Therefore, the function is invoked multiple times after clearing its
unsupported bits.
This commit improves the sample_flags_to_name() function to parse sample
flags in one go for three kinds of information:
- The prefix info for trace start, trace end, etc.
- Branch types.
- Extra info for transaction and interrupt related info.
As a result, the code is simplified to call the sample_flags_to_name()
only once. No expectation for any changes in the perf script output.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a check for the generated string of flags. Print out the raw number
if the string generation fails.
Use the SAMPLE_FLAGS_STR_ALIGNED_SIZE macro to replace the value '21'.
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20250304111240.3378214-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Legacy hybrid events have attr.type == PERF_TYPE_HARDWARE, so they look
like plain legacy events if we only look at attr.type. But legacy events
should still be uniquified if they were opened on a non-legacy PMU. Fix
it by checking if the evsel is hybrid and forcing needs_uniquify
before looking at the attr.type.
This restores PMU names on hybrid systems and also changes "perf stat
metrics (shadow stat) test" from a FAIL back to a SKIP (on hybrid). The
test was gated on "cycles" appearing alone which doesn't happen on
here.
Before:
$ perf stat -- true
...
<not counted> instructions:u (0.00%)
162,536 instructions:u # 0.58 insn per cycle
...
After:
$ perf stat -- true
...
<not counted> cpu_atom/instructions/u (0.00%)
162,541 cpu_core/instructions/u # 0.62 insn per cycle
...
Fixes: 357b965deba9 ("perf stat: Changes to event name uniquification")
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20250226145526.632380-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The BPF sideband information is tracked using a separate thread and
evlist. But it's only useful for profiling kernel and we can skip it
when users profile their application only.
It seems it already fails to open the sideband event in that case.
Let's remove the noise in the verbose output anyway.
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250226203039.1099131-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
There are spelling mistakes in TEST_ASSERT_VAL messages. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20250228090941.680226-1-colin.i.king@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Building perf in-tree is broken after commit 890a1961c812 ("perf tools:
Create source symlink in perf object dir") which added a 'source' symlink
in the output dir pointing to the source dir.
With in-tree builds, the added 'SOURCE = ...' line is executed multiple
times (I observed 2 during the build plus 2 during installation). This is a
minor inefficiency, in theory not harmful because symlink creation is
assumed to be idempotent. But it is not.
Considering with in-tree builds:
srctree=/absolute/path/to/linux
OUTPUT=/absolute/path/to/linux/tools/perf
here's what happens:
1. ln -sf $(srctree)/tools/perf $(OUTPUT)/source
-> creates /absolute/path/to/linux/tools/perf/source
link to /absolute/path/to/linux/tools/perf
=> OK, that's what was intended
2. ln -sf $(srctree)/tools/perf $(OUTPUT)/source # same command as 1
-> creates /absolute/path/to/linux/tools/perf/perf
link to /absolute/path/to/linux/tools/perf
=> Not what was intended, not idempotent
3. Now the build _should_ create the 'perf' executable, but it fails
The reason is the tricky 'ln' command line. At the first invocation 'ln'
uses the 1st form:
ln [OPTION]... [-T] TARGET LINK_NAME
and creates a link to TARGET *called LINK_NAME*.
At the second invocation $(OUTPUT)/source exists, so 'ln' uses the 3rd
form:
ln [OPTION]... TARGET... DIRECTORY
and creates a link to TARGET *called TARGET* inside DIRECTORY.
Fix by adding -n/--no-dereference to "treat LINK_NAME as a normal file
if it is a symbolic link to a directory", as the manpage says.
Closes: https://lore.kernel.org/all/20241125182506.38af9907@booty/
Fixes: 890a1961c812 ("perf tools: Create source symlink in perf object dir")
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250124-perf-fix-intree-build-v1-1-485dd7a855e4@bootlin.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When users set the parameter '-F' to specify frequency for Arm SPE, the
tool reports error:
perf record -F 1000 -e arm_spe_0// -- sleep 1
Error:
Invalid event (arm_spe_0//) in per-thread mode, enable system wide with '-a'.
The output logs are confused and it does not give the correct reminding.
Arm SPE does not support frequency setting given it adopts a statistical
based approach.
Alternatively, Arm SPE supports setting period. This commit adds a
for frequency setting. It reports error and reminds users to set period
instead.
After:
perf record -F 1000 -e arm_spe_0// -- sleep 1
Arm SPE: Frequency is not supported. Set period with -c option or PMU parameter (-e arm_spe_0/period=NUM/).
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250227085544.2154136-1-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This patch parses `owner_lock_stat` into a RB tree, enabling ordered
reporting of owner lock statistics with stack traces. It also updates
the documentation for the `-o` option in contention mode, decouples `-o`
from `-t`, and issues a warning to inform users about the new behavior
of `-ov`.
Example output:
$ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe
...
contended total wait max wait avg wait type caller
171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57
0xffffffffac6318e7 pipe_read+0x57
0xffffffffac623862 vfs_read+0x332
0xffffffffac62434b ksys_read+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50
0xffffffffac631ee0 pipe_write+0x50
0xffffffffac6241db vfs_write+0x3bb
0xffffffffac6244ab ksys_write+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d
0xffffffffac691f0d do_epoll_wait+0x24d
0xffffffffac69249b do_epoll_pwait.part.0+0xb
0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
=== owner stack trace ===
3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348
0xffffffffac631bd8 pipe_read+0x348
0xffffffffac623862 vfs_read+0x332
0xffffffffac62434b ksys_read+0xbb
0xfffffffface604b2 do_syscall_64+0x82
0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76
...
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The rb_tree helper functions can be reused for parsing `owner_lock_stat`
into rb tree for sorting.
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-4-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This implements per-callstack aggregation of lock owners in addition to
per-thread. The owner callstack is captured using `bpf_get_task_stack()`
at `contention_begin()` and it also adds a custom stackid function for the
owner stacks to be compared easily.
The owner info is kept in a hash map using lock addr as a key to handle
multiple waiters for the same lock. At `contention_end()`, it updates the
owner lock stat based on the info that was saved at `contention_begin()`.
If there are more waiters, it'd update the owner pid to itself as
`contention_end()` means it gets the lock now. But it also needs to check
the return value of the lock function in case task was killed by a signal
or something.
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-3-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a struct and few bpf maps in order to tracing owner stack.
`struct owner_tracing_data`: Contains owner's pid, stack id, timestamp for
when the owner acquires lock, and the count of lock waiters.
`stack_buf`: Percpu buffer for retrieving owner stacktrace.
`owner_stacks`: For tracing owner stacktrace to customized owner stack id.
`owner_data`: For tracing lock_address to `struct owner_tracing_data` in
bpf program.
`owner_stat`: For reporting owner stacktrace in usermode.
Signed-off-by: Chun-Tse Shao <ctshao@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250227003359.732948-2-ctshao@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Fewer than 32k logical CPUs are currently supported by perf. A cpumap
is indexed by an integer (see perf_cpu_map__cpu) yielding a perf_cpu
that wraps a 4-byte int for the logical CPU - the wrapping is done
deliberately to avoid confusing a logical CPU with an index into a
cpumap. Using a 4-byte int within the perf_cpu is larger than required
so this patch reduces it to the 2-byte int16_t. For a cpumap
containing 16 entries this will reduce the array size from 64 to 32
bytes. For very large servers with lots of logical CPUs the size
savings will be greater.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250210191231.156294-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Perf trace on perf.data fails as below:
./perf trace record -- sleep 1
./perf trace -i perf.data
perf: Segmentation fault
Segmentation fault (core dumped)
Backtrace pointed to :
?? ()
perf_session.process_user_event ()
reader.read_event ()
perf_session.process_events ()
cmd_trace ()
run_builtin ()
handle_internal_command ()
main ()
Further debug pointed that, segmentation fault happens when
trying to access id_index. Code snippet:
case PERF_RECORD_ID_INDEX:
err = tool->id_index(session, event);
Since 'commit 15d4a6f41d72 ("perf tool: Remove
perf_tool__fill_defaults()")', perf_tool__fill_defaults is
removed. All tools are initialized using perf_tool__init()
prior to use. But in builtin-trace, perf_tool__init is not
used and hence the defaults are not initialized. Use
perf_tool__init() in perf trace to handle the initialization.
Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20250225113157.28836-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
-v disables deduplication of similarly suffixed PMUs so add it to the
help and doc strings.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-4-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
After pmu_add_cpu_aliases() is called, perf_pmu__num_events() returns an
incorrect value that double counts common events and doesn't match the
actual count of events in the alias list. This is because after
'cpu_aliases_added == true', the number of events returned is
'sysfs_aliases + cpu_json_aliases'. But when adding 'case
EVENT_SRC_SYSFS' events, 'sysfs_aliases' and 'cpu_json_aliases' are both
incremented together, failing to account that these ones overlap and
only add a single item to the list. Fix it by adding another counter for
overlapping events which doesn't influence 'cpu_json_aliases'.
There doesn't seem to be a current issue because it's used in perf list
before pmu_add_cpu_aliases() so the correct value is returned. Other
uses in tests may also miss it for other reasons like only looking at
uncore events. However it's marked as a fixes commit in case any new fix
with new uses of perf_pmu__num_events() is backported.
Fixes: d9c5f5f94c2d ("perf pmu: Count sys and cpuid JSON events separately")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-3-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf_pmus__destroy() treats all PMUs as allocated and free's them so we
can't have any static PMUs that are added to the PMU lists. Fix it by
allocating the tool PMU in the same way as the others. Current users of
the tool PMU already use find_pmu() and not perf_pmus__tool_pmu(), so
rename the function to add 'new' to avoid it being misused in the
future.
perf_pmus__fake_pmu() can remain as static as it's not added to the
PMU lists.
Fixes the following error:
$ perf bench internals pmu-scan
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
munmap_chunk(): invalid pointer
Aborted (core dumped)
Fixes: 240505b2d0ad ("perf tool_pmu: Factor tool events into their own PMU")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250226104111.564443-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Perf probe on vfs_fstatat fails as below on a powerpc system
$ ./perf probe -nf --max-probes=512 -a 'vfs_fstatat $params'
Segmentation fault (core dumped)
This is observed while running perftool-testsuite_probe testcase.
While running with verbose, its observed that segfault happens
at:
synthesize_probe_trace_arg ()
synthesize_probe_trace_command ()
probe_file.add_event ()
apply_perf_probe_events ()
__cmd_probe ()
cmd_probe ()
run_builtin ()
handle_internal_command ()
main ()
Code in synthesize_probe_trace_arg() access a null value and results in
segfault. Data structure which is null:
struct probe_trace_arg arg->value
We are hitting a case where arg->value is null in probe point:
"vfs_fstatat $params". This is happening since 'commit e896474fe485
("getname_maybe_null() - the third variant of pathname copy-in")'
Before the commit, probe point for vfs_fstatat was getting added only
for one location:
Writing event: p:probe/vfs_fstatat _text+6345404 dfd=%gpr3:s32 filename=%gpr4:x64 stat=%gpr5:x64 flags=%gpr6:s32
With this change, vfs_fstatat code is inlined for other locations in the
code:
Probe point found: __do_sys_lstat64+48
Probe point found: __do_sys_stat64+48
Probe point found: __do_sys_newlstat+48
Probe point found: __do_sys_newstat+48
Probe point found: vfs_fstatat+0
When trying to find matching dwarf information entry (DIE)
from the debuginfo, the code incorrectly picks DIE which is
not referring to vfs_fstatat. Snippet from dwarf entry in vmlinux
debuginfo file.
The main abstract die is:
<1><4214883>: Abbrev Number: 147 (DW_TAG_subprogram)
<4214885> DW_AT_external : 1
<4214885> DW_AT_name : (indirect string, offset: 0x17b9f3): vfs_fstatat
With formal parameters:
<2><4214896>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<4214897> DW_AT_name : dfd
<2><42148a3>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148a4> DW_AT_name : (indirect string, offset: 0x8fda9): filename
<2><42148b0>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148b1> DW_AT_name : (indirect string, offset: 0x16bd9c): stat
<2><42148bd>: Abbrev Number: 23 (DW_TAG_formal_parameter)
<42148be> DW_AT_name : (indirect string, offset: 0x39832b): flags
While collecting variables/parameters for a probe point, the function
copy_variables_cb() also looks at dwarf debug entries based on the
instruction address. Snippet
if (dwarf_haspc(die_mem, vf->pf->addr))
return DIE_FIND_CB_CONTINUE;
else
return DIE_FIND_CB_SIBLING;
But incase of inlined function instance for vfs_fstatat, there are two
entries which has the instruction address entry point as same.
Instance 1: which is for vfs_fstatat and DW_AT_abstract_origin points to
0x4214883 (reference above for main abstract die)
<3><42131fa>: Abbrev Number: 59 (DW_TAG_inlined_subroutine)
<42131fb> DW_AT_abstract_origin: <0x4214883>
<42131ff> DW_AT_entry_pc : 0xc00000000062b1e0
Instance 2: which is not for vfs_fstatat but for getname
<5><4213270>: Abbrev Number: 39 (DW_TAG_inlined_subroutine)
<4213271> DW_AT_abstract_origin: <0x4215b6b>
<4213275> DW_AT_entry_pc : 0xc00000000062b1e0
But the copy_variables_cb() continues to add parameters from second
instance also based on the dwarf_haspc() check. This results in
formal parameters for getname also appended to params. But while
filling in the args->value for these parameters, since these args
are not part of dwarf with offset "42131fa". Hence value will be
null. This incorrect args results in segfault when value field is
accessed.
Save the dwarf dieoffset of the actual DW_TAG_subprogram as part of
"struct probe_finder". In copy_variables_cb(), include check to make
sure the DW_AT_abstract_origin points to the correct entry if the
dwarf_haspc() matches the instruction address.
Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250225123042.37263-1-atrajeev@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Especially while using several buckets, it isn't uncommon to have some
of them empty and reading the histogram may be a bit more complex:
# perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency 200
# DURATION | COUNT | GRAPH |
0 - 5 us | 14816 | ###################################### |
5 - 10 us | 1228 | ### |
10 - 15 us | 438 | # |
15 - 20 us | 106 | |
20 - 25 us | 21 | |
25 - 30 us | 11 | |
30 - 35 us | 1 | |
35 - 40 us | 2 | |
40 - 45 us | 4 | |
45 - 50 us | 0 | |
50 - 55 us | 1 | |
55 - 60 us | 0 | |
60 - 65 us | 1 | |
65 - 70 us | 1 | |
70 - 75 us | 1 | |
75 - 80 us | 2 | |
80 - 85 us | 0 | |
85 - 90 us | 1 | |
90 - 95 us | 0 | |
95 - 100 us | 1 | |
100 - 105 us | 0 | |
105 - 110 us | 0 | |
110 - 115 us | 0 | |
115 - 120 us | 0 | |
120 - 125 us | 1 | |
125 - 130 us | 0 | |
130 - 135 us | 0 | |
135 - 140 us | 1 | |
140 - 145 us | 0 | |
145 - 150 us | 0 | |
150 - 155 us | 0 | |
155 - 160 us | 0 | |
160 - 165 us | 0 | |
165 - 170 us | 0 | |
170 - 175 us | 0 | |
175 - 180 us | 0 | |
180 - 185 us | 0 | |
185 - 190 us | 0 | |
190 - 195 us | 0 | |
195 - 200 us | 0 | |
200 - ... us | 2 | |
Allow the optional flag --hide-empty to remove buckets with no element
and produce a more compact graph. This feature could be misleading since
there is no clear indication for missing buckets, for this reason it's
disabled by default.
# perf ftrace latency -a -T mutex_lock --bucket-range 5 --max-latency --hide-empty 200
# DURATION | COUNT | GRAPH |
0 - 5 us | 14816 | ###################################### |
5 - 10 us | 1228 | ### |
10 - 15 us | 438 | # |
15 - 20 us | 106 | |
20 - 25 us | 21 | |
25 - 30 us | 11 | |
30 - 35 us | 1 | |
35 - 40 us | 2 | |
40 - 45 us | 4 | |
50 - 55 us | 1 | |
60 - 65 us | 1 | |
65 - 70 us | 1 | |
70 - 75 us | 1 | |
75 - 80 us | 2 | |
85 - 90 us | 1 | |
95 - 100 us | 1 | |
120 - 125 us | 1 | |
135 - 140 us | 1 | |
200 - ... us | 2 | |
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-2-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The max-latency value can make the histogram smaller, but not larger, we
have a maximum of 22 buckets and specifying a max-latency that would
require more buckets has no effect.
Dynamically allocate the buckets and compute the bucket number from the
max latency as (max-min) / range + 2
If the maximum is not specified, we still set the bucket number to 22
and compute the maximum accordingly.
Fail if the maximum is smaller than min+range, this way we make sure we
always have 3 buckets: those below min, those above max and one in the
middle.
Since max-latency is not available in log2 mode, always use 22 buckets.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250207080446.77630-1-gmonaco@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Sometimes compiler generates code to use the stack pointer register
without frame pointer. As we know RSP is the stack register on x86,
let's treat it as same as fbreg. But the offset would be opposite
direction so update the debug message accordingly.
Reported-by: Blake Jones <blakejones@google.com>
Link: https://lore.kernel.org/r/20250126210242.1181225-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix tools/ quiet build Makefile infrastructure that was broken when
working on tools/perf/ without testing on other tools/ living
utilities.
* tag 'perf-tools-fixes-for-v6.14-2-2025-02-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
tools: Remove redundant quiet setup
tools: Unify top-level quiet infrastructure
|