Age | Commit message | Author |
|
For ELF file dsos read the e_machine from the ELF header. For kernel
types assume the e_machine matches the perf tool. In other cases
return EM_NONE.
When reading from the ELF header use DSO__SWAP that may need
dso->needs_swap initializing. Factor out dso__swap_init to allow this.
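As a rough illustration of the ELF case only (not the perf code), the
sketch below reads e_machine from an in-memory ELF header and swaps it
when the file's byte order differs from the host's. All sketch_* names
and SKETCH_* constants are hypothetical; the offsets follow the ELF
header layout, and the swap decision stands in for what DSO__SWAP and
dso->needs_swap do in perf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF layout facts, spelled out so the sketch does not depend on <elf.h>. */
#define SKETCH_EI_DATA		5	/* e_ident[] index of the byte-order flag */
#define SKETCH_ELFDATA2LSB	1	/* file is little-endian */
#define SKETCH_ELFDATA2MSB	2	/* file is big-endian */
#define SKETCH_E_MACHINE_OFF	18	/* 16 bytes of e_ident + 2 bytes of e_type */
#define SKETCH_EM_NONE		0

/* Hypothetical helper: true when the host stores the low byte first. */
static bool sketch_host_is_le(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, sizeof(first));
	return first == 1;
}

/* Hypothetical helper: e_machine of an in-memory ELF header, or EM_NONE. */
static uint16_t sketch_read_e_machine(const uint8_t *hdr, size_t len)
{
	uint16_t e_machine;
	uint8_t data;

	if (len < SKETCH_E_MACHINE_OFF + sizeof(e_machine) ||
	    memcmp(hdr, "\177ELF", 4) != 0)
		return SKETCH_EM_NONE;

	data = hdr[SKETCH_EI_DATA];
	if (data != SKETCH_ELFDATA2LSB && data != SKETCH_ELFDATA2MSB)
		return SKETCH_EM_NONE;

	memcpy(&e_machine, hdr + SKETCH_E_MACHINE_OFF, sizeof(e_machine));

	/* Swap only when the file's byte order differs from the host's. */
	if ((data == SKETCH_ELFDATA2LSB) != sketch_host_is_le())
		e_machine = (uint16_t)((e_machine >> 8) | (e_machine << 8));

	return e_machine;
}
```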
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20250319050741.269828-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The syscalltbl held entries of system call name and number pairs,
generated from a native syscalltbl at start up. As there are gaps in
the system call number there is a notion of index into the
table. Going forward we want the system call table to be identifiable
by a machine type, for example, i386 vs x86-64. Change the interface
to the syscalltbl so (1) a (currently unused) machine type of EM_HOST
is passed (2) the index to syscall number and system call name mapping
is computed at build time.
Two tables are used for this, an array of system call number to name,
an array of system call numbers sorted by the system call name. The
sorted array doesn't store strings in part to save memory and
relocations. The index notion is carried forward and is an index into
the sorted array of system call numbers, the data structures are
opaque (held only in syscalltbl.c), and so the number of indices for a
machine type is exposed as a new API.
The arrays are computed in the syscalltbl.sh script and so no start-up
time computation and storage is necessary.
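The two-table layout can be sketched roughly as below. This is a
hand-written miniature with made-up contents, not the output of
syscalltbl.sh; every sketch_* name is hypothetical. It shows a
number-to-name array with a gap, an array of numbers sorted by name,
and a name lookup that binary searches the sorted array without storing
any strings in it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical number -> name table; 4 is deliberately left as a gap. */
static const char *const sketch_num_to_name[] = {
	[0] = "read",
	[1] = "write",
	[2] = "open",
	[3] = "close",
	[5] = "fstat",
};
#define SKETCH_MAX_ID 5

/* Syscall numbers ordered by name: close, fstat, open, read, write. */
static const unsigned short sketch_sorted_ids[] = { 3, 5, 2, 0, 1 };
#define SKETCH_NUM_IDX \
	((int)(sizeof(sketch_sorted_ids) / sizeof(sketch_sorted_ids[0])))

/* Name for a syscall number, or NULL for a gap or out-of-range number. */
static const char *sketch_name(int id)
{
	if (id < 0 || id > SKETCH_MAX_ID)
		return NULL;
	return sketch_num_to_name[id];
}

/* bsearch() comparator: key is a name, element is a syscall number. */
static int sketch_cmp_name_to_id(const void *key, const void *elem)
{
	const unsigned short *id = elem;

	return strcmp(key, sketch_num_to_name[*id]);
}

/* Syscall number for a name, or -1 when the name is unknown. */
static int sketch_id(const char *name)
{
	const unsigned short *id = bsearch(name, sketch_sorted_ids,
					   SKETCH_NUM_IDX,
					   sizeof(sketch_sorted_ids[0]),
					   sketch_cmp_name_to_id);

	return id ? *id : -1;
}

/* Number of valid indices, i.e. the size of the sorted array. */
static int sketch_num_idx(void)
{
	return SKETCH_NUM_IDX;
}

/* Syscall number stored at an index of the sorted array, or -1. */
static int sketch_id_at_idx(int idx)
{
	if (idx < 0 || idx >= SKETCH_NUM_IDX)
		return -1;
	return sketch_sorted_ids[idx];
}
```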
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20250319050741.269828-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Identify struct syscall information in the syscalls table by a machine
type and syscall number, not just system call number. Having the
machine type means that 32-bit system calls can be differentiated from
64-bit ones on a machine capable of both. Having a table for all
machine types and all system call numbers would be too large, so
maintain a sorted array of system calls as they are encountered.
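A minimal sketch of such a lazily grown, sorted array follows. The
sketch_* types and functions are hypothetical and much simpler than
struct syscall; the point is only the (machine type, syscall number)
key and the find-or-add pattern.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry; a real one would also cache names, formats, etc. */
struct sketch_syscall {
	int e_machine;
	int id;
};

struct sketch_table {
	struct sketch_syscall *entries;	/* kept sorted by (e_machine, id) */
	size_t nr;
};

/* Order by machine type first, then by syscall number. */
static int sketch_syscall_cmp(const void *va, const void *vb)
{
	const struct sketch_syscall *a = va, *b = vb;

	if (a->e_machine != b->e_machine)
		return a->e_machine < b->e_machine ? -1 : 1;
	if (a->id != b->id)
		return a->id < b->id ? -1 : 1;
	return 0;
}

static struct sketch_syscall *sketch_table__find(struct sketch_table *t,
						 int e_machine, int id)
{
	struct sketch_syscall key = { .e_machine = e_machine, .id = id };

	if (!t->nr)
		return NULL;
	return bsearch(&key, t->entries, t->nr, sizeof(key),
		       sketch_syscall_cmp);
}

/*
 * Look the pair up and add it on first sight. The returned pointer is
 * only valid until the next insertion, as the array may be reallocated.
 */
static struct sketch_syscall *sketch_table__findnew(struct sketch_table *t,
						    int e_machine, int id)
{
	struct sketch_syscall *sc = sketch_table__find(t, e_machine, id);
	struct sketch_syscall *tmp;

	if (sc)
		return sc;

	tmp = realloc(t->entries, (t->nr + 1) * sizeof(*tmp));
	if (!tmp)
		return NULL;

	t->entries = tmp;
	tmp[t->nr].e_machine = e_machine;
	tmp[t->nr].id = id;
	t->nr++;

	/* Re-sort so that later lookups can keep using bsearch(). */
	qsort(t->entries, t->nr, sizeof(*tmp), sketch_syscall_cmp);
	return sketch_table__find(t, e_machine, id);
}
```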
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20250319050741.269828-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The definition of "static const char *const syscalltbl[] = {" is done
in a generated syscalls_32.h or syscalls_64.h that is architecture
dependent. In order to include the appropriate file a syscall_table.h
is found via the perf include path and it includes the syscalls_32.h
or syscalls_64.h as appropriate.
To support having multiple syscall tables, one for 32-bit and one for
64-bit, or for different architectures, an include path cannot be
used. Remove syscall_table.h because of this and inline what it does
into syscalltbl.c.
For architectures without a syscall_table.h this will cause a failure
to include either syscalls_32.h or syscalls_64.h rather than a failure
to include syscall_table.h. For architectures that only included one
or other, the behavior matches BITS_PER_LONG as previously done on
architectures supporting both syscalls_32.h and syscalls_64.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20250319050741.269828-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
There are many and non-obvious meanings to the dso_binary_type enum
values. Add kernel-doc to speed up interpreting their meanings.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250319050741.269828-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The variables elf_base_addr, debug_frame_offset, eh_frame_hdr_addr and
eh_frame_hdr_offset are only accessed in unwind-libunwind-local.c
which is conditionally built on having libunwind support. Make the
variables conditional on libunwind support too.
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: https://lore.kernel.org/r/20250319050741.269828-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
I've realized that it doesn't make sense to accumulate the samples to
the parent in the callchain when data type profiling is enabled, because
the parent won't have the same data type access. Otherwise it'd show
something like this:
$ perf report -s type --stdio -g none
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2K of event 'cycles:Pu'
# Event count (approx.): 8266456478
#
# Children Latency Self Latency Data Type
# ........ ....... ........ ........ .........
#
698.97% 697.72% 99.80% 99.61% (unknown)
0.09% 0.18% 0.09% 0.18% Elf64_Rela
0.05% 0.10% 0.05% 0.10% unsigned char
0.05% 0.10% 0.05% 0.10% struct exit_function_list
0.00% 0.01% 0.00% 0.01% struct rtld_global
Link: https://lore.kernel.org/r/20250307080829.354947-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It was prohibited because the output fields in the children mode were
not handled properly with hierarchy. But now that the output fields can
be placed at the same level, the two modes can be allowed together.
For example, latency mode adds more output fields by default and now
they are displayed properly.
$ perf record --latency -g -- perf test -w thloop
$ perf report -H --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 2K of event 'cycles:Pu'
# Event count (approx.): 8266456478
#
# Children Latency Overhead Latency Command / Shared Object / Symbol
# ........................................... ........................................................
#
0.08% 0.16% 100.00% 100.00% perf
0.08% 0.16% 0.24% 0.47% ld-linux-x86-64.so.2
0.12% 0.24% 0.12% 0.24% [.] _dl_relocate_object
0.08% 0.16% 0.08% 0.16% [.] _dl_lookup_symbol_x
0.03% 0.06% 0.03% 0.06% [.] strcmp
0.00% 0.01% 0.00% 0.01% [.] _dl_start
0.00% 0.00% 0.00% 0.00% [.] _dl_start_user
0.00% 0.00% 0.00% 0.00% [.] _dl_sysdep_start
0.00% 0.00% 0.00% 0.00% [.] _start
0.00% 0.00% 0.00% 0.00% [.] dl_main
0.03% 0.06% 0.03% 0.06% libLLVM-16.so.1
0.03% 0.06% 0.03% 0.06% [.] llvm::StringMapImpl::RehashTable(unsigned int)
0.00% 0.00% 0.00% 0.00% [.] 0x00007f137ccd18e8
0.00% 0.00% 99.66% 99.31% perf
99.66% 99.31% 99.66% 99.31% [.] test_loop
|
|--49.86%--0x7f137b633d68
| 0x55dbdbbb7d2c
...
Link: https://lore.kernel.org/r/20250307080829.354947-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This is useful for hierarchy output mode where the first level is
considered as output fields. We want them in the same level so that it
can show only the remaining groups in the hierarchy.
Before:
$ perf report -s overhead,sample,period,comm,dso -H --stdio
...
# Overhead Samples / Period / Command / Shared Object
# ................. ..........................................
#
100.00% 4035
100.00% 3835883066
100.00% perf
99.37% perf
0.50% ld-linux-x86-64.so.2
0.06% [unknown]
0.04% libc.so.6
0.02% libLLVM-16.so.1
After:
$ perf report -s overhead,sample,period,comm,dso -H --stdio
...
# Overhead Samples Period Command / Shared Object
# ....................................... .......................
#
100.00% 4035 3835883066 perf
99.37% 4005 3811826223 perf
0.50% 19 19210014 ld-linux-x86-64.so.2
0.06% 8 2367089 [unknown]
0.04% 2 1720336 libc.so.6
0.02% 1 759404 libLLVM-16.so.1
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250307080829.354947-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
On linux-next
commit 72c6f57a4193 ("perf pmu: Dynamically allocate tool PMU")
allocated the PMU named "tool" dynamically. However, that allocation
can fail and a NULL pointer is returned. That case is currently
not handled and would result in an invalid address reference.
Add a check for NULL pointer.
Fixes: 72c6f57a4193 ("perf pmu: Dynamically allocate tool PMU")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250319122820.2898333-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
zfree() requires an address; otherwise it frees what's in name, rather
than name itself. Pass the address of name to fix it.
This was the only incorrect occurrence in Perf found using a search.
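For illustration, a simplified stand-in for zfree() is sketched below;
sketch_zfree() is hypothetical. As far as I understand, perf's macro
casts its argument to void **, so passing name instead of &name still
compiles, whereas the typed parameter here would let the compiler
reject it.

```c
#include <stdlib.h>

/*
 * Hypothetical, typed stand-in for zfree(): free what the pointer
 * variable refers to and reset the variable itself to NULL.
 */
static void sketch_zfree(char **ptr)
{
	free(*ptr);
	*ptr = NULL;
}
```

A caller holding "char *name" would use sketch_zfree(&name); after the
call name is NULL, so a second call is harmless.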
Fixes: 8db5cabcf1b6 ("perf stat: Fork and launch 'perf record' when 'perf stat' needs to get retire latency value for a metric.")
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250319101614.190922-1-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Thomas Richter <tmricht@linux.ibm.com> reported a double put on the
cpumap for the placeholder core PMU:
https://lore.kernel.org/lkml/20250318095132.1502654-3-tmricht@linux.ibm.com/
Requiring the caller to get the cpumap is not how these things are
usually done, so switch cpu_map__online to do the get and then fix up any
use cases where a put is needed.
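The ownership convention can be sketched as follows, assuming the
online map is a cached object handed out to several callers. The
sketch_* names are hypothetical and the plain int refcount is a
simplification of what perf uses.

```c
#include <stdlib.h>

/* Hypothetical, simplified reference-counted map. */
struct sketch_cpu_map {
	int refcnt;
	int nr;
};

static struct sketch_cpu_map *sketch_map__get(struct sketch_cpu_map *map)
{
	if (map)
		map->refcnt++;
	return map;
}

static void sketch_map__put(struct sketch_cpu_map *map)
{
	if (map && --map->refcnt == 0)
		free(map);
}

/* Cached instance; the cache itself holds one reference. */
static struct sketch_cpu_map *sketch_online;

/*
 * The accessor takes the reference on behalf of the caller, so every
 * caller pairs one sketch_map__online() with exactly one put.
 */
static struct sketch_cpu_map *sketch_map__online(void)
{
	if (!sketch_online) {
		sketch_online = calloc(1, sizeof(*sketch_online));
		if (!sketch_online)
			return NULL;
		sketch_online->refcnt = 1;
		sketch_online->nr = 4;	/* placeholder CPU count */
	}
	return sketch_map__get(sketch_online);
}
```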
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20250318171914.145616-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Kernel modules for which we cannot find a file on-disk will have a
dso->long_name that looks like "[module_name]". Prior to the commit
listed in the fixes, the dso->kernel field would be zero (for user
space), so dso__is_kallsyms() would return false. After the commit,
kernel module DSOs are correctly labeled, but the result is that
dso__is_kallsyms() erroneously returns true for those modules without a
filesystem path.
Later, build_id_cache__add() consults this value of is_kallsyms, and
when true, it copies /proc/kallsyms into the cache. Users with many
kernel modules without a filesystem path (e.g. ksplice or possibly
kernel live patch modules) have reported excessive disk space usage in
the build ID cache directory due to this behavior.
To reproduce the issue, it's enough to build a trivial out-of-tree hello
world kernel module, load it using insmod, and then use:
perf record -ag -- sleep 1
In the build ID directory, there will be a directory for your module
name containing a kallsyms file.
Fix this up by changing dso__is_kallsyms() to consult the
dso_binary_type enumeration, which is also symmetric to the above checks
for dso__is_vmlinux() and dso__is_kcore(). With this change, kallsyms is
not cached in the build-id cache for out-of-tree modules.
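The difference between the two checks can be sketched like this. The
struct, the enum values and both functions are hypothetical
simplifications; in particular the binary type assigned to a module
without a file (NOT_FOUND here) is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical subset of binary types, for illustration only. */
enum sketch_binary_type {
	SKETCH_BINARY_TYPE__KALLSYMS,
	SKETCH_BINARY_TYPE__GUEST_KALLSYMS,
	SKETCH_BINARY_TYPE__SYSTEM_PATH_KMODULE,
	SKETCH_BINARY_TYPE__NOT_FOUND,
};

struct sketch_dso {
	bool kernel;			/* kernel or kernel module DSO */
	const char *long_name;		/* path, or "[name]" without a file */
	enum sketch_binary_type binary_type;
};

/* Heuristic: any kernel DSO whose name is not an absolute path. */
static bool sketch_is_kallsyms_by_name(const struct sketch_dso *dso)
{
	return dso->kernel && dso->long_name[0] != '/';
}

/* Explicit: only DSOs whose symbols really come from kallsyms. */
static bool sketch_is_kallsyms_by_type(const struct sketch_dso *dso)
{
	return dso->binary_type == SKETCH_BINARY_TYPE__KALLSYMS ||
	       dso->binary_type == SKETCH_BINARY_TYPE__GUEST_KALLSYMS;
}
```

With a module DSO named "[hello]" the name-based check answers true
while the type-based check answers false, which is the case that
filled the build ID cache.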
Fixes: 02213cec64bbe ("perf maps: Mark module DSOs with kernel type")
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/r/20250318230012.2038790-1-stephen.s.brennan@oracle.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The functionalities of {disabled,required}-features.h have been replaced with
the auto-generated generated/<asm/cpufeaturemasks.h> header.
Thus they are no longer needed and can be removed.
None of the macros defined in {disabled,required}-features.h is used in tools,
delete them too.
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250305184725.3341760-4-xin@zytor.com
|
|
When s2[i] == '\0' and s1[i] != '\0', the mismatch is already caught by ret,
and when s1[i] == '\0' as well, it is caught by !s1[i].
So in practice the check of s2[i] never decides the result.
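A generic comparison loop of this shape, not the function touched by
the patch, shows why the extra test is redundant. sketch_compare() is
a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical strcmp()-like loop with the redundant test left out. */
static int sketch_compare(const char *s1, const char *s2)
{
	size_t i;
	int ret;

	for (i = 0; ; i++) {
		ret = (unsigned char)s1[i] - (unsigned char)s2[i];
		/*
		 * If s2 ends first then s1[i] != '\0' and ret is non-zero.
		 * If both end here then !s1[i] is true. Either way the loop
		 * stops, so an additional "|| !s2[i]" would never matter.
		 */
		if (ret || !s1[i])
			return ret;
	}
}
```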
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250314031013.94480-1-yangfeng59949@163.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The pyrf_event__new() method copies the event obtained from the perf
ring buffer to a structure that will then be turned into a python object
for further consumption, so it copies perf_event.header.size bytes to
its 'event' member:
$ pahole -C pyrf_event /tmp/build/perf-tools-next/python/perf.cpython-312-x86_64-linux-gnu.so
struct pyrf_event {
PyObject ob_base; /* 0 16 */
struct evsel * evsel; /* 16 8 */
struct perf_sample sample; /* 24 312 */
/* XXX last struct has 7 bytes of padding, 2 holes */
/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
union perf_event event; /* 336 4168 */
/* size: 4504, cachelines: 71, members: 4 */
/* member types with holes: 1, total: 2 */
/* paddings: 1, sum paddings: 7 */
/* last cacheline: 24 bytes */
};
$
It was doing so without checking if the event just obtained has more
than that space, fix it.
This isn't a proper, final solution, as we need to support larger
events, but for the time being we at least bounds check and document it.
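The kind of check meant here can be sketched as below. The types are
hypothetical and far smaller than union perf_event; only the header
layout (type, misc, size) mirrors struct perf_event_header.

```c
#include <stdint.h>
#include <string.h>

/* Same field layout as a perf event header. */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;		/* total size of the record in bytes */
};

/* Hypothetical fixed-size destination, standing in for union perf_event. */
union sketch_event {
	struct sketch_header header;
	uint8_t raw[64];
};

/* Copy a record only if it fits; return 0 on success, -1 otherwise. */
static int sketch_copy_event(union sketch_event *dst, const void *src)
{
	struct sketch_header hdr;

	memcpy(&hdr, src, sizeof(hdr));

	/* Reject records smaller than a header or larger than the copy. */
	if (hdr.size < sizeof(hdr) || hdr.size > sizeof(*dst))
		return -1;

	memcpy(dst, src, hdr.size);
	return 0;
}
```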
Fixes: 877108e42b1b9ba6 ("perf tools: Initial python binding")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-7-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When processing tracepoints the perf python binding was parsing the
event before calling perf_mmap__consume(&md->core) in
pyrf_evlist__read_on_cpu().
But part of this event parsing was to set the perf_sample->raw_data
pointer to the payload of the event, which then could be overwritten by
other event before tracepoint fields were asked for via event.prev_comm
in a python program, for instance.
This also happened with other fields, but strings were where problems
were surfacing, as there is UTF-8 validation for the potentially garbled
data.
This ended up showing up as (with some added debugging messages):
( field 'prev_comm' ret=0x7f7c31f65110, raw_size=68 ) ( field 'prev_pid' ret=0x7f7c23b1bed0, raw_size=68 ) ( field 'prev_prio' ret=0x7f7c239c0030, raw_size=68 ) ( field 'prev_state' ret=0x7f7c239c0250, raw_size=68 ) time 14771421785867 prev_comm= prev_pid=1919907691 prev_prio=796026219 prev_state=0x303a32313175 ==>
( XXX '��' len=16, raw_size=68) ( field 'next_comm' ret=(nil), raw_size=68 ) Traceback (most recent call last):
File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 51, in <module>
main()
File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 46, in main
event.next_comm,
^^^^^^^^^^^^^^^
AttributeError: 'perf.sample_event' object has no attribute 'next_comm'
When event.next_comm was asked for, the PyUnicode_FromString() python
API would fail and that tracepoint field wouldn't be available, stopping
the tools/perf/python/tracepoint.py test tool.
But, since we already do a copy of the whole event in pyrf_event__new,
just use it and while at it remove what was done in e8968e654191390a
("perf python: Fix pyrf_evlist__read_on_cpu event consuming") because we
don't really need to wait for parsing the sample before declaring the
event as consumed.
This copy is questionable as is now, as it limits the maximum event +
sample_type and tracepoint payload to sizeof(union perf_event), this all
has been "working" because 'struct perf_record_mmap2', the largest entry
in 'union perf_event' is:
$ pahole -C perf_event ~/bin/perf | grep mmap2
struct perf_record_mmap2 mmap2; /* 0 4168 */
$
Fixes: bae57e3825a3dded ("perf python: Add support to resolve tracepoint fields")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-6-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To avoid a leak if we have the python object but then something fails
and we need to abort the operation, decrement the reference count of
the newly created object.
Fixes: 377f698db12150a1 ("perf python: Add struct evsel into struct pyrf_event")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-5-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Otherwise when debugging we see just "python" in perf, top, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-4-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When python2 support was removed in e7e9943c87d857da ("perf python:
Remove python 2 scripting support"), all use of the
_PyUnicode_FromString(arg), _PyUnicode_FromFormat(...), and
_PyLong_FromLong(arg) macros was removed as well, so remove it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-3-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Some old cut'n'paste error, it's "ip", so the description should be
"event ip", not "event type".
Fixes: 877108e42b1b9ba6 ("perf tools: Initial python binding")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312203141.285263-2-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The DSO data read test opens a file but as dsos__exit is used the test
file isn't closed. This causes the subsequent subtests in don't fork
(-F) mode to fail as one more file descriptor than expected is open.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250318043151.137973-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
dso__list_del with address sanitizer and/or reference count checking
will call dso__put that can call dso__data_close reentrantly trying to
lock the dso__data_open_lock and deadlocking. Switch from pthread
mutexes to perf's mutex so that lock checking is performed in debug
builds. Add lock annotations that diagnosed the problem. Release the
dso__data_open_lock around the dso__put to avoid the deadlock.
Change the declaration of dso__data_get_fd to return a boolean,
indicating the fd is valid and the lock is held, to make it compatible
with the thread safety annotations as a try lock.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250318043151.137973-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Used to annotate when locks shouldn't be held for a function or if a
function returns a lock that's used by later mutex lock unlock
operations.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250318043151.137973-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Parameterize the basic testing to generate directly a perf.data file
or to generate/use one from pipe input or output. To simplify the
refactor move some of the head/grep logic around. Use "-q" with grep
to make the test output cleaner.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250311211635.541090-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When make_data fails its error message needs to go to stderr rather
than stdout and the stdout value is captured in a variable. Quote the
$err value so that it is always a valid input for test. This error is
commonly encountered if no sample data is gathered by the test.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250312001841.1515779-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Commit dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional")
missed the changes to a file, resulting in this problem:
$ make LIBUNWIND=1 -C tools/perf O=/tmp/build/perf-tools-next install-bin
<SNIP>
CC /tmp/build/perf-tools-next/util/unwind-libunwind-local.o
CC /tmp/build/perf-tools-next/util/unwind-libunwind.o
<SNIP>
util/unwind-libunwind-local.c: In function ‘access_mem’:
util/unwind-libunwind-local.c:582:56: error: ‘ui->sample->user_regs’ is a pointer; did you mean to use ‘->’?
582 | if (__write || !stack || !ui->sample->user_regs.regs) {
| ^
| ->
util/unwind-libunwind-local.c:587:38: error: passing argument 2 of ‘perf_reg_value’ from incompatible pointer type [-Wincompatible-pointer-types]
587 | ret = perf_reg_value(&start, &ui->sample->user_regs,
| ^~~~~~~~~~~~~~~~~~~~~~
| |
| struct regs_dump **
<SNIP>
⬢ [acme@toolbox perf-tools-next]$ git bisect bad
dc6d2bc2d893a878e7b58578ff01b4738708deb4 is the first bad commit
commit dc6d2bc2d893a878e7b58578ff01b4738708deb4 (HEAD)
Author: Ian Rogers <irogers@google.com>
Date: Mon Jan 13 11:43:45 2025 -0800
perf sample: Make user_regs and intr_regs optional
Detected using:
make -C tools/perf build-test
Fixes: dc6d2bc2d893a878 ("perf sample: Make user_regs and intr_regs optional")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250313033121.758978-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Test case "stat_all_pmu.sh" is not correctly checking 'perf stat' output
due to a poor design. Firstly, having the 'set -e' option with a trap
catching the sigexit causes the shell to exit immediately if 'perf stat' ends
with any non-zero value, which is then caught by the trap reporting an
unexpected signal. This causes events that should be parsed by the if-else
statement to be caught by the trap handler and are reported as errors:
$ perf test -vv "perf all pmu"
Testing i915/actual-frequency/
Unexpected signal in main
Error:
Access to performance monitoring and observability operations is limited.
Secondly, the if-else branches are not exclusive as the checking if the
event is present in the output log covers also the "<not supported>"
events, which should be accepted, and also the "Bad name events", which
should be rejected.
Remove the "set -e" option from the test case, correctly parse the
"perf stat" output log and check its return value. Add the missing
outputs for the 'perf stat' result and also add log messages to
report the branch that parsed the event for more info.
Fixes: 7e73ea40295620e7 ("perf test: Ignore security failures in all PMU test")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Tested-by: Qiao Zhao <qzhao@redhat.com>
Link: https://lore.kernel.org/r/20241122231233.79509-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The following commits added new fields/flags to the branch stack field
list:
commit 1f48989cdc7d ("perf script: Output branch sample type")
commit 6ade6c646035 ("perf script: Show branch speculation info")
commit 1e66dcff7b9b ("perf script: Add not taken event for branch stack")
Update brstack syntax documentation to be consistent with the latest
branch stack field list. Improve the descriptions to help users
interpret the fields accurately.
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20250312072329.419020-1-yujie.liu@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
BRACH -> BRANCH
Fixes: 88b1473135e4 ("perf script: Separate events from branch types")
Signed-off-by: Yujie Liu <yujie.liu@intel.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250312075636.429127-1-yujie.liu@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Do a simple bounds check to avoid this on new gcc versions:
31 15.81 fedora:rawhide : FAIL gcc version 15.0.1 20250225 (Red Hat 15.0.1-0) (GCC)
In function 'callchain__fprintf_left_margin',
inlined from 'callchain__fprintf_graph.constprop' at ui/stdio/hist.c:246:12:
ui/stdio/hist.c:27:39: error: iteration 2147483647 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
27 | for (i = 0; i < left_margin; i++)
| ~^~
ui/stdio/hist.c:27:23: note: within this loop
27 | for (i = 0; i < left_margin; i++)
| ~~^~~~~~~~~~~~~
cc1: all warnings being treated as errors
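A bounds check of this kind could look like the sketch below. The
USHRT_MAX limit and the sketch_* names are assumptions made for the
example, not necessarily what the patch uses; the idea is only that a
clamped count lets the compiler prove the loop cannot run up to
INT_MAX iterations.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical clamp: keep the margin within a sane, provable range. */
static int sketch_clamp_left_margin(int left_margin)
{
	if (left_margin < 0)
		return 0;
	if (left_margin > USHRT_MAX)
		return USHRT_MAX;
	return left_margin;
}

/* Print the margin as spaces and return how many were written. */
static size_t sketch_fprintf_left_margin(FILE *fp, int left_margin)
{
	size_t printed = 0;
	int i;

	left_margin = sketch_clamp_left_margin(left_margin);

	for (i = 0; i < left_margin; i++) {
		if (fputc(' ', fp) != EOF)
			printed++;
	}
	return printed;
}
```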
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250310194534.265487-4-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
No need to specify the array size, let the compiler figure that out.
This addresses this compiler warning that was noticed while build
testing on fedora rawhide:
31 15.81 fedora:rawhide : FAIL gcc version 15.0.1 20250225 (Red Hat 15.0.1-0) (GCC)
util/units.c: In function 'unit_number__scnprintf':
util/units.c:67:24: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-initialization]
67 | char unit[4] = "BKMG";
| ^~~~~~
cc1: all warnings being treated as errors
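A small sketch of the unsized form follows. The scaling loop is a
simplified stand-in written for this example, not the body of
unit_number__scnprintf(), and the sketch_* names are hypothetical.

```c
#include <stddef.h>

/*
 * "char unit[4] = "BKMG"" is valid C but leaves no room for the
 * terminating '\0', which newer gcc flags. Leaving the size out makes
 * the array 5 bytes long, terminator included.
 */
static const char sketch_unit[] = "BKMG";

/* Number of unit letters, excluding the terminator. */
#define SKETCH_NR_UNITS (sizeof(sketch_unit) - 1)

/* Scale n down by 1024 per step and return the matching unit letter. */
static char sketch_unit_for(unsigned long long n, unsigned long long *scaled)
{
	size_t i = 0;

	while (n >= 1024 && i + 1 < SKETCH_NR_UNITS) {
		n /= 1024;
		i++;
	}
	*scaled = n;
	return sketch_unit[i];
}
```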
Fixes: 9808143ba2e54818 ("perf tools: Add unit_number__scnprintf function")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250310194534.265487-3-acme@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This option is to show data type info in the regular (code) annotation.
It tries to find data type for each (memory) instruction in the
function. It'd be useful to see function-level memory access pattern
and also to debug the data type profiling result.
The output would be added at the end of the line and have "# data-type:"
prefix.
For now, it only works with --stdio mode for simplicity. I can work on
enabling it for TUI later.
$ perf annotate --stdio --code-with-type
Percent | Source code & Disassembly of vmlinux for cpu/mem-loads/ppk (253 samples, percent: local period)
---------------------------------------------------------------------------------------------------------------
: 0 0xffffffff81baa000 <check_preemption_disabled>:
0.00 : ffffffff81baa000: pushq %r12 # data-type: (stack operation)
0.00 : ffffffff81baa002: pushq %rbp # data-type: (stack operation)
0.00 : ffffffff81baa003: pushq %rbx # data-type: (stack operation)
0.00 : ffffffff81baa004: subq $0x8, %rsp
18.00 : ffffffff81baa008: movl %gs:0x7e48893d(%rip), %ebx # 0x3294c <pcpu_hot+0xc> # data-type: struct pcpu_hot +0xc (cpu_number)
12.58 : ffffffff81baa00f: movl %gs:0x7e488932(%rip), %eax # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
0.00 : ffffffff81baa016: testl $0x7fffffff, %eax
0.00 : ffffffff81baa01b: je 0xffffffff81baa02c <check_preemption_disabled+0x2c>
0.00 : ffffffff81baa01d: addq $0x8, %rsp
0.00 : ffffffff81baa021: movl %ebx, %eax
14.19 : ffffffff81baa023: popq %rbx # data-type: (stack operation)
18.86 : ffffffff81baa024: popq %rbp # data-type: (stack operation)
12.10 : ffffffff81baa025: popq %r12 # data-type: (stack operation)
17.78 : ffffffff81baa027: jmp 0xffffffff81bc1170 <__x86_return_thunk>
6.49 : ffffffff81baa02c: callq *0xc9139e(%rip) # 0xffffffff8283b3d0 <pv_ops+0xf0> # data-type: (stack operation)
0.00 : ffffffff81baa032: testb $0x2, %ah
0.00 : ffffffff81baa035: je 0xffffffff81baa01d <check_preemption_disabled+0x1d>
0.00 : ffffffff81baa037: movq %rdi, %rbp
0.00 : ffffffff81baa03a: movq %gs:0x32940, %rax # data-type: struct pcpu_hot +0 (current_task)
0.00 : ffffffff81baa043: testb $0x4, 0x2f(%rax) # data-type: struct task_struct +0x2f (flags)
0.00 : ffffffff81baa047: je 0xffffffff81baa052 <check_preemption_disabled+0x52>
0.00 : ffffffff81baa049: cmpl $0x1, 0x3d0(%rax) # data-type: struct task_struct +0x3d0 (nr_cpus_allowed)
0.00 : ffffffff81baa050: je 0xffffffff81baa01d <check_preemption_disabled+0x1d>
0.00 : ffffffff81baa052: movq %gs:0x32940, %r12 # data-type: struct pcpu_hot +0 (current_task)
0.00 : ffffffff81baa05b: cmpw $0x0, 0x7f0(%r12) # data-type: struct task_struct +0x7f0 (migration_disabled)
0.00 : ffffffff81baa065: movq %rsi, (%rsp)
0.00 : ffffffff81baa069: jne 0xffffffff81baa01d <check_preemption_disabled+0x1d>
0.00 : ffffffff81baa06b: movl 0xe8dd13(%rip), %eax # 0xffffffff82a37d84 <system_state> # data-type: enum system_states +0
0.00 : ffffffff81baa071: testl %eax, %eax
0.00 : ffffffff81baa073: je 0xffffffff81baa01d <check_preemption_disabled+0x1d>
0.00 : ffffffff81baa075: incl %gs:0x7e4888cc(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
0.00 : ffffffff81baa07c: movq $-0x7e14a100, %rdi
0.00 : ffffffff81baa083: callq 0xffffffff81148c40 <__printk_ratelimit> # data-type: (stack operation)
0.00 : ffffffff81baa088: testl %eax, %eax
0.00 : ffffffff81baa08a: je 0xffffffff81baa0d5 <check_preemption_disabled+0xd5>
0.00 : ffffffff81baa08c: movl 0x958(%r12), %r9d # data-type: struct task_struct +0x958 (pid)
0.00 : ffffffff81baa094: movq (%rsp), %rdx # data-type: char* +0
0.00 : ffffffff81baa098: movq %rbp, %rsi
0.00 : ffffffff81baa09b: leaq 0xb88(%r12), %r8 # data-type: struct task_struct +0xb88 (comm)
0.00 : ffffffff81baa0a3: movl %gs:0x7e48889e(%rip), %ecx # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
0.00 : ffffffff81baa0aa: andl $0x7fffffff, %ecx
0.00 : ffffffff81baa0b0: movq $-0x7dd3cdf0, %rdi
0.00 : ffffffff81baa0b7: subl $0x1, %ecx
0.00 : ffffffff81baa0ba: callq 0xffffffff81149340 <_printk> # data-type: (stack operation)
0.00 : ffffffff81baa0bf: movq 0x20(%rsp), %rsi
0.00 : ffffffff81baa0c4: movq $-0x7ddb8c7e, %rdi
0.00 : ffffffff81baa0cb: callq 0xffffffff81149340 <_printk> # data-type: (stack operation)
0.00 : ffffffff81baa0d0: callq 0xffffffff81b7ab60 <dump_stack> # data-type: (stack operation)
0.00 : ffffffff81baa0d5: decl %gs:0x7e48886c(%rip) # 0x32948 <pcpu_hot+0x8> # data-type: struct pcpu_hot +0x8 (preempt_count)
0.00 : ffffffff81baa0dc: jmp 0xffffffff81baa01d <check_preemption_disabled+0x1d>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-8-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Sometimes it's useful to see both instructions and their data type
together. Let's extend the annotate code to use data type profiling
functions.
To make it easy to pass more arguments, introduce a struct to carry
necessary information together. Also add a new annotation_option called
'code_with_type' to control the behavior. This is not enabled yet but
it'll be set later from the command line.
For simplicity, this is implemented for --stdio only.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
So that it can only handle a single disasm_line and hopefully make the
code simpler. This is also a preparation to be called from different
places later.
The NO_TYPE macro was added to distinguish when it failed or needs retry.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's a preparation to support code annotation and data type
annotation at the same time. Data type annotation needs more
information in the hist_entry so it needs to be passed deeper.
Also rename a function with the same name in the builtin-annotate.c
to hist_entry__stdio_annotate since it matches better to the command
line option. And change the condition inside to be simpler.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The annotation_line__print() has many arguments. But min_percent,
max_lines and percent_type are from struct annotation_options. So let's
pass a pointer to the option instead of passing them separately to
reduce the number of function arguments.
Actually it has a recursive call if 'queue' is set. Add a new option
instance to pass different values for the case.
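The refactoring can be sketched roughly as follows. This is a minimal illustration, not the perf code: `struct print_options`, `line_print()` and the values chosen for the queued lines are hypothetical stand-ins for the annotation option struct and annotation_line__print().
```c
#include <stdio.h>

/* Hypothetical stand-in for the option struct; the three fields are the
 * ones named in the commit text. */
struct print_options {
	double min_percent;
	int max_lines;
	int percent_type;
};

/* Returns 1 if the line was printed, 0 if it was filtered out. */
static int line_print(double percent, int printed,
		      const double *queue, int nr_queue,
		      const struct print_options *opts)
{
	if (percent < opts->min_percent)
		return 0;
	if (opts->max_lines && printed >= opts->max_lines)
		return 0;

	if (queue) {
		/* Separate option instance for the recursive call, so the
		 * queued lines can use different (assumed) values. */
		struct print_options queue_opts = *opts;

		queue_opts.min_percent = 0.0;
		queue_opts.max_lines = 0;
		for (int i = 0; i < nr_queue; i++)
			line_print(queue[i], 0, NULL, 0, &queue_opts);
	}

	printf("%7.2f (percent_type %d)\n", percent, opts->percent_type);
	return 1;
}
```
The caller passes one pointer instead of three scalars, and the recursive path copies the struct rather than mutating the caller's options.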
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's not used anywhere, let's get rid of it.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Factor out a function to get the name of member field at the given
offset. This will be used in other places.
Also update the output of typeoff sort key a little bit. As we know
that some special types like (stack operation), (stack canary) and
(unknown) won't have fields, skip printing the offset and field.
For example, the following change is expected.
"(stack operation) +0 (no field)" ==> "(stack operation)"
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It should use an atomic instruction to update even if the histogram is
keyed by delta as it's also used for stats.
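As a rough sketch of the idea, the bucket update can go through the `__sync_fetch_and_add` compiler builtin (available in GCC and Clang) instead of a plain increment. The array name, bucket count and helper below are assumptions made for illustration, not the actual BPF program.
```c
#include <stdint.h>

#define NUM_BUCKET 22	/* hypothetical bucket count for this sketch */

static uint64_t latency_hist[NUM_BUCKET];

/* Hypothetical helper: bump the bucket selected from the measured delta. */
static void hist_inc(unsigned int bucket)
{
	if (bucket >= NUM_BUCKET)
		bucket = NUM_BUCKET - 1;

	/* Atomic read-modify-write: a plain latency_hist[bucket]++ can lose
	 * increments when several CPUs update the same bucket. */
	__sync_fetch_and_add(&latency_hist[bucket], 1);
}
```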
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250227191223.1288473-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The bucket_num is set based on the {max,min}_latency already in
cmd_ftrace(), so no need to check it again in BPF. Also I found
that it didn't pass the max_latency to BPF. :)
No functional changes intended.
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250227191223.1288473-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When BPF collects the stats for the latency in usec, it first divides
the time by 1000. But that means it would have 0 if the delta is small
and won't update the total time properly.
Let's keep the stats in nsec always and adjust to usec before printing.
Before:
$ sudo ./perf ftrace latency -ab -T mutex_lock --hide-empty -- sleep 0.1
# DURATION | COUNT | GRAPH |
0 - 1 us | 765 | ############################################# |
1 - 2 us | 10 | |
2 - 4 us | 2 | |
4 - 8 us | 5 | |
# statistics (in usec)
total time: 0 <<<--- (here)
avg time: 0
max time: 6
min time: 0
count: 782
After:
$ sudo ./perf ftrace latency -ab -T mutex_lock --hide-empty -- sleep 0.1
# DURATION | COUNT | GRAPH |
0 - 1 us | 880 | ############################################ |
1 - 2 us | 13 | |
2 - 4 us | 8 | |
4 - 8 us | 3 | |
# statistics (in usec)
total time: 268 <<<--- (here)
avg time: 0
max time: 6
min time: 0
count: 904
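The truncation can be reproduced with a small sketch. The struct and function names are hypothetical; only the arithmetic (divide by 1000 before or after accumulating) comes from the description above.
```c
#include <stdint.h>

/* Hypothetical accumulator for this sketch. */
struct lat_stats {
	uint64_t total;
	uint64_t count;
};

/* Old behaviour: convert to usec first, so sub-usec deltas add 0. */
static void add_usec_first(struct lat_stats *s, uint64_t delta_ns)
{
	s->total += delta_ns / 1000;
	s->count++;
}

/* New behaviour: always accumulate in nsec. */
static void add_nsec(struct lat_stats *s, uint64_t delta_ns)
{
	s->total += delta_ns;
	s->count++;
}

/* Convert only when printing. */
static uint64_t total_usec(const struct lat_stats *s)
{
	return s->total / 1000;
}
```
With three deltas of 400 nsec each, the first variant accumulates a total of 0 while the second reports 1 usec after conversion.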
Tested-by: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250227191223.1288473-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a loop and helper function to avoid repetition. The loop uses
arrays, so switch the shell to bash. Add additional topdown group tests
where a topdown event needs to be moved beyond others and the slots
event isn't first in the target group. This replicates issues that
occur on hybrid systems where the other events are for the cpu_atom
PMU. Test with both PMU and software events. Place the slots event
later in the event list.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250307023906.1135613-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Update to remove comments about groupings not working and with the:
```
perf stat -e "{instructions,slots},{cycles,topdown-retiring}"
```
case that now works.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250307023906.1135613-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In the case of '{instructions,slots},faults,topdown-retiring' the
first event that must be grouped, slots, is ignored causing the
topdown-retiring event not to be adjacent to the group it needs to be
inserted into. Don't ignore the group members when computing the
force_grouped_index.
Make the force_grouped_index be for the leader of the group it is
within and always use it first rather than a group leader index so
that topdown events may be sorted from one group into another.
As the PMU name comparison applies to moving events in the same group
ensure the name ordering is always respected.
Change the group splitting logic to not group if there are no other
topdown events and to fix cases where the force group leader wasn't
being grouped with the other members of its group.
Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Closes: https://lore.kernel.org/lkml/20250224083306.71813-2-dapeng1.mi@linux.intel.com/
Closes: https://lore.kernel.org/lkml/f7e4f7e8-748c-4ec7-9088-0e844392c11a@linux.intel.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250307023906.1135613-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When running the topdown leader sampling test on Intel hybrid platforms,
such as LNL/ARL, we see the below error.
Topdown leader sampling test
Topdown leader sampling [Failed topdown events not reordered correctly]
It indicates the below command fails.
perf record -o "${perfdata}" -e "{instructions,slots,topdown-retiring}:S" true
The root cause is that the perf tool creates a perf event for each PMU
type if it can.
For this command, 5 perf events would be created:
cpu_atom/instructions/,cpu_atom/topdown_retiring/,
cpu_core/slots/,cpu_core/instructions/,cpu_core/topdown-retiring/
For these 5 events, the 2 cpu_atom events are in a group and the other 3
cpu_core events are in another group.
When arch_topdown_sample_read() traverses all 5 events, the events
cpu_atom/instructions/ and cpu_core/slots/ don't have the same group
leader, so it returns false directly. As a result the cpu_core/slots/
event is used to sample, which is not allowed by the PMU driver.
It's overkill to return false directly if "evsel->core.leader !=
leader->core.leader" since there could be multiple groups in the event
list.
Just "continue" instead of "return false" to fix this issue.
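The effect of the change can be sketched with a simplified model of the traversal. The `struct ev` type, the index-based leader field and the `bail_on_other_group` switch are hypothetical and exist only to compare the two behaviours. The event list is the one given above.
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event: a name plus the index of its group leader. */
struct ev {
	const char *name;
	int leader;
};

/* The 5 events from the description: two cpu_atom, three cpu_core. */
static const struct ev events[] = {
	{ "cpu_atom/instructions/",     0 },
	{ "cpu_atom/topdown_retiring/", 0 },
	{ "cpu_core/slots/",            2 },
	{ "cpu_core/instructions/",     2 },
	{ "cpu_core/topdown-retiring/", 2 },
};

/* Does the group led by `leader` contain a topdown metric event? */
static bool group_has_topdown(const struct ev *evs, size_t n, int leader,
			      bool bail_on_other_group)
{
	for (size_t i = 0; i < n; i++) {
		if (evs[i].leader != leader) {
			if (bail_on_other_group)
				return false;	/* old: stop at the first foreign group */
			continue;		/* fix: skip events of other groups */
		}
		if (strstr(evs[i].name, "topdown"))
			return true;
	}
	return false;
}
```
For the cpu_core group (leader index 2), the old variant returns false as soon as it meets cpu_atom/instructions/, while the fixed variant skips the cpu_atom events and finds cpu_core/topdown-retiring/.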
Fixes: 1e53e9d1787b ("perf x86/topdown: Correct leader selection with sample_read enabled")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250307023906.1135613-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Support the PMU name from the legacy hardware and hw_cache PMU
extended types. Remove some macros and make variables more
intention-revealing, rather than just being called "value".
Before:
```
$ perf stat -vv -e instructions true
...
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0xa00000001
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 181636 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x400000001
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 181636 cpu -1 group_fd -1 flags 0x8 = 6
...
```
After:
```
$ perf stat -vv -e instructions true
...
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0xa00000001 (cpu_atom/PERF_COUNT_HW_INSTRUCTIONS/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 181724 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x400000001 (cpu_core/PERF_COUNT_HW_INSTRUCTIONS/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 181724 cpu -1 group_fd -1 flags 0x8 = 6
...
```
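The decoding behind the new annotation can be sketched as follows. The shift and mask mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK from the perf_event UAPI header. The name lookups are illustrative assumptions: PMU type numbers are assigned per system, and the two used here are simply read off the example output above.
```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Same values as PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK. */
#define PMU_TYPE_SHIFT	32
#define HW_EVENT_MASK	0xffffffffULL

/* Illustrative only: type numbers taken from the example output. */
static const char *pmu_name(uint32_t type)
{
	switch (type) {
	case 0xa: return "cpu_atom";
	case 0x4: return "cpu_core";
	default:  return "unknown";
	}
}

/* Only the first two generic hardware event ids are listed here. */
static const char *hw_event_name(uint32_t id)
{
	switch (id) {
	case 0:  return "PERF_COUNT_HW_CPU_CYCLES";
	case 1:  return "PERF_COUNT_HW_INSTRUCTIONS";
	default: return "?";
	}
}

/* Hypothetical formatter: the upper 32 bits select the PMU, the lower 32
 * bits hold the hardware event id. */
static int format_hw_config(char *buf, size_t size, uint64_t config)
{
	uint32_t type = (uint32_t)(config >> PMU_TYPE_SHIFT);
	uint32_t id = (uint32_t)(config & HW_EVENT_MASK);

	return snprintf(buf, size, "%#" PRIx64 " (%s/%s/)",
			config, pmu_name(type), hw_event_name(id));
}
```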
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20250307023906.1135613-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than manually configuring an evsel, switch to using
parse_events for greater commonality with the rest of the perf code.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-12-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add access to evlist__config that is used to configure an evlist with
record options.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a means to get the reference counted all_cpus CPU map from an
evlist in its python form.
Reviewed-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20250228222308.626803-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|