summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2024-04-26perf tests parse-events: Use "branches" rather than "cache-references"Ian Rogers
Switch from "cache-references" to "branches" in test as Intel has a sysfs event for "cache-references" and changing the priority for sysfs over legacy causes the test to fail. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Atish Patra <atishp@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Beeman Strong <beeman@rivosinc.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240416061533.921723-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf pmu: Refactor perf_pmu__match()Ian Rogers
Move all implementation to pmu code. Don't allocate a fnmatch wildcard pattern, matching ignoring the suffix already handles this, and only use fnmatch if the given PMU name has a '*' in it. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Atish Patra <atishp@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Beeman Strong <beeman@rivosinc.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240416061533.921723-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf parse-events: Avoid copying an empty listIan Rogers
In parse_events_add_pmu, delay copying the list of terms until it is known the list contains terms. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Atish Patra <atishp@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Beeman Strong <beeman@rivosinc.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240416061533.921723-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf parse-events: Directly pass PMU to parse_events_add_pmu()Ian Rogers
Avoid passing the name of a PMU then finding it again, just directly pass the PMU. parse_events_multi_pmu_add_or_add_pmu() is the only version that needs to find a PMU, so move the find there. Remove the error message as parse_events_multi_pmu_add_or_add_pmu will given an error at the end when a name isn't either a PMU name or event name. Without the error message being created the location in the input parameter (loc) can be removed. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Atish Patra <atishp@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Beeman Strong <beeman@rivosinc.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240416061533.921723-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf parse-events: Factor out '<event_or_pmu>/.../' parsingIan Rogers
Factor out the case of an event or PMU name followed by a slash based term list. This is with a view to sharing the code with new legacy hardware parsing. Use early return to reduce indentation in the code. Make parse_events_add_pmu static now it doesn't need sharing with parse-events.y. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Atish Patra <atishp@rivosinc.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Beeman Strong <beeman@rivosinc.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240416061533.921723-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf scripts python: Add a script to run instances of 'perf script' in parallelAdrian Hunter
Add a Python script to run a perf script command multiple times in parallel, using perf script options --cpu and --time so that each job processes a different chunk of the data. Extend perf script tests to test also the new script. The script supports the use of normal 'perf script' options like --dlfilter and --script, so that the benefit of running parallel jobs naturally extends to them also. In addition, a command can be provided (refer --pipe-to option) to pipe standard output to a custom command. Refer to the script's own help text at the end of the patch for more details. The script is useful for Intel PT traces, that can be efficiently decoded by 'perf script' when split by CPU and/or time ranges. Running jobs in parallel can decrease the overall decoding time. Committer testing: Ian reported that shellcheck found some issues, I installed it as there are no warnings about it not being available, but when available it fails the build with: TEST /tmp/build/perf-tools-next/tests/shell/script.sh.shellcheck_log CC /tmp/build/perf-tools-next/util/header.o In tests/shell/script.sh line 20: rm -rf "${temp_dir}/"* ^-------------^ SC2115 (warning): Use "${var:?}" to ensure this never expands to /* . In tests/shell/script.sh line 83: output1_dir="${temp_dir}/output1" ^---------^ SC2034 (warning): output1_dir appears unused. Verify use (or export if used externally). In tests/shell/script.sh line 84: output2_dir="${temp_dir}/output2" ^---------^ SC2034 (warning): output2_dir appears unused. Verify use (or export if used externally). In tests/shell/script.sh line 86: python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}" ^-----------^ SC2154 (warning): output_dir is referenced but not assigned (did you mean 'output1_dir'?). For more information: https://www.shellcheck.net/wiki/SC2034 -- output1_dir appears unused. Verif... https://www.shellcheck.net/wiki/SC2115 -- Use "${var:?}" to ensure this nev... https://www.shellcheck.net/wiki/SC2154 -- output_dir is referenced but not ... Did these fixes: - rm -rf "${temp_dir}/"* + rm -rf "${temp_dir:?}/"* And: @@ -83,8 +83,8 @@ test_parallel_perf() output1_dir="${temp_dir}/output1" output2_dir="${temp_dir}/output2" perf record -o "${perf_data}" --sample-cpu uname - python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}" - python3 "${pp}" -o "${output_dir}" --jobs 4 --verbose --per-cpu -- perf script -i "${perf_data}" + python3 "${pp}" -o "${output1_dir}" --jobs 4 --verbose -- perf script -i "${perf_data}" + python3 "${pp}" -o "${output2_dir}" --jobs 4 --verbose --per-cpu -- perf script -i "${perf_data}" After that: root@number:~# perf test -vv "perf script tests" 97: perf script tests: --- start --- test child forked, pid 4084139 DB test [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.032 MB /tmp/perf-test-script.T4MJDr0L6J/perf.data (7 samples) ] <SNIP> DB test [Success] parallel-perf test Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.034 MB /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data (7 samples) ] Starting: perf script --time=,91898.301878499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --time=91898.301878500,91898.301905999 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --time=91898.301906000,91898.301933499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --time=91898.301933500, -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --time=91898.301878500,91898.301905999 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --time=91898.301906000,91898.301933499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 4 jobs: 2 completed, 2 running Finished: perf script --time=,91898.301878499 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --time=91898.301933500, -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 4 jobs: 4 completed, 0 running All jobs finished successfully parallel-perf.py done Starting: perf script --cpu=0 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=1 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=2 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=3 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=0 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=1 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=2 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=3 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 4 completed, 0 running Starting: perf script --cpu=4 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=5 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=6 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=7 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=4 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=5 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=6 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=7 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 8 completed, 0 running Starting: perf script --cpu=8 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=9 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=10 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=11 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=8 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=9 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=10 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=11 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 12 completed, 0 running Starting: perf script --cpu=12 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=13 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=14 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=15 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=12 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=13 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=14 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=15 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 16 completed, 0 running Starting: perf script --cpu=16 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=17 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=18 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=19 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=16 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=17 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=18 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=19 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 20 completed, 0 running Starting: perf script --cpu=20 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=21 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=22 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=23 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=20 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=21 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=22 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=23 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 24 completed, 0 running Starting: perf script --cpu=24 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=25 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=26 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Starting: perf script --cpu=27 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=25 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=26 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data Finished: perf script --cpu=27 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 27 completed, 1 running Finished: perf script --cpu=24 -i /tmp/perf-test-script.T4MJDr0L6J/pp-perf.data There are 28 jobs: 28 completed, 0 running All jobs finished successfully parallel-perf.py done parallel-perf test [Success] --- Cleaning up --- ---- end(0) ---- 97: perf script tests : Ok root@number:~# Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240423133248.10206-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf tests shell kprobes: Add missing description as used by 'perf test' outputArnaldo Carvalho de Melo
Before: root@x1:~# perf test 76 76: SPDX-License-Identifier: GPL-2.0 : Ok root@x1:~# After: root@x1:~# perf test 76 76: Add 'perf probe's, list and remove them. : Ok root@x1:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/lkml/ZigRDKUGkcDqD-yW@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-26perf riscv: Fix the warning due to the incompatible typeBen Zong-You Xie
In the 32-bit platform, the second argument of getline is expectd to be 'size_t *'(aka 'unsigned int *'), but line_sz is of type 'unsigned long *'. Therefore, declare line_sz as size_t. Signed-off-by: Ben Zong-You Xie <ben717@andestech.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240305120501.1785084-3-ben717@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tools include UAPI: Sync linux/vhost.h with the kernel sourcesArnaldo Carvalho de Melo
To get the changes in: 2855c2a7820bc819 ("vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE") 1496c47065f9f841 ("vhost-vdpa: uapi to support reporting per vq size") To pick up these changes and support them: $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before $ cp include/uapi/linux/vhost.h tools/perf/trace/beauty/include/uapi/linux/vhost.h $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after $ diff -u before after --- before 2024-04-22 13:39:37.185674799 -0300 +++ after 2024-04-22 13:39:52.043344784 -0300 @@ -50,5 +50,6 @@ [0x7F] = "VDPA_GET_VRING_DESC_GROUP", [0x80] = "VDPA_GET_VQS_COUNT", [0x81] = "VDPA_GET_GROUP_NUM", + [0x82] = "VDPA_GET_VRING_SIZE", [0x8] = "NEW_WORKER", }; $ For instance, see how those 'cmd' ioctl arguments get translated, now VDPA_GET_VRING_SIZE will be as well: # perf trace -a -e ioctl --max-events=10 0.000 ( 0.011 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 21.353 ( 0.014 ms): pipewire/2261 ioctl(fd: 60, cmd: SNDRV_PCM_HWSYNC, arg: 0x1) = 0 25.766 ( 0.014 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 25.845 ( 0.034 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 25.916 ( 0.011 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 25.941 ( 0.025 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ATOMIC, arg: 0x7ffe4a22c840) = 0 32.915 ( 0.009 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_RMFB, arg: 0x7ffe4a22cf9c) = 0 42.522 ( 0.013 ms): gnome-shell/2196 ioctl(fd: 14, cmd: DRM_I915_IRQ_WAIT, arg: 0x7ffe4a22c740) = 0 42.579 ( 0.031 ms): gnome-shel:cs0/2212 ioctl(fd: 14, cmd: DRM_I915_IRQ_EMIT, arg: 0x7fd43915dc70) = 0 42.644 ( 0.010 ms): gnome-shell/2196 ioctl(fd: 9, cmd: DRM_MODE_ADDFB2, arg: 0x7ffe4a22c8a0) = 0 # This addresses this perf tools build warning: diff -u tools/perf/trace/beauty/include/uapi/linux/vhost.h include/uapi/linux/vhost.h But this specific process, usually boring, this time around catch a problem, namely the addition of VDPA_GET_VRING_SIZE used an ioctl number already taken, which went on unnoticed and only got caught when the tools/perf/trace/beauty/vhost_virtio_ioctl.sh script was run as part of the perf tools process of updating the tools copies of system headers it uses for creating id->string tables that, well, broke the perf tools build because there were multiple initializations in the strings table for the 0x80 entry... I'm adding here a link to the discussion, that is lacking in the fix for the reported problem, and a quote from one of the developers involved: "Thanks a lot for taking care of this! So given the header is actually buggy pls hang on to this change until I merge the fix for the header (you were CC'd on the patch). It's great we have this redundancy which allowed us to catch the bug in time, and many thanks to Namhyung Kim for reporting the issue!" This is here as a hint for anyone thinking about ways to automate checking these issues in a more automated way... ;-) Link: https://lore.kernel.org/lkml/ 20240402172151-mutt-send-email-mst@kernel.org Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/lkml/ZiaW-csEZLKK48BE@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-22Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo
To pick up fixes sent via perf-tools, by Namhyung Kim. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18Revert "tools headers: Remove almost unused copy of uapi/stat.h, add few ↵Arnaldo Carvalho de Melo
conditional defines" This reverts commit a672af9139a843eb7a48fd7846bb6df8f81b5f86. By now it is not used for building tools/perf, but Stephen Rothwell reported that when building on a O= directory that had been built with torvalds/master and this perf build command line: $ make -C tools/perf -f Makefile.perf -s -O -j60 O=/home/sfr/next/perf NO_BPF_SKEL=1 If we then merge perf-tools-next, as he did for linux-next, then we end up with a build failure for libbpf: PERF_VERSION = 6.9.rc3.g42c4635c8dee make[3]: *** No rule to make target '/home/sfr/next/next/tools/include/uapi/linux/stat.h', needed by '/home/sfr/next/perf/libbpf/staticobjs/libbpf.o'. Stop. make[2]: *** [Makefile:157: /home/sfr/next/perf/libbpf/staticobjs/libbpf-in.o] Error 2 make[1]: *** [Makefile.perf:892: /home/sfr/next/perf/libbpf/libbpf.a] Error 2 make[1]: *** Waiting for unfinished jobs.... make: *** [Makefile.perf:264: sub-make] Error 2 This needs to be further investigated to figure out how to check if libbpf really needs something that is in that tools/include/uapi/linux/stat.h file and if not to remove that file in a way that we don't break the build in any situation, to avoid requiring doing a 'make clean'. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # PowerPC le incermental build Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240413124340.4d48c6d8@canb.auug.org.au Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf probe-event: Better error message for a too-long probe nameDima Kogan
This is a common failure mode when probing userspace C++ code (where the mangling adds significant length to the symbol names). Prior to this patch, only a very generic error message is produced, making the user guess at what the issue is. Signed-off-by: Dima Kogan <dima@secretsauce.net> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240416045533.162692-3-dima@secretsauce.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf probe-event: Un-hardcode sizeof(buf)Dima Kogan
In several places we had char buf[64]; ... snprintf(buf, 64, ...); This patch changes it to char buf[64]; ... snprintf(buf, sizeof(buf), ...); so the "64" is only stated once. Signed-off-by: Dima Kogan <dima@secretsauce.net> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240416045533.162692-2-dima@secretsauce.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf stat: Add new field in stat_config to enable hardware aware groupingWeilin Wang
Hardware counter and event information could be used to help creating event groups that better utilize hardware counters and improve multiplexing. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240412210756.309828-2-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf test shell arm_coresight: Increase buffer size for Coresight basic testsJames Clark
These tests record in a mode that includes kernel trace but look for samples of a userspace process. This makes them sensitive to any kernel compilation options that increase the amount of time spent in the kernel. If the trace buffer is completely filled before userspace is reached then the test will fail. Double the buffer size to fix this. The other tests in the same file aren't sensitive to this for various reasons, for example the iterate devices test filters by userspace trace only. But in order to keep coverage of all the modes, increase the buffer size rather than filtering by userspace for the basic tests. Fixes: d1efa4a0a696e487 ("perf cs-etm: Add separate decode paths for timeless and per-thread modes") Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240326113749.257250-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf genelf: Fix compiling with libelf on rv32Chen Pei
When cross-compiling perf with libelf, the following error occurred: In file included from tests/genelf.c:14: tests/../util/genelf.h:50:2: error: #error "unsupported architecture" 50 | #error "unsupported architecture" | ^~~~~ tests/../util/genelf.h:59:5: warning: "GEN_ELF_CLASS" is not defined, evaluates to 0 [-Wundef] 59 | #if GEN_ELF_CLASS == ELFCLASS64 Fix this by adding GEN-ELF-ARCH and GEN-ELF-CLASS definitions for rv32. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Chen Pei <cp0613@linux.alibaba.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20240415095532.4930-1-cp0613@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf vendor events arm64: AmpereOne/AmpereOneX: Mark L1D_CACHE_INVAL ↵Ilkka Koskinen
impacted by errata L1D_CACHE_INVAL overcounts in certain situations. See AC03_CPU_41 and AC04_CPU_1 for more details. Mark the event impacted by the errata. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240408214022.541839-1-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf test bpf-counters: Add test for BPF event modifierIan Rogers
Refactor test to better enable sharing of logic, to give an idea of progress and introduce test functions. Add test of measuring both cycles and cycles:b simultaneously. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240416170014.985191-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-18perf docs: Document bpf event modifierIan Rogers
Document that 'b' is used as a modifier to make an event use a BPF counter. Fixes: 01bd8efcec444468 ("perf stat: Introduce ':b' modifier") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20240416170014.985191-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17perf tools: Enable configs required for test_uprobe_from_different_cu.shChaitanya S Prakash
Test "perf probe of function from different CU" fails due to certain configs not being enabled. Building the kernel with CONFIG_KPROBE_EVENTS=y and CONFIG_UPROBE_EVENTS=y fixes the issue. As CONFIG_KPROBE_EVENTS is dependent on CONFIG_KPROBES, enable it as well. Some platforms enable these configs as a part of their defconfig, so this change is only required for the ones that don't do so. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20240408062230.1949882-1-ChaitanyaS.Prakash@arm.com Link: https://lore.kernel.org/r/20240408062230.1949882-7-ChaitanyaS.Prakash@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17perf report: Add weight[123] output fieldsNamhyung Kim
Add weight1, weight2 and weight3 fields to -F/--fields and their aliases like 'ins_lat', 'p_stage_cyc' and 'retire_lat'. Note that they are in the sort keys too but the difference is that output fields will sum up the weight values and display the average. In the sort key, users can see the distribution of weight value and I think it's confusing we have local vs. global weight for the same weight. For example, I experiment with mem-loads events to get the weights. On my laptop, it seems only weight1 field is supported. $ perf mem record -- perf test -w noploop Let's look at the noploop function only. It has 7 samples. $ perf script -F event,ip,sym,weight | grep noploop # event weight ip sym cpu/mem-loads,ldlat=30/P: 43 55b3c122bffc noploop cpu/mem-loads,ldlat=30/P: 48 55b3c122bffc noploop cpu/mem-loads,ldlat=30/P: 38 55b3c122bffc noploop <--- same weight cpu/mem-loads,ldlat=30/P: 38 55b3c122bffc noploop <--- same weight cpu/mem-loads,ldlat=30/P: 59 55b3c122bffc noploop cpu/mem-loads,ldlat=30/P: 33 55b3c122bffc noploop cpu/mem-loads,ldlat=30/P: 38 55b3c122bffc noploop <--- same weight When you use the 'weight' sort key, it'd show entries with a separate weight value separately. Also note that the first entry has 3 samples with weight value 38, so they are displayed together and the weight value is the sum of 3 samples (114 = 38 * 3). $ perf report -n -s +weight | grep -e Weight -e noploop # Overhead Samples Command Shared Object Symbol Weight 0.53% 3 perf perf [.] noploop 114 0.18% 1 perf perf [.] noploop 59 0.18% 1 perf perf [.] noploop 48 0.18% 1 perf perf [.] noploop 43 0.18% 1 perf perf [.] noploop 33 If you use 'local_weight' sort key, you can see the actual weight. $ perf report -n -s +local_weight | grep -e Weight -e noploop # Overhead Samples Command Shared Object Symbol Local Weight 0.53% 3 perf perf [.] noploop 38 0.18% 1 perf perf [.] noploop 59 0.18% 1 perf perf [.] noploop 48 0.18% 1 perf perf [.] noploop 43 0.18% 1 perf perf [.] noploop 33 But when you use the -F/--field option instead, you can see the average weight for the while noploop function (as it won't group samples by weight value and use the default 'comm,dso,sym' sort keys). $ perf report -n -F +weight | grep -e Weight -e noploop Warning: --fields weight shows the average value unlike in the --sort key. # Overhead Samples Weight1 Command Shared Object Symbol 1.23% 7 42.4 perf perf [.] noploop The weight1 field shows the average value: (38 * 3 + 59 + 48 + 43 + 33) / 7 = 42.4 Also it'd show the warning that 'weight' field has the average value. Using 'weight1' can remove the warning. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240411181718.2367948-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17perf hist: Add weight fields to hist entry statsNamhyung Kim
Like period and sample numbers, it'd be better to track weight values and display them in the output rather than having them as sort keys. This patch just adds a few more fields to save the weights in a hist entry. It'll be displayed as new output fields in the later patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240411181718.2367948-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-17perf hist: Move histogram related code to hist.hNamhyung Kim
It's strange that sort.h has the definition of struct hist_entry. As sort.h already includes hist.h, let's move the data structure to hist.h. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240411181718.2367948-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16perf annotate-data: Handle RSP if it's not the FB registerNamhyung Kim
In some cases, the stack pointer on x86 (rsp = reg7) is used to point variables on stack but it's not the frame base register. Then it should handle the register like normal registers (IOW not to access the other stack variables using offset calculation) but it should not assume it would have a pointer. Before: ----------------------------------------------------------- find data type for 0x7c(reg7) at tcp_getsockopt+0xb62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 no pointer or no type check variable "zc" failed (die: 0x7b9580a) variable location: base=reg7, offset=0x40 type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4) After: ----------------------------------------------------------- find data type for 0x7c(reg7) at tcp_getsockopt+0xb62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 found "zc" in scope=3/3 (die: 0x7b957fc) type_offset=0x3c variable location: base=reg7, offset=0x40 type='struct tcp_zerocopy_receive' size=0x40 (die:0x7b947f4) Note that the type-offset was properly calculated to 0x3c as the variable starts at 0x40. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240412183310.2518474-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16perf dwarf-aux: Check variable address range properlyNamhyung Kim
In match_var_offset(), it just checked the end address of the variable with the given offset because it assumed the register holds a pointer to the data type and the offset starts from the base. But I found some cases that the stack pointer (rsp = reg7) register is used to pointer a stack variable while the frame base is maintained by a different register (rbp = reg6). In that case, it cannot simply use the stack pointer as it cannot guarantee that it points to the frame base. So it needs to check both boundaries of the variable location. Before: ----------------------------------------------------------- find data type for 0x7c(reg7) at tcp_getsockopt+0xb62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 no pointer or no type check variable "tss" failed (die: 0x7b95801) variable location: base reg7, offset=0x110 type='struct scm_timestamping_internal' size=0x30 (die:0x7b8c126) So the current code just checks register number for the non-PC and non-FB registers and assuming it has offset 0. But this variable has offset 0x110 so it should not match to this. After: ----------------------------------------------------------- find data type for 0x7c(reg7) at tcp_getsockopt+0xb62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 no pointer or no type check variable "zc" failed (die: 0x7b9580a) variable location: base=reg7, offset=0x40 type='struct tcp_zerocopy_receive' size=0x40 (die:7b947f4) Now it find the correct variable "zc". It was located at reg7 + 0x40 and the size if 0x40 which means it should cover [0x40, 0x80). And the access was for reg7 + 0x7c so it found the right one. But it still failed to use the variable and it would be handled in the next patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240412183310.2518474-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16perf dwarf-aux: Check pointer offset when checking variablesNamhyung Kim
In match_var_offset(), it checks the offset range with the target type only for non-pointer types. But it also needs to check the pointer types with the target type. This is because there can be more than one pointer variable located in the same register. Let's look at the following example. It's looking up a variable for reg3 at tcp_get_info+0x62. It found "sk" variable but it wasn't the right one since it accesses beyond the target type (struct 'sock' in this case) size. ----------------------------------------------------------- find data type for 0x7bc(reg3) at tcp_get_info+0x62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 offset: 1980 is bigger than size: 760 check variable "sk" failed (die: 0x7b92b2c) variable location: reg3 type='struct sock' size=0x2f8 (die:0x7b63c3a) Actually there was another variable "tp" in the function and it's located at the same (reg3) because it's just type-casted like below. void tcp_get_info(struct sock *sk, struct tcp_info *info) { const struct tcp_sock *tp = tcp_sk(sk); ... The 'struct tcp_sock' contains the 'struct sock' at offset 0 so it can just use the same address as a pointer to tcp_sock. That means it should match variables correctly by checking the offset and size. Actually it cannot distinguish if the offset was smaller than the size of the original struct sock. But I think it's fine as they are the same at that part. So let's check the target type size and retry if it doesn't match. Now it succeeded to find the correct variable. ----------------------------------------------------------- find data type for 0x7bc(reg3) at tcp_get_info+0x62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 found "tp" in scope=1/1 (die: 0x7b92b16) type_offset=0x7bc variable location: reg3 type='struct tcp_sock' size=0xa68 (die:0x7b81380) Fixes: bc10db8eb8955fbc ("perf annotate-data: Support stack variables") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240412183310.2518474-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-16perf annotate-data: Improve debug message with location infoNamhyung Kim
To verify it found the correct variable, let's add the location expression to the debug message. $ perf --debug type-profile annotate --data-type ... ----------------------------------------------------------- find data type for 0xaf0(reg15) at schedule+0xeb CU for kernel/sched/core.c (die:0x1180523) frame base: cfa=0 fbreg=6 found "rq" in scope=3/4 (die: 0x11b6a00) type_offset=0xaf0 variable location: reg15 type='struct rq' size=0xfc0 (die:0x11892e2) ----------------------------------------------------------- find data type for 0x7bc(reg3) at tcp_get_info+0x62 CU for net/ipv4/tcp.c (die:0x7b5f516) frame base: cfa=0 fbreg=6 offset: 1980 is bigger than size: 760 check variable "sk" failed (die: 0x7b92b2c) variable location: reg3 type='struct sock' size=0x2f8 (die:0x7b63c3a) ----------------------------------------------------------- ... The first case is fine. It looked up a data type in r15 with offset of 0xaf0 at schedule+0xeb. It found the CU die and the frame base info and the variable "rq" was found in the scope 3/4. Its location is the r15 register and the type size is 0xfc0 which includes 0xaf0. But the second case is not good. It looked up a data type in rbx (reg3) with offset 0x7bc. It found a CU and the frame base which is good so far. And it also found a variable "sk" but the access offset is bigger than the type size (1980 vs. 760 or 0x7bc vs. 0x2f8). The variable has the right location (reg3) but I need to figure out why it accesses beyond what it's supposed to. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240412183310.2518474-2-namhyung@kernel.org [ Fix the build on 32-bit by casting Dwarf_Word to (long) in pr_debug_location() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf bench uprobe: Add uretprobe variant of uprobe benchmarksIan Rogers
Name benchmarks with _ret at the end to avoid creating a new set of benchmarks. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrei Vagin <avagin@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Kook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240406040911.1603801-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf bench uprobe: Remove lib64 from libc.so.6 binary pathIan Rogers
bpf_program__attach_uprobe_opts will search LD_LIBRARY_PATH and so specifying `/lib64` is unnecessary and causes failures for libc.so.6 paths like `/lib/x86_64-linux-gnu/libc.so.6`. Fixes: 7b47623b8cae8149 ("perf bench uprobe trace_printk: Add entry attaching an BPF program that does a trace_printk") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrei Vagin <avagin@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Kook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240406040911.1603801-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf trace beauty: Add shellcheck to scriptsIan Rogers
Add shell check to scripts generating perf trace lookup tables. Fix quoting issue in arch_errno_names.sh. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20240409023216.2342032-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf util: Add shellcheck to generate-cmdlist.shIan Rogers
Add shellcheck to generate-cmdlist.sh to avoid basic shell script mistakes. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20240409023216.2342032-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf arch x86: Add shellcheck to buildIan Rogers
Add shellcheck for: tools/perf/arch/x86/tests/gen-insn-x86-dat.sh tools/perf/arch/x86/entry/syscalls/syscalltbl.sh Address a minor quoting issue. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20240409023216.2342032-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf build: Add shellcheck to tools/perf scriptsIan Rogers
Address shell check errors/warnings in perf-archive.sh and perf-completion.sh. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20240409023216.2342032-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf list: Escape '\r' in JSON outputIan Rogers
Events like for sapphirerapids have '\r' in the uncore descriptions. The non-escaped versions of this fail JSON validation the the 'perf list' test. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240410222353.1722840-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf dsos: Switch more loops to dsos__for_each_dso()Ian Rogers
Switch loops within dsos.c, add a version that isn't locked. Switch some unlocked loops to hold the read lock. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240410064214.2755936-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf dso: Move dso functions out of dsos.cIan Rogers
Move dso and dso_id functions to dso.c to match the struct declarations. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240410064214.2755936-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf dsos: Introduce dsos__for_each_dso()Ian Rogers
To better abstract the dsos internals, introduce dsos__for_each_dso that does a callback on each dso. This also means the read lock can be correctly held. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240410064214.2755936-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf dsos: Tidy reference counting and lockingIan Rogers
Move more functionality into dsos.c generally from machine.c, renaming functions to match their new usage. The find function is made to always "get" before returning a dso. Reduce the scope of locks in vdso to match the locking paradigm. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240410064214.2755936-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf dsos: Attempt to better abstract DSOs internalsIan Rogers
Move functions from machine and build-id to dsos. Pass 'struct dsos' rather than internal state. Rename some functions to better represent which data structure they operate on. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240410064214.2755936-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf record: Fix debug message placement for test consumptionAdrian Hunter
evlist__config() might mess up the debug output consumed by test "Test per-thread recording" in "Miscellaneous Intel PT testing". Move it out from between the debug prints: "perf record opening and mmapping events" and "perf record done opening and mmapping events" Fixes: da4062021e0e6da5 ("perf tools: Add debug messages and comments for testing") Closes: https://lore.kernel.org/linux-perf-users/ZhVfc5jYLarnGzKa@x1/ Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240411075447.17306-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate: Skip DSOs not foundNamhyung Kim
In some data file, I see the following messages repeated. It seems it doesn't have DSOs in the system and the dso->binary_type is set to DSO_BINARY_TYPE__NOT_FOUND. Let's skip them to avoid the followings. No output from objdump --start-address=0x0000000000000000 --stop-address=0x00000000000000d4 -d --no-show-raw-insn -C "$1" Error running objdump --start-address=0x0000000000000000 --stop-address=0x0000000000000631 -d --no-show-raw-insn -C "$1" ... Closes: https://lore.kernel.org/linux-perf-users/15e1a2847b8cebab4de57fc68e033086aa6980ce.camel@yandex.ru/ Reported-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240410185117.1987239-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf report: Do not collect sample histogram unnecessarilyNamhyung Kim
The data type profiling alone doesn't need the sample histogram for functions. It only needs the histogram for the types. Let's remove the condition in the report_callback to check if data type profiling is selected and make sure the annotation has the 'struct annotated_source' instantiated before calling symbol__disassemble(). Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf report: Add a menu item to annotate data type in TUINamhyung Kim
When the hist entry has the type info, it should be able to display the annotation browser for the type like in `perf annotate --data-type`. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate-data: Support event group display in TUINamhyung Kim
Like in stdio, it should print all events in a group together. Committer notes: Collect it: root@number:~# perf record -a -e '{cpu_core/mem-loads,ldlat=30/P,cpu_core/mem-stores/P}' ^C[ perf record: Woken up 8 times to write data ] [ perf record: Captured and wrote 4.980 MB perf.data (55825 samples) ] root@number:~# Then do it in stdio: root@number:~# perf annotate --stdio --data-type Annotate type: 'union ' in /usr/lib64/libc.so.6 (1131 samples): event[0] = cpu_core/mem-loads,ldlat=30/P event[1] = cpu_core/mem-stores/P ============================================================================ Percent offset size field 100.00 100.00 0 40 union { 100.00 100.00 0 40 struct __pthread_mutex_s __data { 48.61 23.46 0 4 int __lock; 0.00 0.48 4 4 unsigned int __count; 6.38 41.32 8 4 int __owner; 8.74 34.02 12 4 unsigned int __nusers; 35.66 0.26 16 4 int __kind; 0.61 0.45 20 2 short int __spins; 0.00 0.00 22 2 short int __elision; 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0 0 char* __size; 48.61 23.94 0 8 long int __align; }; Now with TUI before this patch: root@number:~# perf annotate --tui --data-type Annotate type: 'union ' (790 samples) Percent Offset Size Field 100.00 0 40 union { 100.00 0 40 struct __pthread_mutex_s __data { 48.61 0 4 int __lock; 0.00 4 4 unsigned int __count; 6.38 8 4 int __owner; 8.74 12 4 unsigned int __nusers; 35.66 16 4 int __kind; 0.61 20 2 short int __spins; 0.00 22 2 short int __elision; 0.00 24 16 __pthread_list_t __list { 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 32 8 struct __pthread_internal_list* __next; 0.00 0 0 char* __size; 48.61 0 8 long int __align; }; And now after this patch: Annotate type: 'union ' (790 samples) Percent Offset Size Field 100.00 100.00 0 40 union { 100.00 100.00 0 40 struct __pthread_mutex_s __data { 48.61 23.46 0 4 int __lock; 0.00 0.48 4 4 unsigned int __count; 6.38 41.32 8 4 int __owner; 8.74 34.02 12 4 unsigned int __nusers; 35.66 0.26 16 4 int __kind; 0.61 0.45 20 2 short int __spins; 0.00 0.00 22 2 short int __elision; 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0 0 char* __size; 48.61 23.94 0 8 long int __align; }; On a followup patch the --tui output should have this that is present in --stdio: And the --stdio has all the missing info in TUI: Annotate type: 'union ' in /usr/lib64/libc.so.6 (1131 samples): event[0] = cpu_core/mem-loads,ldlat=30/P event[1] = cpu_core/mem-stores/P Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate-data: Add hist_entry__annotate_data_tui()Namhyung Kim
Support data type profiling output on TUI. Testing from Arnaldo: First make sure that the debug information for your workload binaries in embedded in them by building it with '-g' or install the debuginfo packages, since our workload is 'find': root@number:~# type find find is hashed (/usr/bin/find) root@number:~# rpm -qf /usr/bin/find findutils-4.9.0-5.fc39.x86_64 root@number:~# dnf debuginfo-install findutils <SNIP> root@number:~# Then collect some data: root@number:~# echo 1 > /proc/sys/vm/drop_caches root@number:~# perf mem record find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ] root@number:~# Finally do data-type annotation with the following command, that will default, as 'perf report' to the --tui mode, with lines colored to highlight the hotspots, etc. root@number:~# perf annotate --data-type Annotate type: 'struct predicate' (58 samples) Percent Offset Size Field 100.00 0 312 struct predicate { 0.00 0 8 PRED_FUNC pred_func; 0.00 8 8 char* p_name; 0.00 16 4 enum predicate_type p_type; 0.00 20 4 enum predicate_precedence p_prec; 0.00 24 1 _Bool side_effects; 0.00 25 1 _Bool no_default_print; 0.00 26 1 _Bool need_stat; 0.00 27 1 _Bool need_type; 0.00 28 1 _Bool need_inum; 0.00 32 4 enum EvaluationCost p_cost; 0.00 36 4 float est_success_rate; 0.00 40 1 _Bool literal_control_chars; 0.00 41 1 _Bool artificial; 0.00 48 8 char* arg_text; <SNIP> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate-data: Add hist_entry__annotate_data_tty()Namhyung Kim
And move the related code into util/annotate-data.c file. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate: Show progress of sample processingNamhyung Kim
Like 'perf report', it can take a while to process samples. Show a progress window to inform users how that it is not stuck. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf annotate-data: Skip sample histogram for stack canaryNamhyung Kim
It's a pseudo data type and has no field. Fixes: b3c95109c131fcc9 ("perf annotate-data: Add stack canary type") Closes: https://lore.kernel.org/lkml/Zhb6jJneP36Z-or0@x1 Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240411033256.2099646-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-12perf tests: Remove dependency on lscpuJames Clark
This check can be done with uname which is more portable. At the same time re-arrange it into a standard if statement so that it's more readable. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Spoorthy S <spoorts2@in.ibm.com> Link: https://lore.kernel.org/r/20240410103458.813656-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>