summaryrefslogtreecommitdiff
path: root/tools/perf/builtin-lock.c
AgeCommit message (Collapse)Author
2025-05-31perf lock contention: Reject more than 10ms delays for safetyNamhyung Kim
Delaying kernel operations can be dangerous and the kernel may kill (non-sleepable) BPF programs running for long in the future. Limit the max delay to 10ms and update the document about it. $ sudo ./perf lock con -abl -J 100000us@cgroup_mutex true lock delay is too long: 100000us (> 10ms) Usage: perf lock contention [<options>] -J, --inject-delay <TIME@FUNC> Inject delays to specific locks Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250515181042.555189-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf lock contention: Add -J/--inject-delay optionNamhyung Kim
This is to slow down lock acquistion (on contention locks) deliberately. A possible use case is to estimate impact on application performance by optimization of kernel locking behavior. By delaying the lock it can simulate the worse condition as a control group, and then compare with the current behavior as a optimized condition. The syntax is 'time@function' and the time can have unit suffix like "us" and "ms". For example, I ran a simple test like below. $ sudo perf lock con -abl -L tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 92 1.18 ms 199.54 us 12.79 us ffffffff8a806080 tasklist_lock (rwlock) The contention count was 92 and the average wait time was around 10 us. But if I add 100 usec of delay to the tasklist_lock, $ sudo perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 190 15.67 ms 230.10 us 82.46 us ffffffff8a806080 tasklist_lock (rwlock) The contention count increased and the average wait time was up closed to 100 usec. If I increase the delay even more, $ sudo perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1002 2.80 s 3.01 ms 2.80 ms ffffffff8a806080 tasklist_lock (rwlock) Now every sleep process had contention and the wait time was more than 1 msec. This is on my 4 CPU laptop so I guess one CPU has the lock while other 3 are waiting for it mostly. For simplicity, it only supports global locks for now. Committer testing: root@number:~# grep -m1 'model name' /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processor root@number:~# perf lock con -abl -L tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 142 453.85 us 25.39 us 3.20 us ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1040 2.39 s 3.11 ms 2.30 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1025 24.72 s 31.01 ms 24.12 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250509171950.183591-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-02-28perf lock: Report owner stack in usermodeChun-Tse Shao
This patch parses `owner_lock_stat` into a RB tree, enabling ordered reporting of owner lock statistics with stack traces. It also updates the documentation for the `-o` option in contention mode, decouples `-o` from `-t`, and issues a warning to inform users about the new behavior of `-ov`. Example output: $ sudo ~/linux/tools/perf/perf lock con -abvo -Y mutex-spin -E3 perf bench sched pipe ... contended total wait max wait avg wait type caller 171 1.55 ms 20.26 us 9.06 us mutex pipe_read+0x57 0xffffffffac6318e7 pipe_read+0x57 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 36 193.71 us 15.27 us 5.38 us mutex pipe_write+0x50 0xffffffffac631ee0 pipe_write+0x50 0xffffffffac6241db vfs_write+0x3bb 0xffffffffac6244ab ksys_write+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 4 51.22 us 16.47 us 12.80 us mutex do_epoll_wait+0x24d 0xffffffffac691f0d do_epoll_wait+0x24d 0xffffffffac69249b do_epoll_pwait.part.0+0xb 0xffffffffac693ba5 __x64_sys_epoll_pwait+0x95 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 === owner stack trace === 3 31.24 us 15.27 us 10.41 us mutex pipe_read+0x348 0xffffffffac631bd8 pipe_read+0x348 0xffffffffac623862 vfs_read+0x332 0xffffffffac62434b ksys_read+0xbb 0xfffffffface604b2 do_syscall_64+0x82 0xffffffffad00012f entry_SYSCALL_64_after_hwframe+0x76 ... Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-5-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-28perf lock: Make rb_tree helper functions genericChun-Tse Shao
The rb_tree helper functions can be reused for parsing `owner_lock_stat` into rb tree for sorting. Signed-off-by: Chun-Tse Shao <ctshao@google.com> Tested-by: Athira Rajeev <atrajeev@linux.ibm.com> Link: https://lore.kernel.org/r/20250227003359.732948-4-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17perf lock: Rename fields in lock_type_tableChun-Tse Shao
`lock_type_table` contains `name` and `str` which can be confusing. Rename them to `flags_name` and `lock_name` and add descriptions to enhance understanding. Tested by building perf for x86. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-3-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17perf lock: Add percpu-rwsem for type filterChun-Tse Shao
percpu-rwsem was missing in man page. And for backward compatibility, replace `pcpu-sem` with `percpu-rwsem` before parsing lock name. Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y percpu-rwsem` Fixes: 4f701063bfa2 ("perf lock contention: Show lock type with address") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17perf lock: Fix parse_lock_type which only retrieve one lock flagChun-Tse Shao
`parse_lock_type` can only add the first lock flag in `lock_type_table` given input `str`. For example, for `Y rwlock`, it only adds `rwlock:R` into this perf session. Another example is for `-Y mutex`, it only adds the mutex without `LCB_F_SPIN` flag. The patch fixes this issue, makes sure both `rwlock:R` and `rwlock:W` will be added with `-Y rwlock`, and so on. Testing: $ ./perf lock con -ab -Y mutex,rwlock -- perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 9.313 [sec] 9.313976 usecs/op 107365 ops/sec contended total wait max wait avg wait type caller 176 1.65 ms 19.43 us 9.38 us mutex pipe_read+0x57 34 180.14 us 10.93 us 5.30 us mutex pipe_write+0x50 7 77.48 us 16.09 us 11.07 us mutex do_epoll_wait+0x24d 7 74.70 us 13.50 us 10.67 us mutex do_epoll_wait+0x24d 3 35.97 us 14.44 us 11.99 us rwlock:W ep_done_scan+0x2d 3 35.00 us 12.23 us 11.66 us rwlock:W do_epoll_wait+0x255 2 15.88 us 11.96 us 7.94 us rwlock:W do_epoll_wait+0x47c 1 15.23 us 15.23 us 15.23 us rwlock:W do_epoll_wait+0x4d0 1 14.26 us 14.26 us 14.26 us rwlock:W ep_done_scan+0x2d 2 14.00 us 7.99 us 7.00 us mutex pipe_read+0x282 1 12.29 us 12.29 us 12.29 us rwlock:R ep_poll_callback+0x35 1 12.02 us 12.02 us 12.02 us rwlock:W do_epoll_ctl+0xb65 1 10.25 us 10.25 us 10.25 us rwlock:R ep_poll_callback+0x35 1 7.86 us 7.86 us 7.86 us mutex do_epoll_ctl+0x6c1 1 5.04 us 5.04 us 5.04 us mutex do_epoll_ctl+0x3d4 [namhyung: Add a comment and rename to 'mutex:spin' for consistency Fixes: d783ea8f62c4 ("perf lock contention: Simplify parse_lock_type()") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-1-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17perf lock: Fix return code for functions in __cmd_contentionAthira Rajeev
perf lock contention returns zero exit value even if the lock contention BPF setup failed. # ./perf lock con -b true libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: Error loading vmlinux BTF: -ESRCH libbpf: failed to load object 'lock_contention_bpf' libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH Failed to load lock-contention BPF skeleton lock contention BPF setup failed # echo $? 0 Fix this by saving the return code for lock_contention_prepare so that command exits with proper return code. Similarly set the return code properly for two other functions in builtin-lock, namely setup_output_field() and select_key(). Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250110093730.93610-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-12-23perf lock contention: Handle slab objects in -L/--lock-filter optionNamhyung Kim
This is to filter lock contention from specific slab objects only. Like in the lock symbol output, we can use '&' prefix to filter slab object names. root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 3 14.99 us 14.44 us 5.00 us ffffffff851c0940 pack_mutex (mutex) 2 2.75 us 2.56 us 1.38 us ffff98d7031fb498 &task_struct (mutex) 4 1.42 us 557 ns 355 ns ffff98d706311400 &kmalloc-cg-512 (mutex) 2 953 ns 714 ns 476 ns ffffffff851c3620 delayed_uprobe_lock (mutex) 1 929 ns 929 ns 929 ns ffff98d7031fb538 &task_struct (mutex) 3 561 ns 210 ns 187 ns ffffffff84a8b3a0 text_mutex (mutex) 1 479 ns 479 ns 479 ns ffffffff851b4cf8 tracepoint_srcu_srcu_usage (mutex) 2 320 ns 195 ns 160 ns ffffffff851cf840 pcpu_alloc_mutex (mutex) 1 212 ns 212 ns 212 ns ffff98d7031784d8 &signal_cache (mutex) 1 177 ns 177 ns 177 ns ffffffff851b4c28 tracepoint_srcu_srcu_usage (mutex) With the filter, it can show contentions from the task_struct only. root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -abl -L '&task_struct' sleep 1 contended total wait max wait avg wait address symbol 2 1.97 us 1.71 us 987 ns ffff98d7032fd658 &task_struct (mutex) 1 1.20 us 1.20 us 1.20 us ffff98d7032fd6f8 &task_struct (mutex) It can work with other aggregation mode: root@virtme-ng:/home/namhyung/project/linux# tools/perf/perf lock con -ab -L '&task_struct' sleep 1 contended total wait max wait avg wait type caller 1 25.10 us 25.10 us 25.10 us mutex perf_event_exit_task+0x39 1 21.60 us 21.60 us 21.60 us mutex futex_exit_release+0x21 1 5.56 us 5.56 us 5.56 us mutex futex_exec_release+0x21 Committer testing: root@number:~# perf lock con -abl sleep 1 contended total wait max wait avg wait address symbol 1 20.80 us 20.80 us 20.80 us ffff9d417fbd65d0 (spinlock) 8 12.85 us 2.41 us 1.61 us ffff9d415eeb6a40 rq_lock (spinlock) 1 2.55 us 2.55 us 2.55 us ffff9d415f636a40 rq_lock (spinlock) 7 1.92 us 840 ns 274 ns ffff9d39c2cbc8c4 (spinlock) 1 1.23 us 1.23 us 1.23 us ffff9d415fb36a40 rq_lock (spinlock) 2 928 ns 738 ns 464 ns ffff9d39c1fa6660 &kmalloc-rnd-14-192 (rwlock) 4 788 ns 252 ns 197 ns ffffffffb8608a80 jiffies_lock (spinlock) 1 304 ns 304 ns 304 ns ffff9d39c2c979c4 (spinlock) 1 216 ns 216 ns 216 ns ffff9d3a0225c660 &kmalloc-rnd-14-192 (rwlock) 1 89 ns 89 ns 89 ns ffff9d3a0adbf3e0 &kmalloc-rnd-14-192 (rwlock) 1 61 ns 61 ns 61 ns ffff9d415f9b6a40 rq_lock (spinlock) root@number:~# uname -r 6.13.0-rc2 root@number:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-20perf lock contention: Add and use LCB_F_TYPE_MASKNamhyung Kim
This is a preparation for the later change. It'll use more bits in the flags so let's rename the type part and use the mask to extract the type. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kees Cook <kees@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20241220060009.507297-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-18perf lock: Move common lock contention code to new fileIan Rogers
Avoid references from util code to builtin-lock that require python stubs. Move the functions and related variables to util/lock-contention.c. Add max_stack_depth parameter to match_callstack_filter to avoid sharing a global variable. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20241119011644.971342-16-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04libsubcmd: Don't free the usage stringAditya Gupta
Currently, commands which depend on 'parse_options_subcommand()' don't show the usage string, and instead show '(null)' $ ./perf sched Usage: (null) -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) 'parse_options_subcommand()' is generally expected to initialise the usage string, with information in the passed 'subcommands[]' array This behaviour was changed in: 230a7a71f92212e7 ("libsubcmd: Fix parse-options memory leak") Where the generated usage string is deallocated, and usage[0] string is reassigned as NULL. As discussed in [1], free the allocated usage string in the main function itself, and don't reset usage string to NULL in parse_options_subcommand With this change, the behaviour is restored. $ ./perf sched Usage: perf sched [<options>] {record|latency|map|replay|script|timehist} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) [1]: https://lore.kernel.org/linux-perf-users/htq5vhx6piet4nuq2mmhk7fs2bhfykv52dbppwxmo3s7du2odf@styd27tioc6e/ Fixes: 230a7a71f92212e7 ("libsubcmd: Fix parse-options memory leak") Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904061836.55873-2-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf lock: Use perf_tool__init()Ian Rogers
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf tool: Constify tool pointersIan Rogers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-03perf lock info: Display both map and thread by defaultNick Forrington
Change "perf lock info" argument handling to: Display both map and thread info (rather than an error) when neither are specified. Display both map and thread info (rather than just thread info) when both are requested. Signed-off-by: Nick Forrington <nick.forrington@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240513091413.738537-2-nick.forrington@arm.com
2024-05-10perf lock: Avoid memory leaks from strdup()Ian Rogers
Leak sanitizer complains about the strdup-ed arguments not being freed and given cmd_record doesn't modify the given strings, remove the strdups. Original discussion in this patch: https://lore.kernel.org/lkml/20240430184156.1824083-1-irogers@google.com/ Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240509053123.1918093-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-27perf lock: Fix a memory leak on an error pathzhaimingbing
if a strdup-ed string is NULL,the allocated memory needs freeing. Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20231124092657.10392-1-zhaimingbing@cmss.chinamobile.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-10-12perf lock: Fix a memory leak on an error pathIan Rogers
If a memory allocation fails then the strdup-ed string needs freeing. Detected by clang-tidy. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: llvm@lists.linux.dev Cc: Ming Wang <wangming01@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20231009183920.200859-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-10-04tools/perf: Update call stack check in builtin-lock.cKajol Jain
The perf test named "kernel lock contention analysis test" fails in powerpc system with below error: [command]# ./perf test 81 -vv 81: kernel lock contention analysis test : --- start --- test child forked, pid 2140 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf [Skip] No BPF support Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --type-filter (w/ spinlock) Testing perf lock contention --lock-filter (w/ tasklist_lock) Testing perf lock contention --callstack-filter (w/ unix_stream) [Fail] Recorded result should have a lock from unix_stream: test child finished with -1 ---- end ---- kernel lock contention analysis test: FAILED! The test is failing because we get an address entry with 0 in perf lock samples for powerpc, and code for lock contention option "--callstack-filter" will not check further entries after address 0. Below are some of the samples from test generated perf.data file, which have 0 address in the 2nd entry of callstack: -------- sched-messaging 3409 [001] 7152.904029: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN) c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms]) 0 [unknown] ([unknown]) c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms]) c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms]) c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms]) c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms]) c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms]) c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms]) sched-messaging 3408 [005] 7152.904036: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN) c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms]) 0 [unknown] ([unknown]) c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms]) c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms]) c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms]) c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms]) c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms]) c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms]) -------- Based on commit 20002ded4d93 ("perf_counter: powerpc: Add callchain support"), incase of powerpc, the callchain saved by kernel always includes first three entries as the NIP (next instruction pointer), LR (link register), and the contents of LR save area in the second stack frame. In certain scenarios its possible to have invalid kernel instruction addresses in either of LR or the second stack frame's LR. In that case, kernel will store the address as zer0. Hence, its possible to have 2nd or 3rd callstack entry as 0. As per the current code in match_callstack_filter function, we skip the callstack check incase we get 0 address. And hence the test case is failing in powerpc. Fix this issue by updating the check in match_callstack_filter function, to not skip callstack check if the 2nd or 3rd entry have 0 address for powerpc. Result in powerpc after patch changes: [command]# ./perf test 81 -vv 81: kernel lock contention analysis test : --- start --- test child forked, pid 4570 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf [Skip] No BPF support Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --type-filter (w/ spinlock) Testing perf lock contention --lock-filter (w/ tasklist_lock) [Skip] Could not find 'tasklist_lock' Testing perf lock contention --callstack-filter (w/ unix_stream) Testing perf lock contention --callstack-filter with task aggregation Testing perf lock contention CSV output [Skip] No BPF support test child finished with 0 ---- end ---- kernel lock contention analysis test: Ok Fixes: ebab291641be ("perf lock contention: Support filters for different aggregation") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: maddy@linux.ibm.com Cc: atrajeev@linux.vnet.ibm.com Link: https://lore.kernel.org/r/20231003092113.252380-1-kjain@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-09-12perf lock contention: Add -G/--cgroup-filter optionNamhyung Kim
The -G/--cgroup-filter is to limit lock contention collection on the tasks in the specific cgroups only. $ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \ ./perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.174 [sec] contended total wait max wait avg wait pid comm 4 114.45 us 60.06 us 28.61 us 214847 sched-messaging 2 111.40 us 60.84 us 55.70 us 214848 sched-messaging 2 106.09 us 59.42 us 53.04 us 214837 sched-messaging 1 81.70 us 81.70 us 81.70 us 214709 sched-messaging 68 78.44 us 6.83 us 1.15 us 214633 sched-messaging 69 73.71 us 2.69 us 1.07 us 214632 sched-messaging 4 72.62 us 60.83 us 18.15 us 214850 sched-messaging 2 71.75 us 67.60 us 35.88 us 214840 sched-messaging 2 69.29 us 67.53 us 34.65 us 214804 sched-messaging 2 69.00 us 68.23 us 34.50 us 214826 sched-messaging ... Export cgroup__new() function as it's needed from outside. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf lock contention: Add --lock-cgroup optionNamhyung Kim
The --lock-cgroup option shows lock contention stats break down by cgroups. Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field. $ sudo ./perf lock con -ab --lock-cgroup sleep 1 contended total wait max wait avg wait cgroup 8 15.70 us 6.34 us 1.96 us / 2 1.48 us 747 ns 738 ns /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope 1 848 ns 848 ns 848 ns /user.slice/.../session.slice/org.gnome.Shell@x11.service 1 220 ns 220 ns 220 ns /user.slice/.../session.slice/pipewire-pulse.service For now, the cgroup mode only works with BPF (-b). Committer notes: Remove -g as it is used in the other tools with a clear meaning of collect/show callchains. As agreed with Namhyung off list. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-12perf lock contention: Prepare to handle cgroupsNamhyung Kim
Save cgroup info and display cgroup names if requested. This is a preparation for the next patch. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230906174903.346486-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-25perf lock contention: Fix typo in max-stack option descriptionKajol Jain
Fix typo in max-stack option description by changing lopck contention to lock contention. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230825104700.440809-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-17perf lock: Don't pass an ERR_PTR() directly to perf_session__delete()Arnaldo Carvalho de Melo
While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling: perf_session__delete(ERR_PTR(-1)) Resulting in: (gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 #1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 #2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 #3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 #4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 #7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb) So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported. Fixes: eef4fee5e52071d5 ("perf lock: Dynamically allocate lockhash_table") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/lkml/ZN4R1AYfsD2J8lRs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-07-01perf lock contention: Add --output optionNamhyung Kim
To avoid formatting failures for example in CSV output due to debug messages, add --output option to put the result in a file. Unfortunately the short -o option was taken by the --owner already. $ sudo ./perf lock con -ab --output lock-out.txt -v sleep 1 Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols $ head lock-out.txt contended total wait max wait avg wait type caller 3 76.79 us 26.89 us 25.60 us rwlock:R ep_poll_callback+0x2d 0xffffffff9a23f4b5 _raw_read_lock_irqsave+0x45 0xffffffff99bbd4dd ep_poll_callback+0x2d 0xffffffff999029f3 __wake_up_common+0x73 0xffffffff99902b82 __wake_up_common_lock+0x82 0xffffffff99fa5b1c sock_def_readable+0x3c 0xffffffff9a11521d unix_stream_sendmsg+0x18d 0xffffffff99f9fc9c sock_sendmsg+0x5c Suggested-by: Ian Rogers <irogers@google.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-07-01perf lock contention: Add -x option for CSV style outputNamhyung Kim
Sometimes we want to process the output by external programs. Let's add the -x option to specify the field separator like perf stat. $ sudo ./perf lock con -ab -x, sleep 1 # output: contended, total wait, max wait, avg wait, type, caller 19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0 15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e 4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d 1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135 8, 67608, 27404, 8451, spinlock, __queue_work+0x174 3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff 3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248 2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad 1, 24619, 24619, 24619, spinlock, rcu_core+0xd4 The first line is a comment that shows the output format. Each line is separated by the given string ("," in this case). The time is printed in nsec without the unit so that it can be parsed easily. The characters can be used in the output like (":", "+" and ".") are not allowed for the -x option. $ ./perf lock con -x: Cannot use the separator that is already used Usage: perf lock contention [<options>] -x, --field-separator <separator> print result in CSV format with custom separator The stacktraces are printed in the same line separated by ":". The header is updated to show the stacktrace. Also the debug output is added at the end as a comment. $ sudo ./perf lock con -abv -x, -F wait_total sleep 1 Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols # output: total wait, type, caller, stacktrace 37134, spinlock, rcu_core+0xd4, 0xffffffff9d0401e4 _raw_spin_lock_irqsave+0x44: 0xffffffff9c738114 rcu_core+0xd4: ... 21213, spinlock, raw_spin_rq_lock_nested+0x1b, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9c6d9cfb raw_spin_rq_lock_nested+0x1b: ... 20506, rwlock:W, ep_done_scan+0x2d, 0xffffffff9c9bc4dd ep_done_scan+0x2d: 0xffffffff9c9bd5f1 do_epoll_wait+0x6d1: ... 18044, rwlock:R, ep_poll_callback+0x2d, 0xffffffff9d040555 _raw_read_lock_irqsave+0x45: 0xffffffff9c9bc81d ep_poll_callback+0x2d: ... 17890, rwlock:W, do_epoll_wait+0x47b, 0xffffffff9c9bd39b do_epoll_wait+0x47b: 0xffffffff9c9be9ef __x64_sys_epoll_wait+0x6d1: ... 12114, spinlock, futex_wait_queue+0x60, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9d037cae __schedule+0xbe: ... # debug: total=7, bad=0, bad_task=0, bad_stack=0, bad_time=0, bad_data=0 Also note that some field (like lock symbols) can be empty. $ sudo ./perf lock con -abl -x, -E 10 sleep 1 # output: contended, total wait, max wait, avg wait, address, symbol, type 6, 275025, 61764, 45837, ffff9dcc9f7d60d0, , spinlock 18, 87716, 11196, 4873, ffff9dc540059000, , spinlock 2, 6472, 5499, 3236, ffff9dcc7f730e00, rq_lock, spinlock 3, 4429, 2341, 1476, ffff9dcc7f7b0e00, rq_lock, spinlock 3, 3974, 1635, 1324, ffff9dcc7f7f0e00, rq_lock, spinlock 4, 3290, 1326, 822, ffff9dc5f4e2cde0, , rwlock 3, 2894, 1023, 964, ffffffff9e0d7700, rcu_state, spinlock 1, 2567, 2567, 2567, ffff9dcc7f6b0e00, rq_lock, spinlock 4, 1259, 596, 314, ffff9dc69c2adde0, , rwlock 1, 934, 934, 934, ffff9dcc7f670e00, rq_lock, spinlock Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-07-01perf lock: Remove stale commentsNamhyung Kim
The comment was for symbol_conf.sort_by_name which was deleted already. Let's get rid of the stale comments as well. Acked-by: Ian Rogers <irogers@google.com> Cc: Hao Luo <haoluo@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230628200141.2739587-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-23perf symbol: Remove now unused symbol_conf.sort_by_nameIan Rogers
Previously used to specify symbol_name_rb_node was in use. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-12perf callchain: Use pthread keys for tls callchain_cursorIan Rogers
Pthread keys are more portable than __thread and allow the association of a destructor with the key. Use the destructor to clean up TLS callchain cursors to aid understanding memory leaks. Committer notes: Had to fixup a series of unconverted places and also check for the return of get_tls_callchain_cursor() as it may fail and return NULL. In that unlikely case we now either print something to a file, if the caller was expecting to print a callchain, or return an error code to state that resolving the callchain isn't possible. In some cases this was made easier because thread__resolve_callchain() already can fail for other reasons, so this new one (cursor == NULL) can be added and the callers don't have to explicitely check for this new condition. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-25-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-28perf lock: Dynamically allocate lockhash_tableIan Rogers
lockhash_table is 32,768 bytes in .bss, make it a memory allocation so that the space is freed for non-lock perf commands. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20230526183401.2326121-10-irogers@google.com Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-06Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"Arnaldo Carvalho de Melo
This reverts commit a980755beb5aca9002e1c95ba519b83a44242b5b. We need to better polish building with BPF skels, so revert back to making it an experimental feature that has to be explicitely enabled using BUILD_BPF_SKEL=1. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf map: Add helper for ->map_ip() and ->unmap_ip()Ian Rogers
Later changes will add reference count checking for struct map, add a helper function to invoke the map_ip and unmap_ip function pointers. The helper allows the reference count check to be in fewer places. Committer notes: Add missing conversions to: tools/perf/util/map.c tools/perf/util/cs-etm.c tools/perf/util/annotate.c tools/perf/arch/powerpc/util/sym-handling.c tools/perf/arch/s390/annotate/instructions.c Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Revise needs_callstack() conditionNamhyung Kim
It needs callstacks for two reasons: * for stack aggregation mode, the map key is the stack id and it can also show the full stack traces when -v is used * for other aggregation modes, the stack filter can be used to limit lock contentions from known call paths The -v option is meaningful (in terms of stack trace) only for stack aggregation mode, so it should not set the save_callstack for other mode like with -t or -l options. I've noticed this with the following command line: $ sudo ./perf lock con -ablv -E 3 -M 16 -- ./perf bench sched messaging ... contended total wait max wait avg wait address symbol 88 4.59 ms 108.07 us 52.13 us ffff935757f46ec0 (spinlock) 33 905.22 us 73.67 us 27.43 us ffff935757f41700 (spinlock) 28 703.69 us 79.28 us 25.13 us ffff938a3d9b0c80 rq_lock (spinlock) === output for debug === bad: 12272, total: 12421 bad rate: 98.80 % histogram of failure reasons task: 8285 stack: 3987 <---------- here time: 0 data: 0 It should not have any failure on stacks since it doesn't use it. No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Update total/bad stats for hidden entriesNamhyung Kim
When -E option is used, it only prints the given number of entries but the event stat at the end should have the numbers for entire entries. Likewise, -S option will hide entries that don't have the named function in the callstack. Also update event stat for them. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Add data failure statNamhyung Kim
It's possible to fail to update the data when the lock_stat map is full. We should check that case and show the number at the end. $ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging ... contended total wait max wait avg wait address symbol 6157 208.48 ms 69.29 us 33.86 us ffff934c001c1f00 (spinlock) 4030 72.04 ms 61.84 us 17.88 us ffff934c000415c0 (spinlock) 3201 50.30 ms 47.73 us 15.71 us ffff934c2eead850 (spinlock) === output for debug === bad: 0, total: 13388 bad rate: 0.00 % histogram of failure reasons task: 0 stack: 0 time: 0 data: 0 <----- added Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Update default map size to 16384Namhyung Kim
The BPF hash map will align the map size to a power of 2. So 10k would be 16k anyway. Let's have the actual size to avoid confusions. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Use -M for --map-nr-entriesNamhyung Kim
Users often want to change the map size, let's add a short option (-M) for that. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Simplify parse_lock_type()Namhyung Kim
The get_type_flag() should check both str and name fields in the lock_type_table so that it can find the appropriate flag without retrying with ':R' or ':W' suffix from the caller. Also fix a typo in the rt-mutex. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf lock contention: Show detail failure reason for BPFNamhyung Kim
It can fail to collect lock stat from BPF for various reasons. For example, I've got a report that sometimes time calculation seems wrong in case of contended spinlocks. I suspect the time delta went negative for some reason. Count them separately and show in the output like below: $ sudo perf lock contention -abE5 sleep 10 contended total wait max wait avg wait type caller 13 785.61 us 79.36 us 60.43 us spinlock remove_wait_queue+0x14 10 469.02 us 87.51 us 46.90 us spinlock prepare_to_wait+0x27 9 289.09 us 69.08 us 32.12 us spinlock finish_wait+0x36 114 251.05 us 8.56 us 2.20 us spinlock try_to_wake_up+0x1f5 132 188.63 us 5.01 us 1.43 us spinlock __wake_up_common_lock+0x62 === output for debug === bad: 1, total: 279 bad rate: 0.36 % histogram of failure reasons task: 1 stack: 0 time: 0 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf lock contention: Fix debug stat if no contentionNamhyung Kim
It should not divide if the total number is 0. Otherwise it'd show NaN in the bad rate output. Also add a whitespace in the "output for debug" message. $ sudo perf lock contention -abv true Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols contended total wait max wait avg wait type caller === output for debug=== bad: 0, total: 0 bad rate: -nan % <------------------------- (here) histogram of events caused bad sequence acquire: 0 acquired: 0 contended: 0 release: 0 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-14perf lock contention: Show lock type with addressNamhyung Kim
Show lock type names after the symbol of locks if any. This can be useful especially when it doesn't show the lock symbols. The indentation before the lock type parenthesis is to recognize lock symbols more easily. $ sudo ./perf lock con -abl -- sleep 1 contended total wait max wait avg wait address symbol 44 6.13 ms 284.49 us 139.28 us ffffffff92e06080 tasklist_lock (rwlock) 159 983.38 us 12.38 us 6.18 us ffff8cc717c90000 siglock (spinlock) 10 679.90 us 153.35 us 67.99 us ffff8cdc2872aaf8 mmap_lock (rwsem) 9 558.11 us 180.67 us 62.01 us ffff8cd647914038 mmap_lock (rwsem) 78 228.56 us 7.82 us 2.93 us ffff8cc700061c00 (spinlock) 5 41.60 us 16.93 us 8.32 us ffffd853acb41468 (spinlock) 10 37.24 us 5.87 us 3.72 us ffff8cd560b5c200 siglock (spinlock) 4 11.17 us 3.97 us 2.79 us ffff8d053ddf0c80 rq_lock (spinlock) 1 7.86 us 7.86 us 7.86 us ffff8cd64791404c (spinlock) 1 4.13 us 4.13 us 4.13 us ffff8d053d930c80 rq_lock (spinlock) 7 3.98 us 1.67 us 568 ns ffff8ccb92479440 (mutex) 2 2.62 us 2.33 us 1.31 us ffff8cc702e6ede0 (rwlock) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-14perf lock contention: Track and show siglock with addressNamhyung Kim
Likewise, we can display siglock by following the pointer like current->sighand->siglock. $ sudo ./perf lock con -abl -- sleep 1 contended total wait max wait avg wait address symbol 16 2.18 ms 305.35 us 136.34 us ffffffff92e06080 tasklist_lock 28 521.78 us 31.16 us 18.63 us ffff8cc703783ec4 7 119.03 us 23.55 us 17.00 us ffff8ccb92479440 15 88.29 us 10.06 us 5.89 us ffff8cd560b5f380 siglock 7 37.67 us 9.16 us 5.38 us ffff8d053daf0c80 5 8.81 us 4.92 us 1.76 us ffff8d053d6b0c80 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-14perf lock contention: Track and show mmap_lock with addressNamhyung Kim
Sometimes there are severe contentions on the mmap_lock and we want see it in the -l/--lock-addr output. However it cannot symbolize the mmap_lock because it's allocated dynamically without symbols. Stephane and Hao gave me an idea separately to display mmap_lock by following the current->mm pointer. I added a flag to mark mmap_lock after comparing the lock address so that it can show them differently. With this change it can show mmap_lock like below: $ sudo ./perf lock con -abl -- sleep 10 contended total wait max wait avg wait address symbol ... 16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640 17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0 3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock 3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58 1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70 9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock 14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0 3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock 16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560 11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock 1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8 1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock 581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058 5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070 112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120 381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock 255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80 Note that mmap_lock was renamed some time ago and it needs to support old kernels with a different name 'mmap_sem'. Suggested-by: Hao Luo <haoluo@google.com> Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-14perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKELIan Rogers
BPF skeleton support is now key to a number of perf features. Rather than making it so that BPF support must be enabled for the build, make this the default and error if the build lacks a clang and libbpf that are sufficient. To avoid the error and build without BPF skeletons the NO_BPF_SKEL=1 flag can be used. Add a build-options flag to 'perf version' to enable detection of the BPF skeleton support and use this in the offcpu shell test. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pavithra Gurushankar <gpavithrasha@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Roberto Sassu <roberto.sassu@huawei.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Tom Rix <trix@redhat.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230311065753.3012826-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-08perf lock contention: Add -o/--lock-owner optionNamhyung Kim
When there're many lock contentions in the system, people sometimes want to know who caused the contention, IOW who's the owner of the locks. The -o/--lock-owner option tries to follow the lock owners for the contended mutexes and rwsems from BPF, and then attributes the contention time to the owner instead of the waiter. It's a best effort approach to get the owner info at the time of the contention and doesn't guarantee to have the precise tracking of owners if it's changing over time. Currently it only handles mutex and rwsem that have owner field in their struct and it basically points to a task_struct that owns the lock at the moment. Technically its type is atomic_long_t and it comes with some LSB bits used for other meanings. So it needs to clear them when casting it to a pointer to task_struct. Also the atomic_long_t is a typedef of the atomic 32 or 64 bit types depending on arch which is a wrapper struct for the counter value. I'm not aware of proper ways to access those kernel atomic types from BPF so I just read the internal counter value directly. Please let me know if there's a better way. When -o/--lock-owner option is used, it goes to the task aggregation mode like -t/--threads option does. However it cannot get the owner for other lock types like spinlock and sometimes even for mutex. $ sudo ./perf lock con -abo -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 4.766 [sec] 4.766540 usecs/op 209795 ops/sec contended total wait max wait avg wait pid owner 403 565.32 us 26.81 us 1.40 us -1 Unknown 4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe 1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe 1 2.03 us 2.03 us 2.03 us 5068 chrome As you can see, the owner is unknown for the most cases. But if we filter only for the mutex locks, it'd more likely get the onwers. $ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 4.910 [sec] 4.910435 usecs/op 203647 ops/sec contended total wait max wait avg wait pid owner 2 15.50 us 8.29 us 7.75 us 1582852 sched-pipe 7 7.20 us 2.47 us 1.03 us -1 Unknown 1 6.74 us 6.74 us 6.74 us 1582851 sched-pipe Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230207002403.63590-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-08perf lock contention: Fix to save callstack for the default modifiedNamhyung Kim
The previous change missed to set the con->save_callstack for the LOCK_AGGR_CALLER mode resulting in no caller information. Fixes: ebab291641bed48f ("perf lock contention: Support filters for different aggregation") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230207002403.63590-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03perf lock contention: Support filters for different aggregationNamhyung Kim
It'd be useful to filter other than the current aggregation mode. For example, users may want to see callstacks for specific locks only. Or they may want tasks from a certain callstack. The tracepoints already collected the information but it needs to check the condition again when processing the event. And it needs to change BPF to allow the key combinations. The lock contentions on 'rcu_state' spinlock can be monitored: $ sudo perf lock con -abv -L rcu_state sleep 1 ... contended total wait max wait avg wait type caller 4 151.39 us 62.57 us 37.85 us spinlock rcu_core+0xcb 0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46 0xffffffff8172d76b rcu_core+0xcb 0xffffffff822000eb __softirqentry_text_start+0xeb 0xffffffff816a0ba9 __irq_exit_rcu+0xc9 0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2 0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16 0xffffffff81d49f78 cpuidle_enter_state+0xd8 0xffffffff81d4a259 cpuidle_enter+0x29 1 30.21 us 30.21 us 30.21 us spinlock rcu_core+0xcb 0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46 0xffffffff8172d76b rcu_core+0xcb 0xffffffff822000eb __softirqentry_text_start+0xeb 0xffffffff816a0ba9 __irq_exit_rcu+0xc9 0xffffffff81fc00c4 sysvec_apic_timer_interrupt+0x54 0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16 1 28.84 us 28.84 us 28.84 us spinlock rcu_accelerate_cbs_unlocked+0x40 0xffffffff81fd1c60 _raw_spin_lock+0x30 0xffffffff81728cf0 rcu_accelerate_cbs_unlocked+0x40 0xffffffff8172da82 rcu_core+0x3e2 0xffffffff822000eb __softirqentry_text_start+0xeb 0xffffffff816a0ba9 __irq_exit_rcu+0xc9 0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2 0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16 0xffffffff81d49f78 cpuidle_enter_state+0xd8 ... To see tasks calling 'rcu_core' function: $ sudo perf lock con -abt -S rcu_core sleep 1 contended total wait max wait avg wait pid comm 19 23.46 us 2.21 us 1.23 us 0 swapper 2 18.37 us 17.01 us 9.19 us 2061859 ThreadPoolForeg 3 5.76 us 1.97 us 1.92 us 3909 pipewire-pulse 1 2.26 us 2.26 us 2.26 us 1809271 MediaSu~isor #2 1 1.97 us 1.97 us 1.97 us 1514882 Chrome_ChildIOT 1 987 ns 987 ns 987 ns 3740 pipewire-pulse Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230203021324.143540-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03perf lock contention: Use lock_stat_find{,new}Namhyung Kim
This is a preparation work to support complex keys of BPF maps. Now it has single value key according to the aggregation mode like stack_id or pid. But we want to use a combination of those keys. Then lock_contention_read() should still aggregate the result based on the key that was requested by user. The other key info will be used for filtering. So instead of creating a lock_stat entry always, Check if it's already there using lock_stat_find() first. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230203021324.143540-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-02perf lock contention: Add -S/--callstack-filter optionNamhyung Kim
The -S/--callstack-filter is to limit display entries having the given string in the callstack (not only in the caller in the output). The following example shows lock contention results if the callstack has 'net' substring somewhere. Note that the caller '__dev_queue_xmit' does not match to it, but it has 'inet6_csk_xmit' in the callstack. This applies even if you don't use -v option to show the full callstack. $ sudo ./perf lock con -abv -S net sleep 1 ... contended total wait max wait avg wait type caller 5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d 0xffffffffa5dd1c60 _raw_spin_lock+0x30 0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d 0xffffffffa5cd8267 ip6_finish_output2+0x2c7 0xffffffffa5cdac14 ip6_finish_output+0x1d4 0xffffffffa5cdb477 ip6_xmit+0x457 0xffffffffa5d1fd17 inet6_csk_xmit+0xd7 0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a 0xffffffffa5c6467d tcp_keepalive_timer+0x2fd Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230126000936.3017683-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-03perf lock contention: Fix core dump related to not finding the ↵Thomas Richter
"__sched_text_end" symbol on s/390 The test case perf lock contention dumps core on s390. Run the following commands: # ./perf lock record -- ./perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 2.799 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ] # # ./perf lock contention Segmentation fault (core dumped) # The function call stack is lengthy, here are the top 5 functions: # gdb ./perf core.24048 GNU gdb (GDB) Fedora Linux 12.1-6.fc37 Core was generated by `./perf lock contention'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356 3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start); (gdb) where #0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356 #1 0x000000000109f244 in callchain_id (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:957 #2 0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290, addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:586 #3 0x000000000109f4d0 in report_lock_contention_begin_event (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1004 #4 0x00000000010a00ae in evsel__process_contention_begin (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1254 #5 0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, event=0x3ff85601ef8, sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464 ..... The issue is in function machine__is_lock_function() in file ./util/machine.c lines 3355: /* should not fail from here */ sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end", &kmap); machine->sched.text_end = kmap->unmap_ip(kmap, sym->start) On s390 the symbol __sched_text_end is *NOT* in the symbol list and the resulting pointer sym is set to NULL. The sym->start is then a NULL pointer access and generates the core dump. The reason why __sched_text_end is not in the symbol list on s390 is simple: When the symbol list is created at perf start up with function calls dso__load +--> dso__load_vmlinux_path +--> dso__load_vmlinux +--> dso__load_sym +--> dso__load_sym_internal (reads kernel symbols) +--> symbols__fixup_end +--> symbols__fixup_duplicate The issue is in function symbols__fixup_duplicate(). It deletes all symbols with have the same address. On s390: # nm -g ~/linux/vmlinux| fgrep c68390 0000000000c68390 T __cpuidle_text_start 0000000000c68390 T __sched_text_end # two symbols have identical addresses and __sched_text_end is considered duplicate (in ascending sort order) and removed from the symbol list. Therefore it is missing and an invalid pointer reference occurs. The code checks for symbol __sched_text_start and when it exists assumes symbol __sched_text_end is also in the symbol table. However this is not the case on s390. Same situation exists for symbol __lock_text_start: 0000000000c68770 T __cpuidle_text_end 0000000000c68770 T __lock_text_start This symbol is also removed from the symbol table but used in function machine__is_lock_function(). To fix this and keep duplicate symbols in the symbol table, set symbol_conf.allow_aliases to true. This prevents the removal of duplicate symbols in function symbols__fixup_duplicate(). Output After: # ./perf lock contention contended total wait max wait avg wait type caller 48 124.39 ms 123.99 ms 2.59 ms rwsem:W unlink_anon_vmas+0x24a 47 83.68 ms 83.26 ms 1.78 ms rwsem:W free_pgtables+0x132 5 41.22 us 10.55 us 8.24 us rwsem:W free_pgtables+0x140 4 40.12 us 20.55 us 10.03 us rwsem:W copy_process+0x1ac8 # Fixes: 0d2997f750d1de39 ("perf lock: Look up callchain for the contended locks") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20221230102627.2410847-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>