summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2021-12-21perf tools: Record ARM64 LR register automaticallyAlexandre Truong
On ARM64, automatically record the link register if the frame pointer mode is on. It will be used to do a dwarf unwind to find the caller of the leaf frame if the frame pointer was omitted. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Alexandre Truong <alexandre.truong@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211217154521.80603-2-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-21perf test: Use 3 digits for test numbering now we can have more testsCarsten Haitzler
This is in preparation for adding more tests that will need the test number to be 3 digts so they align nicely in the output. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Link: http://lore.kernel.org/lkml/20211215160403.69264-3-carsten.haitzler@foss.arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-18perf inject: Fix segfault due to perf_data__fd() without openAdrian Hunter
The fixed commit attempts to get the output file descriptor even if the file was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. __GI___fileno (fp=0x0) at fileno.c:35 35 fileno.c: No such file or directory. (gdb) bt #0 __GI___fileno (fp=0x0) at fileno.c:35 #1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72 #2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69 #3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017 #4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313 #5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 #6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 #7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539 (gdb) Fixes: 0ae03893623dd1dd ("perf tools: Pass a fd to perf_file_header__read_pipe()") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211213084829.114772-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-18perf inject: Fix segfault due to close without openAdrian Hunter
The fixed commit attempts to close inject.output even if it was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 48 iofclose.c: No such file or directory. (gdb) bt #0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 #1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376 #2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085 #3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313 #4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 #5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 #6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539 (gdb) Fixes: 02e6246f5364d526 ("perf inject: Close inject.output on exit") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20211213084829.114772-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-18perf expr: Fix missing check for return value of hashmap__new()Miaoqian Lin
The hashmap__new() function may return ERR_PTR(-ENOMEM) when malloc() fails, add IS_ERR() checking for ctx->ids. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211212062504.25841-1-linmq006@gmail.com [ s/kfree()/free()/ and add missing linux/err.h include ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-17perf arm-spe: Synthesize SPE instruction eventsGerman Gomez
Synthesize instruction events for every ARM SPE record. Arm SPE implements a hardware-based sample period, and perf implements a software-based one. Add a warning message to inform the user of this. Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211216152404.52474-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16perf test: Test 73 Sig_trap fails on s390Thomas Richter
In Linux next commit 5504f67944484495 ("perf test sigtrap: Add basic stress test for sigtrap handling") introduced the new test which uses breakpoint events. These events are not supported on s390 and PowerPC and always fail: # perf test -F 73 73: Sigtrap : FAILED! # Fix it the same way as in the breakpoint tests in file tests/bp_account.c where these type of tests are skipped on s390 and PowerPC platforms. With this patch skip this test on both platforms. Output after: # perf test -F 73 73: Sigtrap # Fixes: 5504f67944484495 ("perf test sigtrap: Add basic stress test for sigtrap handling") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Marco Elver <elver@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20211216151454.752066-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf ftrace: Implement cpu and task filters in BPFNamhyung Kim
Honor cpu and task options to set up filters (by pid or tid) in the BPF program. For example, the following command will show latency of the mutex_lock for process 2570. # perf ftrace latency -b -T mutex_lock -p 2570 sleep 3 # DURATION | COUNT | GRAPH | 0 - 1 us | 675 | ############################## | 1 - 2 us | 9 | | 2 - 4 us | 0 | | 4 - 8 us | 0 | | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | Committer testing: Looking at faults on a firefox process: # strace -e bpf perf ftrace latency -b -p 1674378 -T __handle_mm_fault bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffee1fee740, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\08\0\0\08\0\0\0\t\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=89, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\f\0\0\0\f\0\0\0\7\0\0\0\1\0\0\0\0\0\0\20"..., btf_log_buf=NULL, btf_size=43, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\5\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=77, btf_log_size=0, btf_log_level=0}, 128) = -1 EINVAL (Invalid argument) bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0 \3\0\0 \3\0\0\306\3\0\0\0\0\0\0\0\0\0\2"..., btf_log_buf=NULL, btf_size=1790, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=32, max_entries=1, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=5, insns=0x7ffee1fee570, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffee1fee3c0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="test", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10000, map_flags=0, inner_map_fd=0, map_name="functime", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="cpu_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=36, map_flags=0, inner_map_fd=0, map_name="task_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 6 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_ARRAY, key_size=4, value_size=8, max_entries=22, map_flags=0, inner_map_fd=0, map_name="latency", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 7 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=12, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="func_lat.bss", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=32, btf_vmlinux_value_type_id=0}, 128) = 8 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=8, key=0x7ffee1fee580, value=0x7f01d940a000, flags=BPF_ANY}, 128) = 0 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=42, insns=0x1871f30, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_begin", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x18746a0, func_info_cnt=1, line_info_rec_size=16, line_info=0x1874550, line_info_cnt=20, attach_btf_id=0, attach_prog_fd=0}, 128) = 9 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=99, insns=0x18769b0, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_end", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x188a640, func_info_cnt=1, line_info_rec_size=16, line_info=0x188a660, line_info_cnt=20, attach_btf_id=0, attach_prog_fd=0}, 128) = 10 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=6, key=0x7ffee1fee8e0, value=0x7ffee1fee8df, flags=BPF_ANY}, 128) = 0 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=2, insns=0x7ffee1fee3c0, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 12 bpf(BPF_LINK_CREATE, {link_create={prog_fd=12, target_fd=-1, attach_type=0x29 /* BPF_??? */, flags=0}}, 128) = -1 EINVAL (Invalid argument) ^Cstrace: Process 1702285 detached # DURATION | COUNT | GRAPH | 0 - 1 us | 109 | ################# | 1 - 2 us | 127 | ################### | 2 - 4 us | 36 | ##### | 4 - 8 us | 20 | ### | 8 - 16 us | 2 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf ftrace: Add -b/--use-bpf option for latency subcommandNamhyung Kim
The -b/--use-bpf option is to use BPF to get latency info of kernel functions. It'd have better performance impact and I observed that latency of same function is smaller than before when using BPF. Committer testing: # strace -e bpf perf ftrace latency -b -T __handle_mm_fault -a sleep 1 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7fff51914e00, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\08\0\0\08\0\0\0\t\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=89, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\f\0\0\0\f\0\0\0\7\0\0\0\1\0\0\0\0\0\0\20"..., btf_log_buf=NULL, btf_size=43, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\t\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=81, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0000\0\0\0000\0\0\0\5\0\0\0\0\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=77, btf_log_size=0, btf_log_level=0}, 128) = -1 EINVAL (Invalid argument) bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\350\2\0\0\350\2\0\0\353\2\0\0\0\0\0\0\0\0\0\2"..., btf_log_buf=NULL, btf_size=1515, btf_log_size=0, btf_log_level=0}, 128) = 3 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=32, max_entries=1, map_flags=0, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=5, insns=0x7fff51914c30, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="", map_ifindex=0, btf_fd=0, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7fff51914a80, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="test", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10000, map_flags=0, inner_map_fd=0, map_name="functime", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 4 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="cpu_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 5 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=1, max_entries=1, map_flags=0, inner_map_fd=0, map_name="task_filter", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 7 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERCPU_ARRAY, key_size=4, value_size=8, max_entries=22, map_flags=0, inner_map_fd=0, map_name="latency", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=0, btf_vmlinux_value_type_id=0}, 128) = 8 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_ARRAY, key_size=4, value_size=4, max_entries=1, map_flags=BPF_F_MMAPABLE, inner_map_fd=0, map_name="func_lat.bss", map_ifindex=0, btf_fd=3, btf_key_type_id=0, btf_value_type_id=30, btf_vmlinux_value_type_id=0}, 128) = 9 bpf(BPF_MAP_UPDATE_ELEM, {map_fd=9, key=0x7fff51914c40, value=0x7f6e99be2000, flags=BPF_ANY}, 128) = 0 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=18, insns=0x11e4160, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_begin", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x11dfc50, func_info_cnt=1, line_info_rec_size=16, line_info=0x11e04c0, line_info_cnt=9, attach_btf_id=0, attach_prog_fd=0}, 128) = 10 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_KPROBE, insn_cnt=99, insns=0x11ded70, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(5, 14, 16), prog_flags=0, prog_name="func_end", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=3, func_info_rec_size=8, func_info=0x11dfc70, func_info_cnt=1, line_info_rec_size=16, line_info=0x11f6e10, line_info_cnt=20, attach_btf_id=0, attach_prog_fd=0}, 128) = 11 bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_TRACEPOINT, insn_cnt=2, insns=0x7fff51914a80, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0, func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 128) = 13 bpf(BPF_LINK_CREATE, {link_create={prog_fd=13, target_fd=-1, attach_type=0x29 /* BPF_??? */, flags=0}}, 128) = -1 EINVAL (Invalid argument) --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1699992, si_uid=0, si_status=0, si_utime=0, si_stime=0} --- bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=8, key=0x7fff51914f84, value=0x11f6fa0, flags=BPF_ANY}, 128) = 0 # DURATION | COUNT | GRAPH | 0 - 1 us | 52 | ################### | 1 - 2 us | 36 | ############# | 2 - 4 us | 24 | ######### | 4 - 8 us | 7 | ## | 8 - 16 us | 1 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | +++ exited with 0 +++ # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-5-namhyung@kernel.org [ Add missing util/cpumap.h include and removed unused 'fd' variable ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf ftrace: Add 'latency' subcommandNamhyung Kim
The perf ftrace latency is to get a histogram of function execution time. Users should give a function name using -T option. This is implemented using function_graph tracer with the given function only. And it parses the output to extract the time. $ sudo perf ftrace latency -a -T mutex_lock sleep 1 # DURATION | COUNT | GRAPH | 0 - 1 us | 4596 | ######################## | 1 - 2 us | 1680 | ######### | 2 - 4 us | 1106 | ##### | 4 - 8 us | 546 | ## | 8 - 16 us | 562 | ### | 16 - 32 us | 1 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | Committer testing: Latency for the __handle_mm_fault kernel function, system wide for 1 second, see how one can go from the usual 'perf ftrace' output, now the same as for the 'perf ftrace trace' subcommand, to the new 'perf ftrace latency' subcommand: # perf ftrace -T __handle_mm_fault -a sleep 1 | wc -l 709 # perf ftrace -T __handle_mm_fault -a sleep 1 | wc -l 510 # perf ftrace -T __handle_mm_fault -a sleep 1 | head -20 # tracer: function # # entries-in-buffer/entries-written: 0/0 #P:32 # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | perf-exec-1685104 [007] 90638.894613: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894620: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894622: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894635: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894688: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894702: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894714: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894728: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894740: __handle_mm_fault <-handle_mm_fault perf-exec-1685104 [007] 90638.894751: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894962: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894977: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894983: __handle_mm_fault <-handle_mm_fault sleep-1685104 [007] 90638.894995: __handle_mm_fault <-handle_mm_fault # perf ftrace latency -T __handle_mm_fault -a sleep 1 # DURATION | COUNT | GRAPH | 0 - 1 us | 125 | ###### | 1 - 2 us | 249 | ############# | 2 - 4 us | 455 | ######################## | 4 - 8 us | 37 | # | 8 - 16 us | 0 | | 16 - 32 us | 0 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf ftrace: Move out common code from __cmd_ftraceNamhyung Kim
The signal setup code and evlist__prepare_workload() can be used for other subcommands. Let's move them out of the __cmd_ftrace(). Then it doesn't need to pass argc and argv. On the other hand, select_tracer() is specific to the 'trace' subcommand so it'd better moving it into the __cmd_ftrace(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf ftrace: Add 'trace' subcommandNamhyung Kim
This is a preparation to add more sub-commands for ftrace. The 'trace' subcommand does the same thing when no subcommand is given. Committer testing: The previous mode, i.e. no subcommand and the new 'perf ftrace trace' are equivalent: # perf ftrace -G check_preempt_curr sleep 0.00001 # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 25) | check_preempt_curr() { 25) | resched_curr() { 25) | native_smp_send_reschedule() { 25) | default_send_IPI_single_phys() { 25) 0.110 us | __default_send_IPI_dest_field(); 25) 0.490 us | } 25) 0.640 us | } 25) 0.850 us | } 25) 2.060 us | } # perf ftrace trace -G check_preempt_curr sleep 0.00001 # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 10) | check_preempt_curr() { 10) | resched_curr() { 10) | native_smp_send_reschedule() { 10) | default_send_IPI_single_phys() { 10) 0.080 us | __default_send_IPI_dest_field(); 10) 0.460 us | } 10) 0.610 us | } 10) 0.830 us | } 10) 2.020 us | } # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20211215185154.360314-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf arch: Support register names from all archsGerman Gomez
When reading a perf.data file with register values, there is a mismatch between the names and the values of the registers because the tool is built using only the register names from the local architecture. Reading a perf.data file that was recorded on ARM64, gives the following erroneous output on an X86 machine: # perf report -i perf_arm64.data -D [...] 24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0 ... user regs: mask 0x1ffffffff ABI 64-bit .... AX 0x0000ffffd1515817 .... BX 0x0000ffffd1515480 .... CX 0x0000aaaadabf6c80 .... DX 0x000000000000002e .... SI 0x0000000040100401 .... DI 0x0040600200000080 .... BP 0x0000ffffd1510e10 .... SP 0x0000000000000000 .... IP 0x00000000000000dd .... FLAGS 0x0000ffffd1510cd0 .... CS 0x0000000000000000 .... SS 0x0000000000000030 .... DS 0x0000ffffa569a208 .... ES 0x0000000000000000 .... FS 0x0000000000000000 .... GS 0x0000000000000000 .... R8 0x0000aaaad3de9650 .... R9 0x0000ffffa57397f0 .... R10 0x0000000000000001 .... R11 0x0000ffffa57fd000 .... R12 0x0000ffffd1515817 .... R13 0x0000ffffd1515480 .... R14 0x0000aaaadabf6c80 .... R15 0x0000000000000000 .... unknown 0x0000000000000001 .... unknown 0x0000000000000000 .... unknown 0x0000000000000000 .... unknown 0x0000000000000000 .... unknown 0x0000000000000000 .... unknown 0x0000ffffd1510d90 .... unknown 0x0000ffffa5739b90 .... unknown 0x0000ffffd1510d80 .... XMM0 0x0000ffffa57392c8 ... thread: perf-exec:43239 ...... dso: [kernel.kallsyms] As can be seen, the register names correspond to X86 registers, even though the perf.data file was recorded on an ARM64 system. After this patch, the output of the command displays the correct register names: # perf report -i perf_arm64.data -D [...] 24661932634451 0x698 [0x21d0]: PERF_RECORD_SAMPLE(IP, 0x1): 43239/43239: 0xffffc5be8f100f98 period: 1 addr: 0 ... user regs: mask 0x1ffffffff ABI 64-bit .... x0 0x0000ffffd1515817 .... x1 0x0000ffffd1515480 .... x2 0x0000aaaadabf6c80 .... x3 0x000000000000002e .... x4 0x0000000040100401 .... x5 0x0040600200000080 .... x6 0x0000ffffd1510e10 .... x7 0x0000000000000000 .... x8 0x00000000000000dd .... x9 0x0000ffffd1510cd0 .... x10 0x0000000000000000 .... x11 0x0000000000000030 .... x12 0x0000ffffa569a208 .... x13 0x0000000000000000 .... x14 0x0000000000000000 .... x15 0x0000000000000000 .... x16 0x0000aaaad3de9650 .... x17 0x0000ffffa57397f0 .... x18 0x0000000000000001 .... x19 0x0000ffffa57fd000 .... x20 0x0000ffffd1515817 .... x21 0x0000ffffd1515480 .... x22 0x0000aaaadabf6c80 .... x23 0x0000000000000000 .... x24 0x0000000000000001 .... x25 0x0000000000000000 .... x26 0x0000000000000000 .... x27 0x0000000000000000 .... x28 0x0000000000000000 .... x29 0x0000ffffd1510d90 .... lr 0x0000ffffa5739b90 .... sp 0x0000ffffd1510d80 .... pc 0x0000ffffa57392c8 ... thread: perf-exec:43239 ...... dso: [kernel.kallsyms] Tester comments: Athira reports: "Looks good to me. Tested this patchset in powerpc by capturing regs in powerpc and doing perf report to read the data from x86." Reported-by: Alexandre Truong <alexandre.truong@arm.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211207180653.1147374-4-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf arm64: Rename perf_event_arm_regs for ARM64 registersGerman Gomez
The registers for ARM and ARM64 are enumerated using two enums that have the same name. In order to be able to import both headers, the name of one can be replaced using the C preprocessor like so: #define perf_event_arm_regs perf_event_arm64_regs #include <asm/perf_regs.h> #undef perf_event_arm_regs This patch updates all imports of ARM64's perf_regs.h in order to prevent the naming collision. Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211207180653.1147374-3-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf namespaces: Add helper nsinfo__is_in_root_namespace()Leo Yan
Refactors code for gathering PID infos, it creates the function nsinfo__get_nspid() to parse process 'status' node in folder '/proc'. Base on the refactoring, this patch introduces a new helper nsinfo__is_in_root_namespace(), it returns true when the caller runs in the root PID namespace. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonatan Goldschmidt <yonatan.goldschmidt@granulate.io> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20211212134721.1721245-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf bpf-loader: Use IS_ERR_OR_NULL() to clean code and fix checkMiaoqian Lin
Use IS_ERR_OR_NULL() to make the code cleaner. Also if the priv is NULL, it's improper to call PTR_ERR(priv). Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Cc: unlisted-recipients Link: http://lore.kernel.org/lkml/20211212135613.20000-1-linmq006@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf cs-etm: Remove duplicate and incorrect aux size checksJames Clark
There are two checks, one is for size when running without admin, but this one is covered by the driver and reported on in more detail here (builtin-record.c): pr_err("Permission error mapping pages.\n" "Consider increasing " "/proc/sys/kernel/perf_event_mlock_kb,\n" "or try again with a smaller value of -m/--mmap_pages.\n" "(current value: %u,%u)\n", This had the effect of artificially limiting the aux buffer size to a value smaller than what was allowed because perf_event_mlock_kb wasn't taken into account. The second is to check for a power of two, but this is covered here (evlist.c): pr_info("rounding mmap pages size to %s (%lu pages)\n", buf, pages); Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211208115435.610101-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf vendor events: Rename arm64 arch std event filesAndrew Kilroy
A previous commit adds pmu events into the files armv8-common-and-microarch.json armv8-recommended.json that are actually specified in an armv9 reference supplement, not armv8. As such, naming the files with the armv8 prefix seems artificial. This patch renames the files to reflect that these two files are for arch std events regardless of whether they are defined in armv8 or armv9. Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211210123706.7490-3-andrew.kilroy@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf vendor events: For the Arm Neoverse N2Andrew Kilroy
Updates the common and microarch json file to add counters available in the Arm Neoverse N2 chip, but should also apply to other ArmV8 and ArmV9 cpus. Specified in ArmV8 architecture reference manual https://developer.arm.com/documentation/ddi0487/gb/?lang=en Some of the counters added to armv8-common-and-microarch.json are specified in the ArmV9 architecture reference manual supplement (issue A.a): https://developer.arm.com/documentation/ddi0608/aa The additional ArmV9 counters are TRB_WRAP TRCEXTOUT0 TRCEXTOUT1 TRCEXTOUT2 TRCEXTOUT3 CTI_TRIGOUT4 CTI_TRIGOUT5 CTI_TRIGOUT6 CTI_TRIGOUT7 This patch also adds files in pmu-events/arch/arm64/arm/neoverse-n2 for perf list to output the counter names in categories. Counters on the Neoverse N2 are stated in its reference manual: https://developer.arm.com/documentation/102099/0000 Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211210123706.7490-2-andrew.kilroy@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf dlfilter: Drop unused variableSalvatore Bonaccorso
Compiling tools/perf/dlfilters/dlfilter-test-api-v0.c result in: checking for stdlib.h... dlfilters/dlfilter-test-api-v0.c: In function ‘filter_event’: dlfilters/dlfilter-test-api-v0.c:311:29: warning: unused variable ‘d’ [-Wunused-variable] 311 | struct filter_data *d = data; | So remove the variable now. Reviewed-by: German Gomez <german.gomez@arm.com> Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20211123211821.132924-1-carnil@debian.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHTNamhyung Kim
Use total latency info in the SPE counter packet as sample weight so that we can see it in local_weight and (global) weight sort keys. Maybe we can use PERF_SAMPLE_WEIGHT_STRUCT to support ins_lat as well but I'm not sure which latency it matches. So just adding total latency first. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211201220855.1260688-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf bench: Use unbuffered output when pipe/tee'ing to a fileSohaib Mohamed
The output of 'perf bench' gets buffered when I pipe it to a file or to tee, in such a way that I can see it only at the end. E.g. $ perf bench internals synthesize -t < output comes out fine after each test run > $ perf bench internals synthesize -t | tee file.txt < output comes out only at the end of all tests > This patch resolves this issue for 'bench' and 'test' subcommands. See, also: $ perf bench mem all | tee file.txt $ perf bench sched all | tee file.txt $ perf bench internals all -t | tee file.txt $ perf bench internals all | tee file.txt Committer testing: It really gets staggered, i.e. outputs in bursts, when the buffer fills up and has to be drained to make up space for more output. Suggested-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Sohaib Mohamed <sohaib.amhmd@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211119061409.78004-1-sohaib.amhmd@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo
To pick up fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-14tools/perf: Stop using bpf_object__find_program_by_title API.Kui-Feng Lee
bpf_obj__find_program_by_title() in libbpf is going to be deprecated. Call bpf_object_for_each_program to find a program in the section with a given name instead. Signed-off-by: Kui-Feng Lee <kuifeng@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211214035931.1148209-4-kuifeng@fb.com
2021-12-11perf python: Fix NULL vs IS_ERR_OR_NULL() checkingMiaoqian Lin
The function trace_event__tp_format_id may return ERR_PTR(-ENOMEM). Use IS_ERR_OR_NULL to check tp_format. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: http://lore.kernel.org/lkml/20211211053856.19827-1-linmq006@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix error timestamp setting on the decoder error pathAdrian Hunter
An error timestamp shows the last known timestamp for the queue, but this is not updated on the error path. Fix by setting it. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix missing 'instruction' events with 'q' optionAdrian Hunter
FUP packets contain IP information, which makes them also an 'instruction' event in 'hop' mode i.e. the itrace 'q' option. That wasn't happening, so restructure the logic so that FUP events are added along with appropriate 'instruction' and 'branch' events. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix next 'err' value, walking traceAdrian Hunter
Code after label 'next:' in intel_pt_walk_trace() assumes 'err' is zero, but it may not be, if arrived at via a 'goto'. Ensure it is zero. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix state setting when receiving overflow (OVF) packetAdrian Hunter
An overflow (OVF packet) is treated as an error because it represents a loss of trace data, but there is no loss of synchronization, so the packet state should be INTEL_PT_STATE_IN_SYNC not INTEL_PT_STATE_ERR_RESYNC. To support that, some additional variables must be reset, and the FUP packet that may follow OVF is treated as an FUP event. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix intel_pt_fup_event() assumptions about setting state typeAdrian Hunter
intel_pt_fup_event() assumes it can overwrite the state type if there has been an FUP event, but this is an unnecessary and unexpected constraint on callers. Fix by touching only the state type flags that are affected by an FUP event. Fixes: a472e65fc490a ("perf intel-pt: Add decoder support for ptwrite and power event packets") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix sync state when a PSB (synchronization) packet is foundAdrian Hunter
When syncing, it may be that branch packet generation is not enabled at that point, in which case there will not immediately be a control-flow packet, so some packets before a control flow packet turns up, get ignored. However, the decoder is in sync as soon as a PSB is found, so the state should be set accordingly. Fixes: f4aa081949e7b6 ("perf tools: Add Intel PT decoder") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf intel-pt: Fix some PGE (packet generation enable/control flow packets) ↵Adrian Hunter
usage Packet generation enable (PGE) refers to whether control flow (COFI) packets are being produced. PGE may be false even when branch-tracing is enabled, due to being out-of-context, or outside a filter address range. Fix some missing PGE usage. Fixes: 7c1b16ba0e26e6 ("perf intel-pt: Add support for decoding FUP/TIP only") Fixes: 839598176b0554 ("perf intel-pt: Allow decoding with branch tracing disabled") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.15+ Link: https://lore.kernel.org/r/20211210162303.2288710-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-11perf tools: Prevent out-of-bounds access to registersGerman Gomez
The size of the cache of register values is arch-dependant (PERF_REGS_MAX). This has the potential of causing an out-of-bounds access in the function "perf_reg_value" if the local architecture contains less registers than the one the perf.data file was recorded on. Since the maximum number of registers is bound by the bitmask "u64 cache_mask", and the size of the cache when running under x86 systems is 64 already, fix the size to 64 and add a range-check to the function "perf_reg_value" to prevent out-of-bounds access. Reported-by: Alexandre Truong <alexandre.truong@arm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20211201123334.679131-2-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-10Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Andrii Nakryiko says: ==================== bpf-next 2021-12-10 v2 We've added 115 non-merge commits during the last 26 day(s) which contain a total of 182 files changed, 5747 insertions(+), 2564 deletions(-). The main changes are: 1) Various samples fixes, from Alexander Lobakin. 2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov. 3) A batch of new unified APIs for libbpf, logging improvements, version querying, etc. Also a batch of old deprecations for old APIs and various bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko. 4) BPF documentation reorganization and improvements, from Christoph Hellwig and Dave Tucker. 5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in libbpf, from Hengqi Chen. 6) Verifier log fixes, from Hou Tao. 7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong. 8) Extend branch record capturing to all platforms that support it, from Kajol Jain. 9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi. 10) bpftool doc-generating script improvements, from Quentin Monnet. 11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet. 12) Deprecation warning fix for perf/bpf_counter, from Song Liu. 13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf, from Tiezhu Yang. 14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song. 15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun, and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits) libbpf: Add "bool skipped" to struct bpf_map libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition bpftool: Switch bpf_object__load_xattr() to bpf_object__load() selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() selftests/bpf: Add test for libbpf's custom log_buf behavior selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() libbpf: Deprecate bpf_object__load_xattr() libbpf: Add per-program log buffer setter and getter libbpf: Preserve kernel error code and remove kprobe prog type guessing libbpf: Improve logging around BPF program loading libbpf: Allow passing user log setting through bpf_object_open_opts libbpf: Allow passing preallocated log_buf when loading BTF into kernel libbpf: Add OPTS-based bpf_btf_load() API libbpf: Fix bpf_prog_load() log_buf logic for log_level 0 samples/bpf: Remove unneeded variable bpf: Remove redundant assignment to pointer t selftests/bpf: Fix a compilation warning perf/bpf_counter: Use bpf_map_create instead of bpf_create_map samples: bpf: Fix 'unknown warning group' build warning on Clang samples: bpf: Fix xdp_sample_user.o linking with Clang ... ==================== Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08perf/bpf_counter: Use bpf_map_create instead of bpf_create_mapSong Liu
bpf_create_map is deprecated. Replace it with bpf_map_create. Also add a __weak bpf_map_create() so that when older version of libbpf is linked as a shared library, it falls back to bpf_create_map(). Fixes: 992c4225419a ("libbpf: Unify low-level map creation APIs w/ new bpf_map_create()") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211207232340.2561471-1-song@kernel.org
2021-12-07perf vendor events arm64: Fix JSON indentation to 4 spaces standardAndrew Kilroy
Correct indentation to 4 spaces, same as the other JSON files. Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20211203123525.31127-2-andrew.kilroy@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf stat: Support --cputype option for hybrid eventsJin Yao
In previous patch, we have supported the syntax which enables the event on a specified pmu, such as: cpu_core/<event>/ cpu_atom/<event>/ While this syntax is not very easy for applying on a set of events or applying on a group. In following example, we have to explicitly assign the pmu prefix. # ./perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -- sleep 1 Performance counter stats for 'sleep 1': 1,158,545 cpu_core/cycles/ 1,003,113 cpu_core/instructions/ 1.002428712 seconds time elapsed A much easier way is: # ./perf stat --cputype core -e '{cycles,instructions}' -- sleep 1 Performance counter stats for 'sleep 1': 1,101,071 cpu_core/cycles/ 939,892 cpu_core/instructions/ 1.002363142 seconds time elapsed For this example, the '--cputype' enables the events from specified pmu (cpu_core). If '--cputype' conflicts with pmu prefix, '--cputype' is ignored. # ./perf stat --cputype core -e cycles,cpu_atom/instructions/ -a -- sleep 1 Performance counter stats for 'system wide': 21,003,407 cpu_core/cycles/ 367,886 cpu_atom/instructions/ 1.002203520 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210909062215.10278-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf tools: Drop requirement for libstdc++.so for libopencsd checkUwe Kleine-König
It's possible to link against libopencsd_c_api without having libstdc++.so available, only libstdc++.so.6.0.28 (or whatever version is in use) needs to be available. The same holds true for libopencsd.so. When -lstdc++ (or -lopencsd) is explicitly passed to the linker however the .so file must be available. So wrap adding the dependencies into a check for static linking that actually requires adding them all. The same construct is already used for some other tests in the same file to reduce dependencies in the dynamic linking case. Fixes: 573cf5c9a152 ("perf build: Add missing -lstdc++ when linking with libopencsd") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Cc: Adrian Bunk <bunk@debian.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Branislav Rankov <branislav.rankov@arm.com> Cc: Diederik de Haas <didi.debian@cknow.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/20211203210544.1137935-1-uwe@kleine-koenig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf parse-events: Architecture specific leader overrideIan Rogers
Currently topdown events must appear after a slots event: $ perf stat -e '{slots,topdown-fe-bound}' /bin/true Performance counter stats for '/bin/true': 3,183,090 slots 986,133 topdown-fe-bound Reversing the events yields: $ perf stat -e '{topdown-fe-bound,slots}' /bin/true Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-fe-bound). For metrics the order of events is determined by iterating over a hashmap, and so slots isn't guaranteed to be first which can yield this error. Change the set_leader in parse-events, called when a group is closed, so that rather than always making the first event the leader, if the slots event exists then it is made the leader. It is then moved to the head of the evlist otherwise it won't be opened in the correct order. The result is: $ perf stat -e '{topdown-fe-bound,slots}' /bin/true Performance counter stats for '/bin/true': 3,274,795 slots 1,001,702 topdown-fe-bound A problem with this approach is the slots event is identified by name, names can be overwritten like 'cpu/slots,name=foo/' and this causes the leader change to fail. The change also modifies and fixes mixed groups like, with the change: $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2 Performance counter stats for 'system wide': 5574985410 slots 971981616 instructions 1348461887 topdown-fe-bound 2.001263120 seconds time elapsed Without the change: $ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2 Performance counter stats for 'system wide': <not counted> instructions <not counted> slots <not supported> topdown-fe-bound 2.006247990 seconds time elapsed Something that may be undesirable here is that the events are reordered in the output. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vineet Singh <vineet.singh@intel.com> Link: http://lore.kernel.org/lkml/20211130174945.247604-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf evlist: Allow setting arbitrary leaderIan Rogers
The leader of a group is the first, but allow it to be an arbitrary list member so that for Intel topdown events slots may always be the group leader. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vineet Singh <vineet.singh@intel.com> Link: http://lore.kernel.org/lkml/20211130174945.247604-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf metric: Reduce multiplexing with duration_timeIan Rogers
It is common to use the same counters with and without duration_time. The ID sharing code treats duration_time as if it were a hardware event placed in the same group. This causes unnecessary multiplexing such as in the following example where l3_cache_access isn't shared: $ perf stat -M l3 -a sleep 1 Performance counter stats for 'system wide': 3,117,007 l3_cache_miss # 199.5 MB/s l3_rd_bw # 43.6 % l3_hits # 56.4 % l3_miss (50.00%) 5,526,447 l3_cache_access (50.00%) 5,392,435 l3_cache_access # 5389191.2 access/s l3_access_rate (50.00%) 1,000,601,901 ns duration_time 1.000601901 seconds time elapsed Fix this by placing duration_time in all groups unless metric sharing has been disabled on the command line: $ perf stat -M l3 -a sleep 1 Performance counter stats for 'system wide': 3,597,972 l3_cache_miss # 230.3 MB/s l3_rd_bw # 48.0 % l3_hits # 52.0 % l3_miss 6,914,459 l3_cache_access # 6909935.9 access/s l3_access_rate 1,000,654,579 ns duration_time 1.000654579 seconds time elapsed $ perf stat --metric-no-merge -M l3 -a sleep 1 Performance counter stats for 'system wide': 3,501,834 l3_cache_miss # 53.5 % l3_miss (24.99%) 6,548,173 l3_cache_access (24.99%) 3,417,622 l3_cache_miss # 45.7 % l3_hits (25.04%) 6,294,062 l3_cache_access (25.04%) 5,923,238 l3_cache_access # 5919688.1 access/s l3_access_rate (24.99%) 1,000,599,683 ns duration_time 3,607,486 l3_cache_miss # 230.9 MB/s l3_rd_bw (49.97%) 1.000599683 seconds time elapsed v2. Doesn't count duration_time in the metric_list_cmp function that sorts larger metrics first. Without this a metric with duration_time and an event is sorted the same as a metric with two events, possibly not allowing the first metric to share with the second. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211124015226.3317994-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf trace: Enable ignore_missing_thread for traceGang Li
perf already support ignore_missing_thread for -u/-p, but not yet applied to `perf trace`. This patch enables ignore_missing_thread for `perf trace`. Signed-off-by: Gang Li <ligang.bdlg@bytedance.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org Link: http://lkml.kernel.org/r/1513148513-6974-1-git-send-email-zhangmengting@huawei.com Link: http://lore.kernel.org/lkml/20211123074018.11406-1-ligang.bdlg@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf docs: Update link to AMD documentationSandipan Das
This updates the link to documentation on AMD processors. The new link points to a page where users can find the Processor Programming Reference (PPR) documents for the family and model codes corresponding to processors they are using. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Robert Richter <rrichter@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20211123084613.243792-2-sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf docs: Add info on AMD raw event encodingSandipan Das
AMD processors have events with event select codes and unit masks larger than a byte. The core PMU, for example, uses 12-bit event select codes split between bits 0-7 and 32-35 of the PERF_CTL MSRs as can be seen from /sys/bus/event_sources/devices/cpu/format/*. The Processor Programming Reference (PPR) lists the event codes as unified 12-bit hexadecimal values instead and the split between the bits is not apparent to someone who is not aware of the layout of the PERF_CTL MSRs. 8-bit event select codes continue to work as the layout matches that of the PERF_CTL MSRs i.e. bits 0-7 for event select and 8-15 for unit mask. This adds more details in the perf man pages about using /sys/bus/event_sources/devices/*/format/* for determining the correct raw event encoding scheme. E.g. the "op_cache_hit_miss.op_cache_hit" event with code 0x28f and umask 0x03 can be programmed using its symbolic name as: $ sudo perf --debug perf-event-open stat -e op_cache_hit_miss.op_cache_hit sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 size 128 config 0x20000038f sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ [...] One might use a simple eventsel+umask combination based on what the current man pages say and incorrectly program the event as: $ sudo perf --debug perf-event-open stat -e r0328f sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 size 128 config 0x328f sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ [...] When it should have been based on the format from sysfs: $ cat /sys/bus/event_source/devices/cpu/format/event config:0-7,32-35 $ sudo perf --debug perf-event-open stat -e r20000038f sleep 1 ------------------------------------------------------------ perf_event_attr: type 4 size 128 config 0x20000038f sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ------------------------------------------------------------ [...] Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Robert Richter <rrichter@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20211123084613.243792-1-sandipan.das@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07libperf: Adopt perf_counts_values__scale() from tools/perf/utilShunsuke Nakamura
Move perf_counts_values__scale() from tools/perf/util to tools/lib/perf so that it can be used with libperf. Committer notes: As noted by Jiri, use __s8 instead of s8 on the exported function. Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20211109085831.3770594-2-nakamura.shun@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07tools build: Enable warnings through HOSTCFLAGSJohn Garry
The tools build system uses KBUILD_HOSTCFLAGS symbol for obvious purposes. However this is not set for anything under tools/ As such, host tools apps built have no compiler warnings enabled. Declare HOSTCFLAGS for perf tools build, and also use that symbol in declaration of host_c_flags. HOSTCFLAGS comes from EXTRA_WARNINGS, which is independent of target platform/arch warning flags. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Laura Abbott <labbott@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1635525041-151876-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf test sigtrap: Print errno string when failingArnaldo Carvalho de Melo
Helps a bit the user figuring out why it is failing: Before: $ perf test sigtrap 73: Sigtrap : FAILED! $ perf test -v sigtrap 73: Sigtrap : --- start --- test child forked, pid 3816772 FAILED sys_perf_event_open() test child finished with -1 ---- end ---- Sigtrap: FAILED! $ After: $ perf test sigtrap 73: Sigtrap : FAILED! $ perf test -v sigtrap 73: Sigtrap : --- start --- test child forked, pid 3816772 FAILED sys_perf_event_open(): Permission denied test child finished with -1 ---- end ---- Sigtrap: FAILED! $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kasan-dev@googlegroups.com Link: http://lore.kernel.org/lkml/YZOpSVOCXe0zWeRs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-07perf test sigtrap: Add basic stress test for sigtrap handlingMarco Elver
Add basic stress test for sigtrap handling as a perf tool built-in test. This allows sanity checking the basic sigtrap functionality from within the perf tool. Committer notes: Reported that !root was getting -EPERM, applied a fixup from Marco to set .exclude_{hv,kernel} that made it work. Signed-off-by: Marco Elver <elver@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fabian Hemmer <copy@copy.sh> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kasan-dev@googlegroups.com Link: http://lore.kernel.org/lkml/20211115112822.4077224-1-elver@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-06perf bpf_skel: Do not use typedef to avoid error on old clangSong Liu
When building bpf_skel with clang-10, typedef causes confusions like: libbpf: map 'prev_readings': unexpected def kind var. Fix this by removing the typedef. Fixes: 7fac83aaf2eecc9e ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/BEF5C312-4331-4A60-AEC0-AD7617CB2BC4@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>