path: root/tools
Age  Commit message  (Author)
2024-04-03  perf annotate: Use ins__is_xxx() if possible  (Namhyung Kim)
This is to prepare separation of disasm related code. Use the public ins API instead of checking the internal data structure. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03  perf evsel: Use evsel__name_is() helper  (Yang Jihong)
Code cleanup: replace strcmp(evsel__name(evsel), {NAME}) with the evsel__name_is() helper.

No functional change.

Committer notes:

Fix this build error:

    trace.syscalls.events.bpf_output = evlist__last(trace.evlist);
  - assert(evsel__name_is(trace.syscalls.events.bpf_output), "__augmented_syscalls__");
  + assert(evsel__name_is(trace.syscalls.events.bpf_output, "__augmented_syscalls__"));

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240401062724.1006010-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
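To illustrate the cleanup described in this commit, here is a minimal sketch of a name-comparison helper. The `toy_evsel` struct and the `toy_` functions are hypothetical stand-ins, not perf's real `struct evsel` or its API; only the idea of hiding the `strcmp() == 0` idiom behind a boolean helper comes from the commit message.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for perf's 'struct evsel'; only the name is modelled. */
struct toy_evsel {
	const char *name;
};

/* Stand-in for evsel__name(): returns the event name. */
static const char *toy_evsel__name(const struct toy_evsel *evsel)
{
	return evsel->name;
}

/*
 * Helper in the spirit of evsel__name_is(): callers no longer spell out
 * the strcmp() comparison, and both arguments go to the same call, which
 * avoids the misplaced-parenthesis mistake shown in the committer notes.
 */
static bool toy_evsel__name_is(const struct toy_evsel *evsel, const char *name)
{
	return strcmp(toy_evsel__name(evsel), name) == 0;
}
```

A call site then reads `if (toy_evsel__name_is(evsel, "sched:sched_switch"))` instead of an open-coded string comparison.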
2024-04-03  perf sched timehist: Fix -g/--call-graph option failure  (Yang Jihong)
When 'perf sched' enables call-graph recording, the sample_type of the dummy event does not have PERF_SAMPLE_CALLCHAIN. timehist_check_attr() sees that this evsel does not have a callchain and sets show_callchain to 0.

Currently 'perf sched timehist' only saves the callchain when processing the 'sched:sched_switch' event, so timehist_check_attr() only needs to determine whether that event has PERF_SAMPLE_CALLCHAIN.

Before:

  # perf sched record -g true
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 4.153 MB perf.data (7536 samples) ]
  # perf sched timehist
  Samples do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
    147851.826019 [0000]  perf[285035]                        0.000      0.000      0.000
    147851.826029 [0000]  migration/0[15]                     0.000      0.003      0.009
    147851.826063 [0001]  perf[285035]                        0.000      0.000      0.000
    147851.826069 [0001]  migration/1[21]                     0.000      0.003      0.006
  <SNIP>

After:

  # perf sched record -g true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.572 MB perf.data (822 samples) ]
  # perf sched timehist
         time  cpu  task name        waittime  sch delay  runtime
                    [tid/pid]          (msec)     (msec)   (msec)
  ----------- ---  ---------------  --------  --------  -----
  4193.035164 [0]  perf[277062]         0.000     0.000  0.000  __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion
  4193.035174 [0]  migration/0[15]      0.000     0.003  0.009  __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork
  4193.035207 [1]  perf[277062]         0.000     0.000  0.000  __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- preempt_schedule_common <- __cond_resched <- __wait_for_common <- wait_for_completion
  4193.035214 [1]  migration/1[21]      0.000     0.003  0.007  __traceiter_sched_switch <- __traceiter_sched_switch <- __sched_text_start <- smpboot_thread_fn <- kthread <- ret_from_fork
  <SNIP>

Fixes: 9c95e4ef06572349 ("perf evlist: Add evlist__findnew_tracking_event() helper")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20240401062724.1006010-2-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
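The check described in this commit can be sketched as follows. This is one reading of the commit message, not the actual perf code: the `sketch_event` struct, the function name, and the "dummy:u" event name used in the usage note are hypothetical, and `SKETCH_SAMPLE_CALLCHAIN` is assumed to mirror the `PERF_SAMPLE_CALLCHAIN` bit from `<linux/perf_event.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror PERF_SAMPLE_CALLCHAIN from <linux/perf_event.h>. */
#define SKETCH_SAMPLE_CALLCHAIN (1ULL << 5)

/* Hypothetical, reduced view of an event: its name and sample_type bits. */
struct sketch_event {
	const char *name;
	uint64_t sample_type;
};

/*
 * Decide whether callchains can be shown. Only 'sched:sched_switch' is
 * consulted, because that is the only event whose callchain is saved;
 * other events (such as the dummy event) are ignored.
 */
static bool sketch_show_callchain(const struct sketch_event *events, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(events[i].name, "sched:sched_switch") != 0)
			continue;
		if (!(events[i].sample_type & SKETCH_SAMPLE_CALLCHAIN))
			return false;
	}
	return true;
}
```

With this shape, an event list holding a "dummy:u" entry without the callchain bit and a 'sched:sched_switch' entry with it still yields true, which matches the "After" behavior shown in the commit.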
2024-04-03  perf annotate: Honor output options with --data-type  (Namhyung Kim)
Data type profiling output should be in sync with normal output, so make it display a percentage for each field. Also use a coloring scheme so users can easily identify fields with big overhead.

Users can use --show-total-period or --show-nr-samples to change the output style like in the normal perf annotate output.

Before:

  $ perf annotate --data-type
  Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
  ============================================================================
      samples     offset       size  field
           34          0       9792  struct task_struct {
            2          0         24      struct thread_info thread_info {
            0          0          8          long unsigned int flags;
            1          8          8          long unsigned int syscall_work;
            0         16          4          u32 status;
            1         20          4          u32 cpu;
                                         };

After:

  $ perf annotate --data-type
  Annotate type: 'struct task_struct' in [kernel.kallsyms] (34 samples):
  ============================================================================
      Percent     offset       size  field
       100.00          0       9792  struct task_struct {
         3.55          0         24      struct thread_info thread_info {
         0.00          0          8          long unsigned int flags;
         1.63          8          8          long unsigned int syscall_work;
         0.00         16          4          u32 status;
         1.91         20          4          u32 cpu;
                                         };

Committer testing:

First collect a suitable perf.data file for use with 'perf annotate --data-type':

  root@number:~# perf mem record -a sleep 1s
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 11.047 MB perf.data (3466 samples) ]
  root@number:~#

Then, before:

  root@number:~# perf annotate --data-type
  Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
  ============================================================================
      samples     offset       size  field
            6          0         40  union {
            6          0         40      struct __pthread_mutex_s __data {
            2          0          4          int __lock;
            0          4          4          unsigned int __count;
            0          8          4          int __owner;
            1         12          4          unsigned int __nusers;
            2         16          4          int __kind;
            1         20          2          short int __spins;
            0         22          2          short int __elision;
            0         24         16          __pthread_list_t __list {
            0         24          8              struct __pthread_internal_list* __prev;
            0         32          8              struct __pthread_internal_list* __next;
                                             };
                                         };
            0          0          0      char* __size;
            2          0          8      long int __align;
                                     };
  <SNIP>

And after:

  Annotate type: 'union ' in /usr/lib64/libc.so.6 (6 samples):
  ============================================================================
      Percent     offset       size  field
       100.00          0         40  union {
       100.00          0         40      struct __pthread_mutex_s __data {
        31.27          0          4          int __lock;
         0.00          4          4          unsigned int __count;
         0.00          8          4          int __owner;
         7.67         12          4          unsigned int __nusers;
        53.10         16          4          int __kind;
         7.96         20          2          short int __spins;
         0.00         22          2          short int __elision;
         0.00         24         16          __pthread_list_t __list {
         0.00         24          8              struct __pthread_internal_list* __prev;
         0.00         32          8              struct __pthread_internal_list* __next;
                                             };
                                         };
         0.00          0          0      char* __size;
        31.27          0          8      long int __align;
                                     };
  <SNIP>

The lines with percentages >= 7.67 have their percentages colored red.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240322224313.423181-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
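The per-field percentage shown in the new output can be sketched as a simple ratio. This is only an assumption about the arithmetic: the commit message does not say which quantity is divided, and the sample output (31.27 for 2 of 6 samples) suggests the weight is something other than the raw sample count. The function name and its parameters are hypothetical.

```c
/*
 * Hypothetical helper: share of 'total' attributed to one field, in
 * percent. 'field' and 'total' stand for whatever weight is being
 * displayed; which weight perf actually uses is not stated in the source.
 */
static double sketch_field_percent(unsigned long long field,
				   unsigned long long total)
{
	if (total == 0)
		return 0.0;	/* avoid dividing by zero for empty types */

	return 100.0 * (double)field / (double)total;
}
```

The top-level type always gets 100.00 under this scheme, because its weight is the total itself, which is consistent with the first row of each "After" table.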
2024-04-03  perf annotate: Get rid of duplicate --group option item  (Namhyung Kim)
The options array in cmd_annotate() has duplicate --group options. It only needs one and let's get rid of the other.

  $ perf annotate -h 2>&1 | grep group
        --group           Show event group information together
        --group           Show event group information together

Fixes: 7ebaf4890f63eb90 ("perf annotate: Support '--group' option")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240322224313.423181-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf beauty: Move uapi/linux/vhost.h copy out of the directory used to build perf  (Arnaldo Carvalho de Melo)
It is only used to generate string tables, not to build perf, so move it to the tools/perf/trace/beauty/include/ hierarchy, which is used just for scraping.

This should have happened already, as it did with the linux/socket.h scraper; do it now, as Ian suggested while doing an audit/refactor session on the headers used by perf.

No other code living in tools/ uses it, just <linux/vhost.h> coming from either 'make install_headers' or from the system /usr/include/ directory.

Suggested-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/CAP-5=fWZVrpRufO4w-S4EcSi9STXcTAN2ERLwTSN7yrSSA-otQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf dso: Reorder members to save space in 'struct dso'  (Ian Rogers)
Save 40 bytes and move from 8 to 7 cache lines. Make member dwfl dependent on being a powerpc build. Squeeze bits of int/enum types when appropriate. Remove holes/padding by reordering variables. Before: struct dso { struct mutex lock; /* 0 40 */ struct list_head node; /* 40 16 */ struct rb_node rb_node __attribute__((__aligned__(8))); /* 56 24 */ /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */ struct rb_root * root; /* 80 8 */ struct rb_root_cached symbols; /* 88 16 */ struct symbol * * symbol_names; /* 104 8 */ size_t symbol_names_len; /* 112 8 */ struct rb_root_cached inlined_nodes; /* 120 16 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ struct rb_root_cached srclines; /* 136 16 */ struct { u64 addr; /* 152 8 */ struct symbol * symbol; /* 160 8 */ } last_find_result; /* 152 16 */ void * a2l; /* 168 8 */ char * symsrc_filename; /* 176 8 */ unsigned int a2l_fails; /* 184 4 */ enum dso_space_type kernel; /* 188 4 */ /* --- cacheline 3 boundary (192 bytes) --- */ _Bool is_kmod; /* 192 1 */ /* XXX 3 bytes hole, try to pack */ enum dso_swap_type needs_swap; /* 196 4 */ enum dso_binary_type symtab_type; /* 200 4 */ enum dso_binary_type binary_type; /* 204 4 */ enum dso_load_errno load_errno; /* 208 4 */ u8 adjust_symbols:1; /* 212: 0 1 */ u8 has_build_id:1; /* 212: 1 1 */ u8 header_build_id:1; /* 212: 2 1 */ u8 has_srcline:1; /* 212: 3 1 */ u8 hit:1; /* 212: 4 1 */ u8 annotate_warned:1; /* 212: 5 1 */ u8 auxtrace_warned:1; /* 212: 6 1 */ u8 short_name_allocated:1; /* 212: 7 1 */ u8 long_name_allocated:1; /* 213: 0 1 */ u8 is_64_bit:1; /* 213: 1 1 */ /* XXX 6 bits hole, try to pack */ _Bool sorted_by_name; /* 214 1 */ _Bool loaded; /* 215 1 */ u8 rel; /* 216 1 */ /* XXX 7 bytes hole, try to pack */ struct build_id bid; /* 224 32 */ /* --- cacheline 4 boundary (256 bytes) --- */ u64 text_offset; /* 256 8 */ u64 text_end; /* 264 8 */ const char * short_name; /* 272 8 */ const char * long_name; /* 280 8 */ u16 long_name_len; /* 288 2 */ 
u16 short_name_len; /* 290 2 */ /* XXX 4 bytes hole, try to pack */ void * dwfl; /* 296 8 */ struct auxtrace_cache * auxtrace_cache; /* 304 8 */ int comp; /* 312 4 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 5 boundary (320 bytes) --- */ struct { struct rb_root cache; /* 320 8 */ int fd; /* 328 4 */ int status; /* 332 4 */ u32 status_seen; /* 336 4 */ /* XXX 4 bytes hole, try to pack */ u64 file_size; /* 344 8 */ struct list_head open_entry; /* 352 16 */ u64 elf_base_addr; /* 368 8 */ u64 debug_frame_offset; /* 376 8 */ /* --- cacheline 6 boundary (384 bytes) --- */ u64 eh_frame_hdr_addr; /* 384 8 */ u64 eh_frame_hdr_offset; /* 392 8 */ } data; /* 320 80 */ struct { u32 id; /* 400 4 */ u32 sub_id; /* 404 4 */ struct perf_env * env; /* 408 8 */ } bpf_prog; /* 400 16 */ union { void * priv; /* 416 8 */ u64 db_id; /* 416 8 */ }; /* 416 8 */ struct nsinfo * nsinfo; /* 424 8 */ struct dso_id id; /* 432 24 */ /* --- cacheline 7 boundary (448 bytes) was 8 bytes ago --- */ refcount_t refcnt; /* 456 4 */ char name[]; /* 460 0 */ /* size: 464, cachelines: 8, members: 49 */ /* sum members: 440, holes: 4, sum holes: 18 */ /* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 6 bits */ /* padding: 4 */ /* forced alignments: 1 */ /* last cacheline: 16 bytes */ } __attribute__((__aligned__(8))); After: struct dso { struct mutex lock; /* 0 40 */ struct list_head node; /* 40 16 */ struct rb_node rb_node __attribute__((__aligned__(8))); /* 56 24 */ /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */ struct rb_root * root; /* 80 8 */ struct rb_root_cached symbols; /* 88 16 */ struct symbol * * symbol_names; /* 104 8 */ size_t symbol_names_len; /* 112 8 */ struct rb_root_cached inlined_nodes; /* 120 16 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ struct rb_root_cached srclines; /* 136 16 */ struct { u64 addr; /* 152 8 */ struct symbol * symbol; /* 160 8 */ } last_find_result; /* 152 16 */ struct build_id bid; /* 168 32 */ /* --- 
cacheline 3 boundary (192 bytes) was 8 bytes ago --- */ u64 text_offset; /* 200 8 */ u64 text_end; /* 208 8 */ const char * short_name; /* 216 8 */ const char * long_name; /* 224 8 */ void * a2l; /* 232 8 */ char * symsrc_filename; /* 240 8 */ struct nsinfo * nsinfo; /* 248 8 */ /* --- cacheline 4 boundary (256 bytes) --- */ struct auxtrace_cache * auxtrace_cache; /* 256 8 */ union { void * priv; /* 264 8 */ u64 db_id; /* 264 8 */ }; /* 264 8 */ struct { struct perf_env * env; /* 272 8 */ u32 id; /* 280 4 */ u32 sub_id; /* 284 4 */ } bpf_prog; /* 272 16 */ struct { struct rb_root cache; /* 288 8 */ struct list_head open_entry; /* 296 16 */ u64 file_size; /* 312 8 */ /* --- cacheline 5 boundary (320 bytes) --- */ u64 elf_base_addr; /* 320 8 */ u64 debug_frame_offset; /* 328 8 */ u64 eh_frame_hdr_addr; /* 336 8 */ u64 eh_frame_hdr_offset; /* 344 8 */ int fd; /* 352 4 */ int status; /* 356 4 */ u32 status_seen; /* 360 4 */ } data; /* 288 80 */ /* XXX last struct has 4 bytes of padding */ struct dso_id id; /* 368 24 */ /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */ unsigned int a2l_fails; /* 392 4 */ int comp; /* 396 4 */ refcount_t refcnt; /* 400 4 */ enum dso_load_errno load_errno; /* 404 4 */ u16 long_name_len; /* 408 2 */ u16 short_name_len; /* 410 2 */ enum dso_binary_type symtab_type:8; /* 412: 0 4 */ enum dso_binary_type binary_type:8; /* 412: 8 4 */ enum dso_space_type kernel:2; /* 412:16 4 */ enum dso_swap_type needs_swap:2; /* 412:18 4 */ /* Bitfield combined with next fields */ _Bool is_kmod:1; /* 414: 4 1 */ u8 adjust_symbols:1; /* 414: 5 1 */ u8 has_build_id:1; /* 414: 6 1 */ u8 header_build_id:1; /* 414: 7 1 */ u8 has_srcline:1; /* 415: 0 1 */ u8 hit:1; /* 415: 1 1 */ u8 annotate_warned:1; /* 415: 2 1 */ u8 auxtrace_warned:1; /* 415: 3 1 */ u8 short_name_allocated:1; /* 415: 4 1 */ u8 long_name_allocated:1; /* 415: 5 1 */ u8 is_64_bit:1; /* 415: 6 1 */ /* XXX 1 bit hole, try to pack */ _Bool sorted_by_name; /* 416 1 */ _Bool loaded; /* 417 
1 */ u8 rel; /* 418 1 */ char name[]; /* 419 0 */ /* size: 424, cachelines: 7, members: 48 */ /* sum members: 415 */ /* sum bitfield members: 31 bits, bit holes: 1, sum bit holes: 1 bits */ /* padding: 5 */ /* paddings: 1, sum paddings: 4 */ /* forced alignments: 1 */ /* last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240321160300.1635121-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
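The hole-removal technique used on 'struct dso' can be shown on a much smaller struct. The two structs below are hypothetical and unrelated to perf; they only illustrate how ordering members from largest to smallest alignment removes the padding that pahole reports as holes. The byte counts in the comments assume a typical LP64 ABI and may differ elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in an arbitrary order: the compiler must insert padding. */
struct sketch_before {
	uint8_t  flag;		/* 1 byte, typically followed by a 7 byte hole */
	uint64_t offset;	/* 8 bytes, needs 8 byte alignment */
	uint16_t len;		/* 2 bytes, typically followed by a 2 byte hole */
	uint32_t id;		/* 4 bytes */
};

/* Same members, ordered by decreasing alignment: no holes in between. */
struct sketch_after {
	uint64_t offset;
	uint32_t id;
	uint16_t len;
	uint8_t  flag;		/* only tail padding remains */
};

/* Bytes saved per instance by the reordering (8 on a typical LP64 ABI). */
static size_t sketch_bytes_saved(void)
{
	return sizeof(struct sketch_before) - sizeof(struct sketch_after);
}
```

Running `pahole` on an object file built from this sketch should show the holes in `sketch_before` in the same comment format as the output in the commit message.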
2024-03-21  perf lock contention: Trim backtrace by skipping traceiter functions  (Anne Macedo)
The 'perf lock contention' program currently shows the caller of the locks as __traceiter_contention_begin+0x??. This caller can be ignored, as it is from the traceiter itself. Instead, it should show the real callers for the locks.

Adjusting the --stack-skip parameter makes the actual callers for the locks show up, but users should not need to do that. Ignore the __traceiter_contention_begin and __traceiter_contention_end symbols so that the actual callers show up by default.

Before this patch is applied:

  sudo perf lock con -a -b -- sleep 3
   contended   total wait     max wait     avg wait         type   caller

           8       2.33 s       2.28 s    291.18 ms     rwlock:W   __traceiter_contention_begin+0x44
           4       2.33 s       2.28 s    582.35 ms     rwlock:W   __traceiter_contention_begin+0x44
           7    140.30 ms     46.77 ms     20.04 ms     rwlock:W   __traceiter_contention_begin+0x44
           2     63.35 ms     33.76 ms     31.68 ms        mutex   trace_contention_begin+0x84
           2     46.74 ms     46.73 ms     23.37 ms     rwlock:W   __traceiter_contention_begin+0x44
           1     13.54 us     13.54 us     13.54 us        mutex   trace_contention_begin+0x84
           1      3.67 us      3.67 us      3.67 us      rwsem:R   __traceiter_contention_begin+0x44

Before this patch is applied - using --stack-skip 5:

  sudo perf lock con --stack-skip 5 -a -b -- sleep 3
   contended   total wait     max wait     avg wait         type   caller

           2       2.24 s       2.24 s       1.12 s     rwlock:W   do_epoll_wait+0x5a0
           4       1.65 s    824.21 ms    412.08 ms     rwlock:W   do_exit+0x338
           2    824.35 ms    824.29 ms    412.17 ms     spinlock   get_signal+0x108
           2    824.14 ms    824.14 ms    412.07 ms     rwlock:W   release_task+0x68
           1     25.22 ms     25.22 ms     25.22 ms        mutex   cgroup_kn_lock_live+0x58
           1     24.71 us     24.71 us     24.71 us     spinlock   do_exit+0x44
           1     22.04 us     22.04 us     22.04 us      rwsem:R   lock_mm_and_find_vma+0xb0

After this patch is applied:

  sudo ./perf lock con -a -b -- sleep 3
   contended   total wait     max wait     avg wait         type   caller

           4       4.13 s       2.07 s       1.03 s     rwlock:W   release_task+0x68
           2       2.07 s       2.07 s       1.03 s     rwlock:R   mm_update_next_owner+0x50
           2       2.07 s       2.07 s       1.03 s     rwlock:W   do_exit+0x338
           1     41.56 ms     41.56 ms     41.56 ms        mutex   cgroup_kn_lock_live+0x58
           2     36.12 us     18.83 us     18.06 us     rwlock:W   do_exit+0x338

Signed-off-by: Anne Macedo <retpolanne@posteo.net> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240319143629.3422590-1-retpolanne@posteo.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
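The trimming idea in this commit can be sketched as a walk over a callstack that skips the traceiter entries. This is a simplification under stated assumptions: the functions below are hypothetical and compare symbol names, whereas the real tool works on sampled addresses and may skip further lock-internal functions that are not modelled here.

```c
#include <stddef.h>
#include <string.h>

/* True for the two traceiter symbols named in the commit message. */
static int sketch_is_traceiter(const char *sym)
{
	return strcmp(sym, "__traceiter_contention_begin") == 0 ||
	       strcmp(sym, "__traceiter_contention_end") == 0;
}

/*
 * Return the first entry of 'stack' (innermost frame first) that is not
 * a traceiter function, or NULL if every frame is one.
 */
static const char *sketch_first_caller(const char *const *stack, size_t depth)
{
	for (size_t i = 0; i < depth; i++) {
		if (!sketch_is_traceiter(stack[i]))
			return stack[i];
	}
	return NULL;
}
```

Applied to a stack of `{"__traceiter_contention_begin", "do_exit"}`, the function reports `do_exit`, which is the kind of caller shown in the "After" table without needing --stack-skip.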
2024-03-21  perf vendor events intel: Remove info metrics erroneously in TopdownL1  (Ian Rogers)
The bug affected server metrics only. It doesn't impact the default metrics, only the case where the TopdownL1 metric group is specified. Passes on the fix in: https://github.com/intel/perfmon/commit/b09f0a3953234ec592b4a872b87764c78da05d8b Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update snowridgex to 1.22  (Ian Rogers)
Update events from 1.21 to 1.22 as released in: https://github.com/intel/perfmon/commit/ba4f96039f96231b51e3eb69d5a21e2b00f6de5b Updates various descriptions and removes the event UNC_IIO_NUM_REQ_FROM_CPU.IRP. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update skylake to v58  (Ian Rogers)
Update events from: https://github.com/intel/perfmon/commit/f2e5136e062a91ae554dc40530132e66f9271848 This change didn't increase the version number from v58. Updates various descriptions. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update skylakex to 1.33  (Ian Rogers)
Update events from 1.32 to 1.33 as released in: https://github.com/intel/perfmon/commit/3fe7390dd18496c35ec3a9cf17de0473fd5485cb Various description updates. Adds the event OFFCORE_RESPONSE.ALL_READS.L3_HIT.HIT_OTHER_CORE_FWD. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update sierraforest to 1.02  (Ian Rogers)
Update events from 1.01 to 1.02 as released in: https://github.com/intel/perfmon/commit/451dd41ae627b56433ad4065bf3632789eb70834 Various description updates. Adds topdown events TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P and TOPDOWN_RETIRING.ALL_P. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update sapphirerapids to 1.20  (Ian Rogers)
Update events from 1.17 to 1.20 as released in: https://github.com/intel/perfmon/commit/6f674057745acf0125395638ca6be36458a59bda Various description updates. Adds uncore events UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_REMOTE, UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL, UNC_CHA_TOR_INSERTS.IO_ITOM_REMOTE, UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL, UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_LOCAL, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOMCACHENEAR_REMOTE, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_LOCAL, UNC_CHA_TOR_OCCUPANCY.IO_MISS_ITOM_REMOTE, UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_LOCAL, UNC_CHA_TOR_OCCUPANCY.IO_MISS_PCIRDCUR_REMOTE and removes core events AMX_OPS_RETIRED.BF16 and AMX_OPS_RETIRED.INT8. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update meteorlake to 1.08  (Ian Rogers)
Update events from 1.07 to 1.08 as released in: https://github.com/intel/perfmon/commit/f0f8f3e163d9eb84e6ce8e2108a22cb43b2527e5 Various description updates. Adds topdown, offcore and uncore events OCR.DEMAND_DATA_RD.L3_HIT, OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD, OCR.DEMAND_RFO.L3_HIT, OCR.DEMAND_DATA_RD.L3_MISS, OCR.DEMAND_RFO.L3_MISS, OCR.DEMAND_DATA_RD.ANY_RESPONSE, OCR.DEMAND_DATA_RD.DRAM, OCR.DEMAND_RFO.ANY_RESPONSE, OCR.DEMAND_RFO.DRAM, TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P, TOPDOWN_RETIRING.ALL_P, UNC_ARB_DAT_OCCUPANCY.RD and UNC_HAC_ARB_COH_TRK_REQUESTS.ALL. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update lunarlake to 1.01  (Ian Rogers)
Update events from 1.00 to 1.01 as released in: https://github.com/intel/perfmon/commit/56ab8d837ac566d51a4d8748b6b4b817a22c9b84 Various encoding and description updates. Adds the events CPU_CLK_UNHALTED.CORE, CPU_CLK_UNHALTED.CORE_P, CPU_CLK_UNHALTED.REF_TSC_P, CPU_CLK_UNHALTED.THREAD, MISC_RETIRED.LBR_INSERTS, TOPDOWN_BAD_SPECULATION.ALL_P, TOPDOWN_BE_BOUND.ALL_P, TOPDOWN_FE_BOUND.ALL_P, TOPDOWN_RETIRING.ALL_P. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update icelakex to 1.24  (Ian Rogers)
Update events from 1.23 to 1.24 as released in: https://github.com/intel/perfmon/commit/d883888ae60882028e387b6fe1ebf683beb693fa Fixes spelling and descriptions. Adds the uncore events UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL and UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE, while removing UNC_IIO_NUM_REQ_FROM_CPU.IRP. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update grandridge to 1.02  (Ian Rogers)
Update events from 1.01 to 1.02 as released in: https://github.com/intel/perfmon/commit/b2a81e803add1ba0af68a442c975683d226d868c Fixes spelling and descriptions. Adds topdown events and uncore cache UNC_CHA_TOR_OCCUPANCY.IA_HIT_DRD_OPT, UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_OPT, UNC_CHA_TOR_OCCUPANCY.IA_DRD_OPT. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21  perf vendor events intel: Update emeraldrapids to 1.06  (Ian Rogers)
Update events from 1.03 to 1.06 as released in: https://github.com/intel/perfmon/commit/21a8be3ea7918749141db4036fb65a2343cd865d Fixes spelling and descriptions. Adds cache miss latency events UNC_CHA_TOR_(INSERTS|OCCUPANCY).IO_(PCIRDCUR|ITOM|ITOMCACHENEAR)_(LOCAL|REMOTE). Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf vendor events intel: Update cascadelakex to 1.21Ian Rogers
Update events from 1.20 to 1.21 as released in: https://github.com/intel/perfmon/commit/fcfdba3be8f3be81ad6b509fdebf953ead92dc2c Largely fixes spelling and descriptions. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20240321060016.1464787-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf probe: Add missing libgen.h header needed for using basename()Arnaldo Carvalho de Melo
This prototype is obtained indirectly, by luck, from some other header in probe-event.c in most systems, but recently exploded on alpine:edge: 8 13.39 alpine:edge : FAIL gcc version 13.2.1 20240309 (Alpine 13.2.1_git20240309) util/probe-event.c: In function 'convert_exec_to_group': util/probe-event.c:225:16: error: implicit declaration of function 'basename' [-Werror=implicit-function-declaration] 225 | ptr1 = basename(exec_copy); | ^~~~~~~~ util/probe-event.c:225:14: error: assignment to 'char *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion] 225 | ptr1 = basename(exec_copy); | ^ cc1: all warnings being treated as errors make[3]: *** [/git/perf-6.8.0/tools/build/Makefile.build:158: util] Error 2 Fix it by adding the libgen.h header where basename() is prototyped. Fixes: fb7345bbf7fad9bf ("perf probe: Support basic dwarf-based operations on uprobe events") Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf trace: Fix 'newfstatat'/'fstatat' argument pretty printingArnaldo Carvalho de Melo
There were needless two entries, one for 'newfstatat' and another for 'fstatat', keep just one and pretty print its 'flags' argument using the fs_at_flags scnprintf that is also used by other FS syscalls such as 'stat', now: root@number:~# perf trace -e newfstatat --max-events=5 0.000 ( 0.010 ms): abrt-dump-jour/1400 newfstatat(dfd: 7, filename: "", statbuf: 0x7fff0d127000, flag: EMPTY_PATH) = 0 0.020 ( 0.003 ms): abrt-dump-jour/1400 newfstatat(dfd: 9, filename: "", statbuf: 0x55752507b0e8, flag: EMPTY_PATH) = 0 0.039 ( 0.004 ms): abrt-dump-jour/1400 newfstatat(dfd: 19, filename: "", statbuf: 0x557525061378, flag: EMPTY_PATH) = 0 0.047 ( 0.003 ms): abrt-dump-jour/1400 newfstatat(dfd: 20, filename: "", statbuf: 0x5575250b8cc8, flag: EMPTY_PATH) = 0 0.053 ( 0.003 ms): abrt-dump-jour/1400 newfstatat(dfd: 22, filename: "", statbuf: 0x5575250535d8, flag: EMPTY_PATH) = 0 root@number:~# Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-6-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf trace: Beautify the 'flags' arg of unlinkatArnaldo Carvalho de Melo
Reusing the fs_at_flags array done for the 'stat' syscall. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf beauty: Introduce faccessat2 flags scnprintf routineArnaldo Carvalho de Melo
The faccessat and faccessat2 syscalls now have beautifiers for their arguments. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf beauty: Introduce scrape script for the 'statx' syscall 'mask' argumentArnaldo Carvalho de Melo
It was using the first variation on producing a string representation for a binary flag, one that used the system's stat.h and preprocessor tricks that had to be updated every time a new flag was introduced. Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo. $ tools/perf/trace/beauty/statx_mask.sh static const char *statx_mask[] = { [ilog2(0x00000001) + 1] = "TYPE", [ilog2(0x00000002) + 1] = "MODE", [ilog2(0x00000004) + 1] = "NLINK", [ilog2(0x00000008) + 1] = "UID", [ilog2(0x00000010) + 1] = "GID", [ilog2(0x00000020) + 1] = "ATIME", [ilog2(0x00000040) + 1] = "MTIME", [ilog2(0x00000080) + 1] = "CTIME", [ilog2(0x00000100) + 1] = "INO", [ilog2(0x00000200) + 1] = "SIZE", [ilog2(0x00000400) + 1] = "BLOCKS", [ilog2(0x00000800) + 1] = "BTIME", [ilog2(0x00001000) + 1] = "MNT_ID", [ilog2(0x00002000) + 1] = "DIOALIGN", [ilog2(0x00004000) + 1] = "MNT_ID_UNIQUE", }; $ Now we need a copy of uapi/linux/stat.h from tools/include/ in the scrape only directory tools/perf/trace/beauty/include. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-3-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
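The scraped table above indexes each name by ilog2(bit) + 1. As a rough illustration of how such a table can turn a mask into text, here is a minimal Python sketch. It is not perf's C implementation of strarray__scnprintf_flags(); the helper names (ilog2, format_flags) and the choice to print unknown bits in hex are assumptions made for this example only.

```python
# Hypothetical sketch: decode a statx 'mask' value using a table laid out
# like the scraped statx_mask[] array (index = ilog2(bit) + 1).

def ilog2(value):
    """Integer log2 of a power of two."""
    return value.bit_length() - 1

# (bit, name) pairs copied from the statx_mask.sh output in the commit message.
_ENTRIES = [
    (0x00000001, "TYPE"), (0x00000002, "MODE"), (0x00000004, "NLINK"),
    (0x00000008, "UID"), (0x00000010, "GID"), (0x00000020, "ATIME"),
    (0x00000040, "MTIME"), (0x00000080, "CTIME"), (0x00000100, "INO"),
    (0x00000200, "SIZE"), (0x00000400, "BLOCKS"), (0x00000800, "BTIME"),
    (0x00001000, "MNT_ID"), (0x00002000, "DIOALIGN"),
    (0x00004000, "MNT_ID_UNIQUE"),
]

# Build the index-by-(ilog2 + 1) table; slot 0 is left unused here.
statx_mask = [None] * (ilog2(0x00004000) + 2)
for bit, name in _ENTRIES:
    statx_mask[ilog2(bit) + 1] = name

def format_flags(value, table):
    """Return a '|'-separated string naming every set bit in value.

    Bits without a table entry are printed in hex (an assumption of this
    sketch, not necessarily what perf does).
    """
    parts = []
    bit = 1
    remaining = value
    while remaining:
        if remaining & bit:
            index = ilog2(bit) + 1
            name = table[index] if index < len(table) else None
            parts.append(name if name else hex(bit))
            remaining &= ~bit
        bit <<= 1
    return "|".join(parts) if parts else "0"
```

For example, format_flags(0x3, statx_mask) names the two lowest bits, and a new flag only requires a new table entry rather than new formatting code.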
2024-03-21perf beauty: Introduce scrape script for various fs syscalls 'flags' argumentsArnaldo Carvalho de Melo
It was using the first variation on producing a string representation for a binary flag, one that used the system's fcntl.h and preprocessor tricks that had to be updated every time a new flag was introduced. Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo. $ tools/perf/trace/beauty/fs_at_flags.sh static const char *fs_at_flags[] = { [ilog2(0x100) + 1] = "SYMLINK_NOFOLLOW", [ilog2(0x200) + 1] = "REMOVEDIR", [ilog2(0x400) + 1] = "SYMLINK_FOLLOW", [ilog2(0x800) + 1] = "NO_AUTOMOUNT", [ilog2(0x1000) + 1] = "EMPTY_PATH", [ilog2(0x0000) + 1] = "STATX_SYNC_AS_STAT", [ilog2(0x2000) + 1] = "STATX_FORCE_SYNC", [ilog2(0x4000) + 1] = "STATX_DONT_SYNC", [ilog2(0x8000) + 1] = "RECURSIVE", [ilog2(0x80000000) + 1] = "GETATTR_NOSEC", }; $ Now we need a copy of uapi/linux/fcntl.h from tools/include/ in the scrape only directory tools/perf/trace/beauty/include and will use that fs_at_flags array for other fs syscalls. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20240320193115.811899-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf tests: Run tests in parallel by defaultIan Rogers
Switch from running tests sequentially to running in parallel by default. Change the opt-in '-p' or '--parallel' flag to '-S' or '--sequential'. On an 8 core tigerlake an address sanitizer run time changes from: 326.54user 622.73system 6:59.91elapsed 226%CPU to: 973.02user 583.98system 3:01.17elapsed 859%CPU So over twice as fast, saving 4 minutes. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301174711.2646944-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
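The "over twice as fast, saving 4 minutes" claim can be checked from the two time(1) lines quoted above. The following Python sketch only redoes that arithmetic; the helper names are made up for this illustration.

```python
# Recompute the speedup and CPU utilisation from the quoted time(1) output.

def elapsed_seconds(stamp):
    """Convert an 'M:SS.ss' elapsed stamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

def cpu_percent(user, system, elapsed):
    """CPU utilisation as time(1) reports it: (user + system) / elapsed."""
    return 100.0 * (user + system) / elapsed

sequential = elapsed_seconds("6:59.91")   # 326.54user 622.73system
parallel = elapsed_seconds("3:01.17")     # 973.02user 583.98system

speedup = sequential / parallel           # a little over 2.3x
saved = sequential - parallel             # just under 4 minutes, in seconds
```

The utilisation figures computed by cpu_percent() match the 226%CPU and 859%CPU values in the commit message, which is consistent with the parallel run keeping most of the 8 cores busy.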
2024-03-21perf help: Lower levenshtein penality for deleting characterIan Rogers
The levenshtein penalty for deleting a character was far higher than substituting or inserting a character. Lower the penalty to match that of inserting a character. Before: $ perf recccord perf: 'recccord' is not a perf-command. See 'perf --help'. $ After: $ perf recccord perf: 'recccord' is not a perf-command. See 'perf --help'. Did you mean this? record $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301201306.2680986-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
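To see why the deletion penalty matters for 'recccord', here is a minimal Python sketch of an edit distance with separate per-operation costs. It is not perf's C levenshtein() routine, and the numeric penalties used below (substitute=2, insert=1, delete=1 or 4) are illustrative assumptions rather than the values in the perf source.

```python
# Hypothetical sketch of a weighted edit distance (no transposition step).

def weighted_levenshtein(source, target, substitute=2, insert=1, delete=1):
    """Cost of editing source into target with per-operation penalties."""
    # prev[j] = cost of turning the processed prefix of source into target[:j]
    prev = [j * insert for j in range(len(target) + 1)]
    for i, s_ch in enumerate(source, start=1):
        cur = [i * delete]  # turning source[:i] into "" needs i deletions
        for j, t_ch in enumerate(target, start=1):
            keep_or_sub = prev[j - 1] + (0 if s_ch == t_ch else substitute)
            drop_source_char = prev[j] + delete
            add_target_char = cur[j - 1] + insert
            cur.append(min(keep_or_sub, drop_source_char, add_target_char))
        prev = cur
    return prev[-1]
```

Turning 'recccord' into 'record' needs two deletions, so the distance is 2 when a deletion costs as much as an insertion but 8 when a deletion costs 4. With any fixed similarity cutoff, the cheaper deletion is what lets 'record' be suggested.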
2024-03-21perf tools: Suggest inbuilt commands for unknown commandIan Rogers
The existing unknown command code looks for perf scripts like perf-archive.sh and perf-iostat.sh, however, inbuilt commands aren't suggested. Add the inbuilt commands so they may be suggested too. Before: $ perf reccord perf: 'reccord' is not a perf-command. See 'perf --help'. $ After: $ perf reccord perf: 'reccord' is not a perf-command. See 'perf --help'. Did you mean this? record $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240301201306.2680986-1-irogers@google.com [ Added some fixes from Ian to problems I noticed while testing ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf test: Read child test 10 times a second rather than 1Ian Rogers
Make the perf test output smoother by timing out the poll of the child process after 100ms rather than 1s. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf test: Use a single fd for the child process out/errIan Rogers
Switch from dumping err then out, to a single file descriptor for both of them. This allows the err and output to be correctly interleaved in verbose output. Fixes: b482f5f8e0168f1e ("perf tests: Add option to run tests in parallel") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
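The effect of sharing one descriptor can be illustrated outside perf. The Python sketch below is an analogy, not the perf test harness (which is C): it uses subprocess with stderr redirected into the stdout pipe, and the child command and helper name are invented for this example.

```python
# Hypothetical sketch: capture a child's stdout and stderr through one pipe
# so the two streams keep their relative order.
import subprocess
import sys

def run_child_merged(argv):
    """Run argv with stderr sent to the same pipe as stdout; return the text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # both streams share one file descriptor
        text=True,
        check=False,
    )
    return result.stdout

# A small child that alternates between the two streams, flushing each write.
CHILD = [
    sys.executable,
    "-c",
    "import sys\n"
    "print('out 1', flush=True)\n"
    "print('err 1', file=sys.stderr, flush=True)\n"
    "print('out 2', flush=True)\n",
]
```

With two separate pipes drained one after the other, 'err 1' would be printed either before or after both 'out' lines; with a single pipe it stays between them.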
2024-03-21perf test: Stat output per thread of just the parent processIan Rogers
Per-thread mode requires either system-wide (-a), a pid (-p) or a tid (-t). The stat output tests were using system-wide mode but this is racy when threads are starting and exiting - something that happens a lot when running the tests in parallel (perf test -p). Avoid the race conditions by using pid mode with the pid of the parent process. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf record: Delete session after stopping sideband threadIan Rogers
The session has a header in it which contains a perf env with bpf_progs. The bpf_progs are accessed by the sideband thread and so the sideband thread must be stopped before the session is deleted, to avoid a use after free. This error was detected by AddressSanitizer in the following: ==2054673==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000161e00 at pc 0x55769289de54 bp 0x7f9df36d4ab0 sp 0x7f9df36d4aa8 READ of size 8 at 0x61d000161e00 thread T1 #0 0x55769289de53 in __perf_env__insert_bpf_prog_info util/env.c:42 #1 0x55769289dbb1 in perf_env__insert_bpf_prog_info util/env.c:29 #2 0x557692bbae29 in perf_env__add_bpf_info util/bpf-event.c:483 #3 0x557692bbb01a in bpf_event__sb_cb util/bpf-event.c:512 #4 0x5576928b75f4 in perf_evlist__poll_thread util/sideband_evlist.c:68 #5 0x7f9df96a63eb in start_thread nptl/pthread_create.c:444 #6 0x7f9df9726a4b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 0x61d000161e00 is located 384 bytes inside of 2136-byte region [0x61d000161c80,0x61d0001624d8) freed by thread T0 here: #0 0x7f9dfa6d7288 in __interceptor_free libsanitizer/asan/asan_malloc_linux.cpp:52 #1 0x557692978d50 in perf_session__delete util/session.c:319 #2 0x557692673959 in __cmd_record tools/perf/builtin-record.c:2884 #3 0x55769267a9f0 in cmd_record tools/perf/builtin-record.c:4259 #4 0x55769286710c in run_builtin tools/perf/perf.c:349 #5 0x557692867678 in handle_internal_command tools/perf/perf.c:402 #6 0x557692867a40 in run_argv tools/perf/perf.c:446 #7 0x557692867fae in main tools/perf/perf.c:562 #8 0x7f9df96456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 Fixes: 657ee5531903339b ("perf evlist: Introduce side band thread") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ingo 
Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240301074639.2260708-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf tools: Add/use PMU reverse lookup from config to nameIan Rogers
Add perf_pmu__name_from_config that does a reverse lookup from a config number to an alias name. The lookup is expensive as the config is computed for every alias by filling in a perf_event_attr, but this is only done when verbose output is enabled. The lookup also only considers config, and not config1, config2 or config3. An example of the output: $ perf stat -vv -e data_read true ... perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff (data_read) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ... Committer notes: Fix the python binding build by adding dummies for not strictly needed perf_pmu__name_from_config() and perf_pmus__find_by_type(). Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf tools: Use pmus to describe type from attributeIan Rogers
When dumping a perf_event_attr, use pmus to find the PMU and its name by the type number. This allows dynamically added PMUs to be described. Before: $ perf stat -vv -e data_read true ... perf_event_attr: type 24 size 136 config 0x20ff sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ... After: $ perf stat -vv -e data_read true ... perf_event_attr: type 24 (uncore_imc_free_running_0) size 136 config 0x20ff sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ... However, it also means that when we have a PMU name we prefer it to a hard coded name: Before: $ perf stat -vv -e faults true ... perf_event_attr: type 1 (PERF_TYPE_SOFTWARE) size 136 config 0x2 (PERF_COUNT_SW_PAGE_FAULTS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ... After: $ perf stat -vv -e faults true ... perf_event_attr: type 1 (software) size 136 config 0x2 (PERF_COUNT_SW_PAGE_FAULTS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 exclude_guest 1 ... It feels more consistent to do this, rather than only prefer a PMU name when a hard coded name isn't available. 
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf list: Give more details about raw event encodingsIan Rogers
List all the PMUs, not just the first core one, and list real format specifiers with value ranges. Before: $ perf list ... rNNN [Raw hardware event descriptor] cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] [(see 'man perf-list' on how to encode it)] mem:<addr>[/len][:access] [Hardware breakpoint] ... After: $ perf list ... rNNN [Raw event descriptor] cpu/event=0..255,pc,edge,.../modifier [Raw event descriptor] [(see 'man perf-list' or 'man perf-record' on how to encode it)] breakpoint//modifier [Raw event descriptor] cstate_core/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_pkg/event=0..0xffffffffffffffff/modifier [Raw event descriptor] i915/i915_eventid=0..0x1fffff/modifier [Raw event descriptor] intel_bts//modifier [Raw event descriptor] intel_pt/ptw,event,cyc_thresh=0..15,.../modifier [Raw event descriptor] kprobe/retprobe/modifier [Raw event descriptor] msr/event=0..0xffffffffffffffff/modifier [Raw event descriptor] power/event=0..255/modifier [Raw event descriptor] software//modifier [Raw event descriptor] tracepoint//modifier [Raw event descriptor] uncore_arb/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_cbox/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_clock/event=0..255/modifier [Raw event descriptor] uncore_imc_free_running/event=0..255,umask=0..255/modifier[Raw event descriptor] uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier[Raw event descriptor] mem:<addr>[/len][:access] [Hardware breakpoint] ... 
With '--details' provide more details on the formats encoding: cpu/event=0..255,pc,edge,.../modifier [Raw event descriptor] [(see 'man perf-list' or 'man perf-record' on how to encode it)] cpu/event=0..255,pc,edge,offcore_rsp=0..0xffffffffffffffff,ldlat=0..0xffff,inv, umask=0..255,frontend=0..0xffffff,cmask=0..255,config=0..0xffffffffffffffff, config1=0..0xffffffffffffffff,config2=0..0xffffffffffffffff,config3=0..0xffffffffffffffff, name=string,period=number,freq=number,branch_type=(u|k|hv|any|...),time, call-graph=(fp|dwarf|lbr),stack-size=number,max-stack=number,nr=number,inherit,no-inherit, overwrite,no-overwrite,percore,aux-output,aux-sample-size=number/modifier breakpoint//modifier [Raw event descriptor] breakpoint//modifier cstate_core/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_core/event=0..0xffffffffffffffff/modifier cstate_pkg/event=0..0xffffffffffffffff/modifier [Raw event descriptor] cstate_pkg/event=0..0xffffffffffffffff/modifier i915/i915_eventid=0..0x1fffff/modifier [Raw event descriptor] i915/i915_eventid=0..0x1fffff/modifier intel_bts//modifier [Raw event descriptor] intel_bts//modifier intel_pt/ptw,event,cyc_thresh=0..15,.../modifier [Raw event descriptor] intel_pt/ptw,event,cyc_thresh=0..15,pt,notnt,branch,tsc,pwr_evt,fup_on_ptw,cyc,noretcomp, mtc,psb_period=0..15,mtc_period=0..15/modifier kprobe/retprobe/modifier [Raw event descriptor] kprobe/retprobe/modifier msr/event=0..0xffffffffffffffff/modifier [Raw event descriptor] msr/event=0..0xffffffffffffffff/modifier power/event=0..255/modifier [Raw event descriptor] power/event=0..255/modifier software//modifier [Raw event descriptor] software//modifier tracepoint//modifier [Raw event descriptor] tracepoint//modifier uncore_arb/event=0..255,edge,inv,.../modifier [Raw event descriptor] uncore_arb/event=0..255,edge,inv,umask=0..255,cmask=0..31/modifier uncore_cbox/event=0..255,edge,inv,.../modifier [Raw event descriptor] 
uncore_cbox/event=0..255,edge,inv,umask=0..255,cmask=0..31/modifier uncore_clock/event=0..255/modifier [Raw event descriptor] uncore_clock/event=0..255/modifier uncore_imc_free_running/event=0..255,umask=0..255/modifier[Raw event descriptor] uncore_imc_free_running/event=0..255,umask=0..255/modifier uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier[Raw event descriptor] uprobe/ref_ctr_offset=0..0xffffffff,retprobe/modifier Committer notes: Address this build error in various distros: 55 58.44 ubuntu:24.04 : FAIL gcc version 13.2.0 (Ubuntu 13.2.0-17ubuntu2) util/pmu.c:1638:70: error: '_Static_assert' with no message is a C2x extension [-Werror,-Wc2x-extensions] 1638 | _Static_assert(ARRAY_SIZE(terms) == __PARSE_EVENTS__TERM_TYPE_NR - 6); | ^ | , "" 1 error generated. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf list: Allow wordwrap to wrap on commasIan Rogers
A raw event encoding may be a block with terms separated by commas. If wrapping such a string it would be useful to break at the commas, so add this ability to wordwrap. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
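As an illustration of breaking at commas, here is a minimal Python sketch of a greedy wrapper that only ever splits after a comma. It is not perf's wordwrap code; the function name and the exact width handling are assumptions for this example.

```python
# Hypothetical sketch: wrap a comma-separated event encoding at commas only.

def wrap_on_commas(text, width):
    """Greedily pack comma-terminated terms into lines of at most width chars.

    A single term longer than width is kept whole on its own line.
    """
    pieces = text.split(",")
    # Re-attach the comma to every term except the last one.
    tokens = [piece + "," for piece in pieces[:-1]] + [pieces[-1]]
    lines = []
    current = ""
    for token in tokens:
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = token
        else:
            current += token
    if current:
        lines.append(current)
    return lines
```

Applied to one of the uncore_cbox encodings from 'perf list --details' with a width of 30, this yields three lines, each ending at a comma except the last, and joining the lines reproduces the original string.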
2024-03-21perf pmu: Drop "default_core" from alias namesIan Rogers
"default_core" is used by jevents.py for json events' PMU name when none is specified. On x86 the "default_core" is typically the PMU "cpu". When creating an alias see if the event's PMU name is "default_core" in which case don't record it. This means in places like "perf list" the PMU's name will be used in its place. Before: $ perf list --details ... cache: l1d.replacement [Counts the number of cache lines replaced in L1 data cache] default_core/event=0x51,period=0x186a3,umask=0x1/ ... After: $ perf list --details ... cache: l1d.replacement [Counts the number of cache lines replaced in L1 data cache. Unit: cpu] cpu/event=0x51,period=0x186a3,umask=0x1/ ... Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf list: Add tracepoint encoding to detailed outputIan Rogers
The tracepoint id holds the config value and is probed in determining what an event is. Add reading of the id so that we can display the event encoding as: $ perf list --details ... alarmtimer:alarmtimer_cancel [Tracepoint event] tracepoint/config=0x18c/ ... Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20240308001915.4060155-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf beauty: Introduce scrape script for 'clone' syscall 'flags' argumentArnaldo Carvalho de Melo
It was using the first variation on producing a string representation for a binary flag, one that used the copy of uapi/linux/sched.h with preprocessor tricks that had to be updated every time a new flag was introduced. Use the more recent scrape script + strarray + strarray__scnprintf_flags() combo. $ tools/perf/trace/beauty/clone.sh | head -5 static const char *clone_flags[] = { [ilog2(0x00000100) + 1] = "VM", [ilog2(0x00000200) + 1] = "FS", [ilog2(0x00000400) + 1] = "FILES", [ilog2(0x00000800) + 1] = "SIGHAND", $ Now we can move uapi/linux/sched.h from tools/include/, that is used for building perf to the scrape only directory tools/perf/trace/beauty/include. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZfnULIn3XKDq0bpc@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Do not retry for invalid typesNamhyung Kim
In some cases, it was able to find a type or location info (for per-cpu variable) but cannot match because of invalid offset or missing global information. In those cases, it's meaningless to go to the outer scope and retry because there will be no additional information. Let's change the return type of find_matching_type() and bail out if it returns -1 for the cases. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-24-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Add a cache for global variable typesNamhyung Kim
They are often searched by many different places. Let's add a cache for them to reduce the duplicate DWARF access. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-23-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Add stack canary typeNamhyung Kim
When the stack protector is enabled, the compiler generates code to check for stack overflow at runtime using a special value called the 'stack canary'. On x86_64, GCC hard-codes the stack canary as %gs:40. While there's a definition of fixed_percpu_data in asm/processor.h, it seems that the header is not included everywhere and in many places it cannot find the type info. As it's in the well-known location (at %gs:40), let's add a pseudo stack canary type to handle it specially. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-22-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Handle ADD instructionsNamhyung Kim
There are different patterns for percpu variable access using a constant value added to the base.

  2aeb:  mov    -0x7da0f7e0(,%rax,8),%r14  # r14 = __per_cpu_offset[cpu]
  2af3:  mov    $0x34740,%rax              # rax = address of runqueues
* 2afa:  add    %rax,%r14                  # r14 = &per_cpu(runqueues, cpu)
  2bfd:  cmpl   $0x0,0x10(%r14)            # cpu_rq(cpu)->has_blocked_load
  2b03:  je     0x2b36

After the first instruction, r14 holds the __per_cpu_offset entry for the CPU. Then rax is loaded with an immediate value, which is added to r14 to calculate the address of a per-cpu variable. So it needs to track the immediate values and ADD instructions.

A similar but slightly different case uses "this_cpu_off" instead of "__per_cpu_offset" for the current CPU. This time the variable address comes with PC-relative addressing.

  89:  mov    $0x34740,%rax              # rax = address of runqueues
* 90:  add    %gs:0x7f015f60(%rip),%rax  # 19a78 <this_cpu_off>
  98:  incl   0xd8c(%rax)                # cpu_rq(cpu)->sched_count

Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-21-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
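To make the two patterns concrete, here is a small sketch of the register state such tracking needs. It is an illustration under assumptions, not the perf implementation: `struct reg_state`, the `REG_*` kinds and the `track_*()` helpers are hypothetical names, and only the three instruction shapes from the listings above are modelled.

```c
#include <stdint.h>

/* Hypothetical per-register state; perf's real tracking is richer. */
enum reg_kind {
	REG_UNKNOWN,
	REG_CONST,        /* holds an immediate value */
	REG_PERCPU_BASE,  /* holds a per-cpu base offset */
	REG_PERCPU_VAR,   /* holds the address of a per-cpu variable */
};

struct reg_state {
	enum reg_kind kind;
	uint64_t value;   /* the immediate, or the per-cpu variable address */
};

/* mov $imm, %dst */
static void track_mov_imm(struct reg_state *regs, int dst, uint64_t imm)
{
	regs[dst].kind = REG_CONST;
	regs[dst].value = imm;
}

/* mov __per_cpu_offset(,%idx,8), %dst */
static void track_load_percpu_base(struct reg_state *regs, int dst)
{
	regs[dst].kind = REG_PERCPU_BASE;
	regs[dst].value = 0;
}

/* add %src, %dst: a constant plus a per-cpu base is a per-cpu variable */
static void track_add_reg(struct reg_state *regs, int src, int dst)
{
	struct reg_state s = regs[src], d = regs[dst];

	regs[dst].kind = REG_UNKNOWN;
	if (s.kind == REG_CONST && d.kind == REG_PERCPU_BASE) {
		regs[dst].kind = REG_PERCPU_VAR;
		regs[dst].value = s.value;
	} else if (s.kind == REG_PERCPU_BASE && d.kind == REG_CONST) {
		regs[dst].kind = REG_PERCPU_VAR;
		regs[dst].value = d.value;
	}
}

/* add %gs:this_cpu_off(%rip), %dst: the base comes from memory instead */
static void track_add_this_cpu_off(struct reg_state *regs, int dst)
{
	if (regs[dst].kind == REG_CONST)
		regs[dst].kind = REG_PERCPU_VAR;  /* value keeps the immediate */
	else
		regs[dst].kind = REG_UNKNOWN;
}
```

Replaying the first listing (load the base into r14, move 0x34740 into rax, add rax to r14) leaves r14 marked as a per-cpu variable at 0x34740, so the following `0x10(%r14)` access can be resolved like a global variable access at that address plus 0x10.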
2024-03-21perf annotate-data: Support general per-cpu accessNamhyung Kim
This is to support per-cpu variable access, often without a matching DWARF entry. For some reason, I sometimes cannot find debug info for per-cpu variables. They use a more complex pattern to calculate the address of per-cpu variables, like below.

  2b7d:  mov    -0x1e0(%rbp),%rax          ; rax = cpu
  2b84:  mov    -0x7da0f7e0(,%rax,8),%rcx  ; rcx = __per_cpu_offset[cpu]
* 2b8c:  mov    0x34870(%rcx),%rax         ; *(__per_cpu_offset[cpu] + 0x34870)

Let's assume the rax register has a CPU number at 2b7d. The next instruction gets the per-cpu offset for that cpu. The offset -0x7da0f7e0 is 0xffffffff825f0820 in u64, which is the address of the '__per_cpu_offset' array on my system. So it gets the actual offset of that CPU's per-cpu region and saves it to the rcx register. Then, at 2b8c, accesses using rcx can be handled the same as global variable accesses.

To handle this case, it should check if the offset of the instruction matches the address of '__per_cpu_offset'.

Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-20-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
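The conversion from the 32-bit displacement to the 64-bit address is plain sign extension, which the following sketch spells out. The helper names `disp_to_addr()` and `is_per_cpu_offset_array()` are hypothetical, and the address of '__per_cpu_offset' is passed in as a parameter because it differs from build to build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend a 32-bit instruction displacement to a 64-bit address. */
static uint64_t disp_to_addr(int32_t disp)
{
	return (uint64_t)(int64_t)disp;
}

/*
 * Hypothetical check: does the displacement of the instruction refer to
 * the '__per_cpu_offset' array, whose address the caller looked up?
 */
static bool is_per_cpu_offset_array(int32_t disp, uint64_t per_cpu_offset_addr)
{
	return disp_to_addr(disp) == per_cpu_offset_addr;
}
```

For the listing above, `disp_to_addr(-0x7da0f7e0)` yields 0xffffffff825f0820, because 0x100000000 - 0x7da0f7e0 = 0x825f0820 and the upper 32 bits are filled with ones.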
2024-03-21perf annotate-data: Track instructions with a this-cpu variableNamhyung Kim
Like global variables, this-cpu variables should be tracked correctly. Factor out get_global_var_type() to handle both global and per-cpu (for this cpu) variables in the same manner. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-19-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Handle this-cpu variables in kernelNamhyung Kim
On x86, the kernel gets the current task using the current macro like below:

  #define current get_current()

  static __always_inline struct task_struct *get_current(void)
  {
          return this_cpu_read_stable(pcpu_hot.current_task);
  }

So it returns the current_task field of struct pcpu_hot, which is the first member. On my build, it's located at 0x32940.

  $ nm vmlinux | grep pcpu_hot
  0000000000032940 D pcpu_hot

And the current macro generates an instruction like below:

  mov  %gs:0x32940, %rcx

So the %gs segment register points to the beginning of the per-cpu region of this cpu, and the instruction addresses the variable with a constant offset from it. Let's update the instruction location info to have a segment register and handle %gs in the kernel to look up a global variable. Treat it as a global variable by changing the register number to DWARF_REG_PC.

Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
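Treating the %gs constant as a global variable address means resolving it the way any other address is resolved: find the variable that covers it and the offset inside that variable. A minimal sketch of that lookup follows. `struct global_var` and `lookup_gs_offset()` are hypothetical names, and the linear search stands in for whatever symbol lookup the tool really uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global variable record, e.g. built from a symbol table. */
struct global_var {
	const char *name;
	uint64_t addr;  /* start address (per-cpu offset for per-cpu data) */
	uint64_t size;  /* size in bytes */
};

/*
 * Resolve the constant in "%gs:<constant>" like a global variable
 * address: return the variable covering it and store the offset inside
 * that variable in *var_off. Returns NULL if nothing covers it.
 */
static const struct global_var *
lookup_gs_offset(const struct global_var *vars, size_t nr, uint64_t gs_off,
		 uint64_t *var_off)
{
	for (size_t i = 0; i < nr; i++) {
		if (gs_off >= vars[i].addr &&
		    gs_off - vars[i].addr < vars[i].size) {
			*var_off = gs_off - vars[i].addr;
			return &vars[i];
		}
	}
	return NULL;
}
```

With a record for pcpu_hot at 0x32940, looking up 0x32940 returns that record with an inner offset of 0, which corresponds to the first member, current_task.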
2024-03-21perf annotate: Parse x86 segment register locationNamhyung Kim
Add a segment field in the struct annotated_insn_loc and save it for the segment based addressing like %gs:0x28. For simplicity it now handles %gs register only. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
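As a rough illustration of the parsing step, the sketch below recognizes a `%gs:` prefix in an operand string and extracts the constant that follows. It is not the perf parser: `struct seg_operand` and `parse_gs_operand()` are hypothetical names, and anything after the constant, such as a `(%rip)` base, is simply ignored here.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse result; only the %gs segment is recognized. */
struct seg_operand {
	bool has_gs;       /* operand used the %gs segment register */
	long long offset;  /* constant following "%gs:" */
};

/* Returns true and fills *out if 'op' looks like "%gs:<number>...". */
static bool parse_gs_operand(const char *op, struct seg_operand *out)
{
	static const char prefix[] = "%gs:";
	const size_t len = sizeof(prefix) - 1;
	char *end;

	out->has_gs = false;
	out->offset = 0;

	if (strncmp(op, prefix, len) != 0)
		return false;

	/* Base 0 lets strtoll() accept both "40" and "0x28". */
	out->offset = strtoll(op + len, &end, 0);
	if (end == op + len) {
		out->offset = 0;
		return false;  /* no number after the prefix */
	}

	out->has_gs = true;
	return true;
}
```

For example, "%gs:0x28" parses to an offset of 40, while "0x8(%rax)" is rejected because it has no segment prefix.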
2024-03-21perf annotate-data: Check register state for typeNamhyung Kim
As instruction tracking updates the type state for each register, check the final type info for the target register at the given instruction. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-16-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Implement instruction trackingNamhyung Kim
If it fails to find a variable for the location directly, it might be due to a missing variable in the source code. For example, accessing pointer variables in a chain can result in a case like below:

  struct foo *foo = ...;
  int i = foo->bar->baz;

The DWARF debug information is created for each variable, so it'd have one for 'foo'. But there's no variable for 'foo->bar', and then it cannot know the type of 'bar' and 'baz'.

The above source code can be compiled to the following x86 instructions:

  mov  0x8(%rax), %rcx
  mov  0x4(%rcx), %rdx   <=== PMU sample
  mov  %rdx, -4(%rbp)

Let's say 'foo' is located in %rax and it has a pointer to struct foo. But the perf sample is captured at the second instruction and there is no variable or type info for %rcx.

It'd be great if the compiler could generate debug info for %rcx, but we should handle it on our side. So this patch implements the logic to iterate instructions and update the type table for each location.

As it has already collected a list of scopes including the target instruction, we can use it to construct the type table smartly.

  +----------------  scope[0] subprogram
  |
  | +--------------  scope[1] lexical_block
  | |
  | | +------------  scope[2] inlined_subroutine
  | | |
  | | | +----------  scope[3] inlined_subroutine
  | | | |
  | | | | +--------  scope[4] lexical_block
  | | | | |
  | | | | |     ***  target instruction
  ...

Imagine the target instruction has 5 scopes; each scope will have its own variables and parameters. Then it can start with the innermost scope (4). So it'd search the shortest path from the start of scope[4] to the target address and build a list of basic blocks. Then it iterates the basic blocks with the variables in the scope and updates the table. If it finds a type at the target instruction, then it returns it.

Otherwise, it moves to the upper scope[3]. Now it'd search the shortest path from the start of scope[3] to the start of scope[4]. Then it connects that to the existing basic block list. Then it'd iterate the blocks with variables for both scopes. It can repeat this until it finds a type at the target instruction or reaches the top scope[0].

As the basic blocks contain the shortest path, it doesn't need to worry about branches and can update the table simply. The final check will be done by find_matching_type() in the next patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
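The core idea of propagating types through loads can be shown with the foo->bar->baz example. The sketch below is a toy model under assumptions, not the perf implementation: the `type_id` enum, the `members` table and `track_load()` are hypothetical, and the member offsets (8 and 4) are taken from the displacements in the instructions above. Real tracking also has to walk basic blocks and scopes, which is left out here.

```c
#include <stddef.h>

/* Toy type identifiers standing in for DWARF type entries. */
enum type_id { T_UNKNOWN, T_INT, T_FOO_PTR, T_BAR_PTR };

/* Type of the member found at 'offset' when dereferencing 'ptr'. */
struct member {
	enum type_id ptr;
	int offset;
	enum type_id type;
};

static const struct member members[] = {
	{ T_FOO_PTR, 0x8, T_BAR_PTR },  /* foo->bar */
	{ T_BAR_PTR, 0x4, T_INT },      /* bar->baz */
};

static enum type_id member_type(enum type_id ptr, int offset)
{
	for (size_t i = 0; i < sizeof(members) / sizeof(members[0]); i++) {
		if (members[i].ptr == ptr && members[i].offset == offset)
			return members[i].type;
	}
	return T_UNKNOWN;
}

/* mov offset(%src), %dst: dst gets the type of the member that was loaded. */
static void track_load(enum type_id *regs, int src, int offset, int dst)
{
	regs[dst] = member_type(regs[src], offset);
}
```

Starting with only %rax known to be a `struct foo *` (the one variable DWARF describes), replaying `mov 0x8(%rax), %rcx` marks %rcx as a `struct bar *`. The sampled instruction `mov 0x4(%rcx), %rdx` can then be attributed to the member at offset 4 of struct bar, that is, 'baz'.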