summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2024-08-12perf lock: Use perf_tool__init()Ian Rogers
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf kvm: Use perf_tool__init()Ian Rogers
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf buildid-list: Use perf_tool__initIan Rogers
Reduce scope of build_id__mark_dso_hit_ops() to the scope of function perf_session__list_build_ids, its only use, and use perf_tool__init() for the default values. Move perf_event__exit_del_thread() to event.[ch] so it can be used in builtin-buildid-list.c. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf kmem: Use perf_tool__initIan Rogers
Reduce the scope of the tool from global/static to just that of the cmd_kmem function where the session is scoped. Use the perf_tool__init() to initialize default values. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf tool: Add perf_tool__init()Ian Rogers
Add init function that behaves like perf_tool__fill_defaults() but assumes all values haven't been initialized. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf tool: Move fill defaults into tool.cIan Rogers
The aim here is to eventually make perf_tool__fill_defaults() an init function so that the tools struct is more const. Create a tool.c to go along with tool.h. Move perf_tool__fill_defaults() out of session.c into tool.c along with the default stub values. Add perf_tool__compressed_is_stub() for a test in perf_session__process_user_event(). perf_session__process_compressed_event() is only used from being default initialized so migrate into tool.c. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf tool: Constify tool pointersIan Rogers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf s390-cpumsf: Remove unused structIan Rogers
struct s390_cpumsf_synth was likely cargo culted from other auxtrace examples. It has no users, so remove. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf auxtrace: Remove dummy toolsIan Rogers
Add perf_session__deliver_synth_attr_event that synthesizes a perf_record_header_attr event with one id. Remove use of perf_event__synthesize_attr that necessitates the use of the dummy tool in order to pass the session. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf inject: Fix leader sampling inserting additional samplesIan Rogers
The processing of leader samples would turn an individual sample with a group of read values into multiple samples. 'perf inject' would pass through the additional samples increasing the output data file size: $ perf record -g -e "{instructions,cycles}:S" -o perf.orig.data true $ perf script -D -i perf.orig.data | sed -e 's/perf.orig.data/perf.data/g' > orig.txt $ perf inject -i perf.orig.data -o perf.new.data $ perf script -D -i perf.new.data | sed -e 's/perf.new.data/perf.data/g' > new.txt $ diff -u orig.txt new.txt --- orig.txt 2024-07-29 14:29:40.606576769 -0700 +++ new.txt 2024-07-29 14:30:04.142737434 -0700 ... -0xc550@perf.data [0x30]: event: 3 +0xc550@perf.data [0xd0]: event: 9 +. +. ... raw event: size 208 bytes +. 0000: 09 00 00 00 01 00 d0 00 fc 72 01 86 ff ff ff ff .........r...... +. 0010: 74 7d 2c 00 74 7d 2c 00 fb c3 79 f9 ba d5 05 00 t},.t},...y..... +. 0020: e6 cb 1a 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ +. 0030: 02 00 00 00 00 00 00 00 76 01 00 00 00 00 00 00 ........v....... +. 0040: e6 cb 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ +. 0050: 62 18 00 00 00 00 00 00 f6 cb 1a 00 00 00 00 00 b............... +. 0060: 00 00 00 00 00 00 00 00 0c 00 00 00 00 00 00 00 ................ +. 0070: 80 ff ff ff ff ff ff ff fc 72 01 86 ff ff ff ff .........r...... +. 0080: f3 0e 6e 85 ff ff ff ff 0c cb 7f 85 ff ff ff ff ..n............. +. 0090: bc f2 87 85 ff ff ff ff 44 af 7f 85 ff ff ff ff ........D....... +. 00a0: bd be 7f 85 ff ff ff ff 26 d0 7f 85 ff ff ff ff ........&....... +. 00b0: 6d a4 ff 85 ff ff ff ff ea 00 20 86 ff ff ff ff m......... ..... +. 00c0: 00 fe ff ff ff ff ff ff 57 14 4f 43 fc 7e 00 00 ........W.OC.~.. + +1642373909693435 0xc550 [0xd0]: PERF_RECORD_SAMPLE(IP, 0x1): 2915700/2915700: 0xffffffff860172fc period: 1 addr: 0 +... FP chain: nr:12 +..... 0: ffffffffffffff80 +..... 1: ffffffff860172fc +..... 2: ffffffff856e0ef3 +..... 3: ffffffff857fcb0c +..... 4: ffffffff8587f2bc +..... 5: ffffffff857faf44 +..... 6: ffffffff857fbebd +..... 7: ffffffff857fd026 +..... 8: ffffffff85ffa46d +..... 9: ffffffff862000ea +..... 10: fffffffffffffe00 +..... 11: 00007efc434f1457 +... sample_read: +.... group nr 2 +..... id 00000000001acbe6, value 0000000000000176, lost 0 +..... id 00000000001acbf6, value 0000000000001862, lost 0 + +0xc620@perf.data [0x30]: event: 3 ... This behavior is incorrect as in the case above 'perf inject' should have done nothing. Fix this behavior by disabling separating samples for a tool that requests it. Only request this for `perf inject` so as to not affect other perf tools. With the patch and the test above there are no differences between the orig.txt and new.txt. Fixes: e4caec0d1af3d608 ("perf evsel: Add PERF_SAMPLE_READ sample related processing") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240729220620.2957754-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf annotate-data: Show first-level children by default in TUINamhyung Kim
Now default is to fold everything but it only shows the name of the top-level data type which is not very useful. Instead just expand the top level entry so that it can show the layout at a higher level. Annotate type: 'struct task_struct' (4 samples) Percent Offset Size Field - 100.00 0 9792 struct task_struct { ◆ + 0.50 0 24 struct thread_info thread_info; ▒ 0.00 24 4 unsigned int __state; ▒ 0.00 32 8 void* stack; ▒ + 0.00 40 4 refcount_t usage; ▒ 0.00 44 4 unsigned int flags; ▒ 0.00 48 4 unsigned int ptrace; ▒ 0.00 52 4 int on_cpu; ▒ + 0.00 56 16 struct __call_single_node wake_entry; ▒ 0.00 72 4 unsigned int wakee_flips; ▒ 0.00 80 8 long unsigned int wakee_flip_decay_ts;▒ 0.00 88 8 struct task_struct* last_wakee; ▒ 0.00 96 4 int recent_used_cpu; ▒ 0.00 100 4 int wake_cpu; ▒ 0.00 104 4 int on_rq; ▒ 0.00 108 4 int prio; ▒ 0.00 112 4 int static_prio; ▒ 0.00 116 4 int normal_prio; ▒ 0.00 120 4 unsigned int rt_priority; ▒ + 0.00 128 256 struct sched_entity se; ▒ + 0.00 384 48 struct sched_rt_entity rt; ▒ + 0.00 432 224 struct sched_dl_entity dl; ▒ 0.00 656 8 struct sched_class* sched_class; ▒ ... Committer testing: # perf mem record -a sleep 5s # perf annotate --group --data-type=pthread_mutex_t Annotate type: 'pthread_mutex_t' (13 samples) Percent Offset Size Field - 100.00 0 40 pthread_mutex_t { ▒ - 100.00 0 40 struct __pthread_mutex_s __data { ▒ 39.45 0 4 int __lock; ▒ 0.00 4 4 unsigned int __count; ▒ 7.80 8 4 int __owner; ▒ 6.88 12 4 unsigned int __nusers; ▒ 45.87 16 4 int __kind; ▒ 0.00 20 2 short int __spins; ▒ 0.00 22 2 short int __elision; ▒ + 0.00 24 16 __pthread_list_t __list; ▒ }; ▒ 0.00 0 0 char[] __size; ▒ 39.45 0 8 long int __align; Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812194447.2049187-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf annotate-data: Implement folding in TUI browserNamhyung Kim
Like 'perf report', use 'e' or 'E' key to toggle folding the current entry so that it can control displaying child entries. Note I didn't add the 'c' and 'C' key to collapse the entry because it's also handled with the 'e'/'E' since it toggles the state. Committer testing: Do some 'perf mem record' for some workload of the whole system, using the target options, as usual (--pid/-p, -C/--cpu, -a for the system wide profiling, etc) and then: # perf annotate --skip-empty --data-type=pthread_mutex_t That, by default, will start as --tui, then press 'E' to see the whole struct unfolded, etc. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812194447.2049187-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf annotate-data: Support folding in TUI browserNamhyung Kim
Like in the hists browser, it should support folding current entry so that it can hide unwanted details in some data structures. The folded entries will be displayed with the '+' sign, while unfolded entries will have the '-' sign. Entries that have no children will not show any signs. Annotate type: 'struct socket' (1 samples) Percent Offset Size Field - 100.00 0 128 struct socket { ◆ 0.00 0 4 socket_state state; ▒ 0.00 4 2 short int type; ▒ 0.00 8 8 long unsigned int flags; ▒ 0.00 16 8 struct file* file; ▒ 100.00 24 8 struct sock* sk; ▒ 0.00 32 8 struct proto_ops* ops; ▒ - 0.00 64 64 struct socket_wq wq { ▒ - 0.00 64 24 wait_queue_head_t wait { ▒ + 0.00 64 4 spinlock_t lock; ▒ - 0.00 72 16 struct list_head head { ▒ 0.00 72 8 struct list_head* next; ▒ 0.00 80 8 struct list_head* prev; ▒ }; ▒ }; ▒ 0.00 88 8 struct fasync_struct* fasync_list; ▒ 0.00 96 8 long unsigned int flags; ▒ + 0.00 104 16 struct callback_head rcu; ▒ }; ▒ }; ▒ This just adds the display logic for folding, actually folding action will be implemented in the next patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812194447.2049187-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf vendor events: SKX, CLX, SNR uncore cache event fixesIan Rogers
Cache home agent (CHA) events were setting the low rather than high config1 bits. SNR was using CLX CHA events, however its CHA is similar to ICX so remove the events. Incorporate the updates in: https://github.com/intel/perfmon/pull/215 https://github.com/intel/perfmon/pull/216 Fixes: 4cc49942444e958b ("perf vendor events: Update cascadelakex events/metrics") Closes: https://lore.kernel.org/linux-perf-users/CAPhsuW4nem9XZP+b=sJJ7kqXG-cafz0djZf51HsgjCiwkGBA+A@mail.gmail.com/ Reported-by: Song Liu <song@kernel.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Co-authored-by: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240811042004.421869-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf lock contention: Change stack_id type to s32Namhyung Kim
The bpf_get_stackid() helper returns a signed type to check whether it failed to get a stacktrace or not. But it saved the result in u32 and checked if the value is negative. 376 if (needs_callstack) { 377 pelem->stack_id = bpf_get_stackid(ctx, &stacks, 378 BPF_F_FAST_STACK_CMP | stack_skip); --> 379 if (pelem->stack_id < 0) ./tools/perf/util/bpf_skel/lock_contention.bpf.c:379 contention_begin() warn: unsigned 'pelem->stack_id' is never less than zero. Let's change the type to s32 instead. Fixes: 6d499a6b3d90277d ("perf lock: Print the number of lost entries for BPF") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812172533.2015291-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf annotate-data: Fix a buffer overflow in TUI browserNamhyung Kim
In get_member_overhead(), k is updated when it has a entry in the histogram. But the entry->hists array is allocated with the number of evsel in the group. So the k should be reset when it iterates the event using for_each_group_evsel(), otherwise it'd crash due to a buffer overflow. Fixes: cb1898f58e0f175d ("perf annotate-data: Support --skip-empty option") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240810191502.1947959-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf docs: Refine the description for the buffer sizeLeo Yan
Current description for the AUX trace buffer size is misleading. When a user specifies the option '-m,512M', it represents a size value in bytes (512MiB) but not 512M pages (512M x 4KiB regard to a page of 4KiB). Make the document clear that the normal buffer and the AUX tracing buffer share the same semantics. Syncs the documents for consistent text. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812093459.2575278-1-leo.yan@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf script: add --addr2line optionMartin Liška
Similarly to other subcommands (like report, top), it would be handy to provide a path for addr2line command. Signed-off-by: Martin Liska <martin.liska@hey.com> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/eadc3e36-029d-4848-9d69-272fe5a83a26@foxlink.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf tests pmu: Initialize all fields of test_pmu variableArnaldo Carvalho de Melo
Instead of explicitely initializing just the .name and .alias_name, use struct member named initialization of just the non-null -name field, the compiler will initialize all the other non-explicitely initialized fields to NULL. This makes the code more robust, avoiding the error recently fixed when the .alias_name was used and contained a random value. Reviewed-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Radostin Stoyanov <rstoyano@redhat.com> Link: https://lore.kernel.org/lkml/e26941f9-f86c-4f2e-b812-20c49fb2c0d3@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09perf daemon: Fix the build on 32-bit architecturesArnaldo Carvalho de Melo
Noticed with: 1 6.22 debian:experimental-x-mipsel : FAIL gcc version 13.2.0 (Debian 13.2.0-25) builtin-daemon.c: In function 'cmd_session_list': builtin-daemon.c:691:35: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'time_t' {aka 'long long int'} [-Werror=format=] Use inttypes.h's PRIu64 to deal with that. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/ZplvH21aQ8pzmza_@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-09perf annotate-data: Support --skip-empty optionNamhyung Kim
The --skip-empty option is to hide dummy events in a group. Like other output mode in 'perf report' and 'perf annotate', the data-type profiling output should support the option. Committer testing: With dummy: root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24 Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples): event[0] = cpu_atom/mem-loads,ldlat=30/P event[1] = cpu_atom/mem-stores/P event[2] = dummy:u ============================================================================ Percent offset size field 100.00 100.00 0.00 0 40 pthread_mutex_t { 100.00 100.00 0.00 0 40 struct __pthread_mutex_s __data { 45.21 84.54 0.00 0 4 int __lock; 0.00 0.00 0.00 4 4 unsigned int __count; 0.00 1.83 0.00 8 4 int __owner; 5.19 10.65 0.00 12 4 unsigned int __nusers; 49.61 2.97 0.00 16 4 int __kind; 0.00 0.00 0.00 20 2 short int __spins; 0.00 0.00 0.00 22 2 short int __elision; 0.00 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0.00 0 0 char[] __size; 45.21 84.54 0.00 0 8 long int __align; }; Skipping it: root@number:~# perf annotate --stdio --group --data-type --skip-empty | head -24 Annotate type: 'pthread_mutex_t' in /usr/lib64/libc.so.6 (50 samples): event[0] = cpu_atom/mem-loads,ldlat=30/P event[1] = cpu_atom/mem-stores/P ============================================================================ Percent offset size field 100.00 100.00 0 40 pthread_mutex_t { 100.00 100.00 0 40 struct __pthread_mutex_s __data { 45.21 84.54 0 4 int __lock; 0.00 0.00 4 4 unsigned int __count; 0.00 1.83 8 4 int __owner; 5.19 10.65 12 4 unsigned int __nusers; 49.61 2.97 16 4 int __kind; 0.00 0.00 20 2 short int __spins; 0.00 0.00 22 2 short int __elision; 0.00 0.00 24 16 __pthread_list_t __list { 0.00 0.00 24 8 struct __pthread_internal_list* __prev; 0.00 0.00 32 8 struct __pthread_internal_list* __next; }; }; 0.00 0.00 0 0 char[] __size; 45.21 84.54 0 8 long int __align; }; Annotate type: 'pthread_mutexattr_t' in /usr/lib64/libc.so.6 (1 samples): root@number:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240807061713.1642924-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09perf annotate: Fix --group behavior when leader has no samplesNamhyung Kim
When --group option is used, it should display all events together. But the current logic only checks if the first (leader) event has samples or not. Let's check the member events as well. Also it missed to put the linked samples from member evsels to the output RB-tree so that it can be displayed in the output. For example, take a look at this example. $ ./perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u It has three events but 'path_put' function has samples only for mem-stores (second) event. $ sudo ./perf annotate --stdio -f path_put Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period) ---------------------------------------------------------------------------------------------------------- : 0 0xffffffffae600020 <path_put>: 0.00 : ffffffffae600020: endbr64 0.00 : ffffffffae600024: nopl (%rax, %rax) 91.22 : ffffffffae600029: pushq %rbx 0.00 : ffffffffae60002a: movq %rdi, %rbx 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi 8.78 : ffffffffae600031: callq 0xffffffffae614aa0 0.00 : ffffffffae600036: movq (%rbx), %rdi 0.00 : ffffffffae600039: popq %rbx 0.00 : ffffffffae60003a: jmp 0xffffffffae620670 0.00 : ffffffffae60003f: nop Therefore, it didn't show up when --group option is used since the leader ("mem-loads") event has no samples. But now it checks both events. Before: $ sudo ./perf annotate --stdio -f --group path_put (no output) After: $ sudo ./perf annotate --stdio -f --group path_put Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period) ------------------------------------------------------------------------------------------------------------------------------------------------------------- : 0 0xffffffffae600020 <path_put>: 0.00 0.00 0.00 : ffffffffae600020: endbr64 0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax) 0.00 91.22 0.00 : ffffffffae600029: pushq %rbx 0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx 0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi 0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0 0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi 0.00 0.00 0.00 : ffffffffae600039: popq %rbx 0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670 0.00 0.00 0.00 : ffffffffae60003f: nop Committer testing: Before: root@number:~# perf annotate --group --stdio2 clear_page_erms root@number:~# After: root@number:~# perf annotate --group --stdio2 clear_page_erms Samples: 125 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 13198416, [percent: local period] clear_page_erms() /proc/kcore Percent 0xffffffff990c6cc0 <clear_page_erms>: endbr64 movl $0x1000,%ecx xorl %eax,%eax 0.00 100.00 0.00 rep stosb %al, (%rdi) ← retq int3 int3 int3 int3 nop nop root@number:~# Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20240807061555.1642669-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09perf tools: Create source symlink in perf object dirAndi Kleen
Create a source symlink to the original source in the objdir. This is similar to what the main kernel build script does. Committer testing: ⬢[acme@toolbox perf-tools-next]$ make O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin <SNIP> ⬢[acme@toolbox perf-tools-next]$ ls -la /tmp/build/perf-tools-next/source lrwxrwxrwx. 1 acme acme 41 Aug 9 16:26 /tmp/build/perf-tools-next/source -> /home/acme/git/perf-tools-next/tools/perf ⬢[acme@toolbox perf-tools-next]$ Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240807231823.898979-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09perf debuginfo: Fix the build with !HAVE_DWARF_SUPPORTArnaldo Carvalho de Melo
In that case we have a set of placeholder functions, one of them uses a 'Dwarf_Addr' type that is not present as it is defined in the missing DWARF libraries, so provide a placeholder typedef for that as well. The build error before this patch: In file included from util/annotate.c:28: util/debuginfo.h:44:46: error: unknown type name ‘Dwarf_Addr’ 44 | Dwarf_Addr *offs __maybe_unused, | ^~~~~~~~~~ make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: util/annotate.o] Error 1 make[6]: *** Waiting for unfinished jobs.... Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/lkml/CAM9d7ciushSwEfj7yW4rtDEJBTcCB991V4cswwFEL+cv6QF2pg@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-09perf script python: Add the 'ins_lat' field to event handlerZixian Cai
For example, when using the Alder Lake PMU memory load event, the instruction latency is stored in 'ins_lat', while the cache latency is stored in 'weight'. This patch reports the 'ins_lat' field for Python scripting. Committer testing: On a Rocket Lake Refresh Intel machine (14th gen): root@number:~# grep -m1 'model name' /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@number:~# perf mem record -a sleep 5 Memory events are enabled on a subset of CPUs: 16-27 [ perf record: Woken up 85 times to write data ] [ perf record: Captured and wrote 41.236 MB perf.data (191390 samples) ] root@number:~# perf evlist -v cpu_atom/mem-loads,ldlat=30/P: type: 10 (cpu_atom), size: 136, config: 0x5d0 (mem-loads), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1, { bp_addr, config1 }: 0x1f cpu_atom/mem-stores/P: type: 10 (cpu_atom), size: 136, config: 0x6d0 (mem-stores), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CPU|PERIOD|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|ADDR|CPU|IDENTIFIER|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@number:~# Now generate a python script to then dump the dictionary that now needs to have that 'ins_lat' field: root@number:~# perf script --gen python generated Python script: perf-script.py root@number:~# vim perf-script.py root@number:~# perf script -s perf-script.py | head -40 in trace_begin in trace_end root@number:~# vim perf-script.py Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Zixian Cai <fzczx123@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240809080137.3590148-1-fzczx123@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf test shell lbr: Support hybrid x86 systems tooArnaldo Carvalho de Melo
Running on a: root@x1:~# grep 'model name' -m1 /proc/cpuinfo model name : 13th Gen Intel(R) Core(TM) i7-1365U root@x1:~# It skips all the tests with: root@x1:~# perf test -vvvv LBR 97: perf record LBR tests: --- start --- test child forked, pid 2033388 Skip: only x86 CPUs support LBR ---- end(-2) ---- 97: perf record LBR tests : Skip root@x1:~# Because the test checks for the /sys/devices/cpu/caps/branches file, that isn't present as we have instead: root@x1:~# ls -la /sys/devices/cpu*/caps/branches -r--r--r--. 1 root root 4096 Aug 8 11:22 /sys/devices/cpu_atom/caps/branches -r--r--r--. 1 root root 4096 Aug 8 11:21 /sys/devices/cpu_core/caps/branches root@x1:~# If we check as well for one of those, /sys/devices/cpu_core/caps/branches, then we don't skip the tests and all are run on these x86 Intel Hybrid systems as well, passing all of them: root@x1:~# perf test -vvvv LBR 97: perf record LBR tests: --- start --- test child forked, pid 2034956 LBR callgraph [ perf record: Woken up 5 times to write data ] [ perf record: Captured and wrote 1.812 MB /tmp/__perf_test.perf.data.B2HvQ (8114 samples) ] LBR callgraph [Success] LBR any branch test [ perf record: Woken up 25 times to write data ] [ perf record: Captured and wrote 6.382 MB /tmp/__perf_test.perf.data.B2HvQ (8071 samples) ] LBR any branch test: 8071 samples LBR any branch test [Success] LBR any call test [ perf record: Woken up 23 times to write data ] [ perf record: Captured and wrote 6.208 MB /tmp/__perf_test.perf.data.B2HvQ (8092 samples) ] LBR any call test: 8092 samples LBR any call test [Success] LBR any ret test [ perf record: Woken up 24 times to write data ] [ perf record: Captured and wrote 6.396 MB /tmp/__perf_test.perf.data.B2HvQ (8093 samples) ] LBR any ret test: 8093 samples LBR any ret test [Success] LBR any indirect call test [ perf record: Woken up 25 times to write data ] [ perf record: Captured and wrote 6.344 MB /tmp/__perf_test.perf.data.B2HvQ (8067 samples) ] LBR any indirect call test: 8067 samples LBR any indirect call test [Success] LBR any indirect jump test [ perf record: Woken up 12 times to write data ] [ perf record: Captured and wrote 3.073 MB /tmp/__perf_test.perf.data.B2HvQ (8061 samples) ] LBR any indirect jump test: 8061 samples LBR any indirect jump test [Success] LBR direct calls test [ perf record: Woken up 25 times to write data ] [ perf record: Captured and wrote 6.380 MB /tmp/__perf_test.perf.data.B2HvQ (8076 samples) ] LBR direct calls test: 8076 samples LBR direct calls test [Success] LBR any indirect user call test [ perf record: Woken up 5 times to write data ] [ perf record: Captured and wrote 1.597 MB /tmp/__perf_test.perf.data.B2HvQ (8079 samples) ] LBR any indirect user call test: 8079 samples LBR any indirect user call test [Success] LBR system wide any branch test [ perf record: Woken up 26 times to write data ] [ perf record: Captured and wrote 9.088 MB /tmp/__perf_test.perf.data.B2HvQ (9209 samples) ] LBR system wide any branch test: 9209 samples LBR system wide any branch test [Success] LBR system wide any call test [ perf record: Woken up 25 times to write data ] [ perf record: Captured and wrote 8.945 MB /tmp/__perf_test.perf.data.B2HvQ (9333 samples) ] LBR system wide any call test: 9333 samples LBR system wide any call test [Success] LBR parallel any branch test LBR parallel any call test LBR parallel any ret test LBR parallel any indirect call test LBR parallel any indirect jump test LBR parallel direct calls test LBR parallel system wide any branch test LBR parallel any indirect user call test LBR parallel system wide any call test [ perf record: Woken up 9 times to write data ] [ perf record: Woken up 51 times to write data ] [ perf record: Woken up 1 times to write data ] [ perf record: Woken up 5 times to write data ] [ perf record: Woken up 559 times to write data ] [ perf record: Woken up 14 times to write data ] [ perf record: Woken up 17 times to write data ] [ perf record: Woken up 1 times to write data ] [ perf record: Woken up 11 times to write data ] [ perf record: Captured and wrote 0.150 MB /tmp/__perf_test.perf.data.lANpR (1909 samples) ] [ perf record: Captured and wrote 2.371 MB /tmp/__perf_test.perf.data.Olum8 (3033 samples) ] [ perf record: Captured and wrote 1.230 MB /tmp/__perf_test.perf.data.njfJ8 (1742 samples) ] [ perf record: Captured and wrote 5.554 MB /tmp/__perf_test.perf.data.4ZTrj (29662 samples) ] [ perf record: Captured and wrote 19.906 MB /tmp/__perf_test.perf.data.dlGQt (29576 samples) ] [ perf record: Captured and wrote 0.289 MB /tmp/__perf_test.perf.data.CAT7y (4311 samples) ] [ perf record: Captured and wrote 3.129 MB /tmp/__perf_test.perf.data.diuKG (3971 samples) ] LBR parallel any indirect user call test: 1909 samples [ perf record: Captured and wrote 4.858 MB /tmp/__perf_test.perf.data.sVjtN (6130 samples) ] LBR parallel any indirect user call test [Success] [ perf record: Captured and wrote 3.669 MB /tmp/__perf_test.perf.data.AJtNI (4827 samples) ] LBR parallel any indirect jump test: 4311 samples LBR parallel any indirect jump test [Success] LBR parallel direct calls test: 3033 samples LBR parallel direct calls test [Success] LBR parallel any indirect call test: 1742 samples LBR parallel any indirect call test [Success] LBR parallel any call test: 4827 samples LBR parallel any call test [Success] LBR parallel any branch test: 6130 samples LBR parallel any branch test [Success] LBR parallel system wide any branch test: 29662 samples LBR parallel any ret test: 3971 samples LBR parallel any ret test [Success] LBR parallel system wide any branch test [Success] LBR parallel system wide any call test: 29576 samples LBR parallel system wide any call test [Success] ---- end(0) ---- 97: perf record LBR tests : Ok root@x1:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZrTXftup0H46R8WK@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf test: Add set of perf record LBR testsIan Rogers
Adds coverage for LBR operations and LBR callgraph. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240808054644.1286065-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf callchain: Fix stitch LBR memory leaksIan Rogers
The 'struct callchain_cursor_node' has a 'struct map_symbol' whose maps and map members are reference counted. Ensure these values use a _get routine to increment the reference counts and use map_symbol__exit() to release the reference counts. Do similar for 'struct thread's prev_lbr_cursor, but save the size of the prev_lbr_cursor array so that it may be iterated. Ensure that when stitch_nodes are placed on the free list the map_symbols are exited. Fix resolve_lbr_callchain_sample() by replacing list_replace_init() to list_splice_init(), so the whole list is moved and nodes aren't leaked. A reproduction of the memory leaks is possible with a leak sanitizer build in the perf report command of: ``` $ perf record -e cycles --call-graph lbr perf test -w thloop $ perf report --stitch-lbr ``` Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Fixes: ff165628d72644e3 ("perf callchain: Stitch LBR call stack") Signed-off-by: Ian Rogers <irogers@google.com> [ Basic tests after applying the patch, repeating the example above ] Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anne Macedo <retpolanne@posteo.net> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240808054644.1286065-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf test pmu: Set uninitialized PMU alias to nullVeronika Molnarova
Commit 3e0bf9fde2984469 ("perf pmu: Restore full PMU name wildcard support") adds a test case "PMU cmdline match" that covers PMU name wildcard support provided by function perf_pmu__match(). The test works with a wide range of supported combinations of PMU name matching but omits the case that if the perf_pmu__match() cannot match the PMU name to the wildcard, it tries to match its alias. However, this variable is not set up, causing the test case to fail when run with subprocesses or to segfault if run as a single process. ./perf test -vv 9 9: Sysfs PMU tests : 9.1: Parsing with PMU format directory : Ok 9.2: Parsing with PMU event : Ok 9.3: PMU event names : Ok 9.4: PMU name combining : Ok 9.5: PMU name comparison : Ok 9.6: PMU cmdline match : FAILED! ./perf test -F 9 9.1: Parsing with PMU format directory : Ok 9.2: Parsing with PMU event : Ok 9.3: PMU event names : Ok 9.4: PMU name combining : Ok 9.5: PMU name comparison : Ok Segmentation fault (core dumped) Initialize the PMU alias to null for all tests of perf_pmu__match() as this functionality is not being tested and the alias matching works exactly the same as the matching of the PMU name. ./perf test -F 9 9.1: Parsing with PMU format directory : Ok 9.2: Parsing with PMU event : Ok 9.3: PMU event names : Ok 9.4: PMU name combining : Ok 9.5: PMU name comparison : Ok 9.6: PMU cmdline match : Ok Fixes: 3e0bf9fde2984469 ("perf pmu: Restore full PMU name wildcard support") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Radostin Stoyanov <rstoyano@redhat.com> Link: https://lore.kernel.org/r/20240808103749.9356-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf tests ftrace: Add pattern check for time, countArnaldo Carvalho de Melo
In 'perf ftrace profile sleep 0.1' we know that we'll have an specific kernel function that will take a bit more than 0.1 seconds and will take place just one time, so we can add a check for that so that we validate more than just the presence of some functions in the profile. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/ZrTBo7KACZeuCyLj@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf test: Add a new shell test for perf ftraceNamhyung Kim
$ sudo ./perf test ftrace -vv 86: perf ftrace tests: --- start --- test child forked, pid 1772223 perf ftrace list test syscalls for sleep: __x64_sys_nanosleep __ia32_sys_nanosleep __x64_sys_clock_nanosleep __ia32_sys_clock_nanosleep perf ftrace list test [Success] perf ftrace trace test # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 0) | __x64_sys_clock_nanosleep() { 0) | common_nsleep() { 0) | hrtimer_nanosleep() { 0) | do_nanosleep() { perf ftrace trace test [Success] perf ftrace latency test target function: __x64_sys_clock_nanosleep # DURATION | COUNT | GRAPH | 32 - 64 ms | 1 | ############################################## | perf ftrace latency test [Success] perf ftrace profile test # Total (us) Avg (us) Max (us) Count Function 100136.400 100136.400 100136.400 1 __x64_sys_clock_nanosleep 100135.200 100135.200 100135.200 1 common_nsleep 100134.700 100134.700 100134.700 1 hrtimer_nanosleep 100133.700 100133.700 100133.700 1 do_nanosleep 100130.600 100130.600 100130.600 1 schedule 166.868 55.623 80.299 3 scheduler_tick 5.926 5.926 5.926 1 native_smp_send_reschedule 301.941 301.941 301.941 1 __x64_sys_execve 295.786 295.786 295.786 1 do_execveat_common.isra.0 71.397 35.699 46.403 2 bprm_execve 2.519 1.260 1.547 2 sched_mm_cid_before_execve 1.098 0.549 0.686 2 sched_mm_cid_after_execve perf ftrace profile test [Success] ---- end(0) ---- 86: perf ftrace tests : Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20240808044954.1775333-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf annotate-data: Show typedef names properlyNamhyung Kim
The die_get_typename() would resolve typedef and get to the original type. But sometimes the original type is a struct without name and it makes the output confusing and hard to read. This is a diff of perf report -s type before and after the change. New types such as atomic{,64}_t and sigset_t appeared and the portion of unnamed struct was reduced. Also u32, u64 and size_t were splitted from the base types. --- b 2024-08-01 17:02:34.307809952 -0700 +++ a 2024-08-07 14:17:05.245853999 -0700 - 2.40% long unsigned int + 2.26% long unsigned int - 1.56% unsigned int + 1.27% unsigned int - 0.98% struct - 0.79% long long unsigned int + 0.58% long long unsigned int + 0.36% struct + 0.27% atomic64_t + 0.22% u32 + 0.21% u64 + 0.19% atomic_t + 0.13% size_t - 0.08% struct seqcount_spinlock + 0.08% seqcount_spinlock_t + 0.08% sigset_t + 0.08% __poll_t Let's use the typedef name directly and the resolved to get the size of the type. Committer testing: root@x1:~# diff -u before after | head -30 --- before 2024-08-08 09:35:13.917325041 -0300 +++ after 2024-08-08 09:37:35.312257905 -0300 @@ -10,25 +10,27 @@ # ........ ......... # 79.40% (unknown) - 2.28% union 1.96% (stack operation) - 1.24% struct + 1.87% pthread_mutex_t 0.99% u32[] - 0.92% unsigned int 0.77% struct task_struct + 0.75% U32 0.75% struct pcpu_hot 0.63% struct qspinlock + 0.61% atomic_t 0.59% struct list_head - 0.58% int 0.53% struct cfs_rq 0.51% BYTE* - 0.48% unsigned char + 0.48% BYTE 0.48% long unsigned int 0.46% struct rq 0.41% struct worker 0.41% struct memcg_vmstats_percpu + 0.41% pthread_cond_t 0.37% _Bool + 0.36% int root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240807223129.1738004-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf annotate: Cache debuginfo for data type profilingNamhyung Kim
In find_data_type(), it creates and deletes a debug info whenver it tries to find data type for a sample. This is inefficient and it most likely accesses the same binary again and again. Let's add a single entry cache the debug info structure for the last DSO. Depending on sample data, it usually gives me 2~3x (and sometimes more) speed ups. Note that this will introduce a little difference in the output due to the order of checking stack operations. It used to check the stack ops before checking the availability of debug info but I moved it after the symbol check. So it'll report stack operations in DSOs without debug info as unknown. But I think it's ok and better to have the checking near the caching logic. Committer testing: root@x1:~# perf mem record -a sleep 5s root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# diff -u before after --- before 2024-08-08 09:33:53.880780784 -0300 +++ after 2024-08-08 09:35:13.917325041 -0300 @@ -81,8 +81,8 @@ # Overhead Data Type # ........ ......... # - 55.43% (unknown) - 11.61% (stack operation) + 55.56% (unknown) + 11.48% (stack operation) 4.93% struct pcpu_hot 3.26% unsigned int 2.48% struct Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-08perf hist: Fix reference counting of branch_infoIan Rogers
iter_finish_branch_entry() doesn't put the branch_info from/to map elements creating memory leaks. This can be seen with: ``` $ perf record -e cycles -b perf test -w noploop $ perf report -D ... Direct leak of 984344 byte(s) in 123043 object(s) allocated from: #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x564d3400d10b in map__get util/map.h:186 #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981 #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151 #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898 #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238 #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334 #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655 #8 0x564d3403ba52 in do_flush util/ordered-events.c:245 #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324 #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708 #11 0x564d34032480 in perf_session__process_event util/session.c:1877 #12 0x564d340336ad in reader__read_event util/session.c:2399 #13 0x564d34033fdc in reader__process_events util/session.c:2448 #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495 #15 0x564d34033fdc in perf_session__process_events util/session.c:2661 #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065 #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805 #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350 #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403 #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447 #21 0x564d33cdd827 in main tools/perf/perf.c:561 ... ``` Clearing up the map_symbols properly creates maps reference count issues so resolve those. Resolving this issue doesn't improve peak heap consumption for the test above. Committer testing: $ sudo dnf install libasan $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-07tools/include: Sync filesystem headers with the kernel sourcesNamhyung Kim
To pick up changes from: 0f9ca80fa4f9 fs: Add initial atomic write support info to statx f9af549d1fd3 fs: export mount options via statmount() 0a3deb11858a fs: Allow listmount() in foreign mount namespace 09b31295f833 fs: export the mount ns id via statmount d04bccd8c19d listmount: allow listing in reverse order bfc69fd05ef9 fs/procfs: add build ID fetching to PROCMAP_QUERY API ed5d583a88a9 fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps This should be used to beautify FS syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h diff -u tools/perf/trace/beauty/include/uapi/linux/mount.h include/uapi/linux/mount.h diff -u tools/perf/trace/beauty/include/uapi/linux/stat.h include/uapi/linux/stat.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-07tools/include: Sync network socket headers with the kernel sourcesNamhyung Kim
To pick up changes from: d25a92ccae6b net/smc: Introduce IPPROTO_SMC 060f4ba6e403 io_uring/net: move charging socket out of zc io_uring bb6aaf736680 net: Split a __sys_listen helper for io_uring dc2e77979412 net: Split a __sys_bind helper for io_uring This should be used to beautify socket syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-07tools/include: Sync uapi/asm-generic/unistd.h with the kernel sourcesNamhyung Kim
And arch syscall tables to pick up changes from: b1e31c134a8a powerpc: restore some missing spu syscalls d3882564a77c syscalls: fix compat_sys_io_pgetevents_time64 usage 54233a425403 uretprobe: change syscall number, again 63ded110979b uprobe: Change uretprobe syscall scope and number 9142be9e6443 x86/syscall: Mark exit[_group] syscall handlers __noreturn 9aae1baa1c5d x86, arm: Add missing license tag to syscall tables files 5c28424e9a34 syscalls: Fix to add sys_uretprobe to syscall.tbl 190fec72df4a uprobe: Wire up uretprobe system call This should be used to beautify syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-arch@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06tools/include: Sync uapi/sound/asound.h with the kernel sourcesNamhyung Kim
To pick up changes from: f05c1ffc2745 ALSA: pcm: reinvent the stream synchronization ID API This should be used to beautify sound syscall arguments and it addresses these tools/perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/sound/asound.h include/uapi/sound/asound.h Please see tools/include/uapi/README for details (it's in the first patch of this series). Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: linux-sound@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-06Merge remote-tracking branch 'torvalds/master' into perf-tools-nextArnaldo Carvalho de Melo
To pick a patch that albeit being for tools/perf/ directory went thru a different tree and ended up breaking some recent tests introduced in the perf-tools-next tree to validate duplicate events in the JSON performance event files. Link: https://lore.kernel.org/lkml/ZrIqDMg7cBVhstYU@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06perf jevents.py: Ensure event names aren't duplicatedIan Rogers
Duplicate event names break invariants in 'perf list'. Assert that an event name isn't duplicated so that broken JSON won't build. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06perf pmu-events: Remove duplicated ampereone eventIan Rogers
OP_SPEC is repeated twice in the file which will break invariants in 'perf list' as discussed in this thread: https://lore.kernel.org/linux-perf-users/20240719081651.24853-1-eric.lin@sifive.com/ Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06perf pmu-events: Change dependencies for empty-pmu-events.c testIan Rogers
Switch from $? (all the prerequisites that are newer than the target) to $^ (all the prerequisites) as touching jevents.py will mean that empty-pmu-events.c won't be passed to the diff command breaking the build. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-06perf test: Add build test for JEVENTS_ARCH=allIan Rogers
Building with JEVENTS_ARCH=all builds all CPU types and allows things like assertions to check the validity of the input JSON. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Charles Ci-Jyun Wu <dminus@andestech.com> Cc: Eric Lin <eric.lin@sifive.com> Cc: Greentime Hu <greentime.hu@sifive.com> Cc: Guilherme Amadio <amadio@gentoo.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Inochi Amaoto <inochiama@outlook.com> Cc: James Clark <james.clark@linaro.org> Cc: Ji Sheng Teoh <jisheng.teoh@starfivetech.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Locus Wei-Han Chen <locus84@andestech.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Vincent Chen <vincent.chen@sifive.com> Cc: Will Deacon <will@kernel.org> Cc: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20240805194424.597244-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Add --skip-empty optionNamhyung Kim
Like in 'perf report', we want to hide empty events in the 'perf annotate' output. This is consistent when the option is set in perf report. For example, the following command would use 3 events including dummy. $ perf mem record -a -- perf test -w noploop $ perf evlist cpu/mem-loads,ldlat=30/P cpu/mem-stores/P dummy:u Just using perf annotate with --group will show the all 3 events. $ perf annotate --group --stdio | head Percent | Source code & Disassembly of ... -------------------------------------------------------------- : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 0.00 : e060: pushq %rbp 0.00 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 0.00 : e064: pushq %r15 0.00 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 0.00 : e069: pushq %r14 0.00 0.00 0.00 : e06b: pushq %r13 0.00 0.00 0.00 : e06d: movl %edx, %r13d Now with --skip-empty, it'll hide the last dummy event. $ perf annotate --group --stdio --skip-empty | head Percent | Source code & Disassembly of ... ------------------------------------------------------ : 0 0xe060 <_dl_relocate_object>: 0.00 0.00 : e060: pushq %rbp 0.00 0.00 : e061: movq %rsp, %rbp 0.00 0.00 : e064: pushq %r15 0.00 0.00 : e066: movq %rdi, %r15 0.00 0.00 : e069: pushq %r14 0.00 0.00 : e06b: pushq %r13 0.00 0.00 : e06d: movl %edx, %r13d Committer testing: root@x1:~# perf evlist cpu_atom/mem-loads,ldlat=30/P cpu_atom/mem-stores/P dummy:u root@x1:~# Before: root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 0.00 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# After: root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25 Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period] do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2 Percent 0x9900 <do_lookup_x>: pushq %rbp movq %rsp,%rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $0x88,%rsp movq %rdi,-0x50(%rbp) movl 8(%r9),%edi movq 0x10(%rbp),%r12 movq 0x28(%rbp),%r10 movq %rdx,-0x70(%rbp) movq %rcx,-0x58(%rbp) movq %rdi,%r11 0.00 5.73 movq %r8,-0x68(%rbp) movq (%r9),%r8 movl %esi,%eax 8.30 0.00 movl 0x30(%rbp),%r9d movl %esi,%r15d shrl $6, %eax movq %r8,%r13 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Set al->data_nr using the notes->src->nr_eventsNamhyung Kim
This is a preparation to support skipping empty events. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Use annotation__pcnt_width() consistentlyNamhyung Kim
The annotation__pcnt_width() calculates the screen width for the overhead (percent) area considering event groups properly. Use this function consistently so that we can make sure it has similar output in different modes. But there's a difference in stdio and tui output: stdio uses 8 and tui uses 7 for a percent. Let's use 8 and adjust the print width in __annotation_line__write() properly. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Set notes->src->nr_events earlyNamhyung Kim
We want to use it in different places so make sure it sets properly in symbol__annotate() before creating the disasm lines. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf annotate: Use al->data_nr if possibleNamhyung Kim
The data_nr keeps the number of entries in al->data[] so it should use it when it iterates the array. The notes->src->nr_events should have the same number but it'd be natural to use al->data_nr. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-05perf mem: Update documentation for new optionsNamhyung Kim
Add a common options section and move some items to the section. Also add description of new options to report options. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/lkml/20240802180913.1023886-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-02Merge tag 'riscv-for-linus-6.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix to avoid dropping some of the internal pseudo-extensions, which breaks *envcfg dependency parsing - The kernel entry address is now aligned in purgatory, which avoids a misaligned load that can lead to crash on systems that don't support misaligned accesses early in boot - The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of perf JSON configurations, one of them been updated to FW_SFENCE_VMA_ASID_SENT - The starfive cache driver is now restricted to 64-bit systems, as it isn't 32-bit clean - A fix for to avoid aliasing legacy-mode perf counters with software perf counters - VM_FAULT_SIGSEGV is now handled in the page fault code - A fix for stalls during CPU hotplug due to IPIs being disabled - A fix for memblock bounds checking. This manifests as a crash on systems with discontinuous memory maps that have regions that don't fit in the linear map * tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix linear mapping checks for non-contiguous memory regions RISC-V: Enable the IPI before workqueue_online_cpu() riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error() perf: riscv: Fix selecting counters in legacy mode cache: StarFive: Require a 64-bit system perf arch events: Fix duplicate RISC-V SBI firmware event name riscv/purgatory: align riscv_kernel_entry riscv: cpufeature: Do not drop Linux-internal extensions