summaryrefslogtreecommitdiff
path: root/tools/perf/util/sample.h
AgeCommit message (Collapse)Author
2025-02-18perf hist: Shrink struct hist_entry sizeDmitry Vyukov
Reorder the struct fields by size to reduce paddings and reduce struct simd_flags size from 8 to 1 byte. This reduces struct hist_entry size by 8 bytes (592->584), and leaves a single more usable 6 byte padding hole. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lore.kernel.org/r/7c1cb1c8f9901e945162701ba7269d0f9c70be89.1739437531.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-12perf sample: Make user_regs and intr_regs optionalIan Rogers
The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <tavianator@tavianator.com> as about 2.5% when running `perf script --itrace=i0`: https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/ Adrian Hunter <adrian.hunter@intel.com> replied that the zero initialization was necessary and couldn't simply be removed. This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by make user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation accessors are created for user_regs and intr_regs. To support correct cleanup perf_sample__init and perf_sample__exit functions are created and added throughout the code base. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-11-09perf tools: Add branch counter knobKan Liang
Add a new branch filter, "counter", for the branch counter option. It is used to mark the events which should be logged in the branch. If it is applied with the -j option, the counters of all the events should be logged in the branch. If the legacy kernel doesn't support the new branch sample type, switching off the branch counter filter. The stored counter values in each branch are displayed right after the regular branch stack information via perf report -D. Usage examples: # perf record -e "{branch-instructions,branch-misses}:S" -j any,counter Only the first event, branch-instructions, collect the LBR. Both branch-instructions and branch-misses are marked as logged events. The occurrences information of them can be found in the branch stack extension space of each branch. # perf record -e "{cpu/branch-instructions,branch_type=any/,cpu/branch-misses,branch_type=counter/}" Only the first event, branch-instructions, collect the LBR. Only the branch-misses event is marked as a logged event. Committer notes: I noticed 'perf test "Sample parsing"' failing, reported to the list and Kan provided a patch that checks if the evsel has a leader and that evsel->evlist is set, the comment in the source code further explains it. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tinghao Zhang <tinghao.zhang@intel.com> Link: https://lore.kernel.org/r/20231025201626.3000228-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf event: Add 'simd_flags' field to 'struct perf_sample'German Gomez
Add new field to 'struct perf_sample' to store flags related to SIMD ops. It will be used to store SIMD information from SVE and NEON when profiling using ARM SPE. Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman.Khandual@arm.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03perf report: Support Retire LatencyKan Liang
The Retire Latency field is added in the var3_w of the PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports pipeline stall of this instruction compared to the previous instruction in cycles. That's quite useful to display the information with perf mem report. The p_stage_cyc for Power is also from the var3_w. Union the p_stage_cyc and retire_lat to share the code. Implement X86 specific codes to display the X86 specific header. Add a new sort key retire_lat for the Retire Latency. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20230104201349.1451191-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf tools: Move 'struct perf_sample' to a separate header file to ↵Arnaldo Carvalho de Melo
disentangle headers Some places were including event.h just to get 'struct perf_sample', move it to a separate place so that we speed up a bit the build. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>