summaryrefslogtreecommitdiff
path: root/tools/perf/arch
AgeCommit message (Collapse)Author
2023-04-04perf stat: Suppress warning when using cpum_cf events on s390Thomas Richter
Running command perf stat -vv -e cpu_cycles -C0 -- true displays this warning: Attempting to add event pmu 'cpum_cf' with 'cpu_cycles,' that may result in non-fatal errors Make the PMU cpum_cf selectable and avoid this warning. While at it also fix this warning for PMUs pai_crypto and pai_ext. Output before: # ./perf stat -vv -e cpu_cycles -C0 -- true Using CPUID IBM,3931,704,A01,3.7,002f Attempting to add event pmu 'cpum_cf' with 'cpu_cycles,' that may result in non-fatal errors After aliases, add event pmu 'cpum_cf' with 'event,' that may result in non-fatal errors cpu_cycles -> cpum_cf/event=0/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 10 size 128 config 0x1001 sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 exclude_guest 1 ------------------------------------------------------------ sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3 cpu_cycles: 0: 290434 2479172 2479172: cpu_cycles: 290434 2479172 2479172 Performance counter stats for 'CPU(s) 0': 290,434 cpu_cycles 0.002465617 seconds time elapsed # Now the warning "Attempting to add event pmu 'cpum_cf' ..." does not show up anymore. Output after: # ./perf stat -vv -e cpu_cycles -C0 -- true Using CPUID IBM,3931,704,A01,3.7,002f After aliases, add event pmu 'cpum_cf' with 'event,' that may result in non-fatal errors cpu_cycles -> cpum_cf/event=0/ Control descriptor is not initialized .... Performance counter stats for 'CPU(s) 0': 357,023 cpu_cycles 0.002454995 seconds time elapsed # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20230316074946.41110-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-23sh: remove sh5/sh64 last fragmentsRandy Dunlap
A previous patch removed most of the sh5 (sh64) support from the kernel tree. Now remove the last stragglers. Fixes: 37744feebc08 ("sh: remove sh5 support") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: linux-sh@vger.kernel.org Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230306040037.20350-6-rdunlap@infradead.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-03-20perf intel-pt: Add support for new branch instructions ERETS and ERETUAdrian Hunter
Intel Flexible Return and Event Delivery (FRED) adds instructions ERETS (return to supervisor) and ERETU (return to user). Intel PT instruction decoder needs to know about these instructions because they are branch instructions. Similar to IRET instructions, when the decoder encounters one of these instructions it will match it to a TIP (target instruction pointer) packet that informs what the branch destination is. The existing "x86 instruction decoder - new instructions" test can be used to test the result e.g. $ perf test -v ins |& grep eret Decoded ok: f2 0f 01 ca erets Decoded ok: f3 0f 01 ca eretu Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230320183517.15099-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-15perf kvm: Use macro to replace variable 'decode_str_len'Leo Yan
The variable 'decode_str_len' defines the string length for KVM event name and every arch defines its own values. This introduces complexity that the variable definition are spreading in multiple source files under arch folder. This patch refactors code to use a macro KVM_EVENT_NAME_LEN to define event name length and thus remove the definitions in arch files. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230315145112.186603-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-13perf cs-etm: Avoid printing warning in cs_etm_is_ete() checkJames Clark
When checking for the presence of ETE, a register is read that may not be present on older kernels or if ETE isn't available. cs_etm_get_ro() will print a warning if it doesn't exist, so check for the existence first before accessing it. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230308094843.287093-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-13perf cs-etm: Reduce verbosity of ts_source warningJames Clark
This is printed as a warning but it is normal behavior that users shouldn't be expected to do anything about. Reduce the warning level to debug3 so it's only seen in verbose mode to avoid confusion. Reviewed-by: Leo Yan <leo.yan@linaro.org Signed-off-by: James Clark <james.clark@arm.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230308094843.287093-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-13perf parse-events: Sort and group parsed eventsIan Rogers
This change is intended to be a no-op for most current cases, the default sort order is the order the events were parsed. Where it varies is in how groups are handled. Previously an uncore and core event that are grouped would most often cause the group to be removed: ``` $ perf stat -e '{instructions,uncore_imc_free_running_0/data_total/}' -a sleep 1 WARNING: grouped events cpus do not match, disabling group: anon group { instructions, uncore_imc_free_running_0/data_total/ } ... ``` However, when wildcards are used the events should be re-sorted and re-grouped in parse_events__set_leader, but this currently fails for simple examples: ``` $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1 Performance counter stats for 'system wide': <not counted> MiB uncore_imc_free_running/data_read/ <not counted> MiB uncore_imc_free_running/data_write/ 1.000996992 seconds time elapsed ``` A futher failure mode, fixed in this patch, is to force topdown events into a group. This change moves sorting the evsels in the evlist after parsing. It requires parsing to set up groups. First the evsels are sorted respecting the existing groupings and parse order, but also reordering to ensure evsels of the same PMU and group appear together. So that software and aux events respect groups, their pmu_name is taken from the group leader. The sorting is done with list_sort removing a memory allocation. After sorting a pass is done to correct the group leaders and for topdown events ensuring they have a group leader. This fixes the problems seen before: ``` $ perf stat -e '{uncore_imc_free_running/data_read/,uncore_imc_free_running/data_write/}' -a sleep 1 Performance counter stats for 'system wide': 727.42 MiB uncore_imc_free_running/data_read/ 81.84 MiB uncore_imc_free_running/data_write/ 1.000948615 seconds time elapsed ``` As well as making groups not fail for cases like: ``` $ perf stat -e '{imc_free_running_0/data_total/,imc_free_running_1/data_total/}' -a sleep 1 Performance counter stats for 'system wide': 256.47 MiB imc_free_running_0/data_total/ 256.48 MiB imc_free_running_1/data_total/ 1.001165442 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230312021543.3060328-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-13perf pmu: Earlier PMU auxtrace initializationIan Rogers
This allows event parsing to use the evsel__is_aux_event function, which is important when determining event grouping. Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230312021543.3060328-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-19perf stat: Implement --topdown using json metricsIan Rogers
Request the topdown metric group of a level with the metrics in the group 'TopdownL<level>' rather than through specific events. As more topdown levels are supported this way, such as 6 on Intel Ice Lake, default to just showing the level 1 metrics. This can be overridden using '--td-level'. Rather than determine the maximum topdown level from sysfs, use the metric group names. Remove some now unused topdown code. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-stm32@st-md-mailman.stormreply.com Link: https://lore.kernel.org/r/20230219092848.639226-41-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-19perf stat: Add TopdownL1 metric as a default if presentIan Rogers
When there are no events and on Intel, the topdown events will be added by default if present. To display the metrics associated with these request special handling in stat-shadow.c. To more easily update these metrics use the json metric version via the TopdownL1 group. This makes the handling less platform specific. Modify the metricgroup__has_metric code to also cover metric groups. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-stm32@st-md-mailman.stormreply.com Link: https://lore.kernel.org/r/20230219092848.639226-40-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-19perf pmu-events: Change aggr_mode to be an enumIan Rogers
Rather than use a string to encode aggr_mode, use an enum value. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-stm32@st-md-mailman.stormreply.com Link: https://lore.kernel.org/r/20230219092848.639226-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-06perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCTKan Liang
In arch_perf_synthesize_sample_weight(), the retire_lat was mistakenly missed, add it. perf test -v "x86 sample parsing" 74: x86 Sample parsing : --- start --- test child forked, pid 72526 Samples differ at 'retire_lat' parsing failed for sample_type 0x1000000 test child finished with -1 ---- end ---- x86 Sample parsing: FAILED! Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230206162100.3329395-1-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-06perf test x86: Support the retire_lat (Retire Latency) sample_type checkKan Liang
Add test for the new field for Retire Latency in the X86 specific test. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230202192209.1795329-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03perf report: Support Retire LatencyKan Liang
The Retire Latency field is added in the var3_w of the PERF_SAMPLE_WEIGHT_STRUCT. The Retire Latency reports pipeline stall of this instruction compared to the previous instruction in cycles. That's quite useful to display the information with perf mem report. The p_stage_cyc for Power is also from the var3_w. Union the p_stage_cyc and retire_lat to share the code. Implement X86 specific codes to display the X86 specific header. Add a new sort key retire_lat for the Retire Latency. Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20230104201349.1451191-8-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03perf pmu-events: Introduce pmu_metrics_tableIan Rogers
Add a metrics table that is just a cast from pmu_events_table. This changes the APIs so that event and metric usage of the underlying table is different. For the no jevents case the tables are already separate, later changes will separate the tables for the jevents case. Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230126233645.200509-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-03perf pmu-events: Separate the metrics from events for no jeventsIan Rogers
Separate the event and metric table when building without jevents. Add find_core_metrics_table and perf_pmu__find_metrics_table while renaming existing utilities to be event specific, so that users can find the right table for their need. Committer notes: Fix the build on aarch64 with: tools/perf/arch/arm64/util/pmu.c @@ -32,7 +32,7 @@ const struct pmu_events_table *pmu_events_table__find(void) - return perf_pmu__find_table(pmu); + return perf_pmu__find_events_table(pmu); Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230126233645.200509-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-02perf pmu-events: Add separate metric from pmu_eventIan Rogers
Create a new pmu_metric for the metric related variables from pmu_event but that is initially just a clone of pmu_event. Add iterators for pmu_metric and use in places that metrics are desired rather than events. Make the event iterator skip metric only events, and the metric iterator skip event only events. Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kang Minchul <tegongkang@gmail.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20230126233645.200509-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-27perf cs-etm: Improve missing sink warning messageJames Clark
Make the sink error message more similar to the event error message that reminds about missing kernel support. The available sinks are also determined by the hardware so mention that too. Also, usually it's not necessary to specify the sink, so add that as a hint. Now the error for a made up sink looks like this: $ perf record -e cs_etm/@abc/ Couldn't find sink "abc" on event cs_etm/@abc/. Missing kernel or device support? Hint: An appropriate sink will be picked automatically if one isn't is specified. For any error other than ENOENT, the same message as before is displayed. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/ec7502e6-b406-3997-c2a5-24f98e5c4854@arm.com Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230124110220.460551-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-22perf cs_etm: Record ts_source in AUXTRACE_INFO for ETMv4 and ETEGerman Gomez
Read the value of ts_source exposed by the driver and store it in the ETMv4 and ETE header. If the interface doesn't exist (such as in older Kernels), defaults to a safe value of -1. Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Tested-by: Tanmay Jagdale <tanmay@marvell.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: George Cherian <gcherian@marvell.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Linu Cherian <lcherian@marvell.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sunil Kovvuri Goutham <sgoutham@marvell.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230120143702.4035046-7-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-22perf cs_etm: Keep separate symbols for ETMv4 and ETE parametersGerman Gomez
Previously, adding a new parameter at the end of ETMv4 meant adding it somewhere in the middle of ETE, which is not supported by the current header version. Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Tested-by: Tanmay Jagdale <tanmay@marvell.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: George Cherian <gcherian@marvell.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Linu Cherian <lcherian@marvell.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sunil Kovvuri Goutham <sgoutham@marvell.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230120143702.4035046-6-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-22perf pmu: Remove duplication around EVENT_SOURCE_DEVICE_PATHJames Clark
The pattern for accessing EVENT_SOURCE_DEVICE_PATH is duplicated in a few places, so add two utility functions to cover it. Also just use perf_pmu__scan_file() instead of pmu_type() which already does the same thing. No functional changes. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Tested-by: Tanmay Jagdale <tanmay@marvell.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: George Cherian <gcherian@marvell.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Linu Cherian <lcherian@marvell.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sunil Kovvuri Goutham <sgoutham@marvell.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230120143702.4035046-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-19perf pmu: Add #slots literal support for arm64Jing Zhang
The slots in each architecture may be different, so add #slots literal to obtain the slots of different architectures, and the #slots can be applied in the metric. Currently, The #slots just support for arm64, and other architectures will return NAN. On arm64, the value of slots is from the register PMMIR_EL1.SLOT, which I can read in /sys/bus/event_source/device/armv8_pmuv3_*/caps/slots. PMMIR_EL1.SLOT might read as zero if the PMU version is lower than ID_AA64DFR0_EL1_PMUVer_V3P4 or the STALL_SLOT event is not implemented. Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Zhuo Song <zhuo.song@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/1673940573-90503-2-git-send-email-renyu.zj@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-02perf tools riscv: Fix build error on riscv due to missing header for 'struct ↵Eric Lin
perf_sample' Since the definition of 'struct perf_sample' has been moved to sample.h, we need to include this header file to fix the build error as follows: arch/riscv/util/unwind-libdw.c: In function 'libdw__arch_set_initial_registers': arch/riscv/util/unwind-libdw.c:12:50: error: invalid use of undefined type 'struct perf_sample' 12 | struct regs_dump *user_regs = &ui->sample->user_regs; | ^~ Fixes: 9823147da6c893d9 ("perf tools: Move 'struct perf_sample' to a separate header file to disentangle headers") Signed-off-by: Eric Lin <eric.lin@sifive.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: greentime.hu@sifive.com Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-riscv@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Chen <vincent.chen@sifive.com> Link: https://lore.kernel.org/r/20221231052731.24908-1-eric.lin@sifive.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-21perf arm64: Simplify mksyscalltblHans-Peter Nilsson
This patch isn't intended to have any effect on the compiled code. It just removes one level of indirection: calling the *host* compiler to build and then run a program that just printf:s the numerical entries of the syscall-table. In other words, the generated syscalls.c changes from: [46] = "ftruncate", to: [__NR3264_ftruncate] = "ftruncate", The latter is as good as the former to the user of perf, and this can be done directly by the shell-script. The syscalls defined as non-literal values (like "#define __NR_ftruncate __NR3264_ftruncate") are trivially resolved at compile-time without namespace-leaking and/or collision for its sole user, perf/util/syscalltbl.c, that just #includes the generated file. A future "-mabi=32" support would probably have to handle this differently, but that is a pre-existing problem not affected by this simplification. Calling the *host* compiler only complicates things and accidentally can get a completely wrong set of files and syscall numbers, see earlier commits. Note that the script parameter hostcc is now unused. At the time of this patch, powerpc (the origin, see comments), and also e.g. x86 has moved on, from filtering "gcc -dM -E" output to reading separate specific text-file, a table of syscall numbers. IMHO should arm64 consider adopting this. Signed-off-by: Hans-Peter Nilsson <hp@axis.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20201228024159.2BB66203B5@pchp3.se.axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-20tools headers UAPI: Sync powerpc syscall table with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in these csets: ce883a2ba310cd7c ("powerpc/32: fix syscall wrappers with 64-bit arguments") That doesn't cause any changes in the perf tools. This table is used in tools perf to allow features as described in the last update to this file. This addresses this perf build warning: Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Y6H0C5plZ4V4aiPm@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14perf build: Use libtraceevent from the systemIan Rogers
Remove the LIBTRACEEVENT_DYNAMIC and LIBTRACEFS_DYNAMIC make command line variables. If libtraceevent isn't installed or NO_LIBTRACEEVENT=1 is passed to the build, don't compile in libtraceevent and libtracefs support. This also disables CONFIG_TRACE that controls "perf trace". CONFIG_LIBTRACEEVENT is used to control enablement in Build/Makefiles, HAVE_LIBTRACEEVENT is used in C code. Without HAVE_LIBTRACEEVENT tracepoints are disabled and as such the commands kmem, kwork, lock, sched and timechart are removed. The majority of commands continue to work including "perf test". Committer notes: Fixed up a tools/perf/util/Build reject and added: #include <traceevent/event-parse.h> to tools/perf/util/scripting-engines/trace-event-perl.c. Committer testing: $ rpm -qi libtraceevent-devel Name : libtraceevent-devel Version : 1.5.3 Release : 2.fc36 Architecture: x86_64 Install Date: Mon 25 Jul 2022 03:20:19 PM -03 Group : Unspecified Size : 27728 License : LGPLv2+ and GPLv2+ Signature : RSA/SHA256, Fri 15 Apr 2022 02:11:58 PM -03, Key ID 999f7cbf38ab71f4 Source RPM : libtraceevent-1.5.3-2.fc36.src.rpm Build Date : Fri 15 Apr 2022 10:57:01 AM -03 Build Host : buildvm-x86-05.iad2.fedoraproject.org Packager : Fedora Project Vendor : Fedora Project URL : https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/ Bug URL : https://bugz.fedoraproject.org/libtraceevent Summary : Development headers of libtraceevent Description : Development headers of libtraceevent-libs $ Default build: $ ldd ~/bin/perf | grep tracee libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007f1dcaf8f000) $ # perf trace -e sched:* --max-events 10 0.000 migration/0/17 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, dest_cpu: 1) 0.005 migration/0/17 sched:sched_wake_idle_without_ipi(cpu: 1) 0.011 migration/0/17 sched:sched_switch(prev_comm: "", prev_pid: 17 (migration/0), prev_state: 1, next_comm: "", next_prio: 120) 1.173 :0/0 sched:sched_wakeup(comm: "", pid: 3138 (gnome-terminal-), prio: 120) 1.180 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 3138 (gnome-terminal-), next_prio: 120) 0.156 migration/1/21 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, orig_cpu: 1, dest_cpu: 2) 0.160 migration/1/21 sched:sched_wake_idle_without_ipi(cpu: 2) 0.166 migration/1/21 sched:sched_switch(prev_comm: "", prev_pid: 21 (migration/1), prev_state: 1, next_comm: "", next_prio: 120) 1.183 :0/0 sched:sched_wakeup(comm: "", pid: 1602985 (kworker/u16:0-f), prio: 120, target_cpu: 1) 1.186 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 1602985 (kworker/u16:0-f), next_prio: 120) # Had to tweak tools/perf/util/setup.py to make sure the python binding shared object links with libtraceevent if -DHAVE_LIBTRACEEVENT is present in CFLAGS. Building with NO_LIBTRACEEVENT=1 uncovered some more build failures: - Make building of data-convert-bt.c to CONFIG_LIBTRACEEVENT=y - perf-$(CONFIG_LIBTRACEEVENT) += scripts/ - bpf_kwork.o needs also to be dependent on CONFIG_LIBTRACEEVENT=y - The python binding needed some fixups and util/trace-event.c can't be built and linked with the python binding shared object, so remove it in tools/perf/util/setup.py and exclude it from the list of dependencies in the python/perf.so Makefile.perf target. Building without libtraceevent-devel installed uncovered more build failures: - The python binding tools/perf/util/python.c was assuming that traceevent/parse-events.h was always available, which was the case when we defaulted to using the in-kernel tools/lib/traceevent/ files, now we need to enclose it under ifdef HAVE_LIBTRACEEVENT, just like the other parts of it that deal with tracepoints. - We have to ifdef the rules in the Build files with CONFIG_LIBTRACEEVENT=y to build builtin-trace.c and tools/perf/trace/beauty/ as we only ifdef setting CONFIG_TRACE=y when setting NO_LIBTRACEEVENT=1 in the make command line, not when we don't detect libtraceevent-devel installed in the system. Simplification here to avoid these two ways of disabling builtin-trace.c and not having CONFIG_TRACE=y when libtraceevent-devel isn't installed is the clean way. From Athira: <quote> tools/perf/arch/powerpc/util/Build -perf-y += kvm-stat.o +perf-$(CONFIG_LIBTRACEEVENT) += kvm-stat.o </quote> Then, ditto for arm64 and s390, detected by container cross build tests. - s/390 uses test__checkevent_tracepoint() that is now only available if HAVE_LIBTRACEEVENT is defined, enclose the callsite with ifder HAVE_LIBTRACEEVENT. Also from Athira: <quote> With this change, I could successfully compile in these environment: - Without libtraceevent-devel installed - With libtraceevent-devel installed - With “make NO_LIBTRACEEVENT=1” </quote> Then, finally rename CONFIG_TRACEEVENT to CONFIG_LIBTRACEEVENT for consistency with other libraries detected in tools/perf/. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20221205225940.3079667-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-05perf arm64: Fix mksyscalltbl, don't lose syscalls due to sort -nuHans-Peter Nilsson
When using "sort -nu", arm64 syscalls were lost. That is, the io_setup syscall (number 0) and all but one (typically ftruncate; 64) of the syscalls that are defined symbolically (like "#define __NR_ftruncate __NR3264_ftruncate") at the point where "sort" is applied. This creation-of-syscalls.c-scheme is, judging from comments, copy-pasted from powerpc, and worked there because at the time, its tools/arch/powerpc/include/uapi/asm/unistd.h had *literals*, like "#define __NR_ftruncate 93". With sort being numeric and the non-numeric key effectively evaluating to 0, the sort option "-u" means these "duplicates" are removed. There's no need to remove syscall lines with duplicate numbers for arm64 because there are none, so let's fix that by just losing the "-u". Having the table numerically sorted on syscall-number for the rest of the syscalls looks nice, so keep the "-n". Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Hans-Peter Nilsson <hp@axis.com> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20201228023941.E0DE2203B5@pchp3.se.axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-24perf stat: Pass through 'struct outstate'Namhyung Kim
Now most of the print functions take a pointer to the struct outstate. We have one in the evlist__print_counters() and pass it through the child functions. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221123180208.2068936-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-16perf cpumap: Tidy libperf includesIan Rogers
Use public API when possible, don't include internal API in header files in evsel.h. Fix any related breakages. Committer note: There was one missing case, when building for arm64: arch/arm64/util/pmu.c: In function 'pmu_events_table__find': arch/arm64/util/pmu.c:18:30: error: invalid use of undefined type 'struct perf_cpu_map' 18 | if (pmu->cpus->nr != cpu__max_cpu().cpu) | ^~ Fix it by adding one more exception, including <internal/cpumap.h> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20221109184914.1357295-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-09perf intel-pt: Add hybrid CPU compatibility testAdrian Hunter
The kernel driver assumes hybrid CPUs will have Intel PT capabilities that are compatible with the boot CPU. Add a test to check that is the case. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221104121805.5264-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-09perf intel-pt: Redefine test_suite to allow for adding more subtestsAdrian Hunter
In preparation for adding more Intel PT testing, redefine the test_suite to allow for adding more subtests. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221104121805.5264-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-09perf intel-pt: Start turning intel-pt-pkt-decoder-test.c into a suite of ↵Adrian Hunter
intel-pt subtests In preparation for adding more Intel PT testing, rename intel-pt-pkt-decoder-test.c to intel-pt-test.c. Subtests will later be added to intel-pt-test.c. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20221104121805.5264-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-31perf tools: Move 'struct perf_sample' to a separate header file to ↵Arnaldo Carvalho de Melo
disentangle headers Some places were including event.h just to get 'struct perf_sample', move it to a separate place so that we speed up a bit the build. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf arch x86: Add missing stdlib.h to get free() prototypeArnaldo Carvalho de Melo
It was getting indirectly, out of luck, add it. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-27perf tools riscv: Add support for get_cpuid_str functionNikita Shubin
The get_cpuid_str function returns the string that contains values of MVENDORID, MARCHID and MIMPID in hex format separated by coma. The values themselves are taken from first cpu entry in "/proc/cpuid" that contains "mvendorid", "marchid" and "mimpid". Signed-off-by: Nikita Shubin <n.shubin@yadro.com> Tested-by: Kautuk Consul <kconsul@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anup Patel <anup@brainfault.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-riscv@lists.infradead.org Cc: linux@yadro.com Link: https://lore.kernel.org/r/20220815132251.25702-2-nikita.shubin@maquefel.me Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers UAPI: Sync powerpc syscall tables with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in these csets: e237506238352f3b ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs") That doesn't cause any changes in the perf tools. As a reminder, this table is used in tools perf to allow features such as: [root@five ~]# perf trace -e set_mempolicy_home_node ^C[root@five ~]# [root@five ~]# perf trace -v -e set_mempolicy_home_node Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 253729 && common_pid != 3585) && (id == 450) mmap size 528384B ^C[root@five ~] [root@five ~]# perf trace -v -e set* --max-events 5 Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 253734 && common_pid != 3585) && (id == 38 || id == 54 || id == 105 || id == 106 || id == 109 || id == 112 || id == 113 || id == 114 || id == 116 || id == 117 || id == 119 || id == 122 || id == 123 || id == 141 || id == 160 || id == 164 || id == 170 || id == 171 || id == 188 || id == 205 || id == 218 || id == 238 || id == 273 || id == 308 || id == 450) mmap size 528384B 0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash)) = 0 6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0 6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0 7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0 13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash)) = 0 [root@five ~]# That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w set_mempolicy_home_node tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/s390/entry/syscalls/syscall.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node $ $ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c [450] = "set_mempolicy_home_node", $ This addresses these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl' diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/lkml/Y01HN2DGkWz8tC%2FJ@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driverQi Liu
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the PCIe link's events, and trace the TLP headers). This patch add support for PTT device in perf tool, so users could use 'perf record' to get TLP headers trace data. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()Qi Liu
Add find_pmu_for_event() and use to simplify logic in auxtrace_record_init(). find_pmu_for_event() will be reused in subsequent patches. Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf intel-pt: Fix system_wide dummy event for hybridAdrian Hunter
User space tasks can migrate between CPUs, so when tracing selected CPUs, system-wide sideband is still needed, however evlist->core.has_user_cpus is not set in the hybrid case, so check the target cpu_list instead. Fixes: 7d189cadbeebc778 ("perf intel-pt: Track sideband system-wide when needed") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-14perf annotate: Add missing condition flags for arm64Namhyung Kim
According to the document [1], it can also have 'hs', 'lo', 'vc', 'vs' as a condition code. Let's add them too. [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/condition-codes-1-condition-flags-and-codes Reported-by: Kevin Nomura <nomurak@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20221006222232.266416-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-11Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement patches went via tip. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ - "perf lock" improvements: - Add -E/--entries option to limit the number of entries to display, say to ask for just the top 5 contended locks. - Add -q/--quiet option to suppress header and debug messages. - Add a 'perf test' kernel lock contention entry to test 'perf lock'. - "perf lock contention" improvements: - Ask BPF's bpf_get_stackid() to skip some callchain entries. The ones closer to the tooling are bpf related and not that interesting, the ones calling the locking function are the ones we're interested in, example of a full, unskipped callstack: - Allow changing the callstack depth and number of entries to skip. 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e - Show full callstack in verbose mode (-v option), sometimes this is desirable instead of showing just one callstack entry. - Allow multiple time ranges in 'perf record --delay' to help in reducing the amount of data collected from hardware tracing (Intel PT, etc) when there is a rough idea of periods of time where events of interest take time. - Add Intel PT to record only decoder debug messages when error happens. - Improve layout of Intel PT man page. - Add new branch types: alignment, data and inst faults and arch specific ones, such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64. Kernel enablement went thru the tip tree. - Fix 'perf probe' error log check in 'perf test' when no debuginfo is available. - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU not at the core number. - Fix flags parsing in 'perf trace' filters. - Introduce compact encoding of CPU range encoding on perf.data, to avoid having a bitmap with all the CPUs. - Improvements to the 'perf stat' metrics, including adding "core_wide", and computing "smt" from the CPU topology. - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows tooling to ask for the precise number of lost samples for a given event. - Add 'addr' sort key to see just the address of sampled instructions: $ perf record -o- true | perf report -i- -s addr [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 perf annotate: Toggle full address <-> offset display - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output' mode ('o' hotkey) to toggle showing full virtual address or just the offset. - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads, at the start of a 'perf record' session, speeding up that record startup phase. - Add a command line option to specify build ids in 'perf inject'. - Update JSON event files for the Intel alderlake, broadwell, broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and tigerlake processors. - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms. - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64 systems. - Add a 'perf test' entry to test the resolution of java symbols, where an output like this is expected: 8.18% jshell jitted-50116-29.so [.] Interpreter 0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int) - Add tests for the ARM64 CoreSight hardware tracing feature, with specially crafted pureloop, memcpy, thread loop and unroll tread that then gets traced and the output compared with expected output. Documentation explaining it is also included. - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps, verify that this gets all recorded correctly. - Introduce pthread mutex wrappers to allow for building with clang's -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable", "exclusive_lock_function", "exclusive_trylock_function", "exclusive_locks_required", and "no_thread_safety_analysis" compiler function attributes. - Fix empty version number when building outside of a git repo. - Improve feature detection display when multiple versions of a feature are present, such as for binutils libbfd, that has a mix of possible ways to detect according to the Linux distribution. Previously in some cases we had: Auto-detecting system features <SNIP> ... libbfd: [ on ] ... libbfd-liberty: [ on ] ... libbfd-liberty-z: [ on ] <SNIP> Now for this case we show just the main feature: Auto-detecting system features <SNIP> ... libbfd: [ on ] <SNIP> - Remove some unused structs, variables, macros, function prototypes and includes from various places. * tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits) perf script: Add missing fields in usage hint perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB perf mem/c2c: Avoid printing empty lines for unsupported events perf mem/c2c: Add load store event mappings for AMD perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO} perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout() perf test coresight: Add relevant documentation about ARM64 CoreSight testing perf test: Add git ignore for tmp and output files of ARM CoreSight tests perf test coresight: Add unroll thread test shell script perf test coresight: Add unroll thread test tool perf test coresight: Add thread loop test shell scripts perf test coresight: Add thread loop test tool perf test coresight: Add memcpy thread test shell script perf test coresight: Add memcpy thread test tool perf test: Add git ignore for perf data generated by the ARM CoreSight tests perf test: Add arm64 asm pureloop test shell script perf test: Add asm pureloop test tool ...
2022-10-06perf mem/c2c: Add load store event mappings for AMDRavi Bangoria
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record' with mem load/ store events. IBS tagged load/store sample provides most of the information needed for these tools. Wire in the "ibs_op//" event as mem-ldst event for AMD. There are some limitations though: Only load/store micro-ops provide mem/c2c information. Whereas, IBS does not have a way to choose a particular type of micro-op to tag. This results in many non-LS micro-ops being tagged which appear as N/A in the perf report. IBS, being an uncore pmu from kernel point of view[1], does not support per process monitoring. Thus, perf mem/c2c on AMD are currently supported in per-cpu mode only. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ [1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf tools: Add evlist__add_sched_switch()Namhyung Kim
Add a help to create a system-wide sched_switch event. One merit is that it sets the system-wide bit before adding it to evlist so that the libperf can handle the cpu and thread maps correctly. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-28powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlersRohan McLure
Arch-specific implementations of syscall handlers are currently used over generic implementations for the following reasons: 1. Semantics unique to powerpc 2. Compatibility syscalls require 'argument padding' to comply with 64-bit argument convention in ELF32 abi. 3. Parameter types or order is different in other architectures. These syscall handlers have been defined prior to this patch series without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with custom input and output types. We remove every such direct definition in favour of the aforementioned macros. Also update syscalls.tbl in order to refer to the symbol names generated by each of these macros. Since ppc64_personality can be called by both 64 bit and 32 bit binaries through compatibility, we must generate both both compat_sys_ and sys_ symbols for this handler. As an aside: A number of architectures including arm and powerpc agree on an alternative argument order and numbering for most of these arch-specific handlers. A future patch series may allow for asm/unistd.h to signal through its defines that a generic implementation of these syscall handlers with the correct calling convention be emitted, through the __ARCH_WANT_COMPAT_SYS_... convention. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-16-rmclure@linux.ibm.com
2022-09-26powerpc/32: Remove powerpc select specialisationRohan McLure
Syscall #82 has been implemented for 32-bit platforms in a unique way on powerpc systems. This hack will in effect guess whether the caller is expecting new select semantics or old select semantics. It does so via a guess, based off the first parameter. In new select, this parameter represents the length of a user-memory array of file descriptors, and in old select this is a pointer to an arguments structure. The heuristic simply interprets sufficiently large values of its first parameter as being a call to old select. The following is a discussion on how this syscall should be handled. As discussed in this thread, the existence of such a hack suggests that for whatever powerpc binaries may predate glibc, it is most likely that they would have taken use of the old select semantics. x86 and arm64 both implement this syscall with oldselect semantics. Remove the powerpc implementation, and update syscall.tbl to refer to emit a reference to sys_old_select and compat_sys_old_select for 32-bit binaries, in keeping with how other architectures support syscall #82. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a1a0@csgroup.eu/ Link: https://lore.kernel.org/r/20220921065605.1051927-12-rmclure@linux.ibm.com
2022-08-13perf pmu-events: Hide the pmu_eventsIan Rogers
Hide that the pmu_event structs are an array with a new wrapper struct. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Avoid passing pmu_events_mapIan Rogers
Preparation for hiding pmu_events_map as an implementation detail. While the map is passed, the table of events is all that is normally wanted. While modifying the function's types, rename pmu_events_map__find to pmu_events_table__find to match later encapsulation. Similarly rename pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-10perf tools: Do not pass NULL to parse_events()Adrian Hunter
Many cases do not use the extra error information provided by parse_events and instead pass NULL as the struct parse_events_error pointer. Add a wrapper for those cases so that the pointer is never NULL. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220809080702.6921-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-01perf test: Remove x86 rdpmc testIan Rogers
This test has been superseded by test_stat_user_read in: tools/lib/perf/tests/test-evsel.c The updated test doesn't divide-by-0 when running time of a counter is 0. It also supports ARM64. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Rob Herring <robh@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20220719223946.176299-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-29perf stat: Add topdown metrics in the default perf stat on the hybrid machineZhengjun Xing
Topdown metrics are missed in the default perf stat on the hybrid machine, add Topdown metrics in default perf stat for hybrid systems. Currently, we support the perf metrics Topdown for the p-core PMU in the perf stat default, the perf metrics Topdown support for e-core PMU will be implemented later separately. Refactor the code adds two x86 specific functions. Widen the size of the event name column by 7 chars, so that all metrics after the "#" become aligned again. The perf metrics topdown feature is supported on the cpu_core of ADL. The dedicated perf metrics counter and the fixed counter 3 are used for the topdown events. Adding the topdown metrics doesn't trigger multiplexing. Before: # ./perf stat -a true Performance counter stats for 'system wide': 53.70 msec cpu-clock # 25.736 CPUs utilized 80 context-switches # 1.490 K/sec 24 cpu-migrations # 446.951 /sec 52 page-faults # 968.394 /sec 2,788,555 cpu_core/cycles/ # 51.931 M/sec 851,129 cpu_atom/cycles/ # 15.851 M/sec 2,974,030 cpu_core/instructions/ # 55.385 M/sec 416,919 cpu_atom/instructions/ # 7.764 M/sec 586,136 cpu_core/branches/ # 10.916 M/sec 79,872 cpu_atom/branches/ # 1.487 M/sec 14,220 cpu_core/branch-misses/ # 264.819 K/sec 7,691 cpu_atom/branch-misses/ # 143.229 K/sec 0.002086438 seconds time elapsed After: # ./perf stat -a true Performance counter stats for 'system wide': 61.39 msec cpu-clock # 24.874 CPUs utilized 76 context-switches # 1.238 K/sec 24 cpu-migrations # 390.968 /sec 52 page-faults # 847.097 /sec 2,753,695 cpu_core/cycles/ # 44.859 M/sec 903,899 cpu_atom/cycles/ # 14.725 M/sec 2,927,529 cpu_core/instructions/ # 47.690 M/sec 428,498 cpu_atom/instructions/ # 6.980 M/sec 581,299 cpu_core/branches/ # 9.470 M/sec 83,409 cpu_atom/branches/ # 1.359 M/sec 13,641 cpu_core/branch-misses/ # 222.216 K/sec 8,008 cpu_atom/branch-misses/ # 130.453 K/sec 14,761,308 cpu_core/slots/ # 240.466 M/sec 3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring 1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation 5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound 4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound 646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations 1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears 3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth 1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound 0.002467839 seconds time elapsed Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>