summaryrefslogtreecommitdiff
path: root/tools/perf/util/arm-spe.c
AgeCommit message (Collapse)Author
2025-05-27perf arm-spe: Add support for SPE Data Source packet on HiSilicon HIP12Yicong Yang
Add data source encoding for HiSilicon HIP12 and coresponding mapping to the perf's memory data source. This will help to synthesize the data and support upper layer tools like perf-mem and perf-c2c. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: CaiJingtao <caijingtao@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Yushan Wang <wangyushan12@huawei.com> Cc: Zeng Tao <prime.zeng@hisilicon.com> Cc: xueshan2@huawei.com Link: https://lore.kernel.org/r/20250425033845.57671-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-03-05perf arm-spe: Support previous branch target (PBT) addressLeo Yan
When FEAT_SPE_PBT is implemented, the previous branch target address (named as PBT) before the sampled operation, will be recorded. This commit first introduces a 'prev_br_tgt' field in the record for saving the PBT address in the decoder. If the current operation is a branch instruction, by combining with PBT, it can create a chain with two consecutive branches. As the branch stack stores branches in descending order, meaning a newer branch is stored in a lower entry in the stack. Arm SPE stores the latest branch in the first entry of branch stack, and the previous branch coming from PBT is stored into the second entry. Otherwise, if current operation is not a branch, the last branch will be saved for PBT only. PBT lacks associated information such as branch source address, branch type, and events. The branch entry fills zeros for the corresponding fields and only set its target address. After: perf script -f --itrace=bl -F flags,addr,brstack jcc ffff800080187914 0xffff8000801878fc/0xffff800080187914/P/-/-/1/COND/- 0x0/0xffff8000801878f8/-/-/-/0//- jcc ffff8000802d12d8 0xffff8000802d12f8/0xffff8000802d12d8/P/-/-/1/COND/- 0x0/0xffff8000802d12ec/-/-/-/0//- jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//- jcc ffff8000813fe200 0xffff8000813fe20c/0xffff8000813fe200/P/-/-/1/COND/- 0x0/0xffff8000813fe200/-/-/-/0//- jmp ffff800081410980 0xffff800081419108/0xffff800081410980/P/-/-/1//- 0x0/0xffff800081419104/-/-/-/0//- return ffff80008036e064 0xffff80008141ba84/0xffff80008036e064/P/-/-/1/RET/- 0x0/0xffff80008141ba60/-/-/-/0//- jcc ffff8000803d54f0 0xffff8000803d54e8/0xffff8000803d54f0/P/-/-/1/COND/- 0x0/0xffff8000803d54e0/-/-/-/0//- jmp ffff80008015e468 0xffff8000803d46dc/0xffff80008015e468/P/-/-/1//- 0x0/0xffff8000803d46c8/-/-/-/0//- jmp ffff8000806e2d50 0xffff80008040f710/0xffff8000806e2d50/P/-/-/1//- 0x0/0xffff80008040f6e8/-/-/-/0//- jcc ffff800080721704 0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//- Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-13-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Add branch stackLeo Yan
Although Arm SPE cannot generate continuous branch records, this commit creates a branch stack with only one branch entry. A single branch info can be used for performance optimization. A branch stack structure is dynamically allocated in the decode queue. The branch stack and stack flags are synthesized based on branch types and associated events. After: # perf script --itrace=bl1 -F flags,addr,brstack jcc ffffc0fad9c6b214 0xffffc0fad9c6b234/0xffffc0fad9c6b214/P/-/-/7/COND/- jcc/miss,not_taken/ ffffc0fadaaebb30 0xffffc0fadaaebb2c/0xffffc0fadaaebb30/MN/-/-/7/COND/- jmp ffffc0fadaaea358 0xffffc0fadaaea5ec/0xffffc0fadaaea358/P/-/-/5//- jcc/not_taken/ ffffc0fadaae6494 0xffffc0fadaae6490/0xffffc0fadaae6494/PN/-/-/11/COND/- jcc/not_taken/ ffff7f83ab54 0xffff7f83ab50/0xffff7f83ab54/PN/-/-/13/COND/- jcc/not_taken/ ffff7f83ab08 0xffff7f83ab04/0xffff7f83ab08/PN/-/-/8/COND/- jcc ffff7f83aa80 0xffff7f83aa58/0xffff7f83aa80/P/-/-/10/COND/- jcc ffff7f9a45d0 0xffff7f9a43f0/0xffff7f9a45d0/P/-/-/29/COND/- jcc/not_taken/ ffffc0fad9ba6db4 0xffffc0fad9ba6db0/0xffffc0fad9ba6db4/PN/-/-/44/COND/- jcc ffffc0fadaac2964 0xffffc0fadaac2970/0xffffc0fadaac2964/P/-/-/6/COND/- jcc ffffc0fad99ddc10 0xffffc0fad99ddc04/0xffffc0fad99ddc10/P/-/-/72/COND/- jcc/not_taken/ ffffc0fad9b3f21c 0xffffc0fad9b3f218/0xffffc0fad9b3f21c/PN/-/-/64/COND/- jcc ffffc0fad9c3b604 0xffffc0fad9c3b5f8/0xffffc0fad9c3b604/P/-/-/13/COND/- jcc ffffc0fadaad6048 0xffffc0fadaad5f8c/0xffffc0fadaad6048/P/-/-/5/COND/- return/miss/ ffff7f84e614 0xffffc0fad98a2274/0xffff7f84e614/M/-/-/13/RET/- jcc/not_taken/ ffffc0fadaac4eb4 0xffffc0fadaac4eb0/0xffffc0fadaac4eb4/PN/-/-/5/COND/- jmp ffff7f8e3130 0xffff7f87555c/0xffff7f8e3130/P/-/-/5//- jcc/not_taken/ ffffc0fad9b3d9b0 0xffffc0fad9b3d9ac/0xffffc0fad9b3d9b0/PN/-/-/14/COND/- return ffffc0fad9b91950 0xffffc0fad98c3e28/0xffffc0fad9b91950/P/-/-/12/RET/- Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-12-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Set sample flags with supplement infoLeo Yan
Based on the supplement information in the record, this commit sets the sample flags for conditional branch, function call, return. It also sets events in flags, such as mispredict, not taken, and in transaction. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20250304111240.3378214-11-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-05perf arm-spe: Fix load-store operation checkingLeo Yan
The ARM_SPE_OP_LD and ARM_SPE_OP_ST operations are secondary operation type, they are overlapping with other second level's operation types belonging to SVE and branch operations. As a result, a non load-store operation can be parsed for data source and memory sample. To fix the issue, this commit introduces a is_ldst_op() macro for checking LDST operation, and apply the checking when synthesize data source and memory samples. Fixes: a89dbc9b988f ("perf arm-spe: Set sample's data source field") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250304111240.3378214-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-02-12perf sample: Make user_regs and intr_regs optionalIan Rogers
The struct dump_regs contains 512 bytes of cache_regs, meaning the two values in perf_sample contribute 1088 bytes of its total 1384 bytes size. Initializing this much memory has a cost reported by Tavian Barnes <tavianator@tavianator.com> as about 2.5% when running `perf script --itrace=i0`: https://lore.kernel.org/lkml/d841b97b3ad2ca8bcab07e4293375fb7c32dfce7.1736618095.git.tavianator@tavianator.com/ Adrian Hunter <adrian.hunter@intel.com> replied that the zero initialization was necessary and couldn't simply be removed. This patch aims to strike a middle ground of still zeroing the perf_sample, but removing 79% of its size by make user_regs and intr_regs optional pointers to zalloc-ed memory. To support the allocation accessors are created for user_regs and intr_regs. To support correct cleanup perf_sample__init and perf_sample__exit functions are created and added throughout the code base. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250113194345.1537821-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-12-09perf arm-spe: Add support for SPE Data Source packet on AmpereOneIlkka Koskinen
Decode SPE Data Source packets on AmpereOne. The field is IMPDEF. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-3-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-12-09perf arm-spe: Prepare for adding data source packet implementations for ↵Ilkka Koskinen
other cores Split Data Source Packet handling to prepare adding support for other implementations. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Graham Woodward <graham.woodward@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241108202946.16835-2-ilkka@os.amperecomputing.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-30perf arm-spe: Use old behavior when opening old SPE filesJames Clark
Since the linked commit, we stopped interpreting data source if the perf.data file doesn't have the new metadata version. This means that perf c2c will show no samples in this case. Keep the old behavior so old files can be opened, but also still show the new warning that updating might improve the decoding. Also re-write the warning to be more concise and specific to a user. Fixes: ba5e7169e548 ("perf arm-spe: Use metadata to decide the data source feature") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Cc: Julio.Suarez@arm.com Cc: Kiel.Friedt@arm.com Cc: Ryan.Roberts@arm.com Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241029143734.291638-1-james.clark@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-29perf arm-spe: Correctly set sample flagsGraham Woodward
Set flags on all synthesized instruction and branch samples. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-4-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-29perf arm-spe: Use ARM_SPE_OP_BRANCH_ERET when synthesizing branchesGraham Woodward
Instead of checking the type for just branch misses, we can instead check for the OP_BRANCH_ERET and synthesise branches as well as branch misses. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-3-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-29perf arm-spe: Set sample.addr to target address for instruction sampleGraham Woodward
For an instruction sample, assign the target address to the field 'to_ip'. If it is a non-branch record, to_ip will be 0, presenting a non-valid target address. Signed-off-by: Graham Woodward <graham.woodward@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Cc: nd@arm.com Cc: mike.leach@linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20241025143009.25419-2-graham.woodward@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-17perf color: Add printf format checking and resolve issuesIan Rogers
Add printf format checking to vararg printf routines in color.h. Resolve build errors/bugs that are found through this checking. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Leo Yan <leo.yan@linux.dev> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241017175356.783793-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Add Cortex CPUs to common data source encoding listLeo Yan
Add Cortex-A720, Cortex-A725, Cortex-X1C, Cortex-X3 and Cortex-X925 into the common data source encoding list. For everyone of these CPUs, it technical reference manual defines the data source packet as the common encoding format. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-8-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Add Neoverse-V2 to common data source encoding listBesar Wicaksono
Add Neoverse-V2 MIDR to the common data source encoding range list. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-7-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Remove the unused 'midr' fieldLeo Yan
The 'midr' field is replaced by the MIDR values stored in metadata (per CPU wise). Remove the 'midr' field as it is no longer used. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Use metadata to decide the data source featureLeo Yan
Use the info in the metadata to decide if the data source feature is supported. The CPU MIDR must be in the CPU list for the common data source encoding. For the metadata version 1, it doesn't include info for MIDR. In this case, due to absent info for making decision, print out warning to remind users to upgrade tool and returns false. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Introduce arm_spe__is_homogeneous()Leo Yan
Introduce the arm_spe__is_homogeneous() function, it uses to check if Arm SPE is homogeneous cross all CPUs. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-4-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Rename the common data source encodingLeo Yan
The Neoverse CPUs follow the common data source encoding, and other CPU variants can share the same format. Rename the CPU list and data source definitions as common data source names. This change prepares for appending more CPU variants. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-3-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-14perf arm-spe: Rename arm_spe__synth_data_source_generic()Leo Yan
The arm_spe__synth_data_source_generic() function is invoked when the tool detects that CPUs do not support data source packets and falls back to synthesizing only the memory level. Rename it to arm_spe__synth_memory_level() for better reflecting its purpose. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20241003185322.192357-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-03perf arm-spe: Dump metadata with version 2Leo Yan
This commit dumps metadata with version 2. It dumps metadata for header and per CPU data respectively in the arm_spe_print_info() function to support metadata version 2 format. After: 0 0 0x3c0 [0x1b0]: PERF_RECORD_AUXTRACE_INFO type: 4 Header version :2 Header size :4 PMU type v2 :13 CPU number :8 Magic :0x1010101010101010 CPU # :0 Num of params :3 MIDR :0x410fd801 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :1 Num of params :3 MIDR :0x410fd801 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :2 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :3 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :4 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :5 Num of params :3 MIDR :0x410fd870 PMU Type :13 Min Interval :1024 Magic :0x1010101010101010 CPU # :6 Num of params :3 MIDR :0x410fd850 PMU Type :-1 Min Interval :0 Magic :0x1010101010101010 CPU # :7 Num of params :3 MIDR :0x410fd850 PMU Type :-1 Min Interval :0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-6-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-03perf arm-spe: Support metadata version 2Leo Yan
This commit is to support metadata version 2 and at the meantime it is backward compatible for version 1's format. The metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field. As version 1 is fixed with two u64 fields, by checking the metadata size, it distinguishes the metadata is version 1 or version 2 (and any new versions if later will have). For version 2, it reads out CPU number and retrieves the metadata info for every CPU. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-5-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-03perf arm-spe: Define metadata header version 2Leo Yan
The first version's metadata header structure doesn't include a field to indicate a header version, which is not friendly for extension. Define the metadata version 2 format with a new header structure and extend per CPU's metadata. In the meantime, the old metadata header will still be supported for backward compatibility. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Besar Wicaksono <bwicaksono@nvidia.com> Cc: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241003184302.190806-2-leo.yan@arm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-08-12perf tool: Constify tool pointersIan Rogers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-08-12perf auxtrace: Remove dummy toolsIan Rogers
Add perf_session__deliver_synth_attr_event that synthesizes a perf_record_header_attr event with one id. Remove use of perf_event__synthesize_attr that necessitates the use of the dummy tool in order to pass the session. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12perf thread: Add accessor functions for threadIan Rogers
Using accessors will make it easier to add reference count checking in later patches. Committer notes: thread->nsinfo wasn't wrapped as it is used together with nsinfo__zput(), where does a trick to set the field with a refcount being dropped to NULL, and that doesn't work well with using thread__nsinfo(thread), that loses the &thread->nsinfo pointer. When refcount checking is added to 'struct thread', later in this series, nsinfo__zput(RC_CHK_ACCESS(thread)->nsinfo) will be used to check the thread pointer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf arm-spe: Add SVE flags to the SPE samplesGerman Gomez
Add flags from the Scalable Vector Extension (SVE) to the SPE samples which are available from Armv8.3 (FEAT_SPEv1p1). These will be displayed in a new SIMD sort field in a later commit. Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Cc: Anshuman.Khandual@arm.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf arm-spe: Refactor arm-spe to support operation packet typeGerman Gomez
Extend the decoder of Arm SPE records to support more fields from the operation packet type. Not all fields are being decoded by this commit. Only those needed to support the use-case SVE load/store/other operations. Suggested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman.Khandual@arm.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-28perf arm-spe: augment the data source type with neoverse_spe listJing Zhang
When synthesizing event with SPE data source, commit 4e6430cbb1a9("perf arm-spe: Use SPE data source for neoverse cores") augment the type with source information by MIDR. However, is_midr_in_range only compares the first entry in neoverse_spe. Change is_midr_in_range to is_midr_in_range_list to traverse the neoverse_spe array so that all neoverse cores synthesize event with data source packet. Fixes: 4e6430cbb1a9f1dc ("perf arm-spe: Use SPE data source for neoverse cores") Reviewed-by: Ali Saidi <alisaidi@amazon.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zhuo Song <zhuo.song@linux.alibaba.com> Link: https://lore.kernel.org/r/1664197396-42672-1-git-send-email-renyu.zj@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-11perf arm-spe: Use SPE data source for neoverse coresAli Saidi
When synthesizing data from SPE, augment the type with source information for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use the same encoding. I can't find encoding information for any other SPE implementations to unify their choices with Arm's thus that is left for future work. This change populates the mem_lvl_num for Neoverse cores as well as the deprecated mem_lvl namespace. Reviewed-by: German Gomez <german.gomez@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ali Saidi <alisaidi@amazon.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Like Xu <likexu@tencent.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Timothy Hayes <timothy.hayes@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220811062451.435810-4-leo.yan@linaro.org Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-19perf arm-spe: Don't set data source if it's not a memory operationLeo Yan
Except for memory load and store operations, ARM SPE records also can support other operation types, bug when set the data source field the current code assumes a record is a either load operation or store operation, this leads to wrongly synthesize memory samples. This patch strictly checks the record operation type, it only sets data source only for the operation types ARM_SPE_LD and ARM_SPE_ST, otherwise, returns zero for data source. Therefore, we can synthesize memory samples only when data source is a non-zero value, the function arm_spe__is_memory_event() is useless and removed. Fixes: e55ed3423c1bb29f ("perf arm-spe: Synthesize memory event") Reviewed-by: Ali Saidi <alisaidi@amazon.com> Reviewed-by: German Gomez <german.gomez@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: alisaidi@amazon.com Cc: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20220517020326.18580-5-alisaidi@amazon.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf arm-spe: Fix SPE events with phys addressesTimothy Hayes
This patch corrects a bug whereby SPE collection is invoked with pa_enable=1 but synthesized events fail to show physical addresses. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20220421165205.117662-3-timothy.hayes@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-28perf arm-spe: Fix addresses of synthesized SPE eventsTimothy Hayes
This patch corrects a bug whereby synthesized events from SPE samples are missing virtual addresses. Fixes: 54f7815efef7fad9 ("perf arm-spe: Fill address info for samples") Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: bpf@vger.kernel.org Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: netdev@vger.kernel.org Cc: Song Liu <songliubraving@fb.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220421165205.117662-2-timothy.hayes@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-17perf arm-spe: Synthesize SPE instruction eventsGerman Gomez
Synthesize instruction events for every ARM SPE record. Arm SPE implements a hardware-based sample period, and perf implements a software-based one. Add a warning message to inform the user of this. Signed-off-by: German Gomez <german.gomez@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211216152404.52474-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-12-16perf arm-spe: Add SPE total latency as PERF_SAMPLE_WEIGHTNamhyung Kim
Use total latency info in the SPE counter packet as sample weight so that we can see it in local_weight and (global) weight sort keys. Maybe we can use PERF_SAMPLE_WEIGHT_STRUCT to support ins_lat as well but I'm not sure which latency it matches. So just adding total latency first. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211201220855.1260688-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18perf inject: Fix ARM SPE handlingGerman Gomez
'perf inject' is currently not working for Arm SPE. When you try to run 'perf inject' and 'perf report' with a perf.data file that contains SPE traces, the tool reports a "Bad address" error: # ./perf record -e arm_spe_0/ts_enable=1,store_filter=1,branch_filter=1,load_filter=1/ -a -- sleep 1 # ./perf inject -i perf.data -o perf.inject.data --itrace # ./perf report -i perf.inject.data --stdio 0x42c00 [0x8]: failed to process type: 9 [Bad address] Error: failed to process sample As far as I know, the issue was first spotted in [1], but 'perf inject' was not yet injecting the samples. This patch does something similar to what cs_etm does for injecting the samples [2], but for SPE. [1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210412091006.468557-1-leo.yan@linaro.org/#24117339 [2] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/cs-etm.c?h=perf/core&id=133fe2e617e48ca0948983329f43877064ffda3e#n1196 Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211105104130.28186-2-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13perf arm-spe: Support hardware-based PID tracingGerman Gomez
If ARM SPE traces contains CONTEXT packets with TID info, use these values for tracking the TID of samples. Otherwise fall back to using context switch events and display a message warning to the user of possible timing inaccuracies [1]. [1] https://lore.kernel.org/lkml/f877cfa6-9b25-6445-3806-ca44a4042eaf@arm.com/ Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211111133625.193568-5-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13perf arm-spe: Track task context switch for cpu-mode eventsNamhyung Kim
When perf report synthesize events from ARM SPE data, it refers to current cpu, pid and tid in the machine. But there's no place to set them in the ARM SPE decoder. I'm seeing all pid/tid is set to -1 and user symbols are not resolved in the output. # perf record -a -e arm_spe_0/ts_enable=1/ sleep 1 # perf report -q | head 8.77% 8.77% :-1 [kernel.kallsyms] [k] format_decode 7.02% 7.02% :-1 [kernel.kallsyms] [k] seq_printf 7.02% 7.02% :-1 [unknown] [.] 0x0000ffff9f687c34 5.26% 5.26% :-1 [kernel.kallsyms] [k] vsnprintf 3.51% 3.51% :-1 [kernel.kallsyms] [k] string 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f66ae20 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f670b3c 3.51% 3.51% :-1 [unknown] [.] 0x0000ffff9f67c040 1.75% 1.75% :-1 [kernel.kallsyms] [k] ___cache_free 1.75% 1.75% :-1 [kernel.kallsyms] [k] __count_memcg_events Like Intel PT, add context switch records to track task info. As ARM SPE support was added later than PERF_RECORD_SWITCH_CPU_WIDE, I think we can safely set the attr.context_switch bit and use it. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211111133625.193568-2-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13perf arm-spe: Print size using consistent formatAndrew Kilroy
Since the size is already printed earlier in hex, print the same data using the same format, in hex. Reviewed-by: James Clark <james.clark@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20211109142153.56546-3-german.gomez@arm.com Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-01perf arm-spe: Don't wait for PERF_RECORD_EXIT eventLeo Yan
When decode Arm SPE trace, it waits for PERF_RECORD_EXIT event (the last perf event) for processing trace data, which is needless and even might cause logic error, e.g. it might fail to correlate perf events with Arm SPE events correctly. So this patch removes the condition checking for PERF_RECORD_EXIT event. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210519071939.1598923-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-01perf arm-spe: Bail out if the trace is later than perf eventLeo Yan
It's possible that record in Arm SPE trace is later than perf event and vice versa. This asks to correlate the perf events and Arm SPE synthesized events to be processed in the manner of correct timing. To achieve the time ordering, this patch reverses the flow, it firstly calls arm_spe_sample() and then calls arm_spe_decode(). By comparing the timestamp value and detect the perf event is coming earlier than Arm SPE trace data, it bails out from the decoding loop, the last record is pushed into auxtrace stack and is deferred to generate sample. To track the timestamp, everytime it updates timestamp for the latest record. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519071939.1598923-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-01perf arm-spe: Assign kernel time to synthesized eventLeo Yan
In current code, it assigns the arch timer counter to the synthesized samples Arm SPE trace, thus the samples don't contain the kernel time but only contain the raw counter value. To fix the issue, this patch converts the timer counter to kernel time and assigns it to sample timestamp. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519071939.1598923-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-01perf arm-spe: Convert event kernel time to counter valueLeo Yan
When handle a perf event, Arm SPE decoder needs to decide if this perf event is earlier or later than the samples from Arm SPE trace data; to do comparision, it needs to use the same unit for the time. This patch converts the event kernel time to arch timer's counter value, thus it can be used to compare with counter value contained in Arm SPE Timestamp packet. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519071939.1598923-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-01perf arm-spe: Save clock parameters from TIME_CONV eventLeo Yan
During the recording phase, "perf record" tool synthesizes event PERF_RECORD_TIME_CONV for the hardware clock parameters and saves the event into the data file. Afterwards, when processing the data file, the event TIME_CONV will be processed at the very early time and is stored into session context. This patch extracts these parameters from the session context and saves into the structure "spe->tc" with the type perf_tsc_conversion, so that the parameters are ready for conversion between clock counter and time stamp. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519071939.1598923-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-16perf arm-spe: Set sample's data source fieldLeo Yan
The sample structure contains the field 'data_src' which is used to tell the data operation attributions, e.g. operation type is loading or storing, cache level, it's snooping or remote accessing, etc. At the end, the 'data_src' will be parsed by perf mem/c2c tools to display human readable strings. This patch is to fill the 'data_src' field in the synthesized samples base on different types. Currently perf tool can display statistics for L1/L2/L3 caches but it doesn't support the 'last level cache'. To fit to current implementation, 'data_src' field uses L3 cache for last level cache. Before this commit, perf mem report looks like this: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 81.56% 61945 0 N/A [.] 0x00000000000009d8 serial_c [.] 0000000000000000 [unknown] N/A N/A 18.44% 14003 0 N/A [.] 0x0000000000000828 serial_c [.] 0000000000000000 [unknown] N/A N/A Now on a system with Arm SPE, addresses and access types are displayed: # Samples: 75K of event 'l1d-miss' # Total weight : 75951 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked # # Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access # ........ ....... ............ ............. ...................... ............. ...................... ........... ..... .......... # 0.43% 324 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794e00 anon N/A Walker hit 0.42% 322 0 L1 miss [.] 0x00000000000009d8 serial_c [.] 0x0000ffff80794580 anon N/A Walker hit Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-16perf arm-spe: Synthesize memory eventLeo Yan
The memory event can deliver two benefits: - The first benefit is the memory event can give out global view for memory accessing, rather than organizing events with scatter mode (e.g. uses separate event for L1 cache, last level cache, etc) which which can only display a event for single memory type, memory events include all memory accessing so it can display the data accessing cross memory levels in the same view; - The second benefit is the sample generation might introduce a big overhead and need to wait for long time for Perf reporting, we can specify itrace option '--itrace=M' to filter out other events and only output memory events, this can significantly reduce the overhead caused by generating samples. This patch is to enable memory event for Arm SPE. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-16perf arm-spe: Fill address info for samplesLeo Yan
To properly handle memory and branch samples, this patch divides into two functions for generating samples: arm_spe__synth_mem_sample() is for synthesizing memory and TLB samples; arm_spe__synth_branch_sample() is to synthesize branch samples. Arm SPE backend decoder has passed virtual and physical address through packets, the address info is stored into the synthesize samples in the function arm_spe__synth_mem_sample(). Committer notes: Fixed this: 36 46.77 fedora:27 : FAIL clang version 5.0.2 (tags/RELEASE_502/final) util/arm-spe.c:269:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; ^ util/arm-spe.c:288:34: error: missing field 'pid' initializer [-Werror,-Wmissing-field-initializers] struct perf_sample sample = { 0 }; By using = { .ip = 0, }; Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20210211133856.2137-4-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12perf arm-spe: Enable sample type PERF_SAMPLE_DATA_SRCLeo Yan
This patch is to enable sample type PERF_SAMPLE_DATA_SRC for Arm SPE in the perf data, when output the tracing data, it tells tools that it contains data source in the memory event. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210211133856.2137-1-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-26perf arm-spe: Refactor printing string to bufferLeo Yan
When outputs strings to the decoding buffer with function snprintf(), SPE decoder needs to detects if any error returns from snprintf() and if so needs to directly bail out. If snprintf() returns success, it needs to update buffer pointer and reduce the buffer length so can continue to output the next string into the consequent memory space. This complex logics are spreading in the function arm_spe_pkt_desc() so there has many duplicate codes for handling error detecting, increment buffer pointer and decrement buffer size. To avoid the duplicate code, this patch introduces a new helper function arm_spe_pkt_out_string() which is used to wrap up the complex logics, and it's used by the caller arm_spe_pkt_desc(). This patch moves the variable 'blen' as the function's local variable so allows to remove the unnecessary braces and improve the readability. This patch simplifies the return value for arm_spe_pkt_desc(): '0' means success and other values mean an error has occurred. To realize this, it relies on arm_spe_pkt_out_string()'s parameter 'err', the 'err' is a cumulative value, returns its final value if printing buffer is called for one time or multiple times. Finally, the error is handled in a central place, rather than directly bailing out in switch-cases, it returns error at the end of arm_spe_pkt_desc(). This patch changes the caller arm_spe_dump() to respect the updated return value semantics of arm_spe_pkt_desc(). Suggested-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Li <liwei391@huawei.com> Link: https://lore.kernel.org/r/20201119152441.6972-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-01perf arm-spe: Support synthetic eventsTan Xiaojun
After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical Profiling Extensions (SPE) support") has been merged, it supports to output raw data with option "--dump-raw-trace". However, it misses for support synthetic events so cannot output any statistical info. This patch is to improve the "perf report" support for ARM SPE for four types synthetic events: First level cache synthetic events, including L1 data cache accessing and missing events; Last level cache synthetic events, including last level cache accessing and missing events; TLB synthetic events, including TLB accessing and missing events; Remote access events, which is used to account load/store operations caused to another socket. Example usage: $ perf record -c 1024 -e arm_spe_0/branch_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,store_filter=1,min_latency=0/ dd if=/dev/zero of=/dev/null count=10000 $ perf report --stdio # Samples: 59 of event 'l1d-miss' # Event count (approx.): 59 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. .................................. # 23.73% 23.73% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.135 20.34% 20.34% dd [kernel.kallsyms] [k] filemap_map_pages 5.08% 5.08% dd [kernel.kallsyms] [k] perf_event_mmap 5.08% 5.08% dd [kernel.kallsyms] [k] unlock_page_memcg 5.08% 5.08% dd [kernel.kallsyms] [k] unmap_page_range 3.39% 3.39% dd [kernel.kallsyms] [k] PageHuge 3.39% 3.39% dd [kernel.kallsyms] [k] release_pages 3.39% 3.39% dd ld-2.28.so [.] 0x0000000000008b5c 1.69% 1.69% dd [kernel.kallsyms] [k] __alloc_fd [...] # Samples: 3K of event 'l1d-access' # Event count (approx.): 3980 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. ...................................... # 26.98% 26.98% dd [kernel.kallsyms] [k] ret_to_user 10.53% 10.53% dd [kernel.kallsyms] [k] fsnotify 7.51% 7.51% dd [kernel.kallsyms] [k] new_sync_read 4.57% 4.57% dd [kernel.kallsyms] [k] vfs_read 4.35% 4.35% dd [kernel.kallsyms] [k] vfs_write 3.69% 3.69% dd [kernel.kallsyms] [k] __fget_light 3.69% 3.69% dd [kernel.kallsyms] [k] rw_verify_area 3.44% 3.44% dd [kernel.kallsyms] [k] security_file_permission 2.76% 2.76% dd [kernel.kallsyms] [k] __fsnotify_parent 2.44% 2.44% dd [kernel.kallsyms] [k] ksys_write 2.24% 2.24% dd [kernel.kallsyms] [k] iov_iter_zero 2.19% 2.19% dd [kernel.kallsyms] [k] read_iter_zero 1.81% 1.81% dd dd [.] 0x0000000000002960 1.78% 1.78% dd dd [.] 0x0000000000002980 [...] # Samples: 35 of event 'llc-miss' # Event count (approx.): 35 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. ........................... # 34.29% 34.29% dd [kernel.kallsyms] [k] filemap_map_pages 8.57% 8.57% dd [kernel.kallsyms] [k] unlock_page_memcg 8.57% 8.57% dd [kernel.kallsyms] [k] unmap_page_range 5.71% 5.71% dd [kernel.kallsyms] [k] PageHuge 5.71% 5.71% dd [kernel.kallsyms] [k] release_pages 5.71% 5.71% dd ld-2.28.so [.] 0x0000000000008b5c 2.86% 2.86% dd [kernel.kallsyms] [k] __queue_work 2.86% 2.86% dd [kernel.kallsyms] [k] __radix_tree_lookup 2.86% 2.86% dd [kernel.kallsyms] [k] copy_page [...] # Samples: 2 of event 'llc-access' # Event count (approx.): 2 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. ............. # 50.00% 50.00% dd [kernel.kallsyms] [k] copy_page 50.00% 50.00% dd libc-2.28.so [.] _dl_addr # Samples: 48 of event 'tlb-miss' # Event count (approx.): 48 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. .................................. # 20.83% 20.83% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.135 12.50% 12.50% dd [kernel.kallsyms] [k] __arch_clear_user 10.42% 10.42% dd [kernel.kallsyms] [k] clear_page 4.17% 4.17% dd [kernel.kallsyms] [k] copy_page 4.17% 4.17% dd [kernel.kallsyms] [k] filemap_map_pages 2.08% 2.08% dd [kernel.kallsyms] [k] __alloc_fd 2.08% 2.08% dd [kernel.kallsyms] [k] __mod_memcg_state.part.70 2.08% 2.08% dd [kernel.kallsyms] [k] __queue_work 2.08% 2.08% dd [kernel.kallsyms] [k] __rcu_read_unlock 2.08% 2.08% dd [kernel.kallsyms] [k] d_path 2.08% 2.08% dd [kernel.kallsyms] [k] destroy_inode 2.08% 2.08% dd [kernel.kallsyms] [k] do_dentry_open [...] # Samples: 9K of event 'tlb-access' # Event count (approx.): 9573 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. ...................................... # 25.79% 25.79% dd [kernel.kallsyms] [k] __arch_clear_user 11.22% 11.22% dd [kernel.kallsyms] [k] ret_to_user 8.56% 8.56% dd [kernel.kallsyms] [k] fsnotify 4.06% 4.06% dd [kernel.kallsyms] [k] new_sync_read 3.67% 3.67% dd [kernel.kallsyms] [k] el0_svc_common.constprop.2 3.04% 3.04% dd [kernel.kallsyms] [k] __fsnotify_parent 2.90% 2.90% dd [kernel.kallsyms] [k] vfs_write 2.82% 2.82% dd [kernel.kallsyms] [k] vfs_read 2.52% 2.52% dd libc-2.28.so [.] write 2.26% 2.26% dd [kernel.kallsyms] [k] security_file_permission 2.08% 2.08% dd [kernel.kallsyms] [k] ksys_write 1.96% 1.96% dd [kernel.kallsyms] [k] rw_verify_area 1.95% 1.95% dd [kernel.kallsyms] [k] read_iter_zero [...] # Samples: 9 of event 'branch-miss' # Event count (approx.): 9 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. ......................... # 22.22% 22.22% dd libc-2.28.so [.] _dl_addr 11.11% 11.11% dd [kernel.kallsyms] [k] __arch_clear_user 11.11% 11.11% dd [kernel.kallsyms] [k] __arch_copy_from_user 11.11% 11.11% dd [kernel.kallsyms] [k] __dentry_kill 11.11% 11.11% dd [kernel.kallsyms] [k] __efistub_memcpy 11.11% 11.11% dd ld-2.28.so [.] 0x0000000000012b7c 11.11% 11.11% dd libc-2.28.so [.] 0x000000000002a980 11.11% 11.11% dd libc-2.28.so [.] 0x0000000000083340 # Samples: 29 of event 'remote-access' # Event count (approx.): 29 # # Children Self Command Shared Object Symbol # ........ ........ ....... ................. ........................... # 41.38% 41.38% dd [kernel.kallsyms] [k] filemap_map_pages 10.34% 10.34% dd [kernel.kallsyms] [k] unlock_page_memcg 10.34% 10.34% dd [kernel.kallsyms] [k] unmap_page_range 6.90% 6.90% dd [kernel.kallsyms] [k] release_pages 3.45% 3.45% dd [kernel.kallsyms] [k] PageHuge 3.45% 3.45% dd [kernel.kallsyms] [k] __queue_work 3.45% 3.45% dd [kernel.kallsyms] [k] page_add_file_rmap 3.45% 3.45% dd [kernel.kallsyms] [k] page_counter_try_charge 3.45% 3.45% dd [kernel.kallsyms] [k] page_remove_rmap 3.45% 3.45% dd [kernel.kallsyms] [k] xas_start 3.45% 3.45% dd ld-2.28.so [.] 0x0000000000002a1c 3.45% 3.45% dd ld-2.28.so [.] 0x0000000000008b5c 3.45% 3.45% dd ld-2.28.so [.] 0x00000000000093cc Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Tested-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200530122442.490-4-leo.yan@linaro.org Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>