Age | Commit message (Collapse) | Author |
|
perf_attr_map could be shared among different version of perf binary. Add
bperf_attr_map_compatible() to check whether the existing attr_map is
compatible with current perf binary.
Signed-off-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: kernel-team@fb.com
Link: https://lore.kernel.org/r/20210425214333.1090950-3-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
By following the same protocol, other tools can share hardware PMCs with
perf. Move perf_event_attr_map_entry and BPF_PERF_DEFAULT_ATTR_MAP_PATH to
bpf_perf.h for other tools to use.
Signed-off-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: kernel-team@fb.com
Link: https://lore.kernel.org/r/20210425214333.1090950-2-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- Improve Intel uncore PMU support:
- Parse uncore 'discovery tables' - a new hardware capability
enumeration method introduced on the latest Intel platforms. This
table is in a well-defined PCI namespace location and is read via
MMIO. It is organized in an rbtree.
These uncore tables will allow the discovery of standard counter
blocks, but fancier counters still need to be enumerated
explicitly.
- Add Alder Lake support
- Improve IIO stacks to PMON mapping support on Skylake servers
- Add Intel Alder Lake PMU support - which requires the introduction of
'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big')
and Gracemont ('small' - Atom derived) cores.
The CPU-side feature set is entirely symmetrical - but on the PMU
side there's core type dependent PMU functionality.
- Reduce data loss with CPU level hardware tracing on Intel PT / AUX
profiling, by fixing the AUX allocation watermark logic.
- Improve ring buffer allocation on NUMA systems
- Put 'struct perf_event' into their separate kmem_cache pool
- Add support for synchronous signals for select perf events. The
immediate motivation is to support low-overhead sampling-based race
detection for user-space code. The feature consists of the following
main changes:
- Add thread-only event inheritance via
perf_event_attr::inherit_thread, which limits inheritance of
events to CLONE_THREAD.
- Add the ability for events to not leak through exec(), via
perf_event_attr::remove_on_exec.
- Allow the generation of SIGTRAP via perf_event_attr::sigtrap,
extend siginfo with an u64 ::si_perf, and add the breakpoint
information to ::si_addr and ::si_perf if the event is
PERF_TYPE_BREAKPOINT.
The siginfo support is adequate for breakpoints right now - but the
new field can be used to introduce support for other types of
metadata passed over siginfo as well.
- Misc fixes, cleanups and smaller updates.
* tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
signal, perf: Add missing TRAP_PERF case in siginfo_layout()
signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
perf/x86: Allow for 8<num_fixed_counters<16
perf/x86/rapl: Add support for Intel Alder Lake
perf/x86/cstate: Add Alder Lake CPU support
perf/x86/msr: Add Alder Lake CPU support
perf/x86/intel/uncore: Add Alder Lake support
perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
perf/x86/intel: Add Alder Lake Hybrid support
perf/x86: Support filter_match callback
perf/x86/intel: Add attr_update for Hybrid PMUs
perf/x86: Add structures for the attributes of Hybrid PMUs
perf/x86: Register hybrid PMUs
perf/x86: Factor out x86_pmu_show_pmu_cap
perf/x86: Remove temporary pmu assignment in event_init
perf/x86/intel: Factor out intel_pmu_check_extra_regs
perf/x86/intel: Factor out intel_pmu_check_event_constraints
perf/x86/intel: Factor out intel_pmu_check_num_counters
perf/x86: Hybrid PMU support for extra_regs
perf/x86: Hybrid PMU support for event constraints
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates from Borislav Petkov:
- Turn the stack canary into a normal __percpu variable on 32-bit which
gets rid of the LAZY_GS stuff and a lot of code.
- Add an insn_decode() API which all users of the instruction decoder
should preferrably use. Its goal is to keep the details of the
instruction decoder away from its users and simplify and streamline
how one decodes insns in the kernel. Convert its users to it.
- kprobes improvements and fixes
- Set the maximum DIE per package variable on Hygon
- Rip out the dynamic NOP selection and simplify all the machinery
around selecting NOPs. Use the simplified NOPs in objtool now too.
- Add Xeon Sapphire Rapids to list of CPUs that support PPIN
- Simplify the retpolines by folding the entire thing into an
alternative now that objtool can handle alternatives with stack ops.
Then, have objtool rewrite the call to the retpoline with the
alternative which then will get patched at boot time.
- Document Intel uarch per models in intel-family.h
- Make Sub-NUMA Clustering topology the default and Cluster-on-Die the
exception on Intel.
* tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
x86, sched: Treat Intel SNC topology as default, COD as exception
x86/cpu: Comment Skylake server stepping too
x86/cpu: Resort and comment Intel models
objtool/x86: Rewrite retpoline thunk calls
objtool: Skip magical retpoline .altinstr_replacement
objtool: Cache instruction relocs
objtool: Keep track of retpoline call sites
objtool: Add elf_create_undef_symbol()
objtool: Extract elf_symbol_add()
objtool: Extract elf_strtab_concat()
objtool: Create reloc sections implicitly
objtool: Add elf_create_reloc() helper
objtool: Rework the elf_rebuild_reloc_section() logic
objtool: Fix static_call list generation
objtool: Handle per arch retpoline naming
objtool: Correctly handle retpoline thunk calls
x86/retpoline: Simplify retpolines
x86/alternatives: Optimize optimize_nops()
x86: Add insn_decode_kernel()
x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 cleanups from Borislav Petkov:
"Trivial cleanups and fixes all over the place"
* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Remove me from IDE/ATAPI section
x86/pat: Do not compile stubbed functions when X86_PAT is off
x86/asm: Ensure asm/proto.h can be included stand-alone
x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
x86/msr: Make locally used functions static
x86/cacheinfo: Remove unneeded dead-store initialization
x86/process/64: Move cpu_current_top_of_stack out of TSS
tools/turbostat: Unmark non-kernel-doc comment
x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
x86/fpu/math-emu: Fix function cast warning
x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
x86: Fix various typos in comments, take #2
x86: Remove unusual Unicode characters from comments
x86/kaslr: Return boolean values from a function returning bool
x86: Fix various typos in comments
x86/setup: Remove unused RESERVE_BRK_ARRAY()
stacktrace: Move documentation for arch_stack_walk_reliable() to header
x86: Remove duplicate TSC DEADLINE MSR definitions
|
|
To pick up fixes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Kernel has supported COMETLAKE/COMETLAKE_L to use the SKYLAKE
events and supported TIGERLAKE_L/TIGERLAKE/ROCKETLAKE to use
the ICELAKE events. But pmu-events mapfile.csv is missing
these model numbers.
Now add the missing model numbers to mapfile.csv.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210329070903.8894-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Although 'err' has been initialized to -ENOMEM, but it will be reassigned
by the "err = unwind__prepare_access(...)" statement in the for loop. So
that, the value of 'err' is unknown when map__clone() failed.
Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked thread")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhen lei <thunder.leizhen@huawei.com>
Link: http://lore.kernel.org/lkml/20210415092744.3793-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Command 'perf ftrace -v -- ls' fails in s390 (at least 5.12.0rc6).
The root cause is a missing pointer dereference which causes an
array element address to be used as PID.
Fix this by extracting the PID.
Output before:
# ./perf ftrace -v -- ls
function_graph tracer is used
write '-263732416' to tracing/set_ftrace_pid failed: Invalid argument
failed to set ftrace pid
#
Output after:
./perf ftrace -v -- ls
function_graph tracer is used
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
4) | rcu_read_lock_sched_held() {
4) 0.552 us | rcu_lockdep_current_cpu_online();
4) 6.124 us | }
Reported-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210421120400.2126433-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In the function auxtrace_parse_snapshot_options(), the callback pointer
"itr->parse_snapshot_options" can be NULL if it has not been set during
the AUX record initialization. This can cause tool crashing if the
callback pointer "itr->parse_snapshot_options" is dereferenced without
performing NULL check.
Add a NULL check for the pointer "itr->parse_snapshot_options" before
invoke the callback.
Fixes: d20031bb63dd6dde ("perf tools: Add AUX area tracing Snapshot Mode")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: http://lore.kernel.org/lkml/20210420151554.2031768-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Update Topdown documentation to permit calls to rdpmc, and describe
interaction with system calls.
Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: http://lore.kernel.org/lkml/20210421091009.1711565-1-mdr@ashroe.eu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Although 'ret' has been initialized to -1, but it will be reassigned by
the "ret = open(...)" statement in the for loop. So that, the value of
'ret' is unknown when asprintf() failed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To use in automated tests inside containers from a tarball generated
by 'make perf-tar-src-pkg*', where testing building from a tarball
is obviously not needed, so add a 'build-test-tarball' for that case.
And don't build with gtk2 as this complicates things for cross builds
where we don't always have all the libraries a full perf build requires
available for the target arch, ditto for static builds.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Although 'ret' has been initialized to -1, but it will be reassigned by
the "ret = open(...)" statement in the for loop. So that, the value of
'ret' is unknown when asprintf() failed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Relative path include works in the regular build due to -I paths but may
break in other situations.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210416214113.552252-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The patch changes the output format in 2 ways:
- line number is displayed for all source lines (matching TUI mode)
- source locations for the hottest lines are printed
at the line end in order to preserve layout
Before:
0.00 : 405ef1: inc %r15
: tmpsd * (TD + tmpsd * TDD)));
0.01 : 405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3 # 4318b0 <_IO_stdin_used+0x8b0>
: tmpsd * (TC +
eff.c:1811 0.67 : 405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3 # 4318b8 <_IO_stdin_used+0x8b8>
: TA + tmpsd * (TB +
0.35 : 405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3 # 4318c0 <_IO_stdin_used+0x8c0>
: dumbo =
eff.c:1809 1.41 : 405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3 # 4318c8 <_IO_stdin_used+0x8c8>
: sumi -= sj * tmpsd * dij2i * dumbo;
eff.c:1813 2.58 : 405f18: vmulsd %xmm3,%xmm0,%xmm0
2.81 : 405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
3.78 : 405f23: vmovsd %xmm0,0x30(%rsp)
: for (k = 0; k < lpears[i] + upears[i]; k++) {
eff.c:1761 0.90 : 405f29: cmp %r15d,%r12d
After:
0.00 : 405ef1: inc %r15
: 1812 tmpsd * (TD + tmpsd * TDD)));
0.01 : 405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3 # 4318b0 <_IO_stdin_used+0x8b0>
: 1811 tmpsd * (TC +
0.67 : 405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3 # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
: 1810 TA + tmpsd * (TB +
0.35 : 405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3 # 4318c0 <_IO_stdin_used+0x8c0>
: 1809 dumbo =
1.41 : 405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3 # 4318c8 <_IO_stdin_used+0x8c8> // eff.c:1809
: 1813 sumi -= sj * tmpsd * dij2i * dumbo;
2.58 : 405f18: vmulsd %xmm3,%xmm0,%xmm0 // eff.c:1813
2.81 : 405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
3.78 : 405f23: vmovsd %xmm0,0x30(%rsp)
: 1761 for (k = 0; k < lpears[i] + upears[i]; k++) {
Where e.g. '// eff.c:1811' shares the same color as the percentantage
at the line beginning.
Signed-off-by: Martin Liška <mliska@suse.cz>
Link: http://lore.kernel.org/lkml/a0d53f31-f633-5013-c386-a4452391b081@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
After a "make -C tools/perf", git reports the following untracked file:
perf-iostat
Add this generated file to perf's .gitignore file.
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-5-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This functionality is based on recently introduced sysfs attributes for
Intel® Xeon® Scalable processor family (code name Skylake-SP):
Commit bb42b3d39781d7fc ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")
Mode is intended to provide four I/O performance metrics in MB per each
PCIe root port:
- Inbound Read: I/O devices below root port read from the host memory
- Inbound Write: I/O devices below root port write to the host memory
- Outbound Read: CPU reads from I/O devices below root port
- Outbound Write: CPU writes to I/O devices below root port
Each metric requiries only one uncore event which increments at every 4B
transfer in corresponding direction. The formulas to compute metrics
are generic:
#EventCount * 4B / (1024 * 1024)
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-4-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Introduce helper functions to control PCIe root ports list.
These helpers will be used in the follow-up patch.
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-3-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add basic flow for a new iostat mode in perf. Mode is intended to
provide four I/O performance metrics per each PCIe root port: Inbound Read,
Inbound Write, Outbound Read, Outbound Write.
The actual code to compute the metrics and attribute it to
root port is in follow-on patches.
Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210419094147.15909-2-alexander.antonov@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Patch adds initial JSON/events for POWER10.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Paul Clarke <pc@us.ibm.com>
Tested-by: Paul Clarke <pc@us.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20210419112001.71466-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
- keep the ZC code, drop the code related to reinit
net/bridge/netfilter/ebtables.c
- fix build after move to net_generic
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace haven't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with TRUNCATED flag set.
Setting attr.aux_watermark to quarter of the buffer gives enough space for
the ->aux_tail update to be observed and prevents the data loss.
The obligatory before/after showcase:
> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]
The effect is still visible with large workloads and large buffers,
although less pronounced.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210414154955.49603-3-alexander.shishkin@linux.intel.com
|
|
After gnulib update sed stopped matching `[[:space:]]*+' as before,
causing the following compilation error:
In file included from builtin-trace.c:719:
trace/beauty/generated/fsconfig_arrays.c:2:3: error: expected expression before ']' token
2 | [] = "",
| ^
trace/beauty/generated/fsconfig_arrays.c:2:3: error: array index in initializer not of integer type
trace/beauty/generated/fsconfig_arrays.c:2:3: note: (near initialization for 'fsconfig_cmds')
Fix this by correcting the regular expression used in the generator.
Also, clean up the script by removing redundant egrep, xargs, and printf
invocations.
Committer testing:
Continues to work:
$ cat tools/perf/trace/beauty/fsconfig.sh
#!/bin/sh
# SPDX-License-Identifier: LGPL-2.1
if [ $# -ne 1 ] ; then
linux_header_dir=tools/include/uapi/linux
else
linux_header_dir=$1
fi
linux_mount=${linux_header_dir}/mount.h
printf "static const char *fsconfig_cmds[] = {\n"
ms='[[:space:]]*'
sed -nr "s/^${ms}FSCONFIG_([[:alnum:]_]+)${ms}=${ms}([[:digit:]]+)${ms},.*/\t[\2] = \"\1\",/p" \
${linux_mount}
printf "};\n"
$ tools/perf/trace/beauty/fsconfig.sh
static const char *fsconfig_cmds[] = {
[0] = "SET_FLAG",
[1] = "SET_STRING",
[2] = "SET_BINARY",
[3] = "SET_PATH",
[4] = "SET_PATH_EMPTY",
[5] = "SET_FD",
[6] = "CMD_CREATE",
[7] = "CMD_RECONFIGURE",
};
$
Fixes: d35293004a5e4 ("perf beauty: Add generator for fsconfig's 'cmd' arg values")
Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
Co-authored-by: Dmitry V. Levin <ldv@altlinux.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/20210414182723.1670663-1-vt@altlinux.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
exec'ed
Before:
# perf record -a cycles,instructions,cache-misses
Workload failed: No such file or directory
#
After:
# perf record -a cycles,instructions,cache-misses
Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory
#
Helps disambiguating other error scenarios:
# perf record -a -e cycles,instructions,cache-misses bla
Failed to collect 'cycles,instructions,cache-misses' for the 'bla' workload: No such file or directory
# perf record -a cycles,instructions,cache-misses sleep 1
Failed to collect 'cycles' for the 'cycles,instructions,cache-misses' workload: No such file or directory
#
When all goes well we're back to the usual:
# perf record -a -e cycles,instructions,cache-misses sleep 1
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 3.151 MB perf.data (21242 samples) ]
#
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210414131628.2064862-3-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add a 'scnprintf' method to obtain the list of evsels in a evlist as a
string, excluding the "dummy" event used for things like receiving
metadata events (PERF_RECORD_FORK, MMAP, etc) when synthesizing
preexisting threads.
Will be used to improve the error message for workload failure in 'perf
record.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20210414131628.2064862-2-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In hist__find_annotations(), since different 'struct hist_entry' entries
may point to same symbol, we free notes->src to signal already processed
this symbol in stdio mode; when annotate, entry will skipped if
notes->src is NULL to avoid repeated output.
However, there is a problem, for example, run the following command:
# perf record -e branch-misses -e branch-instructions -a sleep 1
perf.data file contains different types of sample event.
If the same IP sample event exists in branch-misses and branch-instructions,
this event uses the same symbol. When annotate branch-misses events, notes->src
corresponding to this event is set to null, as a result, when annotate
branch-instructions events, this event is skipped and no annotate is output.
Solution of this patch is to remove zfree in hists__find_annotations and
change sort order to "dso,symbol" to avoid duplicate output when different
processes correspond to the same symbol.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhangjinhao2@huawei.com
Link: http://lore.kernel.org/lkml/20210319123527.173883-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up fixes from perf/urgent that got into upstream.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add PMU events for AMD Zen3 processors as documented in the AMD Processor
Programming Reference for Family 19h and Model 01h [1].
Below are the events which are new on Zen3:
PMCx041 ls_mab_alloc.{all_allocations|hardware_prefetcher_allocations|load_store_allocations}
PMCx043 ls_dmnd_fills_from_sys.ext_cache_local
PMCx044 ls_any_fills_from_sys.{mem_io_remote|ext_cache_remote|mem_io_local|ext_cache_local|int_cache|lcl_l2}
PMCx047 ls_misal_loads.{ma4k|ma64}
PMCx059 ls_sw_pf_dc_fills.ext_cache_local
PMCx05a ls_hw_pf_dc_fills.ext_cache_local
PMCx05f ls_alloc_mab_count
PMCx085 bp_l1_tlb_miss_l2_tlb_miss.coalesced_4k
PMCx0ab de_dis_cops_from_decoder.disp_op_type.{any_integer_dispatch|any_fp_dispatch}
PMCx0cc ex_ret_ind_brch_instr
PMCx18e ic_tag_hit_miss.{all_instruction_cache_accesses|instruction_cache_miss|instruction_cache_hit}
PMCx1c7 ex_ret_msprd_brnch_instr_dir_msmtch
PMCx28f op_cache_hit_miss.{all_op_cache_accesses|op_cache_miss|op_cache_hit}
Section 2.1.17.2 "Performance Measurement" of "PPR for AMD Family 19h,
Model 01h, Revision B1 Processors - 55898 Rev 0.35 - Feb 5, 2021." lists
new metrics. Add them.
Preserve the events for Zen3 if they are measurable and non-zero as taken
from Zen2 directory even if the PPR of Zen3 [1] omits them. Those events
are the following:
PMCx000 fpu_pipe_assignment.{total|total0|total1|total2|total3}
PMCx004 fp_num_mov_elim_scal_op.{optimized|opt_potential|sse_mov_ops_elim|sse_mov_ops}
PMCx02D ls_rdtsc
PMCx040 ls_dc_accesses
PMCx046 ls_tablewalker.{iside|ic_type1|ic_type0|dside|dc_type1|dc_type0}
PMCx061 l2_request_g2.{group1|ls_rd_sized|ls_rd_sized_nc|ic_rd_sized|ic_rd_sized_nc|smc_inval|bus_lock_originator|bus_locks_responses}
PMCx062 l2_latency.l2_cycles_waiting_on_fills
PMCx063 l2_wcb_req.{wcb_write|wcb_close|zero_byte_store|cl_zero}
PMCx06d l2_fill_pending.l2_fill_busy
PMCx080 ic_fw32
PMCx081 ic_fw32_miss
PMCx086 bp_snp_re_sync
PMCx087 ic_fetch_stall.{ic_stall_any|ic_stall_dq_empty|ic_stall_back_pressure}
PMCx08a bp_l1_btb_correct
PMCx08c ic_cache_inval.{l2_invalidating_probe|fill_invalidated}
PMCx099 bp_tlb_rel
PMCx0a9 de_dis_uop_queue_empty_di0
PMCx0c7 ex_ret_brn_resync
PMCx28a ic_oc_mode_switch.{oc_ic_mode_switch|ic_oc_mode_switch}
L3PMCx01 l3_request_g1.caching_l3_cache_accesses
L3PMCx06 l3_comb_clstr_state.{other_l3_miss_typs|request_miss}
[1] Processor Programming Reference (PPR) for AMD Family 19h, Model 01h,
Revision B1 Processors - 55898 Rev 0.35 - Feb 5, 2021.
[2] Processor Programming Reference (PPR) for AMD Family 17h Model 71h,
Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019.
[3] Processor Programming Reference (PPR) for AMD Family 17h Models
01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019.
All of the PPRs can be found at:
https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-5-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Use 0x%02x format for all event codes and umasks as this helps in tracking
changes of automatically generated event tables.
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-4-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The values of event codes and umasks are inconsistent with letter cases.
Enforce a unique style and default everything to lower case as this
helps in tracking changes of automatically generated event tables.
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-3-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit 08ed77e414ab2342 ("perf vendor events amd: Add recommended events")
added the hits event "L2 Cache Hits from L2 HWPF" with the same metric
expression as the accesses event "L2 Cache Accesses from L2 HWPF":
$ perf list --details
...
l2_cache_accesses_from_l2_hwpf
[L2 Cache Accesses from L2 HWPF]
[l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
l2_cache_hits_from_l2_hwpf
[L2 Cache Hits from L2 HWPF]
[l2_pf_hit_l2 + l2_pf_miss_l2_hit_l3 + l2_pf_miss_l2_l3]
...
This was wrong and led to counting hits the same as accesses. Section
2.1.15.2 "Performance Measurement" of "PPR for AMD Family 17h Model 31h
B0 - 55803 Rev 0.54 - Sep 12, 2019", documents the hits event with
EventCode 0x70 which is the same as l2_pf_hit_l2.
Fix this, and massage the description for l2_pf_hit_l2 as the hits event
is now the duplicate of l2_pf_hit_l2. AMD recommends using the recommended
event over other events if the duplicate exists and maintain both for
consistency. Hence, l2_cache_hits_from_l2_hwpf should override
l2_pf_hit_l2.
Before:
# perf stat -M l2_cache_accesses_from_l2_hwpf,l2_cache_hits_from_l2_hwpf sleep 1
Performance counter stats for 'sleep 1':
1,436 l2_pf_miss_l2_l3 # 11114.00 l2_cache_accesses_from_l2_hwpf
# 11114.00 l2_cache_hits_from_l2_hwpf
4,482 l2_pf_hit_l2
5,196 l2_pf_miss_l2_hit_l3
1.001765339 seconds time elapsed
After:
# perf stat -M l2_cache_accesses_from_l2_hwpf sleep 1
Performance counter stats for 'sleep 1':
1,477 l2_pf_miss_l2_l3 # 10442.00 l2_cache_accesses_from_l2_hwpf
3,978 l2_pf_hit_l2
4,987 l2_pf_miss_l2_hit_l3
1.001491186 seconds time elapsed
# perf stat -e l2_cache_hits_from_l2_hwpf sleep 1
Performance counter stats for 'sleep 1':
3,983 l2_cache_hits_from_l2_hwpf
1.001329970 seconds time elapsed
Note the difference in performance counter values for the accesses
versus the hits after the fix, and the hits event now counting the same
as l2_pf_hit_l2.
Fixes: 08ed77e414ab ("perf vendor events amd: Add recommended events")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org> # On a 3900X
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vijay Thakkar <vijaythakkar@me.com>
Cc: linux-perf-users@vger.kernel.org
Link: https://lore.kernel.org/r/20210406215944.113332-2-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add L3 metrics.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-7-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add L2 metrics.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-6-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add L1 metrics. Formula is as consistent as possible with MAN pages
description for these metrics.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-5-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
the system
Add a function to find the common PMU map for the system.
For arm64, a special variant is added. This is because arm64 supports
heterogeneous CPU systems. As such, it cannot be guaranteed that the
cpumap is same for all CPUs. So in case of heterogeneous systems, don't
return a cpumap.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/1617791570-165223-4-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The pmu-events parsing test does not handle metric reuse at all.
Introduce some simple handling to resolve metrics who reference other
metrics.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linuxarm@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/1617791570-165223-3-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Function find_metric() is required for the metric processing in the
pmu-events testcase, so make it public. Also change the name to include
"metricgroup".
Tested-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will@kernel.org>
Cc: linuxarm@huawei.com
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/1617791570-165223-2-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
SPE extended headers are > 1 byte so ensure the buffer contains at least
this before reading. This issue was detected by fuzzing.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20210407153955.317215-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When '--total-cycles' is specified, it supports sorting for all blocks
by 'Sampled Cycles%'. This is useful to concentrate on the globally
hottest blocks.
'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles
But in current code, it doesn't use the cycles aggregation. Part of
'cycles' counting is possibly dropped for some overlap jumps. But for
identifying the hot block, we always need the full cycles.
# perf record -b ./triad_loop
# perf report --total-cycles --stdio
Before:
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... ............................................................. .................
#
0.81% 793 4.32% 793 [setup-vdso.h:34 -> setup-vdso.h:40] ld-2.27.so
0.49% 480 0.87% 160 [native_write_msr+0 -> native_write_msr+16] [kernel.kallsyms]
0.48% 476 0.52% 95 [native_read_msr+0 -> native_read_msr+29] [kernel.kallsyms]
0.31% 303 1.65% 303 [nmi_restore+0 -> nmi_restore+37] [kernel.kallsyms]
0.26% 255 1.39% 255 [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162] [kernel.kallsyms]
0.24% 234 1.28% 234 [end_repeat_nmi+67 -> end_repeat_nmi+83] [kernel.kallsyms]
0.23% 227 1.24% 227 [__irqentry_text_end+96 -> __irqentry_text_end+126] [kernel.kallsyms]
0.20% 194 1.06% 194 [native_set_debugreg+52 -> native_set_debugreg+56] [kernel.kallsyms]
0.11% 106 0.14% 26 [native_sched_clock+0 -> native_sched_clock+98] [kernel.kallsyms]
0.10% 97 0.53% 97 [trigger_load_balance+0 -> trigger_load_balance+67] [kernel.kallsyms]
0.09% 85 0.46% 85 [get-dynamic-info.h:102 -> get-dynamic-info.h:111] ld-2.27.so
...
0.00% 92.7K 0.02% 4 [triad_loop.c:64 -> triad_loop.c:65] triad_loop
The hottest block '[triad_loop.c:64 -> triad_loop.c:65]' is not at
the top of output.
After:
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles [Program Block Range] Shared Object
# ............... .............. ........... .......... .............................................................. .................
#
94.35% 92.7K 0.02% 4 [triad_loop.c:64 -> triad_loop.c:65] triad_loop
0.81% 793 4.32% 793 [setup-vdso.h:34 -> setup-vdso.h:40] ld-2.27.so
0.49% 480 0.87% 160 [native_write_msr+0 -> native_write_msr+16] [kernel.kallsyms]
0.48% 476 0.52% 95 [native_read_msr+0 -> native_read_msr+29] [kernel.kallsyms]
0.31% 303 1.65% 303 [nmi_restore+0 -> nmi_restore+37] [kernel.kallsyms]
0.26% 255 1.39% 255 [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162] [kernel.kallsyms]
0.24% 234 1.28% 234 [end_repeat_nmi+67 -> end_repeat_nmi+83] [kernel.kallsyms]
0.23% 227 1.24% 227 [__irqentry_text_end+96 -> __irqentry_text_end+126] [kernel.kallsyms]
0.20% 194 1.06% 194 [native_set_debugreg+52 -> native_set_debugreg+56] [kernel.kallsyms]
0.11% 106 0.14% 26 [native_sched_clock+0 -> native_sched_clock+98] [kernel.kallsyms]
0.10% 97 0.53% 97 [trigger_load_balance+0 -> trigger_load_balance+67] [kernel.kallsyms]
0.09% 85 0.46% 85 [get-dynamic-info.h:102 -> get-dynamic-info.h:111] ld-2.27.so
0.08% 82 0.06% 11 [intel_pmu_drain_pebs_nhm+580 -> intel_pmu_drain_pebs_nhm+627] [kernel.kallsyms]
0.08% 77 0.42% 77 [lru_add_drain_cpu+0 -> lru_add_drain_cpu+133] [kernel.kallsyms]
0.08% 74 0.10% 18 [handle_pmi_common+271 -> handle_pmi_common+310] [kernel.kallsyms]
0.08% 74 0.40% 74 [get-dynamic-info.h:131 -> get-dynamic-info.h:157] ld-2.27.so
0.07% 69 0.09% 17 [intel_pmu_drain_pebs_nhm+432 -> intel_pmu_drain_pebs_nhm+468] [kernel.kallsyms]
Now the hottest block is reported at the top of output.
Fixes: b65a7d372b1a55db ("perf hist: Support block formats with compare/sort/display")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210407024452.29988-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'struct mem_info' is defined at 22nd line.
The declaration here is unnecessary. Remove it.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kael_w@yeah.net
Link: http://lore.kernel.org/lkml/20210406105104.675879-1-wanjiabing@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Since commit 14d3d54052539a1e ("perf session: Try to read pipe data from
file") 'perf inject' has started printing "PERFILE2h" when not processing
pipes.
The commit exposed perf to the possiblity that the input is not a pipe
but the 'repipe' parameter gets used. That causes the printing because
perf inject sets 'repipe' to true always.
The 'repipe' parameter of perf_session__new() is used by 2 functions:
- perf_file_header__read_pipe()
- trace_report()
In both cases, the functions copy data to STDOUT_FILENO when 'repipe' is
true.
Fix by setting 'repipe' to true only if the output is a pipe.
Fixes: e558a5bd8b74aff4 ("perf inject: Work with files")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andrew Vagin <avagin@openvz.org>
Link: http://lore.kernel.org/lkml/20210401103605.9000-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'struct target' is declared twice. One has been declared at 21st line.
Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kael_w@yeah.net
Link: http://lore.kernel.org/lkml/20210401062424.991737-1-wanjiabing@vivo.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In particular we want to have this upstream commit:
b90829704780: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG")
... before merging in x86/cpu changes and the removal of the NOP optimizations, and
applying PeterZ's !retpoline objtool series.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
'perf annotate' supports --symbol but it's impossible to filter a C++
symbol. With --no-demangle one can filter easily by mangled function
name.
Signed-off-by: Martin Liška <mliska@suse.cz>
Link: http://lore.kernel.org/lkml/c3c7e959-9f7f-18e2-e795-f604275cbac3@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some OCaml developers reported that this bit of information is sometimes
useful for disambiguating functions for which the OCaml compiler assigns
the same name, e.g. nested or inlined functions.
Signed-off-by: Fabian Hemmer <copy@copy.sh>
Link: http://lore.kernel.org/lkml/20210226075223.p3s5oz4jbxwnqjtv@nyu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up fixes sent via perf/urgent and in the BPF tools/ directories.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When executing the daemon test on Arm64 and x86 with Debian (Buster)
distro, both skip the test case with the log:
# ./perf test -v 76
76: daemon operations :
--- start ---
test child forked, pid 11687
test daemon list
trap: SIGINT: bad trap
./tests/shell/daemon.sh: 173: local: cpu-clock: bad variable name
test child finished with -2
---- end ----
daemon operations: Skip
So the error happens for the variable expansion when use local variable
in the shell script. Since Debian Buster uses dash but not bash as
non-interactive shell, when execute the daemon testing, it hits a known
issue for dash which was reported [1].
To resolve this issue, one option is to add double quotes for all local
variables assignment, so need to change the code from:
local line=`perf daemon --config ${config} -x: | head -2 | tail -1`
... to:
local line="`perf daemon --config ${config} -x: | head -2 | tail -1`"
But the testing script has bunch of local variables, this leads to big
changes for whole script.
On the other hand, the testing script asks to use the "local" feature
which is bash-specific, so this patch explicitly uses "#!/bin/bash" to
ensure running the script with bash.
After:
# ./perf test -v 76
76: daemon operations :
--- start ---
test child forked, pid 11329
test daemon list
test daemon reconfig
test daemon stop
test daemon signal
signal 12 sent to session 'test [11596]'
signal 12 sent to session 'test [11596]'
test daemon ping
test daemon lock
test child finished with 0
---- end ----
daemon operations: Ok
[1] https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097
Fixes: 2291bb915b55 ("perf tests: Add daemon 'list' command test")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210320104554.529213-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The sort dimension "p_stage_cyc" is used to represent pipeline
stage cycle information. Presently, this is used only in powerpc.
For unsupported platforms, we don't want to display it
in the perf report output columns. Hence add check in sort_dimension__add()
and skip the sort key incase it is not applicable for the particular arch.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/1616425047-1666-6-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|