Age | Commit message (Collapse) | Author |
|
The build of kernel v4.14-rc1 for i686 fails on RHEL 6 with the error
in tools/perf:
util/syscalltbl.c:157: error: expected ';', ',' or ')' before '__maybe_unused'
mv: cannot stat `util/.syscalltbl.o.tmp': No such file or directory
Fix it by placing/moving:
#include <linux/compiler.h>
outside of #ifdef HAVE_SYSCALL_TABLE block.
Signed-off-by: Akemi Yagi <toracat@elrepo.org>
Cc: Alan Bartlett <ajb@elrepo.org>
Link: http://lkml.kernel.org/r/oq41r8$1v9$1@blaine.gmane.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With --call-graph option, perf report can display call chains using
type, min percent threshold, optional print limit and order. And the
default call-graph parameter is 'graph,0.5,caller,function,percent'.
Before this patch, 'perf report --call-graph' shows incorrect debug
messages as below:
# perf report --call-graph
Invalid callchain mode: 0.5
Invalid callchain order: 0.5
Invalid callchain sort key: 0.5
Invalid callchain config key: 0.5
Invalid callchain mode: caller
Invalid callchain mode: function
Invalid callchain order: function
Invalid callchain mode: percent
Invalid callchain order: percent
Invalid callchain sort key: percent
That is because in function __parse_callchain_report_opt(),each field of
the call-graph parameter is passed to parse_callchain_{mode,order,
sort_key,value} in turn until it meets the matching value.
For example, the order field "caller" is passed to
parse_callchain_mode() firstly and obviously it doesn't match any mode
field. Therefore parse_callchain_mode() will shows the debug message
"Invalid callchain mode: caller", which could confuse users.
The patch fixes this issue by moving the warning out of the function
parse_callchain_{mode,order,sort_key,value}.
Signed-off-by: Mengting Zhang <zhangmengting@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Yet another fix for probing the max attr.precise_ip setting: it is not
enough settting attr.exclude_kernel for !root users, as they _can_
profile the kernel if the kernel.perf_event_paranoid sysctl is set to
-1, so check that as well.
Testing it:
As non root:
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = 2
$ perf record sleep 1
$ perf evlist -v
cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ...
Now as non-root, but with kernel.perf_event_paranoid set set to the
most permissive value, -1:
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = -1
$ perf record sleep 1
$ perf evlist -v
cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ...
$
I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel,
non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel.
In both cases, use the highest available precision: attr.precise_ip = 3.
Reported-and-Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: d37a36979077 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p")
Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now that I'm switching the container builds from using a local volume
pointing to the kernel repository with the perf sources, instead getting
a detached tarball to be able to use a container cluster, some places
broke because I forgot to put some of the required files in
tools/perf/MANIFEST, namely some bitsperlong.h files.
So, to fix it do the same as for tools/build/ and pack the whole
tools/arch/ directory.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Support direct --user-regs arguments in 'perf record', previously the
only way to sample PERF_SAMPLE_REGS_USER was implicitly selecting it
when recording callchains (Andi Kleen)
- Support showing sampled user regs in 'perf script' (Andi Kleen)
- Introduce the concept of weak groups in 'perf stat': try to set up a
group, but if it's not schedulable fallback to not using a group. That
gives us the best of both worlds: groups if they work, but still a
usable fallback if they don't. E.g: (Andi Kleen)
% perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1
125,366,055 branches (80.02%)
9,208,402 branch-misses # 7.35% of all branches (80.01%)
24,560,249 l1d.replacement (80.00%)
43,174,971 l2_lines_in.all (80.05%)
31,891,457 l2_rqsts.all_code_rd (79.92%)
- Support metrics in 'stat' and 'list'. A metric is a formula that
uses multiple events to compute a higher level result (e.g. IPC). (Andi Kleen)
- Add Intel processors vendor event metrics JSON files (Andi Kleen)
- Add 'pid' and 'tid' options to 'perf sched timehist' (David Ahern)
- Generate 'behavior' string table from kernel headers, helps getting
new parameters when synchronizing kernel headers, like MADV_WIPEONFORK
and MADV_KEEPONFORK, that are now beautied (Arnaldo Carvalho de Melo)
- Improve TUI progress bar by showing how many bytes from a total were
processed (Jiri Olsa)
- Use scandir() to replace readdir(), prep work to have the synthesizing
of PERF_RECORD_ entries for existing threads be multithreaded, making
'perf top' bearable on high core count systems such as Intel's Knights
Landing/Mill (Kan Liang)
- Allow creating a ~/.perfconfig file when setting a variable to its
default value, previously it would bail out and not write such a
file (Taeung Song)
- Introduce wrapper for allowing purely single threaded apps to avoid
the costs of locking (Arnaldo Carvalho de Melo)
- Introduce hashtable to reduce the cost of thread lookup
- Fix build C++ build wrt poison.h using void pointer arithmetic,
affects only the embedded clang/llvm case, that is disabled by
default (Arnaldo Carvalho de Melo)
- Fix leaking rec_argv in error cases (Martin Kepplinger)
- Remove Intel CQM perf test, that infrastructure was nuked (Xiaochen Shen)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Andi reported a performance drop in single threaded perf tools such as
'perf script' due to the growing number of locks being put in place to
allow for multithreaded tools, so wrap the POSIX threads rwlock routines
with the names used for such kinds of locks in the Linux kernel and then
allow for tools to ask for those locks to be used or not.
I.e. a tool may have a multithreaded phase and then switch to single
threaded, like the upcoming patches for the synthesizing of
PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to
single threaded mode in 'perf top'.
The init routines will not be conditional, this way starting as single
threaded to then move to multi threaded mode should be possible.
Reported-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now that I'm switching the container builds from using a local volume
pointing to the kernel repository with the perf sources, instead getting
a detached tarball to be able to use a container cluster, some places
broke because I forgot to put some of the required files in
tools/perf/MANIFEST, namely some bitsperlong.h files.
So, to fix it do the same as for tools/build/ and pack the whole
tools/arch/ directory.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This is one more case where the way that syscall parameter values are
defined in kernel headers are easy to parse using a shell script that
will then generate the string table that gets used by the madvise
'behaviour' argument beautifier.
This way as soon as the header syncronization mechanism in perf's build
system detects a change in a copy of a kernel ABI header and that file
is syncronized, we get 'perf trace' updated automagically.
So, when we syncronize this:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
We'll get these:
#define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
#define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dolb0ghds4ui7wc1npgkchvc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Intel CQM perf test is obsolete for perf PMU code has been removed in
commit c39a0e2c8850 ("x86/perf/cqm: Wipe out perf based cqm").
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Pei P Jia <pei.p.jia@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1505797057-16300-1-git-send-email-xiaochen.shen@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The -M metric group parser threw away the events of earlier groups when
multiple groups were specified. Fix this here by not overwriting the
string incorrectly.
Now this works correctly:
% perf stat -M Summary,SMT --metric-only -a sleep 1
Performance counter stats for 'system wide':
Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization SMT_2T_Utilization Kernel_Utilization CoreIPC CORE_CLKS
900907376.0 2.7 2398954144.0 0.1 0.0 0.2 0.2 0.1 0.4 2080822855.5
while previously it would only show the SMT metrics.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170914205735.18431-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Let's free the allocated rec_argv in case we return early, in order to
avoid leaking memory.
This adds free() at a few very similar places across the tree where it
was missing.
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Martin kepplinger <martink@posteo.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170913191419.29806-1-martink@posteo.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When a PMU is missing print a better error message mentioning
the missing PMU.
% mkdir empty
% mount --bind empty /sys/devices/msr
% perf stat -M Summary true
event syntax error: '{inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_comp_ops_exe.sse_scalar..'
\___ Cannot find PMU `msr'. Missing kernel support?
It still cannot find the right column for aliases, but it's already a vast improvement.
v2: Check asprintf
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170913215006.32222-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In some cases we already have calculated the hash bucket, so reuse it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-800zehjsyy03er4s4jf0e99v@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To process any events, it needs to find the thread in the machine first.
The machine maintains a rb tree to store all threads. The rb tree is
protected by a rw lock.
It is not a problem for current perf which serially processing events.
However, it will have scalability performance issue to process events in
parallel, especially on a heavy load system which have many threads.
Introduce a hashtable to divide the big rb tree into many samll rb tree
for threads. The index is thread id % hashtable size. It can reduce the
lock contention.
Committer notes:
Renamed some variables and function names to reduce semantic confusion:
'struct threads' pointers: thread -> threads
threads hastable index: tid -> hash_bucket
struct threads *machine__thread() -> machine__threads()
Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang)
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add JSON metrics for Skylake server
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Broadwell DE
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Broadwell Server.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Haswell EP.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Ivy Town.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Haswell.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Ivy Bridge.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Sandy Bridge EP.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Sandy Bridge.
Committer testing:
# grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
# perf list metricgroup
List of pre-defined events (to be used in -e):
Metric Groups:
DSB
FLOPS
Frontend
Frontend_Bandwidth
Pipeline
Ports_Utilization
Power
SMT
Summary
TopDownL1
# perf stat -M Power --metric-only -a sleep 1
Performance counter stats for 'system wide':
Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
0.8 0.0 98.1 0.0 0.0 0.0 23.4 0.0
1.001153658 seconds time elapsed
# perf stat -v -M Power --metric-only -a sleep 1
Using CPUID GenuineIntel-6-2A
metric expr cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc for Turbo_Utilization
found event cpu_clk_unhalted.thread
found event cpu_clk_unhalted.ref_tsc
metric expr (cstate_core@c3\-residency@ / msr@tsc@) * 100 for C3_Core_Residency
found event cstate_core/c3-residency/
found event msr/tsc/
metric expr (cstate_core@c6\-residency@ / msr@tsc@) * 100 for C6_Core_Residency
found event cstate_core/c6-residency/
found event msr/tsc/
metric expr (cstate_core@c7\-residency@ / msr@tsc@) * 100 for C7_Core_Residency
found event cstate_core/c7-residency/
found event msr/tsc/
metric expr (cstate_pkg@c2\-residency@ / msr@tsc@) * 100 for C2_Pkg_Residency
found event cstate_pkg/c2-residency/
found event msr/tsc/
metric expr (cstate_pkg@c3\-residency@ / msr@tsc@) * 100 for C3_Pkg_Residency
found event cstate_pkg/c3-residency/
found event msr/tsc/
metric expr (cstate_pkg@c6\-residency@ / msr@tsc@) * 100 for C6_Pkg_Residency
found event cstate_pkg/c6-residency/
found event msr/tsc/
metric expr (cstate_pkg@c7\-residency@ / msr@tsc@) * 100 for C7_Pkg_Residency
found event cstate_pkg/c7-residency/
found event msr/tsc/
adding {cpu_clk_unhalted.thread,cpu_clk_unhalted.ref_tsc}:W,{cstate_core/c3-residency/,msr/tsc/}:W,{cstate_core/c6-residency/,msr/tsc/}:W,{cstate_core/c7-residency/,msr/tsc/}:W,{cstate_pkg/c2-residency/,msr/tsc/}:W,{cstate_pkg/c3-residency/,msr/tsc/}:W,{cstate_pkg/c6-residency/,msr/tsc/}:W,{cstate_pkg/c7-residency/,msr/tsc/}:W
cpu_clk_unhalted.thread -> cpu/event=0x3c/
cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
Weak group for cstate_pkg/c2-residency//2 failed
Weak group for cstate_pkg/c3-residency//2 failed
Weak group for cstate_pkg/c6-residency//2 failed
Weak group for cstate_pkg/c7-residency//2 failed
cpu_clk_unhalted.thread: 5564185 4002833569 4002833569
cpu_clk_unhalted.ref_tsc: 7325424 4002833569 4002833569
cstate_core/c3-residency/: 68293 4003027101 4003027101
msr/tsc/: 12451294472 4003027101 4003027101
cstate_core/c6-residency/: 12238830163 4003260984 4003260984
msr/tsc/: 12452017806 4003260984 4003260984
cstate_core/c7-residency/: 0 4003489648 4003489648
msr/tsc/: 12452725162 4003489648 4003489648
cstate_pkg/c2-residency/: 1830054 1000913138 1000913138
msr/tsc/: 12453441079 4003717513 4003717513
cstate_pkg/c3-residency/: 0 1000973570 1000973570
msr/tsc/: 12454177865 4003954758 4003954758
cstate_pkg/c6-residency/: 2940448859 1001032370 1001032370
msr/tsc/: 12454833890 4004166118 4004166118
cstate_pkg/c7-residency/: 0 1001049818 1001049818
msr/tsc/: 12454919470 4004194204 4004194204
Performance counter stats for 'system wide':
Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
0.8 0.0 98.3 0.0 0.0 0.0 23.6 0.0
1.001126519 seconds time elapsed
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Skylake.
Committer testing:
# grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
# uname -a
Linux seventh 4.12.0-rc6+ #1 SMP Fri Jun 30 16:40:55 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
# perf stat --metric-only -M Summary -a sleep 1
Performance counter stats for 'system wide':
Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
34021097.0 0.0 119424171.0 0.0 0.0 0.0 0.0
1.001001793 seconds time elapsed
# perf list metricgroup
List of pre-defined events (to be used in -e):
Metric Groups:
DSB
FLOPS
Frontend
Frontend_Bandwidth
Memory_BW
Memory_Bound
Memory_Lat
Pipeline
Ports_Utilization
Power
SMT
Summary
TLB
TopDownL1
Unknown_Branches
# perf stat --metric-only -M Ports_Utilization -a sleep 1
Performance counter stats for 'system wide':
ILP
1475828.0
1.000688547 seconds time elapsed
# perf stat -v --metric-only -M Ports_Utilization -a sleep 1
Using CPUID GenuineIntel-6-9E
metric expr uops_executed.thread / ( uops_executed.core_cycles_ge_1 / 2) if #smt_on else uops_executed.core_cycles_ge_1 for ILP
found event uops_executed.thread
found event uops_executed.core_cycles_ge_1
adding {uops_executed.thread,uops_executed.core_cycles_ge_1}:W
uops_executed.thread -> cpu/umask=0x1,period=2000003,event=0xb1/
uops_executed.core_cycles_ge_1 -> cpu/umask=0x2,period=2000003,cmask=1,event=0xb1/
uops_executed.thread: 8115271 4002547654 4002547654
uops_executed.core_cycles_ge_1: 3282969 4002547654 4002547654
Performance counter stats for 'system wide':
ILP
3282969.0
1.000719870 seconds time elapsed
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add JSON metrics for Broadwell.
Commiter testing:
# uname -a
Linux jouet 4.13.0-rc7+ #3 SMP Sat Sep 2 09:04:44 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
# grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
# perf list metricgroup
List of pre-defined events (to be used in -e):
Metric Groups:
DSB
FLOPS
Frontend
Frontend_Bandwidth
Memory_BW
Memory_Bound
Memory_Lat
Pipeline
Ports_Utilization
Power
SMT
Summary
TLB
TopDownL1
Unknown_Branches
# perf stat -M Power --metric-only -a sleep 1
Performance counter stats for 'system wide':
Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.003502904 seconds time elapsed
#
# perf stat -M Memory_BW --metric-only -a sleep 1
Performance counter stats for 'system wide':
MLP
1.7
1.001364525 seconds time elapsed
#
# perf stat -M TLB --metric-only -a sleep 1
Performance counter stats for 'system wide':
Page_Walks_Utilization
0.1
1.005962198 seconds time elapsed
#
# perf stat -M Summary --metric-only -a sleep 1
Performance counter stats for 'system wide':
Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
7281856697.0 0.0 11150898087.0 1.0 0.0 1.0 0.7
1.012134025 seconds time elapsed
#
Running in verbose mode shows which counters and expressions are being
used:
# perf stat -v -M Summary --metric-only -a sleep 1
Using CPUID GenuineIntel-6-3D
metric expr 1 / inst_retired.any / cycles for CPI
found event inst_retired.any
found event cycles
metric expr cpu_clk_unhalted.thread for CLKS
found event cpu_clk_unhalted.thread
metric expr inst_retired.any for Instructions
found event inst_retired.any
metric expr cpu_clk_unhalted.ref_tsc / msr@tsc@ for CPU_Utilization
found event cpu_clk_unhalted.ref_tsc
found event msr/tsc/
metric expr ( 1*( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2* fp_arith_inst_retired.128b_packed_double + 4*( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8* fp_arith_inst_retired.256b_packed_single ) / 1000000000 / duration_time for GFLOPs
found event fp_arith_inst_retired.scalar_single
found event fp_arith_inst_retired.scalar_double
found event fp_arith_inst_retired.128b_packed_double
found event fp_arith_inst_retired.128b_packed_single
found event fp_arith_inst_retired.256b_packed_double
found event fp_arith_inst_retired.256b_packed_single
found event duration_time
metric expr 1 - cpu_clk_thread_unhalted.one_thread_active / ( cpu_clk_thread_unhalted.ref_xclk_any / 2 ) if #smt_on else 0 for SMT_2T_Utilization
found event cpu_clk_thread_unhalted.one_thread_active
found event cpu_clk_thread_unhalted.ref_xclk_any
metric expr cpu_clk_unhalted.ref_tsc:u / cpu_clk_unhalted.ref_tsc for Kernel_Utilization
found event cpu_clk_unhalted.ref_tsc:u
found event cpu_clk_unhalted.ref_tsc
adding {inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_arith_inst_retired.scalar_single,fp_arith_inst_retired.scalar_double,fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.256b_packed_double,fp_arith_inst_retired.256b_packed_single,duration_time}:W,{cpu_clk_thread_unhalted.one_thread_active,cpu_clk_thread_unhalted.ref_xclk_any}:W,{cpu_clk_unhalted.ref_tsc:u,cpu_clk_unhalted.ref_tsc}:W
inst_retired.any -> cpu/event=0xc0/
cpu_clk_unhalted.thread -> cpu/event=0x3c/
inst_retired.any -> cpu/event=0xc0/
cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
fp_arith_inst_retired.scalar_single -> cpu/umask=0x2,period=2000003,event=0xc7/
fp_arith_inst_retired.scalar_double -> cpu/umask=0x1,period=2000003,event=0xc7/
fp_arith_inst_retired.128b_packed_double -> cpu/umask=0x4,period=2000003,event=0xc7/
fp_arith_inst_retired.128b_packed_single -> cpu/umask=0x8,period=2000003,event=0xc7/
fp_arith_inst_retired.256b_packed_double -> cpu/umask=0x10,period=2000003,event=0xc7/
fp_arith_inst_retired.256b_packed_single -> cpu/umask=0x20,period=2000003,event=0xc7/
cpu_clk_thread_unhalted.one_thread_active -> cpu/umask=0x2,period=2000003,event=0x3c/
cpu_clk_thread_unhalted.ref_xclk_any -> cpu/umask=0x1,any=1,period=2000003,event=0x3c/
cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
Weak group for fp_arith_inst_retired.scalar_single/7 failed
Weak group for cpu_clk_unhalted.ref_tsc:u/2 failed
inst_retired.any: 8704146437 4026374016 619883741
cycles: 11180800018 4026374016 619883741
cpu_clk_unhalted.thread: 11140030295 4026323772 931621933
inst_retired.any: 8643115117 4026260510 1243595906
cpu_clk_unhalted.ref_tsc: 10201638510 4026184297 1247351077
msr/tsc/: 10378022785 4026184297 1247351077
fp_arith_inst_retired.scalar_single: 134697 4026102728 1559210545
fp_arith_inst_retired.scalar_double: 274339 4026007348 1870014984
fp_arith_inst_retired.128b_packed_double: 1639 4025886054 1866736918
fp_arith_inst_retired.128b_packed_single: 0 4025776614 2175106569
fp_arith_inst_retired.256b_packed_double: 0 4025681734 1235551129
fp_arith_inst_retired.256b_packed_single: 0 4025582962 1232398454
duration_time: 0 4025552913 4025552913
cpu_clk_thread_unhalted.one_thread_active: 10505 4025474649 923893076
cpu_clk_thread_unhalted.ref_xclk_any: 394992110 4025474649 923893076
cpu_clk_unhalted.ref_tsc:u: 5341421014 4025360315 1231634198
cpu_clk_unhalted.ref_tsc: 10258278508 4025252611 307909362
Performance counter stats for 'system wide':
Instructions CPI CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
8704146437.0 0.0 11140030295.0 1.0 0.0 1.0 0.5
1.006783654 seconds time elapsed
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It's not possible to run a package event and a per cpu event in the same
group. This is used by some of the power metrics. They work correctly
when not using a group.
Normally weak groups should handle that, but in this case EBADF is
returned instead of the normal EINVAL.
$ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
Using CPUID GenuineIntel-6-3E
perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)
and perf errors out.
Make weak groups trigger a fall back for EBADF too. Then this case works correctly:
$ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
Using CPUID GenuineIntel-6-3E
Weak group for cstate_pkg/c2-residency//2 failed
cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
msr/tsc/: 39625837911 12007369110 12007369110
Performance counter stats for 'system wide':
476,709,882 cstate_pkg/c2-residency/
39,625,837,911 msr/tsc/
1.000697588 seconds time elapsed
This fixes perf stat -M Power ...
$ perf stat -M Power --metric-only -a sleep 1
Performance counter stats for 'system wide':
Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency
1.0 0.7 30.0 0.0 0.9 0.1 0.4 0.0
1.001240740 seconds time elapsed
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are no usage outside util.c and this is the only remaining reason
for fcntl.h to be included in util.h, to get the loff_t definition in
Alpine Linux, so make it static.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2dzlsao7k6ihozs5karw6kpx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When there isn't a config file (e.g. ~/.perfconfig) or it has nothing,
the config set wasn't created.
If the config set does not exist, a config file can't be autogenerated.
So allow creating a empty config set in the above case,
then we can support the config file autogeneration.
Before:
$ rm -f ~/.perfconfig
$ perf config --user report.children=false
$ cat ~/.perfconfig
cat: /root/.perfconfig: No such file or directory
But I think it should work even if there isn't a config file.
After:
$ rm -f ~/.perfconfig
$ perf config --user report.children=false
$ cat ~/.perfconfig
# this file is auto-generated.
[report]
children = false
NOTE:
As a result, if perf_config_set__init() fails, it looks as if the config
set isn't freed. But it isn't a problem. Because the config set will be
freed by perf_config_set__delete() at the end of cmd_config().
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754336-9824-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Currently set_config() can be repeatedly called for each input config on
the below case:
$ perf config kmem.default=slab report.children=false ...
But it's a waste, so only once write a config file gathering all given
config key=value pairs.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754331-9776-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In perf_event__synthesize_threads() perf goes through all proc files
serially by readdir.
scandir() does a snapshoot of /proc, which is multithreading friendly.
It's possible that some threads which are added during event synthesize.
But the number of lost threads should be small. They should not impact
the final analysis.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1504806954-150842-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Adding the size values '[current/total]' into progress bar, to show more
detailed progress of data reading.
Adding new ui_progress__init_size function to specify we want to display
the size.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Adding ui specific init function allowing to setup the progress bar
width based on current screen scales.
Adding TUI init function to get more grained update of the progress bar.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To be able to cleanup only python related binaries.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Teach perf script to print user regs.
% perf record --user-regs=ip,sp ...
% perf script -F ip,sym,uregs
...
ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637
ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637
ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637
ffffffff9e060c24 native_write_msr ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637
ffffffff9e00cc12 intel_pmu_handle_irq ABI:2 SP:0x7ffd0ea06c38 IP:0x7fe77f55b637
v2: Rebased on top of phys-addr patches
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170905184057.26135-1-andi@firstfloor.org
[ Use PRIu64 for regs->abi in print_sample_uregs() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
USER_REGS can currently only collected implicitely with call graph
recording. Sometimes it is useful to see them separately, and filter
them. Add a new --user-regs option to record that is similar to
--intr-regs, but acts on user regs.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170905170029.19722-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some metrics (like GFLOPs) need walltime_nsecs_stats for each interval.
Compute it for each interval instead of only at the end.
Pointed out by Jiri.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-12-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some perf stat metrics use an internal "duration_time" metric. It is not
correctly printed however. So hide it during output to avoid confusing
users with 0 counts.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-11-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some of the metrics formulas (like GFLOPs) need to know how long the
measurement period is. Support an internal event called duration_time,
which reports time in second. It maps to the dummy event, but is special
cased for statistics to report the walltime duration.
So far it is not printed, but only used internally for metrics.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We don't need to use ctx to look up events for saved values. The
context is already part of the evsel pointer, which is the primary key.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-9-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add code to perf list to print metric groups, and metrics
that don't have an event name. The metricgroup code collects
the eventgroups and events into a rblist, and then prints
them according to the configured filters.
The metricgroups are printed by default, but can be
limited by perf list metric or perf list metricgroup
% perf list metricgroup
..
Metric Groups:
DSB:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
FLOPS:
GFLOPs
[Giga Floating Point Operations Per Second]
Frontend:
IFetch_Line_Utilization
[Rough Estimation of fraction of fetched lines bytes that were likely consumed by program instructions]
Frontend_Bandwidth:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
Memory_BW:
MLP
[Memory-Level-Parallelism (average number of L1 miss demand load when there is at least 1 such miss)]
v2: Check return value of asprintf to fix warning on FC26
Fix key in lookup/addition for the groups list
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-8-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add generic support for standalone metrics specified in JSON files to
perf stat. A metric is a formula that uses multiple events to compute a
higher level result (e.g. IPC).
Previously metrics were always tied to an event and automatically
enabled with that event. But now change it that we can have standalone
metrics. They are in the same JSON data structure as events, but don't
have an event name.
We also allow to organize the metrics in metric groups, which allows a
short cut to select several related metrics at once.
Add a new -M / --metrics option to perf stat that adds the metrics or
metric groups specified.
Add the core code to manage and parse the metric groups. They are
collected from the JSON data structures into a separate rblist. When
computing shadow values look for metrics in that list. Then they are
computed using the existing saved values infrastructure in stat-shadow.c
The actual JSON metrics are in a separate pull request.
% perf stat -M Summary --metric-only -a sleep 1
Performance counter stats for 'system wide':
Instructions CLKS CPU_Utilization GFLOPs SMT_2T_Utilization Kernel_Utilization
317614222.0 1392930775.0 0.0 0.0 0.2 0.1
1.001497549 seconds time elapsed
% perf stat -M GFLOPs flops
Performance counter stats for 'flops':
3,999,541,471 fp_comp_ops_exe.sse_scalar_single # 1.2 GFLOPs (66.65%)
14 fp_comp_ops_exe.sse_scalar_double (66.65%)
0 fp_comp_ops_exe.sse_packed_double (66.67%)
0 fp_comp_ops_exe.sse_packed_single (66.70%)
0 simd_fp_256.packed_double (66.70%)
0 simd_fp_256.packed_single (66.67%)
0 duration_time
3.238372845 seconds time elapsed
v2: Add missing header file
v3: Move find_map to pmu.c
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-7-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Extract the code to get the per cpu JSON alias into a separate function
for reuse. No behavior changes.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-6-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Print the generic metric header even when the expression evaluation
failed. Otherwise an expression that fails on the first collections due
to division by zero may suddenly reappear later without an header.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The 'perf stat' shadow metric printing already supports generic metrics.
Factor out the code doing that into a separate function that can be
re-used in a later patch.
No behavior changes.
v2: Fix indentation
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some enhancements to the JSON parser to prepare for metrics support
- Parse the new MetricGroup field
- Support JSON events with no event name, that have only MetricName.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170831194036.30146-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Setting up groups can be complicated due to the complicated scheduling
restrictions of different PMUs.
User tools usually don't understand all these restrictions.
Still in many cases it is useful to set up groups and they work most of
the time. However if the group is set up wrong some members will not
report any value because they never get scheduled.
Add a concept of a 'weak group': try to set up a group, but if it's not
schedulable fallback to not using a group. That gives us the best of
both worlds: groups if they work, but still a usable fallback if they
don't.
In theory it would be possible to have more complex fallback strategies
(e.g. try to split the group in half), but the simple fallback of not
using a group seems to work for now.
So far the weak group is only implemented for perf stat, not for record.
Here's an unschedulable group (on IvyBridge with SMT on)
% perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
73,806,067 branches
4,848,144 branch-misses # 6.57% of all branches
14,754,458 l1d.replacement
24,905,558 l2_lines_in.all
<not supported> l2_rqsts.all_code_rd <------- will never report anything
With the weak group:
% perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1
125,366,055 branches (80.02%)
9,208,402 branch-misses # 7.35% of all branches (80.01%)
24,560,249 l1d.replacement (80.00%)
43,174,971 l2_lines_in.all (80.05%)
31,891,457 l2_rqsts.all_code_rd (79.92%)
The extra event scheduled with some extra multiplexing
v2: Move fallback code to separate function.
Add comment on for_each_group_member
Adjust to new perf_evsel__close interface
v3: Fix debug print out.
Committer testing:
Before:
# perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
Performance counter stats for 'system wide':
<not counted> branches
<not counted> branch-misses
<not counted> l1d.replacement
<not counted> l2_lines_in.all
<not supported> l2_rqsts.all_code_rd
1.002147212 seconds time elapsed
# perf stat -e '{branches,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}' -a sleep 1
Performance counter stats for 'system wide':
83,207,892 branches
11,065,444 l1d.replacement
28,484,024 l2_lines_in.all
12,186,179 l2_rqsts.all_code_rd
1.001739493 seconds time elapsed
After:
# perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}':W -a sleep 1
Performance counter stats for 'system wide':
543,323,909 branches (80.01%)
27,100,512 branch-misses # 4.99% of all branches (80.02%)
50,402,905 l1d.replacement (80.03%)
67,385,892 l2_lines_in.all (80.01%)
21,352,885 l2_rqsts.all_code_rd (79.94%)
1.001086658 seconds time elapsed
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20170831194036.30146-2-andi@firstfloor.org
[ Add a "'perf stat' only, for now" comment in the man page, suggested by Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add options to only show event for specific pid(s) and tid(s).
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504288152-19690-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix TUI progress bar when delta from new total from that of the
previous update is greater than the progress "step" (screen width
progress bar block)) (Jiri Olsa)
- Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not
to cripple debuginfo, just like tools/perf/ does (Jiri Olsa)
- Avoid leaking the 'perf.data' file to workloads started from the
'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa)
- Fix building when libunwind's 'unwind.h' file is present in the
include path, clashing with tools/perf/util/unwind.h (Milian Wolff)
- Check per .perfconfig section entry flag, not just per section (Taeung Song)
- Support running perf binaries with a dash in their name, needed to
run perf as an AppImage (Milian Wolff)
- Wait for the right child by using waitpid() when running workloads
from 'perf stat', also to fix using perf as an AppImage (Milian Wolff)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling updates from Ingo Molnar:
"Perf tooling updates and fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB
perf stat: Only auto-merge events that are PMU aliases
perf test: Add test case for PERF_SAMPLE_PHYS_ADDR
perf script: Support physical address
perf mem: Support physical address
perf sort: Add sort option for physical address
perf tools: Support new sample type for physical address
perf vendor events powerpc: Remove duplicate events
perf intel-pt: Fix syntax in documentation of config option
perf test powerpc: Fix 'Object code reading' test
perf trace: Support syscall name globbing
perf syscalltbl: Support glob matching on syscall names
perf report: Calculate the average cycles of iterations
|