summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2018-02-16perf tests shell lib: Use a wildcard to remove the vfs_getname probeArnaldo Carvalho de Melo
In some situations the vfs_getname is being added both as requested and with a _1 suffix (inlines?): probe:vfs_getname_1 (on getname_flags:63@acme/git/linux/fs/namei.c with pathname) This ends up making the cleanup to miss that one, as it removes just 'probe:vfs_getname', which makes the second test to use this probe point to fail, since it finds that leftover from the first test, use a wildcard to remove both. Before: # perf test 60 61 62 63 60: Use vfs_getname probe to get syscall args filenames : FAILED! 61: probe libc's inet_pton & backtrace it with ping : Ok 62: Check open filename arg using perf trace + vfs_getname: FAILED! 63: Add vfs_getname probe to get syscall args filenames : Ok After: # perf test 60 61 62 63 60: Use vfs_getname probe to get syscall args filenames : Ok 61: probe libc's inet_pton & backtrace it with ping : Ok 62: Check open filename arg using perf trace + vfs_getname: Ok 63: Add vfs_getname probe to get syscall args filenames : Ok # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-2k5kutwr4ds36adiakyb4yvy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf test: Fix test case inet_pton to accept inlines.Thomas Richter
Using Fedora 27 and latest Linux kernel the test case trace+probe_libc_inet_pton.sh fails again on s390. This time is the inlining of functions which does not match. After an update of the glibc (from 2.26-16 to 2.26-24) the output is different The expected output is: __inet_pton (/usr/lib64/libc-2.26.so) gaih_inet (inlined) .... The actual output is: 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms 0.000 probe_libc:inet_pton:(3ffb2140448)) __inet_pton (inlined) gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so) ... Fix this by being less strict on 'inlined' verses library name and accept both Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180214070303.55757-1-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf test: Fix test case 23 for s390 z/VM or KVM guestsThomas Richter
On s390 perf can be executed on a LPAR with support for hardware events (i. e. cycles) or on a z/VM or KVM guest where no hardware events are supported. In this environment use software event named cpu-clock for this test case. Use the cpuid infrastructure functions to determine the cpuid on s390 which contains an indication of the cpu counter facility availability. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180213151419.80737-4-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf cpuid: Introduce a platform specific cpuid compare functionThomas Richter
The function get_cpuid_str() is called by perf_pmu__getcpuid() and on s390 returns a complete description of the CPU and its capabilities, which is a comma separated list. To map the CPU type with the value defined in the pmu-events/arch/s390/mapfile.csv, introduce an architecture specific cpuid compare function named strcmp_cpuid_str() The currently used regex algorithm is defined as the weak default and will be used if no platform specific one is defined. This matches the current behavior. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180213151419.80737-3-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf annotate: Scan cpuid for s390 and save machine typeThomas Richter
Scan the cpuid string and extract the type number for later use. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180213151419.80737-2-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf record: Provide detailed information on s390 CPUThomas Richter
When perf record ... is setup to record data, the s390 cpu information was a fixed string "IBM/S390". Replace this string with one containing more information about the machine. The information included in the cpuid is a comma separated list: manufacturer,type,model-capacity,model[,version,authorization] with - manufacturer: up to 16 byte name of the manufacturer (IBM). - type: a four digit number refering to the machine generation. - model-capacitiy: up to 16 characters describing number of cpus etc. - model: up to 16 characters describing model. - version: the CPU-MF counter facility version number, available on LPARs only, omitted on z/VM guests. - authorization: the CPU-MF counter facility authorization level, available on LPARs only, omitted on z/VM guests. Before: [root@s8360047 perf]# ./perf record -- sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (4 samples) ] [root@s8360047 perf]# ./perf report --header | fgrep cpuid # cpuid : IBM/S390 [root@s8360047 perf]# After: [root@s35lp76 perf]# ./perf report --header|fgrep cpuid # cpuid : IBM,3906,704,M03,3.5,002f [root@s35lp76 perf]# Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180213151419.80737-1-tmricht@linux.vnet.ibm.com [ Use scnprintf instead of strncat to fix build errors on gcc GNU C99 5.4.0 20160609 -march=zEC12 -m64 -mzarch -ggdb3 -O6 -std=gnu99 -fPIC -fno-omit-frame-pointer -funwind-tables -fstack-protector-all ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf trace powerpc: Use generated syscall tableRavi Bangoria
This should speed up accessing new system calls introduced with the kernel rather than waiting for libaudit updates to include them. It also enables users to specify wildcards, for example, perf trace -e 'open*', just like was already possible on x86 and s390. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129083417.31240-4-ravi.bangoria@linux.vnet.ibm.com [ Do it for ppc32 as well ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf powerpc: Generate system call table from asm/unistd.hRavi Bangoria
This should speed up accessing new system calls introduced with the kernel rather than waiting for libaudit updates to include them. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129083417.31240-3-ravi.bangoria@linux.vnet.ibm.com [ Made it generate syscall_32.c as well to fix the build on 32-bit ppc ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.hRavi Bangoria
Will be used for generating the syscall id/string translation table. Committer notes: Update it already to catch with these csets applied since Ravi first submitted this patch: 3350eb2ea127 powerpc: sys_pkey_mprotect() system call 9499ec1b5e82 powerpc: sys_pkey_alloc() and sys_pkey_free() system calls So now 'perf trace' on ppc now knows about the pkey_ syscals. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129083417.31240-2-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf report: Fix memory corruption in --branch-history mode --branch-historyJiri Olsa
Jin Yao reported memory corrupton in perf report with branch info used for stack trace: > Following command lines will cause perf crash. > perf record -j call -g -a <application> > perf report --branch-history > > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 *** > ======= Backtrace: ========= > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] > perf[0x51b914] > perf(hist_entry_iter__add+0x1e5)[0x51f305] > perf[0x43cf01] > perf[0x4fa3bf] > perf[0x4fa923] > perf[0x4fd396] > perf[0x4f9614] > perf(perf_session__process_events+0x89e)[0x4fc38e] > perf(cmd_report+0x15d2)[0x43f202] > perf[0x4a059f] > perf(main+0x631)[0x427b71] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] > perf(_start+0x29)[0x427d89] For the cumulative output, we allocate the he_cache array based on the --max-stack option value and populate it with data from 'callchain_cursor'. The --max-stack option value does not ensure now the limit for number of callchain_cursor nodes, so the cumulative iter code will allocate smaller array than it's actually needed and cause above corruption. I think the --max-stack limit does not apply here anyway, because we add callchain data as normal hist entries, while the --max-stack control the limit of single entry callchain depth. Using the callchain_cursor.nr as he_cache array count to fix this. Also removing struct hist_entry_iter::max_stack, because there's no longer any use for it. We need more fixes to ensure that the branch stack code follows properly the logic of --max-stack, which is not the case at the moment. Original-patch-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf report: Fix wrong jump arrowJin Yao
When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │ if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │ x += x / y + y / x; 0.00 │0.4│ 1310 movsd (%rsp),%xmm0 0.00 │0.4│ 565 movsd 0x8(%rsp),%xmm4 │0.4│ movsd 0x8(%rsp),%xmm1 │0.4│ movsd (%rsp),%xmm3 │0.4│ divsd %xmm4,%xmm0 0.00 │0.4│ 579 divsd %xmm3,%xmm1 │0.4│ movsd (%rsp),%xmm2 │0.4│ addsd %xmm1,%xmm0 │0.4│ addsd %xmm2,%xmm0 0.00 │0.4│ movsd %xmm0,(%rsp) │ │ volatile double x = 1212121212, y = 121212; │ │ │ │ s_randseed = time(0); │ │ srand(s_randseed); │ │ │ │ for (i = 0; i < 2000000000; i++) { 1.37 │0.4└─→ 82: sub $0x1,%ebx 28.21 │0.48 17 ↑ jne 38 The jump arrow in above example is not correct. It should add the width of IPC and Cycle. With this patch, the result is: Percent│ IPC Cycle │ if (flag) 1.37 │0.48 1 ┌──je 82 │ │ x += x / y + y / x; 0.00 │0.48 1310 │ movsd (%rsp),%xmm0 0.00 │0.48 565 │ movsd 0x8(%rsp),%xmm4 │0.48 │ movsd 0x8(%rsp),%xmm1 │0.48 │ movsd (%rsp),%xmm3 │0.48 │ divsd %xmm4,%xmm0 0.00 │0.48 579 │ divsd %xmm3,%xmm1 │0.48 │ movsd (%rsp),%xmm2 │0.48 │ addsd %xmm1,%xmm0 │0.48 │ addsd %xmm2,%xmm0 0.00 │0.48 │ movsd %xmm0,(%rsp) │ │ volatile double x = 1212121212, y = 121212; │ │ │ │ s_randseed = time(0); │ │ srand(s_randseed); │ │ │ │ for (i = 0; i < 2000000000; i++) { 1.37 │0.48 82:└─→sub $0x1,%ebx 28.21 │0.48 17 ↑ jne 38 Committer notes: Please note that only from LBRv5 (according to Jiri) onwards, i.e. >= Skylake is that we'll have the cycles counts in each branch record entry, so to see the Cycles and IPC columns, and be able to test this patch, one need a capable hardware. While applying this I first tested it on a Broadwell class machine and couldn't get those columns, will add code to the annotate browser to warn the user about that, i.e. you have branch records, but no cycles, use a more recent hardware to get the cycles and IPC columns. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1517223473-14750-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf report: Fix description for --mem-modeAndi Kleen
The "mem-loads" event only works when PEBS is enabled, so add the "/p" ("precise") suffix to the examples. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 20180209163909.9240-1-andi@firstfloor.org Link: https://lkml.kernel.org/n/tip-v0gcd4u9tktrvjjsp6y7ouv4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf inject: Emit instruction records on ETM trace discontinuityRobert Walker
There may be discontinuities in the ETM trace stream due to overflows or ETM configuration for selective trace. This patch emits an instruction sample with the pending branch stack when a TRACE ON packet occurs indicating a discontinuity in the trace data. A new packet type CS_ETM_TRACE_ON is added, which is emitted by the low level decoder when a TRACE ON occurs. The higher level decoder flushes the branch stack when this packet is emitted. Signed-off-by: Robert Walker <robert.walker@arm.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1518607481-4059-3-git-send-email-robert.walker@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf cs-etm: Inject capabilitity for CoreSight tracesRobert Walker
Added user space perf functionality to translate CoreSight traces into instruction events with branch stack. To invoke the new functionality, use the perf inject tool with --itrace=il. For example, to translate the ETM trace from perf.data into last branch records in a new inj.data file: $ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new The 'i' parameter to itrace generates periodic instruction events. The period between instruction events can be specified as a number of instructions suffixed by i (default 100000). The parameter to 'l' specifies the number of entries in the branch stack attached to instruction events. The 'b' parameter to itrace generates events on taken branches. This patch also fixes the contents of the branch events used in perf report - previously branch events were generated for each contiguous range of instructions executed. These are fixed to generate branch events between the last address of a range ending in an executed branch instruction and the start address of the next range. Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes and support for specifying the instruction period. Originally-by: Sebastian Pop <s.pop@samsung.com> Signed-off-by: Robert Walker <robert.walker@arm.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf mem: Document a missing optionSangwon Hong
Add the missing --force option on the man page. Signed-off-by: Sangwon Hong <qpakzk@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/1518381517-30766-2-git-send-email-qpakzk@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf kmem: Document a missing option & an argumentSangwon Hong
First, 'perf kmem' has a '--force' option, but didn't document it on the man page. So add it. Second, the '--time' option has to get a value, but isn't documented on the man page. Describe it. Signed-off-by: Sangwon Hong <qpakzk@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/1518381517-30766-1-git-send-email-qpakzk@gmail.com [ Add blank like after --force block, as requested by Namhyung ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf annotate: Add missing arguments in Man pageJaecheol Shin
Some options must require an argument. But input, stdio-color, cpu have no them. So I added it. Signed-off-by: Jaecheol Shin <jcgod413@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/20180207095205.62715-1-jcgod413@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf cs-etm: Properly deal with cpu mapsMathieu Poirier
This patch allows the CoreSight AUX info section to fit topologies where only a subset of all available CPUs are present, avoiding at the same time accessing the ETM configuration areas of CPUs that have been offlined. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1518478737-24649-1-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf auxtrace arm: Fixing uninitialised variableMathieu Poirier
When working natively on arm64 the compiler gets pesky and complains that variable 'i' is uninitialised, something that breaks the compilation. Here no further checks are needed since variable 'found_spe' can only be true if variable 'i' has been initialised as part of the for loop. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1518467557-18505-4-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf tools: Use target->per_thread and target->system_wide flagsJin Yao
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc") Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> [Fixed checkpatch warning about line over 80 characters] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf cs-etm: Freeing allocated memoryMathieu Poirier
This patch frees all the memory allocated in function cs_etm__alloc_queue(). Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1518467557-18505-2-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf tests: Use arch__compare_symbol_names to compare symbolsJiri Olsa
The symbol search called by machine__find_kernel_symbol_by_name is using internally arch__compare_symbol_names function to compare 2 symbol names, because different archs have different ways of comparing symbols. Mostly for skipping '.' prefixes and similar. In test 1 when we try to find matching symbols in kallsyms and vmlinux, by address and by symbol name. When either is found we compare the pair symbol names by simple strcmp, which is not good enough for reasons explained in previous paragraph. On powerpc this can cause lockup, because even thought we found the pair, the compared names are different and don't match simple strcmp. Following code path is executed, that leads to lockup: - we find the pair in kallsyms by sym->start next_pair: - we compare the names and it fails - we find the pair by sym->name - the pair addresses match so we call goto next_pair because we assume the names match in this case Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 031b84c407c3 ("perf probe ppc: Enable matching against dot symbols automatically") Link: http://lkml.kernel.org/r/20180215122635.24029-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf tools: Do not create kernel maps in sample__resolve()Jiri Olsa
There's no need for kernel maps to be allocated at this point - sample processing. We search for kernel maps using the kernel map_groups in machine::kmaps which is static. If vmlinux maps for any reason still don't exist, the search correctly fails because they are not in the map group. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf machine: Remove machine__load_kallsyms()Jiri Olsa
The current machine__load_kallsyms() function has no caller, so replace it directly with __machine__load_kallsyms(). Also remove the no_kcore argument as it was always called with a 'true' value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf machine: Don't search for active kernel start in ↵Jiri Olsa
__machine__create_kernel_maps We should not search for the kernel start address in __machine__create_kernel_maps(), because it's being used in the 'report' code path, where we are interested in kernel MMAP data address (the one recorded via 'perf record', possibly on another machine, or an older or newer kernel on the same machine where analysis is being performed) instead of in current kernel address. The __machine__create_kernel_maps() function serves purely for creating the machines kernel maps and setting up the kmap group. The report code path then sets the address based on the data from kernel MMAP event in the machine__set_kernel_mmap() function. The kallsyms search address logic is used for test code, that calls machine__create_kernel_maps() to get current maps and calls machine__get_running_kernel_start() to get kernel starting address. Use machine__set_kernel_mmap() to set the kernel maps start address and moving map_groups__fixup_end to be call when all maps are in place. Also make __machine__create_kernel_maps static, because there's no external user. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf machine: Generalize machine__set_kernel_mmap()Jiri Olsa
So it could be called without event object, just with start and end values. It will be used in following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf machine: Move kernel mmap name into struct machineJiri Olsa
It simplifies and centralizes the code. The kernel mmap name is set for machine type, which we know from the beginning, so there's no reason to generate it every time we need it. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf machine: Free root_dir in machine__init() error pathJiri Olsa
Free root_dir in machine__init() error path. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf symbols: Check if we read regular file in dso__load()Jiri Olsa
The current code in dso__load() calls is_regular_file(), but it checks its return value only after calling symsrc__init(). That can make symsrc__init() block in elf_* functions on reading the file if the file happens to be device and not regular one. Call symsrc__init() only for regular files. Also remove the symsrc__destroy() cleanup, which is not needed now, because we call symsrc__init() only for regular files. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180215122635.24029-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf stat: Add support to print counts after a period of timeyuzhoujian
Introduce a new option to print counts after N milliseconds and update 'perf stat' documentation accordingly. Show below is the output of the new option for perf stat. $ perf stat --time 2000 -e cycles -a Performance counter stats for 'system wide': 157,260,423 cycles 2.003060766 seconds time elapsed We can print the count deltas after N milliseconds with this new introduced option. This option is not supported with "-I" option. In addition, according to Kangliang's patch(19afd10410957), the monitoring overhead for system-wide core event could be very high if the interval-print parameter was below 100ms, and the limitation value is 10ms. So the same warning will be displayed when the time is set between 10ms to 100ms, and the minimal time is limited to 10ms. Users can make a decision according to their spcific cases. Committer notes: This actually stops the workload after the specified time, then prints the counts. So I renamed the option to --timeout and updated the documentation to state that it will not just print the counts after the specified time, but will really stop the 'perf stat' session and print the counts. The rename from 'time' to 'timeout' also fixes the build in systems where 'time' is used by glibc and can't be used as a name of a variable, such as centos:5 and centos:6. Changes since v3: - none. Changes since v2: - modify the time check in __run_perf_stat func to keep some consistency with the workload case. - add the warning when the time is set between 10ms to 100ms. - add the pr_err when the time is set below 10ms. Changes since v1: - none. Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf stat: Add support to print counts for fixed timesyuzhoujian
Introduce a new option to print counts for fixed number of times and update 'perf stat' documentation accordingly. Show below is the output of the new option for perf stat. $ perf stat -I 1000 --interval-count 2 -e cycles -a # time counts unit events 1.002827089 93,884,870 cycles 2.004231506 56,573,446 cycles We can just print the counts for several times with this newly introduced option. The usage of it is a little like 'vmstat', and it should be used together with "-I" option. $ vmstat -n 1 2 procs ---------memory-------------- --swap- ----io-- -system-- ------cpu--- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 0 78270544 547484 51732076 0 0 0 20 1 1 1 0 99 0 0 0 0 0 78270512 547484 51732080 0 0 0 16 477 1555 0 0 100 0 0 Changes since v3: - merge interval_count check and times check to one line. - fix the wrong indent in stat.h - use stat_config.times instead of 'times' in cmd_stat function. Changes since v2: - none. Changes since v1: - change the name of the new option "times-print" to "interval-count". - keep the new option interval specifically. Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf report: Add support to display group output for non group eventsJiri Olsa
Add support to display group output for if non grouped events are detected and user forces --group option. Now for non-group events recorded like: $ perf record -e 'cycles,instructions' ls you can still get group output by using --group option in report: $ perf report --group --stdio ... # Overhead Command Shared Object Symbol # ................ ....... ................ ...................... # 17.67% 0.00% ls libc-2.25.so [.] _IO_do_write@@GLIB 15.59% 25.94% ls ls [.] calculate_columns 15.41% 31.35% ls libc-2.25.so [.] __strcoll_l ... Committer note: We should improve on this by making sure that the first line states that this is not a group, but since the user doesn't have to force group view when really using grouped events (e.g. '{cycles,instructions}'), the user better know what is being done... Requested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180209092734.GB20449@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf report: Ask for ordered events for --tasks optionJiri Olsa
If we have the time in, keep the events in time order. Committer notes: Trying to be more verbose, what actual effect this will have in this particular case? Before and after this patch shows the artifacts: --- /tmp/before 2018-02-06 15:40:29.536411625 -0300 +++ /tmp/after 2018-02-06 15:40:51.963403599 -0300 @@ -5,34 +5,34 @@ 2540 2540 1818 | gnome-terminal- 3489 3489 2540 | bash 32433 32433 3489 | perf - 32434 32434 32433 | perf + 32434 32434 32433 | make 32441 32441 32434 | make 32514 32514 32441 | make 511 511 32514 | sh - 512 512 511 | sh + 512 512 511 | install <SNIP> We don't have 'perf' calling 'perf' calling 'make', etc, the second 'perf' actually is 'make', i.e. there was reordering of the relevant PERF_RECORD_COMM and PERF_RECORD_FORK records. Ditto for sh/install later on. Look for FORK and COMM meta events, for those tids: # perf report -D | egrep 'PERF_RECORD_(FORK|COMM)' | egrep '3243[34]' 0 14774650990679 0x1a3cd8 [0x38]: PERF_RECORD_FORK(32433:32433):(3489:3489) 1 14774652080381 0x1d6568 [0x30]: PERF_RECORD_COMM exec: perf:32433/32433 1 14774742473340 0x1dbb48 [0x38]: PERF_RECORD_FORK(32434:32434):(32433:32433) 0 14774752005779 0x1a4af8 [0x30]: PERF_RECORD_COMM exec: make:32434/32434 0 14774753997960 0x1a5578 [0x38]: PERF_RECORD_FORK(32435:32435):(32434:32434) 0 14774756070782 0x1a5618 [0x38]: PERF_RECORD_FORK(32438:32438):(32434:32434) 0 14774757772939 0x1a5680 [0x38]: PERF_RECORD_FORK(32440:32440):(32434:32434) 0 14774758230600 0x1a56e8 [0x38]: PERF_RECORD_FORK(32441:32441):(32434:32434) # First column is the cpu, second is the timestamp. So they are on different CPUs, thus ring buffers, and when we don't use the ordered_events class, we end up mixing that up, use it to take advantage of the PERF_RECORD_FINISHED_ROUND meta events to go on ordering the events using the PERF_SAMPLE_TIME present in the PERF_RECORD_{FORK,COMM,EXIT,SAMPLE,etc} records in the ring buffer. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180206181813.10943-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf tools: Fix comment for sort__* compare functionsJiri Olsa
In commit 2f15bd8c6c6e ("perf tools: Fix "Command" sort_entry's cmp and collapse function") we switched from pointer to string comparison. But failed to remove related comments. Removing them and adding another one to warn before pointer comparison in here. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180206181813.10943-18-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf tests: Fix dwarf unwind for stripped binariesJiri Olsa
When we strip the perf binary, dwarf unwind test stop to work. The reason is that strip will remove static function symbols, which we need to check for unwind. This change will keep this test working in cases where the global symbols are put into dynamic symbol table, which is the case on x86. It still won't work on powerpc. Making those 5 local functions global, and adding 'test_dwarf_unwind__' to their names. Committer testing: Before: # perf test dwarf 58: DWARF unwind : Ok # strip ~/bin/perf # perf test dwarf 58: DWARF unwind : FAILED! # perf test -v dwarf 58: DWARF unwind : --- start --- test child forked, pid 6590 unwind: thread map already set, dso=/home/acme/bin/perf <SNIP> unwind: access_mem addr 0x7ffce6c48098 val 48563f, offset 1144 unwind: test__dwarf_unwind:ip = 0x4a54e5 (0xa54e5) got: test__dwarf_unwind 0xa54e5, expecting test__dwarf_unwind unwind: '':ip = 0x4a50bb (0xa50bb) failed: got unresolved address 0xa50bb unwind failed test child finished with -1 ---- end ---- DWARF unwind: FAILED! # After: # perf test dwarf 58: DWARF unwind : Ok # strip ~/bin/perf # perf test dwarf 58: DWARF unwind : Ok # # perf test -v dwarf 58: DWARF unwind : --- start --- test child forked, pid 7219 unwind: thread map already set, dso=/home/acme/bin/perf <SNIP> unwind: access_mem addr 0x7fff007da2c8 val 48575f, offset 1144 unwind: test__arch_unwind_sample:ip = 0x589044 (0x189044) got: test__arch_unwind_sample 0x189044, expecting test__arch_unwind_sample unwind: test_dwarf_unwind__thread:ip = 0x4a52f7 (0xa52f7) got: test_dwarf_unwind__thread 0xa52f7, expecting test_dwarf_unwind__thread unwind: test_dwarf_unwind__compare:ip = 0x4a5468 (0xa5468) got: test_dwarf_unwind__compare 0xa5468, expecting test_dwarf_unwind__compare unwind: bsearch:ip = 0x7f6608ae94d8 (0x394d8) got: bsearch 0x394d8, expecting bsearch unwind: test_dwarf_unwind__krava_3:ip = 0x4a54d1 (0xa54d1) got: test_dwarf_unwind__krava_3 0xa54d1, expecting test_dwarf_unwind__krava_3 unwind: test_dwarf_unwind__krava_2:ip = 0x4a550b (0xa550b) got: test_dwarf_unwind__krava_2 0xa550b, expecting test_dwarf_unwind__krava_2 unwind: test_dwarf_unwind__krava_1:ip = 0x4a554b (0xa554b) got: test_dwarf_unwind__krava_1 0xa554b, expecting test_dwarf_unwind__krava_1 unwind: test__dwarf_unwind:ip = 0x4a5605 (0xa5605) got: test__dwarf_unwind 0xa5605, expecting test__dwarf_unwind test child finished with 0 ---- end ---- DWARF unwind: Ok # Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180206181813.10943-17-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUNDJiri Olsa
Adding --show-round-event to display PERF_RECORD_FINISHED_ROUND events like: # perf script --show-round-events 2>/dev/null yes 8591 [002] 124177.397597: 18 cpu/mem-stores/P: ff... yes 8591 [002] 124177.397615: 1 cpu/mem-loads,ldlat=30/P: ff... PERF_RECORD_FINISHED_ROUND perf 10380 [001] 124177.397622: 6 cpu/mem-loads,ldlat=30/P: ff... PERF_RECORD_FINISHED_ROUND swapper 0 [000] 124177.400518: 88 cpu/mem-stores/P: ff... swapper 0 [000] 124177.400521: 88 cpu/mem-stores/P: ff... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180206181813.10943-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf record: Put new line after target override warningJiri Olsa
There's no new-line after target-override warning, now: $ perf record -a --per-thread Warning: SYSTEM/CPU switch overriding PER-THREAD^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.705 MB perf.data (2939 samples) ] with patch: $ perf record -a --per-thread Warning: SYSTEM/CPU switch overriding PER-THREAD ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.705 MB perf.data (2939 samples) ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 16ad2ffb822c ("perf tools: Introduce perf_target__strerror()") Link: http://lkml.kernel.org/r/20180206181813.10943-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15Revert "tools include s390: Grab a copy of arch/s390/include/uapi/asm/unistd.h"Hendrik Brueckner
This reverts commit f120c7b187e6c418238710b48723ce141f467543 which is no longer required with the introduction of a syscall.tbl on s390. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: linux-s390@vger.kernel.org LPU-Reference: 1518090470-2899-2-git-send-email-brueckner@linux.vnet.ibm.com Link: https://lkml.kernel.org/n/tip-q1lg0nvhha1tk39ri9aqalcb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf s390: Rework system call table creation by using syscall.tblHendrik Brueckner
Recently, s390 uses a syscall.tbl input file to generate its system call table and unistd uapi header files. Hence, update mksyscalltbl to use it as input to create the system table for perf. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: linux-s390@vger.kernel.org LPU-Reference: 1518090470-2899-4-git-send-email-brueckner@linux.vnet.ibm.com Link: https://lkml.kernel.org/n/tip-bdyhllhsq1zgxv2qx4m377y6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf s390: Grab a copy of arch/s390/kernel/syscall/syscall.tblHendrik Brueckner
Grab a copy of the s390 system call table file introduced with commit 857f46bfb07f53dc112d69bdfb137cc5ec3da7c5 "s390/syscalls: add system call table". Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: linux-s390@vger.kernel.org LPU-Reference: 1518090470-2899-3-git-send-email-brueckner@linux.vnet.ibm.com Link: https://lkml.kernel.org/n/tip-hpw7vdjp7g92ivgpddrp5ydq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf test: Fix test trace+probe_libc_inet_pton.sh for s390xThomas Richter
On Intel test case trace+probe_libc_inet_pton.sh succeeds and the output is: [root@f27 perf]# ./perf trace --no-syscalls -e probe_libc:inet_pton/max-stack=3/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.037 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 ms 0.000 probe_libc:inet_pton:(7fa40ac618a0)) __GI___inet_pton (/usr/lib64/libc-2.26.so) getaddrinfo (/usr/lib64/libc-2.26.so) main (/usr/bin/ping) The kernel stack unwinder is used, it is specified implicitly as call-graph=fp (frame pointer). On s390x only dwarf is available for stack unwinding. It is also done in user space. This requires different parameter setup and result checking for s390x and Intel. This patch adds separate perf trace setup and result checking for Intel and s390x. On s390x specify this command line to get a call-graph and handle the different call graph result checking: [root@s35lp76 perf]# ./perf trace --no-syscalls -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.041 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.041/0.041/0.041/0.000 ms 0.000 probe_libc:inet_pton:(3ffb9942060)) __GI___inet_pton (/usr/lib64/libc-2.26.so) gaih_inet (inlined) __GI_getaddrinfo (inlined) main (/usr/bin/ping) __libc_start_main (/usr/lib64/libc-2.26.so) _start (/usr/bin/ping) [root@s35lp76 perf]# Before: [root@s8360047 perf]# ./perf test -vv 58 58: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 26349 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.079 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms 0.000 probe_libc:inet_pton:(3ff925c2060)) test child finished with -1 ---- end ---- probe libc's inet_pton & backtrace it with ping: FAILED! [root@s8360047 perf]# After: [root@s35lp76 perf]# ./perf test -vv 57 57: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 38708 PING ::1(::1) 56 data bytes 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.038 ms --- ::1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.038/0.038/0.038/0.000 ms 0.000 probe_libc:inet_pton:(3ff87342060)) __GI___inet_pton (/usr/lib64/libc-2.26.so) gaih_inet (inlined) __GI_getaddrinfo (inlined) main (/usr/bin/ping) __libc_start_main (/usr/lib64/libc-2.26.so) _start (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok [root@s35lp76 perf]# On Intel the test case runs unchanged and succeeds. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/20180117083831.101001-1-tmricht@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf data: Document missing --force optionSangwon Hong
Add the --force option to the man page. Signed-off-by: Sangwon Hong <qpakzk@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Link: http://lkml.kernel.org/r/1517831315-31490-1-git-send-email-qpakzk@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf tools: Substitute yet another strtoull()Andy Shevchenko
Instead of home grown function let's use what library provides us. Signed-off-by: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20180129130359.1490-1-andriy.shevchenko@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf top: Check the latency of perf_top__mmap_read()Kan Liang
The latency of perf_top__mmap_read() should be lower than refresh time. If not, give some hints to reduce the latency. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1516310792-208685-18-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf top: Switch default mode to overwrite modeKan Liang
perf_top__mmap_read() has a severe performance issue in the Knights Landing/Mill platform, when monitoring heavy load systems. It costs several minutes to finish, which is unacceptable. Currently, 'perf top' uses the non overwrite mode. For non overwrite mode, it tries to read everything in the ringbuffer and doesn't pause it. Once there are lots of samples delivered persistently, the processing time could be very long. Also, the latest samples could be lost when the ringbuffer is full. For overwrite mode, it takes a snapshot for the system by pausing the ringbuffer, which could significantly reduce the processing time. Also, the overwrite mode always keep the latest samples. Considering the real time requirement for 'perf top', the overwrite mode is more suitable for it. Actually, 'perf top' was overwrite mode. It is changed to non overwrite mode since commit 93fc64f14472 ("perf top: Switch to non overwrite mode"). It's better to change it back to overwrite mode by default. For the kernel which doesn't support overwrite mode, it will fall back to non overwrite mode. There would be some records lost in overwrite mode because of pausing the ringbuffer. It has little impact for the accuracy of the snapshot and can be tolerated. For overwrite mode, unconditionally wait 100 ms before each snapshot. It also reduces the overhead caused by pausing ringbuffer, especially on light load system. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1516310792-208685-17-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf top: Remove lost events checkingKan Liang
There would be some records lost in overwrite mode because of pausing the ringbuffer. It has little impact for the accuracy of the snapshot and could be tolerated by 'perf top'. Remove the lost events checking. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1516310792-208685-16-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf hists browser: Add parameter to disable lost event warningKan Liang
For overwrite mode, the ringbuffer will be paused. The event lost is expected. It needs a way to notify the browser not print the warning. It will be used later for perf top to disable lost event warning in overwrite mode. There is no behavior change for now. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1516310792-208685-15-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf top: Add overwrite fall backKan Liang
Switch to non-overwrite mode if kernel doesnot support overwrite ringbuffer. It's only effect when overwrite mode is supported. No change to current behavior. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1516310792-208685-14-git-send-email-kan.liang@intel.com [ Use perf_missing_features.write_backward instead of the non merged is_write_backward_fail() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf evsel: Expose the perf_missing_features structArnaldo Carvalho de Melo
As tools may need to adjust to missing features, as 'perf top' will, in the next csets, to cope with a missing 'write_backward' feature. Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-jelngl9q1ooaizvkcput9tic@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-15perf top: Check per-event overwrite termKan Liang
Per-event overwrite term is not forbidden in 'perf top', which can bring problems. Because 'perf top' only support non-overwrite mode now. Add new rules and check regarding to overwrite term for 'perf top'. - All events either have same per-event term or don't have per-event mode setting. Otherwise, it will error out. - Per-event overwrite term should be consistent as opts->overwrite. If not, updating the opts->overwrite according to per-event term. Make it possible to support either non-overwrite or overwrite mode. The overwrite mode is forbidden now, which will be removed when the overwrite mode is supported later. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1516310792-208685-12-git-send-email-kan.liang@intel.com [ Renamed perf_top_overwrite_check to perf_top__overwrite_check, to follow existing convention ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>