Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- tools/perf/Documentation/Build.txt | 15
-rw-r--r-- tools/perf/Documentation/android.txt | 80
-rw-r--r-- tools/perf/Documentation/intel-acr.txt | 53
-rw-r--r-- tools/perf/Documentation/perf-annotate.txt | 1
-rw-r--r-- tools/perf/Documentation/perf-arm-spe.txt | 14
-rw-r--r-- tools/perf/Documentation/perf-bench.txt | 58
-rw-r--r-- tools/perf/Documentation/perf-check.txt | 1
-rw-r--r-- tools/perf/Documentation/perf-diff.txt | 2
-rw-r--r-- tools/perf/Documentation/perf-list.txt | 3
-rw-r--r-- tools/perf/Documentation/perf-trace.txt | 4
-rw-r--r-- tools/perf/Documentation/perf.data-file-format.txt | 10
11 files changed, 159 insertions, 82 deletions
diff --git a/tools/perf/Documentation/Build.txt b/tools/perf/Documentation/Build.txt
index 83dc87c662b6..57b226e7fc2f 100644
--- a/tools/perf/Documentation/Build.txt
+++ b/tools/perf/Documentation/Build.txt
@@ -99,3 +99,18 @@ configuration paths for cross building:
In this case, the variable PKG_CONFIG_SYSROOT_DIR can be used alongside the
variable PKG_CONFIG_LIBDIR or PKG_CONFIG_PATH to prepend the sysroot path to
the library paths for cross compilation.
+
+5) Build with Clang
+===================
+By default, the makefile uses GCC as the compiler. Setting the environment
+variables HOSTCC, CC and CXX allows perf to be built with Clang.
+
+Using Clang for a native build:
+
+ $ HOSTCC=clang CC=clang CXX=clang++ make -C tools/perf
+
+Specifying ARCH and CROSS_COMPILE for cross compilation:
+
+ $ HOSTCC=clang CC=clang CXX=clang++ \
+ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
+ make -C tools/perf
diff --git a/tools/perf/Documentation/android.txt b/tools/perf/Documentation/android.txt
index 24a59998fc91..3f3cc7ac3d13 100644
--- a/tools/perf/Documentation/android.txt
+++ b/tools/perf/Documentation/android.txt
@@ -1,78 +1,10 @@
How to compile perf for Android
-=========================================
+===============================
-I. Set the Android NDK environment
-------------------------------------------------
+There are two ways to build perf and run it on Android:
-(a). Use the Android NDK
-------------------------------------------------
-1. You need to download and install the Android Native Development Kit (NDK).
-Set the NDK variable to point to the path where you installed the NDK:
- export NDK=/path/to/android-ndk
+- Method 1: Build perf with static linking. See Build.txt, section
+ "4) Cross compilation" for how to build a static perf binary.
-2. Set cross-compiling environment variables for NDK toolchain and sysroot.
-For arm:
- export NDK_TOOLCHAIN=${NDK}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-
- export NDK_SYSROOT=${NDK}/platforms/android-24/arch-arm
-For x86:
- export NDK_TOOLCHAIN=${NDK}/toolchains/x86-4.9/prebuilt/linux-x86_64/bin/i686-linux-android-
- export NDK_SYSROOT=${NDK}/platforms/android-24/arch-x86
-
-This method is only tested for Android NDK versions Revision 11b and later.
-perf uses some bionic enhancements that are not included in prior NDK versions.
-You can use method (b) described below instead.
-
-(b). Use the Android source tree
------------------------------------------------
-1. Download the master branch of the Android source tree.
-Set the environment for the target you want using:
- source build/envsetup.sh
- lunch
-
-2. Build your own NDK sysroot to contain latest bionic changes and set the
-NDK sysroot environment variable.
- cd ${ANDROID_BUILD_TOP}/ndk
-For arm:
- ./build/tools/build-ndk-sysroot.sh --abi=arm
- export NDK_SYSROOT=${ANDROID_BUILD_TOP}/ndk/build/platforms/android-3/arch-arm
-For x86:
- ./build/tools/build-ndk-sysroot.sh --abi=x86
- export NDK_SYSROOT=${ANDROID_BUILD_TOP}/ndk/build/platforms/android-3/arch-x86
-
-3. Set the NDK toolchain environment variable.
-For arm:
- export NDK_TOOLCHAIN=${ANDROID_TOOLCHAIN}/arm-linux-androideabi-
-For x86:
- export NDK_TOOLCHAIN=${ANDROID_TOOLCHAIN}/i686-linux-android-
-
-II. Compile perf for Android
-------------------------------------------------
-You need to run make with the NDK toolchain and sysroot defined above:
-For arm:
- make WERROR=0 ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}"
-For x86:
- make WERROR=0 ARCH=x86 CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}"
-
-III. Install perf
------------------------------------------------
-You need to connect to your Android device/emulator using adb.
-Install perf using:
- adb push perf /data/perf
-
-If you also want to use perf-archive you need busybox tools for Android.
-For installing perf-archive, you first need to replace #!/bin/bash with #!/system/bin/sh:
- sed 's/#!\/bin\/bash/#!\/system\/bin\/sh/g' perf-archive >> /tmp/perf-archive
- chmod +x /tmp/perf-archive
- adb push /tmp/perf-archive /data/perf-archive
-
-IV. Environment settings for running perf
-------------------------------------------------
-Some perf features need environment variables to run properly.
-You need to set these before running perf on the target:
- adb shell
- # PERF_PAGER=cat
-
-IV. Run perf
-------------------------------------------------
-Run perf on your device/emulator to which you previously connected using adb:
- # ./data/perf
+- Method 2: Download the Android NDK and use the bundled Clang to
+ build perf. See Build.txt, section "5) Build with Clang" for details.
diff --git a/tools/perf/Documentation/intel-acr.txt b/tools/perf/Documentation/intel-acr.txt
new file mode 100644
index 000000000000..72654fdd9a52
--- /dev/null
+++ b/tools/perf/Documentation/intel-acr.txt
@@ -0,0 +1,53 @@
+Intel Auto Counter Reload Support
+---------------------------------
+Support for Intel Auto Counter Reload in perf tools
+
+Auto counter reload provides a means for software to specify to hardware
+that certain counters, if supported, should be automatically reloaded
+upon overflow of chosen counters. By taking a sample only if the rate of
+one event exceeds some threshold relative to the rate of another event,
+this feature enables software to sample based on the relative rate of
+two or more events. To enable this, the user must provide a sample period
+term and a bitmask ("acr_mask") for each relevant event specifying the
+counters in an event group to reload if the event's specified sample
+period is exceeded.
+
+For example, if the user desires to measure a scenario when IPC > 2,
+the event group might look like the one below:
+
+ perf record -e {cpu_atom/instructions,period=200000,acr_mask=0x2/, \
+ cpu_atom/cycles,period=100000,acr_mask=0x3/} -- true
+
+In this case, if the "instructions" counter exceeds the sample period of
+200000, the second counter, "cycles", will be reset and a sample will be
+taken. If "cycles" is exceeded first, both counters in the group will be
+reset. In this way, samples will only be taken for cases where IPC > 2.
+
+The acr_mask term is a hexadecimal value representing a bitmask of the
+events in the group to be reset when the period is exceeded. In the
+example above, "instructions" is assigned an acr_mask of 0x2, meaning
+only the second event in the group is reloaded and a sample is taken
+for the first event. "cycles" is assigned an acr_mask of 0x3, meaning
+that both event counters will be reset if the sample period is exceeded
+first.
+
+ratio-to-prev Event Term
+------------------------
+To simplify this, an event term "ratio-to-prev" is provided which is used
+alongside the sample period term n or the -c/--count option. This would
+allow users to specify the desired relative rate between events as a
+ratio. Note: Both events compared must belong to the same PMU.
+
+The command above would then become
+
+ perf record -e {cpu_atom/instructions/, \
+ cpu_atom/cycles,period=100000,ratio-to-prev=0.5/} -- true
+
+ratio-to-prev is the ratio of the event using the term to the previous
+event in the group. The previous event always counts as 1, so 0.5 here
+gives a 1:0.5, or 2:1, ratio of instructions to cycles.
+
+To sample for IPC < 2 for example, the events need to be reordered:
+
+ perf record -e {cpu_atom/cycles/, \
+ cpu_atom/instructions,period=200000,ratio-to-prev=2.0/} -- true
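The acr_mask and ratio-to-prev examples in intel-acr.txt can be checked with a little arithmetic. The following is only a sketch: the helper names are hypothetical, and the relation "previous period = period / ratio-to-prev" is inferred from the two equivalent commands in the text, not taken from the perf sources.

```python
def reloaded_events(acr_mask, group):
    """Return the events in `group` whose bit is set in `acr_mask`.

    Bit 0 maps to the first event in the group, bit 1 to the second,
    matching the description of 0x2 and 0x3 in the text.
    """
    return [name for i, name in enumerate(group) if acr_mask & (1 << i)]


def previous_event_period(period, ratio_to_prev):
    """Sample period implied for the previous event in the group.

    Assumption: ratio-to-prev equals period / previous period.
    """
    return round(period / ratio_to_prev)


group = ["instructions", "cycles"]
print(reloaded_events(0x2, group))          # mask given to "instructions"
print(reloaded_events(0x3, group))          # mask given to "cycles"
print(previous_event_period(100000, 0.5))   # IPC > 2 example
print(previous_event_period(200000, 2.0))   # IPC < 2 example
```

Under that assumption, the IPC > 2 command with ratio-to-prev=0.5 implies the same 200000 instruction period that the explicit acr_mask command spells out.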
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index 46090c5b42b4..547f1a268018 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -170,7 +170,6 @@ include::itrace.txt[]
--code-with-type::
Show data type info in code annotation (for memory instructions only).
- Currently it only works with --stdio option.
SEE ALSO
diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index 37afade4f1b2..cda8dd47fc4d 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -191,14 +191,20 @@ groups:
36 branch
0 remote-access
900 memory
+ 1800 instructions
The arm_spe// and dummy:u events are implementation details and are expected to be empty.
-To get a full list of unique samples that are not sorted into groups, set the itrace option to
-generate 'instruction' samples. The period option is also taken into account, so set it to 1
-instruction unless you want to further downsample the already sampled SPE data:
+The instructions group contains the full list of unique samples that are not
+sorted into other groups. To generate only this group use --itrace=i1i.
- perf report --itrace=i1i
+1i (1 instruction interval) signifies no further downsampling. Rather than an
+instruction interval, this generates a sample every n SPE samples. For example,
+to generate the default set of events for every 100 SPE samples:
+
+ perf report --itrace=bxofmtMai100i
+
+Other period types, for example nanoseconds (ns), are not currently supported.
Memory access details are also stored on the samples and this can be viewed with:
diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
index 8331bd28b10e..1160224cb718 100644
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@@ -177,11 +177,21 @@ Suite for evaluating performance of simple memory copy in various ways.
Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
--l::
+-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).
+-p::
+--page::
+Specify page-size for mapping memory buffers (default: 4KB).
+Available values are 4KB, 2MB, 1GB (case insensitive).
+
+-k::
+--chunk::
+Specify the chunk-size for each invocation. (default: 0, or full-extent)
+Available units are B, KB, MB, GB and TB (case insensitive).
+
-f::
--function::
Specify function to copy (default: default).
@@ -201,11 +211,21 @@ Suite for evaluating performance of simple memory set in various ways.
Options of *memset*
^^^^^^^^^^^^^^^^^^^
--l::
+-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).
+-p::
+--page::
+Specify page-size for mapping memory buffers (default: 4KB).
+Available values are 4KB, 2MB, 1GB (case insensitive).
+
+-k::
+--chunk::
+Specify the chunk-size for each invocation. (default: 0, or full-extent)
+Available units are B, KB, MB, GB and TB (case insensitive).
+
-f::
--function::
Specify function to set (default: default).
@@ -220,6 +240,40 @@ Repeat memset invocation this number of times.
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
+*mmap*::
+Suite for evaluating memory subsystem performance for mmap()'d memory.
+
+Options of *mmap*
+^^^^^^^^^^^^^^^^^
+-s::
+--size::
+Specify size of memory to set (default: 1MB).
+Available units are B, KB, MB, GB and TB (case insensitive).
+
+-p::
+--page::
+Specify page-size for mapping memory buffers (default: 4KB).
+Available values are 4KB, 2MB, 1GB (case insensitive).
+
+-r::
+--randomize::
+Specify seed to randomize page access offset (default: 0, or not randomized).
+
+-f::
+--function::
+Specify function to set (default: all).
+Available functions are 'demand' and 'populate', with the first
+demand faulting pages in the region and the second using an eager
+mapping.
+
+-l::
+--nr_loops::
+Repeat mmap() invocation this number of times.
+
+-c::
+--cycles::
+Use perf's cpu-cycles event instead of gettimeofday syscall.
+
SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
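To make the relation between --size and --chunk concrete, here is a rough sketch of how a size string could be interpreted and how many invocations one pass over the buffer would need. It is not the perf implementation: the function names are hypothetical, units are assumed to be powers of 1024, and a trailing partial chunk is assumed to count as one more invocation.

```python
import re

# Assumption: units are powers of 1024 (KB = 1024 bytes, and so on).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Parse strings such as '1MB' or '4kb' (case insensitive) into bytes."""
    m = re.fullmatch(r"(\d+)\s*([KMGT]?B)", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def invocations_per_pass(size, chunk):
    """Number of calls needed to cover `size` bytes in `chunk`-byte pieces.

    A chunk of 0 means full extent: the whole buffer in one call.
    """
    if chunk == 0:
        return 1
    return -(-size // chunk)  # ceiling division


size = parse_size("1MB")
print(size)
print(invocations_per_pass(size, 0))
print(invocations_per_pass(size, parse_size("256KB")))
```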
diff --git a/tools/perf/Documentation/perf-check.txt b/tools/perf/Documentation/perf-check.txt
index ee92042082f7..4c9ccda6ce91 100644
--- a/tools/perf/Documentation/perf-check.txt
+++ b/tools/perf/Documentation/perf-check.txt
@@ -56,6 +56,7 @@ feature::
libcapstone / HAVE_LIBCAPSTONE_SUPPORT
libdw-dwarf-unwind / HAVE_LIBDW_SUPPORT
libelf / HAVE_LIBELF_SUPPORT
+ libLLVM / HAVE_LIBLLVM_SUPPORT
libnuma / HAVE_LIBNUMA_SUPPORT
libopencsd / HAVE_CSTRACE_SUPPORT
libperl / HAVE_LIBPERL_SUPPORT
diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index f3067a4af294..58efab72d2e5 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -285,7 +285,7 @@ If specified the 'Weighted diff' column is displayed with value 'd' computed as:
- period being the hist entry period value
- - WEIGHT-A/WEIGHT-B being user supplied weights in the the '-c' option
+ - WEIGHT-A/WEIGHT-B being user supplied weights in the '-c' option
behind ':' separator like '-c wdiff:1,2'.
- WEIGHT-A being the weight of the data file
- WEIGHT-B being the weight of the baseline data file
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 28215306a78a..a4378a0cd914 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -73,6 +73,7 @@ counted. The following modifiers exist:
e - group or event are exclusive and do not share the PMU
b - use BPF aggregration (see perf stat --bpf-counters)
R - retire latency value of the event
+ X - don't regroup the event to match PMUs
The 'p' modifier can be used for specifying how precise the instruction
address should be. The 'p' modifier can be specified multiple times:
@@ -392,6 +393,8 @@ Support raw format:
. '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of
a certain kind of events.
+include::intel-acr.txt[]
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-top[1],
diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt
index 973fede403a0..892c82a9bf40 100644
--- a/tools/perf/Documentation/perf-trace.txt
+++ b/tools/perf/Documentation/perf-trace.txt
@@ -249,6 +249,10 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs.
works well with -s/--summary option where no argument information is
required.
+--max-summary=N::
+ Maximum number of lines in the summary mode. Note that this applies to
+ each entry (thread or cgroup).
+
PAGEFAULTS
----------
diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index cd95ba09f727..c9d4dec65344 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -348,6 +348,16 @@ to special needs.
struct perf_bpil, which contains detailed information about
a BPF program, including type, id, tag, jited/xlated instructions, etc.
+The format of data in HEADER_BPF_PROG_INFO is as follows:
+ u32 count
+
+ struct perf_bpil {
+ u32 info_len; /* size of struct bpf_prog_info, when the tool is compiled */
+ u32 data_len; /* total bytes allocated for data, round up to 8 bytes */
+ u64 arrays; /* which arrays are included in data */
+ struct bpf_prog_info info;
+ u8 data[];
+ }[count];
HEADER_BPF_BTF = 26,
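The HEADER_BPF_PROG_INFO layout added above can be walked record by record. This is a minimal sketch rather than the perf reader: it assumes little-endian data, no padding between fields, and that the info member occupies info_len bytes on disk. The function name and the synthetic sample buffer are made up for illustration, and struct bpf_prog_info is kept as opaque bytes.

```python
import struct

# Fixed-size part of each record: u32 info_len, u32 data_len, u64 arrays.
# Assumption: little-endian, no padding between the fields.
HEADER = struct.Struct("<IIQ")


def parse_bpf_prog_info(buf):
    """Split a HEADER_BPF_PROG_INFO section into its per-program records."""
    (count,) = struct.unpack_from("<I", buf, 0)
    off = 4
    records = []
    for _ in range(count):
        info_len, data_len, arrays = HEADER.unpack_from(buf, off)
        off += HEADER.size
        info = buf[off:off + info_len]    # opaque struct bpf_prog_info bytes
        off += info_len
        data = buf[off:off + data_len]    # arrays selected by the bitmask
        off += data_len
        if len(info) != info_len or len(data) != data_len:
            raise ValueError("truncated HEADER_BPF_PROG_INFO section")
        records.append({"info_len": info_len, "data_len": data_len,
                        "arrays": arrays, "info": info, "data": data})
    return records


# Synthetic section with one record, for illustration only.
sample = (struct.pack("<I", 1) + HEADER.pack(8, 8, 0x1)
          + b"\x00" * 8 + b"\x01" * 8)
for rec in parse_bpf_prog_info(sample):
    print(rec["info_len"], rec["data_len"], hex(rec["arrays"]))
```

Reading info_len from the file instead of hard-coding a size lets a reader cope with a struct bpf_prog_info that differs in size from the one the writing tool was compiled against.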