Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/Build.txt | 15
-rw-r--r-- | tools/perf/Documentation/android.txt | 80
-rw-r--r-- | tools/perf/Documentation/intel-acr.txt | 53
-rw-r--r-- | tools/perf/Documentation/perf-annotate.txt | 1
-rw-r--r-- | tools/perf/Documentation/perf-arm-spe.txt | 14
-rw-r--r-- | tools/perf/Documentation/perf-bench.txt | 58
-rw-r--r-- | tools/perf/Documentation/perf-check.txt | 1
-rw-r--r-- | tools/perf/Documentation/perf-diff.txt | 2
-rw-r--r-- | tools/perf/Documentation/perf-list.txt | 3
-rw-r--r-- | tools/perf/Documentation/perf-trace.txt | 4
-rw-r--r-- | tools/perf/Documentation/perf.data-file-format.txt | 10
11 files changed, 159 insertions, 82 deletions
diff --git a/tools/perf/Documentation/Build.txt b/tools/perf/Documentation/Build.txt
index 83dc87c662b6..57b226e7fc2f 100644
--- a/tools/perf/Documentation/Build.txt
+++ b/tools/perf/Documentation/Build.txt
@@ -99,3 +99,18 @@ configuration paths for cross building:
 In this case, the variable PKG_CONFIG_SYSROOT_DIR can be used alongside the
 variable PKG_CONFIG_LIBDIR or PKG_CONFIG_PATH to prepend the sysroot path to
 the library paths for cross compilation.
+
+5) Build with Clang
+===================
+By default, the makefile uses GCC as compiler. With specifying environment
+variables HOSTCC, CC and CXX, it allows to build perf with Clang.
+
+Using Clang for a native build:
+
+  $ HOSTCC=clang CC=clang CXX=clang++ make -C tools/perf
+
+Specifying ARCH and CROSS_COMPILE for cross compilation:
+
+  $ HOSTCC=clang CC=clang CXX=clang++ \
+    ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
+    make -C tools/perf
diff --git a/tools/perf/Documentation/android.txt b/tools/perf/Documentation/android.txt
index 24a59998fc91..3f3cc7ac3d13 100644
--- a/tools/perf/Documentation/android.txt
+++ b/tools/perf/Documentation/android.txt
@@ -1,78 +1,10 @@
 How to compile perf for Android
-=========================================
+===============================
 
-I. Set the Android NDK environment
-------------------------------------------------
+There are two ways to build perf and run it on Android:
 
-(a). Use the Android NDK
-------------------------------------------------
-1. You need to download and install the Android Native Development Kit (NDK).
-Set the NDK variable to point to the path where you installed the NDK:
-  export NDK=/path/to/android-ndk
+- Method 1: Build perf with static linking. See Build.txt, section
+  "4) Cross compilation" for how to build a static perf binary.
 
-2. Set cross-compiling environment variables for NDK toolchain and sysroot.
-For arm:
-  export NDK_TOOLCHAIN=${NDK}/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/arm-linux-androideabi-
-  export NDK_SYSROOT=${NDK}/platforms/android-24/arch-arm
-For x86:
-  export NDK_TOOLCHAIN=${NDK}/toolchains/x86-4.9/prebuilt/linux-x86_64/bin/i686-linux-android-
-  export NDK_SYSROOT=${NDK}/platforms/android-24/arch-x86
-
-This method is only tested for Android NDK versions Revision 11b and later.
-perf uses some bionic enhancements that are not included in prior NDK versions.
-You can use method (b) described below instead.
-
-(b). Use the Android source tree
------------------------------------------------
-1. Download the master branch of the Android source tree.
-Set the environment for the target you want using:
-  source build/envsetup.sh
-  lunch
-
-2. Build your own NDK sysroot to contain latest bionic changes and set the
-NDK sysroot environment variable.
-  cd ${ANDROID_BUILD_TOP}/ndk
-For arm:
-  ./build/tools/build-ndk-sysroot.sh --abi=arm
-  export NDK_SYSROOT=${ANDROID_BUILD_TOP}/ndk/build/platforms/android-3/arch-arm
-For x86:
-  ./build/tools/build-ndk-sysroot.sh --abi=x86
-  export NDK_SYSROOT=${ANDROID_BUILD_TOP}/ndk/build/platforms/android-3/arch-x86
-
-3. Set the NDK toolchain environment variable.
-For arm:
-  export NDK_TOOLCHAIN=${ANDROID_TOOLCHAIN}/arm-linux-androideabi-
-For x86:
-  export NDK_TOOLCHAIN=${ANDROID_TOOLCHAIN}/i686-linux-android-
-
-II. Compile perf for Android
-------------------------------------------------
-You need to run make with the NDK toolchain and sysroot defined above:
-For arm:
-  make WERROR=0 ARCH=arm CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}"
-For x86:
-  make WERROR=0 ARCH=x86 CROSS_COMPILE=${NDK_TOOLCHAIN} EXTRA_CFLAGS="-pie --sysroot=${NDK_SYSROOT}"
-
-III. Install perf
------------------------------------------------
-You need to connect to your Android device/emulator using adb.
-Install perf using:
-  adb push perf /data/perf
-
-If you also want to use perf-archive you need busybox tools for Android.
-For installing perf-archive, you first need to replace #!/bin/bash with #!/system/bin/sh:
-  sed 's/#!\/bin\/bash/#!\/system\/bin\/sh/g' perf-archive >> /tmp/perf-archive
-  chmod +x /tmp/perf-archive
-  adb push /tmp/perf-archive /data/perf-archive
-
-IV. Environment settings for running perf
-------------------------------------------------
-Some perf features need environment variables to run properly.
-You need to set these before running perf on the target:
-  adb shell
-  # PERF_PAGER=cat
-
-IV. Run perf
-------------------------------------------------
-Run perf on your device/emulator to which you previously connected using adb:
-  # ./data/perf
+- Method 2: Download the Android NDK and use the bundled Clang to
+  build perf. See Build.txt, section "5) Build with clang" for details.
diff --git a/tools/perf/Documentation/intel-acr.txt b/tools/perf/Documentation/intel-acr.txt
new file mode 100644
index 000000000000..72654fdd9a52
--- /dev/null
+++ b/tools/perf/Documentation/intel-acr.txt
@@ -0,0 +1,53 @@
+Intel Auto Counter Reload Support
+---------------------------------
+Support for Intel Auto Counter Reload in perf tools
+
+Auto counter reload provides a means for software to specify to hardware
+that certain counters, if supported, should be automatically reloaded
+upon overflow of chosen counters. By taking a sample only if the rate of
+one event exceeds some threshold relative to the rate of another event,
+this feature enables software to sample based on the relative rate of
+two or more events. To enable this, the user must provide a sample period
+term and a bitmask ("acr_mask") for each relevant event specifying the
+counters in an event group to reload if the event's specified sample
+period is exceeded.
+
+For example, if the user desires to measure a scenario when IPC > 2,
+the event group might look like the one below:
+
+  perf record -e {cpu_atom/instructions,period=200000,acr_mask=0x2/, \
+  cpu_atom/cycles,period=100000,acr_mask=0x3/} -- true
+
+In this case, if the "instructions" counter exceeds the sample period of
+200000, the second counter, "cycles", will be reset and a sample will be
+taken. If "cycles" is exceeded first, both counters in the group will be
+reset. In this way, samples will only be taken for cases where IPC > 2.
+
+The acr_mask term is a hexadecimal value representing a bitmask of the
+events in the group to be reset when the period is exceeded. In the
+example above, "instructions" is assigned an acr_mask of 0x2, meaning
+only the second event in the group is reloaded and a sample is taken
+for the first event. "cycles" is assigned an acr_mask of 0x3, meaning
+that both event counters will be reset if the sample period is exceeded
+first.
+
+ratio-to-prev Event Term
+------------------------
+To simplify this, an event term "ratio-to-prev" is provided which is used
+alongside the sample period term n or the -c/--count option. This would
+allow users to specify the desired relative rate between events as a
+ratio. Note: Both events compared must belong to the same PMU.
+
+The command above would then become
+
+  perf record -e {cpu_atom/instructions/, \
+  cpu_atom/cycles,period=100000,ratio-to-prev=0.5/} -- true
+
+ratio-to-prev is the ratio of the event using the term relative
+to the previous event in the group, which will always be 1,
+for a 1:0.5 or 2:1 ratio.
+
+To sample for IPC < 2 for example, the events need to be reordered:
+
+  perf record -e {cpu_atom/cycles/, \
+  cpu_atom/instructions,period=200000,ratio-to-prev=2.0/} -- true
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index 46090c5b42b4..547f1a268018 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -170,7 +170,6 @@ include::itrace.txt[]
 
 --code-with-type::
 	Show data type info in code annotation (for memory instructions only).
-	Currently it only works with --stdio option.
 
 
 SEE ALSO
diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index 37afade4f1b2..cda8dd47fc4d 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -191,14 +191,20 @@ groups:
         36 branch
          0 remote-access
        900 memory
+      1800 instructions
 
 The arm_spe// and dummy:u events are implementation details and are expected to be empty.
 
-To get a full list of unique samples that are not sorted into groups, set the itrace option to
-generate 'instruction' samples. The period option is also taken into account, so set it to 1
-instruction unless you want to further downsample the already sampled SPE data:
+The instructions group contains the full list of unique samples that are not
+sorted into other groups. To generate only this group use --itrace=i1i.
 
-  perf report --itrace=i1i
+1i (1 instruction interval) signifies no further downsampling. Rather than an
+instruction interval, this generates a sample every n SPE samples. For example
+to generate the default set of events for every 100 SPE samples:
+
+  perf report --itrace==bxofmtMai100i
+
+Other period types, for example nanoseconds (ns) are not currently supported.
 Memory access details are also stored on the samples and this can be viewed with:
 
diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
index 8331bd28b10e..1160224cb718 100644
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@@ -177,11 +177,21 @@ Suite for evaluating performance of simple memory copy in various ways.
 
 Options of *memcpy*
 ^^^^^^^^^^^^^^^^^^^
--l::
+-s::
 --size::
 Specify size of memory to copy (default: 1MB).
 Available units are B, KB, MB, GB and TB (case insensitive).
 
+-p::
+--page::
+Specify page-size for mapping memory buffers (default: 4KB).
+Available values are 4KB, 2MB, 1GB (case insensitive).
+
+-k::
+--chunk::
+Specify the chunk-size for each invocation. (default: 0, or full-extent)
+Available units are B, KB, MB, GB and TB (case insensitive).
+
 -f::
 --function::
 Specify function to copy (default: default).
@@ -201,11 +211,21 @@ Suite for evaluating performance of simple memory set in various ways.
 
 Options of *memset*
 ^^^^^^^^^^^^^^^^^^^
--l::
+-s::
 --size::
 Specify size of memory to set (default: 1MB).
 Available units are B, KB, MB, GB and TB (case insensitive).
 
+-p::
+--page::
+Specify page-size for mapping memory buffers (default: 4KB).
+Available values are 4KB, 2MB, 1GB (case insensitive).
+
+-k::
+--chunk::
+Specify the chunk-size for each invocation. (default: 0, or full-extent)
+Available units are B, KB, MB, GB and TB (case insensitive).
+
 -f::
 --function::
 Specify function to set (default: default).
@@ -220,6 +240,40 @@ Repeat memset invocation this number of times.
 --cycles::
 Use perf's cpu-cycles event instead of gettimeofday syscall.
 
+*mmap*::
+Suite for evaluating memory subsystem performance for mmap()'d memory.
+
+Options of *mmap*
+^^^^^^^^^^^^^^^^^
+-s::
+--size::
+Specify size of memory to set (default: 1MB).
+Available units are B, KB, MB, GB and TB (case insensitive).
+
+-p::
+--page::
+Specify page-size for mapping memory buffers (default: 4KB).
+Available values are 4KB, 2MB, 1GB (case insensitive).
+
+-r::
+--randomize::
+Specify seed to randomize page access offset (default: 0, or not randomized).
+
+-f::
+--function::
+Specify function to set (default: all).
+Available functions are 'demand' and 'populate', with the first
+demand faulting pages in the region and the second using an eager
+mapping.
+
+-l::
+--nr_loops::
+Repeat mmap() invocation this number of times.
+
+-c::
+--cycles::
+Use perf's cpu-cycles event instead of gettimeofday syscall.
+
 SUITES FOR 'numa'
 ~~~~~~~~~~~~~~~~~
 *mem*::
diff --git a/tools/perf/Documentation/perf-check.txt b/tools/perf/Documentation/perf-check.txt
index ee92042082f7..4c9ccda6ce91 100644
--- a/tools/perf/Documentation/perf-check.txt
+++ b/tools/perf/Documentation/perf-check.txt
@@ -56,6 +56,7 @@ feature::
                 libcapstone             /  HAVE_LIBCAPSTONE_SUPPORT
                 libdw-dwarf-unwind      /  HAVE_LIBDW_SUPPORT
                 libelf                  /  HAVE_LIBELF_SUPPORT
+                libLLVM                 /  HAVE_LIBLLVM_SUPPORT
                 libnuma                 /  HAVE_LIBNUMA_SUPPORT
                 libopencsd              /  HAVE_CSTRACE_SUPPORT
                 libperl                 /  HAVE_LIBPERL_SUPPORT
diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index f3067a4af294..58efab72d2e5 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -285,7 +285,7 @@ If specified the 'Weighted diff' column is displayed with value 'd' computed as:
 
     - period being the hist entry period value
 
-    - WEIGHT-A/WEIGHT-B being user supplied weights in the the '-c' option
+    - WEIGHT-A/WEIGHT-B being user supplied weights in the '-c' option
- WEIGHT-A being the weight of the data file - WEIGHT-B being the weight of the baseline data file diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 28215306a78a..a4378a0cd914 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -73,6 +73,7 @@ counted. The following modifiers exist: e - group or event are exclusive and do not share the PMU b - use BPF aggregration (see perf stat --bpf-counters) R - retire latency value of the event + X - don't regroup the event to match PMUs The 'p' modifier can be used for specifying how precise the instruction address should be. The 'p' modifier can be specified multiple times: @@ -392,6 +393,8 @@ Support raw format: . '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of a certain kind of events. +include::intel-acr.txt[] + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-top[1], diff --git a/tools/perf/Documentation/perf-trace.txt b/tools/perf/Documentation/perf-trace.txt index 973fede403a0..892c82a9bf40 100644 --- a/tools/perf/Documentation/perf-trace.txt +++ b/tools/perf/Documentation/perf-trace.txt @@ -249,6 +249,10 @@ the thread executes on the designated CPUs. Default is to monitor all CPUs. works well with -s/--summary option where no argument information is required. +--max-summary=N:: + Maximum number of lines in the summary mode. Note that this applies to + each entry (thread or cgroup). + PAGEFAULTS ---------- diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index cd95ba09f727..c9d4dec65344 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -348,6 +348,16 @@ to special needs. struct perf_bpil, which contains detailed information about a BPF program, including type, id, tag, jited/xlated instructions, etc. 
+The format of data in HEADER_BPF_PROG_INFO is as follows:
+	u32 count
+
+	struct perf_bpil {
+		u32 info_len; /* size of struct bpf_prog_info, when the tool is compiled */
+		u32 data_len; /* total bytes allocated for data, round up to 8 bytes */
+		u64 arrays;   /* which arrays are included in data */
+		struct bpf_prog_info info;
+		u8 data[];
+	}[count];
 
         HEADER_BPF_BTF = 26,
 
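The HEADER_BPF_PROG_INFO layout documented in the perf.data-file-format.txt hunk is a u32 count followed by variable-length perf_bpil records. The sketch below walks such a payload held in memory. It is a minimal illustration, not perf's own reader: it assumes the data is in host byte order, that there is no padding after the count, and that each record's info occupies exactly info_len bytes followed by data_len bytes of data. The names bpil_entry, rd_u32, rd_u64 and bpil_walk are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of one perf_bpil record located in the section. */
struct bpil_entry {
	size_t   offset;    /* byte offset of the record inside the payload */
	uint32_t info_len;  /* bytes of struct bpf_prog_info as written */
	uint32_t data_len;  /* bytes of trailing data[] */
	uint64_t arrays;    /* bitmask of arrays present in data[] */
};

/* Unaligned-safe loads; the payload is assumed to be in host byte order. */
static uint32_t rd_u32(const uint8_t *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof(v));
	return v;
}

static uint64_t rd_u64(const uint8_t *p)
{
	uint64_t v;
	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Walk a payload laid out as
 *   u32 count; then count x { u32 info_len; u32 data_len; u64 arrays;
 *                             info_len bytes of info; data_len bytes of data }
 * Stores up to max entries in out[] and returns count, or -1 if truncated.
 */
int bpil_walk(const uint8_t *buf, size_t len, struct bpil_entry *out, size_t max)
{
	size_t pos;
	uint32_t count, i;

	assert(buf != NULL || len == 0);
	if (len < sizeof(uint32_t))
		return -1;
	count = rd_u32(buf);
	pos = sizeof(uint32_t);

	for (i = 0; i < count; i++) {
		struct bpil_entry e;

		/* Fixed part: info_len (4) + data_len (4) + arrays (8). */
		if (len - pos < 16)
			return -1;
		e.offset = pos;
		e.info_len = rd_u32(buf + pos);
		e.data_len = rd_u32(buf + pos + 4);
		e.arrays = rd_u64(buf + pos + 8);
		pos += 16;

		/* Variable part: skip info and data as sized by the record. */
		if ((uint64_t)e.info_len + e.data_len > len - pos)
			return -1;
		pos += (size_t)e.info_len + e.data_len;

		if (i < max)
			out[i] = e;
	}
	return (int)count;
}
```

Because info_len is recorded per record, a reader built against a different struct bpf_prog_info size can still find where each record ends, which is presumably why the length is stored rather than implied.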
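The acr_mask and ratio-to-prev text in the intel-acr.txt hunk can be checked with a little arithmetic. The sketch below is only an illustration of the documented examples, not perf's event parser. It assumes that bit i of acr_mask (counting from 0) selects the i-th event of the group, and that an event with period P and ratio-to-prev=R implies a period of P / R for the previous event. Both assumptions reproduce the examples in the text. The helpers prev_period and acr_mask_has are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: period implied for the previous event in the group
 * when an event with sample period `period` carries ratio-to-prev=`ratio`.
 * Assumes this_period : prev_period == ratio : 1.
 */
uint64_t prev_period(uint64_t period, double ratio)
{
	assert(ratio > 0.0);
	/* Round to the nearest integer period. */
	return (uint64_t)((double)period / ratio + 0.5);
}

/* Hypothetical helper: does acr_mask select group member idx (0-based)? */
bool acr_mask_has(uint64_t acr_mask, unsigned int idx)
{
	return idx < 64 && ((acr_mask >> idx) & 1u) != 0;
}
```

For the first example, prev_period(100000, 0.5) gives instructions a period of 200000 next to cycles at 100000. A sample then results only when instructions reach 200000 before cycles reach 100000, i.e. IPC > 2. In the reordered group, prev_period(200000, 2.0) gives cycles 100000 again, which flips the condition to IPC < 2. acr_mask_has(0x2, 1) is true and acr_mask_has(0x2, 0) is false, matching "only the second event in the group is reloaded".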