diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-01-24 05:45:40 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-01-24 05:45:40 -0800 |
commit | 7685b334d1e4927cc73b62c65293ba65748d9c52 (patch) | |
tree | f7490fa318bf9c8079d5fb274646ef6a6aa1b86f /tools/perf/scripts/python | |
parent | bc8198dc7ebc492ec3e9fa1617dcdfbe98e73b17 (diff) | |
parent | 91b7747dc70d64b5ec56ffe493310f207e7ffc99 (diff) |
Merge tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf-tools updates from Namhyung Kim:
"There are a lot of changes in the perf tools in this cycle.
build:
- Use generic syscall table to generate syscall numbers on supported
archs
- This also enables to get rid of libaudit which was used for syscall
numbers
- Remove python2 support as it's deprecated for years
- Fix issues on static build with libzstd
perf record:
- Intel-PT supports "aux-action" config term to pause or resume
tracing in the aux-buffer. Users can start the intel_pt event as
"started-paused" and configure other events to control the Intel-PT
tracing:
# perf record --kcore -e intel_pt/aux-action=start-paused/ \
-e syscalls:sys_enter_newuname/aux-action=resume/ \
-e syscalls:sys_exit_newuname/aux-action=pause/ -- uname
This requires kernel support (which was added in v6.13)
perf lock:
- 'perf lock contention' command has an ability to symbolize locks in
dynamically allocated objects using slab cache name when it runs
with BPF. Those dynamic locks would have "&" prefix in the name to
distinguish them from ordinary (static) locks
# perf lock con -abl -E 5 sleep 1
contended total wait max wait avg wait address symbol
2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex)
1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex)
4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex)
2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex)
3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex)
This also requires kernel/BPF support (which was added in v6.13)
perf ftrace:
- 'perf ftrace latency' command gets a couple of options to support
linear buckets instead of exponential. Also it's possible to
specify max and min latency for the linear buckets:
# perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \
--min-latency=200 --max-latency=800 -- sleep 1
# DURATION | COUNT | GRAPH |
0 - 200 ns | 186 | ### |
200 - 300 ns | 256 | ##### |
300 - 400 ns | 364 | ####### |
400 - 500 ns | 223 | #### |
500 - 600 ns | 111 | ## |
600 - 700 ns | 41 | |
700 - 800 ns | 141 | ## |
800 - ... ns | 169 | ### |
# statistics (in nsec)
total time: 2162212
avg time: 967
max time: 16817
min time: 132
count: 2236
- As you can see in the above example, it nows shows the statistics
at the end so that users can see the avg/max/min latencies easily
- 'perf ftrace profile' command has --graph-opts option like 'perf
ftrace trace' so that it can control the tracing behaviors in the
same way. For example, it can limit the function call depth or
threshold
perf script:
- Improve physical memory resolution in 'mem-phys-addr' script by
parsing /proc/iomem file
# perf script mem-phys-addr -- find /
...
Event: mem_inst_retired.all_loads:P
Memory type count percentage
---------------------------------------- ---------- ----------
100000000-85f7fffff : System RAM 8929 69.7
547600000-54785d23f : Kernel data 1240 9.7
546a00000-5474bdfff : Kernel rodata 490 3.8
5480ce000-5485fffff : Kernel bss 121 0.9
0-fff : Reserved 3860 30.1
100000-89c01fff : System RAM 18 0.1
8a22c000-8df6efff : System RAM 5 0.0
Others:
- 'perf test' gets --runs-per-test option to run the test cases
repeatedly. This would be helpful to see if it's flaky
- Add 'parse_events' method to Python perf extension module, so that
users can use the same event parsing logic in the python code. One
more step towards implementing perf tools in Python. :)
- Support opening tracepoint events without libtraceevent. This will
be helpful if it won't use the tracing data like in 'perf stat'
- Update ARM Neoverse N2/V2 JSON events and metrics"
* tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits)
perf test: Update event_groups test to use instructions
perf bench: Fix undefined behavior in cmpworker()
perf annotate: Prefer passing evsel to evsel->core.idx
perf lock: Rename fields in lock_type_table
perf lock: Add percpu-rwsem for type filter
perf lock: Fix parse_lock_type which only retrieve one lock flag
perf lock: Fix return code for functions in __cmd_contention
perf hist: Fix width calculation in hpp__fmt()
perf hist: Fix bogus profiles when filters are enabled
perf hist: Deduplicate cmp/sort/collapse code
perf test: Improve verbose documentation
perf test: Add a runs-per-test flag
perf test: Fix parallel/sequential option documentation
perf test: Send list output to stdout rather than stderr
perf test: Rename functions and variables for better clarity
perf tools: Expose quiet/verbose variables in Makefile.perf
perf config: Add a function to set one variable in .perfconfig
perf test perftool_testsuite: Return correct value for skipping
perf test perftool_testsuite: Add missing description
perf test record+probe_libc_inet_pton: Make test resilient
...
Diffstat (limited to 'tools/perf/scripts/python')
-rw-r--r-- | tools/perf/scripts/python/Perf-Trace-Util/Context.c | 20 | ||||
-rw-r--r-- | tools/perf/scripts/python/mem-phys-addr.py | 177 |
2 files changed, 103 insertions, 94 deletions
diff --git a/tools/perf/scripts/python/Perf-Trace-Util/Context.c b/tools/perf/scripts/python/Perf-Trace-Util/Context.c index 01f54d6724a5..60dcfe56d4d9 100644 --- a/tools/perf/scripts/python/Perf-Trace-Util/Context.c +++ b/tools/perf/scripts/python/Perf-Trace-Util/Context.c @@ -24,16 +24,6 @@ #include "../../../util/srcline.h" #include "../../../util/srccode.h" -#if PY_MAJOR_VERSION < 3 -#define _PyCapsule_GetPointer(arg1, arg2) \ - PyCObject_AsVoidPtr(arg1) -#define _PyBytes_FromStringAndSize(arg1, arg2) \ - PyString_FromStringAndSize((arg1), (arg2)) -#define _PyUnicode_AsUTF8(arg) \ - PyString_AsString(arg) - -PyMODINIT_FUNC initperf_trace_context(void); -#else #define _PyCapsule_GetPointer(arg1, arg2) \ PyCapsule_GetPointer((arg1), (arg2)) #define _PyBytes_FromStringAndSize(arg1, arg2) \ @@ -42,7 +32,6 @@ PyMODINIT_FUNC initperf_trace_context(void); PyUnicode_AsUTF8(arg) PyMODINIT_FUNC PyInit_perf_trace_context(void); -#endif static struct scripting_context *get_args(PyObject *args, const char *name, PyObject **arg2) { @@ -104,7 +93,7 @@ static PyObject *perf_sample_insn(PyObject *obj, PyObject *args) if (c->sample->ip && !c->sample->insn_len && thread__maps(c->al->thread)) { struct machine *machine = maps__machine(thread__maps(c->al->thread)); - script_fetch_insn(c->sample, c->al->thread, machine); + script_fetch_insn(c->sample, c->al->thread, machine, /*native_arch=*/true); } if (!c->sample->insn_len) Py_RETURN_NONE; /* N.B. This is a return statement */ @@ -213,12 +202,6 @@ static PyMethodDef ContextMethods[] = { { NULL, NULL, 0, NULL} }; -#if PY_MAJOR_VERSION < 3 -PyMODINIT_FUNC initperf_trace_context(void) -{ - (void) Py_InitModule("perf_trace_context", ContextMethods); -} -#else PyMODINIT_FUNC PyInit_perf_trace_context(void) { static struct PyModuleDef moduledef = { @@ -240,4 +223,3 @@ PyMODINIT_FUNC PyInit_perf_trace_context(void) return mod; } -#endif diff --git a/tools/perf/scripts/python/mem-phys-addr.py b/tools/perf/scripts/python/mem-phys-addr.py index 1f332e72b9b0..5e237a5a5f1b 100644 --- a/tools/perf/scripts/python/mem-phys-addr.py +++ b/tools/perf/scripts/python/mem-phys-addr.py @@ -3,98 +3,125 @@ # # Copyright (c) 2018, Intel Corporation. -from __future__ import division -from __future__ import print_function - import os import sys -import struct import re import bisect import collections +from dataclasses import dataclass +from typing import (Dict, Optional) sys.path.append(os.environ['PERF_EXEC_PATH'] + \ - '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') + +@dataclass(frozen=True) +class IomemEntry: + """Read from a line in /proc/iomem""" + begin: int + end: int + indent: int + label: str -#physical address ranges for System RAM -system_ram = [] -#physical address ranges for Persistent Memory -pmem = [] -#file object for proc iomem -f = None -#Count for each type of memory -load_mem_type_cnt = collections.Counter() -#perf event name -event_name = None +# Physical memory layout from /proc/iomem. Key is the indent and then +# a list of ranges. +iomem: Dict[int, list[IomemEntry]] = collections.defaultdict(list) +# Child nodes from the iomem parent. +children: Dict[IomemEntry, set[IomemEntry]] = collections.defaultdict(set) +# Maximum indent seen before an entry in the iomem file. +max_indent: int = 0 +# Count for each range of memory. +load_mem_type_cnt: Dict[IomemEntry, int] = collections.Counter() +# Perf event name set from the first sample in the data. +event_name: Optional[str] = None def parse_iomem(): - global f - f = open('/proc/iomem', 'r') - for i, j in enumerate(f): - m = re.split('-|:',j,2) - if m[2].strip() == 'System RAM': - system_ram.append(int(m[0], 16)) - system_ram.append(int(m[1], 16)) - if m[2].strip() == 'Persistent Memory': - pmem.append(int(m[0], 16)) - pmem.append(int(m[1], 16)) + """Populate iomem from /proc/iomem file""" + global iomem + global max_indent + global children + with open('/proc/iomem', 'r', encoding='ascii') as f: + for line in f: + indent = 0 + while line[indent] == ' ': + indent += 1 + if indent > max_indent: + max_indent = indent + m = re.split('-|:', line, 2) + begin = int(m[0], 16) + end = int(m[1], 16) + label = m[2].strip() + entry = IomemEntry(begin, end, indent, label) + # Before adding entry, search for a parent node using its begin. + if indent > 0: + parent = find_memory_type(begin) + assert parent, f"Given indent expected a parent for {label}" + children[parent].add(entry) + iomem[indent].append(entry) -def print_memory_type(): - print("Event: %s" % (event_name)) - print("%-40s %10s %10s\n" % ("Memory type", "count", "percentage"), end='') - print("%-40s %10s %10s\n" % ("----------------------------------------", - "-----------", "-----------"), - end=''); - total = sum(load_mem_type_cnt.values()) - for mem_type, count in sorted(load_mem_type_cnt.most_common(), \ - key = lambda kv: (kv[1], kv[0]), reverse = True): - print("%-40s %10d %10.1f%%\n" % - (mem_type, count, 100 * count / total), - end='') +def find_memory_type(phys_addr) -> Optional[IomemEntry]: + """Search iomem for the range containing phys_addr with the maximum indent""" + for i in range(max_indent, -1, -1): + if i not in iomem: + continue + position = bisect.bisect_right(iomem[i], phys_addr, + key=lambda entry: entry.begin) + if position is None: + continue + iomem_entry = iomem[i][position-1] + if iomem_entry.begin <= phys_addr <= iomem_entry.end: + return iomem_entry + print(f"Didn't find {phys_addr}") + return None -def trace_begin(): - parse_iomem() +def print_memory_type(): + print(f"Event: {event_name}") + print(f"{'Memory type':<40} {'count':>10} {'percentage':>10}") + print(f"{'-' * 40:<40} {'-' * 10:>10} {'-' * 10:>10}") + total = sum(load_mem_type_cnt.values()) + # Add count from children into the parent. + for i in range(max_indent, -1, -1): + if i not in iomem: + continue + for entry in iomem[i]: + global children + for child in children[entry]: + if load_mem_type_cnt[child] > 0: + load_mem_type_cnt[entry] += load_mem_type_cnt[child] -def trace_end(): - print_memory_type() - f.close() + def print_entries(entries): + """Print counts from parents down to their children""" + global children + for entry in sorted(entries, + key = lambda entry: load_mem_type_cnt[entry], + reverse = True): + count = load_mem_type_cnt[entry] + if count > 0: + mem_type = ' ' * entry.indent + f"{entry.begin:x}-{entry.end:x} : {entry.label}" + percent = 100 * count / total + print(f"{mem_type:<40} {count:>10} {percent:>10.1f}") + print_entries(children[entry]) -def is_system_ram(phys_addr): - #/proc/iomem is sorted - position = bisect.bisect(system_ram, phys_addr) - if position % 2 == 0: - return False - return True + print_entries(iomem[0]) -def is_persistent_mem(phys_addr): - position = bisect.bisect(pmem, phys_addr) - if position % 2 == 0: - return False - return True +def trace_begin(): + parse_iomem() -def find_memory_type(phys_addr): - if phys_addr == 0: - return "N/A" - if is_system_ram(phys_addr): - return "System RAM" +def trace_end(): + print_memory_type() - if is_persistent_mem(phys_addr): - return "Persistent Memory" +def process_event(param_dict): + if "sample" not in param_dict: + return - #slow path, search all - f.seek(0, 0) - for j in f: - m = re.split('-|:',j,2) - if int(m[0], 16) <= phys_addr <= int(m[1], 16): - return m[2] - return "N/A" + sample = param_dict["sample"] + if "phys_addr" not in sample: + return -def process_event(param_dict): - name = param_dict["ev_name"] - sample = param_dict["sample"] - phys_addr = sample["phys_addr"] + phys_addr = sample["phys_addr"] + entry = find_memory_type(phys_addr) + if entry: + load_mem_type_cnt[entry] += 1 - global event_name - if event_name == None: - event_name = name - load_mem_type_cnt[find_memory_type(phys_addr)] += 1 + global event_name + if event_name is None: + event_name = param_dict["ev_name"] |