Merge tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf-tools updates from Namhyung Kim: "There are a lot of changes in the perf tools in this cycle. build: - Use generic syscall table to generate syscall numbers on supported archs - This also enables to get rid of libaudit which was used for syscall numbers - Remove python2 support as it's deprecated for years - Fix issues on static build with libzstd perf record: - Intel-PT supports "aux-action" config term to pause or resume tracing in the aux-buffer. Users can start the intel_pt event as "started-paused" and configure other events to control the Intel-PT tracing: # perf record --kcore -e intel_pt/aux-action=start-paused/ \ -e syscalls:sys_enter_newuname/aux-action=resume/ \ -e syscalls:sys_exit_newuname/aux-action=pause/ -- uname This requires kernel support (which was added in v6.13) perf lock: - 'perf lock contention' command has an ability to symbolize locks in dynamically allocated objects using slab cache name when it runs with BPF. Those dynamic locks would have "&" prefix in the name to distinguish them from ordinary (static) locks # perf lock con -abl -E 5 sleep 1 contended total wait max wait avg wait address symbol 2 1.95 us 1.77 us 975 ns ffff9d5e852d3498 &task_struct (mutex) 1 1.18 us 1.18 us 1.18 us ffff9d5e852d3538 &task_struct (mutex) 4 1.12 us 354 ns 279 ns ffff9d5e841ca800 &kmalloc-cg-512 (mutex) 2 859 ns 617 ns 429 ns ffffffffa41c3620 delayed_uprobe_lock (mutex) 3 691 ns 388 ns 230 ns ffffffffa41c0940 pack_mutex (mutex) This also requires kernel/BPF support (which was added in v6.13) perf ftrace: - 'perf ftrace latency' command gets a couple of options to support linear buckets instead of exponential. Also it's possible to specify max and min latency for the linear buckets: # perf ftrace latency -abn -T switch_mm_irqs_off --bucket-range=100 \ --min-latency=200 --max-latency=800 -- sleep 1 # DURATION | COUNT | GRAPH | 0 - 200 ns | 186 | ### | 200 - 300 ns | 256 | ##### | 300 - 400 ns | 364 | ####### | 400 - 500 ns | 223 | #### | 500 - 600 ns | 111 | ## | 600 - 700 ns | 41 | | 700 - 800 ns | 141 | ## | 800 - ... ns | 169 | ### | # statistics (in nsec) total time: 2162212 avg time: 967 max time: 16817 min time: 132 count: 2236 - As you can see in the above example, it nows shows the statistics at the end so that users can see the avg/max/min latencies easily - 'perf ftrace profile' command has --graph-opts option like 'perf ftrace trace' so that it can control the tracing behaviors in the same way. For example, it can limit the function call depth or threshold perf script: - Improve physical memory resolution in 'mem-phys-addr' script by parsing /proc/iomem file # perf script mem-phys-addr -- find / ... Event: mem_inst_retired.all_loads:P Memory type count percentage ---------------------------------------- ---------- ---------- 100000000-85f7fffff : System RAM 8929 69.7 547600000-54785d23f : Kernel data 1240 9.7 546a00000-5474bdfff : Kernel rodata 490 3.8 5480ce000-5485fffff : Kernel bss 121 0.9 0-fff : Reserved 3860 30.1 100000-89c01fff : System RAM 18 0.1 8a22c000-8df6efff : System RAM 5 0.0 Others: - 'perf test' gets --runs-per-test option to run the test cases repeatedly. This would be helpful to see if it's flaky - Add 'parse_events' method to Python perf extension module, so that users can use the same event parsing logic in the python code. One more step towards implementing perf tools in Python. :) - Support opening tracepoint events without libtraceevent. This will be helpful if it won't use the tracing data like in 'perf stat' - Update ARM Neoverse N2/V2 JSON events and metrics" * tag 'perf-tools-for-v6.14-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (176 commits) perf test: Update event_groups test to use instructions perf bench: Fix undefined behavior in cmpworker() perf annotate: Prefer passing evsel to evsel->core.idx perf lock: Rename fields in lock_type_table perf lock: Add percpu-rwsem for type filter perf lock: Fix parse_lock_type which only retrieve one lock flag perf lock: Fix return code for functions in __cmd_contention perf hist: Fix width calculation in hpp__fmt() perf hist: Fix bogus profiles when filters are enabled perf hist: Deduplicate cmp/sort/collapse code perf test: Improve verbose documentation perf test: Add a runs-per-test flag perf test: Fix parallel/sequential option documentation perf test: Send list output to stdout rather than stderr perf test: Rename functions and variables for better clarity perf tools: Expose quiet/verbose variables in Makefile.perf perf config: Add a function to set one variable in .perfconfig perf test perftool_testsuite: Return correct value for skipping perf test perftool_testsuite: Add missing description perf test record+probe_libc_inet_pton: Make test resilient ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2025-01-24 05:45:40 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2025-01-24 05:45:40 -0800
commit: 7685b334d1e4927cc73b62c65293ba65748d9c52 (patch)
tree: f7490fa318bf9c8079d5fb274646ef6a6aa1b86f /tools/perf/scripts/python
parent: bc8198dc7ebc492ec3e9fa1617dcdfbe98e73b17 (diff)
parent: 91b7747dc70d64b5ec56ffe493310f207e7ffc99 (diff)
2 files changed, 103 insertions, 94 deletions
diff --git a/tools/perf/scripts/python/Perf-Trace-Util/Context.c b/tools/perf/scripts/python/Perf-Trace-Util/Context.c
index 01f54d6724a5..60dcfe56d4d9 100644
--- a/tools/perf/scripts/python/Perf-Trace-Util/Context.c
+++ b/tools/perf/scripts/python/Perf-Trace-Util/Context.c
@@ -24,16 +24,6 @@
 #include "../../../util/srcline.h"
 #include "../../../util/srccode.h"
 
-#if PY_MAJOR_VERSION < 3
-#define _PyCapsule_GetPointer(arg1, arg2) \
-  PyCObject_AsVoidPtr(arg1)
-#define _PyBytes_FromStringAndSize(arg1, arg2) \
-  PyString_FromStringAndSize((arg1), (arg2))
-#define _PyUnicode_AsUTF8(arg) \
-  PyString_AsString(arg)
-
-PyMODINIT_FUNC initperf_trace_context(void);
-#else
 #define _PyCapsule_GetPointer(arg1, arg2) \
   PyCapsule_GetPointer((arg1), (arg2))
 #define _PyBytes_FromStringAndSize(arg1, arg2) \
@@ -42,7 +32,6 @@ PyMODINIT_FUNC initperf_trace_context(void);
   PyUnicode_AsUTF8(arg)
 
 PyMODINIT_FUNC PyInit_perf_trace_context(void);
-#endif
 
 static struct scripting_context *get_args(PyObject *args, const char *name, PyObject **arg2)
 {
@@ -104,7 +93,7 @@ static PyObject *perf_sample_insn(PyObject *obj, PyObject *args)
 	if (c->sample->ip && !c->sample->insn_len && thread__maps(c->al->thread)) {
 		struct machine *machine =  maps__machine(thread__maps(c->al->thread));
 
-		script_fetch_insn(c->sample, c->al->thread, machine);
+		script_fetch_insn(c->sample, c->al->thread, machine, /*native_arch=*/true);
 	}
 	if (!c->sample->insn_len)
 		Py_RETURN_NONE; /* N.B. This is a return statement */
@@ -213,12 +202,6 @@ static PyMethodDef ContextMethods[] = {
 	{ NULL, NULL, 0, NULL}
 };
 
-#if PY_MAJOR_VERSION < 3
-PyMODINIT_FUNC initperf_trace_context(void)
-{
-	(void) Py_InitModule("perf_trace_context", ContextMethods);
-}
-#else
 PyMODINIT_FUNC PyInit_perf_trace_context(void)
 {
 	static struct PyModuleDef moduledef = {
@@ -240,4 +223,3 @@ PyMODINIT_FUNC PyInit_perf_trace_context(void)
 
 	return mod;
 }
-#endif
diff --git a/tools/perf/scripts/python/mem-phys-addr.py b/tools/perf/scripts/python/mem-phys-addr.py
index 1f332e72b9b0..5e237a5a5f1b 100644
--- a/tools/perf/scripts/python/mem-phys-addr.py
+++ b/tools/perf/scripts/python/mem-phys-addr.py
@@ -3,98 +3,125 @@
 #
 # Copyright (c) 2018, Intel Corporation.
 
-from __future__ import division
-from __future__ import print_function
-
 import os
 import sys
-import struct
 import re
 import bisect
 import collections
+from dataclasses import dataclass
+from typing import (Dict, Optional)
 
 sys.path.append(os.environ['PERF_EXEC_PATH'] + \
-	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+    '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+@dataclass(frozen=True)
+class IomemEntry:
+    """Read from a line in /proc/iomem"""
+    begin: int
+    end: int
+    indent: int
+    label: str
 
-#physical address ranges for System RAM
-system_ram = []
-#physical address ranges for Persistent Memory
-pmem = []
-#file object for proc iomem
-f = None
-#Count for each type of memory
-load_mem_type_cnt = collections.Counter()
-#perf event name
-event_name = None
+# Physical memory layout from /proc/iomem. Key is the indent and then
+# a list of ranges.
+iomem: Dict[int, list[IomemEntry]] = collections.defaultdict(list)
+# Child nodes from the iomem parent.
+children: Dict[IomemEntry, set[IomemEntry]] = collections.defaultdict(set)
+# Maximum indent seen before an entry in the iomem file.
+max_indent: int = 0
+# Count for each range of memory.
+load_mem_type_cnt: Dict[IomemEntry, int] = collections.Counter()
+# Perf event name set from the first sample in the data.
+event_name: Optional[str] = None
 
 def parse_iomem():
-	global f
-	f = open('/proc/iomem', 'r')
-	for i, j in enumerate(f):
-		m = re.split('-|:',j,2)
-		if m[2].strip() == 'System RAM':
-			system_ram.append(int(m[0], 16))
-			system_ram.append(int(m[1], 16))
-		if m[2].strip() == 'Persistent Memory':
-			pmem.append(int(m[0], 16))
-			pmem.append(int(m[1], 16))
+    """Populate iomem from /proc/iomem file"""
+    global iomem
+    global max_indent
+    global children
+    with open('/proc/iomem', 'r', encoding='ascii') as f:
+        for line in f:
+            indent = 0
+            while line[indent] == ' ':
+                indent += 1
+            if indent > max_indent:
+                max_indent = indent
+            m = re.split('-|:', line, 2)
+            begin = int(m[0], 16)
+            end = int(m[1], 16)
+            label = m[2].strip()
+            entry = IomemEntry(begin, end, indent, label)
+            # Before adding entry, search for a parent node using its begin.
+            if indent > 0:
+                parent = find_memory_type(begin)
+                assert parent, f"Given indent expected a parent for {label}"
+                children[parent].add(entry)
+            iomem[indent].append(entry)
 
-def print_memory_type():
-	print("Event: %s" % (event_name))
-	print("%-40s  %10s  %10s\n" % ("Memory type", "count", "percentage"), end='')
-	print("%-40s  %10s  %10s\n" % ("----------------------------------------",
-					"-----------", "-----------"),
-					end='');
-	total = sum(load_mem_type_cnt.values())
-	for mem_type, count in sorted(load_mem_type_cnt.most_common(), \
-					key = lambda kv: (kv[1], kv[0]), reverse = True):
-		print("%-40s  %10d  %10.1f%%\n" %
-			(mem_type, count, 100 * count / total),
-			end='')
+def find_memory_type(phys_addr) -> Optional[IomemEntry]:
+    """Search iomem for the range containing phys_addr with the maximum indent"""
+    for i in range(max_indent, -1, -1):
+        if i not in iomem:
+            continue
+        position = bisect.bisect_right(iomem[i], phys_addr,
+                                       key=lambda entry: entry.begin)
+        if position is None:
+            continue
+        iomem_entry = iomem[i][position-1]
+        if  iomem_entry.begin <= phys_addr <= iomem_entry.end:
+            return iomem_entry
+    print(f"Didn't find {phys_addr}")
+    return None
 
-def trace_begin():
-	parse_iomem()
+def print_memory_type():
+    print(f"Event: {event_name}")
+    print(f"{'Memory type':<40}  {'count':>10}  {'percentage':>10}")
+    print(f"{'-' * 40:<40}  {'-' * 10:>10}  {'-' * 10:>10}")
+    total = sum(load_mem_type_cnt.values())
+    # Add count from children into the parent.
+    for i in range(max_indent, -1, -1):
+        if i not in iomem:
+            continue
+        for entry in iomem[i]:
+            global children
+            for child in children[entry]:
+                if load_mem_type_cnt[child] > 0:
+                    load_mem_type_cnt[entry] += load_mem_type_cnt[child]
 
-def trace_end():
-	print_memory_type()
-	f.close()
+    def print_entries(entries):
+        """Print counts from parents down to their children"""
+        global children
+        for entry in sorted(entries,
+                            key = lambda entry: load_mem_type_cnt[entry],
+                            reverse = True):
+            count = load_mem_type_cnt[entry]
+            if count > 0:
+                mem_type = ' ' * entry.indent + f"{entry.begin:x}-{entry.end:x} : {entry.label}"
+                percent = 100 * count / total
+                print(f"{mem_type:<40}  {count:>10}  {percent:>10.1f}")
+                print_entries(children[entry])
 
-def is_system_ram(phys_addr):
-	#/proc/iomem is sorted
-	position = bisect.bisect(system_ram, phys_addr)
-	if position % 2 == 0:
-		return False
-	return True
+    print_entries(iomem[0])
 
-def is_persistent_mem(phys_addr):
-	position = bisect.bisect(pmem, phys_addr)
-	if position % 2 == 0:
-		return False
-	return True
+def trace_begin():
+    parse_iomem()
 
-def find_memory_type(phys_addr):
-	if phys_addr == 0:
-		return "N/A"
-	if is_system_ram(phys_addr):
-		return "System RAM"
+def trace_end():
+    print_memory_type()
 
-	if is_persistent_mem(phys_addr):
-		return "Persistent Memory"
+def process_event(param_dict):
+    if "sample" not in param_dict:
+        return
 
-	#slow path, search all
-	f.seek(0, 0)
-	for j in f:
-		m = re.split('-|:',j,2)
-		if int(m[0], 16) <= phys_addr <= int(m[1], 16):
-			return m[2]
-	return "N/A"
+    sample = param_dict["sample"]
+    if "phys_addr" not in sample:
+        return
 
-def process_event(param_dict):
-	name       = param_dict["ev_name"]
-	sample     = param_dict["sample"]
-	phys_addr  = sample["phys_addr"]
+    phys_addr  = sample["phys_addr"]
+    entry = find_memory_type(phys_addr)
+    if entry:
+        load_mem_type_cnt[entry] += 1
 
-	global event_name
-	if event_name == None:
-		event_name = name
-	load_mem_type_cnt[find_memory_type(phys_addr)] += 1
+    global event_name
+    if event_name is None:
+        event_name  = param_dict["ev_name"]
author	Linus Torvalds <torvalds@linux-foundation.org>	2025-01-24 05:45:40 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2025-01-24 05:45:40 -0800
commit	7685b334d1e4927cc73b62c65293ba65748d9c52 (patch)
tree	f7490fa318bf9c8079d5fb274646ef6a6aa1b86f /tools/perf/scripts/python
parent	bc8198dc7ebc492ec3e9fa1617dcdfbe98e73b17 (diff)
parent	91b7747dc70d64b5ec56ffe493310f207e7ffc99 (diff)