linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-06-20	perf vendor events: Update alderlake events/metrics	Ian Rogers
	Update events from v1.24 to v1.27. Update p-core TMA metrics from v4.7 to v4.8, and the e-core TMA metrics to v3.6. Bring in the event updates v1.27: https://github.com/intel/perfmon/commit/ea4f309a04c50ca77a00da2db130fd7cf06db978 v1.26: https://github.com/intel/perfmon/commit/0052e68d24d9873d5ff22363677794fa3eb05313 The p-core TMA 4.8 information was updated in: https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736 And e-core in: https://github.com/intel/perfmon/commit/d9c2faa70bafe03129dc10f9fe414ef03a95acd9 New events are: EXE_ACTIVITY.2_3_PORTS_UTIL, ICACHE_DATA.STALL_PERIODS, L2_TRANS.L2_WB, MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024, MEM_UOPS_RETIRED.LOCK_LOADS, OFFCORE_REQUESTS.DEMAND_CODE_RD, OFFCORE_REQUESTS.DEMAND_RFO, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD, OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD, RS.EMPTY_RESOURCE, SERIALIZATION.C01_MS_SCB, SW_PREFETCH_ACCESS.ANY, UOPS_ISSUED.ANY, UOPS_ISSUED.CYCLES Co-authored-by: Weilin Wang <weilin.wang@intel.com> Co-authored-by: Caleb Biggers <caleb.biggers@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620181752.3945845-2-irogers@google.com
2024-06-20	perf doc: Add AMD IBS usage document	Ravi Bangoria
	Add a perf man page document that describes how to exploit AMD IBS with Linux perf. Brief intro about IBS and simple one-liner examples will help naive users to get started. This is not meant to be an exhaustive IBS guide. User should refer latest AMD64 Architecture Programmer's Manual for detailed description of IBS. Usage: $ man perf-amd-ibs Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: ananth.narayan@amd.com Cc: sandipan.das@amd.com Cc: santosh.shukla@amd.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240620054104.815-1-ravi.bangoria@amd.com
2024-06-20	tools/perf: Handle perftool-testsuite_probe testcases fail when kernel ↵	Athira Rajeev
	debuginfo is not present Running "perftool-testsuite_probe" fails as below: ./perf test -v "perftool-testsuite_probe" 83: perftool-testsuite_probe : FAILED There are three fails: 1. Regexp not found: "\sprobe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf probe -l (output regexp parsing) 2. Regexp not found: "probe:vfs_mknod" Regexp not found: "probe:vfs_create" Regexp not found: "probe:vfs_rmdir" Regexp not found: "probe:vfs_link" Regexp not found: "probe:vfs_write" -- [ FAIL ] -- perf_probe :: test_adding_kernel :: wildcard adding support (command exitcode + output regexp parsing) 3. Regexp not found: "Failed to find" Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64" Regexp not found: "in this function\|at this address" Line did not match any pattern: "The /boot/vmlinux file has no debug information." Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package." These three tests depends on kernel debug info. 1. Fail 1 expects file name along with probe which needs debuginfo 2. Fail 2 : perf probe -nf --max-probes=512 -a 'vfs_ $params' Debuginfo-analysis is not supported. Error: Failed to add events. 3. Fail 3 : perf probe 'vfs_read somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64' Debuginfo-analysis is not supported. Error: Failed to add events. There is already helper function skip_if_no_debuginfo in lib/probe_vfs_getname.sh which does perf probe and returns "2" if debug info is not present. Use the skip_if_no_debuginfo function and skip only the three tests which needs debuginfo based on the result. With the patch: 83: perftool-testsuite_probe: --- start --- test child forked, pid 3927 -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -a -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: --add -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf list Regexp not found: "\s*probe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$" -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped -- [ PASS ] -- perf_probe :: test_adding_kernel :: using added probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: deleting added probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing removed probe (should NOT be listed) -- [ PASS ] -- perf_probe :: test_adding_kernel :: dry run :: adding probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: first probe adding -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (without force) -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force) -- [ PASS ] -- perf_probe :: test_adding_kernel :: using doubled probe -- [ PASS ] -- perf_probe :: test_adding_kernel :: removing multiple probes Regexp not found: "probe:vfs_mknod" Regexp not found: "probe:vfs_create" Regexp not found: "probe:vfs_rmdir" Regexp not found: "probe:vfs_link" Regexp not found: "probe:vfs_write" -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped Regexp not found: "Failed to find" Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64" Regexp not found: "in this function\|at this address" Line did not match any pattern: "The /boot/vmlinux file has no debug information." Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package." -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: add -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: record -- [ PASS ] -- perf_probe :: test_adding_kernel :: function argument probing :: script ## [ PASS ] ## perf_probe :: test_adding_kernel SUMMARY ---- end(0) ---- 83: perftool-testsuite_probe : Ok Only the three specific tests are skipped and remaining ran successfully. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: disgoel@linux.vnet.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240617122121.7484-1-atrajeev@linux.vnet.ibm.com
2024-06-15	perf hist: Honor symbol_conf.skip_empty	Namhyung Kim
	So that it can skip events with no sample according to the config value. This can omit the dummy event in the output of perf report --group. An example output: $ sudo perf mem record -a sleep 1 $ sudo perf report --group Before) # # Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u' # Event count (approx.): 3089861 # # Overhead Command Shared Object Symbol # ........................ ........... ................. ..................................... # 9.29% 0.00% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages 5.26% 0.15% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_se 4.15% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0 3.87% 0.00% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook 3.79% 0.17% 0.00% swapper [kernel.kallsyms] [k] enqueue_task_fair 3.63% 0.00% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page 2.86% 0.00% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq 2.78% 0.00% 0.00% swapper [kernel.kallsyms] [k] __schedule 2.34% 0.00% 0.00% swapper [kernel.kallsyms] [k] intel_idle 2.32% 0.97% 0.00% swapper [kernel.kallsyms] [k] psi_group_change After) # # Samples: 232 of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P' # Event count (approx.): 3089861 # # Overhead Command Shared Object Symbol # ................ ........... ................. ..................................... # 9.29% 0.00% swapper [kernel.kallsyms] [k] update_blocked_averages 5.26% 0.15% swapper [kernel.kallsyms] [k] __update_load_avg_se 4.15% 0.00% perf-exec [kernel.kallsyms] [k] slab_update_freelist.isra.0 3.87% 0.00% perf-exec [kernel.kallsyms] [k] memcg_slab_post_alloc_hook 3.79% 0.17% swapper [kernel.kallsyms] [k] enqueue_task_fair 3.63% 0.00% sleep [kernel.kallsyms] [k] next_uptodate_page 2.86% 0.00% swapper [kernel.kallsyms] [k] __update_load_avg_cfs_rq 2.78% 0.00% swapper [kernel.kallsyms] [k] __schedule 2.34% 0.00% swapper [kernel.kallsyms] [k] intel_idle 2.32% 0.97% swapper [kernel.kallsyms] [k] psi_group_change Now it doesn't have a column for the dummy event. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-5-namhyung@kernel.org
2024-06-15	perf hist: Add symbol_conf.skip_empty	Namhyung Kim
	Add the skip_empty flag to symbol_conf and set the value from the report command to preserve the existing behavior. This makes the code simpler and will be needed other code which is hard to add a new argument. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-4-namhyung@kernel.org
2024-06-15	perf hist: Simplify __hpp_fmt() using hpp_fmt_data	Namhyung Kim
	The struct hpp_fmt_data is to keep the values for each group members so it doesn't need to check the event index in the group. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-3-namhyung@kernel.org
2024-06-15	perf hist: Factor out __hpp__fmt_print()	Namhyung Kim
	Split the logic to print the histogram values according to the format string. This was used in 3 different places so it's better to move out the logic into a function. Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607202918.2357459-2-namhyung@kernel.org
2024-06-15	perf: sched map skips redundant lines with cpu filters	Fernand Sieber
	perf sched map supports cpu filter. However, even with cpu filters active, any context switch currently corresponds to a separate line. As result, context switches on irrelevant cpus result to redundant lines, which makes the output particlularly difficult to read on wide architectures. Fix it by skipping printing for irrelevant CPUs. Example snippet of output before fix: B0 1.461147 secs B0 B0 B0 G0 1.517139 secs After fix: B0 1.461147 secs G0 1.517139 secs Signed-off-by: Fernand Sieber <sieberf@amazon.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-and-tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240614073517.94974-1-sieberf@amazon.com
2024-06-13	perf test pmu: Warn don't fail for legacy mixed case event names	Ian Rogers
	PowerPC has mixed case events matching legacy hardware cache events. Warn but don't fail in this case. Event parsing will still work in this case by matching the legacy case. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kajol Jain <kjain@linux.ibm.com> Cc: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240612124027.2712643-1-irogers@google.com
2024-06-13	tools/perf: Fix timing issue with parallel threads in perf bench ↵	Athira Rajeev
	wake-up-parallel perf bench futex fails as below and hangs intermittently when attempted to run on on a powerpc system: ./perf bench futex wake-parallel Running 'futex/wake-parallel' benchmark: Run summary [PID 88588]: blocking on 640 threads (at [private] futex 0x10464b8c), 640 threads waking up 1 at a time. [Run 1]: Avg per-thread latency (waking 1/640 threads) in 0.1309 ms (+-53.27%) [Run 2]: Avg per-thread latency (waking 1/640 threads) in 0.0120 ms (+-31.16%) [Run 3]: Avg per-thread latency (waking 1/640 threads) in 0.1474 ms (+-92.47%) [Run 4]: Avg per-thread latency (waking 1/640 threads) in 0.2883 ms (+-67.75%) [Run 5]: Avg per-thread latency (waking 1/640 threads) in 0.4108 ms (+-39.60%) [Run 6]: Avg per-thread latency (waking 1/640 threads) in 0.7843 ms (+-78.98%) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) perf: couldn't wakeup all tasks (0/1) In the system, where perf bench wake-up-parallel is has system configuration of 640 cpus. After debugging, this turned out to be a timing issue. The benchmark creates threads equal to number of cpus and issues a futex_wait. Then it does a usleep for .1 second before initiating futex_wake. In system configuration with more threads, the usleep time is not enough. Patch changes the usleep from 100000 to 200000 With the patch, ran multiple iterations and there were no issues further seen Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607044354.82225-3-atrajeev@linux.vnet.ibm.com
2024-06-13	tools/perf: Fix perf bench epoll to enable the run when some CPU's are offline	Athira Rajeev
	Perf bench epoll fails as below when attempted to run on on a powerpc system: ./perf bench epoll wait Running 'epoll/wait' benchmark: Run summary [PID 627653]: 79 threads monitoring on 64 file-descriptors for 8 secs. perf: pthread_create: No such file or directory In the setup where this perf bench was ran, difference was that partition had 640 CPU's, but not all CPUs were online. 80 CPUs were online. While creating threads and using epoll_wait , code sets the affinity using cpumask. The cpumask size used is 80 which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the benchmark reports fail while setting affinity for cpu number which is greater than 80 or higher, because it attempts to set a bit position which is not allocated on the cpumask. Fix this by changing the size of cpumask to number of possible cpus and not the number of online cpus. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607044354.82225-2-atrajeev@linux.vnet.ibm.com
2024-06-13	tools/perf: Fix perf bench futex to enable the run when some CPU's are offline	Athira Rajeev
	Perf bench futex fails as below when attempted to run on on a powerpc system: ./perf bench futex all Running futex/hash benchmark... Run summary [PID 626307]: 80 threads, each operating on 1024 [private] futexes for 10 secs. perf: pthread_create: No such file or directory In the setup where this perf bench was ran, difference was that partition had 640 CPU's, but not all CPUs were online. 80 CPUs were online. While blocking the threads with futex_wait, code sets the affinity using cpumask. The cpumask size used is 80 which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the benchmark reports fail while setting affinity for cpu number which is greater than 80 or higher, because it attempts to set a bit position which is not allocated on the cpumask. Fix this by changing the size of cpumask to number of possible cpus and not the number of online cpus. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Disha Goel <disgoel@linux.ibm.com> Cc: akanksha@linux.ibm.com Cc: kjain@linux.ibm.com Cc: maddy@linux.ibm.com Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607044354.82225-1-atrajeev@linux.vnet.ibm.com
2024-06-13	perf record: Ensure space for lost samples	Ian Rogers
	Previous allocation didn't account for sample ID written after the lost samples event. Switch from malloc/free to a stack allocation. Reported-by: Milian Wolff <milian.wolff@kdab.com> Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240611050626.1223155-1-irogers@google.com
2024-06-10	perf evsel: Refactor tool events	Ian Rogers
	Tool events unnecessarily open a dummy perf event which is useless even with `perf record` which will still open a dummy event. Change the behavior of tool events so: - duration_time - call `rdclock` on open and then report the count as a delta since the start in evsel__read_counter. This moves code out of builtin-stat making it more general purpose. - user_time/system_time - open the fd as either `/proc/pid/stat` or `/proc/stat` for cases like system wide. evsel__read_counter will read the appropriate field out of the procfs file. These values were previously supplied by wait4, if the procfs read fails then the wait4 values are used, assuming the process/thread terminated. By reading user_time and system_time this way, interval mode, per PID and per CPU can be supported although there are restrictions given what the files provide (e.g. per PID can't be combined with per CPU). Opening any of the tool events for `perf record` is changed to return invalid. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240503232849.17752-1-irogers@google.com
2024-06-07	perf test: Speed up test case 70 annotate basic tests	Thomas Richter
	On some s390 linux machine (mostly older models) and with debug packages installed, the test case 'perf annotate basic tests' runs for some longer time. Speed up the test and save the output of command perf annotate in a temporary file. This is used to perform pattern matching via grep command. This saves on invocation of perf annotate which runs for some time. Output before: # time bash -x tests/shell/annotate.sh >/dev/null 2>&1; echo EXIT CODE $? real 4m35.543s user 3m19.442s sys 1m14.322s EXIT CODE 0 # Output after: # time bash -x tests/shell/annotate.sh >/dev/null 2>&1; echo EXIT CODE $? real 2m2.881s user 1m30.980s sys 0m30.684s EXIT CODE 0 # Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240607054352.2774936-1-tmricht@linux.ibm.com
2024-06-07	perf stat: Choose the most disaggregate command line option	Ian Rogers
	When multiple aggregation options are passed to perf stat the behavior isn't clear. Consider "perf stat -A --per-socket .." and "perf stat --per-socket -A ..", the first won't aggregate at all while the second will do per-socket aggregation, even though the same options were passed. Rather than set an enum value, gather the options in a struct and process them from most to least aggregate. This ensures the least aggregate option always applies, so no aggregation if "-A" is passed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605063828.195700-2-irogers@google.com
2024-06-07	perf stat: Make options local	Ian Rogers
	Reduce the scope of stat_options to cmd_stat, and pass as an argument to __cmd_record. This is done to make more localized changes to the options in later patches. A side-effect of the change is to reduce the size of a stripped PIE perf binary by 5952 bytes. The savings come mainly in the dynamic relocation section. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605063828.195700-1-irogers@google.com
2024-06-06	perf maps: Add/use a sorted insert for fixup overlap and insert	Ian Rogers
	Data may have lots of overlapping mmaps. The regular insert adds at the end and relies on a later sort. For data with overlapping mappings the sort will happen during a subsequent maps__find or __maps__fixup_overlap_and_insert, there's never a period where the inserted maps buffer up and a single sort happens. To avoid back to back sorts, maintain the sort order when fixing up and inserting. Previously the first_ending_after search was O(log n) where n is the size of maps, and the insert was O(1) but because of the continuous sorting was becoming O(n*log(n)). With maintaining sort order, the insert now becomes O(n) for a memmove. For a perf report on a perf.data file containing overlapping mappings the time numbers are: Before: real 0m5.894s user 0m5.650s sys 0m0.231s After: real 0m0.675s user 0m0.454s sys 0m0.196s Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Steinar H . Gunderson <sesse@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521165109.708593-4-irogers@google.com
2024-06-06	perf maps: Reduce sorting for overlapping mappings	Ian Rogers
	When an 'after' map is generated the 'new' map must be before it so terminate iterating and don't resort. If the entry 'pos' is entirely overlapped by the 'new' mapping then don't remove and insert the mapping, just replace - again to remove sorting. For a perf report on a perf.data file containing overlapping mappings the time numbers are: Before: real 0m9.856s user 0m9.637s sys 0m0.204s After: real 0m5.894s user 0m5.650s sys 0m0.231s Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Steinar H . Gunderson <sesse@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521165109.708593-3-irogers@google.com
2024-06-06	perf maps: Fix use after free in __maps__fixup_overlap_and_insert	Ian Rogers
	In the case 'before' and 'after' are broken out from pos, maps_by_address may be changed by __maps__insert, as such it needs re-reading. Don't ignore the return value from __maps_insert. Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Steinar H . Gunderson <sesse@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521165109.708593-2-irogers@google.com
2024-06-06	perf script: netdev-times: add location parameter to consume_skb	Lucas Stach
	dd1b527831a3 ("net: add location to trace_consume_skb()") added a new parameter to the consume_skb tracepoint. Adapt the script to match. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kernel@pengutronix.de Cc: patchwork-lst@pengutronix.de Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605144442.1985270-1-l.stach@pengutronix.de
2024-06-06	perf: parse-events: Fix compilation error while defining DEBUG_PARSER	Clément Le Goffic
	Compiling perf tool with 'DEBUG_PARSER=1' leads to errors: $> make -C tools/perf PARSER_DEBUG=1 NO_LIBTRACEEVENT=1 ... CC util/expr-flex.o CC util/expr.o util/parse-events.c:33:12: error: redundant redeclaration of ‘parse_events_debug’ [-Werror=redundant-decls] 33 \| extern int parse_events_debug; \| ^~~~~~~~~~~~~~~~~~ In file included from util/parse-events.c:18: util/parse-events-bison.h:43:12: note: previous declaration of ‘parse_events_debug’ with type ‘int’ 43 \| extern int parse_events_debug; \| ^~~~~~~~~~~~~~~~~~ util/expr.c:27:12: error: redundant redeclaration of ‘expr_debug’ [-Werror=redundant-decls] 27 \| extern int expr_debug; \| ^~~~~~~~~~ In file included from util/expr.c:11: util/expr-bison.h:43:12: note: previous declaration of ‘expr_debug’ with type ‘int’ 43 \| extern int expr_debug; \| ^~~~~~~~~~ cc-1: all warnings being treated as errors Remove extern declaration from the parse-envents.c file as there is a conflict with the ones generated using bison and yacc tools from the file parse-events.[ly]. Signed-off-by: Clément Le Goffic <clement.legoffic@foss.st.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605140453.614862-1-clement.legoffic@foss.st.com
2024-06-05	perf bpf: Fix handling of minimal vmlinux.h file when interrupting the build	Namhyung Kim
	Ingo reported that he was seeing these when hitting Control+C during a perf tools build: Makefile.perf:1149: *** Missing bpftool input for generating vmlinux.h. Stop. The failure happens when you don't have vmlinux.h or vmlinux with BTF. ifeq ($(VMLINUX_H),) ifeq ($(VMLINUX_BTF),) $(error Missing bpftool input for generating vmlinux.h) endif endif VMLINUX_BTF can be empty if you didn't build a kernel or it doesn't have a BTF section and the current kernel also has no BTF. This is totally ok. But VMLINUX_H should be set to the minimal version in the source tree (unless you overwrite it manually) when you don't pass GEN_VMLINUX_H=1 (which requires VMLINUX_BTF should not be empty). The problem is that it's defined in Makefile.config which is not included for `make clean`. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Link: http://lore.kernel.org/lkml/CAM9d7ch5HTr+k+_GpbMrX0HUo5BZ11byh1xq0Two7B7RQACuNw@mail.gmail.com Link: http://lore.kernel.org/lkml/ZjssGrj+abyC6mYP@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-05	Revert "perf record: Reduce memory for recording PERF_RECORD_LOST_SAMPLES event"	Arnaldo Carvalho de Melo
	This reverts commit 7d1405c71df21f6c394b8a885aa8a133f749fa22. This causes segfaults in some cases, as reported by Milian: ``` sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e raw_syscalls:sys_enter ls ... [ perf record: Woken up 3 times to write data ] malloc(): invalid next size (unsorted) Aborted ``` Backtrace with GDB + debuginfod: ``` malloc(): invalid next size (unsorted) Thread 1 "perf" received signal SIGABRT, Aborted. __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c 44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; (gdb) bt #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78 #2 0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/ raise.c:26 #3 0x00007ffff6e384c3 in __GI_abort () at abort.c:79 #4 0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea "%s\n") at ../sysdeps/posix/libc_fatal.c:132 #5 0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850 "malloc(): invalid next size (unsorted)") at malloc.c:5772 #6 0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0 <main_arena>, bytes=bytes@entry=368) at malloc.c:4081 #7 0x00007ffff6eb877e in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3754 #8 0x000055555569bdb6 in perf_session.do_write_header () #9 0x00005555555a373a in __cmd_record.constprop.0 () #10 0x00005555555a6846 in cmd_record () #11 0x000055555564db7f in run_builtin () #12 0x000055555558ed77 in main () ``` Valgrind memcheck: ``` ==45136== Invalid write of size 8 ==45136== at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf) ==45136== by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ==45136== Syscall param write(buf) points to unaddressable byte(s) ==45136== at 0x575953D: __libc_write (write.c:26) ==45136== by 0x575953D: write (write.c:24) ==45136== by 0x35761F: ion (in /usr/bin/perf) ==45136== by 0x357778: writen (in /usr/bin/perf) ==45136== by 0x1548F7: record__write (in /usr/bin/perf) ==45136== by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ----- Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/ Reported-by: Milian Wolff <milian.wolff@kdab.com> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@kernel.org # 6.8+ Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-06-04	perf hisi-ptt: remove unused struct 'hisi_ptt_queue'	Dr. David Alan Gilbert
	'hisi_ptt_queue' has been unused since the original commit 5e91e57e6809 ("perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: yangyicong@hisilicon.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240602000709.213116-1-linux@treblig.org
2024-06-03	perf genelf: remove unused struct 'options'	Dr. David Alan Gilbert
	'options' has been unused since commit fa7f7e735495 ("perf jit: Move test functionality in to a test"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240602000505.213032-1-linux@treblig.org
2024-06-03	perf lock info: Display both map and thread by default	Nick Forrington
	Change "perf lock info" argument handling to: Display both map and thread info (rather than an error) when neither are specified. Display both map and thread info (rather than just thread info) when both are requested. Signed-off-by: Nick Forrington <nick.forrington@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240513091413.738537-2-nick.forrington@arm.com
2024-05-30	perf top: Allow filters on events	Ian Rogers
	Allow filters to be added to perf top events. One use is to workaround issues with: ``` $ perf top --uid="$(id -u)" ``` which tries to scan /proc find processes belonging to the uid and can fail in such a pid terminates between the scan and the perf_event_open reporting: ``` Error: The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:P). /bin/dmesg \| grep -i perf may provide additional information. ``` A similar filter: ``` $ perf top -e cycles:P --filter "uid == $(id -u)" ``` doesn't fail this way. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240524205227.244375-4-irogers@google.com
2024-05-30	perf bpf filter: Add uid and gid terms	Ian Rogers
	Allow the BPF filter to use the uid and gid terms determined by the bpf_get_current_uid_gid BPF helper. For example, the following will record the cpu-clock event system wide discarding samples that don't belong to the current user. $ perf record -e cpu-clock --filter "uid == $(id -u)" -a sleep 0.1 Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240524205227.244375-3-irogers@google.com
2024-05-30	perf bpf filter: Give terms their own enum	Ian Rogers
	Give the term types their own enum so that additional terms can be added that don't correspond to a PERF_SAMPLE_xx flag. The term values are numerically ascending rather than bit field positions, this means they need translating to a PERF_SAMPLE_xx bit field in certain places using a shift. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240524205227.244375-2-irogers@google.com
2024-05-29	perf trace beauty: Always show mmap prot even though PROT_NONE	Changbin Du
	PROT_NONE is also useful information, so do not omit the mmap prot even though it is 0. syscall_arg__scnprintf_mmap_prot() could print PROT_NONE for prot 0. Before: PROT_NONE is not shown. $ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0 -- ls 0.000 ls/2979231 syscalls:sys_enter_mmap(len: 4220888, flags: PRIVATE\|ANONYMOUS) After: PROT_NONE is displayed. $ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0 -- ls 0.000 ls/2975708 syscalls:sys_enter_mmap(len: 4220888, prot: NONE, flags: PRIVATE\|ANONYMOUS) Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240522033542.1359421-3-changbin.du@huawei.com
2024-05-29	perf trace beauty: Always show param if show_zero is set	Changbin Du
	For some parameters, it is best to also display them when they are 0, e.g. flags. Here we only check the show_zero property and let arg printer handle special cases. Signed-off-by: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240522033542.1359421-2-changbin.du@huawei.com
2024-05-28	perf docs: Fix typos	Ian Rogers
	Assorted typo fixes. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240521223555.858859-1-irogers@google.com
2024-05-28	perf list: Fix the --no-desc option	Breno Leitao
	Currently, the --no-desc option in perf list isn't functioning as intended. This issue arises from the overwriting of struct option->desc with the opposite value of struct option->long_desc. Consequently, whatever parse_options() returns at struct option->desc gets overridden later, rendering the --desc or --no-desc arguments ineffective. To resolve this, set ->desc as true by default and allow parse_options() to adjust it accordingly. This adjustment will fix the --no-desc option while preserving the functionality of the other parameters. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: leit@meta.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240517141427.1905691-1-leitao@debian.org
2024-05-28	perf arm-spe: Unaligned pointer work around	Ian Rogers
	Use get_unaligned_leXX instead of leXX_to_cpu to handle unaligned pointers. Such pointers occur with libFuzzer testing. A similar change for intel-pt was done in: https://lore.kernel.org/r/20231005190451.175568-6-adrian.hunter@intel.com Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240514052402.3031871-1-irogers@google.com
2024-05-28	perf tests: Add some pmu core functionality tests	Ian Rogers
	Test behavior of PMU names and comparisons wrt suffixes using Intel uncore_cha, marvell mrvl_ddr_pmu and S390's cpum_cf as examples. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: Bhaskara Budiredla <bbudiredla@marvell.com> Cc: Tuan Phan <tuanphan@os.amperecomputing.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515060114.3268149-3-irogers@google.com
2024-05-28	perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu	Ian Rogers
	The mrvl_ddr_pmu is uncore and has a hexadecimal address suffix while the previous PMU sorting/merging code assumes uncore PMU names start with uncore_ and have a decimal suffix. Because of the previous assumption it isn't possible to wildcard the mrvl_ddr_pmu. Modify pmu_name_len_no_suffix but also remove the suffix number out argument, this is because we don't know if a suffix number of say 100 is in hexadecimal or decimal. As the only use of the suffix number is in comparisons, it is safe there to compare the values as hexadecimal. Modify perf_pmu__match_ignoring_suffix so that hexadecimal suffixes are ignored. Only allow hexadecimal suffixes to be greater than length 2 (ie 3 or more) so that S390's cpum_cf PMU doesn't lose its suffix. Change the return type of pmu_name_len_no_suffix to size_t to workaround GCC incorrectly determining the result could be negative. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Bharat Bhushan <bbhushan2@marvell.com> Cc: Bhaskara Budiredla <bbudiredla@marvell.com> Cc: Tuan Phan <tuanphan@os.amperecomputing.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515060114.3268149-2-irogers@google.com
2024-05-28	tools headers: Update the syscall tables and unistd.h, mostly to support the ↵	Arnaldo Carvalho de Melo
	new 'mseal' syscall But also to wire up shadow stacks on 32-bit x86, picking up those changes from these csets: ff388fe5c481d39c ("mseal: wire up mseal syscall") 2883f01ec37dd866 ("x86/shstk: Enable shadow stacks for x32") This makes 'perf trace' support it, now its possible, for instance to do: # perf trace -e mseal --max-stack=16 Here is an example with the 'sendmmsg' syscall: root@x1:~# perf trace -e sendmmsg --max-stack 16 --max-events=1 0.000 ( 0.062 ms): dbus-broker/1012 sendmmsg(fd: 150, mmsg: 0x7ffef57cca50, vlen: 1, flags: DONTWAIT\|NOSIGNAL) = 1 syscall_exit_to_user_mode_prepare ([kernel.kallsyms]) syscall_exit_to_user_mode_prepare ([kernel.kallsyms]) syscall_exit_to_user_mode ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64 ([kernel.kallsyms]) [0x117ce7] (/usr/lib64/libc.so.6 (deleted)) root@x1:~# To do a system wide tracing of the new 'mseal' syscall with a backtrace of at most 16 entries. This addresses these perf tools build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H J Lu <hjl.tools@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZlXlo4TNcba4wnVZ@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27	perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with ↵	Arnaldo Carvalho de Melo
	the kernel sources to pick POSTED_MSI_NOTIFICATION To pick up the change in: f5a3562ec9dd29e6 ("x86/irq: Reserve a per CPU IDT vector for posted MSIs") That picks up this new vector: $ cp arch/x86/include/asm/irq_vectors.h tools/perf/trace/beauty/arch/x86/include/asm/irq_vectors.h $ tools/perf/trace/beauty/tracepoints/x86_irq_vectors.sh > after $ diff -u before after --- before 2024-05-27 12:50:47.708863932 -0300 +++ after 2024-05-27 12:51:15.335113123 -0300 @@ -1,6 +1,7 @@ static const char x86_irq_vectors[] = { [0x02] = "NMI", [0x80] = "IA32_SYSCALL", + [0xeb] = "POSTED_MSI_NOTIFICATION", [0xec] = "LOCAL_TIMER", [0xed] = "HYPERV_STIMER0", [0xee] = "HYPERV_REENLIGHTENMENT", $ Now those will be known when pretty printing the irq_vectors: tracepoints. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/ZlS34M0x30EFVhbg@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27	perf beauty: Update copy of linux/socket.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick up the fixes in: 0645fbe760afcc53 ("net: have do_accept() take a struct proto_accept_arg argument") That just changes a function prototype, not touching things used by the perf scrape scripts such as: $ tools/perf/trace/beauty/sockaddr.sh \| head -5 static const char *socket_families[] = { [0] = "UNSPEC", [1] = "LOCAL", [2] = "INET", [3] = "AX25", $ This addresses this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZlSrceExgjrUiDb5@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27	tools headers UAPI: Sync fcntl.h with the kernel sources to pick F_DUPFD_QUERY	Arnaldo Carvalho de Melo
	There is no scrape script yet for those, but the warning pointed out we need to update the array with the F_LINUX_SPECIFIC_BASE entries, do it. Now 'perf trace' can decode that cmd and also use it in filter, as in: root@number:~# perf trace -e syscalls:*enter_fcntl --filter 'cmd != SETFL && cmd != GETFL' 0.000 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLK, arg: 0x7fffdc6a8a50) 0.013 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a8aa0) 0.090 sssd_kcm/303828 syscalls:sys_enter_fcntl(fd: 13</var/lib/sss/secrets/secrets.ldb>, cmd: SETLKW, arg: 0x7fffdc6a88e0) ^Croot@number:~# This picks up the changes in: c62b758bae6af16f ("fcntl: add F_DUPFD_QUERY fcntl()") Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZlSqNQH9mFw2bmjq@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27	tools headers UAPI: Sync linux/prctl.h with the kernel sources	Arnaldo Carvalho de Melo
	To pick the changes in: 628d701f2de5b9a1 ("powerpc/dexcr: Add DEXCR prctl interface") 6b9391b581fddd85 ("riscv: Include riscv_set_icache_flush_ctx prctl") That adds some PowerPC and a RISC-V specific prctl options: $ tools/perf/trace/beauty/prctl_option.sh > before $ cp include/uapi/linux/prctl.h tools/perf/trace/beauty/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after $ diff -u before after --- before 2024-05-27 12:14:21.358032781 -0300 +++ after 2024-05-27 12:14:32.364530185 -0300 @@ -65,6 +65,9 @@ [68] = "GET_MEMORY_MERGE", [69] = "RISCV_V_SET_CONTROL", [70] = "RISCV_V_GET_CONTROL", + [71] = "RISCV_SET_ICACHE_FLUSH_CTX", + [72] = "PPC_GET_DEXCR", + [73] = "PPC_SET_DEXCR", }; static const char *prctl_set_mm_options[] = { [1] = "START_CODE", $ That now will be used to decode the syscall option and also to compose filters, for instance: [root@five ~]# perf trace -e syscalls:sys_enter_prctl --filter option==SET_NAME 0.000 Isolated Servi/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23f13b7aee) 0.032 DOM Worker/3474327 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23deb25670) 7.920 :3474328/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fbb10) 7.935 StreamT~s #374/3474328 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24fb970) 8.400 Isolated Servi/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24bab10) 8.418 StreamT~s #374/3474329 syscalls:sys_enter_prctl(option: SET_NAME, arg2: 0x7f23e24ba970) ^C[root@five ~]# This addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/lkml/ZlSklGWp--v_Ije7@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-27	tools include UAPI: Sync linux/stat.h with the kernel sources	Arnaldo Carvalho de Melo
	To get the changes in: 2a82bb02941fb53d ("statx: stx_subvol") To pick up this change and support it: $ tools/perf/trace/beauty/statx_mask.sh > before $ cp include/uapi/linux/stat.h tools/perf/trace/beauty/include/uapi/linux/stat.h $ tools/perf/trace/beauty/statx_mask.sh > after $ diff -u before after --- before 2024-05-22 13:39:49.742470571 -0300 +++ after 2024-05-22 13:39:59.157883101 -0300 @@ -14,4 +14,5 @@ [ilog2(0x00001000) + 1] = "MNT_ID", [ilog2(0x00002000) + 1] = "DIOALIGN", [ilog2(0x00004000) + 1] = "MNT_ID_UNIQUE", + [ilog2(0x00008000) + 1] = "SUBVOL", }; $ Now we'll see it like we see these: # perf trace -e statx 0.000 ( 0.015 ms): systemd-userwo/3982299 statx(dfd: 6, filename: ".", mask: TYPE\|INO\|MNT_ID, buffer: 0x7ffd8945e850) = 0 <SNIP> 180.559 ( 0.007 ms): (ostnamed)/3982957 statx(dfd: 4, filename: "sys", flags: SYMLINK_NOFOLLOW\|NO_AUTOMOUNT\|STATX_DONT_SYNC, mask: TYPE, buffer: 0x7fff13161190) = 0 180.918 ( 0.011 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/kernel/security", flags: SYMLINK_NOFOLLOW\|NO_AUTOMOUNT\|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0 180.956 ( 0.010 ms): (ostnamed)/3982957 statx(dfd: CWD, filename: "/run/systemd/mount-rootfs/sys/fs/cgroup", flags: SYMLINK_NOFOLLOW\|NO_AUTOMOUNT\|STATX_DONT_SYNC, mask: MNT_ID, buffer: 0x7fff13161120) = 0 <SNIP> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zk5nO9yT0oPezUoo@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-26	Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"	Arnaldo Carvalho de Melo
	This reverts commit 617824a7f0f73e4de325cf8add58e55b28c12493. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed at length in the threads in the Link tags below. The fix provided by Ian wasn't acceptable and work to fix this will take time we don't have at this point, so lets revert this and work on it on the next devel cycle. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-21	Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "General: - Integrate the shellcheck utility with the build of perf to allow catching shell problems early in areas such as 'perf test', 'perf trace' scrape scripts, etc - Add 'uretprobe' variant in the 'perf bench uprobe' tool - Add script to run instances of 'perf script' in parallel - Allow parsing tracepoint names that start with digits, such as 9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems where those tracepoints aren't available - Add Kan Liang to MAINTAINERS as a perf tools reviewer - Add support for using the 'capstone' disassembler library in various tools, such as 'perf script' and 'perf annotate'. This is an alternative for the use of the 'xed' and 'objdump' disassemblers Data-type profiling improvements: - Resolve types for a->b->c by backtracking the assignments until it finds DWARF info for one of those members - Support for global variables, keeping a cache to speed up lookups - Handle the 'call' instruction, dealing with effects on registers and handling its return when tracking register data types - Handle x86's segment based addressing like %gs:0x28, to support things like per CPU variables, the stack canary, etc - Data-type profiling got big speedups when using capstone for disassembling. The objdump outoput parsing method is left as a fallback when capstone fails or isn't available. There are patches posted for 6.11 that to use a LLVM disassembler - Support event group display in the TUI when annotating types with --data-type, for instance to show memory load and store events for the data type fields - Optimize the 'perf annotate' data structures, reducing memory usage - Add a initial 'perf test' for 'perf annotate', checking that a target symbol appears on the output, specifying objdump via the command line, etc Vendor Events: - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors" document, with events that capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/ AmpereOneX Miscellaneous: - Sync header copies with the kernel sources - Move some header copies used only for generating translation string tables for ioctl cmds and other syscall integer arguments to a new directory under tools/perf/beauty/, to separate from copies in tools/include/ that are used to build the tools - Introduce scrape script for several syscall 'flags'/'mask' arguments - Improve cpumap utilization, fixing up pairing of refcounts, using the right iterators (perf_cpu_map__for_each_cpu), etc - Give more details about raw event encodings in 'perf list', show tracepoint encoding in the detailed output - Refactor the DSOs handling code, reducing memory usage - Document the BPF event modifier and add a 'perf test' for it - Improve the event parser, better error messages and add further 'perf test's for it - Add reference count checking to 'struct comm_str' and 'struct mem_info' - Make ARM64's 'perf test' entries for the Neoverse N1 more robust - Tweak the ARM64's Coresight 'perf test's - Improve ARM64's CoreSight ETM version detection and error reporting - Fix handling of symbols when using kcore - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual machines in 'perf report' - Fix -g/--call-graph option failure in 'perf sched timehist' - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent installed in non-standard directories, such as when doing cross builds - Various 'perf test' and 'perf bench' fixes - Improve 'perf probe' error message for long C++ probe names" * tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits) tools lib subcmd: Show parent options in help perf pmu: Count sys and cpuid JSON events separately perf stat: Don't display metric header for non-leader uncore events perf annotate-data: Ensure the number of type histograms perf annotate: Fix segfault on sample histogram perf daemon: Fix file leak in daemon_session__control libsubcmd: Fix parse-options memory leak perf lock: Avoid memory leaks from strdup() perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency perf tools: Ignore deleted cgroups perf parse: Allow tracepoint names to start with digits perf parse-events: Add new 'fake_tp' parameter for tests perf parse-events: pass parse_state to add_tracepoint perf symbols: Fix ownership of string in dso__load_vmlinux() perf symbols: Update kcore map before merging in remaining symbols perf maps: Re-use __maps__free_maps_by_name() perf symbols: Remove map from list before updating addresses perf tracepoint: Don't scan all tracepoints to test if one exists perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT perf thread: Fixes to thread__new() related to initializing comm ...
2024-05-11	perf pmu: Count sys and cpuid JSON events separately	Ian Rogers
	Sys events are eagerly loaded as each event has a compat option that may mean the event is or isn't associated with the PMU. These shouldn't be counted as loaded_json_events as that is used for JSON events matching the CPUID that may or may not have been loaded. The mismatch causes issues on ARM64 that uses sys events. Fixes: e6ff1eed3584362d ("perf pmu: Lazily add JSON events") Closes: https://lore.kernel.org/lkml/20240510024729.1075732-1-justin.he@arm.com/ Reported-by: Jia He <justin.he@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240511003601.2666907-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	perf stat: Don't display metric header for non-leader uncore events	Ian Rogers
	On an Intel tigerlake laptop a metric like: { "BriefDescription": "Test", "MetricExpr": "imc_free_running@data_read@ + imc_free_running@data_write@", "MetricGroup": "Test", "MetricName": "Test", "ScaleUnit": "6.103515625e-5MiB" }, Will have 4 events: uncore_imc_free_running_0/data_read/ uncore_imc_free_running_0/data_write/ uncore_imc_free_running_1/data_read/ uncore_imc_free_running_1/data_write/ If aggregration is disabled with metric-only 2 column headers are needed: $ perf stat -M test --metric-only -A -a sleep 1 Performance counter stats for 'system wide': MiB Test MiB Test CPU0 1821.0 1820.5 But when not, the counts aggregated in the metric leader and only 1 column should be shown: $ perf stat -M test --metric-only -a sleep 1 Performance counter stats for 'system wide': MiB Test 5909.4 1.001258915 seconds time elapsed Achieve this by skipping events that aren't metric leaders when printing column headers and aggregation isn't disabled. The bug is long standing, the fixes tag is set to a refactor as that is as far back as is reasonable to backport. Fixes: 088519f318be3a41 ("perf stat: Move the display functions to stat-display.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaige Ye <ye@kaige.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240510051309.2452468-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	perf annotate-data: Ensure the number of type histograms	Namhyung Kim
	Arnaldo reported that there is a case where nr_histograms and histograms don't agree each other. It ended up in a segfault trying to access a NULL histograms array. Let's make sure to update the nr_histograms when the histograms array is changed. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510210452.2449944-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	perf annotate: Fix segfault on sample histogram	Namhyung Kim
	A symbol can have no samples, then accessing the annotated_source->samples hashmap will result in a segfault. Fixes: a3f7768bcf48281d ("perf annotate: Fix memory leak in annotated_source") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510210452.2449944-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-10	perf daemon: Fix file leak in daemon_session__control	Samasth Norway Ananda
	The open() function returns -1 on error. The 'control' and 'ack' file descriptors are both initialized with open() and further validated with 'if' statement. 'if (!control)' would evaluate to 'true' if returned value on error were '0' but it is actually '-1'. Fixes: edcaa47958c7438b ("perf daemon: Add 'ping' command") Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510003424.2016914-1-samasth.norway.ananda@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>