Age | Commit message (Collapse) | Author |
|
Add Neoverse-V2 MIDR to the common data source encoding range list.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-7-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The 'midr' field is replaced by the MIDR values stored in metadata (per
CPU wise). Remove the 'midr' field as it is no longer used.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Use the info in the metadata to decide if the data source feature is
supported. The CPU MIDR must be in the CPU list for the common data
source encoding.
For the metadata version 1, it doesn't include info for MIDR. In this
case, due to absent info for making decision, print out warning to
remind users to upgrade tool and returns false.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce the arm_spe__is_homogeneous() function, it uses to check if
Arm SPE is homogeneous cross all CPUs.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The Neoverse CPUs follow the common data source encoding, and other
CPU variants can share the same format.
Rename the CPU list and data source definitions as common data source
names. This change prepares for appending more CPU variants.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The arm_spe__synth_data_source_generic() function is invoked when the
tool detects that CPUs do not support data source packets and falls back
to synthesizing only the memory level.
Rename it to arm_spe__synth_memory_level() for better reflecting its
purpose.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20241003185322.192357-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
As Ian Rogers <irogers@google.com> pointed out, intel-cqm.c is neither
used nor built. It was deleted in the following commit:
commit b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license")
However, it resurfaced soon after in the following commit:
commit 5c9295bfe6f5 ("perf tests: Remove Intel CQM perf test")
It should be deleted once and for all.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Howard Chu <howardchu95@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Matt Fleming <mfleming@cloudflare.com>
Link: https://lore.kernel.org/r/20241011055700.4142694-1-howardchu95@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It should not clear the inherit bit simply because the kernel doesn't
support the sample read with it. IOW the inherit bit should be kept
when the sample read is not requested for the event.
Fixes: 90035d3cd876cb71 ("tools/perf: Allow inherit + PERF_SAMPLE_READ when opening events")
Acked-by: Ben Gainey <ben.gainey@arm.com>
Link: https://lore.kernel.org/r/20241009062250.730192-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
pre-migration wait time is the time that a task unnecessarily spends
on the runqueue of a CPU but doesn't get switched-in there. In terms
of tracepoints, it is the time between sched:sched_wakeup and
sched:sched_migrate_task.
Let's say a task woke up on CPU2, then it got migrated to CPU4 and
then it's switched-in to CPU4. So, here pre-migration wait time is
time that it was waiting on runqueue of CPU2 after it is woken up.
The general pattern for pre-migration to occur is:
sched:sched_wakeup
sched:sched_migrate_task
sched:sched_switch
The sched:sched_waking event is used to capture the wakeup time,
as it aligns with the existing code and only introduces a negligible
time difference.
pre-migrations are generally not useful and it increases migrations.
This metric would be helpful in testing patches mainly related to wakeup
and load-balancer code paths as better wakeup logic would choose an
optimal CPU where task would be switched-in and thereby reducing pre-
migrations.
The sample output(s) when -P or --pre-migrations is used:
=================
time cpu task name wait time sch delay run time pre-mig time
[tid/pid] (msec) (msec) (msec) (msec)
--------------- ------ ------------------------------ --------- --------- --------- ---------
38456.720806 [0001] schbench[28634/28574] 4.917 4.768 1.004 0.000
38456.720810 [0001] rcu_preempt[18] 3.919 0.003 0.004 0.000
38456.721800 [0006] schbench[28779/28574] 23.465 23.465 1.999 0.000
38456.722800 [0002] schbench[28773/28574] 60.371 60.237 3.955 60.197
38456.722806 [0001] schbench[28634/28574] 0.004 0.004 1.996 0.000
38456.722811 [0001] rcu_preempt[18] 1.996 0.005 0.005 0.000
38456.723800 [0000] schbench[28833/28574] 4.000 4.000 3.999 0.000
38456.723800 [0004] schbench[28762/28574] 42.951 42.839 3.999 39.867
38456.723802 [0007] schbench[28812/28574] 43.947 43.817 3.999 40.866
38456.723804 [0001] schbench[28587/28574] 7.935 7.822 0.993 0.000
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Link: https://lore.kernel.org/r/20241004170756.18064-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The hashmap API used to require parentheses for the hashmap argument if
it's not a pointer type. It's now fixed so let's drop the parentheses.
Link: https://lore.kernel.org/r/20241009202009.884884-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The hashmap__for_each_entry[_safe] is accessing 'map' as if it's a
pointer. But it does without parentheses so passing a static hash map
with an ampersand (like &slab_hash below) caused compiler warnings due
to unmatched types.
In file included from util/bpf_lock_contention.c:5:
util/bpf_lock_contention.c: In function ‘exit_slab_cache_iter’:
linux/tools/perf/util/hashmap.h:169:32: error: invalid type argument of ‘->’ (have ‘struct hashmap’)
169 | for (bkt = 0; bkt < map->cap; bkt++) \
| ^~
util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’
105 | hashmap__for_each_entry(&slab_hash, cur, bkt)
| ^~~~~~~~~~~~~~~~~~~~~~~
/home/namhyung/project/linux/tools/perf/util/hashmap.h:170:31: error: invalid type argument of ‘->’ (have ‘struct hashmap’)
170 | for (cur = map->buckets[bkt]; cur; cur = cur->next)
| ^~
util/bpf_lock_contention.c:105:9: note: in expansion of macro ‘hashmap__for_each_entry’
105 | hashmap__for_each_entry(&slab_hash, cur, bkt)
| ^~~~~~~~~~~~~~~~~~~~~~~
Cc: bpf@vger.kernel.org
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241009202009.884884-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To get the fixes in the current perf-tools tree.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
util/tool_pmu.c: In function 'evsel__tool_pmu_read':
util/tool_pmu.c:419:55: error: passing argument 2 of 'tool_pmu__read_event' from incompatible pointer type [-Werror=incompatible-pointer-types]
419 | if (!tool_pmu__read_event(ev, &val)) {
| ^~~~
| |
| long unsigned int *
util/tool_pmu.c:335:56: note: expected 'u64 *' {aka 'long long unsigned int *'} but argument is of type 'long unsigned int *'
335 | bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result)
| ~~~~~^~~~~~
Link: https://lore.kernel.org/r/Zw1XIGML32VaxE0t@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The testcase for tool_pmu failed in powerpc as below:
./perf test -v "Parsing without PMU name"
8: Tool PMU :
8.1: Parsing without PMU name : FAILED!
This happens when parse_events results in either skip or fail
of an event. Because the code invokes evlist__delete(evlist)
and "goto out".
ret = parse_events(evlist, str, &err);
if (ret) {
evlist__delete(evlist);
But in the "out" section also evlist__delete happens.
out:
evlist__delete(evlist);
return ret;
Hence remove the duplicate evlist__delete from the first path
in the testcase
With the change:
# ./perf test -v "Parsing without PMU name"
8: Tool PMU :
8.1: Parsing without PMU name : Ok
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: akanksha@linux.ibm.com
Cc: hbathini@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241013170732.71339-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
perf fails to compile on systems with GCC version11
as below:
In file included from /usr/include/string.h:519,
from /home/athir/perf-tools-next/tools/include/linux/bitmap.h:5,
from /home/athir/perf-tools-next/tools/perf/util/pmu.h:5,
from /home/athir/perf-tools-next/tools/perf/util/evsel.h:14,
from /home/athir/perf-tools-next/tools/perf/util/evlist.h:14,
from tests/tool_pmu.c:3:
In function ‘strncpy’,
inlined from ‘do_test’ at tests/tool_pmu.c:25:3:
/usr/include/bits/string_fortified.h:95:10: error: ‘__builtin_strncpy’ specified bound 128 equals destination size [-Werror=stringop-truncation]
95 | return __builtin___strncpy_chk (__dest, __src, __len,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
96 | __glibc_objsize (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~~
The compile error is from strncpy refernce in do_test:
strncpy(str, tool_pmu__event_to_str(ev), sizeof(str));
This behaviour is not observed with GCC version 8, but observed
with GCC version 11 . This is message from gcc for detecting
truncation while using strncpu. Use snprintf instead of strncpy
here to be safe.
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: akanksha@linux.ibm.com
Cc: hbathini@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: disgoel@linux.vnet.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20241013173742.71882-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The original commit message:
"
Use current sort mechanism but the real .se_cmp() just returns 0 so
that new columns "Predicted", "Abort" and "Cycles" are created in display
but actually these keys are not the sort keys.
For example:
Overhead Source:Line Symbol Shared Object Predicted Abort Cycles
........ ............ ........ ............. ......... ..... ......
38.25% div.c:45 [.] main div 97.6% 0 3
"
Update missed commit from series "perf report: Show branch flags/cycles
in --branch-history callgraph view" to apply to current repository so that
new columns described above are visible.
Link to original series:
https://lore.kernel.org/lkml/1477876794-30749-1-git-send-email-yao.jin@linux.intel.com/
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Link: https://lore.kernel.org/r/20241010184046.203822-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Ensure parsing with and without PMU creates events with the expected
config values. This ensures the tool.json doesn't get out of sync with
tool_pmu_event enum.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-11-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Use the regular PMU approaches with tool json events to reduce the
amount of special tool_pmu code - tool_pmu__config_terms and
tool_pmu__for_each_event_cb are removed. Some functions remain, like
tool_pmu__str_to_event, as conveniences to metricgroups. Add
tool_pmu__skip_event/tool_pmu__num_skip_events to handle the case that
tool json events shouldn't appear on certain architectures. This isn't
done in jevents.py due to complexity in the empty-pmu-events.c and
when all vendor json is built into the tool.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-10-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce the notion of a common architecture/model that can be used
to find event tables for common PMUs like the tool PMU. By having tool
events be json standard PMU attribute configuration, descriptions,
etc. can be used and these routines are already optimized for things
like binary searching.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-9-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add the expr literals like "#smt_on" as tool events, this allows stat
events to give the values. On my laptop with hyperthreading enabled:
```
$ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true
Performance counter stats for 'true':
0 has_pmem
8 num_cores
16 num_cpus
16 num_cpus_online
1 num_dies
1 num_packages
1 smt_on
2,496,000,000 system_tsc_freq
0.001113637 seconds time elapsed
0.001218000 seconds user
0.000000000 seconds sys
```
And with hyperthreading disabled:
```
$ perf stat -e "has_pmem,num_cores,num_cpus,num_cpus_online,num_dies,num_packages,smt_on,system_tsc_freq" true
Performance counter stats for 'true':
0 has_pmem
8 num_cores
16 num_cpus
8 num_cpus_online
1 num_dies
1 num_packages
0 smt_on
2,496,000,000 system_tsc_freq
0.000802115 seconds time elapsed
0.000000000 seconds user
0.000806000 seconds sys
```
As zero matters for these values, in stat-display
should_skip_zero_counter only skip the zero value if it is not the
first aggregation index.
The tool event implementations are used in expr but not evaluated as
events for simplicity. Also core_wide isn't made a tool event as it
requires command line parameters.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-8-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now the events are associated with the tool PMU, rename the functions
to reflect this.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-7-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To better reflect the events listed are from the tool PMU. Rename the
enum values from PERF_TOOL_* to TOOL_PMU__EVENT_*.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than treat tool events as a special kind of event, create a
tool only PMU where the events/aliases match the existing
duration_time, user_time and system_time events. Remove special
parsing and printing support for the tool events, but add function
calls for when PMU functions are called on a tool_pmu.
Move the tool PMU code in evsel into tool_pmu.c to better encapsulate
the tool event behavior in that file.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Expose config_term_name as parse_events__term_type_str so that PMUs not
in pmu.c may access it.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Hard coded terms like "config=10" are skipped by perf_pmu__config
assuming they were already applied to a perf_event_attr by parse
event's config_attr function. When doing a reverse number to name
lookup in perf_pmu__name_from_config, as the hardcoded terms aren't
applied the config value is incorrect leading to misses or false
matches. Fix this by adding a parameter to have perf_pmu__config apply
hardcoded terms too (not just in parse event's config_term_common).
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Use ifs rather than ?: to avoid a large compound statement.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20241002032016.333748-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
color_fwrite_lines() was added by 2009's commit
8fc0321f1ad0 ("perf_counter tools: Add color terminal output support")
but has never been used.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241009003938.254936-1-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Change function name "is_hydrid" to "is_hybrid".
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20241007194758.78659-1-thomas.falcon@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
add_perf_probe_events has been unused since 2015's commit
b02137cc6550 ("perf probe: Move print logic into cmd_probe()")
which confusingly now uses perf_add_probe_events.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240929010659.430208-1-linux@treblig.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix an assert() to handle captured and unprocessed ARM CoreSight CPU
traces
- Fix static build compilation error when libdw isn't installed or is
too old
- Add missing include when building with
!HAVE_DWARF_GETLOCATIONS_SUPPORT
- Add missing refcount put on 32-bit DSOs
- Fix disassembly of user space binaries by setting the binary_type of
DSO when loading
- Update headers with the kernel sources, including asound.h, sched.h,
fcntl, msr-index.h, irq_vectors.h, socket.h, list_sort.c and arm64's
cputype.h
* tag 'perf-tools-fixes-for-v6.12-1-2024-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf cs-etm: Fix the assert() to handle captured and unprocessed cpu trace
perf build: Fix build feature-dwarf_getlocations fail for old libdw
perf build: Fix static compilation error when libdw is not installed
perf dwarf-aux: Fix build with !HAVE_DWARF_GETLOCATIONS_SUPPORT
tools headers arm64: Sync arm64's cputype.h with the kernel sources
perf tools: Cope with differences for lib/list_sort.c copy from the kernel
tools check_headers.sh: Add check variant that excludes some hunks
perf beauty: Update copy of linux/socket.h with the kernel sources
tools headers UAPI: Sync the linux/in.h with the kernel sources
perf trace beauty: Update the arch/x86/include/asm/irq_vectors.h copy with the kernel sources
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools include UAPI: Sync linux/fcntl.h copy with the kernel sources
tools include UAPI: Sync linux/sched.h copy with the kernel sources
tools include UAPI: Sync sound/asound.h copy with the kernel sources
perf vdso: Missed put on 32-bit dsos
perf symbol: Set binary_type of dso when loading
|
|
With the patch 0b6c5371c03c "Add missing topdown metrics events" eight
topdown metric events with numbers ranging from 0x8000 to 0x8700 were
added to the test since they were added as 'perf stat' default events.
Later the patch 951efb9976ce "Update no event/metric expectations" kept
only 4 of those events(0x8000-0x8300).
Currently, the topdown events with numbers 0x8400 to 0x8700 are missing
from the list of expected events resulting in a failure. Add back the
missing topdown events.
Fixes: 951efb9976ce ("perf test attr: Update no event/metric expectations")
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: mpetlan@redhat.com
Link: https://lore.kernel.org/r/20240311081611.7835-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This commit dumps metadata with version 2. It dumps metadata for header
and per CPU data respectively in the arm_spe_print_info() function to
support metadata version 2 format.
After:
0 0 0x3c0 [0x1b0]: PERF_RECORD_AUXTRACE_INFO type: 4
Header version :2
Header size :4
PMU type v2 :13
CPU number :8
Magic :0x1010101010101010
CPU # :0
Num of params :3
MIDR :0x410fd801
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :1
Num of params :3
MIDR :0x410fd801
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :2
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :3
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :4
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :5
Num of params :3
MIDR :0x410fd870
PMU Type :13
Min Interval :1024
Magic :0x1010101010101010
CPU # :6
Num of params :3
MIDR :0x410fd850
PMU Type :-1
Min Interval :0
Magic :0x1010101010101010
CPU # :7
Num of params :3
MIDR :0x410fd850
PMU Type :-1
Min Interval :0
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-6-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This commit is to support metadata version 2 and at the meantime it is
backward compatible for version 1's format.
The metadata version 1 doesn't include the ARM_SPE_HEADER_VERSION field.
As version 1 is fixed with two u64 fields, by checking the metadata
size, it distinguishes the metadata is version 1 or version 2 (and any
new versions if later will have). For version 2, it reads out CPU number
and retrieves the metadata info for every CPU.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-5-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Save the Arm SPE information on a per-CPU basis. This approach is easier
in the decoding phase for retrieving metadata based on the CPU number of
every Arm SPE record.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-4-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The metadata is designed to contain a header and per CPU information.
The arm_spe_find_cpus() function is introduced to identify how many CPUs
support ARM SPE. Based on the CPU number, calculates the metadata size.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-3-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The first version's metadata header structure doesn't include a field to
indicate a header version, which is not friendly for extension.
Define the metadata version 2 format with a new header structure and
extend per CPU's metadata. In the meantime, the old metadata header will
still be supported for backward compatibility.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Besar Wicaksono <bwicaksono@nvidia.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241003184302.190806-2-leo.yan@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
There is a difference between the SYNOPSIS section of the help message
and the man page (tools/perf/Documentation/perf-list.txt) for the perf
list command. After checking, we found that the help message reflected
the latest specifications. Therefore, revised the SYNOPSIS section of
the man page to match the help message.
Signed-off-by: Yoshihiro Furudera <fj5100bi@fujitsu.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Liang
Link: https://lore.kernel.org/r/20241003002404.2592094-1-fj5100bi@fujitsu.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Test "Setup struct perf_event_attr" consists of multiple test cases that
can affect the max sample rate value for perf events. Some test cases
check this value as it should not be lowered under the set minimum for
the given test. Currently, it is possible for the test cases to affect
each other as the previous tests can lower the sample rate, leading to
a possible failure of some of the future test cases as the value is not
restored at any point.
# 10: Setup struct perf_event_attr:
--- start ---
test child forked, pid 104220
Using CPUID 0x00000000413fd0c1
running './tests/attr/test-record-C0'
Current sample rate: 10000
running './tests/attr/test-record-basic'
Current sample rate: 900
running './tests/attr/test-record-branch-any'
Current sample rate: 600
running './tests/attr/test-record-dummy-C0'
Current sample rate: 600
expected sample_period=4000, got 600
FAILED './tests/attr/test-record-dummy-C0' - match failure
Restore the max sample rate value for perf events to a reasonable value
before each test case if its value was lowered too much to ensure the
same conditions for each test case.
# 10: Setup struct perf_event_attr:
--- start ---
test child forked, pid 107222
Using CPUID 0x00000000413fd0c1
running './tests/attr/test-record-C0'
Current sample rate: 10000
running './tests/attr/test-record-basic'
Current sample rate: 800
running './tests/attr/test-record-branch-any'
Current sample rate: 700
unsupp './tests/attr/test-record-branch-any'
running './tests/attr/test-record-branch-filter-any'
Current sample rate: 10000
running './tests/attr/test-record-count'
Current sample rate: 10000
running './tests/attr/test-record-data'
Current sample rate: 600
running './tests/attr/test-record-dummy-C0'
Current sample rate: 800
running './tests/attr/test-record-freq'
Current sample rate: 10000
...
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Radostin Stoyanov <rstoyano@redhat.com>
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Link: https://lore.kernel.org/r/20241003125136.15918-1-vmolnaro@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Since 9ffa6c7512ca ("perf machine thread: Remove exited threads by
default") perf cleans exited threads up, but as said, sometimes they
are necessary to be kept. The mentioned commit does not cover all the
cases, we also need the information to construct the summary table in
perf-trace.
Before:
# perf trace -s true
Summary of events:
After:
# perf trace -s -- true
Summary of events:
true (383382), 64 events, 91.4%
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
mmap 8 0 0.150 0.013 0.019 0.031 11.90%
mprotect 3 0 0.045 0.014 0.015 0.017 6.47%
openat 2 0 0.014 0.006 0.007 0.007 9.73%
munmap 1 0 0.009 0.009 0.009 0.009 0.00%
access 1 1 0.009 0.009 0.009 0.009 0.00%
pread64 4 0 0.006 0.001 0.001 0.002 4.53%
fstat 2 0 0.005 0.001 0.002 0.003 37.59%
arch_prctl 2 1 0.003 0.001 0.002 0.002 25.91%
read 1 0 0.003 0.003 0.003 0.003 0.00%
close 2 0 0.003 0.001 0.001 0.001 3.86%
brk 1 0 0.002 0.002 0.002 0.002 0.00%
rseq 1 0 0.001 0.001 0.001 0.001 0.00%
prlimit64 1 0 0.001 0.001 0.001 0.001 0.00%
set_robust_list 1 0 0.001 0.001 0.001 0.001 0.00%
set_tid_address 1 0 0.001 0.001 0.001 0.001 0.00%
execve 1 0 0.000 0.000 0.000 0.000 0.00%
[namhyung: simplified the condition]
Fixes: 9ffa6c7512ca ("perf machine thread: Remove exited threads by default")
Reported-by: Veronika Molnarova <vmolnaro@redhat.com>
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Link: https://lore.kernel.org/r/20240927151926.399474-1-mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Command perf test 86 fails on s390:
# perf test -F 86
ping 868299 [007] 28248.013596: probe_libc:inet_pton_1: (3ff95948020)
3ff95948020 inet_pton+0x0 (inlined)
3ff9595e6e7 text_to_binary_address+0x1007 (inlined)
3ff9595e6e7 gaih_inet+0x1007 (inlined)
FAIL: expected backtrace entry \
"main\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$"
got "3ff9595e6e7 gaih_inet+0x1007 (inlined)"
86: probe libc's inet_pton & backtrace it with ping : FAILED!
#
The root cause is a new stack layout, two functions have been added
as seen below.
# perf script | tac | grep -m1 '^ping' -B9 | tac
ping 866856 [007] 25979.494921: probe_libc:inet_pton: (3ff8ec48020)
3ff8ec48020 inet_pton+0x0 (inlined)
new --> 3ff8ec5e6e7 text_to_binary_address+0x1007 (inlined)
new --> 3ff8ec5e6e7 gaih_inet+0x1007 (inlined)
3ff8ec5e6e7 getaddrinfo+0x1007 (/usr/lib64/libc.so.6)
2aa3fe04bf5 main+0xff5 (/usr/bin/ping)
3ff8eb34a5b __libc_start_call_main+0x8b (/usr/lib64/libc.so.6)
3ff8eb34b5d __libc_start_main@GLIBC_2.2+0xad (inlined)
2aa3fe06a1f [unknown] (/usr/bin/ping)
#
The new functions in the call chain are:
- text_to_binary_address()
- gaih_inet().
Both functions are inlined and do not show up in the output
of the nm command:
# nm -a /usr/lib64/libc.so.6 | \
grep -E '(text_to_binary_address|gaih_inet)$'
#
There is no possibility to add these 2 functions depending on their
existance in the C library.
Add text_to_binary_address() and gaih_inet() to the list of
expected functions in an compatible way and extend the regular
expression. On s390 the backtrace can now be
Before After
probe_libc:inet_pton probe_libc:inet_pton
inet_pton inet_pton
getaddrinfo getaddrinfo | text_to_binary_address
main main | gaih_inet
Output after:
# perf test -F 86
86: probe libc's inet_pton & backtrace it with ping : Ok
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: agordeev@linux.ibm.com
Cc: gor@linux.ibm.com
Cc: hca@linux.ibm.com
Cc: sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20241001124224.3370306-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The "perf record" tool will now default to this new mode if the user
specifies a sampling group when not in system-wide mode, and when
"--no-inherit" is not specified.
This change updates evsel to allow the combination of inherit
and PERF_SAMPLE_READ.
A fallback is implemented for kernel versions where this feature is not
supported.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-3-ben.gainey@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Sample period calculation in deliver_sample_value is updated to
calculate the per-thread period delta for events that are inherit +
PERF_SAMPLE_READ. When the sampling event has this configuration, the
read_format.id is used with the tid from the sample to lookup the
storage of the previously accumulated counter total before calculating
the delta. All existing valid configurations where read_format.value
represents some global value continue to use just the read_format.id to
locate the storage of the previously accumulated total.
perf_sample_id is modified to support tracking per-thread
values, along with the existing global per-id values. In the
per-thread case, values are stored in a hash by tid within the
perf_sample_id, and are dynamically allocated as the number is not known
ahead of time.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Cc: james.clark@arm.com
Link: https://lore.kernel.org/r/20241001121505.1009685-2-ben.gainey@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Clean up return value to be TEST_* rather than unspecific integer. Add
test case skip reason. Skip test if EACCES comes back from
evsel__newtp.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-5-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Clean up return value to be TEST_* rather than unspecific integer. Add
test case skip reason. Skip test if EACCES comes back from
evsel__newtp.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
These error paths occur without sufficient permissions. Fix the memory
leaks to make leak sanitizer happier.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Missed cleanup when an error occurs.
Fixes: 49de179577e7 ("perf stat: No need to setup affinities when starting a workload")
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20241001052327.7052-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The "perf all PMU test" fails on a Coffee Lake machine.
The failure is caused by the below change in the commit e2641db83f18
("perf vendor events: Add/update skylake events/metrics").
+ {
+ "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
+ "Counter": "FIXED",
+ "EventCode": "0xff",
+ "EventName": "UNC_CLOCK.SOCKET",
+ "PerPkg": "1",
+ "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
+ "Unit": "cbox_0"
}
The other cbox events have the unit name "CBOX", while the fixed counter
has a unit name "cbox_0". So the events_table will maintain separate
entries for cbox and cbox_0.
The perf_pmus__print_pmu_events() calculates the total number of events,
allocate an aliases buffer, store all the events into the buffer, sort,
and print all the aliases one by one.
The problem is that the calculated total number of events doesn't match
the stored events in the aliases buffer.
The perf_pmu__num_events() is used to calculate the number of events. It
invokes the pmu_events_table__num_events() to go through the entire
events_table to find all events. Because of the
pmu_uncore_alias_match(), the suffix of uncore PMU will be ignored. So
the events for cbox and cbox_0 are all counted.
When storing events into the aliases buffer, the
perf_pmu__for_each_event() only process the events for cbox.
Since a bigger buffer was allocated, the last entry are all 0.
When printing all the aliases, null will be outputted, and trigger the
failure.
The mismatch was introduced from the commit e3edd6cf6399 ("perf
pmu-events: Reduce processed events by passing PMU"). The
pmu_events_table__for_each_event() stops immediately once a pmu is set.
But for uncore, especially this case, the method is wrong and mismatch
what perf does in the perf_pmu__num_events().
With the patch,
$ perf list pmu | grep -A 1 clock.socket
unc_clock.socket
[This 48-bit fixed counter counts the UCLK cycles. Unit: uncore_cbox_0
$ perf test "perf all PMU test"
107: perf all PMU test : Ok
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/all/202407101021.2c8baddb-oliver.sang@intel.com/
Fixes: e3edd6cf6399 ("perf pmu-events: Reduce processed events by passing PMU")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241001021431.814811-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
|
|
If one builds perf with DEBUG=1, captures data on multiple CPUs and
finally runs 'perf report -C <cpu>' for only one of the cpus, assert()
aborts the program. This happens because there are empty queues with
format set.
This patch changes the condition to abort only if a queue is not empty
and if the format is unset.
$ make -C tools/perf DEBUG=1 CORESIGHT=1 CSLIBS=/usr/lib CSINCLUDES=/usr/include install
$ perf record -o kcore --kcore -e cs_etm/timestamp/k -s -C 0-1 dd if=/dev/zero of=/dev/null bs=1M count=1
$ perf report --input kcore/data --vmlinux=/home/ikoskine/projects/linux/vmlinux -C 1
Aborted (core dumped)
Fixes: 57880a7966be510c ("perf: cs-etm: Allocate queues for all CPUs")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240924233930.5193-1-ilkka@os.amperecomputing.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For libdw versions below 0.177, need to link libdl.a in addition to
libbebl.a during static compilation, otherwise
feature-dwarf_getlocations compilation will fail.
Before:
$ make LDFLAGS=-static
BUILD: Doing 'make -j20' parallel build
<SNIP>
Makefile.config:483: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
<SNIP>
$ cat ../build/feature/test-dwarf_getlocations.make.output
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/libebl.a(eblclosebackend.o): in function `ebl_closebackend':
(.text+0x20): undefined reference to `dlclose'
collect2: error: ld returned 1 exit status
After:
$ make LDFLAGS=-static
<SNIP>
Auto-detecting system features:
... dwarf: [ on ]
<SNIP>
$ ./perf probe
Usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [<options>] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
<SNIP>
Fixes: 536661da6ea18fe6 ("perf: build: Only link libebl.a for old libdw")
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240919013513.118527-3-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|