summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-25tracing/user_events: Set event filter_type from typeBeau Belgrave
Users expect that events can be filtered by the kernel. User events currently sets all event fields as FILTER_OTHER which limits to binary filters only. When strings are being used, functionality is reduced. Use filter_assign_type() to find the most appropriate filter type for each field in user events to ensure full kernel capabilities. Link: https://lkml.kernel.org/r/20230419214140.4158-2-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-04-25ring-buffer: Clearly check null ptr returned by rb_set_head_page()Zheng Yejian
In error case, 'buffer_page' returned by rb_set_head_page() is NULL, currently check '&buffer_page->list' is equivalent to check 'buffer_page' due to 'list' is the first member of 'buffer_page', but suppose it is not some time, 'head_page' would be wild memory while check would be bypassed. Link: https://lore.kernel.org/linux-trace-kernel/20230414071729.57312-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing: Unbreak user eventsSteven Rostedt (Google)
The user events was added a bit prematurely, and there were a few kernel developers that had issues with it. The API also needed a bit of work to make sure it would be stable. It was decided to make user events "broken" until this was settled. Now it has a new API that appears to be as stable as it will be without the use of a crystal ball. It's being used within Microsoft as is, which means the API has had some testing in real world use cases. It went through many discussions in the bi-weekly tracing meetings, and there's been no more comments about updates. I feel this is good to go. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Use print_format_fields() for trace outputSteven Rostedt (Google)
Currently, user events are shown using the "hex" output for "safety" reasons as one cannot trust user events behaving nicely. But the hex output is not the only utility for safe outputting of trace events. The print_event_fields() is just as safe and gives user readable output. Before: example-839 [001] ..... 43.222244: 00000000: b1 06 00 00 47 03 00 00 00 00 00 00 ....G....... example-839 [001] ..... 43.564433: 00000000: b1 06 00 00 47 03 00 00 01 00 00 00 ....G....... example-839 [001] ..... 43.763917: 00000000: b1 06 00 00 47 03 00 00 02 00 00 00 ....G....... example-839 [001] ..... 43.967929: 00000000: b1 06 00 00 47 03 00 00 03 00 00 00 ....G....... After: example-837 [006] ..... 55.739249: test: count=0x0 (0) example-837 [006] ..... 111.104784: test: count=0x1 (1) example-837 [006] ..... 111.268444: test: count=0x2 (2) example-837 [006] ..... 111.416533: test: count=0x3 (3) example-837 [006] ..... 111.542859: test: count=0x4 (4) Link: https://lore.kernel.org/linux-trace-kernel/20230328151413.4770b8d7@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Align structs with tabs for readabilityBeau Belgrave
Add tabs to make struct members easier to read and unify the style of the code. Link: https://lkml.kernel.org/r/20230328235219.203-13-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Limit global user_event countBeau Belgrave
Operators want to be able to ensure enough tracepoints exist on the system for kernel components as well as for user components. Since there are only up to 64K events, by default allow up to half to be used by user events. Add a kernel sysctl parameter (kernel.user_events_max) to set a global limit that is honored among all groups on the system. This ensures hard limits can be setup to prevent user processes from consuming all event IDs on the system. Link: https://lkml.kernel.org/r/20230328235219.203-12-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Charge event allocs to cgroupsBeau Belgrave
Operators need a way to limit how much memory cgroups use. User events need to be included into that accounting. Fix this by using GFP_KERNEL_ACCOUNT for allocations generated by user programs for user_event tracing. Link: https://lkml.kernel.org/r/20230328235219.203-11-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Update documentation for ABIBeau Belgrave
The ABI for user_events has changed from mmap() based to remote writes. Update the documentation to reflect these changes, add new section for unregistering events since lifetime is now tied to tasks instead of files. Link: https://lkml.kernel.org/r/20230328235219.203-10-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Use write ABI in exampleBeau Belgrave
The ABI has changed to use a remote write approach. Update the example to show the expected use of this new ABI. Also remove debugfs path and use tracefs to ensure example works in more environments. Link: https://lkml.kernel.org/r/20230328235219.203-9-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Add ABI self-testBeau Belgrave
Add ABI specific self-test to ensure enablements work in various scenarios such as fork, VM_CLONE, and basic event enable/disable. Ensure ABI contracts/limits are also being upheld, such as bit limits and data size limits. Link: https://lkml.kernel.org/r/20230328235219.203-8-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Update self-tests to write ABIBeau Belgrave
ABI has been changed to remote writes, update existing test cases to use this new ABI to ensure existing functionality continues to work. Link: https://lkml.kernel.org/r/20230328235219.203-7-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Add ioctl for disabling addressesBeau Belgrave
Enablements are now tracked by the lifetime of the task/mm. User processes need to be able to disable their addresses if tracing is requested to be turned off. Before unmapping the page would suffice. However, we now need a stronger contract. Add an ioctl to enable this. A new flag bit is added, freeing, to user_event_enabler to ensure that if the event is attempted to be removed while a fault is being handled that the remove is delayed until after the fault is reattempted. Link: https://lkml.kernel.org/r/20230328235219.203-6-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Fixup enable faults asynclyBeau Belgrave
When events are enabled within the various tracing facilities, such as ftrace/perf, the event_mutex is held. As events are enabled pages are accessed. We do not want page faults to occur under this lock. Instead queue the fault to a workqueue to be handled in a process context safe way without the lock. The enable address is marked faulting while the async fault-in occurs. This ensures that we don't attempt to fault-in more than is necessary. Once the page has been faulted in, an address write is re-attempted. If the page couldn't fault-in, then we wait until the next time the event is enabled to prevent any potential infinite loops. Link: https://lkml.kernel.org/r/20230328235219.203-5-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Use remote writes for event enablementBeau Belgrave
As part of the discussions for user_events aligned with user space tracers, it was determined that user programs should register a aligned value to set or clear a bit when an event becomes enabled. Currently a shared page is being used that requires mmap(). Remove the shared page implementation and move to a user registered address implementation. In this new model during the event registration from user programs 3 new values are specified. The first is the address to update when the event is either enabled or disabled. The second is the bit to set/clear to reflect the event being enabled. The third is the size of the value at the specified address. This allows for a local 32/64-bit value in user programs to support both kernel and user tracers. As an example, setting bit 31 for kernel tracers when the event becomes enabled allows for user tracers to use the other bits for ref counts or other flags. The kernel side updates the bit atomically, user programs need to also update these values atomically. User provided addresses must be aligned on a natural boundary, this allows for single page checking and prevents odd behaviors such as a enable value straddling 2 pages instead of a single page. Currently page faults are only logged, future patches will handle these. Link: https://lkml.kernel.org/r/20230328235219.203-4-beaub@linux.microsoft.com Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Track fork/exec/exit for mm lifetimeBeau Belgrave
During tracefs discussions it was decided instead of requiring a mapping within a user-process to track the lifetime of memory descriptors we should hook the appropriate calls. Do this by adding the minimal stubs required for task fork, exec, and exit. Currently this is just a NOP. Future patches will implement these calls fully. Link: https://lkml.kernel.org/r/20230328235219.203-3-beaub@linux.microsoft.com Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing/user_events: Split header into uapi and kernelBeau Belgrave
The UAPI parts need to be split out from the kernel parts of user_events now that other parts of the kernel will reference it. Do so by moving the existing include/linux/user_events.h into include/uapi/linux/user_events.h. Link: https://lkml.kernel.org/r/20230328235219.203-2-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tracing: Add "fields" option to show raw trace event fieldsSteven Rostedt (Google)
The hex, raw and bin formats come from the old PREEMPT_RT patch set latency tracer. That actually gave real alternatives to reading the ascii buffer. But they have started to bit rot and they do not give a good representation of the tracing data. Add "fields" option that will read the trace event fields and parse the data from how the fields are defined: With "fields" = 0 (default) echo 1 > events/sched/sched_switch/enable cat trace <idle>-0 [003] d..2. 540.078653: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1 next_pid=83 next_prio=120 kworker/3:1-83 [003] d..2. 540.078860: sched_switch: prev_comm=kworker/3:1 prev_pid=83 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [003] d..2. 540.206423: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=807 next_prio=120 sshd-807 [003] d..2. 540.206531: sched_switch: prev_comm=sshd prev_pid=807 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [001] d..2. 540.206597: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206617: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206678: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206696: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206713: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 echo 1 > options/fields <...>-998 [002] d..2. 538.643732: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/2 prev_state=0x20 (32) prev_prio=0x78 (120) prev_pid=0x3e6 (998) prev_comm=trace-cmd <idle>-0 [001] d..2. 538.643806: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/1 bash-830 [001] d..2. 538.644106: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644130: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644180: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644185: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644204: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/1 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash <idle>-0 [003] d..2. 538.644211: sched_switch: next_prio=0x78 (120) next_pid=0x327 (807) next_comm=sshd prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/3 sshd-807 [003] d..2. 538.644340: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/3 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x327 (807) prev_comm=sshd It traces the data safely without using the trace print formatting. Link: https://lore.kernel.org/linux-trace-kernel/20230328145156.497651be@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29tools/kvm_stat: use canonical ftrace pathRoss Zwisler
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A comment in kvm_stat still refers to this older debugfs path, so let's update it to avoid confusion. Link: https://lkml.kernel.org/r/20230313211746.1541525-3-zwisler@kernel.org Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Tycho Andersen <tycho@tycho.pizza> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29leaking_addresses: also skip canonical ftrace pathRoss Zwisler
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing scripts/leaking_addresses.pl only skipped this older debugfs path, so let's add the canonical path as well. Link: https://lkml.kernel.org/r/20230313211746.1541525-2-zwisler@kernel.org Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Acked-by: Tycho Andersen <tycho@tycho.pizza> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-29selftests: use canonical ftrace pathRoss Zwisler
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A few spots in tools/testing/selftests still refer to this older debugfs path, so let's update them to avoid confusion. Link: https://lkml.kernel.org/r/20230313211746.1541525-1-zwisler@kernel.org Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tycho Andersen <tycho@tycho.pizza> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28docs: tracing: Update fprobe documentationMasami Hiramatsu (Google)
Update fprobe.rst for - the private entry_data argument - the return value of the entry handler - the nr_rethook_node field. Link: https://lkml.kernel.org/r/167526701579.433354.3057889264263546659.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28lib/test_fprobe: Add a testcase for skipping exit_handlerMasami Hiramatsu (Google)
Add a testcase for skipping exit_handler if entry_handler returns !0. Link: https://lkml.kernel.org/r/167526700658.433354.12922388040490848613.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28fprobe: Skip exit_handler if entry_handler returns !0Masami Hiramatsu (Google)
Skip hooking function return and calling exit_handler if the entry_handler() returns !0. Link: https://lkml.kernel.org/r/167526699798.433354.10998365726830117303.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28lib/test_fprobe: Add a test case for nr_maxactiveMasami Hiramatsu (Google)
Add a test case for nr_maxactive. If the number of active functions is more than nr_maxactive, it must be skipped. Link: https://lkml.kernel.org/r/167526698856.433354.4430007340787176666.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28fprobe: Add nr_maxactive to specify rethook_node pool sizeMasami Hiramatsu (Google)
Add nr_maxactive to specify rethook_node pool size. This means the maximum number of actively running target functions concurrently for probing by exit_handler. Note that if the running function is preempted or sleep, it is still counted as 'active'. Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28lib/test_fprobe: Add private entry_data testcasesMasami Hiramatsu (Google)
Add test cases for checking whether private entry_data is correctly passed or not. Link: https://lkml.kernel.org/r/167526697074.433354.17790288501657876219.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-28fprobe: Pass entry_data to handlersMasami Hiramatsu (Google)
Pass the private entry_data to the entry and exit handlers so that they can share the context data, something like saved function arguments etc. User must specify the private entry_data size by @entry_data_size field before registering the fprobe. Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Show a list of all functions that have ever been enabledSteven Rostedt (Google)
When debugging a crash that appears to be related to ftrace, but not for sure, it is useful to know if a function was ever enabled by ftrace or not. It could be that a BPF program was attached to it, or possibly a live patch. We are having crashes in the field where this information is not always known. But having ftrace set a flag if a function has ever been attached since boot up helps tremendously in trying to know if a crash had to do with something using ftrace. For analyzing crashes, the use of a kdump image can have access to the flags. When looking at issues where the kernel did not panic, the touched_functions file can simply be used. Link: https://lore.kernel.org/linux-trace-kernel/20230124095653.6fd1640e@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chris Li <chriscli@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ring_buffer: Use try_cmpxchg instead of cmpxchgUros Bizjak
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Link: https://lkml.kernel.org/r/20230305155532.5549-4-ubizjak@gmail.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Acked-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ring_buffer: Change some static functions to boolUros Bizjak
The return values of some functions are of boolean type. Change the type of these function to bool and adjust their return values. Also change type of some internal varibles to bool. No functional change intended. Link: https://lkml.kernel.org/r/20230305155532.5549-3-ubizjak@gmail.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ring_buffer: Change some static functions to voidUros Bizjak
The results of some static functions are not used. Change the type of these function to void and remove unnecessary returns. No functional change intended. Link: https://lkml.kernel.org/r/20230305155532.5549-2-ubizjak@gmail.com Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: selftest: remove broken trace_direct_trampMark Rutland
The ftrace selftest code has a trace_direct_tramp() function which it uses as a direct call trampoline. This happens to work on x86, since the direct call's return address is in the usual place, and can be returned to via a RET, but in general the calling convention for direct calls is different from regular function calls, and requires a trampoline written in assembly. On s390, regular function calls place the return address in %r14, and an ftrace patch-site in an instrumented function places the trampoline's return address (which is within the instrumented function) in %r0, preserving the original %r14 value in-place. As a regular C function will return to the address in %r14, using a C function as the trampoline results in the trampoline returning to the caller of the instrumented function, skipping the body of the instrumented function. Note that the s390 issue is not detcted by the ftrace selftest code, as the instrumented function is trivial, and returning back into the caller happens to be equivalent. On arm64, regular function calls place the return address in x30, and an ftrace patch-site in an instrumented function saves this into r9 and places the trampoline's return address (within the instrumented function) in x30. A regular C function will return to the address in x30, but will not restore x9 into x30. Consequently, using a C function as the trampoline results in returning to the trampoline's return address having corrupted x30, such that when the instrumented function returns, it will return back into itself. To avoid future issues in this area, remove the trace_direct_tramp() function, and require that each architecture with direct calls provides a stub trampoline, named ftrace_stub_direct_tramp. This can be written to handle the architecture's trampoline calling convention, and in future could be used elsewhere (e.g. in the ftrace ops sample, to measure the overhead of direct calls), so we may as well always build it in. Link: https://lkml.kernel.org/r/20230321140424.345218-8-revest@chromium.org Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Make DIRECT_CALLS work WITH_ARGS and !WITH_REGSFlorent Revest
Direct called trampolines can be called in two ways: - either from the ftrace callsite. In this case, they do not access any struct ftrace_regs nor pt_regs - Or, if a ftrace ops is also attached, from the end of a ftrace trampoline. In this case, the call_direct_funcs ops is in charge of setting the direct call trampoline's address in a struct ftrace_regs Since: commit 9705bc709604 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()") The later case no longer requires a full pt_regs. It only needs a struct ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS. With architectures like arm64 already abandoning WITH_REGS in favor of WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only. Link: https://lkml.kernel.org/r/20230321140424.345218-7-revest@chromium.org Signed-off-by: Florent Revest <revest@chromium.org> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Store direct called addresses in their opsFlorent Revest
All direct calls are now registered using the register_ftrace_direct API so each ops can jump to only one direct-called trampoline. By storing the direct called trampoline address directly in the ops we can save one hashmap lookup in the direct call ops and implement arm64 direct calls on top of call ops. Link: https://lkml.kernel.org/r/20230321140424.345218-6-revest@chromium.org Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Rename _ftrace_direct_multi APIs to _ftrace_direct APIsFlorent Revest
Now that the original _ftrace_direct APIs are gone, the "_multi" suffixes only add confusion. Link: https://lkml.kernel.org/r/20230321140424.345218-5-revest@chromium.org Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Remove the legacy _ftrace_direct APIFlorent Revest
This API relies on a single global ops, used for all direct calls registered with it. However, to implement arm64 direct calls, we need each ops to point to a single direct call trampoline. Link: https://lkml.kernel.org/r/20230321140424.345218-4-revest@chromium.org Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Replace uses of _ftrace_direct APIs with _ftrace_direct_multiFlorent Revest
The _multi API requires that users keep their own ops but can enforce that an op is only associated to one direct call. Link: https://lkml.kernel.org/r/20230321140424.345218-3-revest@chromium.org Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-21ftrace: Let unregister_ftrace_direct_multi() call ftrace_free_filter()Florent Revest
A common pattern when using the ftrace_direct_multi API is to unregister the ops and also immediately free its filter. We've noticed it's very easy for users to miss calling ftrace_free_filter(). This adds a "free_filters" argument to unregister_ftrace_direct_multi() to both remind the user they should free filters and also to make their life easier. Link: https://lkml.kernel.org/r/20230321140424.345218-2-revest@chromium.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Florent Revest <revest@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19Linux 6.3-rc3Linus Torvalds
2023-03-19Merge tag 'trace-v6.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix setting affinity of hwlat threads in containers Using sched_set_affinity() has unwanted side effects when being called within a container. Use set_cpus_allowed_ptr() instead - Fix per cpu thread management of the hwlat tracer: - Do not start per_cpu threads if one is already running for the CPU - When starting per_cpu threads, do not clear the kthread variable as it may already be set to running per cpu threads - Fix return value for test_gen_kprobe_cmd() On error the return value was overwritten by being set to the result of the call from kprobe_event_delete(), which would likely succeed, and thus have the function return success - Fix splice() reads from the trace file that was broken by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") - Remove obsolete and confusing comment in ring_buffer.c The original design of the ring buffer used struct page flags for tricks to optimize, which was shortly removed due to them being tricks. But a comment for those tricks remained - Set local functions and variables to static * tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr ring-buffer: remove obsolete comment for free_buffer_page() tracing: Make splice_read available again ftrace: Set direct_ops storage-class-specifier to static trace/hwlat: Do not start per-cpu thread if it is already running trace/hwlat: Do not wipe the contents of per-cpu thread data tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static tracing: Fix wrong return in kprobe_event_gen_test.c
2023-03-19tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptrCosta Shulyupin
There is a problem with the behavior of hwlat in a container, resulting in incorrect output. A warning message is generated: "cpumask changed while in round-robin mode, switching to mode none", and the tracing_cpumask is ignored. This issue arises because the kernel thread, hwlatd, is not a part of the container, and the function sched_setaffinity is unable to locate it using its PID. Additionally, the task_struct of hwlatd is already known. Ultimately, the function set_cpus_allowed_ptr achieves the same outcome as sched_setaffinity, but employs task_struct instead of PID. Test case: # cd /sys/kernel/tracing # echo 0 > tracing_on # echo round-robin > hwlat_detector/mode # echo hwlat > current_tracer # unshare --fork --pid bash -c 'echo 1 > tracing_on' # dmesg -c Actual behavior: [573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 0330f7aa8ee63 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19ring-buffer: remove obsolete comment for free_buffer_page()Vlastimil Babka
The comment refers to mm/slob.c which is being removed. It comes from commit ed56829cb319 ("ring_buffer: reset buffer page when freeing") and according to Steven the borrowed code was a page mapcount and mapping reset, which was later removed by commit e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer"). Thus the comment is not accurate anyway, remove it. Link: https://lore.kernel.org/linux-trace-kernel/20230315142446.27040-1-vbabka@suse.cz Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Reported-by: Mike Rapoport <mike.rapoport@gmail.com> Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Fixes: e4c2ce82ca27 ("ring_buffer: allocate buffer page pointer") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19tracing: Make splice_read available againSung-hun Kim
Since the commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") is applied to the kernel, splice() and sendfile() calls on the trace file (/sys/kernel/debug/tracing /trace) return EINVAL. This patch restores these system calls by initializing splice_read in file_operations of the trace file. This patch only enables such functionalities for the read case. Link: https://lore.kernel.org/linux-trace-kernel/20230314013707.28814-1-sfoon.kim@samsung.com Cc: stable@vger.kernel.org Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Signed-off-by: Sung-hun Kim <sfoon.kim@samsung.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-19Merge tag 'tty-6.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for 6.3-rc3 to resolve some reported issues. They include: - 8250 driver Kconfig issue pointed out by you that showed up in -rc1 - qcom-geni serial driver fixes - various 8250 driver fixes for reported problems - fsl_lpuart driver fixes - serdev fix for regression in -rc1 - vt.c bugfix All have been in linux-next for over a week with no reported problems" * tag 'tty-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty: vt: protect KD_FONT_OP_GET_TALL from unbound access serial: qcom-geni: drop bogus uart_write_wakeup() serial: qcom-geni: fix mapping of empty DMA buffer serial: qcom-geni: fix DMA mapping leak on shutdown serial: qcom-geni: fix console shutdown hang serdev: Set fwnode for serdev devices tty: serial: fsl_lpuart: fix race on RX DMA shutdown serial: 8250_pci1xxxx: Disable SERIAL_8250_PCI1XXXX config by default serial: 8250_fsl: fix handle_irq locking serial: 8250_em: Fix UART port type serial: 8250: ASPEED_VUART: select REGMAP instead of depending on it tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted Revert "tty: serial: fsl_lpuart: adjust SERIAL_FSL_LPUART_CONSOLE config dependency"
2023-03-19Merge tag 'char-misc-6.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a few small char/misc/other driver subsystem patches to resolve reported problems for 6.3-rc3. Included in here are: - Interconnect driver fixes for reported problems - Memory driver fixes for reported problems - nvmem core fix - firmware driver fix for reported problem All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (23 commits) memory: tegra30-emc: fix interconnect registration race memory: tegra20-emc: fix interconnect registration race memory: tegra124-emc: fix interconnect registration race memory: tegra: fix interconnect registration race interconnect: exynos: drop redundant link destroy interconnect: exynos: fix registration race interconnect: exynos: fix node leak in probe PM QoS error path interconnect: qcom: msm8974: fix registration race interconnect: qcom: rpmh: fix registration race interconnect: qcom: rpmh: fix probe child-node error handling interconnect: qcom: rpm: fix registration race nvmem: core: return -ENOENT if nvmem cell is not found firmware: xilinx: don't make a sleepable memory allocation from an atomic context interconnect: qcom: rpm: fix probe child-node error handling interconnect: qcom: osm-l3: fix registration race interconnect: imx: fix registration race interconnect: fix provider registration API interconnect: fix icc_provider_del() error handling interconnect: fix mem leak when freeing nodes interconnect: qcom: qcm2290: Fix MASTER_SNOC_BIMC_NRT ...
2023-03-19Merge tag 'ras_urgent_for_v6.3_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Borislav Petkov: - Flush out logged errors immediately after MCA banks configuration changes over sysfs have been done instead of waiting until something else triggers the workqueue later - another error or the polling interval cycle is reached * tag 'ras_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Make sure logged MCEs are processed after sysfs update
2023-03-19Merge tag 'perf_urgent_for_v6.3_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Check whether sibling events have been deactivated before adding them to groups - Update the proper event time tracking variable depending on the event type - Fix a memory overwrite issue due to using the wrong function argument when outputting perf events * tag 'perf_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix check before add_event_to_groups() in perf_group_detach() perf: fix perf_event_context->time perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output
2023-03-19Merge tag 'x86_urgent_for_v6.3_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "There's a little bit more 'movement' in there for my taste but it needs to happen and should make the code better after it. - Check cmdline_find_option()'s return value before further processing - Clear temporary storage in the resctrl code to prevent access to an unexistent MSR - Add a simple throttling mechanism to protect the hypervisor from potentially malicious SEV guests issuing requests in rapid succession. In order to not jeopardize the sanity of everyone involved in maintaining this code, the request issuing side has received a cleanup, split in more or less trivial, small and digestible pieces. Otherwise, the code was threatening to become an unmaintainable mess. Therefore, that cleanup is marked indirectly also for stable so that there's no differences between the upstream code and the stable variant when it comes down to backporting more there" * tag 'x86_urgent_for_v6.3_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix use of uninitialized buffer in sme_enable() x86/resctrl: Clear staged_config[] before and after it is used virt/coco/sev-guest: Add throttling awareness virt/coco/sev-guest: Convert the sw_exit_info_2 checking to a switch-case virt/coco/sev-guest: Do some code style cleanups virt/coco/sev-guest: Carve out the request issuing logic into a helper virt/coco/sev-guest: Remove the disable_vmpck label in handle_guest_request() virt/coco/sev-guest: Simplify extended guest request handling virt/coco/sev-guest: Check SEV_SNP attribute at probe time
2023-03-19Merge tag 'ext4_for_linus_urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fix from Ted Ts'o: "Fix a double unlock bug on an error path in ext4, found by smatch and syzkaller" * tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix possible double unlock when moving a directory
2023-03-19ftrace: Set direct_ops storage-class-specifier to staticTom Rix
smatch reports this warning kernel/trace/ftrace.c:2594:19: warning: symbol 'direct_ops' was not declared. Should it be static? The variable direct_ops is only used in ftrace.c, so it should be static Link: https://lore.kernel.org/linux-trace-kernel/20230311135113.711824-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>