summaryrefslogtreecommitdiff
path: root/kernel/trace/fgraph.c
AgeCommit message (Collapse)Author
2025-03-04fgraph: Correct typo in ftrace_return_to_handler commentHaiyue Wang
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250218122052.58348-1-haiyuewa@163.com Signed-off-by: Haiyue Wang <haiyuewa@163.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-21fgraph: Remove calltime and rettime from generic operationsSteven Rostedt
The function graph infrastructure is now generic so that kretprobes, fprobes and BPF can use it. But there is still some leftover logic that only the function graph tracer itself uses. This is the calculation of the calltime and return time of the functions. The calculation of the calltime has been moved into the function graph tracer and those users that need it so that it doesn't cause overhead to the other users. But the return function timestamp was still called. Instead of just moving the taking of the timestamp into the function graph trace remove the calltime and rettime completely from the ftrace_graph_ret structure. Instead, move it into the function graph return entry event structure and this also moves all the calltime and rettime logic out of the generic fgraph.c code and into the tracing code that uses it. This has been reported to decrease the overhead by ~27%. Link: https://lore.kernel.org/all/Z3aSuql3fnXMVMoM@krava/ Link: https://lore.kernel.org/all/173665959558.1629214.16724136597211810729.stgit@devnote2/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250121194436.15bdf71a@gandalf.local.home Reported-by: Jiri Olsa <olsajiri@gmail.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-21Merge tag 'ftrace-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Have fprobes built on top of function graph infrastructure The fprobe logic is an optimized kprobe that uses ftrace to attach to functions when a probe is needed at the start or end of the function. The fprobe and kretprobe logic implements a similar method as the function graph tracer to trace the end of the function. That is to hijack the return address and jump to a trampoline to do the trace when the function exits. To do this, a shadow stack needs to be created to store the original return address. Fprobes and function graph do this slightly differently. Fprobes (and kretprobes) has slots per callsite that are reserved to save the return address. This is fine when just a few points are traced. But users of fprobes, such as BPF programs, are starting to add many more locations, and this method does not scale. The function graph tracer was created to trace all functions in the kernel. In order to do this, when function graph tracing is started, every task gets its own shadow stack to hold the return address that is going to be traced. The function graph tracer has been updated to allow multiple users to use its infrastructure. Now have fprobes be one of those users. This will also allow for the fprobe and kretprobe methods to trace the return address to become obsolete. With new technologies like CFI that need to know about these methods of hijacking the return address, going toward a solution that has only one method of doing this will make the kernel less complex. - Cleanup with guard() and free() helpers There were several places in the code that had a lot of "goto out" in the error paths to either unlock a lock or free some memory that was allocated. But this is error prone. Convert the code over to use the guard() and free() helpers that let the compiler unlock locks or free memory when the function exits. - Remove disabling of interrupts in the function graph tracer When function graph tracer was first introduced, it could race with interrupts and NMIs. To prevent that race, it would disable interrupts and not trace NMIs. But the code has changed to allow NMIs and also interrupts. This change was done a long time ago, but the disabling of interrupts was never removed. Remove the disabling of interrupts in the function graph tracer is it is not needed. This greatly improves its performance. - Allow the :mod: command to enable tracing module functions on the kernel command line. The function tracer already has a way to enable functions to be traced in modules by writing ":mod:<module>" into set_ftrace_filter. That will enable either all the functions for the module if it is loaded, or if it is not, it will cache that command, and when the module is loaded that matches <module>, its functions will be enabled. This also allows init functions to be traced. But currently events do not have that feature. Because enabling function tracing can be done very early at boot up (before scheduling is enabled), the commands that can be done when function tracing is started is limited. Having the ":mod:" command to trace module functions as they are loaded is very useful. Update the kernel command line function filtering to allow it. * tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits) ftrace: Implement :mod: cache filtering on kernel command line tracing: Adopt __free() and guard() for trace_fprobe.c bpf: Use ftrace_get_symaddr() for kprobe_multi probes ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr Documentation: probes: Update fprobe on function-graph tracer selftests/ftrace: Add a test case for repeating register/unregister fprobe selftests: ftrace: Remove obsolate maxactive syntax check tracing/fprobe: Remove nr_maxactive from fprobe fprobe: Add fprobe_header encoding feature fprobe: Rewrite fprobe on function-graph tracer s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS tracing: Add ftrace_fill_perf_regs() for perf event tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs fprobe: Use ftrace_regs in fprobe exit handler fprobe: Use ftrace_regs in fprobe entry handler fgraph: Pass ftrace_regs to retfunc fgraph: Replace fgraph_ret_regs with ftrace_regs ...
2025-01-02fgraph: Add READ_ONCE() when accessing fgraph_array[]Zilin Guan
In __ftrace_return_to_handler(), a loop iterates over the fgraph_array[] elements, which are fgraph_ops. The loop checks if an element is a fgraph_stub to prevent using a fgraph_stub afterward. However, if the compiler reloads fgraph_array[] after this check, it might race with an update to fgraph_array[] that introduces a fgraph_stub. This could result in the stub being processed, but the stub contains a null "func_hash" field, leading to a NULL pointer dereference. To ensure that the gops compared against the fgraph_stub matches the gops processed later, add a READ_ONCE(). A similar patch appears in commit 63a8dfb ("function_graph: Add READ_ONCE() when accessing fgraph_array[]"). Cc: stable@vger.kernel.org Fixes: 37238abe3cb47 ("ftrace/function_graph: Pass fgraph_ops to function graph callbacks") Link: https://lore.kernel.org/20241231113731.277668-1-zilin@seu.edu.cn Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26fgraph: Pass ftrace_regs to retfuncMasami Hiramatsu (Google)
Pass ftrace_regs to the fgraph_ops::retfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (including return address) via this ftrace_regs. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/173518992972.391279.14055405490327765506.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26fgraph: Replace fgraph_ret_regs with ftrace_regsMasami Hiramatsu (Google)
Use ftrace_regs instead of fgraph_ret_regs for tracing return value on function_graph tracer because of simplifying the callback interface. The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by CONFIG_HAVE_FUNCTION_GRAPH_FREGS. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26fgraph: Pass ftrace_regs to entryfuncMasami Hiramatsu (Google)
Pass ftrace_regs to the fgraph_ops::entryfunc(). If ftrace_regs is not available, it passes a NULL instead. User callback function can access some registers (including return address) via this ftrace_regs. Note that the ftrace_regs can be NULL when the arch does NOT define: HAVE_DYNAMIC_FTRACE_WITH_ARGS or HAVE_DYNAMIC_FTRACE_WITH_REGS. More specifically, if HAVE_DYNAMIC_FTRACE_WITH_REGS is defined but not the HAVE_DYNAMIC_FTRACE_WITH_ARGS, and the ftrace ops used to register the function callback does not set FTRACE_OPS_FL_SAVE_REGS. In this case, ftrace_regs can be NULL in user callback. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173518990044.391279.17406984900626078579.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-23fgraph: Get ftrace recursion lock in function_graph_enterMasami Hiramatsu (Google)
Get the ftrace recursion lock in the generic function_graph_enter() instead of each architecture code. This changes all function_graph tracer callbacks running in non-preemptive state. On x86 and powerpc, this is by default, but on the other architecutres, this will be new. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173379653720.973433.18438622234884980494.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-16fgraph: Still initialize idle shadow stacks when startingSteven Rostedt
A bug was discovered where the idle shadow stacks were not initialized for offline CPUs when starting function graph tracer, and when they came online they were not traced due to the missing shadow stack. To fix this, the idle task shadow stack initialization was moved to using the CPU hotplug callbacks. But it removed the initialization when the function graph was enabled. The problem here is that the hotplug callbacks are called when the CPUs come online, but the idle shadow stack initialization only happens if function graph is currently active. This caused the online CPUs to not get their shadow stack initialized. The idle shadow stack initialization still needs to be done when the function graph is registered, as they will not be allocated if function graph is not registered. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241211135335.094ba282@batman.local.home Fixes: 2c02f7375e65 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks") Reported-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Linus Walleij <linus.walleij@linaro.org> Closes: https://lore.kernel.org/all/CACRpkdaTBrHwRbbrphVy-=SeDz6MSsXhTKypOtLrTQ+DgGAOcQ@mail.gmail.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-29fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph()Steven Rostedt
The ftrace_lock is held throughout unregister_ftrace_graph(), use a guard to simplify the error paths. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20241028071307.770550792@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-29fgraph: Give ret_stack its own kmem cacheSteven Rostedt
The ret_stack (shadow stack used by function graph infrastructure) is created for every task on the system when function graph is enabled. Give it its own kmem_cache. This will make it easier to see how much memory is being used specifically for function graph shadow stacks. In the future, this size may change and may not be a power of two. Having its own cache can also keep it from fragmenting memory. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/20241026063210.7d4910a7@rorschach.local.home Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-29fgraph: Separate size of ret_stack from PAGE_SIZESteven Rostedt
The ret_stack (shadow stack used by function graph infrastructure) is currently defined as PAGE_SIZE. But some architectures which have 64K PAGE_SIZE, this is way overkill. Also there's an effort to allow the PAGE_SIZE to be defined at boot up. Hard code it for now to 4096. In the future, this size may change and even be dependent on specific architectures. Link: https://lore.kernel.org/all/e5067bb8-0fcd-4739-9bca-0e872037d5a1@arm.com/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241019152951.053f9646@rorschach.local.home Suggested-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-29Merge tag 'ftrace-v6.12-rc4' into trace/ftrace/coreSteven Rostedt
In order to modify the code that allocates the shadow stacks, merge the changes that fixed the CPU hotplug shadow stack allocations and build on top of that. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-24fgraph: Change the name of cpuhp state to "fgraph:online"Steven Rostedt
The cpuhp state name given to cpuhp_setup_state() is "fgraph_idle_init" which doesn't really conform to the names that are used for cpu hotplug setups. Instead rename it to "fgraph:online" to be in line with other states. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/20241024222944.473d88c5@rorschach.local.home Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 2c02f7375e658 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-24fgraph: Fix missing unlock in register_ftrace_graph()Li Huafei
Use guard(mutex)() to acquire and automatically release ftrace_lock, fixing the issue of not unlocking when calling cpuhp_setup_state() fails. Fixes smatch warning: kernel/trace/fgraph.c:1317 register_ftrace_graph() warn: inconsistent returns '&ftrace_lock'. Link: https://lore.kernel.org/20241024155917.1019580-1-lihuafei1@huawei.com Fixes: 2c02f7375e65 ("fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202410220121.wxg0olfd-lkp@intel.com/ Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-18fgraph: Allocate ret_stack_list with proper sizeSteven Rostedt
The ret_stack_list is an array of ret_stack shadow stacks for the function graph usage. When the first function graph is enabled, all tasks in the system get a shadow stack. The ret_stack_list is a 32 element array of pointers to these shadow stacks. It allocates the shadow stack in batches (32 stacks at a time), assigns them to running tasks, and continues until all tasks are covered. When the function graph shadow stack changed from an array of ftrace_ret_stack structures to an array of longs, the allocation of ret_stack_list went from allocating an array of 32 elements to just a block defined by SHADOW_STACK_SIZE. Luckily, that's defined as PAGE_SIZE and is much more than enough to hold 32 pointers. But it is way overkill for the amount needed to allocate. Change the allocation of ret_stack_list back to a kcalloc() of FTRACE_RETSTACK_ALLOC_SIZE pointers. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241018215212.23f13f40@rorschach Fixes: 42675b723b484 ("function_graph: Convert ret_stack to a series of longs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-18fgraph: Use CPU hotplug mechanism to initialize idle shadow stacksSteven Rostedt
The function graph infrastructure allocates a shadow stack for every task when enabled. This includes the idle tasks. The first time the function graph is invoked, the shadow stacks are created and never freed until the task exits. This includes the idle tasks. Only the idle tasks that were for online CPUs had their shadow stacks created when function graph tracing started. If function graph tracing is enabled and a CPU comes online, the idle task representing that CPU will not have its shadow stack created, and all function graph tracing for that idle task will be silently dropped. Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks. This will include idle tasks for CPUs that come online during tracing. This issue can be reproduced by: # cd /sys/kernel/tracing # echo 0 > /sys/devices/system/cpu/cpu1/online # echo 0 > set_ftrace_pid # echo function_graph > current_tracer # echo 1 > options/funcgraph-proc # echo 1 > /sys/devices/system/cpu/cpu1 # grep '<idle>' per_cpu/cpu1/trace | head Before, nothing would show up. After: 1) <idle>-0 | 0.811 us | __enqueue_entity(); 1) <idle>-0 | 5.626 us | } /* enqueue_entity */ 1) <idle>-0 | | dl_server_update_idle_time() { 1) <idle>-0 | | dl_scaled_delta_exec() { 1) <idle>-0 | 0.450 us | arch_scale_cpu_capacity(); 1) <idle>-0 | 1.242 us | } 1) <idle>-0 | 1.908 us | } 1) <idle>-0 | | dl_server_start() { 1) <idle>-0 | | enqueue_dl_entity() { 1) <idle>-0 | | task_contending() { Note, if tracing stops and restarts, the old way would then initialize the onlined CPUs. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/20241018214300.6df82178@rorschach Fixes: 868baf07b1a25 ("ftrace: Fix memory leak with function graph and cpu hotplug") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-10fgraph: Simplify return address printing in function graph tracerMasami Hiramatsu (Google)
Simplify return address printing in the function graph tracer by removing fgraph_extras. Since this feature is only used by the function graph tracer and the feature flags can directly accessible from the function graph tracer, fgraph_extras can be removed from the fgraph callback. Cc: Donglin Peng <dolinux.peng@gmail.com> Link: https://lore.kernel.org/172857234900.270774.15378354017601069781.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-05function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr()Oleg Nesterov
After the commit 29c1c24a2707 ("function_graph: Fix up ftrace_graph_ret_addr()") ftrace_graph_ret_addr() doesn't need to initialize "int i" at the start. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/20240916175818.GA28944@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-05function_graph: Support recording and printing the function return addressDonglin Peng
When using function_graph tracer to analyze the flow of kernel function execution, it is often necessary to quickly locate the exact line of code where the call occurs. While this may be easy at times, it can be more time-consuming when some functions are inlined or the flow is too long. This feature aims to simplify the process by recording the return address of traced funcions and printing it when outputing trace logs. To enhance human readability, the prefix 'ret=' is used for the kernel return value, while '<-' serves as the prefix for the return address in trace logs to make it look more like the function tracer. A new trace option named 'funcgraph-retaddr' has been introduced, and the existing option 'sym-addr' can be used to control the format of the return address. See below logs with both funcgraph-retval and funcgraph-retaddr enabled. 0) | load_elf_binary() { /* <-bprm_execve+0x249/0x600 */ 0) | load_elf_phdrs() { /* <-load_elf_binary+0x84/0x1730 */ 0) | __kmalloc_noprof() { /* <-load_elf_phdrs+0x4a/0xb0 */ 0) 3.657 us | __cond_resched(); /* <-__kmalloc_noprof+0x28c/0x390 ret=0x0 */ 0) + 24.335 us | } /* __kmalloc_noprof ret=0xffff8882007f3000 */ 0) | kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */ 0) | rw_verify_area() { /* <-kernel_read+0x2b/0x50 */ 0) | security_file_permission() { /* <-kernel_read+0x2b/0x50 */ 0) | selinux_file_permission() { /* <-security_file_permission+0x26/0x40 */ 0) | __inode_security_revalidate() { /* <-selinux_file_permission+0x6d/0x140 */ 0) 2.034 us | __cond_resched(); /* <-__inode_security_revalidate+0x5f/0x80 ret=0x0 */ 0) 6.602 us | } /* __inode_security_revalidate ret=0x0 */ 0) 2.214 us | avc_policy_seqno(); /* <-selinux_file_permission+0x107/0x140 ret=0x0 */ 0) + 16.670 us | } /* selinux_file_permission ret=0x0 */ 0) + 20.809 us | } /* security_file_permission ret=0x0 */ 0) + 25.217 us | } /* rw_verify_area ret=0x0 */ 0) | __kernel_read() { /* <-load_elf_phdrs+0x6c/0xb0 */ 0) | ext4_file_read_iter() { /* <-__kernel_read+0x160/0x2e0 */ Then, we can use the faddr2line to locate the source code, for example: $ ./scripts/faddr2line ./vmlinux load_elf_phdrs+0x6c/0xb0 load_elf_phdrs+0x6c/0xb0: elf_read at fs/binfmt_elf.c:471 (inlined by) load_elf_phdrs at fs/binfmt_elf.c:531 Link: https://lore.kernel.org/20240915032912.1118397-1-dolinux.peng@gmail.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409150605.HgUmU8ea-lkp@intel.com/ Signed-off-by: Donglin Peng <dolinux.peng@gmail.com> [ Rebased to handle text_delta offsets ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-30ftrace: Have calltime be saved in the fgraph storageSteven Rostedt
The calltime field in the shadow stack frame is only used by the function graph tracer and profiler. But now that there's other users of the function graph infrastructure, this adds overhead and wastes space on the shadow stack. Move the calltime to the fgraph data storage, where the function graph and profiler entry functions will save it in its own graph storage and retrieve it in its exit functions. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <olsajiri@gmail.com> Link: https://lore.kernel.org/20240914214827.096968730@goodmis.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-30ftrace: Use a running sleeptime instead of saving on shadow stackSteven Rostedt
The fgraph "sleep-time" option tells the function graph tracer and the profiler whether to include the time a function "sleeps" (is scheduled off the CPU) in its duration for the function. By default it is true, which means the duration of a function is calculated by the timestamp of when the function was entered to the timestamp of when it exits. If the "sleep-time" option is disabled, it needs to remove the time that the task was not running on the CPU during the function. Currently it is done in a sched_switch tracepoint probe where it moves the "calltime" (time of entry of the function) forward by the sleep time calculated. It updates all the calltime in the shadow stack. This is time consuming for those users of the function graph tracer that does not care about the sleep time. Instead, add a "ftrace_sleeptime" to the task_struct that gets the sleep time added each time the task wakes up. Then have the function entry save the current "ftrace_sleeptime" and on function exit, move the calltime forward by the difference of the current "ftrace_sleeptime" from the saved sleeptime. This removes one dependency of "calltime" needed to be on the shadow stack. It also simplifies the code that removes the sleep time of functions. TODO: Only enable the sched_switch tracepoint when this is needed. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <olsajiri@gmail.com> Link: https://lore.kernel.org/20240914214826.938908568@goodmis.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-09-30fgraph: Use fgraph data to store subtime for profilerSteven Rostedt
Instead of having the "subtime" for the function profiler in the infrastructure ftrace_ret_stack structure, have it use the fgraph data reserve and retrieve functions. This will keep the limited shadow stack from wasting 8 bytes for something that is seldom used. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <olsajiri@gmail.com> Link: https://lore.kernel.org/20240914214826.780323141@goodmis.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-21tracing: fgraph: Fix to add new fgraph_ops to array after ↵Masami Hiramatsu (Google)
ftrace_startup_subops() Since the register_ftrace_graph() assigns a new fgraph_ops to fgraph_array before registring it by ftrace_startup_subops(), the new fgraph_ops can be used in function_graph_enter(). In most cases, it is still OK because those fgraph_ops's hashtable is already initialized by ftrace_set_filter*() etc. But if a user registers a new fgraph_ops which does not initialize the hash list, ftrace_ops_test() in function_graph_enter() causes a NULL pointer dereference BUG because fgraph_ops->ops.func_hash is NULL. This can be reproduced by the below commands because function profiler's fgraph_ops does not initialize the hash list; # cd /sys/kernel/tracing # echo function_graph > current_tracer # echo 1 > function_profile_enabled To fix this problem, add a new fgraph_ops to fgraph_array after ftrace_startup_subops(). Thus, until the new fgraph_ops is initialized, we will see fgraph_stub on the corresponding fgraph_array entry. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/172398528350.293426.8347220120333730248.stgit@devnote2 Fixes: c132be2c4fcc ("function_graph: Have the instances use their own ftrace_ops for filtering") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-07function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()Petr Pavlu
When ftrace_graph_ret_addr() is invoked to convert a found stack return address to its original value, the function can end up producing the following crash: [ 95.442712] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ 95.442720] #PF: supervisor read access in kernel mode [ 95.442724] #PF: error_code(0x0000) - not-present page [ 95.442727] PGD 0 P4D 0- [ 95.442731] Oops: Oops: 0000 [#1] PREEMPT SMP PTI [ 95.442736] CPU: 1 UID: 0 PID: 2214 Comm: insmod Kdump: loaded Tainted: G OE K 6.11.0-rc1-default #1 67c62a3b3720562f7e7db5f11c1fdb40b7a2857c [ 95.442747] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH [ 95.442750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [ 95.442754] RIP: 0010:ftrace_graph_ret_addr+0x42/0xc0 [ 95.442766] Code: [...] [ 95.442773] RSP: 0018:ffff979b80ff7718 EFLAGS: 00010006 [ 95.442776] RAX: ffffffff8ca99b10 RBX: ffff979b80ff7760 RCX: ffff979b80167dc0 [ 95.442780] RDX: ffffffff8ca99b10 RSI: ffff979b80ff7790 RDI: 0000000000000005 [ 95.442783] RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000 [ 95.442786] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff8e9491e0 [ 95.442790] R13: ffffffff8d6f70f0 R14: ffff979b80167da8 R15: ffff979b80167dc8 [ 95.442793] FS: 00007fbf83895740(0000) GS:ffff8a0afdd00000(0000) knlGS:0000000000000000 [ 95.442797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.442800] CR2: 0000000000000028 CR3: 0000000005070002 CR4: 0000000000370ef0 [ 95.442806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 95.442809] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 95.442816] Call Trace: [ 95.442823] <TASK> [ 95.442896] unwind_next_frame+0x20d/0x830 [ 95.442905] arch_stack_walk_reliable+0x94/0xe0 [ 95.442917] stack_trace_save_tsk_reliable+0x7d/0xe0 [ 95.442922] klp_check_and_switch_task+0x55/0x1a0 [ 95.442931] task_call_func+0xd3/0xe0 [ 95.442938] klp_try_switch_task.part.5+0x37/0x150 [ 95.442942] klp_try_complete_transition+0x79/0x2d0 [ 95.442947] klp_enable_patch+0x4db/0x890 [ 95.442960] do_one_initcall+0x41/0x2e0 [ 95.442968] do_init_module+0x60/0x220 [ 95.442975] load_module+0x1ebf/0x1fb0 [ 95.443004] init_module_from_file+0x88/0xc0 [ 95.443010] idempotent_init_module+0x190/0x240 [ 95.443015] __x64_sys_finit_module+0x5b/0xc0 [ 95.443019] do_syscall_64+0x74/0x160 [ 95.443232] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 95.443236] RIP: 0033:0x7fbf82f2c709 [ 95.443241] Code: [...] [ 95.443247] RSP: 002b:00007fffd5ea3b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 95.443253] RAX: ffffffffffffffda RBX: 000056359c48e750 RCX: 00007fbf82f2c709 [ 95.443257] RDX: 0000000000000000 RSI: 000056356ed4efc5 RDI: 0000000000000003 [ 95.443260] RBP: 000056356ed4efc5 R08: 0000000000000000 R09: 00007fffd5ea3c10 [ 95.443263] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 [ 95.443267] R13: 000056359c48e6f0 R14: 0000000000000000 R15: 0000000000000000 [ 95.443272] </TASK> [ 95.443274] Modules linked in: [...] [ 95.443385] Unloaded tainted modules: intel_uncore_frequency(E):1 isst_if_common(E):1 skx_edac(E):1 [ 95.443414] CR2: 0000000000000028 The bug can be reproduced with kselftests: cd linux/tools/testing/selftests make TARGETS='ftrace livepatch' (cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc) (cd livepatch; ./test-livepatch.sh) The problem is that ftrace_graph_ret_addr() is supposed to operate on the ret_stack of a selected task but wrongly accesses the ret_stack of the current task. Specifically, the above NULL dereference occurs when task->curr_ret_stack is non-zero, but current->ret_stack is NULL. Correct ftrace_graph_ret_addr() to work with the right ret_stack. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reported-by: Miroslav Benes <mbenes@suse.cz> Link: https://lore.kernel.org/20240803131211.17255-1-petr.pavlu@suse.com Fixes: 7aa1eaef9f42 ("function_graph: Allow multiple users to attach to function graph") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-14function_graph: Add READ_ONCE() when accessing fgraph_array[]Steven Rostedt (Google)
In function_graph_enter() there's a loop that looks at fgraph_array[] elements which are fgraph_ops. It first tests if it is a fgraph_stub op, and if so skips it, as that's just there as a place holder. Then it checks the fgraph_ops filters to see if the ops wants to trace the current function. But if the compiler reloads the fgraph_array[] after the check against fgraph_stub, it could race with the fgraph_array[] being updated with the fgraph_stub. That would cause the stub to be processed. But the stub has a null "func_hash" field which will cause a NULL pointer dereference. Add a READ_ONCE() so that the gops that is compared against the fgraph_stub is also the gops that is processed later. Link: https://lore.kernel.org/all/CA+G9fYsSVJQZH=nM=1cjTc94PgSnMF9y65BnOv6XSoCG_b6wmw@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240613095223.1f07e3a4@rorschach.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: cc60ee813b503 ("function_graph: Use static_call and branch to optimize entry function") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-06-11function_graph: Everyone uses HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, remove itSteven Rostedt (Google)
All architectures that implement function graph also implements HAVE_FUNCTION_GRAPH_RET_ADDR_PTR. Remove it, as it is no longer a differentiator. Link: https://lore.kernel.org/linux-trace-kernel/20240611031737.982047614@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-11function_graph: Fix up ftrace_graph_ret_addr()Steven Rostedt (Google)
Yang Li sent a patch to fix the kerneldoc of ftrace_graph_ret_addr(). While reviewing it, I realized that the comments in the entire function header needed a rewrite. When doing that, I realized that @idx parameter was being ignored. Every time this was called by the unwinder, it would start the loop at the top of the shadow stack and look for the matching stack pointer. When it found it, it would return it. When the unwinder asked for the next function, it would search from the beginning again. In reality, it should start from where it left off. That was the reason for the @idx parameter in the first place. The first time the unwinder calls this function, the @idx pointer would contain zero. That would mean to start from the top of the stack. The function was supposed to update the @idx with the index where it found the return address, so that the next time the unwinder calls this function it doesn't have to search through the previous addresses it found (making it O(n^2)!). This speeds up the unwinder's use of ftrace_graph_ret_addr() by an order of magnitude. Link: https://lore.kernel.org/linux-trace-kernel/20240610181746.656e3759@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20240611031737.821995106@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Reported-by: Yang Li <yang.lee@linux.alibaba.com> Fixes: 7aa1eaef9f428 ("function_graph: Allow multiple users to attach to function graph") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-10function_graph: Make fgraph_update_pid_func() a stub for !DYNAMIC_FTRACESteven Rostedt (Google)
When CONFIG_DYNAMIC_FTRACE is not set, the function fgraph_update_pid_func() doesn't do anything. Currently, most of its logic is within a "#ifdef CONFIG_DYNAMIC_FTRACE" block, but its variables were declared outside that, and when DYNAMIC_FTRACE is not set, it produces unused variable warnings. Instead, just place it (and the helper function fgraph_pid_func()) within the #ifdef block and have the header file use a empty stub function for when DYNAMIC_FTRACE is not defined. Link: https://lore.kernel.org/linux-trace-kernel/20240607094833.6a787d73@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406071806.BRjaC5FF-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-06fgraph: Remove some unused functionsJiapeng Chong
These functions are defined in the fgraph.c file, but not called elsewhere, so delete these unused functions. kernel/trace/fgraph.c:273:1: warning: unused function 'set_bitmap_bits'. kernel/trace/fgraph.c:259:19: warning: unused function 'get_fgraph_type'. Link: https://lore.kernel.org/linux-trace-kernel/20240606021053.27783-1-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9289 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-06function_graph: Do not update pid func if CONFIG_DYNAMIC_FTRACE not enabledSteven Rostedt (Google)
The ftrace subops is only defined if CONFIG_DYNAMIC_FTRACE is enabled. If it is not, function tracing is extremely limited, and the subops in the ftrace_ops structure is not defined (and will fail to compile). If DYNAMIC_FTRACE is not enabled, then function graph filtering will not work (as it shouldn't). Link: https://lore.kernel.org/linux-trace-kernel/20240605202709.096020676@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: df3ec5da6a1e7 ("function_graph: Add pid tracing back to function graph tracer") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406051855.9VIYXbTB-lkp@intel.com/ Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-06function_graph: Make fgraph_do_direct static key staticSteven Rostedt (Google)
The static branch key "fgraph_do_direct" was not declared static but is only used in one file. Change it to a static variable. Link: https://lore.kernel.org/linux-trace-kernel/20240605202708.936515302@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: cc60ee813b503 ("function_graph: Use static_call and branch to optimize entry function") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406051711.dS1sQZ9n-lkp@intel.com/ Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Use static_call and branch to optimize return functionSteven Rostedt (Google)
In most cases function graph is used by a single user. Instead of calling a loop to call function graph callbacks in this case, call the function return callback directly. Use the static_key that is set when the function graph tracer has less than 2 callbacks registered. It will do the direct call in that case, and will do the loop over all callers when there are 2 or more callbacks registered. Link: https://lore.kernel.org/linux-trace-kernel/20240603190824.921460797@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Use static_call and branch to optimize entry functionSteven Rostedt (Google)
In most cases function graph is used by a single user. Instead of calling a loop to call function graph callbacks in this case, call the function entry callback directly. Add a static_key that will be used to set the function graph logic to either do the loop (when more than one callback is registered) or to call the callback directly if there is only one registered callback. Link: https://lore.kernel.org/linux-trace-kernel/20240603190824.766858241@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Use bitmask to loop on fgraph entrySteven Rostedt (Google)
Instead of looping through all the elements of fgraph_array[] to see if there's an gops attached to one and then calling its gops->func(). Create a fgraph_array_bitmask that sets bits when an index in the array is reserved (via the simple lru algorithm). Then only the bits set in this bitmask needs to be looked at where only elements in the array that have ops registered need to be looked at. Note, we do not care about races. If a bit is set before the gops is assigned, it only wastes time looking at the element and ignoring it (as it did before this bitmask is added). Link: https://lore.kernel.org/linux-trace-kernel/20240603190824.604448781@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Use for_each_set_bit() in __ftrace_return_to_handler()Steven Rostedt (Google)
Instead of iterating through the entire fgraph_array[] and seeing if one of the bitmap bits are set to know to call the array's retfunc() function, use for_each_set_bit() on the bitmap itself. This will only iterate for the number of set bits. Link: https://lore.kernel.org/linux-trace-kernel/20240603190824.447448026@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()Steven Rostedt (VMware)
Added functions that can be called by a fgraph_ops entryfunc and retfunc to store state between the entry of the function being traced to the exit of the same function. The fgraph_ops entryfunc() may call fgraph_reserve_data() to store up to 32 words onto the task's shadow ret_stack and this then can be retrieved by fgraph_retrieve_data() called by the corresponding retfunc(). Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509109089.162236.11372474169781184034.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.959703050@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Add "task variables" per task for fgraph_opsSteven Rostedt (VMware)
Add a "task variables" array on the tasks shadow ret_stack that is the size of longs for each possible registered fgraph_ops. That's a total of 16, taking up 8 * 16 = 128 bytes (out of a page size 4k). This will allow for fgraph_ops to do specific features on a per task basis having a way to maintain state for each task. Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509104383.162236.12239656156685718550.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.308806126@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Use a simple LRU for fgraph_array index numberMasami Hiramatsu (Google)
Since the fgraph_array index is used for the bitmap on the shadow stack, it may leave some entries after a function_graph instance is removed. Thus if another instance reuses the fgraph_array index soon after releasing it, the fgraph may confuse to call the newer callback for the entries which are pushed by the older instance. To avoid reusing the fgraph_array index soon after releasing, introduce a simple LRU table for managing the index number. This will reduce the possibility of this confusion. Link: https://lore.kernel.org/linux-trace-kernel/171509103267.162236.6885097397289135378.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190823.147421545@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Add pid tracing back to function graph tracerSteven Rostedt (Google)
Now that the function_graph has a main callback that handles the function graph subops tracing, it no longer honors the pid filtering of ftrace. Add back this logic in the function_graph code to update the gops callback for the entry function to test if it should trace the current task or not. Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.991720703@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Have the instances use their own ftrace_ops for filteringSteven Rostedt (VMware)
Allow for instances to have their own ftrace_ops part of the fgraph_ops that makes the funtion_graph tracer filter on the set_ftrace_filter file of the instance and not the top instance. This uses the new ftrace_startup_subops(), by using graph_ops as the "manager ops" that defines the callback function and adds the functions defined by the filters of the ops for each trace instance. The callback defined by the manager ops will call the registered fgraph ops that were added to the fgraph_array. Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509102088.162236.15758883237657317789.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.832946261@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04ftrace: Add subops logic to allow one ops to manage manySteven Rostedt (Google)
There are cases where a single system will use a single function callback to handle multiple users. For example, to allow function_graph tracer to have multiple users where each can trace their own set of functions, it is useful to only have one ftrace_ops registered to ftrace that will call a function by the function_graph tracer to handle the multiplexing with the different registered function_graph tracers. Add a "subop_list" to the ftrace_ops that will hold a list of other ftrace_ops that the top ftrace_ops will manage. The function ftrace_startup_subops() that takes the manager ftrace_ops and a subop ftrace_ops it will manage. If there are no subops with the ftrace_ops yet, it will copy the ftrace_ops subop filters to the manager ftrace_ops and register that with ftrace_startup(), and adds the subop to its subop_list. If the manager ops already has something registered, it will then merge the new subop filters with what it has and enable the new functions that covers all the subops it has. To remove a subop, ftrace_shutdown_subops() is called which will use the subop_list of the manager ops to rebuild all the functions it needs to trace, and update the ftrace records to only call the functions it now has registered. If there are no more functions registered, it will then call ftrace_shutdown() to disable itself completely. Note, it is up to the manager ops callback to always make sure that the subops callbacks are called if its filter matches, as there are times in the update where the callback could be calling more functions than those that are currently registered. This could be updated to handle other systems other than function_graph, for example, fprobes could use this (but will need an interface to call ftrace_startup_subops()). Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.508431129@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04ftrace/function_graph: Pass fgraph_ops to function graph callbacksSteven Rostedt (VMware)
Pass the fgraph_ops structure to the function graph callbacks. This will allow callbacks to add a descriptor to a fgraph_ops private field that wil be added in the future and use it for the callbacks. This will be useful when more than one callback can be registered to the function graph tracer. Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509098588.162236.4787930115997357578.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190822.035147698@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Remove logic around ftrace_graph_entry and returnSteven Rostedt (VMware)
The function pointers ftrace_graph_entry and ftrace_graph_return are no longer called via the function_graph tracer. Instead, an array structure is now used that will allow for multiple users of the function_graph infrastructure. The variables are still used by the architecture code for non dynamic ftrace configs, where a test is made against them to see if they point to the default stub function or not. This is how the static function tracing knows to call into the function graph tracer infrastructure or not. Two new stub functions are made. entry_run() and return_run(). The ftrace_graph_entry and ftrace_graph_return are set to them respectively when the function graph tracer is enabled, and this will trigger the architecture specific function graph code to be executed. This also requires checking the global_ops hash for all calls into the function_graph tracer. Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509097408.162236.17387844142114638932.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.872127216@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Handle tail calls for stack unwindingMasami Hiramatsu (Google)
For the tail-call, there would be 2 or more ftrace_ret_stacks on the ret_stack, which records "return_to_handler" as the return address except for the last one. But on the real stack, there should be 1 entry because tail-call reuses the return address on the stack and jump to the next function. In ftrace_graph_ret_addr() that is used for stack unwinding, skip tail calls as a real stack unwinder would do. Link: https://lore.kernel.org/linux-trace-kernel/171509096221.162236.8806372072523195752.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.717065217@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Allow multiple users to attach to function graphSteven Rostedt (VMware)
Allow for multiple users to attach to function graph tracer at the same time. Only 16 simultaneous users can attach to the tracer. This is because there's an array that stores the pointers to the attached fgraph_ops. When a function being traced is entered, each of the ftrace_ops entryfunc is called and if it returns non zero, its index into the array will be added to the shadow stack. On exit of the function being traced, the shadow stack will contain the indexes of the ftrace_ops on the array that want their retfunc to be called. Because a function may sleep for a long time (if a task sleeps itself), the return of the function may be literally days later. If the ftrace_ops is removed, its place on the array is replaced with a ftrace_ops that contains the stub functions and that will be called when the function finally returns. If another ftrace_ops is added that happens to get the same index into the array, its return function may be called. But that's actually the way things current work with the old function graph tracer. If one tracer is removed and another is added, the new one will get the return calls of the function traced by the previous one, thus this is not a regression. This can be fixed by adding a counter to each time the array item is updated and save that on the shadow stack as well, such that it won't be called if the index saved does not match the index on the array. Note, being able to filter functions when both are called is not completely handled yet, but that shouldn't be too hard to manage. Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509096221.162236.8806372072523195752.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.555493396@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Add an array structure that will allow multiple callbacksSteven Rostedt (VMware)
Add an array structure that will eventually allow the function graph tracer to have up to 16 simultaneous callbacks attached. It's an array of 16 fgraph_ops pointers, that is assigned when one is registered. On entry of a function the entry of the first item in the array is called, and if it returns zero, then the callback returns non zero if it wants the return callback to be called on exit of the function. The array will simplify the process of having more than one callback attached to the same function, as its index into the array can be stored on the shadow stack. We need to only save the index, because this will allow the fgraph_ops to be freed before the function returns (which may happen if the function call schedule for a long time). Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509095075.162236.8272148192748284581.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.392113213@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by longSteven Rostedt (VMware)
Instead of using "ALIGN()", use BUILD_BUG_ON() as the structures should always be divisible by sizeof(long). Co-developed with Masami Hiramatsu: Link: https://lore.kernel.org/linux-trace-kernel/171509093949.162236.14518699447151894536.stgit@devnote2 Link: http://lkml.kernel.org/r/20190524111144.GI2589@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.232168933@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-06-04function_graph: Convert ret_stack to a series of longsSteven Rostedt (VMware)
In order to make it possible to have multiple callbacks registered with the function_graph tracer, the retstack needs to be converted from an array of ftrace_ret_stack structures to an array of longs. This will allow to store the list of callbacks on the stack for the return side of the functions. Link: https://lore.kernel.org/linux-trace-kernel/171509092742.162236.4427737821399314856.stgit@devnote2 Link: https://lore.kernel.org/linux-trace-kernel/20240603190821.073111754@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guo Ren <guoren@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-14ftrace: Remove unused global 'ftrace_direct_func_count'Dr. David Alan Gilbert
Commit 8788ca164eb4b ("ftrace: Remove the legacy _ftrace_direct API") stopped setting the 'ftrace_direct_func_count' variable, but left it around. Clean it up. Link: https://lore.kernel.org/linux-trace-kernel/20240506233305.215735-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>