From 4ff16c25e2cc48cbe6956e356c38a25ac063a64d Mon Sep 17 00:00:00 2001 From: David Smith Date: Tue, 7 Feb 2012 10:11:05 -0600 Subject: tracepoint, vfs, sched: Add exec() tracepoint Added a minimal exec tracepoint. Exec is an important major event in the life of a task, like fork(), clone() or exit(), all of which we already trace. [ We also do scheduling re-balancing during exec() - so it's useful from a scheduler instrumentation POV as well. ] If you want to watch a task start up, when it gets exec'ed is a good place to start. With the addition of this tracepoint, exec's can be monitored and better picture of general system activity can be obtained. This tracepoint will also enable better process life tracking, allowing you to answer questions like "what process keeps starting up binary X?". This tracepoint can also be useful in ftrace filtering and trigger conditions: i.e. starting or stopping filtering when exec is called. Signed-off-by: David Smith Signed-off-by: Peter Zijlstra Cc: Steven Rostedt Cc: Christoph Hellwig Cc: Al Viro Cc: Andrew Morton Cc: Linus Torvalds Link: http://lkml.kernel.org/r/4F314D19.7030504@redhat.com Signed-off-by: Ingo Molnar --- fs/exec.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'fs/exec.c') diff --git a/fs/exec.c b/fs/exec.c index aeb135c7ff5c..d0d208092773 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -63,6 +63,8 @@ #include #include "internal.h" +#include + int core_uses_pid; char core_pattern[CORENAME_MAX_SIZE] = "core"; unsigned int core_pipe_limit; @@ -1401,9 +1403,10 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs) */ bprm->recursion_depth = depth; if (retval >= 0) { - if (depth == 0) - ptrace_event(PTRACE_EVENT_EXEC, - old_pid); + if (depth == 0) { + trace_sched_process_exec(current, old_pid, bprm); + ptrace_event(PTRACE_EVENT_EXEC, old_pid); + } put_binfmt(fmt); allow_write_access(bprm->file); if (bprm->file) -- cgit From e636825346b36a07ccfc8e30946d52855e21f681 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 19 Mar 2012 17:03:22 +0100 Subject: exit_signal: simplify the "we have changed execution domain" logic exit_notify() checks "tsk->self_exec_id != tsk->parent_exec_id" to handle the "we have changed execution domain" case. We can change do_thread() to always set ->exit_signal = SIGCHLD and remove this check to simplify the code. We could change setup_new_exec() instead, this looks more logical because it increments ->self_exec_id. But note that de_thread() already resets ->exit_signal if it changes the leader, let's keep both changes close to each other. Note that we change ->exit_signal lockless, this changes the rules. Thereafter ->exit_signal is not stable under tasklist but this is fine, the only possible change is OLDSIG -> SIGCHLD. This can race with eligible_child() but the race is harmless. We can race with reparent_leader() which changes our ->exit_signal in parallel, but it does the same change to SIGCHLD. The noticeable user-visible change is that the execing task is not "visible" to do_wait()->eligible_child(__WCLONE) right after exec. To me this looks more logical, and this is consistent with mt case. Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds --- fs/exec.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'fs/exec.c') diff --git a/fs/exec.c b/fs/exec.c index b0695a9900ef..1e94d2263ae0 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -977,6 +977,9 @@ static int de_thread(struct task_struct *tsk) sig->notify_count = 0; no_thread_group: + /* we have changed execution domain */ + tsk->exit_signal = SIGCHLD; + if (current->mm) setmax_mm_hiwater_rss(&sig->maxrss, current->mm); -- cgit From 701085b219016d38f105b031381b9cee6200253a Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 19 Mar 2012 17:04:01 +0100 Subject: exec: move de_thread()->setmax_mm_hiwater_rss() into exec_mmap() Minor cleanup. de_thread()->setmax_mm_hiwater_rss() looks a bit strange, move it into exec_mmap() which plays with old_mm. Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds --- fs/exec.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'fs/exec.c') diff --git a/fs/exec.c b/fs/exec.c index 1e94d2263ae0..95551c6da090 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -850,6 +850,7 @@ static int exec_mmap(struct mm_struct *mm) if (old_mm) { up_read(&old_mm->mmap_sem); BUG_ON(active_mm != old_mm); + setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm); mm_update_next_owner(old_mm); mmput(old_mm); return 0; @@ -980,9 +981,6 @@ no_thread_group: /* we have changed execution domain */ tsk->exit_signal = SIGCHLD; - if (current->mm) - setmax_mm_hiwater_rss(&sig->maxrss, current->mm); - exit_itimers(sig); flush_itimer_signals(); -- cgit