From 9a11b49a805665e13a56aa067afaf81d43ec1514 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 3 Jul 2006 00:24:33 -0700 Subject: [PATCH] lockdep: better lock debugging Generic lock debugging: - generalized lock debugging framework. For example, a bug in one lock subsystem turns off debugging in all lock subsystems. - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from the mutex/rtmutex debugging code: it caused way too much prototype hackery, and lockdep will give the same information anyway. - ability to do silent tests - check lock freeing in vfree too. - more finegrained debugging options, to allow distributions to turn off more expensive debugging features. There's no separate 'held mutexes' list anymore - but there's a 'held locks' stack within lockdep, which unifies deadlock detection across all lock classes. (this is independent of the lockdep validation stuff - lockdep first checks whether we are holding a lock already) Here are the current debugging options: CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y which do: config DEBUG_MUTEXES bool "Mutex debugging, basic checks" config DEBUG_LOCK_ALLOC bool "Detect incorrect freeing of live mutexes" Signed-off-by: Ingo Molnar Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 4 ---- 1 file changed, 4 deletions(-) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index 9064bf9e131b..1cd46a4fb0d3 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -919,10 +919,6 @@ static inline void rt_mutex_init_task(struct task_struct *p) spin_lock_init(&p->pi_lock); plist_head_init(&p->pi_waiters, &p->pi_lock); p->pi_blocked_on = NULL; -# ifdef CONFIG_DEBUG_RT_MUTEXES - spin_lock_init(&p->held_list_lock); - INIT_LIST_HEAD(&p->held_list_head); -# endif #endif } -- cgit From de30a2b355ea85350ca2f58f3b9bf4e5bc007986 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 3 Jul 2006 00:24:42 -0700 Subject: [PATCH] lockdep: irqtrace subsystem, core Accurate hard-IRQ-flags and softirq-flags state tracing. This allows us to attach extra functionality to IRQ flags on/off events (such as trace-on/off). Signed-off-by: Ingo Molnar Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index 1cd46a4fb0d3..b7db7fb74f53 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -968,6 +968,10 @@ static task_t *copy_process(unsigned long clone_flags, if (!p) goto fork_out; +#ifdef CONFIG_TRACE_IRQFLAGS + DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled); + DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled); +#endif retval = -EAGAIN; if (atomic_read(&p->user->processes) >= p->signal->rlim[RLIMIT_NPROC].rlim_cur) { @@ -1042,6 +1046,21 @@ static task_t *copy_process(unsigned long clone_flags, } mpol_fix_fork_child_flag(p); #endif +#ifdef CONFIG_TRACE_IRQFLAGS + p->irq_events = 0; + p->hardirqs_enabled = 0; + p->hardirq_enable_ip = 0; + p->hardirq_enable_event = 0; + p->hardirq_disable_ip = _THIS_IP_; + p->hardirq_disable_event = 0; + p->softirqs_enabled = 1; + p->softirq_enable_ip = _THIS_IP_; + p->softirq_enable_event = 0; + p->softirq_disable_ip = 0; + p->softirq_disable_event = 0; + p->hardirq_context = 0; + p->softirq_context = 0; +#endif rt_mutex_init_task(p); -- cgit From fbb9ce9530fd9b66096d5187fa6a115d16d9746c Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 3 Jul 2006 00:24:50 -0700 Subject: [PATCH] lockdep: core Do 'make oldconfig' and accept all the defaults for new config options - reboot into the kernel and if everything goes well it should boot up fine and you should have /proc/lockdep and /proc/lockdep_stats files. Typically if the lock validator finds some problem it will print out voluminous debug output that begins with "BUG: ..." and which syslog output can be used by kernel developers to figure out the precise locking scenario. What does the lock validator do? It "observes" and maps all locking rules as they occur dynamically (as triggered by the kernel's natural use of spinlocks, rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a new locking scenario, it validates this new rule against the existing set of rules. If this new rule is consistent with the existing set of rules then the new rule is added transparently and the kernel continues as normal. If the new rule could create a deadlock scenario then this condition is printed out. When determining validity of locking, all possible "deadlock scenarios" are considered: assuming arbitrary number of CPUs, arbitrary irq context and task context constellations, running arbitrary combinations of all the existing locking scenarios. In a typical system this means millions of separate scenarios. This is why we call it a "locking correctness" validator - for all rules that are observed the lock validator proves it with mathematical certainty that a deadlock could not occur (assuming that the lock validator implementation itself is correct and its internal data structures are not corrupted by some other kernel subsystem). [see more details and conditionals of this statement in include/linux/lockdep.h and Documentation/lockdep-design.txt] Furthermore, this "all possible scenarios" property of the validator also enables the finding of complex, highly unlikely multi-CPU multi-context races via single single-context rules, increasing the likelyhood of finding bugs drastically. In practical terms: the lock validator already found a bug in the upstream kernel that could only occur on systems with 3 or more CPUs, and which needed 3 very unlikely code sequences to occur at once on the 3 CPUs. That bug was found and reported on a single-CPU system (!). So in essence a race will be found "piecemail-wise", triggering all the necessary components for the race, without having to reproduce the race scenario itself! In its short existence the lock validator found and reported many bugs before they actually caused a real deadlock. To further increase the efficiency of the validator, the mapping is not per "lock instance", but per "lock-class". For example, all struct inode objects in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached, then there are 10,000 lock objects. But ->inotify_mutex is a single "lock type", and all locking activities that occur against ->inotify_mutex are "unified" into this single lock-class. The advantage of the lock-class approach is that all historical ->inotify_mutex uses are mapped into a single (and as narrow as possible) set of locking rules - regardless of how many different tasks or inode structures it took to build this set of rules. The set of rules persist during the lifetime of the kernel. To see the rough magnitude of checking that the lock validator does, here's a portion of /proc/lockdep_stats, fresh after bootup: lock-classes: 694 [max: 2048] direct dependencies: 1598 [max: 8192] indirect dependencies: 17896 all direct dependencies: 16206 dependency chains: 1910 [max: 8192] in-hardirq chains: 17 in-softirq chains: 105 in-process chains: 1065 stack-trace entries: 38761 [max: 131072] combined max dependencies: 2033928 hardirq-safe locks: 24 hardirq-unsafe locks: 176 softirq-safe locks: 53 softirq-unsafe locks: 137 irq-safe locks: 59 irq-unsafe locks: 176 The lock validator has observed 1598 actual single-thread locking patterns, and has validated all possible 2033928 distinct locking scenarios. More details about the design of the lock validator can be found in Documentation/lockdep-design.txt, which can also found at: http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt [bunk@stusta.de: cleanups] Signed-off-by: Ingo Molnar Signed-off-by: Arjan van de Ven Signed-off-by: Adrian Bunk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index b7db7fb74f53..7f48abdd7bb6 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1061,6 +1061,11 @@ static task_t *copy_process(unsigned long clone_flags, p->hardirq_context = 0; p->softirq_context = 0; #endif +#ifdef CONFIG_LOCKDEP + p->lockdep_depth = 0; /* no locks held yet */ + p->curr_chain_key = 0; + p->lockdep_recursion = 0; +#endif rt_mutex_init_task(p); -- cgit From ad33945175bed649ca5fe0881269db005bbb449a Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 3 Jul 2006 00:25:15 -0700 Subject: [PATCH] lockdep: annotate ->mmap_sem Teach special (recursive) locking code to the lock validator. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar Signed-off-by: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index 7f48abdd7bb6..54953d8a6f17 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -193,7 +193,10 @@ static inline int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) down_write(&oldmm->mmap_sem); flush_cache_mm(oldmm); - down_write(&mm->mmap_sem); + /* + * Not linked in yet - no deadlock potential: + */ + down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING); mm->locked_vm = 0; mm->mmap = NULL; -- cgit From 36c8b586896f60cb91a4fd526233190b34316baf Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 3 Jul 2006 00:25:41 -0700 Subject: [PATCH] sched: cleanup, remove task_t, convert to struct task_struct cleanup: remove task_t and convert all the uses to struct task_struct. I introduced it for the scheduler anno and it was a mistake. Conversion was mostly scripted, the result was reviewed and all secondary whitespace and style impact (if any) was fixed up by hand. Signed-off-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/fork.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'kernel/fork.c') diff --git a/kernel/fork.c b/kernel/fork.c index 54953d8a6f17..56e4e07e45f7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -933,13 +933,13 @@ static inline void rt_mutex_init_task(struct task_struct *p) * parts of the process environment (as per the clone * flags). The actual kick-off is left to the caller. */ -static task_t *copy_process(unsigned long clone_flags, - unsigned long stack_start, - struct pt_regs *regs, - unsigned long stack_size, - int __user *parent_tidptr, - int __user *child_tidptr, - int pid) +static struct task_struct *copy_process(unsigned long clone_flags, + unsigned long stack_start, + struct pt_regs *regs, + unsigned long stack_size, + int __user *parent_tidptr, + int __user *child_tidptr, + int pid) { int retval; struct task_struct *p = NULL; @@ -1294,9 +1294,9 @@ struct pt_regs * __devinit __attribute__((weak)) idle_regs(struct pt_regs *regs) return regs; } -task_t * __devinit fork_idle(int cpu) +struct task_struct * __devinit fork_idle(int cpu) { - task_t *task; + struct task_struct *task; struct pt_regs regs; task = copy_process(CLONE_VM, 0, idle_regs(®s), 0, NULL, NULL, 0); -- cgit