summaryrefslogtreecommitdiff
path: root/mm/mmap_lock.c
AgeCommit message (Collapse)Author
2025-01-13mm: mmap_lock: optimize mmap_lock tracepointsShakeel Butt
We are starting to deploy mmap_lock tracepoint monitoring across our fleet and the early results showed that these tracepoints are consuming significant amount of CPUs in kernfs_path_from_node when enabled. It seems like the kernel is trying to resolve the cgroup path in the fast path of the locking code path when the tracepoints are enabled. In addition for some application their metrics are regressing when monitoring is enabled. The cgroup path resolution can be slow and should not be done in the fast path. Most userspace tools, like bpftrace, provides functionality to get the cgroup path from cgroup id, so let's just trace the cgroup id and the users can use better tools to get the path in the slow path. Link: https://lkml.kernel.org/r/20241125171617.113892-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcountVlastimil Babka
Since 7d6be67cfdd4 ("mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer") we use trace_mmap_lock_reg()/unreg() only to maintain an atomic reg_refcount which is checked to avoid performing get_mm_memcg_path() in case none of the tracepoints using it is enabled. This can be achieved directly by putting all the work needed for the tracepoint behind the trace_mmap_lock_##type##_enabled(), as suggested by Documentation/trace/tracepoints.rst and with the following advantages: - uses the tracepoint's static key instead of evaluating a branch - the check tracepoint specific, not shared by all of them - we can get rid of trace_mmap_lock_reg()/unreg() completely Thus use the trace_..._enabled() check and remove unnecessary code. Link: https://lkml.kernel.org/r/20241105113456.95066-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-03mm: mmap_lock: replace get_memcg_path_buf() with on-stack bufferTetsuo Handa
Commit 2b5067a8143e ("mm: mmap_lock: add tracepoints around lock acquisition") introduced TRACE_MMAP_LOCK_EVENT() macro using preempt_disable() in order to let get_mm_memcg_path() return a percpu buffer exclusively used by normal, softirq, irq and NMI contexts respectively. Commit 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption") replaced preempt_disable() with local_lock(&memcg_paths.lock) based on an argument that preempt_disable() has to be avoided because get_mm_memcg_path() might sleep if PREEMPT_RT=y. But syzbot started reporting inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. and inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. messages, for local_lock() does not disable IRQ. We could replace local_lock() with local_lock_irqsave() in order to suppress these messages. But this patch instead replaces percpu buffers with on-stack buffer, for the size of each buffer returned by get_memcg_path_buf() is only 256 bytes which is tolerable for allocating from current thread's kernel stack memory. Link: https://lkml.kernel.org/r/ef22d289-eadb-4ed9-863b-fbc922b33d8d@I-love.SAKURA.ne.jp Reported-by: syzbot <syzbot+40905bca570ae6784745@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=40905bca570ae6784745 Fixes: 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2021-07-23mm: mmap_lock: fix disabling preemption directlyMuchun Song
Commit 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption") fixed a bug by using local locks. But commit d01079f3d0c0 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations") changed those lines back to the original version. I guess it was introduced by fixing conflicts. Link: https://lkml.kernel.org/r/20210720074228.76342-1-songmuchun@bytedance.com Fixes: d01079f3d0c0 ("mm/mmap_lock: remove dead code for !CONFIG_TRACING configurations") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm/mmap_lock: remove dead code for !CONFIG_TRACING configurationsMel Gorman
make W=1 generates the following warning in mmap_lock.c for allnoconfig mm/mmap_lock.c:213:6: warning: no previous prototype for `__mmap_lock_do_trace_start_locking' [-Wmissing-prototypes] void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/mmap_lock.c:219:6: warning: no previous prototype for `__mmap_lock_do_trace_acquire_returned' [-Wmissing-prototypes] void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/mmap_lock.c:226:6: warning: no previous prototype for `__mmap_lock_do_trace_released' [-Wmissing-prototypes] void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write) On !CONFIG_TRACING configurations, the code is dead so put it behind an #ifdef. [cuibixuan@huawei.com: fix warning when CONFIG_TRACING is not defined] Link: https://lkml.kernel.org/r/20210531033426.74031-1-cuibixuan@huawei.com Link: https://lkml.kernel.org/r/20210520084809.8576-13-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Streetman <ddstreet@ieee.org> Cc: David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm: mmap_lock: use local locks instead of disabling preemptionNicolas Saenz Julienne
mmap_lock will explicitly disable/enable preemption upon manipulating its local CPU variables. This is to be expected, but in this case, it doesn't play well with PREEMPT_RT. The preemption disabled code section also takes a spin-lock. Spin-locks in RT systems will try to schedule, which is exactly what we're trying to avoid. To mitigate this, convert the explicit preemption handling to local_locks. Which are RT aware, and will disable migration instead of preemption when PREEMPT_RT=y. The faulty call trace looks like the following: __mmap_lock_do_trace_*() preempt_disable() get_mm_memcg_path() cgroup_path() kernfs_path_from_node() spin_lock_irqsave() /* Scheduling while atomic! */ Link: https://lkml.kernel.org/r/20210604163506.2103900-1-nsaenzju@redhat.com Fixes: 2b5067a8143e3 ("mm: mmap_lock: add tracepoints around lock acquisition ") Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Tested-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: mmap_lock: add tracepoints around lock acquisitionAxel Rasmussen
The goal of these tracepoints is to be able to debug lock contention issues. This lock is acquired on most (all?) mmap / munmap / page fault operations, so a multi-threaded process which does a lot of these can experience significant contention. We trace just before we start acquisition, when the acquisition returns (whether it succeeded or not), and when the lock is released (or downgraded). The events are broken out by lock type (read / write). The events are also broken out by memcg path. For container-based workloads, users often think of several processes in a memcg as a single logical "task", so collecting statistics at this level is useful. The end goal is to get latency information. This isn't directly included in the trace events. Instead, users are expected to compute the time between "start locking" and "acquire returned", using e.g. synthetic events or BPF. The benefit we get from this is simpler code. Because we use tracepoint_enabled() to decide whether or not to trace, this patch has effectively no overhead unless tracepoints are enabled at runtime. If tracepoints are enabled, there is a performance impact, but how much depends on exactly what e.g. the BPF program does. [axelrasmussen@google.com: fix use-after-free race and css ref leak in tracepoints] Link: https://lkml.kernel.org/r/20201130233504.3725241-1-axelrasmussen@google.com [axelrasmussen@google.com: v3] Link: https://lkml.kernel.org/r/20201207213358.573750-1-axelrasmussen@google.com [rostedt@goodmis.org: in-depth examples of tracepoint_enabled() usage, and per-cpu-per-context buffer design] Link: https://lkml.kernel.org/r/20201105211739.568279-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>