summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2024-04-26Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-04-26 We've added 12 non-merge commits during the last 22 day(s) which contain a total of 14 files changed, 168 insertions(+), 72 deletions(-). The main changes are: 1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page, from Puranjay Mohan. 2) Fix a crash in XDP with devmap broadcast redirect when the latter map is in process of being torn down, from Toke Høiland-Jørgensen. 3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF program runtime stats, from Xu Kuohai. 4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue, from Jason Xing. 5) Fix BPF verifier error message in resolve_pseudo_ldimm64, from Anton Protopopov. 6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item, from Andrii Nakryiko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64 bpf, x86: Fix PROBE_MEM runtime load check bpf: verifier: prevent userspace memory access xdp: use flags field to disambiguate broadcast redirect arm32, bpf: Reimplement sign-extension mov instruction riscv, bpf: Fix incorrect runtime stats bpf, arm64: Fix incorrect runtime stats bpf: Fix a verifier verbose message bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers MAINTAINERS: Update email address for Puranjay Mohan bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition ==================== Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-26kunit/fortify: Fix mismatched kvalloc()/vfree() usageKees Cook
The kv*() family of tests were accidentally freeing with vfree() instead of kvfree(). Use kvfree() instead. Fixes: 9124a2640148 ("kunit/fortify: Validate __alloc_size attribute results") Link: https://lore.kernel.org/r/20240425230619.work.299-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-26Merge tag 'mm-hotfixes-stable-2024-04-26-13-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "11 hotfixes. 8 are cc:stable and the remaining 3 (nice ratio!) address post-6.8 issues or aren't considered suitable for backporting. All except one of these are for MM. I see no particular theme - it's singletons all over" * tag 'mm-hotfixes-stable-2024-04-26-13-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio() selftests: mm: protection_keys: save/restore nr_hugepages value from launch script stackdepot: respect __GFP_NOLOCKDEP allocation flag hugetlb: check for anon_vma prior to folio allocation mm: zswap: fix shrinker NULL crash with cgroup_disable=memory mm: turn folio_test_hugetlb into a PageType mm: support page_mapcount() on page_has_type() pages mm: create FOLIO_FLAG_FALSE and FOLIO_TYPE_OPS macros mm/hugetlb: fix missing hugetlb_lock for resv uncharge selftests: mm: fix unused and uninitialized variable warning selftests/harness: remove use of LINE_MAX
2024-04-26Fix a potential infinite loop in extract_user_to_sg()David Howells
Fix extract_user_to_sg() so that it will break out of the loop if iov_iter_extract_pages() returns 0 rather than looping around forever. [Note that I've included two fixes lines as the function got moved to a different file and renamed] Fixes: 85dd2c8ff368 ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator") Fixes: f5f82cd18732 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c") Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: netfs@lists.linux.dev Link: https://lore.kernel.org/r/1967121.1714034372@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-26sbitmap: use READ_ONCE to access map->wordlinke li
In __sbitmap_queue_get_batch(), map->word is read several times, and update atomically using atomic_long_try_cmpxchg(). But the first two read of map->word is not protected. This patch moves the statement val = READ_ONCE(map->word) forward, eliminating unprotected accesses to map->word within the function. It is aimed at reducing the number of benign races reported by KCSAN in order to focus future debugging effort on harmful races. Signed-off-by: linke li <lilinke99@qq.com> Link: https://lore.kernel.org/r/tencent_0B517C25E519D3D002194E8445E86C04AD0A@qq.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-25test_hexdump: avoid string truncation warningArnd Bergmann
gcc can warn when a string is too long to fit into the strncpy() destination buffer, as it is here depending on the function arguments: inlined from 'test_hexdump_prepare_test.constprop' at /home/arnd/arm-soc/lib/test_hexdump.c:116:3: include/linux/fortify-string.h:108:33: error: '__builtin_strncpy' output truncated copying between 0 and 32 bytes from a string of length 32 [-Werror=stringop-truncation] 108 | #define __underlying_strncpy __builtin_strncpy | ^ include/linux/fortify-string.h:187:16: note: in expansion of macro '__underlying_strncpy' 187 | return __underlying_strncpy(p, q, size); | ^~~~~~~~~~~~~~~~~~~~ The intention here is to copy exactly 'l' bytes without any padding or NUL-termination, so the most logical change is to use memcpy(), just as a previous change adapted the other output from strncpy() to memcpy(). Link: https://lkml.kernel.org/r/20240409140059.3806717-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Justin Stitt <justinstitt@google.com> Cc: Alexey Starikovskiy <astarikovskiy@suse.de> Cc: Bob Moore <robert.moore@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Len Brown <lenb@kernel.org> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25devres: don't use "proxy" headersAndy Shevchenko
Update header inclusions to follow IWYU (Include What You Use) principle. Link: https://lkml.kernel.org/r/20240403104820.557487-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25devres: switch to use dev_err_probe() for unificationAndy Shevchenko
Patch series "devres: A couple of cleanups". A couple of ad-hoc cleanups. No functional changes intended. This patch (of 2): The devm_*() APIs are supposed to be called during the ->probe() stage. Many drivers (especially new ones) have switched to use dev_err_probe() for error messaging for the sake of unification. Let's do the same in the devres APIs. Link: https://lkml.kernel.org/r/20240403104820.557487-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20240403104820.557487-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25kgdb: add HAS_IOPORT dependencyNiklas Schnelle
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at compile time. We thus need to add HAS_IOPORT as dependency for those drivers using them. Link: https://lkml.kernel.org/r/20240403132547.762429-2-schnelle@linux.ibm.com Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib/build_OID_registry: don't mention the full path of the script in outputUwe Kleine-König
This change strips the full path of the script generating lib/oid_registry_data.c to just lib/build_OID_registry. The motivation for this change is Yocto emitting a build warning File /usr/src/debug/linux-lxatac/6.7-r0/lib/oid_registry_data.c in package linux-lxatac-src contains reference to TMPDIR [buildpaths] So this change brings us one step closer to make the build result reproducible independent of the build path. Link: https://lkml.kernel.org/r/20240313211957.884561-2-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25mm/filemap: optimize filemap folio addingKairui Song
Instead of doing multiple tree walks, do one optimism range check with lock hold, and exit if raced with another insertion. If a shadow exists, check it with a new xas_get_order helper before releasing the lock to avoid redundant tree walks for getting its order. Drop the lock and do the allocation only if a split is needed. In the best case, it only need to walk the tree once. If it needs to alloc and split, 3 walks are issued (One for first ranged conflict check and order retrieving, one for the second check after allocation, one for the insert after split). Testing with 4K pages, in an 8G cgroup, with 16G brd as block device: echo 3 > /proc/sys/vm/drop_caches fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap --rw=randread --time_based \ --ramp_time=30s --runtime=5m --group_reporting Before: bw ( MiB/s): min= 1027, max= 3520, per=100.00%, avg=2445.02, stdev=18.90, samples=8691 iops : min=263001, max=901288, avg=625924.36, stdev=4837.28, samples=8691 After (+7.3%): bw ( MiB/s): min= 493, max= 3947, per=100.00%, avg=2625.56, stdev=25.74, samples=8651 iops : min=126454, max=1010681, avg=672142.61, stdev=6590.48, samples=8651 Test result with THP (do a THP randread then switch to 4K page in hope it issues a lot of splitting): echo 3 > /proc/sys/vm/drop_caches fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap -thp=1 --readonly \ --rw=randread --time_based --ramp_time=30s --runtime=10m \ --group_reporting fio -name=cached --numjobs=16 --filename=/mnt/test.img \ --buffered=1 --ioengine=mmap \ --rw=randread --time_based --runtime=5s --group_reporting Before: bw ( KiB/s): min= 4141, max=14202, per=100.00%, avg=7935.51, stdev=96.85, samples=18976 iops : min= 1029, max= 3548, avg=1979.52, stdev=24.23, samples=18976· READ: bw=4545B/s (4545B/s), 4545B/s-4545B/s (4545B/s-4545B/s), io=64.0KiB (65.5kB), run=14419-14419msec After (+12.5%): bw ( KiB/s): min= 4611, max=15370, per=100.00%, avg=8928.74, stdev=105.17, samples=19146 iops : min= 1151, max= 3842, avg=2231.27, stdev=26.29, samples=19146 READ: bw=4635B/s (4635B/s), 4635B/s-4635B/s (4635B/s-4635B/s), io=64.0KiB (65.5kB), run=14137-14137msec The performance is better for both 4K (+7.5%) and THP (+12.5%) cached read. Link: https://lkml.kernel.org/r/20240415171857.19244-5-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib/xarray: introduce a new helper xas_get_orderKairui Song
It can be used after xas_load to check the order of loaded entries. Compared to xa_get_order, it saves an XA_STATE and avoid a rewalk. Added new test for xas_get_order, to make the test work, we have to export xas_get_order with EXPORT_SYMBOL_GPL. Also fix a sparse warning by checking the slot value with xa_entry instead of accessing it directly, as suggested by Matthew Wilcox. [kasong@tencent.com: simplify comment, sparse warning fix, per Matthew Wilcox] Link: https://lkml.kernel.org/r/20240416071722.45997-4-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20240415171857.19244-4-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25alloc_tag: Tighten file permissions on /proc/allocinfoKees Cook
The /proc/allocinfo file exposes a tremendous about of information about kernel build details, memory allocations (obviously), and potentially even image layout (due to ordering). As this is intended to be consumed by system owners (like /proc/slabinfo), use the same file permissions as there: 0400. Link: https://lkml.kernel.org/r/20240425200844.work.184-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: add memory allocations report in show_mem()Suren Baghdasaryan
Include allocations in show_mem reports. Link: https://lkml.kernel.org/r/20240321163705.3067592-33-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25rhashtable: plumb through alloc tagKent Overstreet
This gives better memory allocation profiling results; rhashtable allocations will be accounted to the code that initialized the rhashtable. [surenb@google.com: undo _noprof additions in the documentation] Link: https://lkml.kernel.org/r/20240326231453.1206227-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-32-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: add codetag reference into slabobj_extSuren Baghdasaryan
To store code tag for every slab object, a codetag reference is embedded into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y. Link: https://lkml.kernel.org/r/20240321163705.3067592-23-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: introduce early boot parameter to avoid page_ext memory overheadSuren Baghdasaryan
The highest memory overhead from memory allocation profiling comes from page_ext objects. This overhead exists even if the feature is disabled but compiled-in. To avoid it, introduce an early boot parameter that prevents page_ext object creation. The new boot parameter is a tri-state with possible values of 0|1|never. When it is set to "never" the memory allocation profiling support is disabled, and overhead is minimized (currently no page_ext objects are allocated, in the future more overhead might be eliminated). As a result we also lose ability to enable memory allocation profiling at runtime (because there is no space to store alloctag references). Runtime sysctrl becomes read-only if the early boot parameter was set to "never". Note that the default value of this boot parameter depends on the CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT configuration. When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n the boot parameter is set to "never", therefore eliminating any overhead. CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y results in boot parameter being set to 1 (enabled). This allows distributions to avoid any overhead by setting CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n config and with no changes to the kernel command line. We reuse sysctl.vm.mem_profiling boot parameter name in order to avoid introducing yet another control. This change turns it into a tri-state early boot parameter. Link: https://lkml.kernel.org/r/20240321163705.3067592-16-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: introduce support for page allocation taggingSuren Baghdasaryan
Introduce helper functions to easily instrument page allocators by storing a pointer to the allocation tag associated with the code that allocated the page in a page_ext field. Link: https://lkml.kernel.org/r/20240321163705.3067592-15-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: add allocation tagging support for memory allocation profilingSuren Baghdasaryan
Introduce CONFIG_MEM_ALLOC_PROFILING which provides definitions to easily instrument memory allocators. It registers an "alloc_tags" codetag type with /proc/allocinfo interface to output allocation tag information when the feature is enabled. CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory allocation profiling instrumentation. Memory allocation profiling can be enabled or disabled at runtime using /proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n. CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation profiling by default. [surenb@google.com: Documentation/filesystems/proc.rst: fix allocinfo title] Link: https://lkml.kernel.org/r/20240326073813.727090-1-surenb@google.com [surenb@google.com: do limited memory accounting for modules with ARCH_NEEDS_WEAK_PER_CPU] Link: https://lkml.kernel.org/r/20240402180933.1663992-2-surenb@google.com [klarasmodin@gmail.com: explicitly include irqflags.h in alloc_tag.h] Link: https://lkml.kernel.org/r/20240407133252.173636-1-klarasmodin@gmail.com [surenb@google.com: fix alloc_tag_init() to prevent passing NULL to PTR_ERR()] Link: https://lkml.kernel.org/r/20240417003349.2520094-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-14-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Klara Modin <klarasmodin@gmail.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: prevent module unloading if memory is not freedSuren Baghdasaryan
Skip freeing module's data section if there are non-zero allocation tags because otherwise, once these allocations are freed, the access to their code tag would cause UAF. Link: https://lkml.kernel.org/r/20240321163705.3067592-13-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: code tagging module supportSuren Baghdasaryan
Add support for code tagging from dynamically loaded modules. Link: https://lkml.kernel.org/r/20240321163705.3067592-12-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib: code tagging frameworkSuren Baghdasaryan
Add basic infrastructure to support code tagging which stores tag common information consisting of the module name, function, file name and line number. Provide functions to register a new code tag type and navigate between code tags. Link: https://lkml.kernel.org/r/20240321163705.3067592-11-surenb@google.com Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25lib/test_hmm.c: handle src_pfns and dst_pfns allocation failureDuoming Zhou
The kcalloc() in dmirror_device_evict_chunk() will return null if the physical memory has run out. As a result, if src_pfns or dst_pfns is dereferenced, the null pointer dereference bug will happen. Moreover, the device is going away. If the kcalloc() fails, the pages mapping a chunk could not be evicted. So add a __GFP_NOFAIL flag in kcalloc(). Finally, as there is no need to have physically contiguous memory, Switch kcalloc() to kvcalloc() in order to avoid failing allocations. Link: https://lkml.kernel.org/r/20240312005905.9939-1-duoming@zju.edu.cn Fixes: b2ef9f5a5cb3 ("mm/hmm/test: add selftest driver for HMM") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Cc: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24stackdepot: respect __GFP_NOLOCKDEP allocation flagAndrey Ryabinin
If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ #49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Xiubo Li <xiubli@redhat.com> Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/ Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/ Suggested-by: Dave Chinner <david@fromorbit.com> Tested-by: Xiubo Li <xiubli@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-24ubsan: Avoid i386 UBSAN handler crashes with ClangKees Cook
When generating Runtime Calls, Clang doesn't respect the -mregparm=3 option used on i386. Hopefully this will be fixed correctly in Clang 19: https://github.com/llvm/llvm-project/pull/89707 but we need to fix this for earlier Clang versions today. Force the calling convention to use non-register arguments. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://github.com/KSPP/linux/issues/350 Link: https://lore.kernel.org/r/20240424224026.it.216-kees@kernel.org Acked-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24cpumask: Introduce cpumask_first_and_and()Dawei Li
Introduce cpumask_first_and_and() to get intersection between 3 cpumasks, free of any intermediate cpumask variable. Instead, cpumask_first_and_and() works in-place with all inputs and produces desired output directly. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20240416085454.3547175-2-dawei.li@shingroup.cn
2024-04-24ubsan: Remove 1-element array usage in debug reportingKees Cook
The "type_name" character array was still marked as a 1-element array. While we don't validate strings used in format arguments yet, let's fix this before it causes trouble some future day. Link: https://lore.kernel.org/r/20240424162739.work.492-kees@kernel.org Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24string_kunit: Move strtomem KUnit test to string_kunit.cKees Cook
It is more logical to have the strtomem() test in string_kunit.c instead of the memcpy() suite. Move it to live with memtostr(). Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-24string.h: Introduce memtostr() and memtostr_pad()Kees Cook
Another ambiguous use of strncpy() is to copy from strings that may not be NUL-terminated. These cases depend on having the destination buffer be explicitly larger than the source buffer's maximum size, having the size of the copy exactly match the source buffer's maximum size, and for the destination buffer to get explicitly NUL terminated. This usually happens when parsing protocols or hardware character arrays that are not guaranteed to be NUL-terminated. The code pattern is effectively this: char dest[sizeof(src) + 1]; strncpy(dest, src, sizeof(src)); dest[sizeof(dest) - 1] = '\0'; In practice it usually looks like: struct from_hardware { ... char name[HW_NAME_SIZE] __nonstring; ... }; struct from_hardware *p = ...; char name[HW_NAME_SIZE + 1]; strncpy(name, p->name, HW_NAME_SIZE); name[NW_NAME_SIZE] = '\0'; This cannot be replaced with: strscpy(name, p->name, sizeof(name)); because p->name is smaller and not NUL-terminated, so FORTIFY will trigger when strnlen(p->name, sizeof(name)) is used. And it cannot be replaced with: strscpy(name, p->name, sizeof(p->name)); because then "name" may contain a 1 character early truncation of p->name. Provide an unambiguous interface for converting a maybe not-NUL-terminated string to a NUL-terminated string, with compile-time buffer size checking so that it can never fail at runtime: memtostr() and memtostr_pad(). Also add KUnit tests for both. Link: https://lore.kernel.org/r/20240410023155.2100422-1-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-23Merge 6.9-rc5 into tty-nextGreg Kroah-Hartman
We want the tty fixes in here as well, and it resolves a merge conflict in: drivers/tty/serial/serial_core.c as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-22s390: Stop using weak symbols for __iowrite64_copy()Jason Gunthorpe
Complete switching the __iowriteXX_copy() routines over to use #define and arch provided inline/macro functions instead of weak symbols. S390 has an implementation that simply calls another memcpy function. Inline this so the callers don't have to do two jumps. Link: https://lore.kernel.org/r/3-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22x86: Stop using weak symbols for __iowrite32_copy()Jason Gunthorpe
Start switching iomap_copy routines over to use #define and arch provided inline/macro functions instead of weak symbols. Inline functions allow more compiler optimization and this is often a driver hot path. x86 has the only weak implementation for __iowrite32_copy(), so replace it with a static inline containing the same single instruction inline assembly. The compiler will generate the "mov edx,ecx" in a more optimal way. Remove iomap_copy_64.S Link: https://lore.kernel.org/r/1-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-19Merge tag 'hardening-v6.9-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Correctly disable UBSAN configs in configs/hardening (Nathan Chancellor) - Add missing signed integer overflow trap types to arm64 handler * tag 'hardening-v6.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ubsan: Add awareness of signed integer overflow traps configs/hardening: Disable CONFIG_UBSAN_SIGNED_WRAP configs/hardening: Fix disabling UBSAN configurations
2024-04-19string: Convert KUnit test names to standard conventionKees Cook
The KUnit convention for test names is AREA_test_WHAT. Adjust the string test names to follow this pattern. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20240419140155.3028912-5-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-19string: Merge strcat KUnit tests into string_kunit.cKees Cook
Move the strcat() tests into string_kunit.c. Remove the separate Kconfig and Makefile rule. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20240419140155.3028912-4-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-19string: Prepare to merge strcat KUnit tests into string_kunit.cKees Cook
The test naming convention differs between string_kunit.c and strcat_kunit.c. Move "test" to the beginning of the function name. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20240419140155.3028912-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-19string: Merge strscpy KUnit tests into string_kunit.cKees Cook
Move the strscpy() tests into string_kunit.c. Remove the separate Kconfig and Makefile rule. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20240419140155.3028912-2-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-19string: Prepare to merge strscpy_kunit.c into string_kunit.cKees Cook
In preparation for moving the strscpy_kunit.c tests into string_kunit.c, rename "tc" to "strscpy_check" for better readability. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Tested-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20240419140155.3028912-1-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-19Merge tag 'bootconfig-fixes-v6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fixes from Masami Hiramatsu: - Fix potential static_command_line buffer overrun. Currently we allocate the memory for static_command_line based on "boot_command_line", but it will copy "command_line" into it. So we use the length of "command_line" instead of "boot_command_line" (as we previously did) - Use memblock_free_late() in xbc_exit() instead of memblock_free() after the buddy system is initialized - Fix a kerneldoc warning * tag 'bootconfig-fixes-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: bootconfig: Fix the kerneldoc of _xbc_exit() bootconfig: use memblock_free_late to free xbc memory to buddy init/main.c: Fix potential static_command_line memory overflow
2024-04-18string_kunit: Add test cases for str*cmp functionsIvan Orlov
Currently, str*cmp functions (strcmp, strncmp, strcasecmp and strncasecmp) are not covered with tests. Extend the `string_kunit.c` test by adding the test cases for them. This patch adds 8 more test cases: 1) strcmp test 2) strcmp test on long strings (2048 chars) 3) strncmp test 4) strncmp test on long strings (2048 chars) 5) strcasecmp test 6) strcasecmp test on long strings 7) strncasecmp test 8) strncasecmp test on long strings These test cases aim at covering as many edge cases as possible, including the tests on empty strings, situations when the different symbol is placed at the end of one of the strings, etc. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20240417233033.717596-1-ivan.orlov0322@gmail.com Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-18bootconfig: Fix the kerneldoc of _xbc_exit()Masami Hiramatsu (Google)
Fix the kerneldoc of _xbc_exit() which is updated to have an @early argument and the function name is changed. Link: https://lore.kernel.org/all/171321744474.599864.13532445969528690358.stgit@devnote2/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404150036.kPJ3HEFA-lkp@intel.com/ Fixes: 89f9a1e876b5 ("bootconfig: use memblock_free_late to free xbc memory to buddy") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-16bpf, tests: Fix typos in commentsChen Pei
Currently, there are two comments with same name "64-bit ATOMIC magnitudes", the second one should be "32-bit ATOMIC magnitudes" based on the context. Signed-off-by: Chen Pei <cp0613@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240415081928.17440-1-cp0613@linux.alibaba.com
2024-04-15ubsan: Add awareness of signed integer overflow trapsKees Cook
On arm64, UBSAN traps can be decoded from the trap instruction. Add the add, sub, and mul overflow trap codes now that CONFIG_UBSAN_SIGNED_WRAP exists. Seen under clang 19: Internal error: UBSAN: unrecognized failure code: 00000000f2005515 [#1] PREEMPT SMP Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/lkml/20240411-fix-ubsan-in-hardening-config-v1-0-e0177c80ffaa@kernel.org Fixes: 557f8c582a9b ("ubsan: Reintroduce signed overflow sanitizer") Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240415182832.work.932-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-04-15net: dqs: make struct dql more cache efficientBreno Leitao
With the previous change, struct dqs->stall_thrs will be in the hot path (at queue side), even if DQS is disabled. The other fields accessed in this function (last_obj_cnt and num_queued) are in the first cache line, let's move this field (stall_thrs) to the very first cache line, since there is a hole there. This does not change the structure size, since it moves an short (2 bytes) to 4-bytes whole in the first cache line. This is the new structure format now: struct dql { unsigned int num_queued; unsigned int last_obj_cnt; ... short unsigned int stall_thrs; /* XXX 2 bytes hole, try to pack */ ... /* --- cacheline 1 boundary (64 bytes) --- */ ... /* Longest stall detected, reported to user */ short unsigned int stall_max; /* XXX 2 bytes hole, try to pack */ }; Also, read the stall_thrs (now in the very first cache line) earlier, together with dql->num_queued (also in the first cache line). Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240411192241.2498631-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-14bootconfig: use memblock_free_late to free xbc memory to buddyQiang Zhang
On the time to free xbc memory in xbc_exit(), memblock may has handed over memory to buddy allocator. So it doesn't make sense to free memory back to memblock. memblock_free() called by xbc_exit() even causes UAF bugs on architectures with CONFIG_ARCH_KEEP_MEMBLOCK disabled like x86. Following KASAN logs shows this case. This patch fixes the xbc memory free problem by calling memblock_free() in early xbc init error rewind path and calling memblock_free_late() in xbc exit path to free memory to buddy allocator. [ 9.410890] ================================================================== [ 9.418962] BUG: KASAN: use-after-free in memblock_isolate_range+0x12d/0x260 [ 9.426850] Read of size 8 at addr ffff88845dd30000 by task swapper/0/1 [ 9.435901] CPU: 9 PID: 1 Comm: swapper/0 Tainted: G U 6.9.0-rc3-00208-g586b5dfb51b9 #5 [ 9.446403] Hardware name: Intel Corporation RPLP LP5 (CPU:RaptorLake)/RPLP LP5 (ID:13), BIOS IRPPN02.01.01.00.00.19.015.D-00000000 Dec 28 2023 [ 9.460789] Call Trace: [ 9.463518] <TASK> [ 9.465859] dump_stack_lvl+0x53/0x70 [ 9.469949] print_report+0xce/0x610 [ 9.473944] ? __virt_addr_valid+0xf5/0x1b0 [ 9.478619] ? memblock_isolate_range+0x12d/0x260 [ 9.483877] kasan_report+0xc6/0x100 [ 9.487870] ? memblock_isolate_range+0x12d/0x260 [ 9.493125] memblock_isolate_range+0x12d/0x260 [ 9.498187] memblock_phys_free+0xb4/0x160 [ 9.502762] ? __pfx_memblock_phys_free+0x10/0x10 [ 9.508021] ? mutex_unlock+0x7e/0xd0 [ 9.512111] ? __pfx_mutex_unlock+0x10/0x10 [ 9.516786] ? kernel_init_freeable+0x2d4/0x430 [ 9.521850] ? __pfx_kernel_init+0x10/0x10 [ 9.526426] xbc_exit+0x17/0x70 [ 9.529935] kernel_init+0x38/0x1e0 [ 9.533829] ? _raw_spin_unlock_irq+0xd/0x30 [ 9.538601] ret_from_fork+0x2c/0x50 [ 9.542596] ? __pfx_kernel_init+0x10/0x10 [ 9.547170] ret_from_fork_asm+0x1a/0x30 [ 9.551552] </TASK> [ 9.555649] The buggy address belongs to the physical page: [ 9.561875] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x45dd30 [ 9.570821] flags: 0x200000000000000(node=0|zone=2) [ 9.576271] page_type: 0xffffffff() [ 9.580167] raw: 0200000000000000 ffffea0011774c48 ffffea0012ba1848 0000000000000000 [ 9.588823] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 [ 9.597476] page dumped because: kasan: bad access detected [ 9.605362] Memory state around the buggy address: [ 9.610714] ffff88845dd2ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 9.618786] ffff88845dd2ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 9.626857] >ffff88845dd30000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.634930] ^ [ 9.638534] ffff88845dd30080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.646605] ffff88845dd30100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 9.654675] ================================================================== Link: https://lore.kernel.org/all/20240414114944.1012359-1-qiang4.zhang@linux.intel.com/ Fixes: 40caa127f3c7 ("init: bootconfig: Remove all bootconfig data when the init memory is removed") Cc: Stable@vger.kernel.org Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-12watchdog/softlockup: Low-overhead detection of interrupt stormBitao Hu
The following softlockup is caused by interrupt storm, but it cannot be identified from the call tree. Because the call tree is just a snapshot and doesn't fully capture the behavior of the CPU during the soft lockup. watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921] ... Call trace: __do_softirq+0xa0/0x37c __irq_exit_rcu+0x108/0x140 irq_exit+0x14/0x20 __handle_domain_irq+0x84/0xe0 gic_handle_irq+0x80/0x108 el0_irq_naked+0x50/0x58 Therefore, it is necessary to report CPU utilization during the softlockup_threshold period (report once every sample_period, for a total of 5 reportings), like this: watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921] CPU#28 Utilization every 4s during lockup: #1: 0% system, 0% softirq, 100% hardirq, 0% idle #2: 0% system, 0% softirq, 100% hardirq, 0% idle #3: 0% system, 0% softirq, 100% hardirq, 0% idle #4: 0% system, 0% softirq, 100% hardirq, 0% idle #5: 0% system, 0% softirq, 100% hardirq, 0% idle ... This is helpful in determining whether an interrupt storm has occurred or in identifying the cause of the softlockup. The criteria for determination are as follows: a. If the hardirq utilization is high, then interrupt storm should be considered and the root cause cannot be determined from the call tree. b. If the softirq utilization is high, then the call might not necessarily point at the root cause. c. If the system utilization is high, then analyzing the root cause from the call tree is possible in most cases. The mechanism requires a considerable amount of global storage space when configured for the maximum number of CPUs. Therefore, adding a SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes" if the max number of CPUs is <= 128. Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Liu Song <liusong@linux.alibaba.com> Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
2024-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset") b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") 194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 958f56e48385 ("net/mlx5e: Un-expose functions in en.h") 49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11Merge tag 'net-6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Current release - new code bugs: - netfilter: complete validation of user input - mlx5: disallow SRIOV switchdev mode when in multi-PF netdev Previous releases - regressions: - core: fix u64_stats_init() for lockdep when used repeatedly in one file - ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr - bluetooth: fix memory leak in hci_req_sync_complete() - batman-adv: avoid infinite loop trying to resize local TT - drv: geneve: fix header validation in geneve[6]_xmit_skb - drv: bnxt_en: fix possible memory leak in bnxt_rdma_aux_device_init() - drv: mlx5: offset comp irq index in name by one - drv: ena: avoid double-free clearing stale tx_info->xdpf value - drv: pds_core: fix pdsc_check_pci_health deadlock Previous releases - always broken: - xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING - bluetooth: fix setsockopt not validating user input - af_unix: clear stale u->oob_skb. - nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies - drv: virtio_net: fix guest hangup on invalid RSS update - drv: mlx5e: Fix mlx5e_priv_init() cleanup flow - dsa: mt7530: trap link-local frames regardless of ST Port State" * tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) net: ena: Set tx_info->xdpf value to NULL net: ena: Fix incorrect descriptor free behavior net: ena: Wrong missing IO completions check order net: ena: Fix potential sign extension issue af_unix: Fix garbage collector racing against connect() net: dsa: mt7530: trap link-local frames regardless of ST Port State Revert "s390/ism: fix receive message buffer allocation" net: sparx5: fix wrong config being used when reconfiguring PCS net/mlx5: fix possible stack overflows net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev net/mlx5e: RSS, Block XOR hash with over 128 channels net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit net/mlx5e: HTB, Fix inconsistencies with QoS SQs number net/mlx5e: Fix mlx5e_priv_init() cleanup flow net/mlx5e: RSS, Block changing channels number when RXFH is configured net/mlx5: Correctly compare pkt reformat ids net/mlx5: Properly link new fs rules into the tree net/mlx5: offset comp irq index in name by one net/mlx5: Register devlink first under devlink lock net/mlx5: E-switch, store eswitch pointer before registering devlink_param ...
2024-04-10Merge tag 'hardening-v6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - gcc-plugins/stackleak: Avoid .head.text section (Ard Biesheuvel) - ubsan: fix unused variable warning in test module (Arnd Bergmann) - Improve entropy diffusion in randomize_kstack * tag 'hardening-v6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randomize_kstack: Improve entropy diffusion ubsan: fix unused variable warning in test module gcc-plugins/stackleak: Avoid .head.text section