summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2024-02-23lib/stackdepot: fix first entry having a 0-handleOscar Salvador
Patch series "page_owner: print stacks and their outstanding allocations", v10. page_owner is a great debug functionality tool that lets us know about all pages that have been allocated/freed and their specific stacktrace. This comes very handy when debugging memory leaks, since with some scripting we can see the outstanding allocations, which might point to a memory leak. In my experience, that is one of the most useful cases, but it can get really tedious to screen through all pages and try to reconstruct the stack <-> allocated/freed relationship, becoming most of the time a daunting and slow process when we have tons of allocation/free operations. This patchset aims to ease that by adding a new functionality into page_owner. This functionality creates a new directory called 'page_owner_stacks' under 'sys/kernel//debug' with a read-only file called 'show_stacks', which prints out all the stacks followed by their outstanding number of allocations (being that the times the stacktrace has allocated but not freed yet). This gives us a clear and a quick overview of stacks <-> allocated/free. We take advantage of the new refcount_f field that stack_record struct gained, and increment/decrement the stack refcount on every __set_page_owner() (alloc operation) and __reset_page_owner (free operation) call. Unfortunately, we cannot use the new stackdepot api STACK_DEPOT_FLAG_GET because it does not fulfill page_owner needs, meaning we would have to special case things, at which point makes more sense for page_owner to do its own {dec,inc}rementing of the stacks. E.g: Using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0, such stack gets evicted, so page_owner would lose information. This patchset also creates a new file called 'set_threshold' within 'page_owner_stacks' directory, and by writing a value to it, the stacks which refcount is below such value will be filtered out. A PoC can be found below: # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt # head -40 page_owner_full_stacks.txt prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 page_cache_ra_unbounded+0x96/0x180 filemap_get_pages+0xfd/0x590 filemap_read+0xcc/0x330 blkdev_read_iter+0xb8/0x150 vfs_read+0x285/0x320 ksys_read+0xa5/0xe0 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 521 prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 __filemap_get_folio+0x14a/0x490 ext4_write_begin+0xbd/0x4b0 [ext4] generic_perform_write+0xc1/0x1e0 ext4_buffered_write_iter+0x68/0xe0 [ext4] ext4_file_write_iter+0x70/0x740 [ext4] vfs_write+0x33d/0x420 ksys_write+0xa5/0xe0 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 4609 ... ... # echo 5000 > /sys/kernel/debug/page_owner_stacks/set_threshold # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt # head -40 page_owner_full_stacks_5000.txt prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 alloc_pages_mpol+0x91/0x1f0 folio_alloc+0x14/0x50 filemap_alloc_folio+0xb2/0x100 __filemap_get_folio+0x14a/0x490 ext4_write_begin+0xbd/0x4b0 [ext4] generic_perform_write+0xc1/0x1e0 ext4_buffered_write_iter+0x68/0xe0 [ext4] ext4_file_write_iter+0x70/0x740 [ext4] vfs_write+0x33d/0x420 ksys_pwrite64+0x75/0x90 do_syscall_64+0x80/0x160 entry_SYSCALL_64_after_hwframe+0x6e/0x76 stack_count: 6781 prep_new_page+0xa9/0x120 get_page_from_freelist+0x801/0x2210 __alloc_pages+0x18b/0x350 pcpu_populate_chunk+0xec/0x350 pcpu_balance_workfn+0x2d1/0x4a0 process_scheduled_works+0x84/0x380 worker_thread+0x12a/0x2a0 kthread+0xe3/0x110 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 stack_count: 8641 This patch (of 7): The very first entry of stack_record gets a handle of 0, but this is wrong because stackdepot treats a 0-handle as a non-valid one. E.g: See the check in stack_depot_fetch() Fix this by adding and offset of 1. This bug has been lurking since the very beginning of stackdepot, but no one really cared as it seems. Because of that I am not adding a Fixes tag. Link: https://lkml.kernel.org/r/20240215215907.20121-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20240215215907.20121-2-osalvador@suse.de Co-developed-by: Marco Elver <elver@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23merge mm-hotfixes-stable into mm-nonmm-stable to pick up stackdepot changesAndrew Morton
2024-02-23stackdepot: use variable size records for non-evictable entriesMarco Elver
With the introduction of stack depot evictions, each stack record is now fixed size, so that future reuse after an eviction can safely store differently sized stack traces. In all cases that do not make use of evictions, this wastes lots of space. Fix it by re-introducing variable size stack records (up to the max allowed size) for entries that will never be evicted. We know if an entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided, since a later stack_depot_put() attempt is undefined behavior. With my current kernel config that enables KASAN and also SLUB owner tracking, I observe (after a kernel boot) a whopping reduction of 296 stack depot pools, which translates into 4736 KiB saved. The savings here are from SLUB owner tracking only, because KASAN generic mode still uses refcounting. Before: pools: 893 allocations: 29841 frees: 6524 in_use: 23317 freelist_size: 3454 After: pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008 [elver@google.com: fix -Wstringop-overflow warning] Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22netlink: add nla be16/32 types to minlen arrayFlorian Westphal
BUG: KMSAN: uninit-value in nla_validate_range_unsigned lib/nlattr.c:222 [inline] BUG: KMSAN: uninit-value in nla_validate_int_range lib/nlattr.c:336 [inline] BUG: KMSAN: uninit-value in validate_nla lib/nlattr.c:575 [inline] BUG: KMSAN: uninit-value in __nla_validate_parse+0x2e20/0x45c0 lib/nlattr.c:631 nla_validate_range_unsigned lib/nlattr.c:222 [inline] nla_validate_int_range lib/nlattr.c:336 [inline] validate_nla lib/nlattr.c:575 [inline] ... The message in question matches this policy: [NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255), but because NLA_BE32 size in minlen array is 0, the validation code will read past the malformed (too small) attribute. Note: Other attributes, e.g. BITFIELD32, SINT, UINT.. are also missing: those likely should be added too. Reported-by: syzbot+3f497b07aa3baf2fb4d0@syzkaller.appspotmail.com Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/all/CABOYnLzFYHSnvTyS6zGa-udNX55+izqkOt2sB9WDqUcEGW6n8w@mail.gmail.com/raw Fixes: ecaf75ffd5f5 ("netlink: introduce bigendian integer types") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240221172740.5092-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22lib/Kconfig.debug: update Clang version check in CONFIG_KCOVNathan Chancellor
Now that the minimum supported version of LLVM for building the kernel has been bumped to 13.0.1, this condition can be changed to just CONFIG_CC_IS_CLANG, as the build will fail during the configuration stage for older LLVM versions. Link: https://lkml.kernel.org/r/20240125-bump-min-llvm-ver-to-13-0-1-v1-10-f5ff9bda41c5@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Conor Dooley <conor@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22lib: dhry: add missing closing parenthesisGeert Uytterhoeven
The help text for the Dhrystone benchmark test lacks a matching closing parenthesis. Link: https://lkml.kernel.org/r/772b43271bcb3dd17a6aae671b2084f08c05b079.1705934853.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22lib: dhry: use ktime_ms_delta() helperGeert Uytterhoeven
Use the existing ktime_ms_delta() helper instead of open-coding the same operation. Link: https://lkml.kernel.org/r/bb43c67a7580de6152f5e6eb225071166d33b6e4.1705934853.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22lib: dhry: remove unneeded <linux/mutex.h>Geert Uytterhoeven
Patch series "lib: dhry: miscellaneous cleanups". This patch series contains a few miscellaneous cleanups for the Dhrystone benchmark test. This patch (of 3): The Dhrystone benchmark test does not use mutexes. Link: https://lkml.kernel.org/r/cover.1705934853.git.geert+renesas@glider.be Link: https://lkml.kernel.org/r/cf8fafaedccf96143f1513745c43a457480bfc24.1705934853.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22flex_proportions: remove unused fprop_local_singleKemeng Shi
The single variant of flex_proportions is not used. Simply remove it. Link: https://lkml.kernel.org/r/20240118201321.759174-1-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22lib/sort: optimize heapsort with double-pop variationKuan-Wei Chiu
Instead of popping only the maximum element from the heap during each iteration, we now pop the two largest elements at once. Although this introduces an additional comparison to determine the second largest element, it enables a reduction in the height of the tree by one during the heapify operations starting from root's left/right child. This reduction in tree height by one leads to a decrease of one comparison and one swap. This optimization results in saving approximately 0.5 * n swaps without increasing the number of comparisons. Additionally, the heap size during heapify is now one less than the original size, offering a chance for further reduction in comparisons and swaps. The following experimental data is based on the array generated using get_random_u32(). | N | swaps (old) | swaps (new) | comparisons (old) | comparisons (new) | |-------|-------------|-------------|-------------------|-------------------| | 1000 | 9054 | 8569 | 10328 | 10320 | | 2000 | 20137 | 19182 | 22634 | 22587 | | 3000 | 32062 | 30623 | 35833 | 35752 | | 4000 | 44274 | 42282 | 49332 | 49306 | | 5000 | 57195 | 54676 | 63300 | 63294 | | 6000 | 70205 | 67202 | 77599 | 77557 | | 7000 | 83276 | 79831 | 92113 | 92032 | | 8000 | 96630 | 92678 | 106635 | 106617 | | 9000 | 110349 | 105883 | 121505 | 121404 | | 10000 | 124165 | 119202 | 136628 | 136617 | Link: https://lkml.kernel.org/r/20240113031352.2395118-3-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: George Spelvin <lkml@sdf.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22lib/sort: optimize heapsort for equal elements in sift-down pathKuan-Wei Chiu
Patch series "lib/sort: Optimize the number of swaps and comparisons". This patch series aims to optimize the heapsort algorithm, specifically targeting a reduction in the number of swaps and comparisons required. This patch (of 2): Currently, when searching for the sift-down path and encountering equal elements, the algorithm chooses the left child. However, considering that the height of the right subtree may be one less than that of the left subtree, selecting the right child in such cases can potentially reduce the number of comparisons and swaps. For instance, when sorting an array of 10,000 identical elements, the current implementation requires 247,209 comparisons. With this patch, the number of comparisons can be reduced to 227,241. Link: https://lkml.kernel.org/r/20240113031352.2395118-1-visitorckw@gmail.com Link: https://lkml.kernel.org/r/20240113031352.2395118-2-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22treewide: update LLVM Bugzilla linksNathan Chancellor
LLVM moved their issue tracker from their own Bugzilla instance to GitHub issues. While all of the links are still valid, they may not necessarily show the most up to date information around the issues, as all updates will occur on GitHub, not Bugzilla. Another complication is that the Bugzilla issue number is not always the same as the GitHub issue number. Thankfully, LLVM maintains this mapping through two shortlinks: https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num> https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num> Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the "https://llvm.org/pr<num>" shortlink so that the links show the most up to date information. Each migrated issue links back to the Bugzilla entry, so there should be no loss of fidelity of information here. Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Fangrui Song <maskray@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/udp.c f796feabb9f5 ("udp: add local "peek offset enabled" flag") 56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)") Adjacent changes: net/unix/garbage.c aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.") 11498715f266 ("af_unix: Remove io_uring code for GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22XArray: add cmpxchg order testDaniel Gomez
XArray multi-index entries do not keep track of the order stored once the entry is being marked as used with cmpxchg (conditionally replaced with NULL). Add a test to check the order is actually lost. The test also verifies the order and entries for all the tied indexes before and after the NULL replacement with xa_cmpxchg. Add another entry at 1 << order that keeps the node around and the order information for the NULL-entry after xa_cmpxchg. Link: https://lkml.kernel.org/r/20240131225125.1370598-3-mcgrof@kernel.org Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22test_xarray: add tests for advanced multi-index useLuis Chamberlain
Patch series "test_xarray: advanced API multi-index tests", v2. This is a respin of the test_xarray multi-index tests [0] which use and demonstrate the advanced API which is used by the page cache. This should let folks more easily follow how we use multi-index to support for example a min order later in the page cache. It also lets us grow the selftests to mimic more of what we do in the page cache. This patch (of 2): The multi index selftests are great but they don't replicate how we deal with the page cache exactly, which makes it a bit hard to follow as the page cache uses the advanced API. Add tests which use the advanced API, mimicking what we do in the page cache, while at it, extend the example to do what is needed for min order support. [mcgrof@kernel.org: fix soft lockup for advanced-api tests] Link: https://lkml.kernel.org/r/20240216194329.840555-1-mcgrof@kernel.org [akpm@linux-foundation.org: s/i/loops/, make non-static] [akpm@linux-foundation.org: restore static storage for loop counter] Link: https://lkml.kernel.org/r/20240131225125.1370598-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20240131225125.1370598-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Daniel Gomez <da.gomez@samsung.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22maple_tree: avoid duplicate variable init in mast_spanning_rebalance()Lukas Bulwahn
The local variables r_tmp and l_tmp in mast_spanning_rebalance() are already initialized at its declaration; there is no need to assign the value again. Remove the duplicate initialization of {r,l}_tmp. No functional change. Due to common compiler optimizations, also no change to object code. This issue was identified with clang-analyzer's dead stores analysis. Link: https://lkml.kernel.org/r/20240122102000.29558-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22Merge series 'Use Maple Trees for simple_offset utilities' of ↵Christian Brauner
https://lore.kernel.org/r/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net Pull simple offset series from Chuck Lever In an effort to address slab fragmentation issues reported a few months ago, I've replaced the use of xarrays for the directory offset map in "simple" file systems (including tmpfs). Thanks to Liam Howlett for helping me get this working with Maple Trees. * series 'Use Maple Trees for simple_offset utilities' of https://lore.kernel.org/r/170820083431.6328.16233178852085891453.stgit@91.116.238.104.host.secureserver.net: (6 commits) libfs: Convert simple directory offsets to use a Maple Tree test_maple_tree: testing the cyclic allocation maple_tree: Add mtree_alloc_cyclic() libfs: Add simple_offset_empty() libfs: Define a minimum directory offset libfs: Re-arrange locking in offset_iterate_dir() Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-21maple_tree: fix comment describing mas_node_count_gfp()Sidhartha Kumar
The function description comment for mas_node_count_gfp() mistakingly refers to the function as mas_node_count(). Change it to refer to the correct function. Link: https://lkml.kernel.org/r/20240109223119.162357-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-21test_maple_tree: testing the cyclic allocationLiam R. Howlett
This tests the interactions of the cyclic allocations, the maple state index and last, and overflow. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170820144894.6328.13052830860966450674.stgit@91.116.238.104.host.secureserver.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-21maple_tree: Add mtree_alloc_cyclic()Chuck Lever
I need a cyclic allocator for the simple_offset implementation in fs/libfs.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170820144179.6328.12838600511394432325.stgit@91.116.238.104.host.secureserver.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-20string: Allow 2-argument strscpy()Kees Cook
Using sizeof(dst) for the "size" argument in strscpy() is the overwhelmingly common case. Instead of requiring this everywhere, allow a 2-argument version to be used that will use the sizeof() internally. There are other functions in the kernel with optional arguments[1], so this isn't unprecedented, and improves readability. Update and relocate the kern-doc for strscpy() too, and drop __HAVE_ARCH_STRSCPY as it is unused. Adjust ARCH=um build to notice the changed export name, as it doesn't do full header includes for the string helpers. This could additionally let us save a few hundred lines of code: 1177 files changed, 2455 insertions(+), 3026 deletions(-) with a treewide cleanup using Coccinelle: @needless_arg@ expression DST, SRC; @@ strscpy(DST, SRC -, sizeof(DST) ) Link: https://elixir.bootlin.com/linux/v6.7/source/include/linux/pci.h#L1517 [1] Reviewed-by: Justin Stitt <justinstitt@google.com> Cc: Andy Shevchenko <andy@kernel.org> Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-20string: Redefine strscpy_pad() as a macroKees Cook
In preparation for making strscpy_pad()'s 3rd argument optional, redefine it as a macro. This also has the benefit of allowing greater FORITFY introspection, as it couldn't see into the strscpy() nor the memset() within strscpy_pad(). Cc: Andy Shevchenko <andy@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-hardening@vger.kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-20ubsan: Reintroduce signed overflow sanitizerKees Cook
In order to mitigate unexpected signed wrap-around[1], bring back the signed integer overflow sanitizer. It was removed in commit 6aaa31aeb9cf ("ubsan: remove overflow checks") because it was effectively a no-op when combined with -fno-strict-overflow (which correctly changes signed overflow from being "undefined" to being explicitly "wrap around"). Compilers are adjusting their sanitizers to trap wrap-around and to detecting common code patterns that should not be instrumented (e.g. "var + offset < var"). Prepare for this and explicitly rename the option from "OVERFLOW" to "WRAP" to more accurately describe the behavior. To annotate intentional wrap-around arithmetic, the helpers wrapping_add/sub/mul_wrap() can be used for individual statements. At the function level, the __signed_wrap attribute can be used to mark an entire function as expecting its signed arithmetic to wrap around. For a single object file the Makefile can use "UBSAN_SIGNED_WRAP_target.o := n" to mark it as wrapping, and for an entire directory, "UBSAN_SIGNED_WRAP := n" can be used. Additionally keep these disabled under CONFIG_COMPILE_TEST for now. Link: https://github.com/KSPP/linux/issues/26 [1] Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hao Luo <haoluo@google.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-20lib/Kconfig.debug: TEST_IOV_ITER depends on MMUGuenter Roeck
Trying to run the iov_iter unit test on a nommu system such as the qemu kc705-nommu emulation results in a crash. KTAP version 1 # Subtest: iov_iter # module: kunit_iov_iter 1..9 BUG: failure at mm/nommu.c:318/vmap()! Kernel panic - not syncing: BUG! The test calls vmap() directly, but vmap() is not supported on nommu systems, causing the crash. TEST_IOV_ITER therefore needs to depend on MMU. Link: https://lkml.kernel.org/r/20240208153010.1439753-1-linux@roeck-us.net Fixes: 2d71340ff1d4 ("iov_iter: Kunit tests for copying to/from an iterator") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-20treewide: replace or remove redundant def_bool in Kconfig filesMasahiro Yamada
'def_bool X' is a shorthand for 'bool' plus 'default X'. 'def_bool' is redundant where 'bool' is already present, so 'def_bool X' can be replaced with 'default X', or removed if X is 'n'. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-02-19Merge 6.8-rc5 into tty-nextGreg Kroah-Hartman
We need the serial/tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-19Merge 6.8-rc5 into driver-core-nextGreg Kroah-Hartman
We need the driver core changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-17Merge tag 'driver-core-6.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are some driver core fixes, a kobject fix, and a documentation update for 6.8-rc5. In detail these changes are: - devlink fixes for reported issues with 6.8-rc1 - topology scheduling regression fix that has been reported by many - kobject loosening of checks change in -rc1 is now reverted as some codepaths seemed to need the checks - documentation update for the CVE process. Has been reviewed by many, the last minute change to the document was to bring the .rst format back into the the new style rules, the contents did not change. All of these, except for the documentation update, have been in linux-next for over a week. The documentation update has been reviewed for weeks by a group of developers, and in public for a week and the wording has stabilized for now. If future changes are needed, we can do so before 6.8-final is out (or anytime after that)" * tag 'driver-core-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation: Document the Linux Kernel CVE process Revert "kobject: Remove redundant checks for whether ktype is NULL" driver core: fw_devlink: Improve logs for cycle detection driver core: fw_devlink: Improve detection of overlapping cycles driver core: Fix device_link_flag_is_sync_state_only() topology: Set capacity_freq_ref in all cases
2024-02-17kobject: reduce uevent_sock_mutex scopeEric Dumazet
This is a followup of commit a3498436b3a0 ("netns: restrict uevents") - uevent_sock_mutex no longer protects uevent_seqnum thanks to prior patch in the series. - uevent_net_broadcast() can run without holding uevent_sock_mutex. - Instead of grabbing uevent_sock_mutex before calling kobject_uevent_net_broadcast(), we can move the mutex_lock(&uevent_sock_mutex) to the place we iterate over uevent_sock_list : uevent_net_broadcast_untagged(). After this patch, typical netdevice creations and destructions calling uevent_net_broadcast_tagged() no longer need to acquire uevent_sock_mutex. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <brauner@kernel.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240214084829.684541-3-edumazet@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-17kobject: make uevent_seqnum atomicEric Dumazet
We will soon no longer acquire uevent_sock_mutex for most kobject_uevent_net_broadcast() calls, and also while calling uevent_net_broadcast(). Make uevent_seqnum an atomic64_t to get its own protection. This fixes a race while reading /sys/kernel/uevent_seqnum. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christian Brauner <brauner@kernel.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240214084829.684541-2-edumazet@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-16Merge tag 'trace-v6.8-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the #ifndef that didn't have the 'CONFIG_' prefix on HAVE_DYNAMIC_FTRACE_WITH_REGS The fix to have dynamic trampolines work with x86 broke arm64 as the config used in the #ifdef was HAVE_DYNAMIC_FTRACE_WITH_REGS and not CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which removed the fix that the previous fix was to fix. - Fix tracing_on state The code to test if "tracing_on" is set incorrectly used ring_buffer_record_is_on() which returns false if the ring buffer isn't able to be written to. But the ring buffer disable has several bits that disable it. One is internal disabling which is used for resizing and other modifications of the ring buffer. But the "tracing_on" user space visible flag should only report if tracing is actually on and not internally disabled, as this can cause confusion as writing "1" when it is disabled will not enable it. Instead use ring_buffer_record_is_set_on() which shows the user space visible settings. - Fix a false positive kmemleak on saved cmdlines Now that the saved_cmdlines structure is allocated via alloc_page() and not via kmalloc() it has become invisible to kmemleak. The allocation done to one of its pointers was flagged as a dangling allocation leak. Make kmemleak aware of this allocation and free. - Fix synthetic event dynamic strings An update that cleaned up the synthetic event code removed the return value of trace_string(), and had it return zero instead of the length, causing dynamic strings in the synthetic event to always have zero size. - Clean up documentation and header files for seq_buf * tag 'trace-v6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: seq_buf: Fix kernel documentation seq_buf: Don't use "proxy" headers tracing/synthetic: Fix trace_string() return value tracing: Inform kmemleak of saved_cmdlines allocation tracing: Use ring_buffer_record_is_set_on() in tracer_tracing_is_on() tracing: Fix HAVE_DYNAMIC_FTRACE_WITH_REGS ifdef
2024-02-16s390/raid6: convert to use standard fpu_*() inline assembliesHeiko Carstens
Move the s390 specific raid6 inline assemblies, make them generic, and reuse them to implement the raid6 gen/xor implementation. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16s390/fpu: decrease stack usage for some casesHeiko Carstens
The kernel_fpu structure has a quite large size of 520 bytes. In order to reduce stack footprint introduce several kernel fpu structures with different and also smaller sizes. This way every kernel fpu user must use the correct variant. A compile time check verifies that the correct variant is used. There are several users which use only 16 instead of all 32 vector registers. For those users the new kernel_fpu_16 structure with a size of only 266 bytes can be used. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16s390/fpu: move, rename, and merge header filesHeiko Carstens
Move, rename, and merge the fpu and vx header files. This way fpu header files have a consistent naming scheme (fpu*.h). Also get rid of the fpu subdirectory and move header files to asm directory, so that all fpu and vx header files can be found at the same location. Merge internal.h header file into other header files, since the internal helpers are used at many locations. so those helper functions are really not internal. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-15seq_buf: Fix kernel documentationAndy Shevchenko
There are plenty of issues with the kernel documentation here: - misspelled word "sequence" - different style of returned value descriptions - missed Return sections - unaligned style of ASCII / NUL-terminated / etc - wrong function references Fix all these. Link: https://lkml.kernel.org/r/20240215152506.598340-1-andriy.shevchenko@linux.intel.com Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-15seq_buf: Don't use "proxy" headersAndy Shevchenko
Update header inclusions to follow IWYU (Include What You Use) principle. Link: https://lkml.kernel.org/r/20240215142255.400264-1-andriy.shevchenko@linux.intel.com Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-14Merge tag 'linux_kselftest-kunit-fixes-6.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fix from Shuah Khan: "One important fix to unregister kunit_bus when KUnit module is unloaded. Not doing so causes an error when KUnit module tries to re-register the bus when it gets reloaded" * tag 'linux_kselftest-kunit-fixes-6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: device: Unregister the kunit_bus on shutdown
2024-02-12PCI: Move PCI-specific devres code to drivers/pci/Philipp Stanner
The pcim_*() functions in lib/devres.c are guarded by an #ifdef CONFIG_PCI and, thus, don't belong to this file. They are only ever used for PCI and are not generic infrastructure. Move all pcim_*() functions in lib/devres.c to drivers/pci/devres.c. Adjust the Makefile. Add drivers/pci/devres.c to Documentation. Link: https://lore.kernel.org/r/20240131090023.12331-4-pstanner@redhat.com Suggested-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-02-12PCI: Move pci_iomap.c to drivers/pci/Philipp Stanner
The entirety of pci_iomap.c is guarded by an #ifdef CONFIG_PCI. It, consequently, does not belong to lib/ because it is not generic infrastructure. Move pci_iomap.c to drivers/pci/ and implement the necessary changes to Makefiles and Kconfigs. Update MAINTAINERS file. Update Documentation. Link: https://lore.kernel.org/r/20240131090023.12331-3-pstanner@redhat.com [bhelgaas: squash in https://lore.kernel.org/r/20240212150934.24559-1-pstanner@redhat.com] Suggested-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2024-02-09s390/fpu: make use of __uninitialized macroHeiko Carstens
Code sections in s390 specific kernel code which use floating point or vector registers all come with a 520 byte stack variable to save already in use registers, if required. With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled this variable will always be initialized on function entry in addition to saving register contents, which contradicts the intention (performance improvement) of such code sections. Therefore provide a DECLARE_KERNEL_FPU_ONSTACK() macro which provides struct kernel_fpu variables with an __uninitialized attribute, and convert all existing code to use this. This way only this specific type of stack variable will not be initialized, regardless of config options. Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240205154844.3757121-3-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-08Revert "kobject: Remove redundant checks for whether ktype is NULL"Greg Kroah-Hartman
This reverts commit 1b28cb81dab7c1eedc6034206f4e8d644046ad31. It is reported to cause problems, so revert it for now until the root cause can be found. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 1b28cb81dab7 ("kobject: Remove redundant checks for whether ktype is NULL") Cc: Zhen Lei <thunder.leizhen@huawei.com> Closes: https://lore.kernel.org/oe-lkp/202402071403.e302e33a-oliver.sang@intel.com Link: https://lore.kernel.org/r/2024020849-consensus-length-6264@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-07dump_stack: Do not get cpu_sync for panic CPUJohn Ogness
dump_stack() is called in panic(). If for some reason another CPU is holding the printk_cpu_sync and is unable to release it, the panic CPU will be unable to continue and print the stacktrace. Since non-panic CPUs are not allowed to store new printk messages anyway, there is no need to synchronize the stacktrace output in a panic situation. For the panic CPU, do not get the printk_cpu_sync because it is not needed and avoids a potential deadlock scenario in panic(). Link: https://lore.kernel.org/lkml/ZcIGKU8sxti38Kok@alley Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-15-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-02-06kunit: device: Unregister the kunit_bus on shutdownDavid Gow
If KUnit is built as a module, and it's unloaded, the kunit_bus is not unregistered. This causes an error if it's then re-loaded later, as we try to re-register the bus. Unregister the bus and root_device on shutdown, if it looks valid. In addition, be more specific about the value of kunit_bus_device. It is: - a valid struct device* if the kunit_bus initialised correctly. - an ERR_PTR if it failed to initialise. - NULL before initialisation and after shutdown. Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Rae Moar <rmoar@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-06ubsan: Remove CONFIG_UBSAN_SANITIZE_ALLKees Cook
For simplicity in splitting out UBSan options into separate rules, remove CONFIG_UBSAN_SANITIZE_ALL, effectively defaulting to "y", which is how it is generally used anyway. (There are no ":= y" cases beyond where a specific file is enabled when a top-level ":= n" is in effect.) Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Cc: linux-doc@vger.kernel.org Cc: linux-kbuild@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-06ubsan: Silence W=1 warnings in self-testKees Cook
Silence a handful of W=1 warnings in the UBSan selftest, which set variables without using them. For example: lib/test_ubsan.c:101:6: warning: variable 'val1' set but not used [-Wunused-but-set-variable] 101 | int val1 = 10; | ^ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401310423.XpCIk6KO-lkp@intel.com/ Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-05net: blackhole_dev: fix build warning for ethh set but not usedBreno Leitao
lib/test_blackhole_dev.c sets a variable that is never read, causing this following building warning: lib/test_blackhole_dev.c:32:17: warning: variable 'ethh' set but not used [-Wunused-but-set-variable] Remove the variable struct ethhdr *ethh, which is unused. Fixes: 509e56b37cc3 ("blackhole_dev: add a selftest") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-04Merge 6.8-rc3 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-02lib/test_kmod: fix kernel-doc warningsRandy Dunlap
Fix all kernel-doc warnings in test_kmod.c: - Mark some enum values as private so that kernel-doc is not needed for them - s/thread_mutex/thread_lock/ in a struct's kernel-doc comments - add kernel-doc info for @task_sync test_kmod.c:67: warning: Enum value '__TEST_KMOD_INVALID' not described in enum 'kmod_test_case' test_kmod.c:67: warning: Enum value '__TEST_KMOD_MAX' not described in enum 'kmod_test_case' test_kmod.c:100: warning: Function parameter or member 'task_sync' not described in 'kmod_test_device_info' test_kmod.c:134: warning: Function parameter or member 'thread_mutex' not described in 'kmod_test_device' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: <linux-modules@vger.kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-02-02iov_iter: Avoid wrap-around instrumentation in copy_compat_iovec_from_user()Kees Cook
The loop counter "i" in copy_compat_iovec_from_user() is an int, but because the nr_segs argument is unsigned long, the signed overflow sanitizer got worried "i" could wrap around. Instead of making "i" an unsigned long (which may enlarge the type size), switch both nr_segs and i to u32. There is no truncation with nr_segs since it is never larger than UIO_MAXIOV anyway. This keeps sanitizer instrumentation[1] out of a UACCESS path: vmlinux.o: warning: objtool: copy_compat_iovec_from_user+0xa9: call to __ubsan_handle_add_overflow() with UACCESS enabled Link: https://github.com/KSPP/linux/issues/26 [1] Cc: Christian Brauner <brauner@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240129183729.work.991-kees@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>