path: root/lib
AgeCommit message (Collapse)Author
10 daysMerge branch 'fixes' of git:// Torvalds
Pull misc fixes from Al Viro: "vhost race fix and a percpu_ref_init-caused cgroup double-free fix. The latter had manifested as buggered struct mount refcounting - those are also using percpu data structures, but anything that does percpu allocations could be hit" * 'fixes' of git:// Fix double fget() in vhost_net_set_backend() percpu_ref_init(): clean ->percpu_count_ref on failure
11 dayspercpu_ref_init(): clean ->percpu_count_ref on failureAl Viro
That way percpu_ref_exit() is safe after failing percpu_ref_init(). At least one user (cgroup_create()) had a double-free that way; there might be other similar bugs. Easier to fix in percpu_ref_init(), rather than playing whack-a-mole in sloppy users... Usual symptoms look like a messed refcounting in one of subsystems that use percpu allocations (might be percpu-refcount, might be something else). Having refcounts for two different objects share memory is Not Nice(tm)... Reported-by: Signed-off-by: Al Viro <>
2022-05-09dim: initialize all struct fieldsJesse Brandeburg
The W=2 build pointed out that the code wasn't initializing all the variables in the dim_cq_moder declarations with the struct initializers. The net change here is zero since these structs were already static const globals and were initialized with zeros by the compiler, but removing compiler warnings has value in and of itself. lib/dim/net_dim.c: At top level: lib/dim/net_dim.c:54:9: warning: missing initializer for field ‘comps’ of ‘const struct dim_cq_moder’ [-Wmissing-field-initializers] 54 | NET_DIM_RX_EQE_PROFILES, | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from lib/dim/net_dim.c:6: ./include/linux/dim.h:45:13: note: ‘comps’ declared here 45 | u16 comps; | ^~~~~ and repeats for the tx struct, and once you fix the comps entry then the cq_period_mode field needs the same treatment. Use the commonly accepted style to indicate to the compiler that we know what we're doing, and add a comma at the end of each struct initializer to clean up the issue, and use explicit initializers for the fields we are initializing which makes the compiler happy. While here and fixing these lines, clean up the code slightly with a fix for the super long lines by removing the word "_MODERATION" from a couple defines only used in this file. Fixes: f8be17b81d44 ("lib/dim: Fix -Wunused-const-variable warnings") Signed-off-by: Jesse Brandeburg <> Link: Signed-off-by: Jakub Kicinski <>
2022-05-01Merge tag 'x86_urgent_for_v5.18_rc5' of ↵Linus Torvalds
git:// Pull x86 fixes from Borislav Petkov: - A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is solely controlled by the hypervisor - A build fix to make the function prototype (__warn()) as visible as the definition itself - A bunch of objtool annotation fixes which have accumulated over time - An ORC unwinder fix to handle bad input gracefully - Well, we thought the microcode gets loaded in time in order to restore the microcode-emulated MSRs but we thought wrong. So there's a fix for that to have the ordering done properly - Add new Intel model numbers - A spelling fix * tag 'x86_urgent_for_v5.18_rc5' of git:// x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests bug: Have __warn() prototype defined unconditionally x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config objtool: Use offstr() to print address of missing ENDBR objtool: Print data address for "!ENDBR" data warnings x86/xen: Add ANNOTATE_NOENDBR to startup_xen() x86/uaccess: Add ENDBR to __put_user_nocheck*() x86/retpoline: Add ANNOTATE_NOENDBR for retpolines x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline objtool: Enable unreachable warnings for CLANG LTO x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE x86,objtool: Mark cpu_startup_entry() __noreturn x86,xen,objtool: Add UNWIND hint lib/strn*,objtool: Enforce user_access_begin() rules MAINTAINERS: Add x86 unwinding entry x86/unwind/orc: Recheck address range after stack info was updated x86/cpu: Load microcode during restore_processor_state() x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
2022-04-27hex2bin: fix access beyond string endMikulas Patocka
If we pass too short string to "hex2bin" (and the string size without the terminating NUL character is even), "hex2bin" reads one byte after the terminating NUL character. This patch fixes it. Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on error - so we can't just return the variable "hi" or "lo" on error. This inconsistency may be fixed in the next merge window, but for the purpose of fixing this bug, we just preserve the existing behavior and return -1 and -EINVAL. Signed-off-by: Mikulas Patocka <> Reviewed-by: Andy Shevchenko <> Fixes: b78049831ffe ("lib: add error checking to hex2bin") Cc: Signed-off-by: Linus Torvalds <>
2022-04-27hex2bin: make the function hex_to_bin constant-timeMikulas Patocka
The function hex2bin is used to load cryptographic keys into device mapper targets dm-crypt and dm-integrity. It should take constant time independent on the processed data, so that concurrently running unprivileged code can't infer any information about the keys via microarchitectural convert channels. This patch changes the function hex_to_bin so that it contains no branches and no memory accesses. Note that this shouldn't cause performance degradation because the size of the new function is the same as the size of the old function (on x86-64) - and the new function causes no branch misprediction penalties. I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no branches in the generated code. Signed-off-by: Mikulas Patocka <> Cc: Signed-off-by: Linus Torvalds <>
2022-04-22XArray: Disallow sibling entries of nodesMatthew Wilcox (Oracle)
There is a race between xas_split() and xas_load() which can result in the wrong page being returned, and thus data corruption. Fortunately, it's hard to hit (syzbot took three months to find it) and often guarded with VM_BUG_ON(). The anatomy of this race is: thread A thread B order-9 page is stored at index 0x200 lookup of page at index 0x274 page split starts load of sibling entry at offset 9 stores nodes at offsets 8-15 load of entry at offset 8 The entry at offset 8 turns out to be a node, and so we descend into it, and load the page at index 0x234 instead of 0x274. This is hard to fix on the split side; we could replace the entire node that contains the order-9 page instead of replacing the eight entries. Fixing it on the lookup side is easier; just disallow sibling entries that point to nodes. This cannot ever be a useful thing as the descent would not know the correct offset to use within the new node. The test suite continues to pass, but I have not added a new test for this bug. Reported-by: Tested-by: Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Signed-off-by: Matthew Wilcox (Oracle) <>
2022-04-19lib/strn*,objtool: Enforce user_access_begin() rulesPeter Zijlstra
Apparently GCC can fail to inline a 'static inline' single caller function: lib/strnlen_user.o: warning: objtool: strnlen_user()+0x33: call to do_strnlen_user() with UACCESS enabled lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x33: call to do_strncpy_from_user() with UACCESS enabled Reported-by: Thomas Gleixner <> Signed-off-by: Peter Zijlstra (Intel) <> Acked-by: Josh Poimboeuf <> Link:
2022-04-10Merge tag 'driver-core-5.18-rc2' of ↵Linus Torvalds
git:// Pull driver core updates from Greg KH: "Here are two small driver core changes for 5.18-rc2. They are the final bits in the removal of the default_attrs field in struct kobj_type. I had to wait until after 5.18-rc1 for all of the changes to do this came in through different development trees, and then one new user snuck in. So this series has two changes: - removal of the default_attrs field in the powerpc/pseries/vas code. The change has been acked by the PPC maintainers to come through this tree - removal of default_attrs from struct kobj_type now that all in-kernel users are removed. This cleans up the kobject code a little bit and removes some duplicated functionality that confused people (now there is only one way to do default groups) Both of these have been in linux-next for all of this week with no reported problems" * tag 'driver-core-5.18-rc2' of git:// kobject: kobj_type: remove default_attrs powerpc/pseries/vas: use default_groups in kobj_type
2022-04-08lz4: fix LZ4_decompress_safe_partial read out of boundGuo Xuenan
When partialDecoding, it is EOF if we've either filled the output buffer or can't proceed with reading an offset for following match. In some extreme corner cases when compressed data is suitably corrupted, UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial may lead to read out of bound problem during decoding. lz4 upstream has fixed it [2] and this issue has been disscussed here [3] before. current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd better fix it first. [1] [2] [3] Link: Reported-by: Signed-off-by: Guo Xuenan <> Reviewed-by: Nick Terrell <> Acked-by: Gao Xiang <> Cc: Yann Collet <> Cc: Chengyang Fan <> Cc: <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-04-05kobject: kobj_type: remove default_attrsGreg Kroah-Hartman
Now that all in-kernel users of default_attrs for the kobj_type are gone and converted to properly use the default_groups pointer instead, it can be safely removed. There is one standard way to create sysfs files in a kobj_type, and not two like before, causing confusion as to which should be used. Cc: "Rafael J. Wysocki" <> Link: Signed-off-by: Greg Kroah-Hartman <>
2022-04-01Merge tag 'for-5.18/block-2022-04-01' of git:// Torvalds
Pull block fixes from Jens Axboe: "Either fixes or a few additions that got missed in the initial merge window pull. In detail: - List iterator fix to avoid leaking value post loop (Jakob) - One-off fix in minor count (Christophe) - Fix for a regression in how io priority setting works for an exiting task (Jiri) - Fix a regression in this merge window with blkg_free() being called in an inappropriate context (Ming) - Misc fixes (Ming, Tom)" * tag 'for-5.18/block-2022-04-01' of git:// blk-wbt: remove wbt_track stub block: use dedicated list iterator variable block: Fix the maximum minor value is blk_alloc_ext_minor() block: restore the old set_task_ioprio() behaviour wrt PF_EXITING block: avoid calling blkg_free() in atomic context lib/sbitmap: allocate sb->map via kvzalloc_node
2022-04-01Merge tag 'xarray-5.18' of git:// Torvalds
Pull XArray updates from Matthew Wilcox: - Documentation update - Fix test-suite build after move of bitmap.h - Fix xas_create_range() when a large entry is already present - Fix xas_split() of a shadow entry * tag 'xarray-5.18' of git:// XArray: Update the LRU list in xas_split() XArray: Fix xas_create_range() when multi-order entry present XArray: Include bitmap.h from xarray.h XArray: Document the locking requirement for the xa_state
2022-03-31Merge tag 'for-linus-5.18-rc1' of ↵Linus Torvalds
git:// Pull UML updates from Richard Weinberger: - Devicetree support (for testing) - Various cleanups and fixes: UBD, port_user, uml_mconsole - Maintainer update * tag 'for-linus-5.18-rc1' of git:// um: run_helper: Write error message to kernel log on exec failure on host um: port_user: Improve error handling when port-helper is not found um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar um: port_user: Search for in.telnetd in PATH um: clang: Strip out -mno-global-merge from USER_CFLAGS docs: UML: Mention telnetd for port channel um: Remove unused timeval_to_ns() function um: Fix uml_mconsole stop/go um: Cleanup syscall_handler_t definition/cast, fix warning uml: net: vector: fix const issue um: Fix WRITE_ZEROES in the UBD Driver um: Migrate vector drivers to NAPI um: Fix order of dtb unflatten/early init um: fix and optimize xor select template for CONFIG64 and timetravel mode um: Document dtb command line option lib/logic_iomem: correct fallback config references um: Remove duplicated include in syscalls_64.c MAINTAINERS: Update UserModeLinux entry
2022-03-31XArray: Update the LRU list in xas_split()Matthew Wilcox (Oracle)
When splitting a value entry, we may need to add the new nodes to the LRU list and remove the parent node from the LRU list. The WARN_ON checks in shadow_lru_isolate() catch this oversight. This bug was latent until we stopped splitting folios in shrink_page_list() with commit 820c4e2e6f51 ("mm/vmscan: Free non-shmem folios without splitting them"). That allows the creation of large shadow entries, and subsequently when trying to page in a small page, we will split the large shadow entry in __filemap_add_folio(). Fixes: 8fc75643c5e1 ("XArray: add xas_split") Reported-by: Hugh Dickins <> Signed-off-by: Matthew Wilcox (Oracle) <>
2022-03-29lib/test: use after free in register_test_dev_kmod()Dan Carpenter
The "test_dev" pointer is freed but then returned to the caller. Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <> Signed-off-by: Luis Chamberlain <>
2022-03-28XArray: Fix xas_create_range() when multi-order entry presentMatthew Wilcox (Oracle)
If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinterpret that entry as a node and dereference xa_node->parent, generally leading to a crash that looks something like this: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0 RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline] RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725 It's deterministically reproducable once you know what the problem is, but producing it in a live kernel requires khugepaged to hit a race. While the problem has been present since xas_create_range() was introduced, I'm not aware of a way to hit it before the page cache was converted to use multi-index entries. Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Reported-by: Signed-off-by: Matthew Wilcox (Oracle) <>
2022-03-26Merge tag 'memcpy-v5.18-rc1' of ↵Linus Torvalds
git:// Pull FORTIFY_SOURCE updates from Kees Cook: "This series consists of two halves: - strict compile-time buffer size checking under FORTIFY_SOURCE for the memcpy()-family of functions (for extensive details and rationale, see the first commit) - enabling FORTIFY_SOURCE for Clang, which has had many overlapping bugs that we've finally worked past" * tag 'memcpy-v5.18-rc1' of git:// fortify: Add Clang support fortify: Make sure strlen() may still be used as a constant expression fortify: Use __diagnose_as() for better diagnostic coverage fortify: Make pointer arguments const Compiler Attributes: Add __diagnose_as for Clang Compiler Attributes: Add __overloadable for Clang Compiler Attributes: Add __pass_object_size for Clang fortify: Replace open-coded __gnu_inline attribute fortify: Update compile-time tests for Clang 14 fortify: Detect struct member overflows in memset() at compile-time fortify: Detect struct member overflows in memmove() at compile-time fortify: Detect struct member overflows in memcpy() at compile-time
2022-03-26Merge tag 'for-5.18/64bit-pi-2022-03-25' of git:// Torvalds
Pull block layer 64-bit data integrity support from Jens Axboe: "This adds support for 64-bit data integrity in the block layer and in NVMe" * tag 'for-5.18/64bit-pi-2022-03-25' of git:// crypto: fix crc64 testmgr digest byte order nvme: add support for enhanced metadata block: add pi for extended integrity crypto: add rocksoft 64b crc guard tag framework lib: add rocksoft model crc64 linux/kernel: introduce lower_48_bits function asm-generic: introduce be48 unaligned accessors nvme: allow integrity on extended metadata formats block: support pi with extended metadata
2022-03-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: "This is the material which was staged after willystuff in linux-next. Subsystems affected by this patch series: mm (debug, selftests, pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise), and selftests" * emailed patches from Andrew Morton <>: (113 commits) selftests: kselftest framework: provide "finished" helper mm: madvise: MADV_DONTNEED_LOCKED mm: fix race between MADV_FREE reclaim and blkdev direct IO read mm: generalize ARCH_HAS_FILTER_PGPROT mm: unmap_mapping_range_tree() with i_mmap_rwsem shared mm: warn on deleting redirtied only if accounted mm/huge_memory: remove stale locking logic from __split_huge_pmd() mm/huge_memory: remove stale page_trans_huge_mapcount() mm/swapfile: remove stale reuse_swap_page() mm/khugepaged: remove reuse_swap_page() usage mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() mm: streamline COW logic in do_swap_page() mm: slightly clarify KSM logic in do_swap_page() mm: optimize do_wp_page() for fresh pages in local LRU pagevecs mm: optimize do_wp_page() for exclusive pages in the swapcache mm/huge_memory: make is_transparent_hugepage() static userfaultfd/selftests: enable hugetlb remap and remove event testing selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test mm: enable MADV_DONTNEED for hugetlb mappings kasan: disable LOCKDEP when printing reports ...
2022-03-24kasan: update function name in commentsPeter Collingbourne
The function kasan_global_oob was renamed to kasan_global_oob_right, but the comments referring to it were not updated. Do so. Link: Link: Signed-off-by: Peter Collingbourne <> Reviewed-by: Miaohe Lin <> Reviewed-by: Marco Elver <> Reviewed-by: Andrey Konovalov <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-24kasan: test: support async (again) and asymm modes for HW_TAGSAndrey Konovalov
Async mode support has already been implemented in commit e80a76aa1a91 ("kasan, arm64: tests supports for HW_TAGS async mode") but then got accidentally broken in commit 99734b535d9b ("kasan: detect false-positives in tests"). Restore the changes removed by the latter patch and adapt them for asymm mode: add a sync_fault flag to kunit_kasan_expectation that only get set if the MTE fault was synchronous, and reenable MTE on such faults in tests. Also rename kunit_kasan_expectation to kunit_kasan_status and move its definition to mm/kasan/kasan.h from include/linux/kasan.h, as this structure is only internally used by KASAN. Also put the structure definition under IS_ENABLED(CONFIG_KUNIT). Link: Signed-off-by: Andrey Konovalov <> Cc: Marco Elver <> Cc: Alexander Potapenko <> Cc: Dmitry Vyukov <> Cc: Andrey Ryabinin <> Cc: Vincenzo Frascino <> Cc: Catalin Marinas <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-24kasan: improve vmalloc testsAndrey Konovalov
Update the existing vmalloc_oob() test to account for the specifics of the tag-based modes. Also add a few new checks and comments. Add new vmalloc-related tests: - vmalloc_helpers_tags() to check that exported vmalloc helpers can handle tagged pointers. - vmap_tags() to check that SW_TAGS mode properly tags vmap() mappings. - vm_map_ram_tags() to check that SW_TAGS mode properly tags vm_map_ram() mappings. - vmalloc_percpu() to check that SW_TAGS mode tags regions allocated for __alloc_percpu(). The tagging of per-cpu mappings is best-effort; proper tagging is tracked in [1]. [1] [ similar to "kasan: test: fix compatibility with FORTIFY_SOURCE"] Link: Link: [ set_memory_rw/ro() are not exported to modules] Link: [ fix build] Cc: Andrey Konovalov <> [ vmap_tags() and vm_map_ram_tags() pass invalid page array size] Link: Signed-off-by: Andrey Konovalov <> Signed-off-by: Stephen Rothwell <> Acked-by: Marco Elver <> Cc: Alexander Potapenko <> Cc: Andrey Ryabinin <> Cc: Catalin Marinas <> Cc: Dmitry Vyukov <> Cc: Evgenii Stepanov <> Cc: Mark Rutland <> Cc: Peter Collingbourne <> Cc: Vincenzo Frascino <> Cc: Will Deacon <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-24kasan: allow enabling KASAN_VMALLOC and SW/HW_TAGSAndrey Konovalov
Allow enabling CONFIG_KASAN_VMALLOC with SW_TAGS and HW_TAGS KASAN modes. Also adjust CONFIG_KASAN_VMALLOC description: - Mention HW_TAGS support. - Remove unneeded internal details: they have no place in Kconfig description and are already explained in the documentation. Link: Signed-off-by: Andrey Konovalov <> Acked-by: Marco Elver <> Cc: Alexander Potapenko <> Cc: Andrey Ryabinin <> Cc: Catalin Marinas <> Cc: Dmitry Vyukov <> Cc: Evgenii Stepanov <> Cc: Mark Rutland <> Cc: Peter Collingbourne <> Cc: Vincenzo Frascino <> Cc: Will Deacon <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-24lib/vsprintf: avoid redundant work with 0 sizeWaiman Long
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4. While debugging the constant increase in percpu memory consumption on a system that spawned large number of containers, it was found that a lot of offline mem_cgroup structures remained in place without being freed. Further investigation indicated that those mem_cgroup structures were pinned by some pages. In order to find out what those pages are, the existing page_owner debugging tool is extended to show memory cgroup information and whether those memcgs are offline or not. With the enhanced page_owner tool, the following is a typical page that pinned the mem_cgroup structure in my test case: Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff) prep_new_page+0xac/0xe0 get_page_from_freelist+0x1327/0x14d0 __alloc_pages+0x191/0x340 alloc_pages_vma+0x84/0x250 shmem_alloc_page+0x3f/0x90 shmem_alloc_and_acct_page+0x76/0x1c0 shmem_getpage_gfp+0x281/0x940 shmem_write_begin+0x36/0xe0 generic_perform_write+0xed/0x1d0 __generic_file_write_iter+0xdc/0x1b0 generic_file_write_iter+0x5d/0xb0 new_sync_write+0x11f/0x1b0 vfs_write+0x1ba/0x2a0 ksys_write+0x59/0xd0 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8. So the page was not freed because it was part of a shmem segment. That is useful information that can help users to diagnose similar problems. With cgroup v1, /proc/cgroups can be read to find out the total number of memory cgroups (online + offline). With cgroup v2, the cgroup.stat of the root cgroup can be read to find the number of dying cgroups (most likely pinned by dying memcgs). The page_owner feature is not supposed to be enabled for production system due to its memory overhead. However, if it is suspected that dying memcgs are increasing over time, a test environment with page_owner enabled can then be set up with appropriate workload for further analysis on what may be causing the increasing number of dying memcgs. This patch (of 4): For *scnprintf(), vsnprintf() is always called even if the input size is 0. That is a waste of time, so just return 0 in this case. Note that vsnprintf() will never return -1 to indicate an error. So skipping the call to vsnprintf() when size is 0 will have no functional impact at all. Link: Link: Signed-off-by: Waiman Long <> Acked-by: David Rientjes <> Reviewed-by: Sergey Senozhatsky <> Acked-by: Roman Gushchin <> Acked-by: Rafael Aquini <> Acked-by: Mike Rapoport <> Cc: Roman Gushchin <> Cc: Johannes Weiner <> Cc: Michal Hocko <> Cc: Vladimir Davydov <> Cc: Petr Mladek <> Cc: Steven Rostedt (Google) <> Cc: Andy Shevchenko <> Cc: Rasmus Villemoes <> Cc: Ira Weiny <> Cc: David Rientjes <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-24Merge tag 'cxl-for-5.18' of ↵Linus Torvalds
git:// Pull CXL (Compute Express Link) updates from Dan Williams: "This development cycle extends the subsystem to discover CXL resources throughout a CXL/PCIe switch topology and respond to hot add/remove events anywhere in that topology. This is more foundational infrastructure in preparation for dynamic memory region provisioning support. Recall that CXL memory regions, as the new "Theory of Operation" section of Documentation/driver-api/cxl/memory-devices.rst describes, bring storage volume striping semantics to memory. The hot add/remove behavior is validated with extensions to the cxl_test unit test environment and this test in the cxl-cli test suite: Summary: - Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem operation as distinct from 'cxl_pci' mailbox operations. Its primary responsibility is enumerating an endpoint 'struct cxl_port' and all the 'struct cxl_port' instances between an endpoint and the CXL platform root. - Add a driver for 'struct cxl_port' objects responsible for enumerating and operating all Host-managed Device Memory (HDM) decoder resources between the platform-level CXL memory description, all intervening host bridges / switches, and the HDM resources in endpoints. - Update the cxl_pci driver to validate CXL.mem operation precursors to HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC based CXL.mem configuration. - Add basic lockdep coverage for usage of device_lock() on CXL subsystem objects similar to what exists for LIBNVDIMM. Include a compile-time switch for which subsystem to validate at run-time. - Update cxl_test to emulate a one level switch topology. - Document a "Theory of Operation" for the subsystem. - Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs - Include miscellaneous fixes for spec / QEMU CXL emulation compatibility and static analysis reports" * tag 'cxl-for-5.18' of git:// (48 commits) cxl/core/port: Fix NULL but dereferenced coccicheck error cxl/port: Hold port reference until decoder release cxl/port: Fix endpoint refcount leak cxl/core: Fix cxl_device_lock() class detection cxl/core/port: Fix unregister_port() lock assertion cxl/regs: Fix size of CXL Capability Header Register cxl/core/port: Handle invalid decoders cxl/core/port: Fix / relax decoder target enumeration tools/testing/cxl: Add a physical_node link tools/testing/cxl: Enumerate mock decoders tools/testing/cxl: Mock one level of switches tools/testing/cxl: Fix root port to host bridge assignment tools/testing/cxl: Mock dvsec_ranges() cxl/core/port: Add endpoint decoders cxl/core: Move target_list out of base decoder attributes cxl/mem: Add the cxl_mem driver cxl/core/port: Add switch port enumeration cxl/memdev: Add numa_node attribute cxl/pci: Emit device serial number cxl/pci: Implement wait for media active ...
2022-03-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "Various misc subsystems, before getting into the post-linux-next material. 41 patches. Subsystems affected by this patch series: procfs, misc, core-kernel, lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump, taskstats, panic, kcov, resource, and ubsan" * emailed patches from Andrew Morton <>: (41 commits) Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" kernel/resource: fix kfree() of bootmem memory again kcov: properly handle subsequent mmap calls kcov: split ioctl handling into locked and unlocked parts panic: move panic_print before kmsg dumpers panic: add option to dump all CPUs backtraces in panic_print docs: sysctl/kernel: add missing bit to panic_print taskstats: remove unneeded dead assignment kasan: no need to unset panic_on_warn in end_report() ubsan: no need to unset panic_on_warn in ubsan_epilogue() panic: unset panic_on_warn inside panic() docs: kdump: add scp example to write out the dump file docs: kdump: update description about sysfs file system support arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible cgroup: use irqsave in cgroup_rstat_flush_locked(). fat: use pointer to simple type in put_user() minix: fix bug when opening a file with O_DIRECT ...
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds
git:// Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git:// (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
2022-03-23Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"Marco Elver
This reverts commit ea91a1d45d19469001a4955583187b0d75915759. Since df05c0e9496c ("Documentation: Raise the minimum supported version of LLVM to 11.0.0") the minimum Clang version is now 11.0, which fixed the UBSAN/KCSAN vs. KCOV incompatibilities. Link: Link: Link: Signed-off-by: Marco Elver <> Reviewed-by: Nathan Chancellor <> Reviewed-by: Kees Cook <> Cc: Alexander Potapenko <> Cc: Dmitry Vyukov <> Cc: Nathan Chancellor <> Cc: Nick Desaulniers <> Cc: Arnd Bergmann <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-23ubsan: no need to unset panic_on_warn in ubsan_epilogue()Tiezhu Yang
panic_on_warn is unset inside panic(), so no need to unset it before calling panic() in ubsan_epilogue(). Link: Signed-off-by: Tiezhu Yang <> Reviewed-by: Marco Elver <> Cc: Andrey Ryabinin <> Cc: Baoquan He <> Cc: Jonathan Corbet <> Cc: Xuefeng Li <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-23lib: bitmap: fix many kernel-doc warningsRandy Dunlap
Fix kernel-doc warings in lib/bitmap.c: lib/bitmap.c:498: warning: Function parameter or member 'buf' not described in 'bitmap_print_to_buf' lib/bitmap.c:498: warning: Function parameter or member 'maskp' not described in 'bitmap_print_to_buf' lib/bitmap.c:498: warning: Function parameter or member 'nmaskbits' not described in 'bitmap_print_to_buf' lib/bitmap.c:498: warning: Function parameter or member 'off' not described in 'bitmap_print_to_buf' lib/bitmap.c:498: warning: Function parameter or member 'count' not described in 'bitmap_print_to_buf' lib/bitmap.c:561: warning: contents before sections lib/bitmap.c:606: warning: Function parameter or member 'buf' not described in 'bitmap_print_list_to_buf' lib/bitmap.c:606: warning: Function parameter or member 'maskp' not described in 'bitmap_print_list_to_buf' lib/bitmap.c:606: warning: Function parameter or member 'nmaskbits' not described in 'bitmap_print_list_to_buf' lib/bitmap.c:606: warning: Function parameter or member 'off' not described in 'bitmap_print_list_to_buf' lib/bitmap.c:606: warning: Function parameter or member 'count' not described in 'bitmap_print_list_to_buf' lib/bitmap.c:819: warning: missing initial short description on line: * bitmap_parselist_user() This still leaves 15 warnings for function return values not described, similar to this one: bitmap.c:890: warning: No description found for return value of 'bitmap_parse' Link: Fixes: 1fae562983ca ("cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list") Fixes: 4b060420a596 ("bitmap, irq: add smp_affinity_list interface to /proc/irq") Signed-off-by: Randy Dunlap <> Cc: Yury Norov <> Cc: Andy Shevchenko <> Cc: Rasmus Villemoes <> Cc: Tian Tao <> Cc: Mike Travis <> Cc: Greg Kroah-Hartman <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-23lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN optionFeng Tang
0Day robots reported there is compiling issue for 'csky' ARCH when CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED is enabled [1]: All errors (new ones prefixed by >>): {standard input}: Assembler messages: >> {standard input}:2277: Error: pcrel offset for branch to .LS000B too far (0x3c) Which was discussed in [2]. And as there is no solution for csky yet, add some dependency for this config to limit it to several ARCHs which have no compiling issue so far. [1]. [2]. Link: Reported-by: kernel test robot <> Signed-off-by: Feng Tang <> Cc: Guo Ren <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-23Kconfig.debug: make DEBUG_INFO selectable from a choiceKees Cook
Currently it's not possible to enable DEBUG_INFO for an all*config build, since it is marked as "depends on !COMPILE_TEST". This generally makes sense because a debug build of an all*config target ends up taking much longer and the output is much larger. Having this be "default off" makes sense. However, there are cases where enabling DEBUG_INFO for such builds is useful for doing treewide A/B comparisons of build options, etc. Make DEBUG_INFO selectable from any of the DWARF version choice options, with DEBUG_INFO_NONE being the default for COMPILE_TEST. The mutually exclusive relationship between DWARF5 and BTF must be inverted, but the result remains the same. Additionally moves DEBUG_KERNEL and DEBUG_MISC up to the top of the menu because they were enabling features _above_ it, making it weird to navigate menuconfig. [ make DEBUG_INFO always default=n] Link: Link: Link: Signed-off-by: Kees Cook <> Suggested-by: Arnd Bergmann <> Reviewed-by: Arnd Bergmann <> Reviewed-by: Nathan Chancellor <> Reviewed-by: Nick Desaulniers <> Tested-by: Nick Desaulniers <> Reviewed-by: Masahiro Yamada <> Cc: Nick Desaulniers <> Cc: Tetsuo Handa <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-23Merge tag 'asm-generic-5.18' of ↵Linus Torvalds
git:// Pull asm-generic updates from Arnd Bergmann: "There are three sets of updates for 5.18 in the asm-generic tree: - The set_fs()/get_fs() infrastructure gets removed for good. This was already gone from all major architectures, but now we can finally remove it everywhere, which loses some particularly tricky and error-prone code. There is a small merge conflict against a parisc cleanup, the solution is to use their new version. - The nds32 architecture ends its tenure in the Linux kernel. The hardware is still used and the code is in reasonable shape, but the mainline port is not actively maintained any more, as all remaining users are thought to run vendor kernels that would never be updated to a future release. - A series from Masahiro Yamada cleans up some of the uapi header files to pass the compile-time checks" * tag 'asm-generic-5.18' of git:// (27 commits) nds32: Remove the architecture uaccess: remove CONFIG_SET_FS ia64: remove CONFIG_SET_FS support sh: remove CONFIG_SET_FS support sparc64: remove CONFIG_SET_FS support lib/test_lockup: fix kernel pointer check for separate address spaces uaccess: generalize access_ok() uaccess: fix type mismatch warnings from access_ok() arm64: simplify access_ok() m68k: fix access_ok for coldfire MIPS: use simpler access_ok() MIPS: Handle address errors for accesses above CPU max virtual user address uaccess: add generic __{get,put}_kernel_nofault nios2: drop access_ok() check from __put_user() x86: use more conventional access_ok() definition x86: remove __range_not_ok() sparc64: add __{get,put}_kernel_nofault() nds32: fix access_ok() checks in get/put_user uaccess: fix nios2 and microblaze get_user_8() sparc64: fix building assembly files ...
2022-03-23Merge tag 'linux-kselftest-kunit-5.18-rc1' of ↵Linus Torvalds
git:// Pull KUnit updates from Shuah Khan: - changes to decrease macro layering string, integer, EQ/NE asserts - remove unused macros - several cleanups and fixes - new list tests for list_del_init_careful(), list_is_head() and list_entry_is_head() * tag 'linux-kselftest-kunit-5.18-rc1' of git:// list: test: Add a test for list_entry_is_head() list: test: Add a test for list_is_head() list: test: Add test for list_del_init_careful() kunit: cleanup assertion macro internal variables kunit: factor out str constants from binary assertion structs kunit: consolidate KUNIT_INIT_BINARY_ASSERT_STRUCT macros kunit: remove va_format from kunit_assert kunit: tool: drop mostly unused KunitResult.result field kunit: decrease macro layering for EQ/NE asserts kunit: decrease macro layering for integer asserts kunit: reduce layering in string assertion macros kunit: drop unused intermediate macros for ptr inequality checks kunit: make KUNIT_EXPECT_EQ() use KUNIT_EXPECT_EQ_MSG(), etc. kunit: drop unused assert_type from kunit_assert and clean up macros kunit: split out part of kunit_assert into a static const kunit: factor out kunit_base_assert_format() call into kunit_fail() kunit: drop unused kunit* field in kunit_assert kunit: move check if assertion passed into the macros kunit: add example test case showing off all the expect macros
2022-03-23Merge tag 'printk-for-5.18' of ↵Linus Torvalds
git:// Pull printk updates from Petr Mladek: - Make %pK behave the same as %p for kptr_restrict == 0 also with no_hash_pointers parameter - Ignore the default console in the device tree also when console=null or console="" is used on the command line - Document console=null and console="" behavior - Prevent a deadlock and a livelock caused by console_lock in panic() - Make console_lock available for panicking CPU - Fast query for the next to-be-used sequence number - Use the expected return values in printk.devkmsg __setup handler - Use the correct atomic operations in wake_up_klogd() irq_work handler - Avoid possible unaligned access when handling %4cc printing format * tag 'printk-for-5.18' of git:// printk: fix return value of printk.devkmsg __setup handler vsprintf: Fix %pK with kptr_restrict == 0 printk: make suppress_panic_printk static printk: Set console_set_on_cmdline=1 when __add_preferred_console() is called with user_specified == true Docs: printk: add 'console=null|""' to admin/kernel-parameters printk: use atomic updates for klogd work printk: Drop console_sem during panic printk: Avoid livelock with heavy printk during panic printk: disable optimistic spin during panic printk: Add panic_in_progress helper vsprintf: Move space out of string literals in fourcc_string() vsprintf: Fix potential unaligned access printk: ringbuffer: Improve prb_next_seq() performance
2022-03-22Merge tag 'folio-5.18c' of git:// Torvalds
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git:// (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22kfence: allow use of a deferrable timerMarco Elver
Allow the use of a deferrable timer, which does not force CPU wake-ups when the system is idle. A consequence is that the sample interval becomes very unpredictable, to the point that it is not guaranteed that the KFENCE KUnit test still passes. Nevertheless, on power-constrained systems this may be preferable, so let's give the user the option should they accept the above trade-off. Link: Signed-off-by: Marco Elver <> Reviewed-by: Alexander Potapenko <> Cc: Dmitry Vyukov <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-22kunit: make kunit_test_timeout compatible with commentPeng Liu
In function kunit_test_timeout, it is declared "300 * MSEC_PER_SEC" represent 5min. However, it is wrong when dealing with arm64 whose default HZ = 250, or some other situations. Use msecs_to_jiffies to fix this, and kunit_test_timeout will work as desired. Link: Fixes: 5f3e06208920 ("kunit: test: add support for test abort") Signed-off-by: Peng Liu <> Reviewed-by: Marco Elver <> Reviewed-by: Daniel Latypov <> Reviewed-by: Brendan Higgins <> Tested-by: Brendan Higgins <> Cc: Alexander Potapenko <> Cc: Dmitry Vyukov <> Cc: Wang Kefeng <> Cc: David Gow <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-22kunit: fix UAF when run kfence test case test_gfpzeroPeng Liu
Patch series "kunit: fix a UAF bug and do some optimization", v2. This series is to fix UAF (use after free) when running kfence test case test_gfpzero, which is time costly. This UAF bug can be easily triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Furthermore, some optimization for kunit tests has been done. This patch (of 3): Kunit will create a new thread to run an actual test case, and the main process will wait for the completion of the actual test thread until overtime. The variable "struct kunit test" has local property in function kunit_try_catch_run, and will be used in the test case thread. Task kunit_try_catch_run will free "struct kunit test" when kunit runs overtime, but the actual test case is still run and an UAF bug will be triggered. The above problem has been both observed in a physical machine and qemu platform when running kfence kunit tests. The problem can be triggered when setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Under this setting, the test case test_gfpzero will cost hours and kunit will run to overtime. The follows show the panic log. BUG: unable to handle page fault for address: ffffffff82d882e9 Call Trace: kunit_log_append+0x58/0xd0 ... test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test] test_gfpzero.cold+0x61/0x8ab [kfence_test] kunit_try_run_case+0x4c/0x70 kunit_generic_run_threadfn_adapter+0x11/0x20 kthread+0x166/0x190 ret_from_fork+0x22/0x30 Kernel panic - not syncing: Fatal exception Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 To solve this problem, the test case thread should be stopped when the kunit frame runs overtime. The stop signal will send in function kunit_try_catch_run, and test_gfpzero will handle it. Link: Link: Signed-off-by: Peng Liu <> Reviewed-by: Marco Elver <> Reviewed-by: Brendan Higgins <> Tested-by: Brendan Higgins <> Cc: Alexander Potapenko <> Cc: Dmitry Vyukov <> Cc: Wang Kefeng <> Cc: Daniel Latypov <> Cc: David Gow <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-22xarray: use kmem_cache_alloc_lru to allocate xa_nodeMuchun Song
The workingset will add the xa_node to the shadow_nodes list. So the allocation of xa_node should be done by kmem_cache_alloc_lru(). Using xas_set_lru() to pass the list_lru which we want to insert xa_node into to set up the xa_node reclaim context correctly. Link: Signed-off-by: Muchun Song <> Acked-by: Johannes Weiner <> Acked-by: Roman Gushchin <> Cc: Alex Shi <> Cc: Anna Schumaker <> Cc: Chao Yu <> Cc: Dave Chinner <> Cc: Fam Zheng <> Cc: Jaegeuk Kim <> Cc: Kari Argillander <> Cc: Matthew Wilcox (Oracle) <> Cc: Michal Hocko <> Cc: Qi Zheng <> Cc: Shakeel Butt <> Cc: Theodore Ts'o <> Cc: Trond Myklebust <> Cc: Vladimir Davydov <> Cc: Vlastimil Babka <> Cc: Wei Yang <> Cc: Xiongchun Duan <> Cc: Yang Shi <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2022-03-21Merge tag 'overflow-v5.18-rc1' of ↵Linus Torvalds
git:// Pull overflow updates from Kees Cook: "These changes come in roughly two halves: support of Gustavo A. R. Silva's struct_size() work via additional helpers for catching overflow allocation size calculations, and conversions of selftests to KUnit (which includes some tweaks for UML + Clang): - Convert overflow selftest to KUnit - Convert stackinit selftest to KUnit - Implement size_t saturating arithmetic helpers - Allow struct_size() to be used in initializers" * tag 'overflow-v5.18-rc1' of git:// lib: stackinit: Convert to KUnit um: Allow builds with Clang lib: overflow: Convert to Kunit overflow: Provide constant expression struct_size overflow: Implement size_t saturating arithmetic helpers test_overflow: Regularize test reporting output
2022-03-21lib/sbitmap: allocate sb->map via kvzalloc_nodeMing Lei
sbitmap has been used in scsi for replacing atomic operations on sdev->device_busy, so IOPS on some fast scsi storage can be improved. However, sdev->device_busy can be changed in fast path, so we have to allocate the sb->map statically. sdev->device_busy has been capped to 1024, but some drivers may configure the default depth as < 8, then cause each sbitmap word to hold only one bit. Finally 1024 * 128( sizeof(sbitmap_word)) bytes is needed for sb->map, given it is order 5 allocation, sometimes it may fail. Avoid the issue by using kvzalloc_node() for allocating sb->map. Cc: Ewan D. Milne <> Signed-off-by: Ming Lei <> Reviewed-by: Chaitanya Kulkarni <> Link: Signed-off-by: Jens Axboe <>
2022-03-21Merge tag 'for-5.18/drivers-2022-03-18' of git:// Torvalds
Pull block driver updates from Jens Axboe: - NVMe updates via Christoph: - add vectored-io support for user-passthrough (Kanchan Joshi) - add verbose error logging (Alan Adamson) - support buffered I/O on block devices in nvmet (Chaitanya Kulkarni) - central discovery controller support (Martin Belanger) - fix and extended the globally unique idenfier validation (Christoph) - move away from the deprecated IDA APIs (Sagi Grimberg) - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin, Chaitanya Kulkarni) - add lockdep annotations for in-kernel sockets (Chris Leech) - use vmalloc for ANA log buffer (Hannes Reinecke) - kerneldoc fixes (Chaitanya Kulkarni) - cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph) - warn about shared namespaces without multipathing (Christoph) - MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul, Erik, Dirk) - loop cleanups and queue depth configuration (Chaitanya) - null_blk cleanups and fixes (Chaitanya) - Use descriptive init/exit names in virtio_blk (Randy) - Use bvec_kmap_local() in drivers (Christoph) - bcache fixes (Mingzhe) - xen blk-front persistent grant speedups (Juergen) - rnbd fix and cleanup (Gioh) - Misc fixes (Christophe, Colin) * tag 'for-5.18/drivers-2022-03-18' of git:// (76 commits) virtio_blk: eliminate anonymous module_init & module_exit nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH nvme: remove nvme_alloc_request and nvme_alloc_request_qid nvme: cleanup how disk->disk_name is assigned nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate nvmet: use snprintf() with PAGE_SIZE in configfs nvmet: don't fold lines nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport nvme-tcp: lockdep: annotate in-kernel sockets nvme-tcp: don't fold the line nvme-tcp: don't initialize ret variable nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio nvme-multipath: use vmalloc for ANA log buffer xen/blkfront: speed up purge_persistent_grants() raid5: initialize the stripe_head embeeded bios as needed raid5-cache: statically allocate the recovery ra bio raid5-cache: fully initialize flush_bio when needed raid5-ppl: fully initialize the bio in ppl_new_iounit ...
2022-03-21Merge tag 'for-5.18/block-2022-03-18' of git:// Torvalds
Pull block updates from Jens Axboe: - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo) - blk-rq-qos completion fix (Tejun) - blk-cgroup merge fix (Tejun) - Add offline error return value to distinguish it from an IO error on the device (Song) - IO stats fixes (Zhang, Christoph) - blkcg refcount fixes (Ming, Yu) - Fix for indefinite dispatch loop softlockup (Shin'ichiro) - blk-mq hardware queue management improvements (Ming) - sbitmap dead code removal (Ming, John) - Plugging merge improvements (me) - Show blk-crypto capabilities in sysfs (Eric) - Multiple delayed queue run improvement (David) - Block throttling fixes (Ming) - Start deprecating auto module loading based on dev_t (Christoph) - bio allocation improvements (Christoph, Chaitanya) - Get rid of bio_devname (Christoph) - bio clone improvements (Christoph) - Block plugging improvements (Christoph) - Get rid of genhd.h header (Christoph) - Ensure drivers use appropriate flush helpers (Christoph) - Refcounting improvements (Christoph) - Queue initialization and teardown improvements (Ming, Christoph) - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng, Lukas, Nian, Yang, Eric, Chengming) * tag 'for-5.18/block-2022-03-18' of git:// (127 commits) block: cancel all throttled bios in del_gendisk() block: let blkcg_gq grab request queue's refcnt block: avoid use-after-free on throttle data block: limit request dispatch loop duration block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative" sr: simplify the local variable initialization in sr_block_open() block: don't merge across cgroup boundaries if blkcg is enabled block: fix rq-qos breakage from skipping rq_qos_done_bio() block: flush plug based on hardware and software queue order block: ensure plug merging checks the correct queue at least once block: move rq_qos_exit() into disk_release() block: do more work in elevator_exit block: move blk_exit_queue into disk_release block: move q_usage_counter release into blk_queue_release block: don't remove hctx debugfs dir from blk_mq_exit_queue block: move blkcg initialization/destroy into disk allocation/release handler sr: implement ->free_disk to simplify refcounting sd: implement ->free_disk to simplify refcounting sd: delay calling free_opal_dev sd: call sd_zbc_release_disk before releasing the scsi_device reference ...
2022-03-21Merge branch 'linus' of ↵Linus Torvalds
git:// Pull crypto updates from Herbert Xu: "API: - hwrng core now credits for low-quality RNG devices. Algorithms: - Optimisations for neon aes on arm/arm64. - Add accelerated crc32_be on arm64. - Add ffdheXYZ(dh) templates. - Disallow hmac keys < 112 bits in FIPS mode. - Add AVX assembly implementation for sm3 on x86. Drivers: - Add missing local_bh_disable calls for crypto_engine callback. - Ensure BH is disabled in crypto_engine callback path. - Fix zero length DMA mappings in ccree. - Add synchronization between mailbox accesses in octeontx2. - Add Xilinx SHA3 driver. - Add support for the TDES IP available on sama7g5 SoC in atmel" * 'linus' of git:// (137 commits) crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST MAINTAINERS: update HPRE/SEC2/TRNG driver maintainers list crypto: dh - Remove the unused function dh_safe_prime_dh_alg() hwrng: nomadik - Change clk_disable to clk_disable_unprepare crypto: arm64 - cleanup comments crypto: qat - fix initialization of pfvf rts_map_msg structures crypto: qat - fix initialization of pfvf cap_msg structures crypto: qat - remove unneeded assignment crypto: qat - disable registration of algorithms crypto: hisilicon/qm - fix memset during queues clearing crypto: xilinx: prevent probing on non-xilinx hardware crypto: marvell/octeontx - Use swap() instead of open coding it crypto: ccree - Fix use after free in cc_cipher_exit() crypto: ccp - ccp_dmaengine_unregister release dma channels crypto: octeontx2 - fix missing unlock hwrng: cavium - fix NULL but dereferenced coccicheck error crypto: cavium/nitrox - don't cast parameter in bit operations crypto: vmx - add missing dependencies MAINTAINERS: Add maintainer for Xilinx ZynqMP SHA3 driver crypto: xilinx - Add Xilinx SHA3 driver ...
2022-03-21Merge tag 'random-5.18-rc1-for-linus' of ↵Linus Torvalds
git:// Pull random number generator updates from Jason Donenfeld: "There have been a few important changes to the RNG's crypto, but the intent for 5.18 has been to shore up the existing design as much as possible with modern cryptographic functions and proven constructions, rather than actually changing up anything fundamental to the RNG's design. So it's still the same old RNG at its core as before: it still counts entropy bits, and collects from the various sources with the same heuristics as before, and so forth. However, the cryptographic algorithms that transform that entropic data into safe random numbers have been modernized. Just as important, if not more, is that the code has been cleaned up and re-documented. As one of the first drivers in Linux, going back to 1.3.30, its general style and organization was showing its age and becoming both a maintenance burden and an auditability impediment. Hopefully this provides a more solid foundation to build on for the future. I encourage you to open up the file in full, and maybe you'll remark, "oh, that's what it's doing," and enjoy reading it. That, at least, is the eventual goal, which this pull begins working toward. Here's a summary of the various patches in this pull: - /dev/urandom and /dev/random now do the same thing, per the patch we discussed on the list. I think this is worth trying out. If it does appear problematic, I've made sure to keep it standalone and revertible without any conflicts. - Fixes and cleanups for numerous integer type problems, locking issues, and general code quality concerns. - The input pool's LFSR has been replaced with a cryptographically secure hash function, which has security and performance benefits alike, and consequently allows us to count entropy bits linearly. - The pre-init injection now uses a real hash function too, instead of an LFSR or vanilla xor. - The interrupt handler's fast_mix() function now uses one round of SipHash, rather than the fake crypto that was there before. - All additions of RDRAND and RDSEED now go through the input pool's hash function, in part to mitigate ridiculous hypothetical CPU backdoors, but more so to have a consistent interface for ingesting entropy that's easy to analyze, making everything happen one way, instead of a potpourri of different ways. - The crng now works on per-cpu data, while also being in accordance with the actual "fast key erasure RNG" design. This allows us to fix several boot-time race complications associated with the prior dynamically allocated model, eliminates much locking, and makes our backtrack protection more robust. - Batched entropy now erases doled out values so that it's backtrack resistant. - Working closely with Sebastian, the interrupt handler no longer needs to take any locks at all, as we punt the synchronized/expensive operations to a workqueue. This is especially nice for PREEMPT_RT, where taking spinlocks in irq context is problematic. It also makes the handler faster for the rest of us. - Also working with Sebastian, we now do the right thing on CPU hotplug, so that we don't use stale entropy or fail to accumulate new entropy when CPUs come back online. - We handle virtual machines that fork / clone / snapshot, using the "vmgenid" ACPI specification for retrieving a unique new RNG seed, which we can use to also make WireGuard (and in the future, other things) safe across VM forks. - Around boot time, we now try to reseed more often if enough entropy is available, before settling on the usual 5 minute schedule. - Last, but certainly not least, the documentation in the file has been updated considerably" * tag 'random-5.18-rc1-for-linus' of git:// (60 commits) random: check for signal and try earlier when generating entropy random: reseed more often immediately after booting random: make consistent usage of crng_ready() random: use SipHash as interrupt entropy accumulator wireguard: device: clear keys on VM fork random: provide notifier for VM fork random: replace custom notifier chain with standard one random: do not export add_vmfork_randomness() unless needed virt: vmgenid: notify RNG of VM fork and supply generation ID ACPI: allow longer device IDs random: add mechanism for VM forks to reinitialize crng random: don't let 644 read-only sysctls be written to random: give sysctl_random_min_urandom_seed a more sensible value random: block in /dev/urandom random: do crng pre-init loading in worker rather than irq random: unify cycles_t and jiffies usage and types random: cleanup UUID handling random: only wake up writers after zap if threshold was passed random: round-robin registers as ulong, not u32 random: clear fast pool, crng, and batches in cpuhp bring up ...
2022-03-21lib: stackinit: Convert to KUnitKees Cook
Convert stackinit unit tests to KUnit, for better integration into the kernel self test framework. Includes a rename of test_stackinit.c to stackinit_kunit.c, and CONFIG_TEST_STACKINIT to CONFIG_STACKINIT_KUNIT_TEST. Adjust expected test results based on which stack initialization method was chosen: $ CMD="./tools/testing/kunit/ run stackinit --raw_output \ --arch=x86_64 --kconfig_add" $ $CMD | grep stackinit: # stackinit: pass:36 fail:0 skip:29 total:65 $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y | grep stackinit: # stackinit: pass:37 fail:0 skip:28 total:65 $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y | grep stackinit: # stackinit: pass:55 fail:0 skip:10 total:65 $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y | grep stackinit: # stackinit: pass:62 fail:0 skip:3 total:65 $ $CMD CONFIG_INIT_STACK_ALL_PATTERN=y --make_option LLVM=1 | grep stackinit: # stackinit: pass:60 fail:0 skip:5 total:65 $ $CMD CONFIG_INIT_STACK_ALL_ZERO=y --make_option LLVM=1 | grep stackinit: # stackinit: pass:60 fail:0 skip:5 total:65 Temporarily remove the userspace-build mode, which will be restored in a later patch. Expand the size of the pre-case switch variable so it doesn't get accidentally cleared. Cc: David Gow <> Cc: Daniel Latypov <> Cc: Arnd Bergmann <> Signed-off-by: Kees Cook <> --- v1: v2: - split "userspace KUnit stub" into separate header and patch (Daniel) - Improve commit log and comments (David) - Provide mapping of expected XFAIL tests to CONFIGs (David)
2022-03-21Merge branch 'for-5.18-vsprintf-fourcc-fixup' into for-linusPetr Mladek