summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-09-29Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: - Add a RISC-V optimized implementation of Poly1305. This code was written by Andy Polyakov and contributed by Zhihang Shao. - Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5. Yes, it's still the 90s, and several kernel subsystems are still using MD5 for legacy use cases. As long as that remains the case, it's helpful to clean it up in the same way as I've been doing for other algorithms. Later, I plan to convert most of these users of MD5 to use the new MD5 library API instead of the generic crypto API. - Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and Curve25519 code. Consolidate these into one module per algorithm, and centralize the configuration and build process. This is the same reorganization that has already been successful for SHA-1 and SHA-2. - Remove the unused crypto_kpp API for Curve25519. - Migrate the BLAKE2s and Curve25519 self-tests to KUnit. - Always enable the architecture-optimized BLAKE2s code. * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits) crypto: md5 - Implement export_core() and import_core() wireguard: kconfig: simplify crypto kconfig selections lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS lib/crypto: curve25519: Consolidate into single module lib/crypto: curve25519: Move a couple functions out-of-line lib/crypto: tests: Add Curve25519 benchmark lib/crypto: tests: Migrate Curve25519 self-test to KUnit crypto: curve25519 - Remove unused kpp support crypto: testmgr - Remove curve25519 kpp tests crypto: x86/curve25519 - Remove unused kpp support crypto: powerpc/curve25519 - Remove unused kpp support crypto: arm/curve25519 - Remove unused kpp support crypto: hisilicon/hpre - Remove unused curve25519 kpp support lib/crypto: tests: Add KUnit tests for BLAKE2s lib/crypto: blake2s: Consolidate into single C translation unit lib/crypto: blake2s: Move generic code into blake2s.c lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code lib/crypto: blake2s: Remove obsolete self-test lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2 lib/crypto: chacha: Consolidate into single module ...
2025-09-29Merge tag 'crc-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Update crc_kunit to test the CRC functions in softirq and hardirq contexts, similar to what the lib/crypto/ KUnit tests do. Move the helper function needed to do this into a common header. This is useful mainly to test fallback code paths for when FPU/SIMD/vector registers are unusable" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: Documentation/staging: Fix typo and incorrect citation in crc32.rst lib/crc: Drop inline from all *_mod_init_arch() functions lib/crc: Use underlying functions instead of crypto_simd_usable() lib/crc: crc_kunit: Test CRC computation in interrupt contexts kunit, lib/crypto: Move run_irq_test() to common header
2025-09-29Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds
Pull fscrypt updates from Eric Biggers: "Make fs/crypto/ use the HMAC-SHA512 library functions instead of crypto_shash. This is simpler, faster, and more reliable" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: use HMAC-SHA512 library for HKDF fscrypt: Remove redundant __GFP_NOWARN
2025-09-29Merge tag 'dlm-6.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "This adds a dlm_release_lockspace() flag to request that node-failure recovery be performed for the node leaving the lockspace. The implementation of this flag requires coordination with userland clustering components. It's been requested for use by GFS2" * tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: check for undefined release_option values dlm: handle release_option as unsigned dlm: move to rinfo for all middle conversion cases dlm: handle invalid lockspace member remove dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_release dlm: add new configfs entry release_recover for lockspace members dlm: add new RELEASE_RECOVER uevent attribute for release_lockspace dlm: use defines for force values in dlm_release_lockspace dlm: check for defined force value in dlm_lockspace_release
2025-09-29Merge tag 'erofs-for-6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: - Support FS_IOC_GETFSLABEL ioctl - Stop meaningless read-more policy on fragment extents - Remove a duplicate check for ztailpacking inline * tag 'erofs-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: drop redundant sanity check for ztailpacking inline erofs: Add support for FS_IOC_GETFSLABEL erofs: avoid reading more for fragment maps
2025-09-29Merge tag 'hfs-v6.18-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs Pull hfs updates from Viacheslav Dubeyko: "This contains several fixes of syzbot reported issues, HFS/HFS+ fixes of xfstests failures, and rework of HFS/HFS+ debug output subsystem. - Kang Chen fixed a slab-out-of-bounds issue in hfsplus_uni2asc() when hfsplus_uni2asc() is called from hfsplus_listxattr(). - Yang Chenzhi fixed a crash in hfsplus_bmap_alloc() if record offset or length is larger than node_size. - Yangtao Li corrected the error code from hfsplus_fill_super() if Catalog File contains corrupted record for the case of hidden directory's type. - KMSAN uninit-value fixes: hfs_find_set_zero_bits() and __hfsplus_ext_cache_extent() use kzalloc() instead of kmalloc(), and in hfsplus_delete_cat() by proper initialization of struct hfsplus_inode_info in the hfsplus_iget() logic. - A slab-out-of-bounds issue could happen in hfsplus_strcasecmp() if the length field of struct hfsplus_unistr is bigger than HFSPLUS_MAX_STRLEN. Fixed by checking the length of comparing strings, and if the strings' length is bigger than HFSPLUS_MAX_STRLEN, then the length is corrected to this value. - The generic/736 xfstest failed for HFS because the HFS volume becomes corrupted after the test run. The main reason was the absence of logic that corrects mdb->drNxtCNID/HFS_SB(sb)->next_id (next unused CNID) after deleting a record in Catalog File. That was fixed by implementing the necessary logic in hfs_correct_next_unused_CNID()" * tag 'hfs-v6.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs: hfs/hfsplus: rework debug output subsystem hfsplus: fix slab-out-of-bounds read in hfsplus_strcasecmp() hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc() hfs: clear offset and space out of valid records in b-tree node hfs: add logic of correcting a next unused CNID hfsplus: fix KMSAN uninit-value issue in hfsplus_delete_cat() hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits() hfs: make proper initalization of struct hfs_find_data hfsplus: fix KMSAN uninit-value issue in __hfsplus_ext_cache_extent() hfs: validate record offset in hfsplus_bmap_alloc hfsplus: return EIO when type of hidden directory mismatch in hfsplus_fill_super() MAINTAINERS: update location of hfs&hfsplus trees
2025-09-29Merge tag 'v6.18-rc-part1-smb3-common' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb restructuring updates from Steve French: "Large set of small restructuring smbdirect related patches for cifs.ko and ksmbd.ko. This is the next step in order to use common structures for smbdirect handling across both modules. And also includes improved handling of broken connections, as well as fixed negotiation as rdma resources. Moving to common functions is planned for 6.19, as well as also providing smbdirect via sockets to userspace (e.g. for samba to also be able to use smbdirect for userspace server and userspace client tools). This was heavily reviewed and tested at the recent SMB3.1.1 test event at SDC" * tag 'v6.18-rc-part1-smb3-common' of git://git.samba.org/ksmbd: (159 commits) smb: server: let smb_direct_flush_send_list() invalidate a remote key first smb: server: make use of ib_alloc_cq_any() instead of ib_alloc_cq() smb: server: make consitent use of spin_lock_irq{save,restore}() in transport_rdma.c smb: server: let {free_transport,smb_direct_disconnect_rdma_{work,connection}}() wake up all wait queues smb: server: let smb_direct_disconnect_rdma_connection() disable all work but disconnect_work smb: server: fill in smbdirect_socket.first_error on error smb: server: let smb_direct_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR... smb: server: pass struct smbdirect_socket to smb_direct_send_negotiate_response() smb: server: pass struct smbdirect_socket to {enqueue,get_first}_reassembly() smb: server: pass struct smbdirect_socket to smb_direct_post_send_data() smb: server: pass struct smbdirect_socket to post_sendmsg() smb: server: pass struct smbdirect_socket to smb_direct_create_header() smb: server: pass struct smbdirect_socket to manage_keep_alive_before_sending() smb: server: pass struct smbdirect_socket to manage_credits_prior_sending() smb: server: pass struct smbdirect_socket to calc_rw_credits() smb: server: pass struct smbdirect_socket to wait_for_rw_credits() smb: server: pass struct smbdirect_socket to wait_for_send_credits() smb: server: pass struct smbdirect_socket to wait_for_credits() smb: server: pass struct smbdirect_socket to smb_direct_flush_send_list() smb: server: pass struct smbdirect_socket to smb_direct_post_send() ...
2025-09-29Merge tag 'xfs-merge-6.18' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Carlos Maiolino: "For this merge window, there are really no new features, but there are a few things worth to emphasize: - Deprecated for years already, the (no)attr2 and (no)ikeep mount options have been removed for good - Several cleanups (specially from typedefs) and bug fixes - Improvements made in the online repair reap calculations - online fsck is now enabled by default" * tag 'xfs-merge-6.18' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits) xfs: rework datasync tracking and execution xfs: rearrange code in xfs_inode_item_precommit xfs: scrub: use kstrdup_const() for metapath scan setups xfs: use bt_nr_sectors in xfs_dax_translate_range xfs: track the number of blocks in each buftarg xfs: constify xfs_errortag_random_default xfs: improve default maximum number of open zones xfs: improve zone statistics message xfs: centralize error tag definitions xfs: remove pointless externs in xfs_error.h xfs: remove the expr argument to XFS_TEST_ERROR xfs: remove xfs_errortag_set xfs: remove xfs_errortag_get xfs: move the XLOG_REG_ constants out of xfs_log_format.h xfs: adjust the hint based zone allocation policy xfs: refactor hint based zone allocation fs: add an enum for number of life time hints xfs: fix log CRC mismatches between i386 and other architectures xfs: rename the old_crc variable in xlog_recover_process xfs: remove the unused xfs_log_iovec_t typedef ...
2025-09-29Merge tag 'gfs2-for-6.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Partially revert "gfs2: do_xmote fixes" to ignore dlm_lock() errors during withdraw; passing on those errors doesn't help - Change the LM_FLAG_TRY and LM_FLAG_TRY_1CB logic in add_to_queue() to check if the holder would actually block - Move some more dlm specific code from glock.c to lock_dlm.c - Remove the unused dlm alternate locking mode code - Add proper locking to make sure that dlm lockspaces are never used after being released - Various other cleanups * tag 'gfs2-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix unlikely race in gdlm_put_lock gfs2: Add proper lockspace locking gfs2: Minor run_queue fixes gfs2: run_queue cleanup gfs2: Simplify do_promote gfs2: Get rid of GLF_INVALIDATE_IN_PROGRESS gfs2: Fix GLF_INVALIDATE_IN_PROGRESS flag clearing in do_xmote gfs2: Remove duplicate check in do_xmote gfs2: Fix LM_FLAG_TRY* logic in add_to_queue gfs2: Remove DLM_LKF_ALTCW / DLM_LKF_ALTPR code gfs2: Further sanitize lock_dlm.c gfs2: Do not use atomic operations unnecessarily gfs2: Sanitize gfs2_meta_check, gfs2_metatype_check, gfs2_io_error gfs2: Turn gfs2_withdraw into a void function gfs2: Partially revert "gfs2: do_xmote fixes" gfs2: Simplify refcounting in do_xmote gfs2: do_xmote cleanup gfs2: Remove space before newline gfs2: Remove unused sd_withdraw_wait field gfs2: Remove unused GIF_FREE_VFS_INODE flag
2025-09-29Remove bcachefs core codeLinus Torvalds
bcachefs was marked 'externally maintained' in 6.17 but the code remained to make the transition smoother. It's now a DKMS module, making the in-kernel code stale, so remove it to avoid any version confusion. Link: https://lore.kernel.org/linux-bcachefs/yokpt2d2g2lluyomtqrdvmkl3amv3kgnipmenobkpgx537kay7@xgcgjviv3n7x/T/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29Unbreak 'make tools/*' for user-space targetsLinus Torvalds
This pattern isn't very documented, and apparently not used much outside of 'make tools/help', but it has existed for over a decade (since commit ea01fa9f63ae: "tools: Connect to the kernel build system"). However, it doesn't work very well for most cases, particularly the useful "tools/all" target, because it overrides the LDFLAGS value with an empty one. And once overridden, 'make' will then not honor the tooling makefiles trying to change it - which then makes any LDFLAGS use in the tooling directory break, typically causing odd link errors. Remove that LDFLAGS override, since it seems to be entirely historical. The core kernel makefiles no longer modify LDFLAGS as part of the build, and use kernel-specific link flags instead (eg 'KBUILD_LDFLAGS' and friends). This allows more of the 'make tools/*' cases to work. I say 'more', because some of the tooling build rules make various other assumptions or have other issues, so it's still a bit hit-or-miss. But those issues tend to show up with the 'make -C tools xyz' pattern too, so now it's no longer an issue of this particular 'tools/*' build rule being special. Acked-by: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29docs: networking: phy: clarify abbreviation "PAL"Markus Heidelberg
It is suddenly used in the text without introduction, so the meaning might have been unclear to readers. Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250926131520.222346-1-m.heidelberg@cab.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: ethtool: remove duplicated mm.o from MakefileMarkus Heidelberg
Fixes: 2b30f8291a30 ("net: ethtool: add support for MAC Merge layer") Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250926131323.222192-1-m.heidelberg@cab.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'vfs-6.18-rc1.async' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async directory updates from Christian Brauner: "This contains further preparatory changes for the asynchronous directory locking scheme: - Add lookup_one_positive_killable() which allows overlayfs to perform lookup that won't block on a fatal signal - Unify the mount idmap handling in struct renamedata as a rename can only happen within a single mount - Introduce kern_path_parent() for audit which sets the path to the parent and returns a dentry for the target without holding any locks on return - Rename kern_path_locked() as it is only used to prepare for the removal of an object from the filesystem: kern_path_locked() => start_removing_path() kern_path_create() => start_creating_path() user_path_create() => start_creating_user_path() user_path_locked_at() => start_removing_user_path_at() done_path_create() => end_creating_path() NA => end_removing_path()" * tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: debugfs: rename start_creating() to debugfs_start_creating() VFS: rename kern_path_locked() and related functions. VFS/audit: introduce kern_path_parent() for audit VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata VFS: discard err2 in filename_create() VFS/ovl: add lookup_one_positive_killable()
2025-09-29Merge tag 'vfs-6.18-rc1.writeback' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs writeback updates from Christian Brauner: "This contains work adressing lockups reported by users when a systemd unit reading lots of files from a filesystem mounted with the lazytime mount option exits. With the lazytime mount option enabled we can be switching many dirty inodes on cgroup exit to the parent cgroup. The numbers observed in practice when systemd slice of a large cron job exits can easily reach hundreds of thousands or millions. The logic in inode_do_switch_wbs() which sorts the inode into appropriate place in b_dirty list of the target wb however has linear complexity in the number of dirty inodes thus overall time complexity of switching all the inodes is quadratic leading to workers being pegged for hours consuming 100% of the CPU and switching inodes to the parent wb. Simple reproducer of the issue: FILES=10000 # Filesystem mounted with lazytime mount option MNT=/mnt/ echo "Creating files and switching timestamps" for (( j = 0; j < 50; j ++ )); do mkdir $MNT/dir$j for (( i = 0; i < $FILES; i++ )); do echo "foo" >$MNT/dir$j/file$i done touch -a -t 202501010000 $MNT/dir$j/file* done wait echo "Syncing and flushing" sync echo 3 >/proc/sys/vm/drop_caches echo "Reading all files from a cgroup" mkdir /sys/fs/cgroup/unified/mycg1 || exit echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit for (( j = 0; j < 50; j ++ )); do cat /mnt/dir$j/file* >/dev/null & done wait echo "Switching wbs" # Now rmdir the cgroup after the script exits This can be solved by: - Avoiding contention on the wb->list_lock when switching inodes by running a single work item per wb and managing a queue of items switching to the wb - Allowing rescheduling when switching inodes over to a different cgroup to avoid softlockups - Maintaining the b_dirty list ordering instead of sorting it" * tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: writeback: Add tracepoint to track pending inode switches writeback: Avoid excessively long inode switching times writeback: Avoid softlockup when switching many inodes writeback: Avoid contention on wb->list_lock when switching inodes
2025-09-29Merge tag 'namespace-6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains a larger set of changes around the generic namespace infrastructure of the kernel. Each specific namespace type (net, cgroup, mnt, ...) embedds a struct ns_common which carries the reference count of the namespace and so on. We open-coded and cargo-culted so many quirks for each namespace type that it just wasn't scalable anymore. So given there's a bunch of new changes coming in that area I've started cleaning all of this up. The core change is to make it possible to correctly initialize every namespace uniformly and derive the correct initialization settings from the type of the namespace such as namespace operations, namespace type and so on. This leaves the new ns_common_init() function with a single parameter which is the specific namespace type which derives the correct parameters statically. This also means the compiler will yell as soon as someone does something remotely fishy. The ns_common_init() addition also allows us to remove ns_alloc_inum() and drops any special-casing of the initial network namespace in the network namespace initialization code that Linus complained about. Another part is reworking the reference counting. The reference counting was open-coded and copy-pasted for each namespace type even though they all followed the same rules. This also removes all open accesses to the reference count and makes it private and only uses a very small set of dedicated helpers to manipulate them just like we do for e.g., files. In addition this generalizes the mount namespace iteration infrastructure introduced a few cycles ago. As reminder, the vfs makes it possible to iterate sequentially and bidirectionally through all mount namespaces on the system or all mount namespaces that the caller holds privilege over. This allow userspace to iterate over all mounts in all mount namespaces using the listmount() and statmount() system call. Each mount namespace has a unique identifier for the lifetime of the systems that is exposed to userspace. The network namespace also has a unique identifier working exactly the same way. This extends the concept to all other namespace types. The new nstree type makes it possible to lookup namespaces purely by their identifier and to walk the namespace list sequentially and bidirectionally for all namespace types, allowing userspace to iterate through all namespaces. Looking up namespaces in the namespace tree works completely locklessly. This also means we can move the mount namespace onto the generic infrastructure and remove a bunch of code and members from struct mnt_namespace itself. There's a bunch of stuff coming on top of this in the future but for now this uses the generic namespace tree to extend a concept introduced first for pidfs a few cycles ago. For a while now we have supported pidfs file handles for pidfds. This has proven to be very useful. This extends the concept to cover namespaces as well. It is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() apis. As with pidfs file handles, namespace file handles are exhaustive, meaning it is not required to actually hold a reference to nsfs in able to decode aka open_by_handle_at() a namespace file handle. Instead the FD_NSFS_ROOT constant can be passed which will let the kernel grab a reference to the root of nsfs internally and thus decode the file handle. Namespaces file descriptors can already be derived from pidfds which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already. It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them trivially. Permission checking is kept simple. If the caller is located in the namespace the file handle refers to they are able to open it otherwise they must hold privilege over the owning namespace of the relevant namespace. The namespace file handle layout is exposed as uapi and has a stable and extensible format. For now it simply contains the namespace identifier, the namespace type, and the inode number. The stable format means that userspace may construct its own namespace file handles without going through name_to_handle_at() as they are already allowed for pidfs and cgroup file handles" * tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits) ns: drop assert ns: move ns type into struct ns_common nstree: make struct ns_tree private ns: add ns_debug() ns: simplify ns_common_init() further cgroup: add missing ns_common include ns: use inode initializer for initial namespaces selftests/namespaces: verify initial namespace inode numbers ns: rename to __ns_ref nsfs: port to ns_ref_*() helpers net: port to ns_ref_*() helpers uts: port to ns_ref_*() helpers ipv4: use check_net() net: use check_net() net-sysfs: use check_net() user: port to ns_ref_*() helpers time: port to ns_ref_*() helpers pid: port to ns_ref_*() helpers ipc: port to ns_ref_*() helpers cgroup: port to ns_ref_*() helpers ...
2025-09-29Merge tag 'vfs-6.18-rc1.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull afs updates from Christian Brauner: "This contains the change to enable afs to support RENAME_NOREPLACE and RENAME_EXCHANGE if the server supports it" * tag 'vfs-6.18-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Add support for RENAME_NOREPLACE and RENAME_EXCHANGE
2025-09-29Merge tag 'kernel-6.18-rc1.clone3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull copy_process updates from Christian Brauner: "This contains the changes to enable support for clone3() on nios2 which apparently is still a thing. The more exciting part of this is that it cleans up the inconsistency in how the 64-bit flag argument is passed from copy_process() into the various other copy_*() helpers" [ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ] * tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nios2: implement architecture-specific portion of sys_clone3 arch: copy_thread: pass clone_flags as u64 copy_process: pass clone_flags as u64 across calltree copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
2025-09-29Merge tag 'vfs-6.18-rc1.workqueue' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs workqueue updates from Christian Brauner: "This contains various workqueue changes affecting the filesystem layer. Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This replaces the use of system_wq and system_unbound_wq. system_wq is a per-CPU workqueue which isn't very obvious from the name and system_unbound_wq is to be used when locality is not required. So this renames system_wq to system_percpu_wq, and system_unbound_wq to system_dfl_wq. This also adds a new WQ_PERCPU flag to allow the fs subsystem users to explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and WQ_PERCPU flags coexist for one release cycle to allow callers to transition their calls. WQ_UNBOUND will be removed in a next release cycle" * tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: WQ_PERCPU added to alloc_workqueue users fs: replace use of system_wq with system_percpu_wq fs: replace use of system_unbound_wq with system_dfl_wq
2025-09-29Merge tag 'vfs-6.18-rc1.rust' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs rust updates from Christian Brauner: "This contains a few minor vfs rust changes: - Add the pid namespace Rust wrappers to the correct MAINTAINERS entry - Use to_result() in the Rust file error handling code - Update imports for fs and pid_namespce Rust wrappers" * tag 'vfs-6.18-rc1.rust' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: rust: file: use to_result for error handling pid: add Rust files to MAINTAINERS rust: fs: update ARef and AlwaysRefCounted imports from sync::aref rust: pid_namespace: update AlwaysRefCounted imports from sync::aref
2025-09-29Merge tag 'vfs-6.18-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: "This just contains a few changes to pid_nr_ns() to make it more robust and cleans up or improves a few users that ab- or misuse it currently" * tag 'vfs-6.18-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pid: change task_state() to use task_ppid_nr_ns() pid: change bacct_add_tsk() to use task_ppid_nr_ns() pid: make __task_pid_nr_ns(ns => NULL) safe for zombie callers pid: Add a judgment for ns null in pid_nr_ns
2025-09-29Merge tag 'vfs-6.18-rc1.iomap' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: "This contains minor fixes and cleanups to the iomap code. Nothing really stands out here" * tag 'vfs-6.18-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: error out on file IO when there is no inline_data buffer iomap: trace iomap_zero_iter zeroing activities
2025-09-29Merge tag 'vfs-6.18-rc1.inode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "This contains a series I originally wrote and that Eric brought over the finish line. It moves out the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesytems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations. This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years. On top of this are a various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward: - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers - Make the i_state flags an enum - Rework the iput() logic Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper - Expand dump_inode() to dump more information about an inode helping in debugging - Add some might_sleep() annotations to iput() and associated helpers" * tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add might_sleep() annotation to iput() and more fs: expand dump_inode() inode: fix whitespace issues fs: add an icount_read helper fs: rework iput logic fs: make the i_state flags an enum fs: stop accessing ->i_count directly in f2fs and gfs2 fsverity: check IS_VERITY() in fsverity_cleanup_inode() fs: remove inode::i_verity_info btrfs: move verity info pointer to fs-specific part of inode f2fs: move verity info pointer to fs-specific part of inode ext4: move verity info pointer to fs-specific part of inode fsverity: add support for info in fs-specific part of inode fs: remove inode::i_crypt_info ceph: move crypt info pointer to fs-specific part of inode ubifs: move crypt info pointer to fs-specific part of inode f2fs: move crypt info pointer to fs-specific part of inode ext4: move crypt info pointer to fs-specific part of inode fscrypt: add support for info in fs-specific part of inode fscrypt: replace raw loads of info pointer with helper function
2025-09-29Merge tag 'vfs-6.18-rc1.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: "This contains some work around mount api handling: - Output the warning message for mnt_too_revealing() triggered during fsmount() to the fscontext log. This makes it possible for the mount tool to output appropriate warnings on the command line. For example, with the newest fsopen()-based mount(8) from util-linux, the error messages now look like: # mount -t proc proc /tmp mount: /tmp: fsmount() failed: VFS: Mount too revealing. dmesg(1) may have more information after failed mount system call. - Do not consume fscontext log entries when returning -EMSGSIZE Userspace generally expects APIs that return -EMSGSIZE to allow for them to adjust their buffer size and retry the operation. However, the fscontext log would previously clear the message even in the -EMSGSIZE case. Given that it is very cheap for us to check whether the buffer is too small before we remove the message from the ring buffer, let's just do that instead. - Drop an unused argument from do_remount()" * tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: fs/namespace.c: remove ms_flags argument from do_remount selftests/filesystems: add basic fscontext log tests fscontext: do not consume log entries when returning -EMSGSIZE vfs: output mount_too_revealing() errors to fscontext docs/vfs: Remove mentions to the old mount API helpers fscontext: add custom-prefix log helpers fs: Remove mount_bdev fs: Remove mount_nodev
2025-09-29mount: handle NULL values in mnt_ns_release()Christian Brauner
When calling in listmount() mnt_ns_release() may be passed a NULL pointer. Handle that case gracefully. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29Merge tag 'vfs-6.18-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add "initramfs_options" parameter to set initramfs mount options. This allows to add specific mount options to the rootfs to e.g., limit the memory size - Add RWF_NOSIGNAL flag for pwritev2() Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets - Allow to pass pid namespace as procfs mount option Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed) This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include: * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs) But this requires forking off a helper process because you cannot just one-shot this using mount(2) * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process) While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate Things would be much easier if there was a way for userspace to just specify the pidns they want. So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2): fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd); fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0); or classic mount(2) / mount(8): // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid"); Cleanups: - Remove the last references to EXPORT_OP_ASYNC_LOCK - Make file_remove_privs_flags() static - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used - Use try_cmpxchg() in start_dir_add() - Use try_cmpxchg() in sb_init_done_wq() - Replace offsetof() with struct_size() in ioctl_file_dedupe_range() - Remove vfs_ioctl() export - Replace rwlock() with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels - Make ns_entries in fs/proc/namespaces const - Use a switch() statement() in init_special_inode() just like we do in may_open() - Use struct_size() in dir_add() in the initramfs code - Use str_plural() in rd_load_image() - Replace strcpy() with strscpy() in find_link() - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop() - Remove unused arguments from fcntl_{g,s}et_rw_hint() Fixes: - Document @name parameter for name_contains_dotdot() helper - Fix spelling mistake - Always return zero from replace_fd() instead of the file descriptor number - Limit the size for copy_file_range() in compat mode to prevent a signed overflow - Fix debugfs mount options not being applied - Verify the inode mode when loading it from disk in minixfs - Verify the inode mode when loading it from disk in cramfs - Don't trigger automounts with RESOLVE_NO_XDEV If openat2() was called with RESOLVE_NO_XDEV it didn't traverse through automounts, but could still trigger them - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints - Fix unused variable warning in rd_load_image() on s390 - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD - Use ns_capable_noaudit() when determining net sysctl permissions - Don't call path_put() under namespace semaphore in listmount() and statmount()" * tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) fcntl: trim arguments listmount: don't call path_put() under namespace semaphore statmount: don't call path_put() under namespace semaphore pid: use ns_capable_noaudit() when determining net sysctl permissions fs: rename generic_delete_inode() and generic_drop_inode() init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD initramfs: Replace strcpy() with strscpy() in find_link() initrd: Use str_plural() in rd_load_image() initramfs: Use struct_size() helper to improve dir_add() initrd: Fix unused variable warning in rd_load_image() on s390 fs: use the switch statement in init_special_inode() fs/proc/namespaces: make ns_entries const filelock: add FL_RECLAIM to show_fl_flags() macro eventpoll: Replace rwlock with spinlock selftests/proc: add tests for new pidns APIs procfs: add "pidns" mount option pidns: move is-ancestor logic to helper openat2: don't trigger automounts with RESOLVE_NO_XDEV namei: move cross-device check to __traverse_mounts namei: remove LOOKUP_NO_XDEV check from handle_mounts ...
2025-09-29Fix CC_HAS_ASM_GOTO_OUTPUT on non-x86 architecturesLinus Torvalds
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with a working compiler it will fail on some architectures simply because it uses the mnemonic "jmp" for testing the inline asm. And as reported by Geert, not all architectures use that mnemonic, so the test fails spuriously on such platforms (including arm and riscv, but also several other architectures). This issue avoided any obvious test failures because the build still works thanks to falling back on the old non-asm-goto code, which just generates worse code. Just use an empty asm statement instead. Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Fixes: e2ffa15b9baa ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17") Cc: Nathan Chancellor <nathan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29Merge branches 'acpi-apei', 'acpi-misc' and 'pnp'Rafael J. Wysocki
Merge ACPI APEI updates, a miscellaneous update related to ACPI, and a PNP update for 6.18-rc1: - Remove redundant assignments in erst_dbg_{ioctl|write}() in the ACPI APEI driver (Thorsten Blum) - Allow the ACPI APEI EINJ to handle more types of addresses than just MMIO (Jiaqi Yan) - Use str_low_high() helper in two places in the ACPI code (Chelsy Ratnawat) - Use str_plural() to simplify the PNP code (Xichao Zhao) * acpi-apei: ACPI: APEI: EINJ: Allow more types of addresses except MMIO ACPI: APEI: Remove redundant assignments in erst_dbg_{ioctl|write}() * acpi-misc: ACPI: Use str_low_high() helper in two places * pnp: PNP: isapnp: use str_plural() to simplify the code
2025-09-29Merge branches 'acpi-thermal', 'acpi-fan', 'acpi-video', 'acpi-tad' and ↵Rafael J. Wysocki
'acpi-prm' Merge an ACPI thermal zone driver update, an ACPI fan driver update, an ACPI backlight (video) driver update, an ACPI TAD (time and alarm device) driver update, and an ACPI PRM (platform runtime mechanism) driver update for 6.18-rc1: - Eliminate a dummy local variable from the ACPI thermal driver (Rafael Wysocki) - Fold two simple functions into their only caller in the ACPI fan driver (Rafael Wysocki) - Force native backlight on Lenovo 82K8 in the ACPI backlight (video) driver (Mario Limonciello) - Add missing sysfs_remove_group() for ACPI_TAD_RT (Daniel Tang) - Skip PRM handlers with NULL handler_address or NULL VA in the ACPI PRM driver (Shang song) * acpi-thermal: ACPI: thermal: Get rid of a dummy local variable * acpi-fan: ACPI: fan: Fold two simple functions into their only caller * acpi-video: ACPI: video: force native for Lenovo 82K8 * acpi-tad: ACPI: TAD: Add missing sysfs_remove_group() for ACPI_TAD_RT * acpi-prm: ACPI: PRM: Skip handlers with NULL handler_address or NULL VA
2025-09-29Merge branches 'acpi-property', 'acpi-resource', 'acpi-pm' and 'acpi-tables'Rafael J. Wysocki
Merge updates of the ACPI device properties management code, ACPI resources management code, ACPI power management, and ACPI data tables parsing code for 6.18-rc1: - Fix ACPI buffer properties extraction for data-only subnodes represented as _DSD-equivalent packages (Rafael Wysocki) - Fix handling of ACPI data-only subnodes represented as _DSD-equivalent packages in the case when they are embedded in larger _DSD-equivalent packages and clean up acpi_nondev_subnode_extract() (Rafael Wysocki) - Skip ACPI IRQ override on ASUS Vivobook Pro N6506CU (Sam van Kampen) - Add power resource init function and use it for introducing an HP EliteBook 855 G7 WWAN modem power resource quirk (Maciej Szmigiero) - Add support for DBG2 RISC-V SBI port subtype and Precise Baud Rate field to the ACPI SPCR table parser (Chen Pei) * acpi-property: ACPI: property: Adjust failure handling in acpi_nondev_subnode_extract() ACPI: property: Do not pass NULL handles to acpi_attach_data() ACPI: property: Add code comments explaining what is going on ACPI: property: Disregard references in data-only subnode lists ACPI: property: Fix buffer properties extraction for subnodes * acpi-resource: ACPI: resource: Skip IRQ override on ASUS Vivobook Pro N6506CU * acpi-pm: ACPI: PM: Add HP EliteBook 855 G7 WWAN modem power resource quirk ACPI: PM: Add power resource init function * acpi-tables: ACPI: SPCR: Support Precise Baud Rate field ACPI: SPCR: Add support for DBG2 RISC-V SBI port subtype
2025-09-29Merge branches 'acpi-scan', 'acpi-processor' and 'acpi-sysfs'Rafael J. Wysocki
Merge an ACPI device enumeration update, ACPI processor driver updates, and an ACPI sysfs-related code update for 6.18-rc1: - Add Intel CVS ACPI HIDs to acpi_ignore_dep_ids[] so it is not regarded as real dependency (Hans de Goede) - Use ACPI_FREE() for freeing an ACPI object in description_show() in the ACPI sysfs-related code (Kaushlendra Kumar) - Fix memory leak in the ACPI processor idle driver registration error code path and optimize ACPI idle driver registration (Huisong Li, Rafael Wysocki) - Add module import namespace to the ACPI processor idle driver (Rafael Wysocki) - Eliminate static variable flat_state_cnt from the ACPI processor idle driver (Rafael Wysocki) - Release cpufreq policy references using __free() in the ACPI processor thremal driver (Zihuan Zhang) - Remove unused empty stubs of some functions and rearrange function declarations in a header file in the ACPI processor driver (Huisong Li) - Redefine two functions as void in the ACPI processor driver (Rafael Wysocki) - Do not expose global variable acpi_idle_driver in the ACPI processor driver (Huisong Li) * acpi-scan: ACPI: scan: Add Intel CVS ACPI HIDs to acpi_ignore_dep_ids[] * acpi-processor: ACPI: processor: Do not expose global variable acpi_idle_driver ACPI: processor: idle: Redefine two functions as void ACPI: processor: Update cpuidle driver check in __acpi_processor_start() ACPI: processor: idle: Rearrange declarations in header file ACPI: processor: Remove unused empty stubs of some functions ACPI: processor: thermal: Release policy references using __free() ACPI: processor: idle: Fix function defined but not used warning ACPI: processor: idle: Eliminate static variable flat_state_cnt ACPI: processor: idle: Add module import namespace ACPI: processor: idle: Optimize ACPI idle driver registration ACPI: processor: idle: Fix memory leak when register cpuidle device failed * acpi-sysfs: ACPI: sysfs: Use ACPI_FREE() for freeing an ACPI object
2025-09-29Merge branch 'acpica'Rafael J. Wysocki
Merge ACPICA updates (20250807 release material with a few fixes on top) for 6.18-rc1: - Add SoundWire File Table (SWFT) signature to ACPICA (Maciej Strozek) - Rearrange local variable definition involving #ifdef in ACPICA to avoid using uninitialized variables (Zhe Qiao) - Allow ACPICA to skip Global Lock initialization (Huacai Chen) - Apply ACPI_NONSTRING in more places in ACPICA and fix two regressions related to incorrect ACPI_NONSTRING usage (Ahmed Salem) - Fix printing CDAT table header when dissasebling CDAT AML (Ahmed Salem) - Use acpi_ds_clear_operands() in acpi_ds_call_control_method() in ACPICA (Hans de Goede) - Update dsmethod.c in ACPICA to address unused variable warning (Saket Dumbre) - Print error messages in ACPICA for too few or too many control method arguments (Saket Dumbre) - Update ACPICA version to 20250807 (Saket Dumbre) - Fix largest possible resource descriptor index in ACPICA (Dmitry Antipov) - Add Back-Invalidate restriction to CXL Window for CEDT in ACPICA (Davidlohr Bueso). - Add the package type to acceptable Arg3 types for _DSM in ACPICA because ACPI_TYPE_ANY does not cover it (Saket Dumbre) - Fix return values in ap_is_valid_checksum() in the acpidump utility in ACPICA (Kaushlendra Kumar) * acpica: ACPICA: acpidump: fix return values in ap_is_valid_checksum() ACPICA: ACPI_TYPE_ANY does not include the package type ACPICA: CEDT: Add Back-Invalidate restriction to CXL Window ACPICA: Fix largest possible resource descriptor index ACPICA: Update version to 20250807 ACPICA: Print error messages for too few or too many arguments ACPICA: Update dsmethod.c to get rid of unused variable warning ACPICA: dispatcher: Use acpi_ds_clear_operands() in acpi_ds_call_control_method() ACPICA: Debugger: drop ACPI_NONSTRING attribute from name_seg ACPICA: acpidump: drop ACPI_NONSTRING attribute from file_name ACPICA: iASL: Fix printing CDAT table header ACPICA: Apply ACPI_NONSTRING ACPICA: Allow to skip Global Lock initialization ACPICA: Change the compilation conditions ACPICA: Remove redundant "#ifdef" definitions ACPICA: Modify variable definition position ACPICA: Add SoundWire File Table (SWFT) signature
2025-09-29Merge branch 'pm-tools'Rafael J. Wysocki
Merge power management utilities updates for 6.18-rc1: - Fix and clean up the x86_energy_perf_policy utility and update its documentation (Len Brown, Kaushlendra Kumar) - Fix incorrect sorting of PMT telemetry in turbostat (Kaushlendra Kumar) - Fix incorrect size in cpuidle_state_disable() and the error return value of cpupower_write_sysfs() in cpupower (Kaushlendra Kumar) * pm-tools: tools/power x86_energy_perf_policy.8: Emphasize preference for SW interfaces tools/power x86_energy_perf_policy: Add make snapshot target tools/power x86_energy_perf_policy: Prefer driver HWP limits tools/power x86_energy_perf_policy: EPB access is only via sysfs tools/power x86_energy_perf_policy: Prepare for MSR/sysfs refactoring tools/power x86_energy_perf_policy: Enhance HWP enable tools/power x86_energy_perf_policy: Enhance HWP enabled check tools/power x86_energy_perf_policy: Fix incorrect fopen mode usage tools/power turbostat: Fix incorrect sorting of PMT telemetry tools/cpupower: Fix incorrect size in cpuidle_state_disable() tools/cpupower: fix error return value in cpupower_write_sysfs()
2025-09-29Merge branches 'pm-core', 'pm-runtime' and 'pm-sleep'Rafael J. Wysocki
Merge changes related to system sleep and runtime PM framework for 6.18-rc1: - Annotate loops walking device links in the power management core code as _srcu and add macros for walking device links to reduce the likelihood of coding mistakes related to them (Rafael Wysocki) - Document time units for *_time functions in the runtime PM API (Brian Norris) - Clear power.must_resume in noirq suspend error path to avoid resuming a dependant device under a suspended parent or supplier (Rafael Wysocki) - Fix GFP mask handling during hybrid suspend and make the amdgpu driver handle hybrid suspend correctly (Mario Limonciello, Rafael Wysocki) - Fix GFP mask handling after aborted hibernation in platform mode and combine exit paths in power_down() to avoid code duplication (Rafael Wysocki) - Use vmalloc_array() and vcalloc() in the hibernation core to avoid open-coded size computations (Qianfeng Rong) - Fix typo in hibernation core code comment (Li Jun) - Call pm_wakeup_clear() in the same place where other functions that do bookkeeping prior to suspend_prepare() are called (Samuel Wu) * pm-core: PM: core: Add two macros for walking device links PM: core: Annotate loops walking device links as _srcu * pm-runtime: PM: runtime: Documentation: ABI: Document time units for *_time * pm-sleep: PM: hibernate: Combine return paths in power_down() PM: hibernate: Restrict GFP mask in power_down() PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage drm/amd: Fix hybrid sleep PM: hibernate: Add pm_hibernation_mode_is_suspend() PM: hibernate: Fix hybrid-sleep PM: sleep: core: Clear power.must_resume in noirq suspend error path PM: sleep: Make pm_wakeup_clear() call more clear PM: hibernate: Fix typo in memory bitmaps description comment PM: hibernate: Use vmalloc_array() and vcalloc() to improve code
2025-09-29Merge branches 'pm-cpuidle' and 'pm-powercap'Rafael J. Wysocki
Merge cpuidle and power capping changes for 6.18-rc1: - Fail cpuidle device registration if there is one already to avoid sysfs-related issues (Rafael Wysocki) - Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf() in cpuidle (Vivek Yadav) - Fix device and OF node leaks at probe in the qcom-spm cpuidle driver and drop unnecessary initialisations from it (Johan Hovold) - Remove unnecessary address-of operators from the intel_idle cpuidle driver (Kaushlendra Kumar) - Rearrange main loop in menu_select() to make the code in that funtion easier to follow (Rafael Wysocki) - Convert values in microseconds to ktime using us_to_ktime() where applicable in the intel_idle power capping driver (Xichao Zhao) * pm-cpuidle: cpuidle: Fail cpuidle device registration if there is one already cpuidle: sysfs: Use sysfs_emit()/sysfs_emit_at() instead of sprintf()/scnprintf() cpuidle: qcom-spm: drop unnecessary initialisations cpuidle: qcom-spm: fix device and OF node leaks at probe intel_idle: Remove unnecessary address-of operators cpuidle: governors: menu: Rearrange main loop in menu_select() * pm-powercap: powercap: idle_inject: use us_to_ktime() where appropriate
2025-09-29Merge branches 'pm-em', 'pm-opp' and 'pm-devfreq'Rafael J. Wysocki
Merge energy model management, OPP (operating performance points) and devfreq updates for 6.18-rc1: - Prevent CPU capacity updates after registering a perf domain from failing on a first CPU that is not present (Christian Loehle) - Add support for the cases in which frequency alone is not sufficient to uniquely identify an OPP (Krishna Chaitanya Chundru) - Use to_result() for OPP error handling in Rust (Onur Özkan) - Add support for LPDDR5 on Rockhip RK3588 SoC to rockchip-dfi devfreq driver (Nicolas Frattaroli) - Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X) are reported as half by adding a cycle multiplier to the DFI driver in rockchip-dfi devfreq-event driver (Nicolas Frattaroli) - Fix missing error pointer dereference check of regulator instance in the mtk-cci devfreq driver probe and remove a redundant condition from an if () statement in that driver (Dan Carpenter, Liao Yuanhong) * pm-em: PM: EM: Fix late boot with holes in CPU topology * pm-opp: OPP: Add support to find OPP for a set of keys rust: opp: use to_result for error handling * pm-devfreq: PM / devfreq: rockchip-dfi: add support for LPDDR5 PM / devfreq: rockchip-dfi: double count on RK3588 PM / devfreq: mtk-cci: avoid redundant conditions PM / devfreq: mtk-cci: Fix potential error pointer dereference in probe()
2025-09-29Merge tag 'linux-cpupower-6.18-rc1' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Merge cpupower utility updates for 6.18-rc1 from Shuah Khan: "Fixes incorrect return vale in cpupower_write_sysfs() error path and passing incorrect size to cpuidle_state_write_file() while writing status to disable file in cpuidle_state_disable()." * tag 'linux-cpupower-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: (1125 commits) tools/cpupower: Fix incorrect size in cpuidle_state_disable() tools/cpupower: fix error return value in cpupower_write_sysfs() Linux 6.17-rc6 MAINTAINERS: Input: Drop melfas-mip4 section USB: core: remove the move buf action MAINTAINERS: Update the DMA Rust entry erofs: fix long xattr name prefix placement Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups" hsr: hold rcu and dev lock for hsr_get_port_ndev hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr hsr: use rtnl lock when iterating over ports wifi: nl80211: completely disable per-link stats for now net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups net: ethtool: fix wrong type used in struct kernel_ethtool_ts_info selftests/bpf: Skip timer cases when bpf_timer is not supported bpf: Reject bpf_timer for PREEMPT_RT libceph: fix invalid accesses to ceph_connection_v1_info PM: hibernate: Restrict GFP mask in hibernation_snapshot() MAINTAINERS: add Phil as netfilter reviewer netfilter: nf_tables: restart set lookup on base_seq change ...
2025-09-29Merge series "slab: Re-entrant kmalloc_nolock()"Vlastimil Babka
From the cover letter [1]: This patch set introduces kmalloc_nolock() which is the next logical step towards any context allocation necessary to remove bpf_mem_alloc and get rid of preallocation requirement in BPF infrastructure. In production BPF maps grew to gigabytes in size. Preallocation wastes memory. Alloc from any context addresses this issue for BPF and other subsystems that are forced to preallocate too. This long task started with introduction of alloc_pages_nolock(), then memcg and objcg were converted to operate from any context including NMI, this set completes the task with kmalloc_nolock() that builds on top of alloc_pages_nolock() and memcg changes. After that BPF subsystem will gradually adopt it everywhere. Link: https://lore.kernel.org/all/20250909010007.1660-1-alexei.starovoitov@gmail.com/ [1]
2025-09-29Merge series "SLUB percpu sheaves"Vlastimil Babka
This series adds an opt-in percpu array-based caching layer to SLUB. It has evolved to a state where kmem caches with sheaves are compatible with all SLUB features (slub_debug, SLUB_TINY, NUMA locality considerations). The plan is therefore that it will be later enabled for all kmem caches and replace the complicated cpu (partial) slabs code. Note the name "sheaf" was invented by Matthew Wilcox so we don't call the arrays magazines like the original Bonwick paper. The per-NUMA-node cache of sheaves is thus called "barn". This caching may seem similar to the arrays we had in SLAB, but there are some important differences: - deals differently with NUMA locality of freed objects, thus there are no per-node "shared" arrays (with possible lock contention) and no "alien" arrays that would need periodical flushing - instead, freeing remote objects (which is rare) bypasses the sheaves - percpu sheaves thus contain only local objects (modulo rare races and local node exhaustion) - NUMA restricted allocations and strict_numa mode is still honoured - improves kfree_rcu() handling by reusing whole sheaves - there is an API for obtaining a preallocated sheaf that can be used for guaranteed and efficient allocations in a restricted context, when the upper bound for needed objects is known but rarely reached - opt-in, not used for every cache (for now) The motivation comes mainly from the ongoing work related to VMA locking scalability and the related maple tree operations. This is why VMA and maple nodes caches are sheaf-enabled in the patchset. A sheaf-enabled cache has the following expected advantages: - Cheaper fast paths. For allocations, instead of local double cmpxchg, thanks to local_trylock() it becomes a preempt_disable() and no atomic operations. Same for freeing, which is otherwise a local double cmpxchg only for short term allocations (so the same slab is still active on the same cpu when freeing the object) and a more costly locked double cmpxchg otherwise. - kfree_rcu() batching and recycling. kfree_rcu() will put objects to a separate percpu sheaf and only submit the whole sheaf to call_rcu() when full. After the grace period, the sheaf can be used for allocations, which is more efficient than freeing and reallocating individual slab objects (even with the batching done by kfree_rcu() implementation itself). In case only some cpus are allowed to handle rcu callbacks, the sheaf can still be made available to other cpus on the same node via the shared barn. The maple_node cache uses kfree_rcu() and thus can benefit from this. Note: this path is currently limited to !PREEMPT_RT - Preallocation support. A prefilled sheaf can be privately borrowed to perform a short term operation that is not allowed to block in the middle and may need to allocate some objects. If an upper bound (worst case) for the number of allocations is known, but only much fewer allocations actually needed on average, borrowing and returning a sheaf is much more efficient then a bulk allocation for the worst case followed by a bulk free of the many unused objects. Maple tree write operations should benefit from this. - Compatibility with slub_debug. When slub_debug is enabled for a cache, we simply don't create the percpu sheaves so that the debugging hooks (at the node partial list slowpaths) are reached as before. The same thing is done for CONFIG_SLUB_TINY. Sheaf preallocation still works by reusing the (ineffective) paths for requests exceeding the cache's sheaf_capacity. This is in line with the existing approach where debugging bypasses the fast paths and SLUB_TINY preferes memory savings over performance. The above is adapted from the cover letter [1], which contains also in-kernel microbenchmark results showing the lower overhead of sheaves. Results from Suren Baghdasaryan [2] using a mmap/munmap microbenchmark also show improvements. Results from Sudarsan Mahendran [3] using will-it-scale show both benefits and regressions, probably due to overall noisiness of those tests. Link: https://lore.kernel.org/all/20250910-slub-percpu-caches-v8-0-ca3099d8352c@suse.cz/ [1] Link: https://lore.kernel.org/all/CAJuCfpEQ%3DRUgcAvRzE5jRrhhFpkm8E2PpBK9e9GhK26ZaJQt%3DQ@mail.gmail.com/ [2] Link: https://lore.kernel.org/all/20250913000935.1021068-1-sudarsanm@google.com/ [3]
2025-09-29slab: Introduce kmalloc_nolock() and kfree_nolock().Alexei Starovoitov
kmalloc_nolock() relies on ability of local_trylock_t to detect the situation when per-cpu kmem_cache is locked. In !PREEMPT_RT local_(try)lock_irqsave(&s->cpu_slab->lock, flags) disables IRQs and marks s->cpu_slab->lock as acquired. local_lock_is_locked(&s->cpu_slab->lock) returns true when slab is in the middle of manipulating per-cpu cache of that specific kmem_cache. kmalloc_nolock() can be called from any context and can re-enter into ___slab_alloc(): kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> kmalloc_nolock() -> ___slab_alloc(cache_B) or kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() -> ___slab_alloc(cache_B) Hence the caller of ___slab_alloc() checks if &s->cpu_slab->lock can be acquired without a deadlock before invoking the function. If that specific per-cpu kmem_cache is busy the kmalloc_nolock() retries in a different kmalloc bucket. The second attempt will likely succeed, since this cpu locked different kmem_cache. Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu rt_spin_lock is locked by current _task_. In this case re-entrance into the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different bucket that is most likely is not locked by the current task. Though it may be locked by a different task it's safe to rt_spin_lock() and sleep on it. Similar to alloc_pages_nolock() the kmalloc_nolock() returns NULL immediately if called from hard irq or NMI in PREEMPT_RT. kfree_nolock() defers freeing to irq_work when local_lock_is_locked() and (in_nmi() or in PREEMPT_RT). SLUB_TINY config doesn't use local_lock_is_locked() and relies on spin_trylock_irqsave(&n->list_lock) to allocate, while kfree_nolock() always defers to irq_work. Note, kfree_nolock() must be called _only_ for objects allocated with kmalloc_nolock(). Debug checks (like kmemleak and kfence) were skipped on allocation, hence obj = kmalloc(); kfree_nolock(obj); will miss kmemleak/kfence book keeping and will cause false positives. large_kmalloc is not supported by either kmalloc_nolock() or kfree_nolock(). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29slab: Reuse first bit for OBJEXTS_ALLOC_FAILAlexei Starovoitov
Since the combination of valid upper bits in slab->obj_exts with OBJEXTS_ALLOC_FAIL bit can never happen, use OBJEXTS_ALLOC_FAIL == (1ull << 0) as a magic sentinel instead of (1ull << 2) to free up bit 2. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29slab: Make slub local_(try)lock more precise for LOCKDEPAlexei Starovoitov
kmalloc_nolock() can be called from any context the ___slab_alloc() can acquire local_trylock_t (which is rt_spin_lock in PREEMPT_RT) and attempt to acquire a different local_trylock_t while in the same task context. The calling sequence might look like: kmalloc() -> tracepoint -> bpf -> kmalloc_nolock() or more precisely: __lock_acquire+0x12ad/0x2590 lock_acquire+0x133/0x2d0 rt_spin_lock+0x6f/0x250 ___slab_alloc+0xb7/0xec0 kmalloc_nolock_noprof+0x15a/0x430 my_debug_callback+0x20e/0x390 [testmod] ___slab_alloc+0x256/0xec0 __kmalloc_cache_noprof+0xd6/0x3b0 Make LOCKDEP understand that local_trylock_t-s protect different kmem_caches. In order to do that add lock_class_key for each kmem_cache and use that key in local_trylock_t. This stack trace is possible on both PREEMPT_RT and !PREEMPT_RT, but teach lockdep about it only for PREEMPT_RT, since in !PREEMPT_RT the ___slab_alloc() code is using local_trylock_irqsave() when lockdep is on. Note, this patch applies this logic to local_lock_t while the next one converts it to local_trylock_t. Both are mapped to rt_spin_lock in PREEMPT_RT. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29mm: Introduce alloc_frozen_pages_nolock()Alexei Starovoitov
Split alloc_pages_nolock() and introduce alloc_frozen_pages_nolock() to be used by alloc_slab_page(). Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock().Alexei Starovoitov
Change alloc_pages_nolock() to default to __GFP_COMP when allocating pages, since upcoming reentrant alloc_slab_page() needs __GFP_COMP. Also allow __GFP_ACCOUNT flag to be specified, since most of BPF infra needs __GFP_ACCOUNT except BPF streams. Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29locking/local_lock: Introduce local_lock_is_locked().Alexei Starovoitov
Introduce local_lock_is_locked() that returns true when given local_lock is locked by current cpu (in !PREEMPT_RT) or by current task (in PREEMPT_RT). The goal is to detect a deadlock by the caller. Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29maple_tree: Convert forking to use the sheaf interfaceLiam R. Howlett
Use the generic interface which should result in less bulk allocations during a forking. A part of this is to abstract the freeing of the sheaf or maple state allocations into its own function so mas_destroy() and the tree duplication code can use the same functionality to return any unused resources. [andriy.shevchenko@linux.intel.com: remove unused mt_alloc_bulk()] Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29maple_tree: Add single node allocation support to maple stateLiam R. Howlett
The fast path through a write will require replacing a single node in the tree. Using a sheaf (32 nodes) is too heavy for the fast path, so special case the node store operation by just allocating one node in the maple state. Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29maple_tree: Prefilled sheaf conversion and testingLiam R. Howlett
Use prefilled sheaves instead of bulk allocations. This should speed up the allocations and the return path of unused allocations. Remove the push and pop of nodes from the maple state as this is now handled by the slab layer with sheaves. Testing has been removed as necessary since the features of the tree have been reduced. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29tools/testing: Add support for prefilled slab sheafsLiam R. Howlett
Add the prefilled sheaf structs to the slab header and the associated functions to the testing/shared/linux.c file. Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-09-29maple_tree: Replace mt_free_one() with kfree()Pedro Falcato
kfree() is a little shorter and works with kmem_cache_alloc'd pointers too. Also lets us remove one more helper. Signed-off-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>