linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message	Author
2025-12-05	Merge tag 'fuse-update-6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add mechanism for cleaning out unused, stale dentries; controlled via a module option (Luis Henriques) - Fix various bugs - Cleanups * tag 'fuse-update-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Uninitialized variable in fuse_epoch_work() fuse: fix io-uring list corruption for terminated non-committed requests fuse: signal that a fuse inode should exhibit local fs behaviors fuse: Always flush the page cache before FOPEN_DIRECT_IO write fuse: Invalidate the page cache after FOPEN_DIRECT_IO write fuse: rename 'namelen' to 'namesize' fuse: use strscpy instead of strcpy fuse: refactor fuse_conn_put() to remove negative logic. fuse: new work queue to invalidate dentries from old epochs fuse: new work queue to periodically invalidate expired dentries dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused() fuse: add WARN_ON and comment for RCU revalidate fuse: Fix whitespace for fuse_uring_args_to_ring() comment fuse: missing copy_finish in fuse-over-io-uring argument copies fuse: fix readahead reclaim deadlock
2025-11-17	get rid of kill_litter_super()	Al Viro
	Not used anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16	primitives for maintaining persistency	Al Viro
	* d_make_persistent(dentry, inode) - bump refcount, mark persistent and make hashed positive. Return value is a borrowed reference to dentry; it can be used until something removes persistency (at the very least, until the parent gets unlocked, but some filesystems may have stronger exclusion). * d_make_discardable() - remove persistency mark and drop reference. d_make_persistent() is similar to a combination of d_instantiate(), dget() and setting flag. The only difference is that unlike d_instantiate() it accepts hashed and unhashed negatives alike. It is always called in strong locking environment (parent held exclusive, or, in some cases, dentry coming from d_alloc_name()); if we ever start using it with parent held only shared and dentry coming from d_alloc_parallel(), we'll need to copy the in-lookup logic from __d_add(). d_make_discardable() is equivalent to a combination of removing flag and dput(); since flag removal requires ->d_lock, there's no point trying to avoid taking that for refcount decrement as fast_dput() does. The slow path of dput() has been taken into a helper and reused in d_make_discardable() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
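The pairing described above can be modelled in user space. This is only a sketch: `struct toy_dentry`, `toy_make_persistent()` and `toy_make_discardable()` are hypothetical stand-ins, there is no locking, and the asserts on double marking are a choice of this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct dentry: only the fields this sketch needs. */
struct toy_dentry {
	int refcount;     /* models the dentry reference count */
	bool persistent;  /* models the persistency mark */
	bool positive;    /* true once an "inode" is attached */
	bool hashed;      /* true once visible to lookups */
};

/*
 * Models d_make_persistent(): bump refcount, mark persistent and make
 * hashed positive. The returned pointer is a borrowed reference; the
 * extra count belongs to the persistency mark, not to the caller.
 */
struct toy_dentry *toy_make_persistent(struct toy_dentry *d)
{
	assert(!d->persistent); /* assumption: marking twice is a bug */
	d->refcount++;
	d->persistent = true;
	d->positive = true;
	d->hashed = true;
	return d;
}

/* Models d_make_discardable(): clear the mark, drop its reference. */
void toy_make_discardable(struct toy_dentry *d)
{
	assert(d->persistent);  /* assumption: only marked dentries get here */
	d->persistent = false;
	d->refcount--;
}
```

The point of the model is that the flag and the extra reference always travel together, so a reader can tell which increment is the intentionally unpaired one.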
2025-11-16	introduce a flag for explicitly marking persistently pinned dentries	Al Viro
	Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). Reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: * get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. * instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. Here we * introduce the new flag * teach shrink_dcache_for_umount() to handle it (i.e. remove and drop refcount on anything that survives to umount with that flag still set) * teach kill_litter_super() that anything with that flag does not need to be unpinned. Next commits will add primitives for maintaining that flag and convert the common helpers to those. 
After that - a long series of per-filesystem patches converting to those primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
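A rough user-space model of the umount-time step: anything that still carries the flag has it stripped and the reference it stood for dropped. `PIN_PERSISTENT`, `struct pinned_dentry` and `pinned_shrink_for_umount()` are hypothetical names for illustration; the real shrink_dcache_for_umount() walks the dentry tree rather than an array.

```c
#include <assert.h>
#include <stddef.h>

#define PIN_PERSISTENT 0x1u /* hypothetical bit standing in for DCACHE_PERSISTENT */

struct pinned_dentry {
	unsigned int flags;
	int refcount;
};

/*
 * Strip the persistency flag from every entry that still has it and drop
 * the +1 that the flag was responsible for. Returns how many entries
 * were unpinned this way.
 */
size_t pinned_shrink_for_umount(struct pinned_dentry *v, size_t n)
{
	size_t stripped = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (v[i].flags & PIN_PERSISTENT) {
			v[i].flags &= ~PIN_PERSISTENT;
			v[i].refcount--;
			stripped++;
		}
	}
	return stripped;
}
```

Entries whose flag was already cleared (for example after an unlink) are left alone, which is exactly the decision that was hard to make when the pin was an anonymous refcount increment.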
2025-11-12	dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()	Luis Henriques
	Add and export a new helper d_dispose_if_unused() which is simply a wrapper around to_shrink_list(), to add an entry to a dispose list if it's not used anymore. Also export shrink_dentry_list() to kill all dentries in a dispose list. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-09-15	make it easier to catch those who try to modify ->d_name	Al Viro
	Turn d_name into an anon union of const struct qstr d_name with struct qstr __d_name. Very few places need to modify it (all in fs/dcache.c); those are switched to use of ->__d_name. Note that ->d_name can actually change under you unless you have the right locking environment; this const just prohibits accidentally doing stores without being easily spotted. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
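The anonymous-union trick can be tried outside the kernel. In this sketch `struct qstr_model` is a simplified stand-in for struct qstr, and `d_name_rw` plays the role of `__d_name` (renamed here only because double-underscore identifiers are reserved in user space).

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct qstr (the kernel one also carries a hash). */
struct qstr_model {
	unsigned int len;
	const char *name;
};

struct dentry_model {
	union {
		const struct qstr_model d_name; /* what ordinary code reads */
		struct qstr_model d_name_rw;    /* for the few legitimate writers */
	};
};

/* The only place in this sketch that is allowed to change the name. */
void model_set_name(struct dentry_model *d, const char *s)
{
	assert(s != NULL);
	d->d_name_rw.name = s;
	d->d_name_rw.len = (unsigned int)strlen(s);
	/*
	 * d->d_name.len = 0;
	 * would not compile: d_name is const-qualified, so an accidental
	 * store through it is caught at build time.
	 */
}
```

Readers keep using `d_name` unchanged; only writers have to spell out the second member, which makes them easy to find with grep.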
2025-06-11	make d_set_d_op() static	Al Viro
	Convert the last user (d_alloc_pseudo()) and be done with that. Any out-of-tree filesystem using it should switch to d_splice_alias_ops() or, better yet, check whether it really needs to have ->d_op vary among its dentries. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10	new helper: set_default_d_op()	Al Viro
	... to be used instead of manually assigning to ->s_d_op. All in-tree filesystem converted (and field itself is renamed, so any out-of-tree ones in need of conversion will be caught by compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-10	new helper: d_splice_alias_ops()	Al Viro
	Uses of d_set_d_op() on live dentry can be very dangerous; it is going to be withdrawn and replaced with saner things. The best way for a filesystem is to have the default dentry_operations set at mount time and be done with that - __d_alloc() will use that. Currently there are two cases when d_set_d_op() is used on a live dentry - one is procfs, which has several genuinely different dentry_operations instances (different ->d_revalidate(), etc.) and another is simple_lookup(), where we would be better off without overriding ->d_op. For procfs we have d_set_d_op() calls followed by d_splice_alias(); provide a new helper (d_splice_alias_ops(inode, dentry, d_ops)) that would combine those two, and do the d_set_d_op() part while under ->d_lock. That eliminates one of the places where ->d_flags had been modified without holding ->d_lock; current behaviour is not racy, but the reasons for that are far too brittle. Better move to uniform locking rules and simpler proof of correctness... The next commit will convert procfs to use of that helper; it is not exported and won't be until somebody comes up with convincing modular user for it. Again, the best approach is to have default ->d_op and let __d_alloc() do the right thing; filesystem _may_ need non-uniform ->d_op (procfs does), but there'd better be good reasons for that. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-26	Merge tag 'vfs-6.16-rc1.async.dir' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rather a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles *do* have a vfsmount but don't use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefiles to use the lookup_one family of functions and to explicitly pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup functions to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions
2025-04-11	dcache: convert dentry flag macros to enum	Omar Sandoval
	Commit 9748cb2dc393 ("VFS: repack DENTRY_ flags.") changed the value of DCACHE_MOUNTED, which broke drgn's path_lookup() helper. drgn is forced to hard-code it because it's a macro, and macros aren't preserved in debugging information by default. Enums, on the other hand, are included in debugging information. Convert the DCACHE_* flag macros to an enum so that debugging tools like drgn and bpftrace can make use of them. Link: https://github.com/osandov/drgn/blob/2027d0fea84d74b835e77392f7040c2a333180c6/drgn/helpers/linux/fs.py#L43-L46 Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/177665a082f048cf536b9cd6af467b3be6b6e6ed.1744141838.git.osandov@fb.com Signed-off-by: Christian Brauner <brauner@kernel.org>
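As a small illustration of the macro-to-enum conversion: the flag names and bit positions below are hypothetical, not the kernel's. Enumerators are used in expressions exactly like the old macros, but unlike macros they are recorded in debugging information by default.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BIT(n) (1u << (n))

/* Hypothetical flags; names and bit positions are for illustration only. */
enum toy_dentry_flags {
	TOY_OP_HASH = TOY_BIT(0),
	TOY_MOUNTED = TOY_BIT(16),
};

/* A debugger can resolve TOY_MOUNTED by name instead of hard-coding 0x10000. */
static_assert(TOY_MOUNTED == 0x10000, "bit 16 expected");

/* Call sites look the same as with #define-style flags. */
bool toy_is_mounted(unsigned int flags)
{
	return (flags & TOY_MOUNTED) != 0;
}
```

If the bit positions are repacked later, tools that read the enum from debug info pick up the new values without a source change.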
2025-04-08	Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS	NeilBrown
	try_lookup_noperm() and d_hash_and_lookup() are nearly identical. The former does some validation of the name where the latter doesn't. Outside of the VFS that validation is likely valuable, and having only one exported function for this task is certainly a good idea. So make d_hash_and_lookup() local to VFS files and change all other callers to try_lookup_noperm(). Note that the arguments are swapped. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-07	VFS: improve interface for lookup_one functions	NeilBrown
	The family of functions: lookup_one() lookup_one_unlocked() lookup_one_positive_unlocked() appear designed to be used by external clients of the filesystem rather than by filesystems acting on themselves as the lookup_one_len family are used. They are used by: btrfs/ioctl - which is a user-space interface rather than an internal activity exportfs - i.e. from nfsd or the open_by_handle_at interface overlayfs - to access the underlying filesystems smb/server - for file service They should be used by nfsd (more than just the exportfs path) and cachefs but aren't. It would help if the documentation didn't claim they should "not be called by generic code". Also the path component name is passed as "name" and "len" which are (confusingly?) separated by the "base". In some cases the len is simply "strlen" and so passing a qstr using QSTR() would make the calling clearer. Other callers do pass separate name and len which are stored in a struct. Sometimes these are already stored in a qstr, other times it easily could be. So this patch changes these three functions to receive a 'struct qstr *', and improves the documentation. QSTR_LEN() is added to make it easy to pass a QSTR containing a known len. [brauner@kernel.org: take a struct qstr pointer] Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-26	Merge tag 'sysctl-6.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move vm_table members out of kernel/sysctl.c All vm_table array members have moved to their respective subsystems leading to the removal of vm_table from kernel/sysctl.c. This increases modularity by placing the ctl_tables closer to where they are actually used and at the same time reducing the chances of merge conflicts in kernel/sysctl.c. - ctl_table range fixes Replace the proc_handler function that checks variable ranges in coredump_sysctls and vdso_table with the one that actually uses the extra{1,2} pointers as min/max values. This tightens the range of the values that users can pass into the kernel effectively preventing {under,over}flows. - Misc fixes Correct grammar errors and typos in test messages. Update sysctl files in MAINTAINERS. Constified and removed array size in declaration for alignment_tbl * tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits) selftests/sysctl: fix wording of help messages selftests: fix spelling/grammar errors in sysctl/sysctl.sh MAINTAINERS: Update sysctl file list in MAINTAINERS sysctl: Fix underflow value setting risk in vm_table coredump: Fixes core_pipe_limit sysctl proc_handler sysctl: remove unneeded include sysctl: remove the vm_table sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c fs: dcache: move the sysctl to fs/dcache.c sunrpc: simplify rpcauth_cache_shrink_count() fs: drop_caches: move sysctl to fs/drop_caches.c fs: fs-writeback: move sysctl to fs/fs-writeback.c mm: nommu: move sysctl to mm/nommu.c security: min_addr: move sysctl to security/min_addr.c mm: mmap: move sysctl to mm/mmap.c mm: util: move sysctls to mm/util.c mm: vmscan: move vmscan sysctls to mm/vmscan.c mm: swap: move sysctl to mm/swap.c mm: filemap: move sysctl to mm/filemap.c ...
2025-02-27	Merge patch series "prep patches for my mkdir series"	Christian Brauner
	NeilBrown <neilb@suse.de> says: These two patches are cleanups that are dependencies for my mkdir changes and subsequent directory locking changes. * patches from https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de: (2 commits) nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() Link: https://lore.kernel.org/r/20250226062135.2043651-1-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26	nfs/vfs: discard d_exact_alias()	NeilBrown
	d_exact_alias() is a descendant of d_add_unique() which was introduced 20 years ago, most likely to work around problems with NFS servers of the time. It is now not used in several situations where it was originally needed and there have been no reports of problems - presumably the old NFS servers have been improved. The only place it is now used is in NFSv4 code and the old problematic servers are thought to have been v2/v3 only. There is no clear benefit in reusing an unhashed() dentry which happens to have the same name as the dentry we are adding. So this patch removes d_exact_alias() and the one place that it is used. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250226062135.2043651-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	VFS: repack DENTRY_ flags.	NeilBrown
	Bits 13, 23, 24, and 27 are not used. Move all those holes to the end. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250206054504.2950516-7-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-07	fs: dcache: move the sysctl to fs/dcache.c	Kaixiong Yu
	The sysctl_vfs_cache_pressure belongs to fs/dcache.c, move it to fs/dcache.c from kernel/sysctl.c. As a part of fs/dcache.c cleaning, sysctl_vfs_cache_pressure is changed to a static variable, and change the inline-type function vfs_pressure_ratio() to out-of-inline type, export vfs_pressure_ratio() with EXPORT_SYMBOL_GPL to be used by other files. Move the unneeded include(linux/dcache.h). Signed-off-by: Kaixiong Yu <yukaixiong@huawei.com> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-02-01	Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull misc vfs cleanups from Al Viro: "Two unrelated patches - one is a removal of long-obsolete include in overlayfs (it used to need fs/internal.h, but the extern it wanted has been moved back to include/linux/namei.h) and another introduces convenience helper constructing struct qstr by a NUL-terminated string" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add a string-to-qstr constructor fs/overlayfs/namei.c: get rid of include ../internal.h
2025-01-27	add a string-to-qstr constructor	Al Viro
	Quite a few places want to build a struct qstr by given string; it would be convenient to have a primitive doing that, rather than open-coding it via QSTR_INIT(). The closest approximation was in bcachefs, but that expands to initializer list - {.len = strlen(string), .name = string}. It would be more useful to have it as compound literal - (struct qstr){.len = strlen(string), .name = string}. Unlike initializer list it's a valid expression. What's more, it's a valid lvalue - it's an equivalent of anonymous local variable with such initializer, so the things like path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name)); are valid. It can also be used as initializer, with identical effect - struct qstr x = (struct qstr){.name = s, .len = strlen(s)}; is equivalent to struct qstr anon_variable = {.name = s, .len = strlen(s)}; struct qstr x = anon_variable; // anon_variable is never used after that point and any even remotely sane compiler will manage to collapse that into struct qstr x = {.name = s, .len = strlen(s)}; What compound literals can't be used for is initialization of global variables, but those are covered by QSTR_INIT(). This commit lifts definition(s) of QSTR() into linux/dcache.h, converts it to compound literal (all bcachefs users are fine with that) and converts assorted open-coded instances to using that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
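The compound-literal form can be exercised in user space. In this sketch `struct qstr_model` and `QSTR_MODEL()` are simplified stand-ins for struct qstr and QSTR() (the kernel structure also carries a hash), and `model_name_len()` is a hypothetical consumer taking a pointer.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct qstr. */
struct qstr_model {
	unsigned int len;
	const char *name;
};

/* Compound literal: a valid expression and an lvalue, so & can be applied. */
#define QSTR_MODEL(s) \
	(struct qstr_model){ .len = (unsigned int)strlen(s), .name = (s) }

/* Hypothetical consumer that wants a pointer to a qstr-like object. */
unsigned int model_name_len(const struct qstr_model *q)
{
	return q->len;
}

unsigned int model_demo(const char *s)
{
	assert(s != NULL);
	/* Taking the address works because the literal is an anonymous local. */
	return model_name_len(&QSTR_MODEL(s));
}
```

A plain initializer list could not be passed this way, since `{.len = ..., .name = ...}` is not an expression. The literal also works as an initializer, e.g. `struct qstr_model x = QSTR_MODEL("ab");`.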
2025-01-27	9p: fix ->rename_sem exclusion	Al Viro
	9p wants to be able to build a path from given dentry to fs root and keep it valid over a blocking operation. ->s_vfs_rename_mutex would be a natural candidate, but there are places where we need that and where we have no way to tell if ->s_vfs_rename_mutex is already held deeper in callchain. Moreover, it's only held for cross-directory renames; name changes within the same directory happen without it. Solution: * have d_move() done in ->rename() rather than in its caller * maintain a 9p-private rwsem (per-filesystem) * hold it exclusive over the relevant part of ->rename() * hold it shared over the places where we want the path. That almost works. FS_RENAME_DOES_D_MOVE is enough to put all d_move() and d_exchange() calls under filesystem's control. However, there's also __d_unalias(), which isn't covered by any of that. If ->lookup() hits a directory inode with preexisting dentry elsewhere (due to e.g. rename done on server behind our back), d_splice_alias() called by ->lookup() will move/rename that alias. Add a couple of optional methods, so that __d_unalias() would do if alias->d_op->d_unalias_trylock != NULL if (!alias->d_op->d_unalias_trylock(alias)) fail (resulting in -ESTALE from lookup) __d_move(...) if alias->d_op->d_unalias_unlock != NULL alias->d_unalias_unlock(alias) where it currently does __d_move(). 9p instances do down_write_trylock() and up_write() of ->rename_mutex. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
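The optional-method flow sketched in the message above can be written out as a user-space model. Everything here is hypothetical scaffolding: `struct toy_alias`, the `moved` field standing in for __d_move(), and a boolean `locked` standing in for the 9p-private rwsem.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_alias;

/* Models the two optional dentry_operations methods. */
struct toy_alias_ops {
	bool (*unalias_trylock)(struct toy_alias *);
	void (*unalias_unlock)(struct toy_alias *);
};

struct toy_alias {
	const struct toy_alias_ops *ops;
	bool locked; /* stands in for the filesystem-private rwsem */
	bool moved;  /* stands in for the effect of __d_move() */
};

/* Models __d_unalias(): fail with -ESTALE if the trylock method refuses. */
int toy_unalias(struct toy_alias *a)
{
	assert(a != NULL && a->ops != NULL);
	if (a->ops->unalias_trylock && !a->ops->unalias_trylock(a))
		return -ESTALE;
	a->moved = true;
	if (a->ops->unalias_unlock)
		a->ops->unalias_unlock(a);
	return 0;
}

/* A filesystem-side pair, loosely modelled on trylock/unlock of an rwsem. */
bool toy_trylock(struct toy_alias *a)
{
	if (a->locked)
		return false;
	a->locked = true;
	return true;
}

void toy_unlock(struct toy_alias *a)
{
	a->locked = false;
}

const struct toy_alias_ops toy_locked_ops = { toy_trylock, toy_unlock };
const struct toy_alias_ops toy_plain_ops = { NULL, NULL };
```

Filesystems that leave both methods NULL keep the old behaviour, so only those that need the extra exclusion pay for it.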
2025-01-27	Pass parent directory inode and expected name to ->d_revalidate()	Al Viro
	->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There are a couple of exceptions (overlayfs and, to a lesser extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
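The warning about name->len is easy to demonstrate in user space. `struct qstr_model` and `model_name_matches()` are hypothetical stand-ins; the sketch compares a component that points into a longer pathname against an expected NUL-terminated name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for struct qstr. */
struct qstr_model {
	unsigned int len;
	const char *name;
};

/*
 * Compare exactly name->len bytes. The byte after the component may be
 * '/' rather than '\0', so strcmp() on name->name would be wrong.
 */
bool model_name_matches(const struct qstr_model *name, const char *expected)
{
	size_t elen;

	assert(name != NULL && expected != NULL);
	elen = strlen(expected);
	return name->len == elen && memcmp(name->name, expected, elen) == 0;
}
```

For a component `usr` taken from the path `usr/lib`, `name->name` is the string `usr/lib`, so a strcmp() against `usr` reports a mismatch while the length-aware comparison succeeds.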
2025-01-17	dcache: back inline names with a struct-wrapped array of unsigned long	Al Viro
	... so that they can be copied with struct assignment (which generates better code) and accessed word-by-word. The type is union shortname_storage; it's a union of arrays of unsigned char and unsigned long. struct name_snapshot.inline_name turned into union shortname_storage; users (all in fs/dcache.c) adjusted. struct dentry.d_iname has some users outside of fs/dcache.c; to reduce the amount of noise in commit, it is replaced with union shortname_storage d_shortname and d_iname is turned into a macro that expands to d_shortname.string (similar to d_lock handling). That compat macro is temporary - most of the remaining instances will be taken out by debugfs series, and once that is merged and few others are taken care of this will go away. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
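A user-space sketch of the idea: a union of a byte array and a word array of the same size can be copied by plain assignment and compared word by word. `MODEL_INLINE_WORDS` is a hypothetical value (the kernel derives its own from the configuration), and the word-wise comparison assumes unused bytes are zero-filled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_INLINE_WORDS 5 /* hypothetical; not the kernel's value */
#define MODEL_INLINE_LEN (MODEL_INLINE_WORDS * sizeof(unsigned long))

/* Models union shortname_storage: same bytes, two views. */
union shortname_model {
	unsigned char string[MODEL_INLINE_LEN];
	unsigned long words[MODEL_INLINE_WORDS];
};

/* The byte view is a whole number of words, so the union has no tail padding. */
static_assert(sizeof(union shortname_model) == MODEL_INLINE_LEN,
	      "inline name length must be a multiple of the word size");

/* Whole-object copy by assignment instead of a byte loop or memcpy(). */
void shortname_copy(union shortname_model *dst, const union shortname_model *src)
{
	*dst = *src;
}

/* Word-by-word comparison; assumes both buffers are zero-padded. */
bool shortname_equal(const union shortname_model *a, const union shortname_model *b)
{
	for (size_t i = 0; i < MODEL_INLINE_WORDS; i++)
		if (a->words[i] != b->words[i])
			return false;
	return true;
}
```

Keeping the length a multiple of the word size is what lets both views cover exactly the same storage.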
2025-01-17	make sure that DNAME_INLINE_LEN is a multiple of word size	Al Viro
	... calling the number of words DNAME_INLINE_WORDS. The next step will be to have a structure to hold inline name arrays (both in dentry and in name_snapshot) and use that to alias the existing arrays of unsigned char there. That will allow both full-structure copies and convenient word-by-word accesses. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-07-15	Merge tag 'vfs-6.11.inode' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode / dentry updates from Christian Brauner: "This contains smaller performance improvements to inodes and dentries: inode: - Add rcu based inode lookup variants. They avoid one inode hash lock acquire in the common case thereby significantly reducing contention. We already support RCU-based operations but didn't take advantage of them during inode insertion. Callers of iget_locked() get the improvement without any code changes. Callers that need a custom callback can switch to iget5_locked_rcu() as e.g., did btrfs. With 20 threads each walking a dedicated 1000 dirs * 1000 files directory tree to stat(2) on a 32 core + 24GB ram vm: before: 3.54s user 892.30s system 1966% cpu 45.549 total after: 3.28s user 738.66s system 1955% cpu 37.932 total (-16.7%) Long-term we should pick up the effort to introduce more fine-grained locking and possibly improve on the currently used hash implementation. - Start zeroing i_state in inode_init_always() instead of doing it in individual filesystems. This allows us to remove an unneeded lock acquire in new_inode() and not burden individual filesystems with this. dcache: - Move d_lockref out of the area used by RCU lookup to avoid cacheline ping pong because the embedded name is sharing a cacheline with d_lockref. - Fix dentry size on 32bit with CONFIG_SMP=y so it does actually end up with 128 bytes in total" * tag 'vfs-6.11.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: fix dentry size vfs: move d_lockref out of the area used by RCU lookup bcachefs: remove now spurious i_state initialization xfs: remove now spurious i_state initialization in xfs_inode_alloc vfs: partially sanitize i_state zeroing on inode creation xfs: preserve i_state around inode_init_always in xfs_reinit_inode btrfs: use iget5_locked_rcu vfs: add rcu-based find_inode variants for iget ops
2024-07-03	fs: fix dentry size	Christian Brauner
	On CONFIG_SMP=y and on 32bit we need to decrease DNAME_INLINE_LEN to 36 bytes to end up with 128 bytes in total. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=whtoqTSCcAvV-X-KPqoDWxS4vxmWpuKLB+Vv8=FtUd5vA@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	vfs: move d_lockref out of the area used by RCU lookup	Mateusz Guzik
	Stock kernel scales worse than FreeBSD when doing a 20-way stat(2) on the same tmpfs-backed file. According to perf top: 38.09% lockref_put_return 26.08% lockref_get_not_dead 25.60% __d_lookup_rcu 0.89% clear_bhb_loop __d_lookup_rcu is participating in cacheline ping pong due to the embedded name sharing a cacheline with lockref. Moving it out resolves the problem: 41.50% lockref_put_return 41.03% lockref_get_not_dead 1.54% clear_bhb_loop benchmark (will-it-scale, Sapphire Rapids, tmpfs, ops/s): FreeBSD:7219334 before: 5038006 after: 7842883 (+55%) One minor remark: the 'after' result is unstable, fluctuating in the range ~7.8 mln to ~9 mln during different runs. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240613001215.648829-3-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
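The effect of moving a hot field can be checked with offsetof(). The two layouts below are hypothetical and much smaller than struct dentry; the sketch assumes a 64-byte cacheline and a structure that starts on a cacheline boundary.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CACHELINE_BYTES 64 /* assumption: 64-byte cachelines */

/* "Before": the frequently written counter sits next to the read-mostly name. */
struct layout_shared {
	char name[40];
	int lockref;
	char other[64];
};

/* "After": the counter is moved behind other fields, onto the next line. */
struct layout_moved {
	char name[40];
	char other[64];
	int lockref;
};

static_assert(offsetof(struct layout_moved, lockref) >= MODEL_CACHELINE_BYTES,
	      "lockref should no longer share the first cacheline");

/* Index of the cacheline that a given member offset falls into. */
size_t model_cacheline_of(size_t offset)
{
	return offset / MODEL_CACHELINE_BYTES;
}
```

In the first layout every refcount update dirties the line that lookups read the name from; in the second the two accesses land on different lines.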
2024-06-27	vfs: rename parent_ino to d_parent_ino and make it use RCU	Mateusz Guzik
	The routine is used by procfs through dir_emit_dots. The combined RCU and lock fallback implementation is too big for an inline. Given that the routine takes a dentry argument fs/dcache.c seems like the place to put it in. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240627161152.802567-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-11	Merge tag 'vfs-6.9.misc' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. 
- Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. 
- Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-02-09	Revert "get rid of DCACHE_GENOCIDE"	Al Viro
	This reverts commit 57851607326a2beef21e67f83f4f53a90df8445a. Unfortunately, while we only call that thing once, the callback can be called more than once for the same dentry - all it takes is rename_lock being touched while we are in d_walk(). For now let's revert it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-06	fs: remove the inode argument to ->d_real() method	Amir Goldstein
	The only remaining user of ->d_real() method is d_real_inode(), which passed NULL inode argument to get the real data dentry. There are no longer any users that call ->d_real() with a non-NULL inode argument for getting a dentry from a specific underlying layer. Remove the inode argument of the method and replace it with an integer 'type' argument, to allow callers to request the real metadata dentry instead of the real data dentry. All the current users of d_real_inode() (e.g. uprobe) continue to get the real data inode. Callers that need to get the real metadata inode (e.g. IMA/EVM) can use d_inode(d_real(dentry, D_REAL_METADATA)). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20240202110132.1584111-3-amir73il@gmail.com Tested-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <brauner@kernel.org>
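The idea of selecting the data or metadata object through a 'type' argument can be sketched in userspace as follows. Everything here is hypothetical (the `ex_` types, the enum values and the layering model) and is not the kernel's ->d_real() implementation; it only shows how one selector argument can replace a per-layer lookup argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector values; the kernel's names and values may differ. */
enum ex_real_type { EX_REAL_DATA, EX_REAL_METADATA };

/* Hypothetical layered object: data and metadata may live in different layers. */
struct ex_dentry {
	const char *name;
	struct ex_dentry *data_layer;	/* NULL when this object holds its own data */
	struct ex_dentry *meta_layer;	/* NULL when this object holds its own metadata */
};

/* Return the object backing the requested aspect, or d itself if not layered. */
static struct ex_dentry *ex_real(struct ex_dentry *d, enum ex_real_type type)
{
	struct ex_dentry *real =
		(type == EX_REAL_METADATA) ? d->meta_layer : d->data_layer;

	return real ? real : d;
}

/* Small demonstration: one stacked object, one plain object. */
static void ex_real_demo(void)
{
	struct ex_dentry lower = { "lower", NULL, NULL };
	struct ex_dentry upper = { "upper", NULL, NULL };
	struct ex_dentry stacked = { "stacked", &lower, &upper };

	assert(ex_real(&stacked, EX_REAL_DATA) == &lower);
	assert(ex_real(&stacked, EX_REAL_METADATA) == &upper);
	assert(ex_real(&lower, EX_REAL_DATA) == &lower);
}
```

A caller that only cares about data keeps passing the data selector, which mirrors how the existing d_real_inode() users are described as unaffected.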
2023-11-25	dcache: remove unnecessary NULL check in dget_dlock()	Vegard Nossum
	dget_dlock() requires dentry->d_lock to be held when called, yet contains a NULL check for dentry. An audit of all calls to dget_dlock() shows that it is never called with a NULL pointer (as spin_lock()/spin_unlock() would crash in these cases):
	$ git grep -W '\<dget_dlock\>'
	arch/powerpc/platforms/cell/spufs/inode.c- spin_lock(&dentry->d_lock);
	arch/powerpc/platforms/cell/spufs/inode.c- if (simple_positive(dentry)) {
	arch/powerpc/platforms/cell/spufs/inode.c: dget_dlock(dentry);
	fs/autofs/expire.c- spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
	fs/autofs/expire.c- if (simple_positive(child)) {
	fs/autofs/expire.c: dget_dlock(child);
	fs/autofs/root.c: dget_dlock(active);
	fs/autofs/root.c- spin_unlock(&active->d_lock);
	fs/autofs/root.c: dget_dlock(expiring);
	fs/autofs/root.c- spin_unlock(&expiring->d_lock);
	fs/ceph/dir.c- if (!spin_trylock(&dentry->d_lock))
	fs/ceph/dir.c- continue;
	[...]
	fs/ceph/dir.c: dget_dlock(dentry);
	fs/ceph/mds_client.c- spin_lock(&alias->d_lock);
	[...]
	fs/ceph/mds_client.c: dn = dget_dlock(alias);
	fs/configfs/inode.c- spin_lock(&dentry->d_lock);
	fs/configfs/inode.c- if (simple_positive(dentry)) {
	fs/configfs/inode.c: dget_dlock(dentry);
	fs/libfs.c: found = dget_dlock(d);
	fs/libfs.c- spin_unlock(&d->d_lock);
	fs/libfs.c: found = dget_dlock(child);
	fs/libfs.c- spin_unlock(&child->d_lock);
	fs/libfs.c: child = dget_dlock(d);
	fs/libfs.c- spin_unlock(&d->d_lock);
	fs/ocfs2/dcache.c: dget_dlock(dentry);
	fs/ocfs2/dcache.c- spin_unlock(&dentry->d_lock);
	include/linux/dcache.h:static inline struct dentry *dget_dlock(struct dentry *dentry)
	After taking out the NULL check, dget_dlock() becomes almost identical to __dget_dlock(); the only difference is that dget_dlock() returns the dentry that was passed in. These are static inline helpers, so we can rely on the compiler to discard unused return values. We can therefore also remove __dget_dlock() and replace calls to it by dget_dlock().
Also fix up and improve the kerneldoc comments while we're at it. Al Viro pointed out that we can also clean up some of the callers to make use of the returned value and provided a bit more info for the kerneldoc. While preparing v2 I also noticed that the tabs used in the kerneldoc comments were causing the kerneldoc to get parsed incorrectly so I also fixed this up (including for d_unhashed, which is otherwise unrelated). Testing: x86 defconfig build + boot; make htmldocs for the kerneldoc warning. objdump shows there are code generation changes. Link: https://lore.kernel.org/all/20231022164520.915013-1-vegard.nossum@oracle.com/ Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: Nick Piggin <npiggin@kernel.dk> Cc: Waiman Long <Waiman.Long@hp.com> Cc: linux-doc@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25	kill DCACHE_MAY_FREE	Al Viro
	With the new ordering in __dentry_kill() it has become redundant - it's set if and only if both DCACHE_DENTRY_KILLED and DCACHE_SHRINK_LIST are set. We set it in __dentry_kill(), after having set DCACHE_DENTRY_KILLED with the only condition being that DCACHE_SHRINK_LIST is there; all of that is done without dropping ->d_lock and the only place that checks that flag (shrink_dentry_list()) does so under ->d_lock, after having found the victim on its shrink list. Since DCACHE_SHRINK_LIST is set only when placing dentry into shrink list and removed only by shrink_dentry_list() itself, a check for DCACHE_DENTRY_KILLED in there would be equivalent to check for DCACHE_MAY_FREE. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
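The redundancy argument above can be checked with a small userspace model. The flag names and values below are hypothetical stand-ins, and `ex_kill()` only models the described rule (set the killed flag, then set the may-free flag only if the shrink-list flag is present); it is not the kernel's __dentry_kill().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define EX_KILLED	(1u << 0)
#define EX_SHRINK_LIST	(1u << 1)
#define EX_MAY_FREE	(1u << 2)

/* Models the old rule: mark killed, and mark may-free only if on a shrink list. */
static unsigned int ex_kill(unsigned int flags)
{
	flags |= EX_KILLED;
	if (flags & EX_SHRINK_LIST)
		flags |= EX_MAY_FREE;
	return flags;
}

/* Old check, made by the shrinker for a victim found on its list. */
static bool ex_old_check(unsigned int flags)
{
	return (flags & EX_MAY_FREE) != 0;
}

/* Replacement check: the victim is on a shrink list by construction. */
static bool ex_new_check(unsigned int flags)
{
	return (flags & EX_KILLED) != 0;
}

/* For an entry on a shrink list, both checks agree before and after the kill. */
static void ex_flags_demo(void)
{
	unsigned int on_list = EX_SHRINK_LIST;

	assert(ex_old_check(on_list) == ex_new_check(on_list));
	assert(ex_old_check(ex_kill(on_list)) == ex_new_check(ex_kill(on_list)));
}
```

The two checks differ only for an entry that is killed while not on a shrink list, which is a state the shrinker never inspects because it only looks at entries it found on its own list.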
2023-11-25	Merge branches 'work.dcache-misc' and 'work.dcache2' into work.dcache	Al Viro

2023-11-25	get rid of DCACHE_GENOCIDE	Al Viro
	... now that we never call d_genocide() other than from kill_litter_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25	d_genocide(): move the extern into fs/internal.h	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25	kill d_instantiate_anon(), fold __d_instantiate_anon() into remaining caller	Al Viro
	now that the only user of d_instantiate_anon() is gone... [braino fix folded - kudos to Dan Carpenter] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25	dentry: switch the lists of children to hlist	Al Viro
	Saves a pointer per struct dentry and actually makes the things less clumsy. Cleaned the d_walk() and dcache_readdir() a bit by use of hlist_for_... iterators. A couple of new helpers - d_first_child() and d_next_sibling(), to make the expressions less awful. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
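The saved pointer comes from the list head: a doubly linked list head holds two pointers, while an hlist head holds one. The userspace sketch below uses minimal copies of the two list shapes and a hypothetical `struct node` with `first_child()`/`next_sibling()` helpers in the spirit of the ones named above; it is not the kernel's struct dentry or its helper implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copies of the two list shapes, for illustration. */
struct list_head  { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* The head of an hlist is one pointer smaller than a list_head. */
static_assert(sizeof(struct list_head) - sizeof(struct hlist_head) == sizeof(void *),
	      "hlist head saves one pointer");

/* Hypothetical tree node; not the kernel's struct dentry. */
struct node {
	struct hlist_head children;	/* this node's children */
	struct hlist_node sibling;	/* link in the parent's list of children */
	int id;
};

/* Insert n at the front of the list headed by h. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Recover the enclosing node from its embedded sibling link. */
#define node_of(ptr) \
	((struct node *)((char *)(ptr) - offsetof(struct node, sibling)))

static struct node *first_child(struct node *parent)
{
	return parent->children.first ? node_of(parent->children.first) : NULL;
}

static struct node *next_sibling(struct node *n)
{
	return n->sibling.next ? node_of(n->sibling.next) : NULL;
}

/* Walk all children of parent using the two helpers. */
static int count_children(struct node *parent)
{
	int n = 0;

	for (struct node *c = first_child(parent); c; c = next_sibling(c))
		n++;
	return n;
}
```

With the helpers in place, a walk over the children reads as a plain `for` loop and never touches the list internals directly.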
2023-11-18	kill d_backing_dentry()	Al Viro
	no users left Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-18	[software coproarchaeology] dentry.h: kill a mysterious comment	Al Viro
	there's a strange comment in front of d_lookup() declaration:
	/* appendix may either be NULL or be used for transname suffixes */
	Looks like nobody had been curious enough to track its history; it predates git, it predates bitkeeper and if you look through the pre-BK trees, you finally arrive at this in 2.1.44-for-davem:
	 /* appendix may either be NULL or be used for transname suffixes */
	-extern struct dentry * d_lookup(struct inode * dir, struct qstr * name,
	-                                struct qstr * appendix);
	+extern struct dentry * d_lookup(struct dentry * dir, struct qstr * name);
	In other words, it refers to the third argument d_lookup() used to have back then. It had been introduced in 2.1.43-pre, on June 12 1997, along with d_lookup(), only to be removed by July 4 1997, presumably when the Cthulhu-awful thing it used to be used for (look for CONFIG_TRANS_NAMES in 2.1.43-pre, and keep a heavy-duty barfbag ready) had been, er, noticed and recognized for what it had been. Despite the appendectomy, the comment remained. Some things really need to be put out of their misery... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-18	dentry.h: trim externs	Al Viro
	d_instantiate_unique() had been gone for 7 years; __d_lookup...() and shrink_dcache_for_umount() are fs/internal.h fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-18	kill d_{is,set}_fallthru()	Al Viro
	Introduced in 2015 and never had any in-tree users... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-18	DCACHE_COOKIE: RIP	Al Viro
	the last user gone in 2021... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-18	DCACHE_... ->d_flags bits: switch to BIT()	Al Viro
	For bits 20..22 (inode type cached in ->d_flags) turn the definitions into expressions like (5 << 20); everything else turns into straight use of BIT() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
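The split described above exists because single-bit flags map directly onto BIT(), while a multi-bit field such as a type code in bits 20..22 has to be written as a shifted value. A minimal userspace sketch follows, assuming a stand-in `BIT()` macro and hypothetical `EX_` names; these are not the kernel's DCACHE_... definitions.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(nr)		(1UL << (nr))

/* Hypothetical single-bit flags; not the kernel's definitions. */
#define EX_FLAG_A	BIT(0)
#define EX_FLAG_B	BIT(7)

/* A 3-bit type field stored in bits 20..22. */
#define EX_TYPE_SHIFT	20
#define EX_TYPE_MASK	(7UL << EX_TYPE_SHIFT)
#define EX_TYPE_FIVE	(5UL << EX_TYPE_SHIFT)

/* Type value 5 occupies bits 20 and 22, so no single BIT() can express it. */
static_assert(EX_TYPE_FIVE == (BIT(20) | BIT(22)), "5 << 20 sets bits 20 and 22");

/* Extract the type field, leaving the single-bit flags out. */
static inline unsigned long ex_type(unsigned long flags)
{
	return flags & EX_TYPE_MASK;
}

/* Replace the type field while preserving every other bit. */
static inline unsigned long ex_set_type(unsigned long flags, unsigned long type)
{
	return (flags & ~EX_TYPE_MASK) | type;
}
```

Keeping the field as a mask plus shifted values means a type comparison is a single masked equality test, for example `ex_type(flags) == EX_TYPE_FIVE`.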
2023-11-18	struct dentry: get rid of randomize_layout idiocy	Al Viro
	This is beyond ridiculous. There is a reason why that thing is cacheline-aligned... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-09-11	fs: factor out d_mark_tmpfile()	Kent Overstreet
	New helper for bcachefs: bcachefs doesn't want the inode_dec_link_count() call that d_tmpfile() does; it handles i_nlink on its own, atomically with other btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christian Brauner <brauner@kernel.org>
2022-10-10	Merge tag 'pull-tmpfile' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs tmpfile updates from Al Viro: "Miklos' ->tmpfile() signature change; pass an unopened struct file to it, let it open the damn thing. Allows to add tmpfile support to FUSE" * tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fuse: implement ->tmpfile() vfs: open inside ->tmpfile() vfs: move open right after ->tmpfile() vfs: make vfs_tmpfile() static ovl: use vfs_tmpfile_open() helper cachefiles: use vfs_tmpfile_open() helper cachefiles: only pass inode to *mark_inode_inuse() helpers cachefiles: tmpfile error handling cleanup hugetlbfs: cleanup mknod and tmpfile vfs: add vfs_tmpfile_open() helper
2022-09-24	vfs: open inside ->tmpfile()	Miklos Szeredi
	This is in preparation for adding tmpfile support to fuse, which requires that the tmpfile creation and opening are done as a single operation. Replace the 'struct dentry *' argument of i_op->tmpfile with 'struct file *'. Call finish_open_simple() as the last thing in ->tmpfile() instances (may be omitted in the error case). Change d_tmpfile() argument to 'struct file *' as well to make callers more readable. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-08-20	dynamic_dname(): drop unused dentry argument	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-11	Merge tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph updates from Ilya Dryomov: "We have a good pile of various fixes and cleanups from Xiubo, Jeff, Luis and others, almost exclusively in the filesystem. Several patches touch files outside of our normal purview to set the stage for bringing in Jeff's long awaited ceph+fscrypt series in the near future. All of them have appropriate acks and sat in linux-next for a while" * tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client: (27 commits) libceph: clean up ceph_osdc_start_request prototype libceph: fix ceph_pagelist_reserve() comment typo ceph: remove useless check for the folio ceph: don't truncate file in atomic_open ceph: make f_bsize always equal to f_frsize ceph: flush the dirty caps immediately when quota is approaching libceph: print fsid and epoch with osd id libceph: check pointer before assigned to "c->rules[]" ceph: don't get the inline data for new creating files ceph: update the auth cap when the async create req is forwarded ceph: make change_auth_cap_ses a global symbol ceph: fix incorrect old_size length in ceph_mds_request_args ceph: switch back to testing for NULL folio->private in ceph_dirty_folio ceph: call netfs_subreq_terminated with was_async == false ceph: convert to generic_file_llseek ceph: fix the incorrect comment for the ceph_mds_caps struct ceph: don't leak snap_rwsem in handle_cap_grant ceph: prevent a client from exceeding the MDS maximum xattr size ceph: choose auth MDS for getxattr with the Xs caps ceph: add session already open notify support ...