summaryrefslogtreecommitdiff
path: root/fs/nfsd
AgeCommit message (Collapse)Author
2025-02-02nfsd: clear acl_access/acl_default after releasing themLi Lingfeng
If getting acl_default fails, acl_access and acl_default will be released simultaneously. However, acl_access will still retain a pointer pointing to the released posix_acl, which will trigger a WARNING in nfs3svc_release_getacl like this: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 26 PID: 3199 at lib/refcount.c:28 refcount_warn_saturate+0xb5/0x170 Modules linked in: CPU: 26 UID: 0 PID: 3199 Comm: nfsd Not tainted 6.12.0-rc6-00079-g04ae226af01f-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb5/0x170 Code: cc cc 0f b6 1d b3 20 a5 03 80 fb 01 0f 87 65 48 d8 00 83 e3 01 75 e4 48 c7 c7 c0 3b 9b 85 c6 05 97 20 a5 03 01 e8 fb 3e 30 ff <0f> 0b eb cd 0f b6 1d 8a3 RSP: 0018:ffffc90008637cd8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83904fde RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88871ed36380 RBP: ffff888158beeb40 R08: 0000000000000001 R09: fffff520010c6f56 R10: ffffc90008637ab7 R11: 0000000000000001 R12: 0000000000000001 R13: ffff888140e77400 R14: ffff888140e77408 R15: ffffffff858b42c0 FS: 0000000000000000(0000) GS:ffff88871ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562384d32158 CR3: 000000055cc6a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? refcount_warn_saturate+0xb5/0x170 ? __warn+0xa5/0x140 ? refcount_warn_saturate+0xb5/0x170 ? report_bug+0x1b1/0x1e0 ? handle_bug+0x53/0xa0 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? tick_nohz_tick_stopped+0x1e/0x40 ? refcount_warn_saturate+0xb5/0x170 ? refcount_warn_saturate+0xb5/0x170 nfs3svc_release_getacl+0xc9/0xe0 svc_process_common+0x5db/0xb60 ? __pfx_svc_process_common+0x10/0x10 ? __rcu_read_unlock+0x69/0xa0 ? __pfx_nfsd_dispatch+0x10/0x10 ? svc_xprt_received+0xa1/0x120 ? xdr_init_decode+0x11d/0x190 svc_process+0x2a7/0x330 svc_handle_xprt+0x69d/0x940 svc_recv+0x180/0x2d0 nfsd+0x168/0x200 ? __pfx_nfsd+0x10/0x10 kthread+0x1a2/0x1e0 ? kthread+0xf4/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... Clear acl_access/acl_default after posix_acl_release is called to prevent UAF from being triggered. Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241107014705.2509463-1-lilingfeng@huaweicloud.com/ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-28Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Enable using direct IO with localio - Added localio related tracepoints Bugfixes: - Sunrpc fixes for working with a very large cl_tasks list - Fix a possible buffer overflow in nfs_sysfs_link_rpc_client() - Fixes for handling reconnections with localio - Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT - Fix COPY_NOTIFY xdr_buf size calculations - pNFS/Flexfiles fix for retrying requesting a layout segment for reads - Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired Cleanups: - Various other nfs & nfsd localio cleanups - Prepratory patches for async copy improvements that are under development - Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts - Add netns inum and srcaddr to debugfs rpc_xprt info" * tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits) SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info pnfs/flexfiles: retry getting layout segment for reads NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE NFSv4.2: fix COPY_NOTIFY xdr buf size calculation NFS: Rename struct nfs4_offloadcancel_data NFS: Fix typo in OFFLOAD_CANCEL comment NFS: CB_OFFLOAD can return NFS4ERR_DELAY nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it nfs: fix incorrect error handling in LOCALIO nfs: probe for LOCALIO when v3 client reconnects to server nfs: probe for LOCALIO when v4 client reconnects to server nfs/localio: remove redundant code and simplify LOCALIO enablement nfs_common: add nfs_localio trace events nfs_common: track all open nfsd_files per LOCALIO nfs_client nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_ nfsd: update percpu_ref to manage references on nfsd_net ...
2025-01-27Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Jeff Layton contributed an implementation of NFSv4.2+ attribute delegation, as described here: https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html This interoperates with similar functionality introduced into the Linux NFS client in v6.11. An attribute delegation permits an NFS client to manage a file's mtime, rather than flushing dirty data to the NFS server so that the file's mtime reflects the last write, which is considerably slower. Neil Brown contributed dynamic NFSv4.1 session slot table resizing. This facility enables NFSD to increase or decrease the number of slots per NFS session depending on server memory availability. More session slots means greater parallelism. Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND encoding screws up when crossing a page boundary in the encoding buffer. This is a zero-day bug, but hitting it is rare and depends on the NFS client implementation. The Linux NFS client does not happen to trigger this issue. A variety of bug fixes and other incremental improvements fill out the list of commits in this release. Great thanks to all contributors, reviewers, testers, and bug reporters who participated during this development cycle" * tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits) sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode sunrpc: Remove gss_generic_token deadcode sunrpc: Remove unused xprt_iter_get_xprt Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages" nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION nfsd: handle delegated timestamps in SETATTR nfsd: add support for delegated timestamps nfsd: rework NFS4_SHARE_WANT_* flag handling nfsd: add support for FATTR4_OPEN_ARGUMENTS nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_* nfsd: switch to autogenerated definitions for open_delegation_type4 nfs_common: make include/linux/nfs4.h include generated nfs4_1.h nfsd: fix handling of delegated change attr in CB_GETATTR SUNRPC: Document validity guarantees of the pointer returned by reserve_space NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer NFSD: Refactor nfsd4_do_encode_secinfo() again NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer ...
2025-01-21Merge tag 'lsm-pr-20250121' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Improved handling of LSM "secctx" strings through lsm_context struct The LSM secctx string interface is from an older time when only one LSM was supported, migrate over to the lsm_context struct to better support the different LSMs we now have and make it easier to support new LSMs in the future. These changes explain the Rust, VFS, and networking changes in the diffstat. - Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are enabled Small tweak to be a bit smarter about when we build the LSM's common audit helpers. - Check for absurdly large policies from userspace in SafeSetID SafeSetID policies rules are fairly small, basically just "UID:UID", it easy to impose a limit of KMALLOC_MAX_SIZE on policy writes which helps quiet a number of syzbot related issues. While work is being done to address the syzbot issues through other mechanisms, this is a trivial and relatively safe fix that we can do now. - Various minor improvements and cleanups A collection of improvements to the kernel selftests, constification of some function parameters, removing redundant assignments, and local variable renames to improve readability. * tag 'lsm-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lockdown: initialize local array before use to quiet static analysis safesetid: check size of policy writes net: corrections for security_secid_to_secctx returns lsm: rename variable to avoid shadowing lsm: constify function parameters security: remove redundant assignment to return variable lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test binder: initialize lsm_context structure rust: replace lsm context+len with lsm_context lsm: secctx provider check on release lsm: lsm_context in security_dentry_init_security lsm: use lsm_context in security_inode_getsecctx lsm: replace context+len with lsm_context lsm: ensure the correct LSM context releaser
2025-01-21nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATIONJeff Layton
Allow clients to request getting a delegation xor an open stateid if a delegation isn't available. This allows the client to avoid sending a final CLOSE for the (useless) open stateid, when it is granted a delegation. If this flag is requested by the client and there isn't already a new open stateid, discard the new open stateid before replying. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: handle delegated timestamps in SETATTRJeff Layton
Allow SETATTR to handle delegated timestamps. This patch assumes that only the delegation holder has the ability to set the timestamps in this way, so we allow this only if the SETATTR stateid refers to a *_ATTRS_DELEG delegation. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: add support for delegated timestampsJeff Layton
Add support for the delegated timestamps on write delegations. This allows the server to proxy timestamps from the delegation holder to other clients that are doing GETATTRs vs. the same inode. When OPEN4_SHARE_ACCESS_WANT_DELEG_TIMESTAMPS bit is set in the OPEN call, set the dl_type to the *_ATTRS_DELEG flavor of delegation. Add timespec64 fields to nfs4_cb_fattr and decode the timestamps into those. Vet those timestamps according to the delstid spec and update the inode attrs if necessary. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: rework NFS4_SHARE_WANT_* flag handlingJeff Layton
The delstid draft adds new NFS4_SHARE_WANT_TYPE_MASK values that don't fit neatly into the existing WANT_MASK or WHEN_MASK. Add a new NFS4_SHARE_WANT_MOD_MASK value and redefine NFS4_SHARE_WANT_MASK to include it. Also fix the checks in nfsd4_deleg_xgrade_none_ext() to check for the flags instead of equality, since there may be modifier flags in the value. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: add support for FATTR4_OPEN_ARGUMENTSJeff Layton
Add support for FATTR4_OPEN_ARGUMENTS. This a new mechanism for the client to discover what OPEN features the server supports. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegationsJeff Layton
Add some preparatory code to various functions that handle delegation types to allow them to handle the OPEN_DELEGATE_*_ATTRS_DELEG constants. Add helpers for detecting whether it's a read or write deleg, and whether the attributes are delegated. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*Jeff Layton
Add the OPEN4_SHARE_ACCESS_WANT constants from the nfs4.1 and delstid draft into the nfs4_1.x file, and regenerate the headers and source files. Do a mass renaming of NFS4_SHARE_WANT_* to OPEN4_SHARE_ACCESS_WANT_* in the nfsd directory. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: switch to autogenerated definitions for open_delegation_type4Jeff Layton
Rename the enum with the same name in include/linux/nfs4.h, add the proper enum to nfs4_1.x and regenerate the headers and source files. Do a mass rename of all NFS4_OPEN_DELEGATE_* to OPEN_DELEGATE_* in the nfsd directory. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfs_common: make include/linux/nfs4.h include generated nfs4_1.hJeff Layton
In the long run, the NFS development community intends to autogenerate a lot of the XDR handling code. Both the NFS client and server include "include/linux/nfs4.hi". That file was hand-rolled, and some of the symbols in it conflict with the autogenerated symbols. Add a small nfs4_1.x to Documentation that currently just has the necessary definitions for the delstid draft, and generate the relevant header and source files. Make include/linux/nfs4.h include the generated include/linux/sunrpc/xdrgen/nfs4_1.h and remove the conflicting definitions from it and nfs_xdr.h. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-21nfsd: fix handling of delegated change attr in CB_GETATTRJeff Layton
RFC8881, section 10.4.3 has some specific guidance as to how the delegated change attribute should be handled. We currently don't follow that guidance properly. In particular, when the file is modified, the server always reports the initial change attribute + 1. Section 10.4.3 however indicates that it should be incremented on every GETATTR request from other clients. Only request the change attribute until the file has been modified. If there is an outstanding delegation, then increment the cached change attribute on every GETATTR. Fixes: 6487a13b5c6b ("NFSD: add support for CB_GETATTR callback") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-20Merge tag 'kernel-6.14-rc1.cred' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull cred refcount updates from Christian Brauner: "For the v6.13 cycle we switched overlayfs to a variant of override_creds() that doesn't take an extra reference. To this end the {override,revert}_creds_light() helpers were introduced. This generalizes the idea behind {override,revert}_creds_light() to the {override,revert}_creds() helpers. Afterwards overriding and reverting credentials is reference count free unless the caller explicitly takes a reference. All callers have been appropriately ported" * tag 'kernel-6.14-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) cred: fold get_new_cred_many() into get_cred_many() cred: remove unused get_new_cred() nfsd: avoid pointless cred reference count bump cachefiles: avoid pointless cred reference count bump dns_resolver: avoid pointless cred reference count bump trace: avoid pointless cred reference count bump cgroup: avoid pointless cred reference count bump acct: avoid pointless reference count bump io_uring: avoid pointless cred reference count bump smb: avoid pointless cred reference count bump cifs: avoid pointless cred reference count bump cifs: avoid pointless cred reference count bump ovl: avoid pointless cred reference count bump open: avoid pointless cred reference count bump nfsfh: avoid pointless cred reference count bump nfs/nfs4recover: avoid pointless cred reference count bump nfs/nfs4idmap: avoid pointless reference count bump nfs/localio: avoid pointless cred reference count bumps coredump: avoid pointless cred reference count bump binfmt_misc: avoid pointless cred reference count bump ...
2025-01-14nfs_common: track all open nfsd_files per LOCALIO nfs_clientMike Snitzer
This tracking enables __nfsd_file_cache_purge() to call nfs_localio_invalidate_clients(), upon shutdown or export change, to nfs_close_local_fh() all open nfsd_files that are still cached by the LOCALIO nfs clients associated with nfsd_net that is being shutdown. Now that the client must track all open nfsd_files there was more work than necessary being done with the global nfs_uuids_lock contended. This manifested in various RCU issues, e.g.: hrtimer: interrupt took 47969440 ns rcu: INFO: rcu_sched detected stalls on CPUs/tasks: Use nfs_uuid->lock to protect all nfs_uuid_t members, instead of nfs_uuids_lock, once nfs_uuid_is_local() adds the client to nn->local_clients. Also add 'local_clients_lock' to 'struct nfsd_net' to protect nn->local_clients. And store a pointer to spinlock in the 'list_lock' member of nfs_uuid_t so nfs_localio_disable_client() can use it to avoid taking the global nfs_uuids_lock. In combination, these split out locks eliminate the use of the single nfslocalio.c global nfs_uuids_lock in the IO paths (open and close). Also refactored associated fs/nfs_common/nfslocalio.c methods' locking to reduce work performed with spinlocks held in general. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lockMike Snitzer
This global spinlock protects all nfs_uuid_t relative to the global nfs_uuids list. A later commit will split this global spinlock so prepare by renaming this lock to reflect its intended narrow scope. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_fileMike Snitzer
Now that LOCALIO no longer leans on NFSD's filecache for caching open files (and instead uses NFS client-side open nfsd_file caching) there is no need to use NFSD filecache's GC feature. Avoiding GC will speed up nfsd_file initial opens. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_Mike Snitzer
Also update Documentation/filesystems/nfs/localio.rst accordingly and reduce the technical documentation debt that was previously captured in that document. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfsd: update percpu_ref to manage references on nfsd_netMike Snitzer
Holding a reference on nfsd_net is what is required, it was never actually about ensuring nn->nfsd_serv available. Move waiting for outstanding percpu references from nfsd_destroy_serv() to nfsd_shutdown_net(). By moving it later it will be possible to invalidate localio clients during nfsd_file_cache_shutdown_net() via __nfsd_file_cache_purge(). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfs_common: rename functions that invalidate LOCALIO nfs_clientsMike Snitzer
Rename nfs_uuid_invalidate_one_client to nfs_localio_disable_client. Rename nfs_uuid_invalidate_clients to nfs_localio_invalidate_clients. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-14nfsd: add nfsd_file_{get,put} to 'nfs_to' nfsd_localio_operationsMike Snitzer
In later a commit LOCALIO must call both nfsd_file_get and nfsd_file_put to manage extra nfsd_file references. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-10NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode bufferChuck Lever
Commit ab04de60ae1c ("NFSD: Optimize nfsd4_encode_fattr()") replaced the use of write_bytes_to_xdr_buf() because it's expensive and the data items to be encoded are already properly aligned. However, there's no guarantee that the pointer returned from xdr_reserve_space() will still point to the correct reserved space in the encode buffer after one or more intervening calls to xdr_reserve_space(). It just happens to work with the current implementation of xdr_reserve_space(). This commit effectively reverts the optimization. Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode bufferChuck Lever
There's no guarantee that the pointer returned from xdr_reserve_space() will still point to the correct reserved space in the encode buffer after one or more intervening calls to xdr_reserve_space(). It just happens to work with the current implementation of xdr_reserve_space(). Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Refactor nfsd4_do_encode_secinfo() againChuck Lever
Extract the code that encodes the secinfo4 union data type to clarify the logic. The removed warning is pretty well obscured and thus probably not terribly useful. Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode bufferChuck Lever
There's no guarantee that the pointer returned from xdr_reserve_space() will still point to the correct reserved space in the encode buffer after one or more intervening calls to xdr_reserve_space(). It just happens to work with the current implementation of xdr_reserve_space(). Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the ↵Chuck Lever
encode buffer Commit eeadcb757945 ("NFSD: Simplify READ_PLUS") replaced the use of write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read() at the time. However, the current code will corrupt the encoded data if the XDR data items that are reserved early and then poked into the XDR buffer later happen to fall on a page boundary in the XDR encoding buffer. __xdr_commit_encode can shift encoded data items in the encoding buffer so that pointers returned from xdr_reserve_space() no longer address the same part of the encoding stream. Fixes: eeadcb757945 ("NFSD: Simplify READ_PLUS") Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Insulate nfsd4_encode_read_plus() from page boundaries in the encode ↵Chuck Lever
buffer Commit eeadcb757945 ("NFSD: Simplify READ_PLUS") replaced the use of write_bytes_to_xdr_buf(), copying what was in nfsd4_encode_read() at the time. However, the current code will corrupt the encoded data if the XDR data items that are reserved early and then poked into the XDR buffer later happen to fall on a page boundary in the XDR encoding buffer. __xdr_commit_encode can shift encoded data items in the encoding buffer so that pointers returned from xdr_reserve_space() no longer address the same part of the encoding stream. Fixes: eeadcb757945 ("NFSD: Simplify READ_PLUS") Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Insulate nfsd4_encode_read() from page boundaries in the encode bufferChuck Lever
Commit 28d5bc468efe ("NFSD: Optimize nfsd4_encode_readv()") replaced the use of write_bytes_to_xdr_buf() because it's expensive and the data items to be encoded are already properly aligned. However, the current code will corrupt the encoded data if the XDR data items that are reserved early and then poked into the XDR buffer later happen to fall on a page boundary in the XDR encoding buffer. __xdr_commit_encode can shift encoded data items in the encoding buffer so that pointers returned from xdr_reserve_space() no longer address the same part of the encoding stream. This isn't an issue for splice reads because the reserved encode buffer areas must fall in the XDR buffers header for the splice to work without error. For vectored reads, however, there is a possibility of send buffer corruption in rare cases. Fixes: 28d5bc468efe ("NFSD: Optimize nfsd4_encode_readv()") Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-10NFSD: Encode COMPOUND operation status on page boundariesChuck Lever
J. David reports an odd corruption of a READDIR reply sent to a FreeBSD client. xdr_reserve_space() has to do a special trick when the @nbytes value requests more space than there is in the current page of the XDR buffer. In that case, xdr_reserve_space() returns a pointer to the start of the next page, and then the next call to xdr_reserve_space() invokes __xdr_commit_encode() to copy enough of the data item back into the previous page to make that data item contiguous across the page boundary. But we need to be careful in the case where buffer space is reserved early for a data item whose value will be inserted into the buffer later. One such caller, nfsd4_encode_operation(), reserves 8 bytes in the encoding buffer for each COMPOUND operation. However, a READDIR result can sometimes encode file names so that there are only 4 bytes left at the end of the current XDR buffer page (though plenty of pages are left to handle the remaining encoding tasks). If a COMPOUND operation follows the READDIR result (say, a GETATTR), then nfsd4_encode_operation() will reserve 8 bytes for the op number (9) and the op status (usually NFS4_OK). In this weird case, xdr_reserve_space() returns a pointer to byte zero of the next buffer page, as it assumes the data item will be copied back into place (in the previous page) on the next call to xdr_reserve_space(). nfsd4_encode_operation() writes the op num into the buffer, then saves the next 4-byte location for the op's status code. The next xdr_reserve_space() call is part of GETATTR encoding, so the op num gets copied back into the previous page, but the saved location for the op status continues to point to the wrong spot in the current XDR buffer page because __xdr_commit_encode() moved that data item. After GETATTR encoding is complete, nfsd4_encode_operation() writes the op status over the first XDR data item in the GETATTR result. The NFS4_OK status code (0) makes it look like there are zero items in the GETATTR's attribute bitmask. The patch description of commit 2825a7f90753 ("nfsd4: allow encoding across page boundaries") [2014] remarks that NFSD "can't handle a new operation starting close to the end of a page." This bug appears to be one reason for that remark. Reported-by: J David <j.david.lists@gmail.com> Closes: https://lore.kernel.org/linux-nfs/3998d739-c042-46b4-8166-dbd6c5f0e804@oracle.com/T/#t Tested-by: Rick Macklem <rmacklem@uoguelph.ca> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: fix UAF when access ex_uuid or ex_statsYang Erkun
We can access exp->ex_stats or exp->ex_uuid in rcu context(c_show and e_show). All these resources should be released using kfree_rcu. Fix this by using call_rcu, clean them all after a rcu grace period. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: ae74136b4bb6 ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: no need get cache ref when protected by rcuYang Erkun
rcu_read_lock/rcu_read_unlock has already provide protection for the pointer we will reference when we call e_show. Therefore, there is no need to obtain a cache reference to help protect cache_head. Additionally, the .put such as expkey_put/svc_export_put will invoke dput, which can sleep and break rcu. Stop get cache reference to fix them all. Fixes: ae74136b4bb6 ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock") Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06NFSD: add cb opcode to WARN_ONCE on failed callbackOlga Kornievskaia
It helps to know what kind of callback happened that triggered the WARN_ONCE in nfsd4_cb_done() function in diagnosing what can set an uncommon state where both cb_status and tk_status are set at the same time. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06NFSD: fix decoding in nfs4_xdr_dec_cb_getattrOlga Kornievskaia
If a client were to send an error to a CB_GETATTR call, the code erronously continues to try decode past the error code. It ends up returning BAD_XDR error to the rpc layer and then in turn trigger a WARN_ONCE in nfsd4_cb_done() function. Fixes: 6487a13b5c6b ("NFSD: add support for CB_GETATTR callback") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: add shrinker to reduce number of slots allocated per sessionNeilBrown
Add a shrinker which frees unused slots and may ask the clients to use fewer slots on each session. We keep a global count of the number of freeable slots, which is the sum of one less than the current "target" slots in all sessions in all clients in all net-namespaces. This number is reported by the shrinker. When the shrinker is asked to free some, we call xxx on each session in a round-robin asking each to reduce the slot count by 1. This will reduce the "target" so the number reported by the shrinker will reduce immediately. The memory will only be freed later when the client confirmed that it is no longer needed. We use a global list of sessions and move the "head" to after the last session that we asked to reduce, so the next callback from the shrinker will move on to the next session. This pressure should be applied "evenly" across all sessions over time. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: add support for freeing unused session-DRC slotsNeilBrown
Reducing the number of slots in the session slot table requires confirmation from the client. This patch adds reduce_session_slots() which starts the process of getting confirmation, but never calls it. That will come in a later patch. Before we can free a slot we need to confirm that the client won't try to use it again. This involves returning a lower cr_maxrequests in a SEQUENCE reply and then seeing a ca_maxrequests on the same slot which is not larger than we limit we are trying to impose. So for each slot we need to remember that we have sent a reduced cr_maxrequests. To achieve this we introduce a concept of request "generations". Each time we decide to reduce cr_maxrequests we increment the generation number, and record this when we return the lower cr_maxrequests to the client. When a slot with the current generation reports a low ca_maxrequests, we commit to that level and free extra slots. We use an 16 bit generation number (64 seems wasteful) and if it cycles we iterate all slots and reset the generation number to avoid false matches. When we free a slot we store the seqid in the slot pointer so that it can be restored when we reactivate the slot. The RFC can be read as suggesting that the slot number could restart from one after a slot is retired and reactivated, but also suggests that retiring slots is not required. So when we reactive a slot we accept with the next seqid in sequence, or 1. When decoding sa_highest_slotid into maxslots we need to add 1 - this matches how it is encoded for the reply. se_dead is moved in struct nfsd4_session to remove a hole. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: allocate new session-based DRC slots on demand.NeilBrown
If a client ever uses the highest available slot for a given session, attempt to allocate more slots so there is room for the client to use them if wanted. GFP_NOWAIT is used so if there is not plenty of free memory, failure is expected - which is what we want. It also allows the allocation while holding a spinlock. Each time we increase the number of slots by 20% (rounded up). This allows fairly quick growth while avoiding excessive over-shoot. We would expect to stablise with around 10% more slots available than the client actually uses. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: add session slot count to /proc/fs/nfsd/clients/*/infoNeilBrown
Each client now reports the number of slots allocated in each session. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: remove artificial limits on the session-based DRCNeilBrown
Rather than guessing how much space it might be safe to use for the DRC, simply try allocating slots and be prepared to accept failure. The first slot for each session is allocated with GFP_KERNEL which is unlikely to fail. Subsequent slots are allocated with the addition of __GFP_NORETRY which is expected to fail if there isn't much free memory. This is probably too aggressive but clears the way for adding a shrinker interface to free extra slots when memory is tight. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: use an xarray to store v4.1 session slotsNeilBrown
Using an xarray to store session slots will make it easier to change the number of active slots based on demand, and removes an unnecessary limit. To achieve good throughput with a high-latency server it can be helpful to have hundreds of concurrent writes, which means hundreds of slots. So increase the limit to 2048 (twice what the Linux client will currently use). This limit is only a sanity check, not a hard limit. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06sunrpc: remove all connection limit configurationNeilBrown
Now that the connection limit only apply to unconfirmed connections, there is no need to configure it. So remove all the configuration and fix the number of unconfirmed connections as always 64 - which is now given a name: XPT_MAX_TMP_CONN Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: don't use sv_nrthreads in connection limiting calculations.NeilBrown
The heuristic for limiting the number of incoming connections to nfsd currently uses sv_nrthreads - allowing more connections if more threads were configured. A future patch will allow number of threads to grow dynamically so that there will be no need to configure sv_nrthreads. So we need a different solution for limiting connections. It isn't clear what problem is solved by limiting connections (as mentioned in a code comment) but the most likely problem is a connection storm - many connections that are not doing productive work. These will be closed after about 6 minutes already but it might help to slow down a storm. This patch adds a per-connection flag XPT_PEER_VALID which indicates that the peer has presented a filehandle for which it has some sort of access. i.e the peer is known to be trusted in some way. We now only count connections which have NOT been determined to be valid. There should be relative few of these at any given time. If the number of non-validated peer exceed a limit - currently 64 - we close the oldest non-validated peer to avoid having too many of these useless connections. Note that this patch significantly changes the meaning of the various configuration parameters for "max connections". The next patch will remove all of these. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: fix legacy client tracking initializationScott Mayhew
Get rid of the nfsd4_legacy_tracking_ops->init() call in check_for_legacy_methods(). That will be handled in the caller (nfsd4_client_tracking_init()). Otherwise, we'll wind up calling nfsd4_legacy_tracking_ops->init() twice, and the second time we'll trigger the BUG_ON() in nfsd4_init_recdir(). Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking") Reported-by: Jur van der Burg <jur@avtware.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219580 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06NFSD: Clean up unused variableChuck Lever
@sb should have been removed by commit 7e64c5bc497c ("NLM/NFSD: Fix lock notifications for async-capable filesystems"). Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: use new wake_up_var interfaces.NeilBrown
The wake_up_var interface is fragile as barriers are sometimes needed. There are now new interfaces so that most wake-ups can use an interface that is guaranteed to have all barriers needed. This patch changes the wake up on cl_cb_inflight to use atomic_dec_and_wake_up(). It also changes the wake up on rp_locked to use store_release_wake_up(). This involves changing rp_locked from atomic_t to int. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-01-06nfsd: trace: remove redundant stateid even deleg_recallChen Hanxiao
Since commit e56dc9e2949e ("nfsd: remove fault injection code") remove all nfsd_recall_delegations codes, we don't need trace_nfsd_deleg_recall any more. Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-12-23Merge tag 'nfsd-6.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever:: - Revert one v6.13 fix at the author's request (to be done differently) - Fix a minor problem with recent NFSv4.2 COPY enhancements - Fix an NFSv4.0 callback bug introduced in the v6.13 merge window * tag 'nfsd-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: restore callback functionality for NFSv4.0 NFSD: fix management of pending async copies nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
2024-12-20nfsd: restore callback functionality for NFSv4.0NeilBrown
A recent patch inadvertently broke callbacks for NFSv4.0. In the 4.0 case we do not expect a session to be found but still need to call setup_callback_client() which will not try to dereference it. This patch moves the check for failure to find a session into the 4.1+ branch of setup_callback_client() Fixes: 1e02c641c3a4 ("NFSD: Prevent NULL dereference in nfsd4_process_cb_update()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-12-17NFSD: fix management of pending async copiesOlga Kornievskaia
Currently the pending_async_copies count is decremented just before a struct nfsd4_copy is destroyed. After commit aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live") nfsd4_copy structures sticks around for 10 lease periods after the COPY itself has completed, the pending_async_copies count stays high for a long time. This causes NFSD to avoid the use of background copy even though the actual background copy workload might no longer be running. In this patch, decrement pending_async_copies once async copy thread is done processing the copy work. Fixes: aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-12-17nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"Yang Erkun
This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d. Before this commit, svc_export_put or expkey_put will call path_put with sync mode. After this commit, path_put will be called with async mode. And this can lead the unexpected results show as follow. mkfs.xfs -f /dev/sda echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports exportfs -ra service nfs-server start mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1 mount /dev/sda /mnt/sda touch /mnt1/sda/file exportfs -r umount /mnt/sda # failed unexcepted The touch will finally call nfsd_cross_mnt, add refcount to mount, and then add cache_head. Before this commit, exportfs -r will call cache_flush to cleanup all cache_head, and path_put in svc_export_put/expkey_put will be finished with sync mode. So, the latter umount will always success. However, after this commit, path_put will be called with async mode, the latter umount may failed, and if we add some delay, umount will success too. Personally I think this bug and should be fixed. We first revert before bugfix patch, and then fix the original bug with a different way. Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with rcu_work") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>