Age | Commit message (Collapse) | Author |
|
Ext4 will throw -EBADMSG through ext4_readdir when a checksum error
occurs, resulting in the following WARNING.
Fix it by mapping EBADMSG to nfserr_io.
nfsd_buffered_readdir
iterate_dir // -EBADMSG -74
ext4_readdir // .iterate_shared
ext4_dx_readdir
ext4_htree_fill_tree
htree_dirblock_to_tree
ext4_read_dirblock
__ext4_read_dirblock
ext4_dirblock_csum_verify
warn_no_space_for_csum
__warn_no_space_for_csum
return ERR_PTR(-EFSBADCRC) // -EBADMSG -74
nfserrno // WARNING
[ 161.115610] ------------[ cut here ]------------
[ 161.116465] nfsd: non-standard errno: -74
[ 161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0
[ 161.118596] Modules linked in:
[ 161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138
[ 161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qe
mu.org 04/01/2014
[ 161.123601] RIP: 0010:nfserrno+0x9d/0xd0
[ 161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6
05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33
[ 161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286
[ 161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a
[ 161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827
[ 161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021
[ 161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8
[ 161.135244] FS: 0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000
[ 161.136695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0
[ 161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 161.141519] PKRU: 55555554
[ 161.142076] Call Trace:
[ 161.142575] ? __warn+0x9b/0x140
[ 161.143229] ? nfserrno+0x9d/0xd0
[ 161.143872] ? report_bug+0x125/0x150
[ 161.144595] ? handle_bug+0x41/0x90
[ 161.145284] ? exc_invalid_op+0x14/0x70
[ 161.146009] ? asm_exc_invalid_op+0x12/0x20
[ 161.146816] ? nfserrno+0x9d/0xd0
[ 161.147487] nfsd_buffered_readdir+0x28b/0x2b0
[ 161.148333] ? nfsd4_encode_dirent_fattr+0x380/0x380
[ 161.149258] ? nfsd_buffered_filldir+0xf0/0xf0
[ 161.150093] ? wait_for_concurrent_writes+0x170/0x170
[ 161.151004] ? generic_file_llseek_size+0x48/0x160
[ 161.151895] nfsd_readdir+0x132/0x190
[ 161.152606] ? nfsd4_encode_dirent_fattr+0x380/0x380
[ 161.153516] ? nfsd_unlink+0x380/0x380
[ 161.154256] ? override_creds+0x45/0x60
[ 161.155006] nfsd4_encode_readdir+0x21a/0x3d0
[ 161.155850] ? nfsd4_encode_readlink+0x210/0x210
[ 161.156731] ? write_bytes_to_xdr_buf+0x97/0xe0
[ 161.157598] ? __write_bytes_to_xdr_buf+0xd0/0xd0
[ 161.158494] ? lock_downgrade+0x90/0x90
[ 161.159232] ? nfs4svc_decode_voidarg+0x10/0x10
[ 161.160092] nfsd4_encode_operation+0x15a/0x440
[ 161.160959] nfsd4_proc_compound+0x718/0xe90
[ 161.161818] nfsd_dispatch+0x18e/0x2c0
[ 161.162586] svc_process_common+0x786/0xc50
[ 161.163403] ? nfsd_svc+0x380/0x380
[ 161.164137] ? svc_printk+0x160/0x160
[ 161.164846] ? svc_xprt_do_enqueue.part.0+0x365/0x380
[ 161.165808] ? nfsd_svc+0x380/0x380
[ 161.166523] ? rcu_is_watching+0x23/0x40
[ 161.167309] svc_process+0x1a5/0x200
[ 161.168019] nfsd+0x1f5/0x380
[ 161.168663] ? nfsd_shutdown_threads+0x260/0x260
[ 161.169554] kthread+0x1c4/0x210
[ 161.170224] ? kthread_insert_work_sanity_check+0x80/0x80
[ 161.171246] ret_from_fork+0x1f/0x30
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit 5826e09bf3dd ("NFSD: OP_CB_RECALL_ANY should recall both read and
write delegations") added a new assignment statement to add
RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the
old one should be removed.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.
Replace the XDR decoder for PUTPUBFH with a "noop" since we no
longer want the minorversion check, and PUTPUBFH has no arguments to
decode. (Ideally nfsd4_decode_noop should really be called
nfsd4_decode_void).
PUTPUBFH should now behave just like PUTROOTFH.
Reported-by: Cedric Blancher <cedric.blancher@gmail.com>
Fixes: e1a90ebd8b23 ("NFSD: Combine decode operations for v4 and v4.1")
Cc: Dan Shelton <dan.f.shelton@gmail.com>
Cc: Roland Mainz <roland.mainz@nrubsig.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The 'callback address' in client_info_show is output without quotes
causing yaml parsers to fail on processing IPv6 addresses.
Adding quotes to 'callback address' also matches that used by
the 'address' field.
Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If an NFS operation expects a particular sort of object (file, dir, link,
etc) but gets a file handle for a different sort of object, it must
return an error. The actual error varies among NFS versions in non-trivial
ways.
For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
INVAL is suitable.
For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
was found when not expected. This take precedence over NOTDIR.
For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
preference to EINVAL when none of the specific error codes apply.
When nfsd_mode_check() finds a symlink where it expected a directory it
needs to return an error code that can be converted to NOTDIR for v2 or
v3 but will be SYMLINK for v4. It must be different from the error
code returns when it finds a symlink but expects a regular file - that
must be converted to EINVAL or SYMLINK.
So we introduce an internal error code nfserr_symlink_not_dir which each
version converts as appropriate.
nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
only used by NFSv4 and only for OPEN. NFSERR_INVAL is never a suitable
error if the object is the wrong time. For v4.0 we use nfserr_symlink
for non-dirs even if not a symlink. For v4.1 we have nfserr_wrong_type.
We handle this difference in-place in nfsd_check_obj_isreg() as there is
nothing to be gained by delaying the choice to nfsd4_map_status().
As a result of these changes, nfsd_mode_check() doesn't need an rqstp
arg any more.
Note that NFSv4 operations are actually performed in the xdr code(!!!)
so to the only place that we can map the status code successfully is in
nfsd4_encode_operation().
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Rather than using ad hoc values for internal errors (30000, 11000, ...)
use 'enum' to sequentially allocate numbers starting from the first
known available number - now visible as NFS4ERR_FIRST_FREE.
The goal is values that are distinct from all be32 error codes. To get
those we must first select integers that are not already used, then
convert them with cpu_to_be32().
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
There is code scattered around nfsd which chooses an error status based
on the particular version of nfs being used. It is cleaner to have the
version specific choices in version specific code.
With this patch common code returns the most specific error code
possible and the version specific code maps that if necessary.
Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()"
function which is called to map the resp->status before each non-trivial
nfsd_proc_* or nfsd3_proc_* function returns.
NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and
are left for a later patch.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This further centralizes version number checks.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
With this patch the only places that test ->rq_vers against a specific
version are nfsd_v4client() and nfsd_set_fh_dentry().
The latter sets some flags in the svc_fh, which now includes:
fh_64bit_cookies
fh_use_wgather
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd_breaker_owns_lease() currently open-codes the same test that
nfsd_v4client() performs.
With this patch we use nfsd_v4client() instead.
Also as i_am_nfsd() is only used in combination with kthread_data(),
replace it with nfsd_current_rqst() which combines the two and returns a
valid svc_rqst, or NULL.
The test for NULL is moved into nfsd_v4client() for code clarity.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags()
only ever need the cred out of rqstp, so pass it explicitly instead of
the whole rqstp.
This makes the interfaces cleaner.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Rather than passing the whole rqst, pass the pieces that are actually
needed. This makes the inputs to rqst_exp_find() more obvious.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Move the stateid handling to nfsd4_copy_notify.
If nfs4_preprocess_stateid_op did not produce an output stateid, error out.
Copy notify specifically does not permit the use of special stateids,
so enforce that outside generic stateid pre-processing.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If an svc thread needs to perform some initialisation that might fail,
it has no good way to handle the failure.
Before the thread can exit it must call svc_exit_thread(), but that
requires the service mutex to be held. The thread cannot simply take
the mutex as that could deadlock if there is a concurrent attempt to
shut down all threads (which is unlikely, but not impossible).
nfsd currently call svc_exit_thread() unprotected in the unlikely event
that unshare_fs_struct() fails.
We can clean this up by introducing svc_thread_init_status() by which an
svc thread can report whether initialisation has succeeded. If it has,
it continues normally into the action loop. If it has not,
svc_thread_init_status() immediately aborts the thread.
svc_start_kthread() waits for either of these to happen, and calls
svc_exit_thread() (under the mutex) if the thread aborted.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
sp_nrthreads is only ever accessed under the service mutex
nlmsvc_mutex nfs_callback_mutex nfsd_mutex
so these is no need for it to be an atomic_t.
The fact that all code using it is single-threaded means that we can
simplify svc_pool_victim and remove the temporary elevation of
sp_nrthreads.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Instead of using kmalloc to allocate an array for storing active version
info, just declare an array to the max size - it is only 5 or so.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd_pool_stats_open() is used in nfsctl.c, so move it there.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd4_ssc_init_umount_work() is only used in the nfsd module, so there
is no need to EXPORT it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
After many rounds of changes in filecache.c, the fix by commit
ce7df055(NFSD: Make the file_delayed_close workqueue UNBOUND)
is gone, now we are getting syslog messages like these:
[ 1618.186688] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND
[ 1638.661616] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND
[ 1665.284542] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 16 times, consider switching to WQ_UNBOUND
[ 1759.491342] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 32 times, consider switching to WQ_UNBOUND
[ 3013.012308] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 64 times, consider switching to WQ_UNBOUND
[ 3154.172827] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND
[ 3422.461924] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 256 times, consider switching to WQ_UNBOUND
[ 3963.152054] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 512 times, consider switching to WQ_UNBOUND
Consider use system_unbound_wq instead of system_wq for
nfsd_file_gc_worker().
Signed-off-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
We already count the frees (via nfsd_file_releases). Count the
allocations as well. Also switch the direct call to nfsd_file_slab_free
in nfsd_file_do_acquire to nfsd_file_free, so that the allocs and
releases match up.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If we wait_for_construction and find that the file is no longer hashed,
and we're going to retry the open, the old nfsd_file reference is
currently leaked. Put the reference before retrying.
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Given that we do the search and insertion while holding the i_lock, I
don't think it's possible for us to get EEXIST here. Remove this case.
Fixes: c6593366c0bf ("nfsd: don't kill nfsd_files because of lease break error")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Youzhong Yang <youzhong@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
nfsd_file_put() in one thread can race with another thread doing
garbage collection (running nfsd_file_gc() -> list_lru_walk() ->
nfsd_file_lru_cb()):
* In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
* nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set)
* garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and
returns LRU_ROTATE.
* garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref
to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose
list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
* nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added
to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply
treats it as part of the LRU and removes it, which leads to its removal from
the 'dispose' list.
* At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU.
nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))],
as nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf'
gets no chance of being freed.
nfsd_file_put() can also race with nfsd_file_cond_queue():
* In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
* nfsd_file_lru_add() sets REFERENCED bit and returns true.
* Some userland application runs 'exportfs -f' or something like that, which triggers
__nfsd_file_cache_purge() -> nfsd_file_cond_queue().
* In nfsd_file_cond_queue(), it runs [if (!nfsd_file_unhash(nf))], unhash is done
successfully.
* nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2.
* nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds.
* nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))]
(with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the
dispose list [list_add(&nf->nf_lru, dispose)]
* nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))], although the 'nf' is not
in the LRU, but it is linked in the 'dispose' list, nfsd_file_lru_remove() simply
treats it as part of the LRU and removes it. This leads to its removal from
the 'dispose' list!
* Now nf->ref is 0, unhashed. nfsd_file_put() continues its execution and set
nf->nf_ref to REFCOUNT_SATURATED.
As shown in the above analysis, using nf_lru for both the LRU list and dispose list
can cause the leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses
it for the dispose list. This does not fix the nfsd_file leaking issue completely.
Signed-off-by: Youzhong Yang <youzhong@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- One more write delegation fix
* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
|
|
It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager. nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway. This is wrong.
With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below. This is the same as the
current code, but without any reference to a possible delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix a number of crashers
- Update email address for an NFSD reviewer
* tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
fs/nfsd: fix update of inode attrs in CB_GETATTR
nfsd: fix potential UAF in nfsd4_cb_getattr_release
nfsd: hold reference to delegation when updating it for cb_getattr
MAINTAINERS: Update Olga Kornievskaia's email address
nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open
nfsd: ensure that nfsd4_fattr_args.context is zeroed out
|
|
Currently, we copy the mtime and ctime to the in-core inode and then
mark the inode dirty. This is fine for certain types of filesystems, but
not all. Some require a real setattr to properly change these values
(e.g. ceph or reexported NFS).
Fix this code to call notify_change() instead, which is the proper way
to effect a setattr. There is one problem though:
In this case, the client is holding a write delegation and has sent us
attributes to update our cache. We don't want to break the delegation
for this since that would defeat the purpose. Add a new ATTR_DELEG flag
that makes notify_change bypass the try_break_deleg call.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Once we drop the delegation reference, the fields embedded in it are no
longer safe to access. Do that last.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Once we've dropped the flc_lock, there is nothing that ensures that the
delegation that was found will still be around later. Take a reference
to it while holding the lock and then drop it when we've finished with
the delegation.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Prior to commit 3f29cc82a84c ("nfsd: split sc_status out of
sc_type") states_show() relied on sc_type field to be of valid
type before calling into a subfunction to show content of a
particular stateid. From that commit, we split the validity of
the stateid into sc_status and no longer changed sc_type to 0
while unhashing the stateid. This resulted in kernel oopsing
for nfsv4.0 opens that stay around and in nfs4_show_open()
would derefence sc_file which was NULL.
Instead, for closed open stateids forgo displaying information
that relies of having a valid sc_file.
To reproduce: mount the server with 4.0, read and close
a file and then on the server cat /proc/fs/nfsd/clients/2/states
[ 513.590804] Call trace:
[ 513.590925] _raw_spin_lock+0xcc/0x160
[ 513.591119] nfs4_show_open+0x78/0x2c0 [nfsd]
[ 513.591412] states_show+0x44c/0x488 [nfsd]
[ 513.591681] seq_read_iter+0x5d8/0x760
[ 513.591896] seq_read+0x188/0x208
[ 513.592075] vfs_read+0x148/0x470
[ 513.592241] ksys_read+0xcc/0x178
Fixes: 3f29cc82a84c ("nfsd: split sc_status out of sc_type")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If nfsd4_encode_fattr4 ends up doing a "goto out" before we get to
checking for the security label, then args.context will be set to
uninitialized junk on the stack, which we'll then try to free.
Initialize it early.
Fixes: f59388a579c6 ("NFSD: Add nfsd4_encode_fattr4_sec_label()")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
|
|
When creating nfsd sockets via the netlink interface, we do want to
register with the portmapper. Don't set SVC_SOCK_ANONYMOUS.
Reported-by: Steve Dickson <steved@redhat.com>
Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command")
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull nfsd updates from Chuck Lever:
"This is a light release containing optimizations, code clean-ups, and
minor bug fixes.
This development cycle focused on work outside of upstream kernel
development:
- Continuing to build upstream CI for NFSD based on kdevops
- Continuing to focus on the quality of NFSD in LTS kernels
- Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features for v6.11 that do not come through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one has
been introduced to our kdevops CI harness. Work on improving the
resolution of file attribute time stamps in local filesystems is also
ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers, and
bug reporters who participated during this cycle"
* tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey
MAINTAINERS: Add a bugzilla link for NFSD
nfsd: new netlink ops to get/set server pool_mode
sunrpc: refactor pool_mode setting code
nfsd: allow passing in array of thread counts via netlink
nfsd: make nfsd_svc take an array of thread counts
sunrpc: fix up the special handling of sv_nrpools == 1
SUNRPC: Add a trace point in svc_xprt_deferred_close
NFSD: Support write delegations in LAYOUTGET
lockd: Use *-y instead of *-objs in Makefile
NFSD: Fix nfsdcld warning
svcrdma: Handle ADDR_CHANGE CM event properly
svcrdma: Refactor the creation of listener CMA ID
NFSD: remove unused structs 'nfsd3_voidargs'
NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support passing NULL along AT_EMPTY_PATH for statx().
NULL paths with any flag value other than AT_EMPTY_PATH go the
usual route and end up with -EFAULT to retain compatibility (Rust
is abusing calls of the sort to detect availability of statx)
This avoids path lookup code, lockref management, memory allocation
and in case of NULL path userspace memory access (which can be
quite expensive with SMAP on x86_64)
- Don't block i_writecount during exec. Remove the
deny_write_access() mechanism for executables
- Relax open_by_handle_at() permissions in specific cases where we
can prove that the caller had sufficient privileges to open a file
- Switch timespec64 fields in struct inode to discrete integers
freeing up 4 bytes
Fixes:
- Fix false positive circular locking warning in hfsplus
- Initialize hfs_inode_info after hfs_alloc_inode() in hfs
- Avoid accidental overflows in vfs_fallocate()
- Don't interrupt fallocate with EINTR in tmpfs to avoid constantly
restarting shmem_fallocate()
- Add missing quote in comment in fs/readdir
Cleanups:
- Don't assign and test in an if statement in mqueue. Move the
assignment out of the if statement
- Reflow the logic in may_create_in_sticky()
- Remove the usage of the deprecated ida_simple_xx() API from procfs
- Reject FSCONFIG_CMD_CREATE_EXCL requets that depend on the new
mount api early
- Rename variables in copy_tree() to make it easier to understand
- Replace WARN(down_read_trylock, ...) abuse with proper asserts in
various places in the VFS
- Get rid of user_path_at_empty() and drop the empty argument from
getname_flags()
- Check for error while copying and no path in one branch in
getname_flags()
- Avoid redundant smp_mb() for THP handling in do_dentry_open()
- Rename parent_ino to d_parent_ino and make it use RCU
- Remove unused header include in fs/readdir
- Export in_group_capable() helper and switch f2fs and fuse over to
it instead of open-coding the logic in both places"
* tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
ipc: mqueue: remove assignment from IS_ERR argument
vfs: rename parent_ino to d_parent_ino and make it use RCU
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
stat: use vfs_empty_path() helper
fs: new helper vfs_empty_path()
fs: reflow may_create_in_sticky()
vfs: remove redundant smp_mb for thp handling in do_dentry_open
fuse: Use in_group_or_capable() helper
f2fs: Use in_group_or_capable() helper
fs: Export in_group_or_capable()
vfs: reorder checks in may_create_in_sticky
hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()
proc: Remove usage of the deprecated ida_simple_xx() API
hfsplus: fix to avoid false alarm of circular locking
Improve readability of copy_tree
vfs: shave a branch in getname_flags
vfs: retire user_path_at_empty and drop empty arg from getname_flags
vfs: stop using user_path_at_empty in do_readlinkat
tmpfs: don't interrupt fallocate with EINTR
fs: don't block i_writecount during exec
...
|
|
"data" actually refers to a file_lease and not a file_lock. Both structs
have their file_lock_core as the first field though, so this bug should
be harmless without struct randomization in play.
Reported-by: Florian Evers <florian-evers@gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219008
Fixes: 05580bbfc6bc ("nfsd: adapt to breakup of struct file_lock")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Florian Evers <florian-evers@gmx.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that nfsd_svc can handle an array of thread counts, fix up the
netlink threads interface to construct one from the netlink call
and pass it through so we can start a pooled server the same way we
would start a normal one.
Note that any unspecified values in the array are considered zeroes,
so it's possible to shut down a pooled server by passing in a short
array that has only zeros, or even an empty array.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that the refcounting is fixed, rework nfsd_svc to use the same
thread setup as the pool_threads interface. Have it take an array of
thread counts instead of just a single value, and pass that from the
netlink threads set interface. Since the new netlink interface doesn't
have the same restriction as pool_threads, move the guard against
shutting down all threads to write_pool_threads.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
I noticed LAYOUTGET(LAYOUTIOMODE4_RW) returning NFS4ERR_ACCESS
unexpectedly. The NFS client had created a file with mode 0444, and
the server had returned a write delegation on the OPEN(CREATE). The
client was requesting a RW layout using the write delegation stateid
so that it could flush file modifications.
Creating a read-only file does not seem to be problematic for
NFSv4.1 without pNFS, so I began looking at NFSD's implementation of
LAYOUTGET.
The failure was because fh_verify() was doing a permission check as
part of verifying the FH presented during the LAYOUTGET. It uses the
loga_iomode value to specify the @accmode argument to fh_verify().
fh_verify(MAY_WRITE) on a file whose mode is 0444 fails with -EACCES.
To permit LAYOUT* operations in this case, add OWNER_OVERRIDE when
checking the access permission of the incoming file handle for
LAYOUTGET and LAYOUTCOMMIT.
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v6.6+
Message-Id: 4E9C0D74-A06D-4DC3-A48A-73034DC40395@oracle.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since CONFIG_NFSD_LEGACY_CLIENT_TRACKING is a new config option, its
initial default setting should have been Y (if we are to follow the
common practice of "default Y, wait, default N, wait, remove code").
Paul also suggested adding a clearer remedy action to the warning
message.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Message-Id: <d2ab4ee7-ba0f-44ac-b921-90c8fa5a04d2@molgen.mpg.de>
Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
'nfsd3_voidargs' in nfs[23]acl.c is unused since
commit 788f7183fba8 ("NFSD: Add common helpers to decode void args and
encode void results").
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
These lengths come from xdr_stream_decode_u32() and so we should be a
bit careful with them. Use size_add() and struct_size() to avoid
integer overflows. Saving size_add()/struct_size() results to a u32 is
unsafe because it truncates away the high bits.
Also generally storing sizes in longs is safer. Most systems these days
use 64 bit CPUs. It's harder for an addition to overflow 64 bits than
it is to overflow 32 bits. Also functions like vmalloc() can
successfully allocate UINT_MAX bytes, but nothing can allocate ULONG_MAX
bytes.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Due to a late review, revert and re-fix a recent crasher fix
* tag 'nfsd-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "nfsd: fix oops when reading pool_stats before server is started"
nfsd: initialise nfsd_info.mutex early.
|
|
nfsd_info.mutex can be dereferenced by svc_pool_stats_start()
immediately after the new netns is created. Currently this can
trigger an oops.
Move the initialisation earlier before it can possibly be dereferenced.
Fixes: 7b207ccd9833 ("svc: don't hold reference for poolstats, only mutex.")
Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Closes: https://lore.kernel.org/all/c2e9f6de-1ec4-4d3a-b18d-d5a6ec0814a0@linux.ibm.com/
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix crashes triggered by administrative operations on the server
* tag 'nfsd-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: grab nfsd_mutex in nfsd_nl_rpc_status_get_dumpit()
nfsd: fix oops when reading pool_stats before server is started
|
|
Grab nfsd_mutex lock in nfsd_nl_rpc_status_get_dumpit routine and remove
nfsd_nl_rpc_status_get_start() and nfsd_nl_rpc_status_get_done(). This
patch fix the syzbot log reported below:
INFO: task syz-executor.1:17770 blocked for more than 143 seconds.
Not tainted 6.10.0-rc3-syzkaller-00022-gcea2a26553ac #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.1 state:D stack:23800 pid:17770 tgid:17767 ppid:11381 flags:0x00000006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5408 [inline]
__schedule+0x17e8/0x4a20 kernel/sched/core.c:6745
__schedule_loop kernel/sched/core.c:6822 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6837
schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6894
__mutex_lock_common kernel/locking/mutex.c:684 [inline]
__mutex_lock+0x6a4/0xd70 kernel/locking/mutex.c:752
nfsd_nl_listener_get_doit+0x115/0x5d0 fs/nfsd/nfsctl.c:2124
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb16/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e5/0x430 net/netlink/af_netlink.c:2564
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
netlink_unicast+0x7ec/0x980 net/netlink/af_netlink.c:1361
netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x223/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f24ed27cea9
RSP: 002b:00007f24ee0080c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f24ed3b3f80 RCX: 00007f24ed27cea9
RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000005
RBP: 00007f24ed2ebff4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Fixes: 1bd773b4f0c9 ("nfsd: hold nfsd_mutex across entire netlink operation")
Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
A current limitation of open_by_handle_at() is that it's currently not possible
to use it from within containers at all because we require CAP_DAC_READ_SEARCH
in the initial namespace. That's unfortunate because there are scenarios where
using open_by_handle_at() from within containers.
Two examples:
(1) cgroupfs allows to encode cgroups to file handles and reopen them with
open_by_handle_at().
(2) Fanotify allows placing filesystem watches they currently aren't usable in
containers because the returned file handles cannot be used.
Here's a proposal for relaxing the permission check for open_by_handle_at().
(1) Opening file handles when the caller has privileges over the filesystem
(1.1) The caller has an unobstructed view of the filesystem.
(1.2) The caller has permissions to follow a path to the file handle.
This doesn't address the problem of opening a file handle when only a portion
of a filesystem is exposed as is common in containers by e.g., bind-mounting a
subtree. The proposal to solve this use-case is:
(2) Opening file handles when the caller has privileges over a subtree
(2.1) The caller is able to reach the file from the provided mount fd.
(2.2) The caller has permissions to construct an unobstructed path to the
file handle.
(2.3) The caller has permissions to follow a path to the file handle.
The relaxed permission checks are currently restricted to directory file
handles which are what both cgroupfs and fanotify need. Handling disconnected
non-directory file handles would lead to a potentially non-deterministic api.
If a disconnected non-directory file handle is provided we may fail to decode
a valid path that we could use for permission checking. That in itself isn't a
problem as we would just return EACCES in that case. However, confusion may
arise if a non-disconnected dentry ends up in the cache later and those opening
the file handle would suddenly succeed.
* It's potentially possible to use timing information (side-channel) to infer
whether a given inode exists. I don't think that's particularly
problematic. Thanks to Jann for bringing this to my attention.
* An unrelated note (IOW, these are thoughts that apply to
open_by_handle_at() generically and are unrelated to the changes here):
Jann pointed out that we should verify whether deleted files could
potentially be reopened through open_by_handle_at(). I don't think that's
possible though.
Another potential thing to check is whether open_by_handle_at() could be
abused to open internal stuff like memfds or gpu stuff. I don't think so
but I haven't had the time to completely verify this.
This dates back to discussions Amir and I had quite some time ago and thanks to
him for providing a lot of details around the export code and related patches!
Link: https://lore.kernel.org/r/20240524-vfs-open_by_handle_at-v1-1-3d4b7d22736b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- reduce overhead of fsnotify infrastructure when no permission events
are in use
- a few small cleanups
* tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix UAF from FS_ERROR event on a shutting down filesystem
fsnotify: optimize the case of no permission event watchers
fsnotify: use an enum for group priority constants
fsnotify: move s_fsnotify_connectors into fsnotify_sb_info
fsnotify: lazy attach fsnotify_sb_info state to sb
fsnotify: create helper fsnotify_update_sb_watchers()
fsnotify: pass object pointer and type to fsnotify mark helpers
fanotify: merge two checks regarding add of ignore mark
fsnotify: create a wrapper fsnotify_find_inode_mark()
fsnotify: create helpers to get sb and connp from object
fsnotify: rename fsnotify_{get,put}_sb_connectors()
fsnotify: Avoid -Wflex-array-member-not-at-end warning
fanotify: remove unneeded sub-zero check for unsigned value
|