|
NFSv3 includes pre/post wcc attributes which allow the client to
determine if all changes to the file have been made by the client
itself, or if any might have been made by some other client.
If there are gaps in the pre/post ctime sequence it must be assumed that
some other client changed the file in that gap and the local cache must
be suspect. The next time the file is opened the cache should be
invalidated.
Since Commit 1c341b777501 ("NFS: Add deferred cache invalidation for
close-to-open consistency violations") in linux 5.3 the Linux client has
been triggering this invalidation. The chunk in nfs_update_inode() in
particular triggers it.
Unfortunately Linux NFS assumes that all replies will be processed in
the order sent, and will arrive in the order processed. This is not
true in general. Consequently Linux NFS might ignore the wcc info in a
WRITE reply because the reply is in response to a WRITE that was sent
before some other request for which a reply has already been seen. This
is detected by Linux using the gencount tests in nfs_inode_attr_cmp().
Also, when the gencount tests pass it is still possible that the requests
were processed on the server in a different order, and a gap seen in
the ctime sequence might be filled in by a subsequent reply, so gaps
should not immediately trigger delayed invalidation.
The net result is that writing to a server and then reading the file
back can result in going to the server for the read rather than serving
it from cache - all because a couple of replies arrived out-of-order.
This is a performance regression over kernels before 5.3, though the
change in 5.3 is a correctness improvement.
This has been seen with Linux writing to a Netapp server which
occasionally re-orders requests. In testing the majority of requests
were in-order, but a few (maybe two or three at a time) could be
re-ordered.
This patch addresses the problem by recording any gaps seen in the
pre/post ctime sequence and not triggering invalidation until either
there are too many gaps to fit in the table, or until there are no more
active writes and the remaining gaps cannot be resolved.
We allocate a table of 16 gaps on demand. If the allocation fails we
revert to the current behaviour, which is of little cost as we are
unlikely to be able to cache the writes anyway.
In the table we store a "start->end" pair when iversion is updated and an
"end<-start" pair for each pre/post pair reported by the server. Usually
these exactly cancel out and so nothing is stored. When there are
out-of-order replies we do store gaps and these will eventually be
cancelled against later replies when this client is the only writer.
If the final write is out-of-order there may be one gap remaining when
the file is closed. This will be noticed, and if there is precisely one
gap and the iversion can be advanced to match it, then we do so.
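To illustrate the pairing idea, a minimal sketch (structure and function
names here are illustrative, not the patch's own):

  #define NFS_MAX_CTIME_GAPS 16

  struct nfs_ctime_gap {
      u64 start;    /* value on the near side of the gap */
      u64 end;      /* value on the far side of the gap */
  };

  /*
   * Record a "start->end" step (a local iversion update) or an
   * "end<-start" pre/post pair from a server reply (caller passes
   * the pair reversed).  A new pair that exactly cancels a stored
   * one removes it; otherwise it is stored.  Returns false when the
   * table overflows, at which point the cache must be invalidated
   * as before.
   */
  static bool nfs_note_gap(struct nfs_ctime_gap *tbl, unsigned int *nr,
                           u64 start, u64 end)
  {
      unsigned int i;

      for (i = 0; i < *nr; i++) {
          if (tbl[i].start == end && tbl[i].end == start) {
              tbl[i] = tbl[--(*nr)];    /* exact cancellation */
              return true;
          }
      }
      if (*nr >= NFS_MAX_CTIME_GAPS)
          return false;
      tbl[*nr].start = start;
      tbl[*nr].end = end;
      (*nr)++;
      return true;
  }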
This patch makes no attempt to handle directories correctly. The same
problem potentially exists there: out-of-order replies to create/unlink
requests can cause future lookup requests to be sent to the server
unnecessarily. A similar scheme using the same primitives could be used
to notice and handle out-of-order replies.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix fast checksum implementation detection; this affects filesystems
  with a non-crc32c checksum, where the calculation would not be
  offloaded to worker threads
- restore thread_pool mount option behaviour for endio workers; the new
  value for the maximum number of active threads was not being set on
  the actual work queues
* tag 'for-6.3-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fast csum implementation detection
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
|
|
The NFS specific trace points are no longer needed as tracing is well
covered by netfs and fscache.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The old NFSIOS_FSCACHE counters are no longer accurate or useful with
the conversion to the new netfs API. The new API does not have a page
based interface, and so the counters in nfs_stat_fscachecounters are
no longer obtainable. The new netfs API has extensive statistics
inside /proc/fs/fscache/stats, so we no longer need NFS-specific fscache
stats.
Note this also removes the 'fsc:' line from /proc/self/mountstats, so
it is a user-visible change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Convert the NFS buffered read code paths to corresponding netfs APIs,
but only when fscache is configured and enabled.
The netfs API defines struct netfs_request_ops which must be filled
in by the network filesystem. For NFS, we only need to define 5 of
the functions, the main one being the issue_read() function.
The issue_read() function is called by the netfs layer when a read
cannot be fulfilled locally, and must be sent to the server (either
the cache is not active, or it is active but the data is not available).
Once the read from the server is complete, netfs requires a call to
netfs_subreq_terminated() which conveys either how many bytes were read
successfully, or an error. Note that issue_read() is called with a
structure, netfs_io_subrequest, which defines the IO requested, and
contains a start and a length (both in bytes), and assumes the underlying
netfs will return either an error for the whole region or the number
of bytes successfully read.
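As a rough sketch of this contract (the callback name
nfs_netfs_issue_read appears in this series; the body and the
nfs_do_read_range() helper here are illustrative only):

  static void nfs_netfs_issue_read(struct netfs_io_subrequest *sreq)
  {
      /* sreq->start and sreq->len give the byte range to read */
      int err = nfs_do_read_range(sreq);    /* hypothetical helper */

      if (err)
          /* an error covers the whole region ... */
          netfs_subreq_terminated(sreq, err, false);
      /*
       * ... otherwise the subrequest is terminated later, when the
       * last RPC finishes, via:
       *   netfs_subreq_terminated(sreq, bytes_transferred, false);
       */
  }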
The NFS IO path is page based and the main APIs are the pgio APIs defined
in pagelist.c. For the pgio APIs, there is no way for the caller to
know how many RPCs will be sent and how the pages will be broken up
into underlying RPCs, each of which will have their own completion and
return code. In contrast, netfs is subrequest-based: a single
subrequest may contain multiple pages, and a single subrequest is
initiated with issue_read() and terminated with netfs_subreq_terminated().
Thus, to utilize the netfs APIs, NFS needs some way to accommodate
the netfs API requirement on the single response to the whole
subrequest, while also minimizing disruptive changes to the NFS
pgio layer.
The approach taken with this patch is to allocate a small structure
for each nfs_netfs_issue_read() call, store the final error and number
of bytes successfully transferred in the structure, and update these values
as each RPC completes. The refcount on the structure is used as a marker
for the last RPC completion, is incremented in nfs_netfs_read_initiate(),
and decremented inside nfs_netfs_read_completion(), when an nfs_pgio_header
contains a valid pointer to the data. On the final put (which signals
the final outstanding RPC is complete) in nfs_netfs_read_completion(),
call netfs_subreq_terminated() with either the final error value (if
one or more READs complete with an error) or the number of bytes
successfully transferred (if all RPCs complete successfully). Note
that when all RPCs complete successfully, the number of bytes transferred
is capped to the length of the subrequest. Capping the transferred length
to the subrequest length prevents "Subreq overread" warnings from netfs.
This is due to the "aligned_len" in nfs_pageio_add_page(), and the
corner case where NFS requests a full page at the end of the file,
even when i_size reflects only a partial page (NFS overread).
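A hedged sketch of that bookkeeping (field and helper names are
illustrative; the real structure lives in the NFS fscache code):

  struct nfs_netfs_io_data {
      struct netfs_io_subrequest *sreq;
      refcount_t refcount;      /* one reference per outstanding RPC */
      ssize_t transferred;      /* bytes successfully read so far */
      int error;                /* final error, if any READ failed */
  };

  static void nfs_netfs_put(struct nfs_netfs_io_data *d)
  {
      if (!refcount_dec_and_test(&d->refcount))
          return;
      /* last outstanding RPC: report one result for the subrequest */
      if (d->error)
          netfs_subreq_terminated(d->sreq, d->error, false);
      else
          netfs_subreq_terminated(d->sreq,
                      min_t(ssize_t, d->transferred, d->sreq->len),
                      false);
      kfree(d);
  }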
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
As a first step toward supporting the netfs library when NFS_FSCACHE is
configured, add NETFS_SUPPORT to Kconfig and add the required netfs_inode
into struct nfs_inode.
Using netfs requires we move the VFS inode structure to be stored
inside struct netfs_inode, along with the fscache_cookie.
Thus, if NFS_FSCACHE is configured, place netfs_inode inside an
anonymous union so the vfs_inode memory is the same and we do
not need to modify other non-fscache areas of NFS.
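The resulting layout is roughly (sketch; struct netfs_inode embeds the
VFS inode as its first member, so vfs_inode sits at the same offset
either way):

  struct nfs_inode {
      /* ... NFS-specific fields ... */
      union {
          struct inode vfs_inode;
  #ifdef CONFIG_NFS_FSCACHE
          struct netfs_inode netfs;   /* netfs.inode overlays vfs_inode */
  #endif
      };
  };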
In addition, inside the NFS fscache code, use the new netfs_inode()
and netfs_i_cookie() helpers, and remove our own helper,
nfs_i_fscache().
Later patches will convert NFS fscache to fully use netfs.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Rename readpage_async_filler to nfs_read_add_folio to
better reflect what this function does (add a folio to
the nfs_pageio_descriptor), and simplify arguments to
this function by removing struct nfs_readdesc.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
There is no need to declare two tables just to create directories;
this can easily be done with a prefix path with register_sysctl().
Simplify this registration.
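For example (table name illustrative), the directory levels move into
the path argument:

  static struct ctl_table nfs_cb_sysctls[] = {
      /* ... the actual entries ... */
      { }    /* sentinel */
  };

  static struct ctl_table_header *nfs_cb_sysctl_header;

  static int __init nfs_sysctl_init(void)
  {
      /* one call creates /proc/sys/fs/nfs and registers the entries */
      nfs_cb_sysctl_header = register_sysctl("fs/nfs", nfs_cb_sysctls);
      return nfs_cb_sysctl_header ? 0 : -ENOMEM;
  }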
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
There is no need to declare two tables just to create directories;
this can easily be done with a prefix path with register_sysctl().
Simplify this registration.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
There is no need to declare two tables just to create directories;
this can easily be done with a prefix path with register_sysctl().
Simplify this registration.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fixes from Eric Van Hensbergen:
"These are some collected fixes for the 6.3-rc series that have been
passed our 9p regression tests and been in for-next for at least a
week.
They include a fix for a KASAN reported problem in the extended
attribute handling code and a use after free in the xen transport.
This also includes some updates for the MAINTAINERS file including the
transition of our development mailing list from sourceforge.net to
lists.linux.dev"
* tag '9p-6.3-fixes-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
Update email address and mailing list for v9fs
9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
9P FS: Fix wild-memory-access write in v9fs_get_acl
|
|
The spec requires that we always at least send a RECLAIM_COMPLETE when
we're done establishing the lease and recovering any state.
Fixes: fce5c838e133 ("nfs41: RECLAIM_COMPLETE functionality")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
xfstest generic/361 reports a bug as below:
f2fs_bug_on(sbi, sbi->fsync_node_num);
kernel BUG at fs/f2fs/super.c:1627!
RIP: 0010:f2fs_put_super+0x3a8/0x3b0
Call Trace:
generic_shutdown_super+0x8c/0x1b0
kill_block_super+0x2b/0x60
kill_f2fs_super+0x87/0x110
deactivate_locked_super+0x39/0x80
deactivate_super+0x46/0x50
cleanup_mnt+0x109/0x170
__cleanup_mnt+0x16/0x20
task_work_run+0x65/0xa0
exit_to_user_mode_prepare+0x175/0x190
syscall_exit_to_user_mode+0x25/0x50
do_syscall_64+0x4c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
During umount(), if cp_error is set, f2fs_wait_on_all_pages() should
not stop waiting for all F2FS_WB_CP_DATA pages to be written back;
otherwise, fsync_node_num can be non-zero after f2fs_wait_on_all_pages()
returns, causing this bug.
In this case, to avoid a deadloop in f2fs_wait_on_all_pages(), we need
to drop all dirty pages rather than redirty them.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
xfstest generic/019 reports a bug:
kernel BUG at mm/filemap.c:1619!
RIP: 0010:folio_end_writeback+0x8a/0x90
Call Trace:
end_page_writeback+0x1c/0x60
f2fs_write_end_io+0x199/0x420
bio_endio+0x104/0x180
submit_bio_noacct+0xa5/0x510
submit_bio+0x48/0x80
f2fs_submit_write_bio+0x35/0x300
f2fs_submit_merged_ipu_write+0x2a0/0x2b0
f2fs_write_single_data_page+0x838/0x8b0
f2fs_write_cache_pages+0x379/0xa30
f2fs_write_data_pages+0x30c/0x340
do_writepages+0xd8/0x1b0
__writeback_single_inode+0x44/0x370
writeback_sb_inodes+0x233/0x4d0
__writeback_inodes_wb+0x56/0xf0
wb_writeback+0x1dd/0x2d0
wb_workfn+0x367/0x4a0
process_one_work+0x21d/0x430
worker_thread+0x4e/0x3c0
kthread+0x103/0x130
ret_from_fork+0x2c/0x50
The root cause is: after cp_error is set, f2fs_submit_merged_ipu_write()
in f2fs_write_single_data_page() tries to flush the IPU bio in the cache;
however, f2fs_submit_merged_ipu_write() misses a validity check on the
@bio parameter, resulting in the submission of a random cached bio that
belongs to another IO context, which then causes a use-after-free issue.
Fix it by adding the missing validity check.
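The shape of the added check is roughly (hedged sketch; see the patch
for the precise condition):

  void f2fs_submit_merged_ipu_write(struct f2fs_sb_info *sbi,
                                    struct bio **bio, struct page *page)
  {
      /* don't submit a stale cached bio from another IO context */
      if (!bio || !*bio)
          return;
      /* ... existing merge/submit logic ... */
  }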
Fixes: 0b20fcec8651 ("f2fs: cache global IPU bio")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
i_crtime will never change after inode creation, so we don't need
to copy it into f2fs_inode_info.i_disk_time[3] or monitor its change
to decide whether to update the inode page; remove the related code.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs supports the multi-device feature; to check devices' rw status,
it should use f2fs_hw_is_readonly() rather than bdev_read_only(). Fix
it.
Meanwhile, it removes f2fs_hw_is_readonly() check condition in:
- f2fs_write_checkpoint()
- f2fs_convert_inline_inode()
As these paths already check f2fs_readonly(), the extra check is
redundant: if f2fs' devices were readonly, f2fs_readonly() must be true.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use common implementation of file type conversion helpers.
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Remove unnecessary lz4hc_compress_pages().
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's use sysfs_emit.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If the compress feature is not enabled, there is no need to set
compress-related parameters.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When f2fs tries to checkpoint during foreground gc in LFS mode, a
system crash occurs due to lack of free space if the amount of dirty
node and dentry pages generated by data migration exceeds the free
space.
The reproduction sequence is as follows.
- 20GiB capacity block device (null_blk)
- format and mount with LFS mode
- create a file and write 20,000MiB
- 4k random write on full range of the file
RIP: 0010:new_curseg+0x48a/0x510 [f2fs]
Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff
RSP: 0018:ffff977bc397b218 EFLAGS: 00010246
RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0
RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8
RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40
R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000
R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000
FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
allocate_segment_by_default+0x9c/0x110 [f2fs]
f2fs_allocate_data_block+0x243/0xa30 [f2fs]
? __mod_lruvec_page_state+0xa0/0x150
do_write_page+0x80/0x160 [f2fs]
f2fs_do_write_node_page+0x32/0x50 [f2fs]
__write_node_page+0x339/0x730 [f2fs]
f2fs_sync_node_pages+0x5a6/0x780 [f2fs]
block_operations+0x257/0x340 [f2fs]
f2fs_write_checkpoint+0x102/0x1050 [f2fs]
f2fs_gc+0x27c/0x630 [f2fs]
? folio_mark_dirty+0x36/0x70
f2fs_balance_fs+0x16f/0x180 [f2fs]
This patch adds a check of whether there are enough free sections
before checkpointing during gc.
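Roughly, the added guard takes this shape (hedged sketch; the exact
helper and condition are in the patch):

  /* before checkpointing in f2fs_gc(): make sure the dirty node and
   * dentry pages produced by migration can actually be written out */
  if (has_enough_free_secs(sbi, sec_freed, 0)) {
      ret = f2fs_write_checkpoint(sbi, &cpc);
      if (ret)
          goto stop;
  } else {
      ret = -EAGAIN;    /* illustrative error choice */
      goto stop;
  }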
Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
[Jaegeuk Kim: code clean-up]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There is only a single instance of these ops, and Jaegeuk pointed out that:
Originally this was intended to give a chance to provide another
allocation option. Anyway, it seems quite hard to do it anymore.
So remove the indirection and call f2fs_get_victim() directly.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Switch cache modes to a bit-mask and use the legacy cache names as
shortcuts. Update the documentation to cover both the shortcuts and
the bit-mask values, along with the new mount flags and cache modes.
This patch also fixes missing guards related to fscache.
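A hedged illustration of the scheme (the bit names here are
hypothetical, not the patch's own):

  #define CACHE_META       0x01    /* cache metadata */
  #define CACHE_READ       0x02    /* cache read data */
  #define CACHE_WRITEBACK  0x04    /* delay writes */

  /* legacy cache mode names become shortcuts for bit combinations */
  #define CACHE_LOOSE      (CACHE_META | CACHE_READ | CACHE_WRITEBACK)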
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Two cifs/smb3 client fixes, one for stable:
- double lock fix for a cifs/smb1 reconnect path
- DFS prefixpath fix for reconnect when server moved"
* tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: double lock in cifs_reconnect_tcon()
cifs: sanitize paths in cifs_update_super_prepath.
|
|
We are observing huge contention on the epmutex during an http
connection/rate test:
83.17% 0.25% nginx [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
[...]
|--66.96%--__fput
|--60.04%--eventpoll_release_file
|--58.41%--__mutex_lock.isra.6
|--56.56%--osq_lock
The application is multi-threaded, creates a new epoll entry for
each incoming connection, and does not delete it before the
connection shutdown - that is, before the connection's fd close().
Many different threads compete frequently for the epmutex lock,
affecting the overall performance.
To reduce the contention this patch introduces explicit reference counting
for the eventpoll struct. Each registered event acquires a reference,
and references are released at ep_remove() time.
The eventpoll struct is released by whichever of the EP file close()
and the monitored file close() drops its last reference.
Additionally, this introduces a new 'dying' flag to prevent races between
the EP file close() and the monitored file close().
eventpoll_release_file() (the monitored file's close() path) marks, under
the f_lock spinlock, each epitem as dying before removing it, while the
EP file close() does not touch dying epitems.
The above is needed as both close operations could run concurrently and
drop the EP reference acquired via the epitem entry. Without the above
flag, the monitored file close() could reach the EP struct via the epitem
list while the epitem is still listed and then try to put it after its
disposal.
An alternative could be avoiding touching the references acquired via
the epitems at EP file close() time, but that could leave the EP struct
alive for potentially unlimited time after EP file close(), with nasty
side effects.
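A hedged sketch of the two pieces (simplified relative to the actual
fs/eventpoll.c changes; names are illustrative):

  static void ep_put(struct eventpoll *ep)
  {
      /* whichever close() drops the last reference frees the struct,
       * with no need to hold epmutex */
      if (refcount_dec_and_test(&ep->refs))
          ep_free(ep);
  }

  /* monitored file close(): claim each epitem under f_lock first */
  void eventpoll_release_file(struct file *file)
  {
      struct epitem *epi;

      spin_lock(&file->f_lock);
      hlist_for_each_entry(epi, &file->f_ep_list, fllink)
          epi->dying = true;
      spin_unlock(&file->f_lock);
      /* ... then remove the epitems and drop their EP references ... */
  }

  /* EP file close(): dying epitems belong to the other closer */
  static void ep_clear_and_put(struct eventpoll *ep)
  {
      /* for each epitem: if (epi->dying) skip it; else ep_remove() */
      ep_put(ep);
  }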
With all the above in place, we can drop the epmutex usage at disposal time.
Overall this produces a significant performance improvement in the
mentioned connection/rate scenario: the mutex operations disappear from
the topmost offenders in the perf report, and the measured connections/rate
grows by ~60%.
To make the change more readable this additionally renames ep_free() to
ep_clear_and_put(), and moves the actual memory cleanup in a separate
ep_free() helper.
Link: https://lkml.kernel.org/r/4a57788dcaf28f5eb4f8dfddcc3a8b172a7357bb.1679504153.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Xiumei Mu <xmu@redhiat.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ELF is an acronym and therefore should be spelled in all caps.
I left one exception at Documentation/arm/nwfpe/nwfpe.rst, which looks
like it was written in the first person.
Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove empty if statement from nfs3_prepare_get_acl and update comment to
follow the one from the referred fs/posix_acl.c:get_acl().
No functional change intended.
Link: https://lkml.kernel.org/r/20221130151231.3654-1-ubizjak@gmail.com
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
procfs' .setattr() already updates i_uid, i_gid and i_mode in the proc
dirent, so we don't need to call mark_inode_dirty() for a delayed
update; remove it.
Link: https://lkml.kernel.org/r/20230131150840.34726-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"28 hotfixes.
23 are cc:stable and the other five address issues which were
introduced during this merge cycle.
20 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...
|
|
Pull ksmbd server fixes from Steve French:
"Four fixes, three for stable:
- slab out of bounds fix
- lock cancellation fix
- minor cleanup to address clang warning
- fix for xfstest 551 (wrong parms passed to kvmalloc)"
* tag '6.3-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix slab-out-of-bounds in init_smb2_rsp_hdr
ksmbd: delete asynchronous work from list
ksmbd: remove unused is_char_allowed function
ksmbd: do not call kvmalloc() with __GFP_NORETRY | __GFP_NO_WARN
|
|
This lock was supposed to be an unlock.
Fixes: 6cc041e90c17 ("cifs: avoid races in parallel reconnects in smb1")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/
https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendmsg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
fscrypt_initialize() is a "one-time init" function that is called
whenever the key is set up for any inode on any filesystem. Make it
implement "one-time init" more efficiently by not taking a global mutex
in the "already initialized case" and doing fewer pointer dereferences.
Link: https://lore.kernel.org/r/20230406181245.36091-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
This is an implementation of fsverity_operations read_merkle_tree_page,
so it must still return the precise page asked for, but we can use the
folio API to reduce the number of conversions between folios & pages.
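The pattern looks roughly like this (hedged sketch with a simplified
signature; the real callback also takes a readahead hint):

  static struct page *ext4_read_merkle_tree_page(struct inode *inode,
                                                 pgoff_t index)
  {
      struct folio *folio;

      folio = read_mapping_folio(inode->i_mapping, index, NULL);
      if (IS_ERR(folio))
          return ERR_CAST(folio);
      /* the folio may span several pages; hand back the one asked for */
      return folio_file_page(folio, index);
  }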
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-30-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use the folio API and support folios of arbitrary sizes.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-29-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use a folio throughout. Does not support large folios due to
an array sized for MAX_BUF_PER_PAGE, but it does remove a few
calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-28-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Iterate once per folio, not once per page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-27-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Convert to the folio API, saving a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-26-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
All the callers now have a folio, so pass that in and operate on folios.
Removes four calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-25-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This definitely doesn't include support for large folios; there
are all kinds of assumptions about the number of buffers attached
to a folio. But it does remove several calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-24-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove a few calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-23-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Its one caller already uses a folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-22-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Use folio APIs throughout. Saves many calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230324180129.1220691-21-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove a call to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-20-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Convert the incoming page to a folio to remove a few calls to
compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-19-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Convert the incoming struct page to a folio. Replaces two implicit
calls to compound_head() with one explicit call.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-18-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Remove a lot of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-17-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Convert the incoming page to a folio so that we call compound_head()
only once instead of seven times.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-16-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
All callers now have a folio, so pass it and use it. The folio may
be large, although I doubt we'll want to use a large folio for an
inline file.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20230324180129.1220691-15-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Saves a number of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20230324180129.1220691-14-willy@infradead.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|