summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-05-20ext4: use writeback_iter in ext4_journalled_submit_inode_data_buffersChristoph Hellwig
Use writeback_iter directly instead of write_cache_pages for a nicer code structure and less indirect calls. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250505091604.3449879-1-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: fix calculation of credits for extent tree modificationJan Kara
Luis and David are reporting that after running generic/750 test for 90+ hours on 2k ext4 filesystem, they are able to trigger a warning in jbd2_journal_dirty_metadata() complaining that there are not enough credits in the running transaction started in ext4_do_writepages(). Indeed the code in ext4_do_writepages() is racy and the extent tree can change between the time we compute credits necessary for extent tree computation and the time we actually modify the extent tree. Thus it may happen that the number of credits actually needed is higher. Modify ext4_ext_index_trans_blocks() to count with the worst case of maximum tree depth. This can reduce the possible number of writers that can operate in the system in parallel (because the credit estimates now won't fit in one transaction) but for reasonably sized journals this shouldn't really be an issue. So just go with a safe and simple fix. Link: https://lore.kernel.org/all/20250415013641.f2ppw6wov4kn4wq2@offworld Reported-by: Davidlohr Bueso <dave@stgolabs.net> Reported-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops@lists.linux.dev Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250429175535.23125-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-05-19ksmbd: fix stream write failureNamjae Jeon
If there is no stream data in file, v_len is zero. So, If position(*pos) is zero, stream write will fail due to stream write position validation check. This patch reorganize stream write position validation. Fixes: 0ca6df4f40cf ("ksmbd: prevent out-of-bounds stream writes by validating *pos") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-19smb: client: Reset all search buffer pointers when releasing bufferWang Zhaolong
Multiple pointers in struct cifs_search_info (ntwrk_buf_start, srch_entries_start, and last_entry) point to the same allocated buffer. However, when freeing this buffer, only ntwrk_buf_start was set to NULL, while the other pointers remained pointing to freed memory. This is defensive programming to prevent potential issues with stale pointers. While the active UAF vulnerability is fixed by the previous patch, this change ensures consistent pointer state and more robust error handling. Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-19fanotify: support watching filesystems and mounts inside usernsAmir Goldstein
An unprivileged user is allowed to create an fanotify group and add inode marks, but not filesystem, mntns and mount marks. Add limited support for setting up filesystem, mntns and mount marks by an unprivileged user under the following conditions: 1. User has CAP_SYS_ADMIN in the user ns where the group was created 2.a. User has CAP_SYS_ADMIN in the user ns where the sb was created OR (in case setting up a mntns mark) 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250516192803.838659-3-amir73il@gmail.com
2025-05-19fanotify: remove redundant permission checksAmir Goldstein
FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARK flags are already checked as part of the CAP_SYS_ADMIN check for any FANOTIFY_ADMIN_INIT_FLAGS. Remove the individual CAP_SYS_ADMIN checks for these flags. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250516192803.838659-2-amir73il@gmail.com
2025-05-19NFSv4: xattr handlers should check for absent nfs filehandlesScott Mayhew
The nfs inodes for referral anchors that have not yet been followed have their filehandles zeroed out. Attempting to call getxattr() on one of these will cause the nfs client to send a GETATTR to the nfs server with the preceding PUTFH sans filehandle. The server will reply NFS4ERR_NOFILEHANDLE, leading to -EIO being returned to the application. For example: $ strace -e trace=getxattr getfattr -n system.nfs4_acl /mnt/t/ref getxattr("/mnt/t/ref", "system.nfs4_acl", NULL, 0) = -1 EIO (Input/output error) /mnt/t/ref: system.nfs4_acl: Input/output error +++ exited with 1 +++ Have the xattr handlers return -ENODATA instead. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-05-19nfs: add a refcount tracker for struct net as held by the nfs_clientJeff Layton
These are long-held references to the netns, so make sure the refcount tracker is aware of them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-05-19bcachefs: mkwrite() now only dirties one pageKent Overstreet
Don't dirty the whole folio - fixes write amplification with applications doing mmaped writes. https://www.reddit.com/r/bcachefs/comments/1klzcg1/incredible_amounts_of_write_amplification_when/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-19fs/ntfs3: remove ability to change compression on mounted volumeKonstantin Komarov
Remove all the code related to changing compression on the fly because it's not safe and not maintainable. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-05-18bcachefs: fix extent_has_stripe_ptr()Kent Overstreet
This wasn't checking indirect extents. Fixes: https://github.com/koverstreet/bcachefs/issues/887 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-18smb: client: Fix use-after-free in cifs_fill_direntWang Zhaolong
There is a race condition in the readdir concurrency process, which may access the rsp buffer after it has been released, triggering the following KASAN warning. ================================================================== BUG: KASAN: slab-use-after-free in cifs_fill_dirent+0xb03/0xb60 [cifs] Read of size 4 at addr ffff8880099b819c by task a.out/342975 CPU: 2 UID: 0 PID: 342975 Comm: a.out Not tainted 6.15.0-rc6+ #240 PREEMPT(full) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_report+0xce/0x640 kasan_report+0xb8/0xf0 cifs_fill_dirent+0xb03/0xb60 [cifs] cifs_readdir+0x12cb/0x3190 [cifs] iterate_dir+0x1a1/0x520 __x64_sys_getdents+0x134/0x220 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f996f64b9f9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0d f7 c3 0c 00 f7 d8 64 89 8 RSP: 002b:00007f996f53de78 EFLAGS: 00000207 ORIG_RAX: 000000000000004e RAX: ffffffffffffffda RBX: 00007f996f53ecdc RCX: 00007f996f64b9f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f996f53dea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000207 R12: ffffffffffffff88 R13: 0000000000000000 R14: 00007ffc8cd9a500 R15: 00007f996f51e000 </TASK> Allocated by task 408: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_slab_alloc+0x6e/0x70 kmem_cache_alloc_noprof+0x117/0x3d0 mempool_alloc_noprof+0xf2/0x2c0 cifs_buf_get+0x36/0x80 [cifs] allocate_buffers+0x1d2/0x330 [cifs] cifs_demultiplex_thread+0x22b/0x2690 [cifs] kthread+0x394/0x720 ret_from_fork+0x34/0x70 ret_from_fork_asm+0x1a/0x30 Freed by task 342979: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kmem_cache_free+0x2b8/0x500 cifs_buf_release+0x3c/0x70 [cifs] cifs_readdir+0x1c97/0x3190 [cifs] iterate_dir+0x1a1/0x520 __x64_sys_getdents64+0x134/0x220 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e The buggy address belongs to the object at ffff8880099b8000 which belongs to the cache cifs_request of size 16588 The buggy address is located 412 bytes inside of freed 16588-byte region [ffff8880099b8000, ffff8880099bc0cc) The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99b8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 anon flags: 0x80000000000040(head|node=0|zone=1) page_type: f5(slab) raw: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001 raw: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000 head: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001 head: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000 head: 0080000000000003 ffffea0000266e01 00000000ffffffff 00000000ffffffff head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880099b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880099b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880099b8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880099b8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880099b8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== POC is available in the link [1]. The problem triggering process is as follows: Process 1 Process 2 ----------------------------------------------------------------- cifs_readdir /* file->private_data == NULL */ initiate_cifs_search cifsFile = kzalloc(sizeof(struct cifsFileInfo), GFP_KERNEL); smb2_query_dir_first ->query_dir_first() SMB2_query_directory SMB2_query_directory_init cifs_send_recv smb2_parse_query_directory srch_inf->ntwrk_buf_start = (char *)rsp; srch_inf->srch_entries_start = (char *)rsp + ... srch_inf->last_entry = (char *)rsp + ... srch_inf->smallBuf = true; find_cifs_entry /* if (cfile->srch_inf.ntwrk_buf_start) */ cifs_small_buf_release(cfile->srch_inf // free cifs_readdir ->iterate_shared() /* file->private_data != NULL */ find_cifs_entry /* in while (...) loop */ smb2_query_dir_next ->query_dir_next() SMB2_query_directory SMB2_query_directory_init cifs_send_recv compound_send_recv smb_send_rqst __smb_send_rqst rc = -ERESTARTSYS; /* if (fatal_signal_pending()) */ goto out; return rc /* if (cfile->srch_inf.last_entry) */ cifs_save_resume_key() cifs_fill_dirent // UAF /* if (rc) */ return -ENOENT; Fix this by ensuring the return code is checked before using pointers from the srch_inf. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220131 [1] Fixes: a364bc0b37f1 ("[CIFS] fix saving of resume key before CIFSFindNext") Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-17bcachefs: Fix bch2_btree_path_traverse_cached() when paths reallocedKent Overstreet
btree_key_cache_fill() will allocate and traverse another path (for the underlying btree), so we can't hold pointers to paths across a call - we have to pass indices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-17btrfs: move misplaced comment of btrfs_path::keep_locksSun YangKai
Commit 925baeddc5b0 ("Btrfs: Start btree concurrency work.") added the comment for the field keep_locks. This got moved later but without the comment, so move it to the right place and fix the comment style. Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-16Merge tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix memory leak in mkdir error path - Fix max rsize miscalculation after channel reconnect * tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix zero rsize error messages smb: client: fix memory leak during error handling for POSIX mkdir
2025-05-16Merge tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH transport errors - NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock path - NFSv4: Fix a race in nfs_local_open_fh() - NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn - NFSv4/pNFS Avoid sharing pNFS DS connections between net namespaces since IP addresses are not guaranteed to refer to the same nodes - NFS: Don't flush file data while holding multiple directory locks in nfs_rename() * tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Avoid flushing data while holding directory locks in nfs_rename() NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked() NFSv4/pnfs: Reset the layout state after a layoutreturn NFS/localio: Fix a race in nfs_local_open_fh() nfs: nfs3acl: drop useless assignment in nfs3_get_acl() nfs: direct: drop useless initializer in nfs_direct_write_completion() nfs: move the nfs4_data_server_cache into struct nfs_net nfs: don't share pNFS DS connections between net namespaces nfs: handle failure of nfs_get_lock_context in unlock path pNFS/flexfiles: Record the RPC errors in the I/O tracepoints NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
2025-05-16NFS: Avoid flushing data while holding directory locks in nfs_rename()Trond Myklebust
The Linux client assumes that all filehandles are non-volatile for renames within the same directory (otherwise sillyrename cannot work). However, the existence of the Linux 'subtree_check' export option has meant that nfs_rename() has always assumed it needs to flush writes before attempting to rename. Since NFSv4 does allow the client to query whether or not the server exhibits this behaviour, and since knfsd does actually set the appropriate flag when 'subtree_check' is enabled on an export, it should be OK to optimise away the write flushing behaviour in the cases where it is clearly not needed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2025-05-16NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked()Trond Myklebust
If there isn't a valid layout, or the layout stateid has changed, the cleanup after a layout return should clear out the old data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-05-16NFSv4/pnfs: Reset the layout state after a layoutreturnTrond Myklebust
If there are still layout segments in the layout plh_return_lsegs list after a layout return, we should be resetting the state to ensure they eventually get returned as well. Fixes: 68f744797edd ("pNFS: Do not free layout segments that are marked for return") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-05-16ext4: avoid -Wformat-security warningArnd Bergmann
check_igot_inode() prints a variable string, which causes a harmless warning with 'make W=1': fs/ext4/inode.c:4763:45: error: format string is not a string literal (potentially insecure) [-Werror,-Wformat-security] 4763 | ext4_error_inode(inode, function, line, 0, err_str); Use a trivial "%s" format string instead. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250423164354.2780635-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-16btrfs: remove standalone "nologreplay" mount optionQu Wenruo
Standalone "nologreplay" mount option has been marked deprecated since commit 74ef00185eb8 ("btrfs: introduce "rescue=" mount option"), which dates back to v5.9 (2020). Furthermore there is no other filesystem with the same named mount option, so this one is btrfs specific and we will not hit the same problem when removing "norecovery" mount option. So let's remove the standalone "nologreplay" mount option. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-16Merge tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "This includes a bug fix for a possible data corruption vector on the zoned allocator garbage collector" * tag 'xfs-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: Fix comment on xfs_trans_ail_update_bulk() xfs: Fix a comment on xfs_ail_delete xfs: Fail remount with noattr2 on a v5 with v4 enabled xfs: fix zoned GC data corruption due to wrong bv_offset xfs: free up mp->m_free[0].count in error case
2025-05-16coredump: reflow dump helpers a littleChristian Brauner
They look rather messy right now. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-3-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-16coredump: massage do_coredump()Christian Brauner
We're going to extend the coredump code in follow-up patches. Clean it up so we can do this more easily. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-2-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-16coredump: massage format_corename()Christian Brauner
We're going to extend the coredump code in follow-up patches. Clean it up so we can do this more easily. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-1-664f3caf2516@kernel.org Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-16fs/ntfs3: Fix handling of InitializeFileRecordSegmentKonstantin Komarov
Make the logic of handling the InitializeFileRecordSegment operation similar to that in windows. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2025-05-16x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrlJames Morse
Resctrl is a filesystem interface to hardware that provides cache allocation policy and bandwidth control for groups of tasks or CPUs. To support more than one architecture, resctrl needs to live in /fs/. Move the code that is concerned with the filesystem interface to /fs/resctrl. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-25-james.morse@arm.com
2025-05-16fs/resctrl: Add boiler plate for external resctrl codeJames Morse
Add Makefile and Kconfig for fs/resctrl. Add ARCH_HAS_CPU_RESCTRL for the common parts of the resctrl interface and make X86_CPU_RESCTRL select this. Adding an include of asm/resctrl.h to linux/resctrl.h allows the /fs/resctrl files to switch over to using this header instead. Co-developed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64 Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64 Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64 Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250515165855.31452-16-james.morse@arm.com
2025-05-16erofs: lazily initialize per-CPU workers and CPU hotplug hooksSandeep Dhavale
Currently, when EROFS is built with per-CPU workers, the workers are started and CPU hotplug hooks are registered during module initialization. This leads to unnecessary worker start/stop cycles during CPU hotplug events, particularly on Android devices that frequently suspend and resume. This change defers the initialization of per-CPU workers and the registration of CPU hotplug hooks until the first EROFS mount. This ensures that these resources are only allocated and managed when EROFS is actually in use. The tear down of per-CPU workers and unregistration of CPU hotplug hooks still occurs during z_erofs_exit_subsystem(), but only if they were initialized. Signed-off-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20250506225743.308517-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-16erofs: refine readahead tracepointGao Xiang
- trace_erofs_readpages => trace_erofs_readahead; - Rename a redundant statement `nrpages = readahead_count(rac);`; - Move the tracepoint to the beginning of z_erofs_readahead(). Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20250514120820.2739288-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-15Merge tag 'bcachefs-2025-05-15' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "The main user reported ones are: - Fix a btree iterator locking inconsistency that's been causing us to go emergency read-only in evacuate: "Fix broken btree_path lock invariants in next_node()" - Minor btree node cache reclaim tweak that should help with OOMs: don't set btree nodes as accessed on fill - Fix a bch2_bkey_clear_rebalance() issue that was causing rebalance to do needless work" * tag 'bcachefs-2025-05-15' of git://evilpiepirate.org/bcachefs: bcachefs: fix wrong arg to fsck_err() bcachefs: Fix missing commit in backpointer to missing target bcachefs: Fix accidental O(n^2) in fiemap bcachefs: Fix set_should_be_locked() call in peek_slot() bcachefs: Fix self deadlock bcachefs: Don't set btree nodes as accessed on fill bcachefs: Fix livelock in journal_entry_open() bcachefs: Fix broken btree_path lock invariants in next_node() bcachefs: Don't strip rebalance_opts from indirect extents
2025-05-15NFSD: Add a "default" block sizeChuck Lever
We'd like to increase the maximum r/wsize that NFSD can support, but without introducing possible regressions. So let's add a default setting of 1MB. A subsequent patch will raise the maximum value but leave the default alone. No behavior change is expected. Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Remove NFSSVC_MAXBLKSIZE_V2 macroChuck Lever
The 8192-byte maximum is a protocol-defined limit, and we already have a symbolic constant defined whose name matches the name of the limit defined in the protocol. Replace the duplicate. No change in behavior is expected. Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Remove NFSD_BUFSIZEChuck Lever
Clean up: The documenting comment for NFSD_BUFSIZE is quite stale. NFSD_BUFSIZE is used only for NFSv4 Reply these days; never for NFSv2 or v3, and never for RPC Calls. Even so, the byte count estimate does not include the size of the NFSv4 COMPOUND Reply HEADER or the RPC auth flavor. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Use rqstp->rq_bvec in nfsd_iter_write()Chuck Lever
If we can get rid of all uses of rq_vec, then it can be removed. Replace one use of rqstp::rq_vec with rqstp::rq_bvec. The feeling of layering violation grows stronger now that <linux/sunrpc/xdr.h> is included in fs/nfsd/vfs.c. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: De-duplicate the svc_fill_write_vector() call sitesChuck Lever
All three call sites do the same thing. I'm struggling with this a bit, however. struct xdr_buf is an XDR layer object and unmarshaling a WRITE payload is clearly a task intended to be done by the proc and xdr functions, not by VFS. This feels vaguely like a layering violation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15NFSD: Use rqstp->rq_bvec in nfsd_iter_read()Chuck Lever
If we can get rid of all uses of rq_vec, then it can be removed. Replace one use of rqstp::rq_vec with rqstp::rq_bvec. Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.15-rc7). Conflicts: tools/testing/selftests/drivers/net/hw/ncdevmem.c 97c4e094a4b2 ("tests/ncdevmem: Fix double-free of queue array") 2f1a805f32ba ("selftests: ncdevmem: Implement devmem TCP TX") https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au Adjacent changes: net/core/devmem.c net/core/devmem.h 0afc44d8cdf6 ("net: devmem: fix kernel panic when netlink socket close after module unload") bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15ext4: clairfy the rules for modifying extentsZhang Yi
Add a comment at the beginning of extents_status.c to clarify the rules for loading, mapping, modifying, and removing extents and blocks. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250423085257.122685-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-15ext4: check env when mapping and modifying extentsZhang Yi
Add ext4_check_map_extents_env() to the places where loading extents, mapping blocks, removing blocks, and modifying extents, excluding the I/O writeback context. This function will verify whether the locking mechanisms in place are adequate. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250423085257.122685-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-15btrfs: use a single variable to track return value at btrfs_page_mkwrite()Filipe Manana
We have two variables to track return values, ret and ret2, with types vm_fault_t (an unsigned int type) and int, which makes it a bit confusing and harder to keep track. So use a single variable, of type int, and under the 'out' label return vmf_error(ret) in case ret contains an error, otherwise return VM_FAULT_NOPAGE. This is equivalent to what we had before and it's simpler. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: don't return VM_FAULT_SIGBUS on failure to set delalloc for mmap writeFilipe Manana
If the call to btrfs_set_extent_delalloc() fails we are always returning VM_FAULT_SIGBUS, which is odd since the error means "bad access" and the most likely cause for btrfs_set_extent_delalloc() is -ENOMEM, which should be translated to VM_FAULT_OOM. Instead of returning VM_FAULT_SIGBUS return vmf_error(ret2), which gives us a more appropriate return value, and we use that everywhere else too. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: simplify early error checking in btrfs_page_mkwrite()Filipe Manana
We have this entangled error checks early at btrfs_page_mkwrite(): 1) Try to reserve delalloc space by calling btrfs_delalloc_reserve_space() and storing the return value in the ret2 variable; 2) If the reservation succeed, call file_update_time() and store the return value in ret2 and also set the local variable 'reserved' to true (1); 3) Then do an error check on ret2 to see if any of the previous calls failed and if so, jump either to the 'out' label or to the 'out_noreserve' label, depending on whether 'reserved' is true or not. This is unnecessarily complex. Instead change this to a simpler and more straightforward approach: 1) Call btrfs_delalloc_reserve_space(), if that returns an error jump to the 'out_noreserve' label; 2) The call file_update_time() and if that returns an error jump to the 'out' label. Like this there's less nested if statements, no need to use a local variable to track if space was reserved and if statements are used only to check errors. Also move the call to extent_changeset_free() out of the 'out_noreserve' label and under the 'out' label since the changeset is allocated only if the call to reserve delalloc space succeeded. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: pass true to btrfs_delalloc_release_space() at btrfs_page_mkwrite()Filipe Manana
In the last call to btrfs_delalloc_release_space() where the value of the variable 'ret' is never zero, we pass the expression 'ret != 0' as the value for the argument 'qgroup_free', which always evaluates to true. Make this less confusing and more clear by explicitly passing true instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: fix wrong start offset for delalloc space release during mmap writeFilipe Manana
If we're doing a mmap write against a folio that has i_size somewhere in the middle and we have multiple sectors in the folio, we may have to release excess space previously reserved, for the range going from the rounded up (to sector size) i_size to the folio's end offset. We are calculating the right amount to release and passing it to btrfs_delalloc_release_space(), but we are passing the wrong start offset of that range - we're passing the folio's start offset instead of the end offset, plus 1, of the range for which we keep the reservation. This may result in releasing more space then we should and eventually trigger an underflow of the data space_info's bytes_may_use counter. So fix this by passing the start offset as 'end + 1' instead of 'page_start' to btrfs_delalloc_release_space(). Fixes: d0b7da88f640 ("Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: fix harmless race getting delayed ref head count when running delayed ↵Filipe Manana
refs When running delayed references we are reading the number of ready delayed ref heads without taking any lock which can make KCSAN report a race since we can have concurrent tasks updating that number, such as for example when freeing a tree block which will end up decrementing that counter or when adding a new delayed ref while COWing a tree block which will increment that counter. This is a harmless race since running one more or one less delayed ref head doesn't result in any problem, in the critical section of a transaction commit we always run any remaining delayed refs and at that point no one can create more. So fix this harmless race by annotating the read with data_race(). Reported-by: cen zhang <zzzccc427@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAFRLqsUCLMz0hY-GaPj1Z=fhkgRHjxVXHZ8kz0PvkFN0b=8L2Q@mail.gmail.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: log error codes during failures when writing super blocksFilipe Manana
When writing super blocks, at write_dev_supers(), we log an error message when we get some error but we don't show which error we got and we have that information. So enhance the error messages with the error codes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: simplify error return logic when getting folio at prepare_one_folio()Filipe Manana
There's no need to have special logic to return -EAGAIN in case the call to __filemap_get_folio() fails, because when FGP_NOWAIT is passed to __filemap_get_folio() it returns ERR_PTR(-EAGAIN) if it needs to do something that would imply blocking. The reason we have this logic is from the days before we migrated to the folio interface, when we called pagecache_get_page() which would return NULL instead of an error pointer. So remove this special casing and always return the error that the call to __filemap_get_folio() returned. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: return real error from __filemap_get_folio() callsFilipe Manana
We have a few places that always assume a -ENOMEM error happened in case a call to __filemap_get_folio() returns an error, which is just too much of an assumption and even if it would be the case at some point in time, it's not future proof and there's nothing in the documentation that guarantees that only ERR_PTR(-ENOMEM) can be returned with the flags we are passing to it. So use the exact error returned by __filemap_get_folio() instead. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-15btrfs: remove superfluous return value check at btrfs_dio_iomap_begin()Filipe Manana
In the if statement that checks the return value from btrfs_check_data_free_space(), there's no point to check if 'ret' is not zero in the else branch, since the main if branch checked that it's zero, so in the else branch it necessarily has a non-zero value. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>