path: root/fs
Age | Commit message | Author
2025-11-28 | ext4: remove page offset calculation in ext4_block_truncate_page() | Baokun Li
For bs <= ps scenarios, calculating the offset within the block is sufficient. For bs > ps, an initial page offset calculation can lead to incorrect behavior. Thus this redundant calculation has been removed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-3-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-28 | ext4: remove page offset calculation in ext4_block_zero_page_range() | Zhihao Cheng
For bs <= ps scenarios, calculating the offset within the block is sufficient. For bs > ps, an initial page offset calculation can lead to incorrect behavior. Thus this redundant calculation has been removed. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Message-ID: <20251121090654.631996-2-libaokun@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
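Both ext4 entries above rely on the same masking arithmetic. The following is a minimal userspace sketch, assuming power-of-two block and page sizes; offset_in_block() and offset_via_page_first() are hypothetical helpers, not ext4 functions. It shows that taking the page offset first agrees with the direct block offset when bs <= ps, but discards the high offset bits when bs > ps:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of byte position 'pos' within its filesystem block.
 * Assumes blocksize is a non-zero power of two. */
static uint64_t offset_in_block(uint64_t pos, uint64_t blocksize)
{
	assert(blocksize && (blocksize & (blocksize - 1)) == 0);
	return pos & (blocksize - 1);
}

/* Hypothetical two-step variant: mask to the page first, then to the
 * block. Correct only while blocksize <= pagesize. */
static uint64_t offset_via_page_first(uint64_t pos, uint64_t pagesize,
				      uint64_t blocksize)
{
	uint64_t page_off = pos & (pagesize - 1);

	return page_off & (blocksize - 1);
}
```

For example, with 16 KiB blocks on 4 KiB pages, byte 20580 lies 4196 bytes into its block, while the page-first calculation yields 100.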
2025-11-28 | afs: Fix uninit var in afs_alloc_anon_key() | David Howells
Fix an uninitialised variable (key) in afs_alloc_anon_key() by setting it to cell->anonymous_key. Without this change, the error check may return a false failure with a bad error number. Most of the time this is unlikely to happen because the first encounter with afs_alloc_anon_key() will usually be from (auto)mount, for which all subsequent operations must wait - apart from other (auto)mounts. Once cell->anonymous_key is allocated, all further calls to afs_request_key() will skip the call to afs_alloc_anon_key() for that cell. Fixes: d27c71257825 ("afs: Fix delayed allocation of a cell's anonymous key") Reported-by: Paulo Alcantra <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara <pc@manguebit.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: syzbot+41c68824eefb67cdf00c@syzkaller.appspotmail.com cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-28 | ubifs: vmalloc(array_size()) -> vmalloc_array() | Qianfeng Rong
Remove array_size() calls and replace vmalloc() with vmalloc_array() in ubifs_create_dflt_lpt()/lpt_init_rd()/lpt_init_wr(). vmalloc_array() is optimized better, resulting in fewer instructions being used [1]. [1]: https://lore.kernel.org/lkml/abc66ec5-85a4-47e1-9759-2f60ab111971@vivo.com/ Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
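The point of vmalloc_array() (like array_size()) is that the element-count multiplication is overflow-checked. A rough userspace analogue, assuming GCC or Clang for __builtin_mul_overflow(); alloc_array() is a hypothetical name and malloc() stands in for vmalloc(). It illustrates only the overflow check, not the instruction-count difference cited above:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate an array of n elements of 'size' bytes each.
 * Returns NULL if n * size overflows size_t or if malloc() fails. */
static void *alloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}
```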
2025-11-28 | ubifs: Remove unnecessary variable assignments | Xichao Zhao
When an error occurs, ubifs_err() prints it directly, and each error is printed with its own message format, so 'err' is not needed to identify where the error occurred. Remove the relevant assignments to 'err'. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2025-11-28 | ubifs: Simplify the code using ubifs_crc_node | Xichao Zhao
Replace part of the code with a call to ubifs_crc_node(). Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2025-11-28 | ubifs: Remove unnecessary parameters '*c' | Xichao Zhao
The parameter *c is not used within ubifs_crc_node(), so remove it. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2025-11-28 | Merge tag 'vfs-6.18-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds
Pull vfs fixes from Christian Brauner:
 - afs: Fix delayed allocation of a cell's anonymous key
   The allocation of a cell's anonymous key is done in a background thread along with other cell setup such as doing a DNS upcall. The normal key lookup tries to use the key description on the anonymous authentication key as the reference for request_key() - but it may not yet be set, causing an oops
 - ovl: fail ovl_lock_rename_workdir() if either target is unhashed
   As well as checking that the parent hasn't changed after getting the lock, the code needs to check that the dentry hasn't been unhashed. Otherwise overlayfs might try to rename something that has been removed
 - namespace: fix a reference leak in grab_requested_mnt_ns
   lookup_mnt_ns() already takes a reference on mnt_ns, and so grab_requested_mnt_ns() doesn't need to take an extra reference
* tag 'vfs-6.18-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Fix delayed allocation of a cell's anonymous key
  ovl: fail ovl_lock_rename_workdir() if either target is unhashed
  fs/namespace: fix reference leak in grab_requested_mnt_ns
2025-11-28 | erofs: enable error reporting for z_erofs_stream_switch_bufs() | Gao Xiang
Enable propagation of detailed errors to callers. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-11-28 | erofs: improve Zstd, LZMA and DEFLATE error strings | Gao Xiang
Enable better, more detailed, and unique error reporting. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-11-28 | erofs: improve decompression error reporting | Gao Xiang
Change the return type of decompress() from `int` to `const char *` to provide more informative error diagnostics:
 - A NULL return indicates successful decompression;
 - If IS_ERR(ptr) is true, the return value encodes a standard negative errno (e.g., -ENOMEM, -EOPNOTSUPP) identifying the specific error;
 - Otherwise, a non-NULL return points to a human-readable error string, and the corresponding error code should be treated as -EFSCORRUPTED.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
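The three-way return convention can be illustrated in userspace. This sketch re-implements minimal versions of the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers and assumes EFSCORRUPTED aliases EUCLEAN (as ext4 and XFS define it); decompress_result_to_errno() is a hypothetical caller, not an erofs function:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal userspace copies of the kernel's error-pointer helpers:
 * the top MAX_ERRNO addresses are reserved to encode -errno values. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

#define EFSCORRUPTED EUCLEAN /* assumption: same alias as ext4/XFS */

/* Map a decompress()-style result to an errno, logging the
 * human-readable reason when one is provided. */
static int decompress_result_to_errno(const char *reason)
{
	if (!reason)
		return 0;                        /* success */
	if (IS_ERR(reason))
		return (int)PTR_ERR(reason);     /* encoded -errno */
	fprintf(stderr, "decompression failed: %s\n", reason);
	return -EFSCORRUPTED;                    /* corrupted input */
}
```

The benefit of the string case is that the log carries a specific reason while callers still see a single, well-defined error code.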
2025-11-28 | erofs: tidy up z_erofs_lz4_handle_overlap() | Gao Xiang
 - Add some useful comments to explain inplace I/Os and decompression;
 - Rearrange the code to get rid of one unnecessary goto.
Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-11-28 | file: convert replace_fd() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-44-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | exec: convert begin_new_exec() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-21-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | xfs: convert xfs_open_by_handle() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-17-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | userfaultfd: convert new_userfaultfd() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-16-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | timerfd: convert timerfd_create() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-15-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | signalfd: convert do_signalfd4() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-14-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | open: convert do_sys_openat2() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-13-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | eventpoll: convert do_epoll_create() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-12-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | autofs: convert autofs_dev_ioctl_open_mountpoint() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-11-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | nsfs: convert ns_ioctl() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-10-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | nsfs: convert open_namespace() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-9-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | fanotify: convert fanotify_init() to FD_PREPARE() | Christian Brauner
Christian Brauner <brauner@kernel.org> says: The fix sent in [1] was squashed into this commit. Link: https://lore.kernel.org/20251127201618.2115275-1-kuniyu@google.com [1] Reported-by: syzbot+321168dfa622eda99689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/6928b121.a70a0220.d98e3.0110.GAE@google.com Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-8-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | namespace: convert fsmount() to FD_PREPARE() | Christian Brauner
Christian Brauner <brauner@kernel.org> says: A variant of the fix sent in [1] was squashed into this commit. Link: https://lore.kernel.org/20251128035149.392402-1-kartikey406@gmail.com [1] Reported-by: Deepanshu Kartikey <kartikey406@gmail.com> Reported-by: syzbot+94048264da5715c251f9@syzkaller.appspotmail.com Tested-by: syzbot+94048264da5715c251f9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=94048264da5715c251f9 Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-7-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | namespace: convert open_tree_attr() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-6-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | namespace: convert open_tree() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-5-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | fhandle: convert do_handle_open() to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-4-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | eventfd: convert do_eventfd() to FD_PREPARE() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-3-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | anon_inodes: convert to FD_ADD() | Christian Brauner
Link: https://patch.msgid.link/20251123-work-fd-prepare-v4-2-b6efa1706cfd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | afs: Fix delayed allocation of a cell's anonymous key | David Howells
The allocation of a cell's anonymous key is done in a background thread along with other cell setup such as doing a DNS upcall. In the reported bug, this is triggered by afs_parse_source() parsing the device name given to mount() and calling afs_lookup_cell() with the name of the cell. The normal key lookup then tries to use the key description on the anonymous authentication key as the reference for request_key() - but it may not yet be set and so an oops can happen. This has been made more likely to happen by the fix for dynamic lookup failure. Fix this by firstly allocating a reference name and attaching it to the afs_cell record when the record is created. It can share the memory allocation with the cell name (unfortunately it can't just overlap the cell name by prepending it with "afs@" as the cell name already has a '.' prepended for other purposes). This reference name is then passed to request_key(). Secondly, the anon key is now allocated on demand at the point a key is requested in afs_request_key() if it is not already allocated. A mutex is used to prevent multiple allocation for a cell. Thirdly, make afs_request_key_rcu() return NULL if the anonymous key isn't yet allocated (if we need it) and then the caller can return -ECHILD to drop out of RCU-mode and afs_request_key() can be called. Note that the anonymous key is kind of necessary to make the key lookup cache work as that doesn't currently cache a negative lookup, but it's probably worth some investigation to see if NULL can be used instead. Fixes: 330e2c514823 ("afs: Fix dynamic lookup to fail on cell lookup failure") Reported-by: syzbot+41c68824eefb67cdf00c@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/800328.1764325145@warthog.procyon.org.uk cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
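The second and third points (on-demand allocation under a mutex, and a non-blocking path that returns NULL instead of allocating) follow a common lazy-initialisation pattern. A minimal userspace sketch with POSIX threads and C11 atomics; struct cell_sketch, cell_key_get() and cell_key_nonblock() are hypothetical stand-ins, and an acquire load only approximates the kernel's RCU rules:

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct key and struct afs_cell. */
struct key_sketch { int id; };

struct cell_sketch {
	_Atomic(struct key_sketch *) anon_key; /* NULL until allocated */
	pthread_mutex_t anon_key_lock;         /* serialises allocation */
};

/* Non-blocking path (analogue of the RCU lookup): never allocates.
 * NULL tells the caller to fall back to the blocking path; the kernel
 * signals that by returning -ECHILD. */
static struct key_sketch *cell_key_nonblock(struct cell_sketch *cell)
{
	return atomic_load_explicit(&cell->anon_key, memory_order_acquire);
}

/* Blocking path: allocate the key on first use, at most once per cell. */
static struct key_sketch *cell_key_get(struct cell_sketch *cell, int *err)
{
	struct key_sketch *key = cell_key_nonblock(cell);

	*err = 0;
	if (key)
		return key;

	pthread_mutex_lock(&cell->anon_key_lock);
	/* Re-check under the lock: another thread may have allocated it.
	 * 'key' is always assigned from the cell before it is examined. */
	key = atomic_load_explicit(&cell->anon_key, memory_order_relaxed);
	if (!key) {
		key = malloc(sizeof(*key));
		if (!key) {
			*err = -ENOMEM;
		} else {
			key->id = 1;
			atomic_store_explicit(&cell->anon_key, key,
					      memory_order_release);
		}
	}
	pthread_mutex_unlock(&cell->anon_key_lock);
	return key;
}
```

Re-reading the pointer under the mutex is what prevents two racing callers from both allocating a key for the same cell.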
2025-11-28 | ovl: remove unneeded semicolon | Chen Ni
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the semantic patch at scripts/coccinelle/misc/semicolon.cocci. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Fixed: 7ab96df840e60 ("VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | ovl: fail ovl_lock_rename_workdir() if either target is unhashed | NeilBrown
As well as checking that the parent hasn't changed after getting the lock we need to check that the dentry hasn't been unhashed. Otherwise we might try to rename something that has been removed. Reported-by: syzbot+bfc9a0ccf0de47d04e8c@syzkaller.appspotmail.com Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.") Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/176429295510.634289.1552337113663461690@noble.neil.brown.name Tested-by: syzbot+bfc9a0ccf0de47d04e8c@syzkaller.appspotmail.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
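The fix follows a lock-then-revalidate pattern: after taking the lock, confirm both that the parent is unchanged and that the entry is still hashed. A simplified userspace sketch with hypothetical types and a single directory lock (the real code locks both the work directory and the upper directory); the -ESTALE return value is an arbitrary choice for the sketch:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for directories/dentries. */
struct dir_sketch { pthread_mutex_t lock; };
struct dentry_sketch {
	struct dir_sketch *parent; /* may change until the lock is held */
	bool hashed;               /* false once the entry was removed */
};

/* Lock 'dir' and confirm that 'd' is still a live child of it.
 * Returns 0 with the lock held, or -ESTALE with the lock released. */
static int lock_and_revalidate(struct dir_sketch *dir,
			       struct dentry_sketch *d)
{
	pthread_mutex_lock(&dir->lock);
	if (d->parent != dir || !d->hashed) {
		pthread_mutex_unlock(&dir->lock);
		return -ESTALE;
	}
	return 0;
}
```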
2025-11-28 | dcache: touch up predicts in __d_lookup_rcu() | Mateusz Guzik
The rationale is that if the parent dentry and the length are the same, you have to be unlucky for the name not to match. At the same time, the dentry was literally just found on the hash, so you have to be even more unlucky to find that it is unhashed. While here, add a comment explaining why the d_unhashed() check is necessary. It was already removed once and brought back in: 2e321806b681b192 ("Revert "vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu"") Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251127131526.4137768-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
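likely() and unlikely() wrap GCC/Clang's __builtin_expect(): they guide the compiler's branch layout and never change results. The sketch below is a hypothetical, simplified matching step that mirrors the reasoning above; struct entry_sketch and entry_matches() are illustrative names, not dcache code:

```c
#include <stdbool.h>
#include <string.h>

/* Branch hints in the style of the kernel's macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct entry_sketch {
	const void *parent;
	size_t len;
	const char *name;
	bool unhashed;
};

/* Once parent and length agree, a name mismatch is rare, and an
 * entry that was just found on the hash being unhashed is rarer. */
static bool entry_matches(const struct entry_sketch *e, const void *parent,
			  const char *name, size_t len)
{
	if (e->parent != parent || e->len != len)
		return false;
	if (unlikely(memcmp(e->name, name, len) != 0))
		return false;
	if (unlikely(e->unhashed))
		return false;
	return true;
}
```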
2025-11-28 | filelock: __fcntl_getlease: fix kernel-doc warnings | Randy Dunlap
Use the correct function name and add a description for the @flavor parameter to avoid these kernel-doc warnings: Warning: fs/locks.c:1706 function parameter 'flavor' not described in '__fcntl_getlease' WARNING: fs/locks.c:1706 expecting prototype for fcntl_getlease(). Prototype was for __fcntl_getlease() instead Fixes: 1602bad16d7d ("vfs: expose delegation support to userland") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251128000826.457120-1-rdunlap@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-28 | nfsd: fix end_creating() conversion | Neil Brown
Avoid a double unlock: nfs_create_locked() will already have unlocked the parent, so do the dput() manually. Christian Brauner <brauner@kernel.org> says: I've taken Neil's proposed fix from [1] and added a commit message. Fixes: https://lore.kernel.org/202511252132.2c621407-lkp@intel.com [1] Fixes: bd6ede8a06e8 ("VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()") Signed-off-by: Neil Brown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-27 | Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be able to merge "kho: make debugfs interface optional" into mm-nonmm-stable. | Andrew Morton
2025-11-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Conflicts: net/xdp/xsk.c 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number") 8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb") 30ed05adca4a ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27 | Merge tag 'ceph-for-6.18-rc8' of https://github.com/ceph/ceph-client | Linus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch to make sparse read handling work in msgr2 secure mode from Slava and a couple of fixes from Ziming and myself to avoid operating on potentially invalid memory, all marked for stable"
* tag 'ceph-for-6.18-rc8' of https://github.com/ceph/ceph-client:
  libceph: prevent potential out-of-bounds writes in handle_auth_session_key()
  libceph: replace BUG_ON with bounds check for map->max_osd
  ceph: fix crash in process_v2_sparse_read() for encrypted directories
  libceph: drop started parameter of __ceph_open_session()
  libceph: fix potential use-after-free in have_mon_and_osd_map()
2025-11-27 | sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv | Joel Granados
Make do_proc_douintvec() static and export a proc_douintvec_conv() wrapper for external use, in keeping with the design of sysctl.c. Update fs/pipe.c to use the new public API. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27 | sysctl: Create pipe-max-size converter using sysctl UINT macros | Joel Granados
Create a converter for the pipe-max-size proc_handler using the SYSCTL_UINT_CONV_CUSTOM. Move SYSCTL_CONV_IDENTITY macro to the sysctl header to make it available for pipe size validation. Keep returning -EINVAL when (val == 0) by using a range checking converter and setting the minimal valid value (extern1) to SYSCTL_ONE. Keep round_pipe_size by passing it as the operation for SYSCTL_USER_TO_KERN_INT_CONV. Signed-off-by: Joel Granados <joel.granados@kernel.org>
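The converter's behaviour can be sketched in userspace. This is an approximation under stated assumptions: 4 KiB pages, a round_pipe_size_sketch() that rounds up to a power of two with a one-page minimum and rejects sizes above 2^31, and a hypothetical pipe_max_size_conv() whose lower bound of 1 plays the role of SYSCTL_ONE in extra1. None of these are the kernel's actual signatures:

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Round up to the next power of two. Callers pass v <= 1u << 31,
 * so the shift cannot overflow. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

/* Approximation of round_pipe_size(); 0 means "too large". */
static unsigned int round_pipe_size_sketch(unsigned int size)
{
	if (size > (1u << 31))
		return 0;
	if (size < SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SIZE;
	return roundup_pow2(size);
}

/* User-to-kernel conversion: range check against a minimum of 1,
 * then round; out-of-range input yields -EINVAL. */
static int pipe_max_size_conv(unsigned int uval, unsigned int *kval)
{
	unsigned int rounded;

	if (uval < 1)
		return -EINVAL;
	rounded = round_pipe_size_sketch(uval);
	if (rounded == 0)
		return -EINVAL;
	*kval = rounded;
	return 0;
}
```

Writing 5000, for instance, stores 8192, and writing 0 is still rejected with -EINVAL.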
2025-11-27 | sysctl: Replace void pointer with const pointer to ctl_table | Joel Granados
* Replace the void *data argument of the converter functions with a const struct ctl_table *table, as it was only used to forward values from ctl_table->extra{1,2}.
* Remove the void *data argument from the do_proc_* functions, as they already had a pointer to the ctl_table.
* Remove the min/max structures do_proc_do{uint,int}vec_minmax_conv_param; the min/max values are now passed directly in the ctl_table.
* Keep the min/max initialization in extra{1,2} in proc_dou8vec_minmax.
* do_proc_douintvec was also adjusted outside sysctl.c, as it is exported to fs/pipe.c.
Signed-off-by: Joel Granados <joel.granados@kernel.org>
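Passing the ctl_table itself lets a converter read its bounds directly from extra1/extra2 instead of from a separate parameter struct. A hypothetical sketch; struct ctl_table_sketch and uint_minmax_conv() are trimmed-down illustrations, not the kernel's definitions or signatures:

```c
#include <errno.h>
#include <stddef.h>

/* Trimmed-down stand-in for struct ctl_table. */
struct ctl_table_sketch {
	const char *procname;
	void *extra1; /* optional minimum (NULL = unbounded) */
	void *extra2; /* optional maximum (NULL = unbounded) */
};

/* The converter receives the table rather than a void *data cookie
 * and takes its bounds straight from extra1/extra2. */
static int uint_minmax_conv(const struct ctl_table_sketch *table,
			    unsigned int uval, unsigned int *kval)
{
	const unsigned int *min = table->extra1;
	const unsigned int *max = table->extra2;

	if ((min && uval < *min) || (max && uval > *max))
		return -EINVAL;
	*kval = uval;
	return 0;
}
```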
2025-11-27 | configfs: Constify ct_item_ops in struct config_item_type | Christophe JAILLET
Make 'ct_item_ops' const in struct config_item_type. This allows constification of many structures which hold some function pointers. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/f43cb57418a7f59e883be8eedc7d6abe802a2094.1761390472.git.christophe.jaillet@wanadoo.fr Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
2025-11-27 | configfs: Constify ct_group_ops in struct config_item_type | Christophe JAILLET
Make 'ct_group_ops' const in struct config_item_type. This allows constification of many structures which hold some function pointers. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/6b720cf407e8a6d30f35beb72e031b2553d1ab7e.1761390472.git.christophe.jaillet@wanadoo.fr Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
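Both configfs entries above apply the same idea: once the type descriptor holds a pointer-to-const, every ops table it points to can itself be declared static const. A hypothetical miniature (these struct and function names are not configfs types):

```c
/* Type descriptor pointing at its operations through a
 * pointer-to-const, as ct_item_ops now does. */
struct item_sketch;

struct item_ops_sketch {
	void (*release)(struct item_sketch *item);
};

struct item_type_sketch {
	const struct item_ops_sketch *ct_item_ops;
};

struct item_sketch {
	const struct item_type_sketch *type;
	int released;
};

static void demo_release(struct item_sketch *item)
{
	item->released = 1;
}

/* Because the field is const-qualified, the ops table can be
 * 'static const' and typically lands in read-only memory. */
static const struct item_ops_sketch demo_ops = {
	.release = demo_release,
};

static const struct item_type_sketch demo_type = {
	.ct_item_ops = &demo_ops,
};

/* Call the release hook through the const ops table, if present. */
static void item_put(struct item_sketch *item)
{
	if (item->type->ct_item_ops && item->type->ct_item_ops->release)
		item->type->ct_item_ops->release(item);
}
```

If ct_item_ops were not const-qualified, initialising it with &demo_ops would discard the const qualifier and the compiler would complain, which is why the field has to be const before such tables can be.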
2025-11-27 | debugfs: Remove broken no-mount mode | Aaron Thompson
debugfs access modes were added in Linux 5.10 (Dec 2020) [1], but the no-mount mode has behaved effectively the same as the off mode since Linux 5.12 (Apr 2021) [2]. The only difference is the specific error code returned by the debugfs_create_* functions, which is -ENOENT in no-mount mode and -EPERM in off mode. Given that no-mount hasn't worked for several years with no complaints, just remove it. [1] a24c6f7bc923 ("debugfs: Add access restriction option") [2] bc6de804d36b ("debugfs: be more robust at handling improper input in debugfs_lookup()") 56348560d495 ("debugfs: do not attempt to create a new file before the filesystem is initalized") Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://patch.msgid.link/20251120102222.18371-3-dev@null.aaront.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
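A small model of why the two modes were indistinguishable to callers. The bit names, the mode-to-bits mapping, and the assumption that no-mount leaves the filesystem unregistered (so creation fails early with -ENOENT) are illustrative assumptions; only the resulting error codes come from the text above:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical access bits (names assumed for the sketch). */
#define ALLOW_API   (1u << 0)
#define ALLOW_MOUNT (1u << 1)

enum debugfs_mode_sketch { MODE_ON, MODE_NO_MOUNT, MODE_OFF };

static unsigned int mode_bits(enum debugfs_mode_sketch m)
{
	switch (m) {
	case MODE_ON:
		return ALLOW_API | ALLOW_MOUNT;
	case MODE_NO_MOUNT:
		return ALLOW_API;
	default:
		return 0;
	}
}

/* Model of a debugfs_create_*() call before the removal: no API
 * permission gives -EPERM; API permission without a registered
 * filesystem still fails, just with -ENOENT. */
static int create_file_model(enum debugfs_mode_sketch m)
{
	unsigned int allow = mode_bits(m);
	bool registered = allow & ALLOW_MOUNT;

	if (!(allow & ALLOW_API))
		return -EPERM;
	if (!registered)
		return -ENOENT;
	return 0;
}
```

In this model, both non-"on" modes fail every creation call, differing only in the errno, which matches the observation that no-mount behaved like off.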
2025-11-27 | debugfs: Remove redundant access mode checks | Aaron Thompson
debugfs_get_tree() can only be called if debugfs itself calls simple_pin_fs() or register_filesystem(), and those call paths also check the access mode. debugfs_start_creating() checks the access mode so the checks in the debugfs_create_* functions are unnecessary. An upcoming change will affect debugfs_allow, so doing this cleanup first will make that change simpler. Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://patch.msgid.link/20251120102222.18371-2-dev@null.aaront.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-26 | gfs2: Clean up SDF_JOURNAL_LIVE flag handling | Andreas Gruenbacher
Change do_withdraw() to clear the SDF_JOURNAL_LIVE flag under the log flush lock. In addition, change __gfs2_trans_begin() to check if the filesystem is already known to be withdrawn using gfs2_withdrawn(). Then, once we are holding the log flush lock, check if the SDF_JOURNAL_LIVE flag is still set. This second check ensures that the filesystem will remain live until the transaction is submitted. With these changes, it is no longer useful to clear SDF_JOURNAL_LIVE in gfs2_end_log_write() after calling gfs2_withdraw(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
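The change is a check, lock, re-check sequence. A minimal model with POSIX threads; struct sdp_sketch and the *_model() functions are hypothetical, gfs2's log flush lock is modelled as a plain mutex held for the duration of the transaction, and -EROFS is an assumed error code:

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sdp_sketch {
	pthread_mutex_t log_flush_lock; /* stand-in for the log flush lock */
	atomic_bool withdrawn;          /* may be read without the lock */
	bool journal_live;              /* only changed under the lock */
};

/* Withdraw side: clear "journal live" only while holding the lock. */
static void withdraw_model(struct sdp_sketch *sdp)
{
	atomic_store(&sdp->withdrawn, true);
	pthread_mutex_lock(&sdp->log_flush_lock);
	sdp->journal_live = false;
	pthread_mutex_unlock(&sdp->log_flush_lock);
}

/* Transaction start: a cheap early check, then the authoritative
 * re-check under the lock. On success the lock stays held, so the
 * journal cannot stop being live until trans_end_model(). */
static int trans_begin_model(struct sdp_sketch *sdp)
{
	if (atomic_load(&sdp->withdrawn))
		return -EROFS;
	pthread_mutex_lock(&sdp->log_flush_lock);
	if (!sdp->journal_live) {
		pthread_mutex_unlock(&sdp->log_flush_lock);
		return -EROFS;
	}
	return 0;
}

static void trans_end_model(struct sdp_sketch *sdp)
{
	pthread_mutex_unlock(&sdp->log_flush_lock);
}
```

The unlocked check only avoids taking the lock in the common withdrawn case; correctness comes from the re-check made while holding the same lock the withdraw path uses.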
2025-11-26 | gfs2: No longer thaw filesystems during a withdraw | Andreas Gruenbacher
Previously, when a withdraw occurred, we would wait for another node to recover our journal. This also meant that a frozen filesystem needed to be thawed, because otherwise other nodes wouldn't be able to recover it. With the reversal of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"), we are no longer waiting for journal recovery during a withdraw, so we no longer need to thaw frozen filesystems, either. This also fixes a potential deadlock reported by lockdep when running xfstest generic/108. In addition, there is nothing left in do_withdraw() that would require taking sd_freeze_mutex, so don't bother taking that lock there anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 | gfs2: Withdraw immediately in gfs2_trans_add_meta | Andreas Gruenbacher
We can now withdraw while the log is locked. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26 | gfs2: New gfs2_withdraw_helper | Andreas Gruenbacher
Currently, when a gfs2 filesystem is withdrawn, an "offline" uevent is triggered that invokes the gfs2_withdraw_helper script from gfs2-utils. The purpose of this script is to deactivate the filesystem's block device so that it can be withdrawn immediately, even before all the filesystem's caches have been discarded. The script provided by gfs2-utils never did anything useful, and there was no way for it to report back its status to the kernel. To fix that, extend the gfs2_withdraw_helper mechanism so that the script can report one of the following results by writing the corresponding value into "/sys$DEVPATH/lock_module/withdraw": 0 - The shared block device has been marked inactive. Future write operations will fail. 1 - The shared block device may still be active and carry out write operations. If the "offline" uevent isn't acted upon within the timeout configured in /sys$DEVPATH/tune/withdraw_helper_timeout (default 5 seconds), the event handler is assumed to have failed. In addition, add a new "errors=deactivate" mount option. With these changes, if fatal errors are detected on a gfs2 filesystem and the filesystem is mounted with the "errors=panic" option, the kernel will panic immediately. Otherwise, an attempt will be made to deactivate the underlying block device. If successful, the kernel will release all cluster-wide locks immediately so that the rest of the cluster can continue. If unsuccessful, the kernel will either panic ("errors=deactivate"), or it will purge all filesystem I/O before releasing all cluster-wide locks ("errors=withdraw"). Note that the gfs2_withdraw_helper script still needs to be fixed to take advantage of these improvements. It could be changed to use a mechanism like LVM Persistent Reservations. "dmsetup suspend" is not a suitable mechanism as it postpones I/O operations indefinitely, which may prevent withdraw from completing. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
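The resulting policy reduces to a small decision table. A hypothetical sketch of the logic described above; the enum and function names are illustrative, and a helper timeout is treated as a failure, as the text states:

```c
/* Mount option selecting the reaction to fatal errors. */
enum errors_opt { ERRORS_WITHDRAW, ERRORS_DEACTIVATE, ERRORS_PANIC };

/* What the helper reported via lock_module/withdraw. */
enum helper_result {
	HELPER_DEACTIVATED = 0,  /* wrote "0": device marked inactive */
	HELPER_STILL_ACTIVE = 1, /* wrote "1": device may still write */
	HELPER_TIMEOUT,          /* no answer within the timeout */
};

enum withdraw_action {
	ACT_PANIC,
	ACT_RELEASE_LOCKS,         /* release cluster-wide locks at once */
	ACT_PURGE_IO_THEN_RELEASE, /* purge fs I/O, then release locks */
};

static enum withdraw_action withdraw_policy(enum errors_opt opt,
					    enum helper_result res)
{
	if (opt == ERRORS_PANIC)
		return ACT_PANIC; /* panic immediately */
	if (res == HELPER_DEACTIVATED)
		return ACT_RELEASE_LOCKS;
	/* Deactivation failed or timed out. */
	return opt == ERRORS_DEACTIVATE ? ACT_PANIC
					: ACT_PURGE_IO_THEN_RELEASE;
}
```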