path: root/fs/gfs2
Age         Commit message                                                Author
2025-05-30  gfs2: Don't clear sb->s_fs_info in gfs2_sys_fs_add  [Andrew Price]
When gfs2_sys_fs_add() fails, it sets sb->s_fs_info to NULL on its error path (see commit 0d515210b696 ("GFS2: Add kobject release method")). The intention seems to be to prevent dereferencing sb->s_fs_info once the object pointed to has been deallocated, but that would be better achieved by setting the pointer to NULL in free_sbd().

As a consequence, when the call to gfs2_sys_fs_add() fails in gfs2_fill_super(), sdp = GFS2_SB(inode) will evaluate to NULL in iput() -> gfs2_drop_inode(), and accessing sdp->sd_flags will be a NULL pointer dereference.

Fix that by only setting sb->s_fs_info to NULL when actually freeing the object pointed to in free_sbd().

Fixes: ae9f3bd8259a ("gfs2: replace sd_aspace with sd_inode")
Reported-by: syzbot+b12826218502df019f9d@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
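A minimal sketch of the fix pattern described above, assuming a simplified free_sbd() with an sd_vfs back-pointer (illustrative only, not the actual gfs2 code):

  static void free_sbd(struct gfs2_sbd *sdp)
  {
          struct super_block *sb = sdp->sd_vfs;   /* assumed back-pointer */

          free_percpu(sdp->sd_lkstats);
          sb->s_fs_info = NULL;   /* clear only when the object is actually freed */
          kfree(sdp);
  }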
2025-05-26  Merge tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2  [Linus Torvalds]
Pull gfs2 updates from Andreas Gruenbacher:

 - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb will never change. This is the counterpart of commit 9e888998ea4d ("writeback: fix false warning in inode_to_wb()")

 - Fix a hang introduced by commit 8d391972ae2d ("gfs2: Remove __gfs2_writepage()"): prevent gfs2_logd from creating transactions for jdata pages while trying to flush the log

 - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by deallocating partially created inodes on the gfs2_create_inode() error path

 - Fix a bug in the journal head lookup code that could cause mount to fail after successful recovery

 - Various smaller fixes and cleanups from various people

* tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits)
  gfs2: No more gfs2_find_jhead caching
  gfs2: Get rid of duplicate log head lookup
  gfs2: Simplify clean_journal
  gfs2: Simplify gfs2_log_pointers_init
  gfs2: Move gfs2_log_pointers_init
  gfs2: Minor comments fix
  gfs2: Don't start unnecessary transactions during log flush
  gfs2: Move gfs2_trans_add_databufs
  gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio
  gfs2: avoid inefficient use of crc32_le_shift()
  gfs2: Do not call iomap_zero_range beyond eof
  gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch
  gfs2: Fix usage of bio->bi_status in gfs2_end_log_write
  gfs2: deallocate inodes in gfs2_create_inode
  gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc
  gfs2: Move gfs2_dinode_dealloc
  gfs2: Don't reread inodes unnecessarily
  gfs2: gfs2_create_inode error handling fix
  gfs2: Remove unnecessary NULL check before free_percpu()
  gfs2: check sb_min_blocksize return value
  ...
2025-05-26  Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux  [Linus Torvalds]
Pull block updates from Jens Axboe:

 - ublk updates:
      - Add support for updating the size of a ublk instance
      - Zero-copy improvements
      - Auto-registering of buffers for zero-copy
      - Series simplifying and improving GET_DATA and request lookup
      - Series adding quiesce support
      - Lots of selftests additions
      - Various cleanups

 - NVMe updates via Christoph:
      - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch)
      - nvme-fcloop refcounting fixes (Daniel Wagner)
      - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff)
      - support shared CQs in the PCI endpoint target code (Wilfred Mallawa)
      - support admin-queue only authentication (Hannes Reinecke)
      - use the crc32c library instead of the crypto API (Eric Biggers)
      - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva)

 - MD updates via Yu:
      - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters

 - Clean up brd, getting rid of atomic kmaps and bvec poking

 - Add loop driver specifically for zoned IO testing

 - Eliminate blk-rq-qos calls with a static key, if not enabled

 - Improve hctx locking for when a plug has IO for multiple queues pending

 - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well

 - Improve blk-throttle support

 - Improve delay support for blk-throttle

 - Improve brd discard support

 - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezing

 - Add support for block write streams via FDP (flexible data placement) on NVMe

 - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code

 - Remove obsolete BLK_MQ pci and virtio Kconfig options

 - Add atomic/untorn write support to blktrace

 - Various little cleanups and fixes

* tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits)
  selftests: ublk: add test for UBLK_F_QUIESCE
  ublk: add feature UBLK_F_QUIESCE
  selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE
  traceevent/block: Add REQ_ATOMIC flag to block trace events
  ublk: run auto buf unregisgering in same io_ring_ctx with registering
  io_uring: add helper io_uring_cmd_ctx_handle()
  ublk: remove io argument from ublk_auto_buf_reg_fallback()
  ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch()
  selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK
  selftests: ublk: support UBLK_F_AUTO_BUF_REG
  ublk: support UBLK_AUTO_BUF_REG_FALLBACK
  ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG
  ublk: prepare for supporting to register request buffer automatically
  ublk: convert to refcount_t
  selftests: ublk: make IO & device removal test more stressful
  nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk
  nvme: introduce multipath_always_on module param
  nvme-multipath: introduce delayed removal of the multipath head node
  nvme-pci: derive and better document max segments limits
  nvme-pci: use struct_size for allocation struct nvme_dev
  ...
2025-05-26  Merge tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs  [Linus Torvalds]
Pull vfs freezing updates from Christian Brauner:
 "This contains various filesystem freezing related work for this cycle:

  - Allow the power subsystem to support filesystem freeze for suspend and hibernate.

    Now all the pieces are in place to actually allow the power subsystem to freeze/thaw filesystems during suspend/resume. Filesystems are only frozen and thawed if the power subsystem does actually own the freeze.

    If the filesystem is already frozen by the time we've frozen all userspace processes we don't care to freeze it again. That's userspace's job once the process resumes. We only actually freeze filesystems if we absolutely have to and we ignore other failures to freeze.

    We could bubble up errors and fail suspend/resume if the error isn't EBUSY (aka it's already frozen) but I don't think that this is worth it. Filesystem freezing during suspend/resume is best-effort. If the user has 500 ext4 filesystems mounted and 4 fail to freeze for whatever reason then we simply skip them.

    What we have now is already a big improvement and let's see how we fare with it before making our lives even harder (and uglier) than we have to.

  - Allow efivars to support freeze and thaw

    Allow efivarfs to partake to resync variable state during system hibernation and suspend. Add freeze/thaw support.

    This is a pretty straightforward implementation. We simply add regular freeze/thaw support for both userspace and the kernel. efivars is the first pseudofilesystem that adds support for filesystem freezing and thawing.

    The simplicity comes from the fact that we simply always resync variable state after efivarfs has been frozen. It doesn't matter whether that's because of suspend, userspace initiated freeze or hibernation. Efivars is simple enough that it doesn't matter that we walk all dentries. There are no directories and there aren't insane amounts of entries and both freeze/thaw are already heavy-handed operations. If userspace initiated a freeze/thaw cycle they would need CAP_SYS_ADMIN in the initial user namespace (as that's where efivarfs is mounted) so it can't be triggered by random userspace. IOW, we really really don't care"

* tag 'vfs-6.16-rc1.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  f2fs: fix freezing filesystem during resize
  kernfs: add warning about implementing freeze/thaw
  efivarfs: support freeze/thaw
  power: freeze filesystems during suspend/resume
  libfs: export find_next_child()
  super: add filesystem freezing helpers for suspend and hibernate
  gfs2: pass through holder from the VFS for freeze/thaw
  super: use common iterator (Part 2)
  super: use a common iterator (Part 1)
  super: skip dying superblocks early
  super: simplify user_get_super()
  super: remove pointless s_root checks
  fs: allow all writers to be frozen
  locking/percpu-rwsem: add freezable alternative to down_read
2025-05-22  gfs2: No more gfs2_find_jhead caching  [Andreas Gruenbacher]
We are no longer calling gfs2_find_jhead() on the same log twice, so there is no more reason for keeping the log contents cached across those calls. In addition, log head lookup and log header writing didn't go through the same address space and so the caching wasn't even fully working, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Get rid of duplicate log head lookup  [Andreas Gruenbacher]
Currently at mount time, the recovery code looks up the current log head and, if necessary, replays the log and writes a recovery header to indicate that the log is clean. It does that for each log that may need recovery. We also know that our own log will always be checked as part of that process. Then, the mount code looks up the log head of our own log again. The double log head lookup can be costly, but more importantly, it is unnecessary because we can trivially compute the position of the log head after recovery; all we need to do for that is bump the position and lh_sequence by one when writing a recovery header. With that in mind, move the call to gfs2_log_pointers_init() into gfs2_recover_func() and get rid of the double lookup in gfs2_make_fs_rw(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Simplify clean_journal  [Andreas Gruenbacher]
In function clean_journal(), update @head to point at the log header that indicates successful recovery: this is where logging needs to resume. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Simplify gfs2_log_pointers_init  [Andreas Gruenbacher]
Move the initialization of sdp->sd_log_sequence and sdp->sd_log_flush_head inside gfs2_log_pointers_init(). Use gfs2_replay_incr_blk(). Before this change, the log head lookup code in freeze_go_xmote_bh() didn't update sdp->sd_log_flush_head. This is now fixed, but the code in freeze_go_xmote_bh() appears to be pretty useless in the first place: on a frozen filesystem, the log head will not change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Move gfs2_log_pointers_init  [Andreas Gruenbacher]
Move gfs2_log_pointers_init to recovery.c: there is no need for inlining this function. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Minor comments fix  [Andreas Gruenbacher]
Commit 40829760096df ("gfs2: Convert gfs2_find_jhead() to use a folio") replaced grab_cache_page() by filemap_grab_folio(), but the comments were still referring to grab_cache_page(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Don't start unnecessary transactions during log flush  [Andreas Gruenbacher]
Commit 8d391972ae2d ("gfs2: Remove __gfs2_writepage()") changed the log flush code in gfs2_ail1_start_one() to call aops->writepages() instead of aops->writepage(). For jdata inodes, this means that we will now try to reserve log space and start a transaction before we can determine that the pages in question have already been journaled. When this happens in the context of gfs2_logd(), it can now appear that not enough log space is available for freeing up log space, and we will lock up. Fix that by issuing journal writes directly instead of going through aops->writepages() in the log flush code. Fixes: 8d391972ae2d ("gfs2: Remove __gfs2_writepage()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Move gfs2_trans_add_databufs  [Andreas Gruenbacher]
Move gfs2_trans_add_databufs() to trans.c. Pass in a glock instead of a gfs2_inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio  [Andreas Gruenbacher]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22  gfs2: avoid inefficient use of crc32_le_shift()  [Eric Biggers]
__get_log_header() was using crc32_le_shift() to update a CRC with four zero bytes. However, this is about 5x slower than just CRC'ing four zero bytes in the normal way. Just do that instead. (We could instead make crc32_le_shift() faster on short lengths. But all its callers do just fine without it, so I'd like to just remove it.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
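An illustrative comparison of the two approaches, wrapped in a hypothetical helper (the gfs2 change itself simply CRCs four zero bytes in place):

  #include <linux/crc32.h>

  static u32 crc_skip_four_zero_bytes(u32 crc)
  {
          static const u8 zeroes[4];

          /* same result as crc32_le_shift(crc, 4), but much faster for
           * such a short length */
          return crc32_le(crc, zeroes, sizeof(zeroes));
  }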
2025-05-22  gfs2: Do not call iomap_zero_range beyond eof  [Andreas Gruenbacher]
Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof folio"), iomap_zero_range() warns when asked to zero a folio beyond eof. The warning triggers on the following code path:

  gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
    __gfs2_punch_hole()
      gfs2_block_zero_range()
        iomap_zero_range()

In __gfs2_punch_hole(), gfs2 zeroes out partial folios at the beginning and at the end of the specified range, whether those folios are beyond eof or not. This may add folios to the page cache which are entirely beyond eof, which isn't of any use. Avoid that by truncating the range to zero out at eof.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
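A hedged sketch of the clamping idea, with simplified names (not the actual gfs2_block_zero_range() signature):

  static void zero_range_clamped(struct inode *inode, loff_t from, loff_t length)
  {
          loff_t max = i_size_read(inode) - from;

          if (max <= 0)
                  return;                 /* entirely beyond eof: nothing to zero */
          if (length > max)
                  length = max;           /* truncate the range at eof */
          /* ... then call iomap_zero_range(inode, from, length, ...) ... */
  }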
2025-05-22  gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch  [Christoph Hellwig]
__gfs2_jdata_write_folio can't return AOP_WRITEPAGE_ACTIVATE, so don't check for it in gfs2_write_jdata_batch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-09  super: add filesystem freezing helpers for suspend and hibernate  [Christian Brauner]
Allow the power subsystem to support filesystem freeze for suspend and hibernate.

For some kernel subsystems it is paramount that they are guaranteed that they are the owner of the freeze to avoid any risk of deadlocks. This is the case for the power subsystem. Enable it to recognize whether it did actually freeze the filesystem.

If userspace has 10 filesystems and suspend/hibernate manages to freeze 5 and then fails on the 6th for whatever odd reason (current or future) then power needs to undo the freeze of the first 5 filesystems. It can't just walk the list again because while it's unlikely that a new filesystem got added in the meantime it still cannot tell which filesystems the power subsystem actually managed to get a freeze reference count on that needs to be dropped during thaw.

There's various ways out of this ugliness. For example, record the filesystems the power subsystem managed to freeze on a temporary list in the callbacks and then walk that list backwards during thaw to undo the freezing, or make sure that the power subsystem just actually exclusively freezes things it can freeze and mark such filesystems as being owned by power for the duration of the suspend or resume cycle. I opted for the latter as that seemed the clean thing to do even if it means more code changes.

If hibernation races with filesystem freezing (e.g. DM reconfiguration), then hibernation need not freeze a filesystem because it's already frozen but userspace may thaw the filesystem before hibernation actually happens. If the race happens the other way around, DM reconfiguration may unexpectedly fail with EBUSY. So allow FREEZE_EXCL to nest with other holders. An exclusive freezer cannot be undone by any of the other concurrent freezers.

Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-07  gfs2: use bdev_rw_virt in gfs2_read_super  [Christoph Hellwig]
Switch gfs2_read_super to allocate the superblock buffer using kmalloc which falls back to the page allocator for PAGE_SIZE allocation but gives us a kernel virtual address and then use bdev_rw_virt to perform the synchronous read into it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250507120451.4000627-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-24  gfs2: Fix usage of bio->bi_status in gfs2_end_log_write  [Andrew Price]
bio->bi_status is an index into the blk_errors array, not an errno. Its __bitwise tag is cast away here, resulting in a sparse warning:

  fs/gfs2/lops.c:207:22: warning: cast from restricted blk_status_t

We could either add __force to the cast and continue logging bi_status in the error message, or we could look up the errno in the array and log that. As sdp->sd_log_error is used as an errno in all other cases, look up the errno here for consistency.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
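A sketch of the consistent approach, using a hypothetical helper around the real blk_status_to_errno() conversion (not the exact gfs2_end_log_write() code):

  static void record_log_write_error(struct gfs2_sbd *sdp, struct bio *bio)
  {
          if (bio->bi_status) {
                  int err = blk_status_to_errno(bio->bi_status);

                  fs_err(sdp, "Error %d writing to journal\n", err);
          }
  }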
2025-04-24  gfs2: deallocate inodes in gfs2_create_inode  [Andreas Gruenbacher]
When creating and destroying inodes, we are relying on the inode hash table to make sure that for a given inode number, only a single inode will exist. We then link that inode to its inode and iopen glock and let those glocks point back at the inode. However, when iget_failed() is called, the inode is removed from the inode hash table before gfs2_evict_inode() is called, and uniqueness is no longer guaranteed.

Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work around that problem by detaching the inode glock from the inode before calling iget_failed(), but that broke the inode deallocation code in gfs2_evict_inode().

To fix that, deallocate partially created inodes in gfs2_create_inode() instead of relying on gfs2_evict_inode() for doing that. This means that gfs2_evict_inode() and its helper functions will no longer see partially created inodes, and so some simplifications are possible there.

Fixes: 9ffa18884cce ("gfs2: gl_object races fix")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21  gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc  [Andreas Gruenbacher]
Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass that information explicitly instead. This allows for a cleaner follow-up patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21  gfs2: Move gfs2_dinode_dealloc  [Andreas Gruenbacher]
Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages() from super.c to inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21  gfs2: Don't reread inodes unnecessarily  [Andreas Gruenbacher]
In gfs2_create_inode(), we initialize the inode from scratch and then we write the result to disk. Clear the GLF_INSTANTIATE_NEEDED glock flag to indicate that the inode is up to date. Otherwise, the next time the inode glock is acquired, gfs2_instantiate() would reread the inode from disk, which isn't necessary. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
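A minimal sketch of the idea, wrapped in a hypothetical helper (the actual change clears the flag directly in gfs2_create_inode()):

  static void mark_inode_up_to_date(struct gfs2_inode *ip)
  {
          /* the inode was just built from scratch and written out, so a
           * later gfs2_instantiate() must not reread it from disk */
          clear_bit(GLF_INSTANTIATE_NEEDED, &ip->i_gl->gl_flags);
  }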
2025-04-21  gfs2: gfs2_create_inode error handling fix  [Andreas Gruenbacher]
When gfs2_create_inode() finds a directory, make sure to return -EISDIR. Fixes: 571a4b57975a ("GFS2: bugger off early if O_CREAT open finds a directory") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21  gfs2: Remove unnecessary NULL check before free_percpu()  [Chen Ni]
free_percpu() checks for NULL pointers internally. Remove unneeded NULL check here. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
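Illustrative before/after of the cleanup (sdp->sd_lkstats used as an example field):

  /* before */
  if (sdp->sd_lkstats)
          free_percpu(sdp->sd_lkstats);

  /* after: free_percpu() handles a NULL pointer internally */
  free_percpu(sdp->sd_lkstats);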
2025-04-21  gfs2: check sb_min_blocksize return value  [Edward Adam Davis]
Check the return value of sb_min_blocksize(): it will be 0 when the requested block size is invalid. In addition, check the return value of sb_set_blocksize() as well. Reported-by: syzbot+b0018b7468b2af33b4d5@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
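A hedged sketch of the checks, using a hypothetical helper and a 512-byte minimum as an example:

  static int set_blocksize_checked(struct super_block *sb)
  {
          int bsize = sb_min_blocksize(sb, 512);

          if (!bsize)
                  return -EINVAL;         /* requested block size is invalid */
          if (!sb_set_blocksize(sb, bsize))
                  return -EINVAL;
          return 0;
  }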
2025-04-21  gfs2: replace sd_aspace with sd_inode  [Andreas Gruenbacher]
Currently, sdp->sd_aspace and the per-inode metadata address spaces use sb->s_bdev->bd_mapping->host as their ->host; folios in those address spaces will thus appear to be on bdev rather than on gfs2 filesystems. This is a problem because gfs2 doesn't support cgroup writeback (SB_I_CGROUPWB), but bdev does. Fix that by using a "dummy" gfs2 inode as ->host in those address spaces. When coming from a folio, folio->mapping->host->i_sb will then be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in sb->s_iflags. Based on a previous version from Bob Peterson from several years ago. Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure this out. Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21  gfs2: only apply DLM_LKF_VALBLK if sb_lvbptr is not NULL  [Alexander Aring]
Currently, gfs2 always sets the DLM_LKF_VALBLK flag to enable lvb handling even when sb_lvbptr is NULL. This currently causes no problems because DLM ignores the DLM_LKF_VALBLK flag when sb_lvbptr is NULL, but it does violate the DLM API. Fix that by only setting DLM_LKF_VALBLK when sb_lvbptr is not NULL. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
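A minimal sketch of the rule, as a hypothetical flag-building helper:

  #include <linux/dlm.h>

  static u32 make_lkf_flags(const struct dlm_lksb *lksb, u32 lkf)
  {
          /* only request a lock value block when there is a buffer for it */
          if (lksb->sb_lvbptr)
                  lkf |= DLM_LKF_VALBLK;
          return lkf;
  }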
2025-04-21  gfs2: move msleep to sleepable context  [Alexander Aring]
This patch moves the msleep_interruptible() out of the non-sleepable context by moving the ls->ls_recover_spin spinlock around so msleep_interruptible() will be called in a sleepable context. Cc: stable@vger.kernel.org Fixes: 4a7727725dc7 ("GFS2: Fix recovery issues for spectators") Suggested-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
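A schematic fragment of the pattern (ls stands in for the lockspace structure; not the actual gfs2 control logic):

  spin_lock(&ls->ls_recover_spin);
  /* ... inspect and update recovery state under the lock ... */
  spin_unlock(&ls->ls_recover_spin);

  /* sleeping is only legal here, after the spinlock has been dropped */
  msleep_interruptible(1000);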
2025-04-07  gfs2: pass through holder from the VFS for freeze/thaw  [Christian Brauner]
The filesystem's freeze/thaw functions can be called from contexts where the holder isn't userspace but the kernel, e.g., during systemd suspend/hibernate. So pass through the freeze/thaw flags from the VFS instead of hard-coding them. Signed-off-by: Christian Brauner <brauner@kernel.org>
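A hedged sketch of the passthrough, assuming the enum freeze_holder based freeze_super() interface (illustrative, not the actual gfs2_freeze_super()):

  static int example_freeze(struct super_block *sb, enum freeze_holder who)
  {
          /* forward 'who' instead of hard-coding FREEZE_HOLDER_USERSPACE so
           * kernel-initiated freezes (suspend/hibernate) are attributed to
           * the right holder */
          return freeze_super(sb, who);
  }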
2025-04-04  lib/crc: remove CONFIG_LIBCRC32C  [Eric Biggers]
Now that LIBCRC32C does nothing besides select CRC32, make every option that selects LIBCRC32C instead select CRC32 directly. Then remove LIBCRC32C. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250401221600.24878-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-27  Merge tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2  [Linus Torvalds]
Pull gfs2 updates from Andreas Gruenbacher:

 - Fix two bugs related to locking request cancelation (locking request being retried instead of canceled; canceling the wrong locking request)

 - Prevent a race between inode creation and deferred delete analogous to commit ffd1cf0443a2 from 6.13. This now allows us to further simplify gfs2_evict_inode() without introducing mysterious problems

 - When an inode delete should be verified / retried "later" but that isn't possible, skip the delete instead of carrying it out immediately. This broke in 6.13

 - More folio conversions from Matthew Wilcox (plus a fix from Dan Carpenter)

 - Various minor fixes and cleanups

* tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (22 commits)
  gfs2: some comment clarifications
  gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead()
  gfs2: Convert gfs2_meta_read_endio() to use a folio
  gfs2: Convert gfs2_end_log_write_bh() to work on a folio
  gfs2: Convert gfs2_find_jhead() to use a folio
  gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()
  gfs2: Use b_folio in gfs2_check_magic()
  gfs2: Use b_folio in gfs2_submit_bhs()
  gfs2: Use b_folio in gfs2_trans_add_meta()
  gfs2: Use b_folio in gfs2_log_write_bh()
  gfs2: skip if we cannot defer delete
  gfs2: remove redundant warnings
  gfs2: minor evict fix
  gfs2: Prevent inode creation race (2)
  gfs2: Fix additional unlikely request cancelation race
  gfs2: Fix request cancelation bug
  gfs2: Check for empty queue in run_queue
  gfs2: Remove more dead code in add_to_queue
  gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETE
  gfs2: glock holder GL_NOPID fix
  ...
2025-03-24  Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs  [Linus Torvalds]
Pull vfs async dir updates from Christian Brauner:
 "This contains cleanups that fell out of the work from async directory handling:

  - Change kern_path_locked() and user_path_locked_at() to never return a negative dentry. This simplifies the usability of these helpers in various places

  - Drop d_exact_alias() from the remaining place in NFS where it is still used. This also allows us to drop the d_exact_alias() helper completely

  - Drop an unnecessary call to fh_update() from nfsd_create_locked()

  - Change i_op->mkdir() to return a struct dentry

    Change vfs_mkdir() to return a dentry provided by the filesystems which is hashed and positive. This allows us to reduce the number of cases where the resulting dentry is not positive to very few cases. The code in these places becomes simpler and easier to understand.

  - Repack DENTRY_* and LOOKUP_* flags"

* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  doc: fix inline emphasis warning
  VFS: Change vfs_mkdir() to return the dentry.
  nfs: change mkdir inode_operation to return alternate dentry if needed.
  fuse: return correct dentry for ->mkdir
  ceph: return the correct dentry on mkdir
  hostfs: store inode in dentry after mkdir if possible.
  Change inode_operations.mkdir to return struct dentry *
  nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
  nfs/vfs: discard d_exact_alias()
  VFS: add common error checks to lookup_one_qstr_excl()
  VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
  VFS: repack LOOKUP_ bit flags.
  VFS: repack DENTRY_ flags.
2025-03-24  Merge tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs  [Linus Torvalds]
Pull vfs iomap updates from Christian Brauner:

 - Allow the filesystem to submit the writeback bios.

 - Allow the filesystem to track completions on a per-bio basis instead of the entire I/O.

 - Change writeback_ops so that ->submit_bio can be done by the filesystem.

 - A new ANON_WRITE flag for writes that don't have a block number assigned to them at the iomap level, leaving the filesystem to do that work in the submission handler.

 - Incremental iterator advance

   The folio_batch support for zero range, where the filesystem provides a batch of folios to process that might not be logically contiguous, requires more flexibility than the current offset-based iteration offers. Update all iomap operations to advance the iterator within the operation and thus remove the need to advance from the core iomap iterator.

 - Make buffered writes work with RWF_DONTCACHE

   If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. On writeback completion the pages will be dropped.

 - Introduce infrastructure for large atomic writes

   This will eventually be used by xfs and ext4.

* tag 'vfs-6.15-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits)
  iomap: rework IOMAP atomic flags
  iomap: comment on atomic write checks in iomap_dio_bio_iter()
  iomap: inline iomap_dio_bio_opflags()
  iomap: fix inline data on buffered read
  iomap: Lift blocksize restriction on atomic writes
  iomap: Support SW-based atomic writes
  iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW
  xfs: flag as supporting FOP_DONTCACHE
  iomap: make buffered writes work with RWF_DONTCACHE
  iomap: introduce a full map advance helper
  iomap: rename iomap_iter processed field to status
  iomap: remove unnecessary advance from iomap_iter()
  dax: advance the iomap_iter on pte and pmd faults
  dax: advance the iomap_iter on dedupe range
  dax: advance the iomap_iter on unshare range
  dax: advance the iomap_iter on zero range
  dax: push advance down into dax_iomap_iter() for read and write
  dax: advance the iomap_iter in the read/write path
  iomap: convert misc simple ops to incremental advance
  iomap: advance the iter on direct I/O
  ...
2025-03-18  gfs2: some comment clarifications  [Andreas Gruenbacher]
Since commit e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now"), we unconditionally drop the inode glock before trying to fault in more pages. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-12  gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead()  [Dan Carpenter]
The filemap_grab_folio() function doesn't return NULL, it returns error pointers. Fix the check to match. Fixes: 40829760096d ("gfs2: Convert gfs2_find_jhead() to use a folio") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
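A minimal sketch of the corrected check, using a hypothetical wrapper:

  static struct folio *grab_folio_checked(struct address_space *mapping,
                                          pgoff_t index)
  {
          struct folio *folio = filemap_grab_folio(mapping, index);

          /* failure is reported as an error pointer, never as NULL */
          if (IS_ERR(folio))
                  return NULL;    /* or propagate PTR_ERR(folio) */
          return folio;
  }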
2025-03-10  gfs2: Convert gfs2_meta_read_endio() to use a folio  [Matthew Wilcox (Oracle)]
Switch from bio_for_each_segment_all() to bio_for_each_folio_all() which removes a call to page_buffers(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Convert gfs2_end_log_write_bh() to work on a folio  [Matthew Wilcox (Oracle)]
gfs2_end_log_write() has to handle bios which consist of both pages which belong to folios and pages which were allocated from a mempool and do not belong to a folio. It would be cleaner to have separate endio handlers which handle each type, but it's not clear to me whether that's even possible. This patch is slightly forward-looking in that page_folio() cannot currently return NULL, but it will return NULL in the future for pages which do not belong to a folio. This was the last user of page_has_buffers(), so remove it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Convert gfs2_find_jhead() to use a folio  [Matthew Wilcox (Oracle)]
Remove a call to grab_cache_page() by using a folio throughout this function. [agruenba@redhat.com: Adjust to return value difference between bio_add_page() and bio_add_folio().] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()  [Matthew Wilcox (Oracle)]
Pass in the folio instead of the page. Add an assert that this is not a large folio as we'd need a more complex solution if we wanted to kmap() each page out of a large folio. Removes a use of folio->page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [agruenba@redhat.com: Rename gfs2_jhead_folio_srch() to gfs2_jhead_folio_search().] Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Use b_folio in gfs2_check_magic()  [Matthew Wilcox (Oracle)]
We are preparing to remove bh->b_page. Use kmap_local_folio() instead of kmap_local_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
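A hedged sketch of the b_page to b_folio conversion pattern, with a hypothetical helper name:

  static bool buffer_has_magic(struct buffer_head *bh, __be32 magic)
  {
          void *kaddr = kmap_local_folio(bh->b_folio, bh_offset(bh));
          bool match = (*(__be32 *)kaddr == magic);

          kunmap_local(kaddr);
          return match;
  }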
2025-03-10  gfs2: Use b_folio in gfs2_submit_bhs()  [Matthew Wilcox (Oracle)]
Remove a reference to bh->b_page which is going to be removed soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Use b_folio in gfs2_trans_add_meta()  [Matthew Wilcox (Oracle)]
The lock bit is maintained on the folio, not on the page. Saves two calls to compound_head() as well as removing two references to bh->b_page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Use b_folio in gfs2_log_write_bh()  [Matthew Wilcox (Oracle)]
We are preparing to remove bh->b_page. gfs2_log_write() should continue to operate on pages as some of the memory being logged does not come from folios, so convert from folio to page in this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: skip if we cannot defer delete  [Andreas Gruenbacher]
In gfs2_evict_inode(), in the unlikely case that we cannot defer deleting the inode, it is not safe to fall back to deleting the inode; the only valid choice we have is to skip the delete. In addition, in evict_should_delete(), if we cannot lock the inode glock exclusively, we are in a bad enough state that skipping the delete is likely a better choice than trying to recover from the failure later. Fixes: c5b7a2400edc ("gfs2: Only defer deletes when we have an iopen glock") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: remove redundant warnings  [Andreas Gruenbacher]
In glock_set_object() and glock_clear_object(), there is no need to print the glock type and number when we dump the entire glock, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: minor evict fix  [Andreas Gruenbacher]
In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we detach the iopen glock from the inode without calling glock_clear_object(). This leads to a warning in glock_set_object() when the same inode is recreated and the glock is reused. Fix that by only detaching the iopen glock in gfs2_evict_inode(). In addition, remove the dequeue code from evict_should_delete(); we already perform a conditional dequeue in gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Prevent inode creation race (2)  [Andreas Gruenbacher]
In gfs2_try_evict(), we try grabbing the inode to evict, we try to evict it, and then we try grabbing it again to see if it still exists. There is no guarantee that we will end up with the same inode both times; the inode validity check that commit ffd1cf0443a2 ("gfs2: Prevent inode creation race") added to the first grab is actually needed both times. (To avoid code duplication, add a grab_existing_inode() helper.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Fix additional unlikely request cancelation race  [Andreas Gruenbacher]
In gfs2_glock_dq(), we must drop the glock spin lock before calling ->lm_cancel, but this means that in the meantime, the operation we are trying to cancel could complete. If the operation completes unsuccessfully, another holder can end up at the head of the queue and another ->lm_lock operation can get started. In this case, we would end up canceling that second operation by accident. To prevent that, introduce a new GLF_CANCELING flag. Set that flag in gfs2_glock_dq() when trying to cancel an operation. When seeing that flag, finish_xmote() will then keep the GLF_LOCK flag set to prevent other glock operations from taking place. gfs2_glock_dq() then completes the cancelation attempt by clearing GLF_LOCK and GLF_CANCELING. In addition, add a missing GLF_DEMOTE_IN_PROGRESS check in gfs2_glock_dq() to make sure that we won't accidentally cancel a demote request. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10  gfs2: Fix request cancelation bug  [Andreas Gruenbacher]
In finish_xmote(), when a locking request is canceled, the corresponding holder is moved to the tail of the holders list instead of being dequeued immediately. When there is only a single holder, the canceled locking request is then immediately repeated. This makes no sense; it looks like another remnant of LM_FLAG_PRIORITY support. Instead, dequeue canceled holders and proceed with the next holder in finish_xmote(). We can then easily detect in gfs2_glock_dq() when a holder has been canceled. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>