Age | Commit message | Author
2025-01-21 | bcachefs: "Journal stuck" timeout now takes into account device latency | Kent Overstreet
If a block device (e.g. your typical consumer SSD) is taking multiple seconds for IOs (typically flushes), we don't want to emit the "journal stuck" message prematurely. Also, make sure to drop the btree_trans srcu lock if we're blocking for more than a second. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 | bcachefs: Reduce stack frame size of __bch2_str_hash_check_key() | Kent Overstreet
We don't need all the helpers inlined here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21 | bcachefs: Fix btree_trans_peek_key_cache() | Kent Overstreet
BTREE_ITER_cached_nofill has some tricky corner cases; it's used internally for iterators that aren't walking the key cache, but need to be coherent with the key cache. It tells traverse to look up and lock the key cache entry if present, but don't create one if it doesn't exist. That means we have to have a BTREE_ITER_UPTODATE path (because after traverse the path has to be UPTODATE, or we pop assertions) that doesn't point to anything (which is the less bad option, taken by the previous fix). The previous fix for this path missed an issue that can happen in bch2_trans_peek_key_cache(): we can't set should_be_locked on a path that doesn't point to anything and doesn't hold locks. Fixes: bd5b09727f3d ("bcachefs: Don't set btree_path to updtodate if we don't fill") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-15 | bcachefs: Fix check_inode_hash_info_matches_root() | Kent Overstreet
Can't use memcmp() when the struct contains padding. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
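The padding problem can be illustrated with a minimal sketch. The struct and helper below are hypothetical stand-ins, not the actual bcachefs types: on typical ABIs the compiler inserts padding bytes after the one-byte field, and memcmp() would compare those bytes too, so a field-by-field comparison is used instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical struct for illustration only. On typical ABIs the compiler
 * places 3 padding bytes between 'type' and 'seed'; their contents are
 * unspecified, so memcmp() on two such structs may report a mismatch even
 * when every field is equal.
 */
struct hash_info_example {
	uint8_t  type;
	uint32_t seed;
};

/* Compare field by field so padding bytes never influence the result. */
static bool hash_info_equal(const struct hash_info_example *a,
			    const struct hash_info_example *b)
{
	return a->type == b->type && a->seed == b->seed;
}
```

Two structs whose padding differs but whose fields match compare equal with this helper, which is the property a raw memcmp() cannot guarantee.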
2025-01-14 | bcachefs: Document issue with bch_stripe layout | Kent Overstreet
We've got a problem with bch_stripe that is going to take an on disk format rev to fix - we can't access the block sector counts if the checksum type is unknown. Document it for now, there are a few other things to fix as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | bcachefs: Fix self healing on read error | Kent Overstreet
We were incorrectly checking if there'd been an io error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | bcachefs: Pop all the transactions from the abort one | Alan Huang
The transaction is going to abort, so there will be no cycle involving this transaction anymore. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | bcachefs: Only abort the transactions in the cycle | Alan Huang
When the cycle doesn't involve the initiator of the cycle detection, we might choose a transaction that is not involved in the cycle to abort. Aborting such a transaction won't break the cycle, so this patch chooses a transaction that is in the cycle to abort. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
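The idea of restricting victim selection to cycle members can be sketched as follows. This is a simplified model, not the bcachefs cycle detector: the integer transaction ids, the `prio` array, and the "lowest priority loses" rule are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model: path[0..nr) is the chain of transaction ids the cycle
 * detector has walked, and 'next' is the transaction the last entry waits on.
 * If 'next' already appears at path[k], the cycle is path[k..nr); entries
 * before k merely lead into the cycle and are not part of it.
 */
static long cycle_start(const int *path, size_t nr, int next)
{
	for (size_t i = 0; i < nr; i++)
		if (path[i] == next)
			return (long)i;
	return -1; /* no cycle */
}

/*
 * Choose a victim among the cycle members only. The "lowest priority loses"
 * rule is an assumption for this sketch.
 * Returns an index into path[], or -1 if there is no cycle.
 */
static long pick_victim(const int *path, const int *prio, size_t nr, int next)
{
	long start = cycle_start(path, nr, next);
	if (start < 0)
		return -1;

	long victim = start;
	for (size_t i = (size_t)start + 1; i < nr; i++)
		if (prio[i] < prio[victim])
			victim = (long)i;
	return victim;
}
```

Entries before the cycle start are never considered, even if they would otherwise look like the cheapest transaction to abort, because aborting them leaves the cycle intact.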
2025-01-14 | bcachefs: Introduce lock_graph_pop_from | Alan Huang
This patch introduces a helper function, lock_graph_pop_from(), which pops the graph from i. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | bcachefs: Convert open-coded lock_graph_pop_all to helper | Alan Huang
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | bcachefs: Do not allow no fail lock request to fail | Alan Huang
If the transaction chose itself as a victim before and restarted, it might issue a no fail lock request this time. But it might be added to others' lock graphs and be chosen as the victim again; this is no longer safe without an additional check. We could also convert the cycle detector to be fully RCU-based to solve that unsoundness, but the latency added to trans_put and the additional memory required may not be worth it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | bcachefs: Merge the condition to avoid additional invocation | Alan Huang
If the lock has been acquired and unlocked, we don't have to do the clear and wakeup again, though doing so is harmless since we hold the intent lock. Merging the conditions might be clearer. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-14 | Revert "bcachefs: Fix bch2_btree_node_upgrade()" | Alan Huang
This reverts commit 62448afee714354a26db8a0f3c644f58628f0792. six_lock_tryupgrade fails only if there is an intent lock held, it won't fail no matter how many read locks are held. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-13 | bcachefs: bcachefs_metadata_version_directory_size | Hongbo Li
This adds another metadata version for accounting directory size. For filesystems at the new version, when subdirectory items are created or deleted, the parent directory's size will change accordingly. For existing filesystems at an older version, running fsck will automatically upgrade the metadata version, and it will check and recalculate the directory size. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-13 | bcachefs: make directory i_size meaningful | Hongbo Li
The i_size of a directory is 0 in bcachefs if the directory is empty. As more child dirents are created, its size ought to change. Many other filesystems behave this way (e.g. xfs and btrfs), and many of them base the change on the size of the child dirent's name. Although the directory size may not seem to convey much, we can still give it some meaning. The formula for the dentry size is as follows: occupied_size = 40 + ALIGN(9 + namelen, 8) Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
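The formula above can be worked through with a small sketch. The macro and function names here are hypothetical; ALIGN_UP is assumed to behave like the kernel's ALIGN() macro, rounding up to the next multiple of a power of two.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of a (a must be a power of two).
 * Assumed to mirror the behaviour of the kernel's ALIGN() macro.
 */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* occupied_size = 40 + ALIGN(9 + namelen, 8), as given in the commit message. */
static size_t dirent_occupied_size(size_t namelen)
{
	return 40 + ALIGN_UP(9 + namelen, 8);
}
```

For example, a 7-byte name gives 40 + ALIGN(16, 8) = 56, while an 8-byte name gives 40 + ALIGN(17, 8) = 40 + 24 = 64.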
2025-01-09 | bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet | Kent Overstreet
check_unreachable_inodes does work in online mode, with the one caveat that it assumes check_dirents has also run - and check_dirents is not PASS_ONLINE yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck | Kent Overstreet
No need to pull the whole alloc btree into the btree key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Check for dirents to overwritten inodes | Kent Overstreet
This fixes various "dirent to missing inode" errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Don't set btree_path to updtodate if we don't fill | Kent Overstreet
This fixes various locking asserts, and a null ptr deref in bch2_btree_iter_peek_path(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: __bch2_btree_pos_to_text() | Kent Overstreet
Factor out a version of bch2_btree_pos_to_text() that doesn't take a pointer to an in-memory btree node, to be used for btree node scrub. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: printbuf_reset() handles tabstops | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Silence read-only errors when deleting snapshots | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Dropped superblock write is no longer a fatal error | Kent Overstreet
Just emit a warning if errors=continue or fix_safe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_trans_node_drop() | Kent Overstreet
Factor out a small common helper. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_trans_unlock_write() | Kent Overstreet
New helper for dropping all write locks; which is distinct from the helper the transaction commit path uses, which is faster and only touches updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: btree_node_unlock() can now drop write locks | Kent Overstreet
Prep work for reworking btree node locking during interior btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: six locks: write locks can now be held recursively | Kent Overstreet
This is needed for the interior update locking rework, where we'll be holding node write locks for the duration of the update - which is needed for synchronizing with online check_allocations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_fs_btree_gc_init() | Kent Overstreet
Now returns errors; prep work for check_allocations_done_lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Assert that btree write buffer only touches the right btrees | Kent Overstreet
More asserts, more better. Also, clean up the per-btree flags a bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_inum_path() now crosses subvolumes correctly | Kent Overstreet
The dirent that points to a subvolume root is in the parent subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_inum_path() no longer returns an error for disconnected inums | Kent Overstreet
bch2_inum_path() should work even if the filesystem is corrupted - we don't want it to cause fsck to fail. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: btree_path_very_locks(): verify lock seq | Kent Overstreet
If the btree_path's lock seq is wrong, the next bch2_trans_relock() operation is guaranteed to fail and we take an unnecessary transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: fix bch2_btree_key_cache_drop() | Kent Overstreet
When evicting, we shouldn't leave a pointer to the key cache entry lying around - that screws up btree path asserts we're adding. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_btree_node_write_trans() | Kent Overstreet
Avoid screwing up path->lock_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Fixes for snapshot_tree.master_subvol | Kent Overstreet
Ensure that snapshot_tree.master_subvol is cleared when we delete the master subvolume in a tree of snapshots, and allow for snapshot trees that don't have a master subvolume in fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Don't rely on snapshot_tree.master_subvol for reattaching | Kent Overstreet
Previously, fsck used the snapshot tree's master subvol for finding the root inode number - but the master subvol might have been deleted, and setting a new one should be a user operation; meaning we can't rely on it existing. Fortunately, for finding the root inode number in a tree of snapshots, finding any associated subvolume works. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bch2_kvmalloc() | Kent Overstreet
Add a version of kvmalloc() that doesn't have the INT_MAX limit; large filesystems do hit this. We'll want to get rid of the in-memory bucket gens array, but we're not there quite yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Fix assert for online fsck | Kent Overstreet
We can't check if we're racing with fsck ending until mark_lock is held. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Handle -BCH_ERR_need_mark_replicas in gc | Kent Overstreet
Locking considerations (possibly no longer relevant?) mean that when an accounting update needs a new superblock replicas entry to be created, it's deferred to the transaction commit error path. But accounting updates for gc/fsck aren't done from the transaction commit path - so we need to handle -BCH_ERR_btree_insert_need_mark_replicas locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Write lock btree node in key cache fills | Kent Overstreet
This addresses a key cache coherency bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: kill __bch2_btree_iter_flags() | Kent Overstreet
bch2_btree_iter_flags() now takes a level parameter; this fixes a bug where using a node iterator on a leaf wouldn't set BTREE_ITER_with_key_cache, leading to fun cache coherency bugs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Drop redundant "read error" call from btree_gc | Kent Overstreet
The btree node read error path already calls topology error, so this is entirely redundant, and we're not specific enough about our error codes - this was triggering for bucket_ref_update() errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: Drop racy warning | Kent Overstreet
Checking for writing past i_size after unlocking the folio and clearing the dirty bit is racy, and we already check it at the start. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: better check_bp_exists() error message | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: add counter_flags for counters | Hongbo Li
In bcachefs, the io_read and io_write counters record the amount of data that has been read and written. They increase in units of sectors, so to display correctly, they need to be shifted left by the sector shift (i.e. multiplied by the sector size). Other counters like io_move and move_extent_{read, write, finish} also have this problem. In order to support different units, we add an extra column to mark the counter type, using TYPE_COUNTER and TYPE_SECTORS in BCH_PERSISTENT_COUNTERS(). Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
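The unit conversion described above can be sketched as follows. TYPE_COUNTER and TYPE_SECTORS are the names from the commit message; the enum, the helper, and the assumption of 512-byte sectors (a shift of 9) are illustrative and not taken from the patch.

```c
#include <stdint.h>

/* Assumption for this sketch: 512-byte sectors, i.e. a shift of 9. */
#define SECTOR_SHIFT_EXAMPLE 9

/* Hypothetical enum using the type names mentioned in the commit message. */
enum counter_type_example { TYPE_COUNTER, TYPE_SECTORS };

/*
 * Convert a raw counter value into the value to display: sector counters
 * are scaled to bytes, plain event counters are shown unchanged.
 */
static uint64_t counter_display_value(uint64_t raw,
				      enum counter_type_example type)
{
	return type == TYPE_SECTORS ? raw << SECTOR_SHIFT_EXAMPLE : raw;
}
```

Under these assumptions, a raw io_read value of 8 sectors would be displayed as 4096 bytes, while an event counter with raw value 8 stays 8.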
2025-01-09 | bcachefs: bcachefs_metadata_version_autofix_errors | Kent Overstreet
It's time to make self healing the default: change the error action for old filesystems to fix_safe, matching the default for current filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09 | bcachefs: bcachefs_metadata_version_persistent_inode_cursors | Kent Overstreet
Persistent cursors for inode allocation. A free inodes btree would add substantial overhead to inode allocation and freeing - a "next num to allocate" cursor is always going to be faster. We just need it to be persistent, to avoid scanning the inodes btree from the start on startup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29 | bcachefs: bcachefs_metadata_version_inode_depth | Kent Overstreet
This adds a new inode field, bi_depth, for directory inodes: this allows us to make the check_directory_structure pass much more efficient. Currently, to ensure the filesystem is fully connected and has no loops, for every directory we follow backpointers until we find the root. But by adding a depth counter, it suffices to only check the parent of each directory, and check that the parent's bi_depth is smaller. (fsck doesn't require that bi_depth = parent->bi_depth + 1; if a rename leaves bi_depth off, but the chain to the root is still strictly decreasing, then the algorithm still works and there's no need for fsck to fix up the bi_depth fields). We've already checked backpointers, so we know that every directory (excluding the root) has a valid parent: if bi_depth is always decreasing, every chain must terminate, and terminate at the root directory. bi_depth will not necessarily be correct when fsck runs, due to directory renames - we can't change bi_depth on every child directory when renaming a directory. That's ok; fsck will silently fix the bi_depth field as needed, and future fsck runs will be much faster. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
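The parent-only check described above can be sketched in a few lines. This is a simplified model, not the fsck code: the array-of-directories representation, the struct, and the function name are hypothetical; only the bi_depth field name comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of a directory inode. */
struct dir_example {
	size_t   parent;   /* index of the parent directory */
	uint32_t bi_depth; /* depth counter, need not be exact */
};

/*
 * Returns true if every non-root directory has a parent whose bi_depth is
 * strictly smaller. If this holds, every parent chain must terminate, so
 * there are no loops; gaps in bi_depth (e.g. after a rename) are fine.
 */
static bool depths_strictly_decrease(const struct dir_example *dirs,
				     size_t nr, size_t root)
{
	for (size_t i = 0; i < nr; i++) {
		if (i == root)
			continue;
		if (dirs[i].parent >= nr)
			return false; /* parent index out of range */
		if (dirs[dirs[i].parent].bi_depth >= dirs[i].bi_depth)
			return false; /* chain is not strictly decreasing */
	}
	return true;
}
```

Each directory is visited once and only its immediate parent is examined, instead of walking every chain up to the root. Two directories that name each other as parent cannot both pass, since each would need a strictly smaller depth than the other.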
2024-12-29 | bcachefs: Option changes now get propagated to reflinked data | Kent Overstreet
Now that bch2_move_get_io_opts() re-propagates changed inode io options to bch_extent_rebalance, we can properly support changing IO path options for reflinked data. Changing a per-file IO path option, either via the xattr interface or via the BCHFS_IOC_REINHERIT_ATTRS ioctl, will now trigger a scan (the inode number is marked as needing a scan, via bch2_set_rebalance_needs_scan()), and rebalance will use bch2_move_data(), which will walk the inode number and pick up the new options. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>