path: root/fs/bcachefs
Age Commit message Author
2025-05-01bcachefs: Fix __bch2_dev_group_set()Kent Overstreet
bch2_sb_disk_groups_to_cpu() goes off of the superblock member info, so we need to set that first. Reported-by: Stijn Tintel <stijn@linux-ipv6.be> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-01bcachefs: Kill ERO for i_blocks check in truncateKent Overstreet
Replace with logging the error in the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-01bcachefs: check for inode.bi_sectors underflowKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-01bcachefs: Kill ERO in __bch2_i_sectors_acct()Kent Overstreet
We won't be root causing this in the immediate future, and it's fairly innocuous - so just log it in the superblock. https://github.com/koverstreet/bcachefs/issues/869 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-30bcachefs: readdir fixesKent Overstreet
- Don't call bch2_trans_relock() after dir_emit(); taking a transaction restart here will cause us to emit the same dirent to userspace twice
- Fix incorrect checking of the return value on dir_emit(): "true" means success, keep going, but bch2_dir_emit() needs to return true when we're finished iterating.
https://github.com/koverstreet/bcachefs/issues/867 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
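The return-value convention described in the second point can be sketched in userspace C. This is only a model, not the bcachefs code: `struct emit_buf`, `model_dir_emit()`, `model_emit_done()` and `model_readdir()` are hypothetical names, and the fixed four-slot buffer stands in for the userspace readdir buffer. The emit function returns true on success, the wrapper inverts that into "stop iterating", and the position only advances past entries that were actually emitted, so a later call resumes without duplicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the caller's readdir buffer: room for four names. */
struct emit_buf {
	const char	*names[4];
	size_t		nr;
};

/* Models dir_emit(): true means the entry was accepted, false means the buffer is full. */
static bool model_dir_emit(struct emit_buf *buf, const char *name)
{
	if (buf->nr == sizeof(buf->names) / sizeof(buf->names[0]))
		return false;
	buf->names[buf->nr++] = name;
	return true;
}

/* Wrapper with the inverted sense: true means "we're done, stop iterating". */
static bool model_emit_done(struct emit_buf *buf, const char *name)
{
	return !model_dir_emit(buf, name);
}

/* Emit entries starting at *pos; *pos advances only past entries that were emitted. */
static void model_readdir(struct emit_buf *buf, const char **entries,
			  size_t nr, size_t *pos)
{
	assert(*pos <= nr);

	while (*pos < nr) {
		if (model_emit_done(buf, entries[*pos]))
			break;
		(*pos)++;
	}
}
```

With six entries and a four-slot buffer, the first call emits four entries and leaves the position at 4; a second call with a fresh buffer emits the remaining two.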
2025-04-30bcachefs: improve missing journal write device error messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Topology error after insert is now an EROKent Overstreet
A user hit this, and this will naturally be easier to debug if we don't panic. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Use bch2_kvmalloc() for journal keys arrayKent Overstreet
We can hit this limit fairly easily when we have to reconstruct large amounts of alloc info on large filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: More informative error message when shutting down due to errorKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btreesKent Overstreet
If losing a btree won't cause data loss - i.e. it's an alloc btree, or we can easily reconstruct it - we shouldn't require user action to continue repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: btree_node_data_missing is now autofixKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Don't generate alloc updates to invalid bucketsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Improve bch2_dev_bucket_missing()Kent Overstreet
More useful error message. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: fix bch2_dev_buckets_resize()Kent Overstreet
The resize memcpy path was totally busted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Add upgrade table entry from 0.14Kent Overstreet
There are a few errors that needed to be marked as autofix. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshotKent Overstreet
Fix this repair path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Add missing utf8_unload()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Emit unicode version message on startupKent Overstreet
fstests expects this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Use generic_set_sb_d_ops for standard casefolding d_opsKent Overstreet
Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-28bcachefs: Fix losing return code in next_fiemap_extent()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: Rework fiemap transaction restart handlingKent Overstreet
Restart handling in the previous patch was incorrect, so: move btree operations into a separate helper, and run it with a lockrestart_do(). Additionally, clarify whether pagecache or the btree takes precedence. Right now, the btree takes precedence: this is incorrect, but it's needed to pass fstests. Add a giant comment explaining why. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: add fiemap delalloc extent detectionBrian Foster
bcachefs currently populates fiemap data from the extents btree. This works correctly when the fiemap sync flag is provided, but if not, it skips all delalloc extents that have not yet been flushed. This is because delalloc extents from buffered writes are first stored as reservation in the pagecache, and only become resident in the extents btree after writeback completes. Update the fiemap implementation to process holes between extents by scanning pagecache for data, via seek data/hole. If a valid data range is found over a hole in the extent btree, fake up an extent key and flag the extent as delalloc for reporting to userspace. Note that this does not necessarily change behavior for the case where there is dirty pagecache over already written extents, where when in COW mode, writeback will allocate new blocks for the underlying ranges. The existing behavior is consistent with btrfs and it is recommended to use the sync flag for the most up to date extent state from fiemap. Signed-off-by: Brian Foster <bfoster@redhat.com>
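The hole-scanning idea in this message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the patch itself: the pagecache is modelled as a `bool` array of dirty pages, the extents btree as a sorted, non-overlapping array, and every name (`struct model_extent`, `model_seek()`, `model_scan_hole()`, `model_fiemap()`) is hypothetical. Only holes between btree extents are scanned, so dirty pages over already written extents do not change the output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open page range [start, end); delalloc marks a faked-up extent. */
struct model_extent {
	uint64_t	start, end;
	bool		delalloc;
};

/* Models seek data/hole: first index >= pos where cache[i] == want, or nr if none. */
static uint64_t model_seek(const bool *cache, uint64_t nr, uint64_t pos, bool want)
{
	while (pos < nr && cache[pos] != want)
		pos++;
	return pos;
}

/* Report cached data inside the hole [start, end) as delalloc extents. */
static size_t model_scan_hole(const bool *cache, uint64_t start, uint64_t end,
			      struct model_extent *out, size_t n, size_t max)
{
	while (start < end && n < max) {
		uint64_t data = model_seek(cache, end, start, true);
		if (data == end)
			break;
		uint64_t hole = model_seek(cache, end, data, false);
		out[n++] = (struct model_extent) { data, hole, true };
		start = hole;
	}
	return n;
}

/* Walk the btree extents in order, scanning the pagecache over each gap between them. */
static size_t model_fiemap(const struct model_extent *btree, size_t nr_btree,
			   const bool *cache, uint64_t file_pages,
			   struct model_extent *out, size_t max)
{
	size_t n = 0;
	uint64_t pos = 0;

	for (size_t i = 0; i < nr_btree; i++) {
		assert(btree[i].start >= pos);	/* input must be sorted, non-overlapping */
		n = model_scan_hole(cache, pos, btree[i].start, out, n, max);
		if (n < max)
			out[n++] = btree[i];
		pos = btree[i].end;
	}
	return model_scan_hole(cache, pos, file_pages, out, n, max);
}
```

For a ten-page file with one btree extent covering pages 2 to 5 and dirty pages at 0, 1, 4, 5 and 8, the model reports a delalloc extent for pages 0 to 1, the btree extent unchanged, and a delalloc extent for page 8.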
2025-04-24bcachefs: refactor fiemap processing into extent helper and structBrian Foster
The bulk of the loop in bch2_fiemap() involves processing the current extent key from the iter, including following indirections and trimming the extent size and such. This patch makes a few changes to reduce the size of the loop and facilitate future changes to support delalloc extents. Define a new bch_fiemap_extent structure to wrap the bkey buffer that holds the extent key to report to userspace along with associated fiemap flags. Update bch2_fill_extent() to take the bch_fiemap_extent as a param instead of the individual fields. Finally, lift the bulk of the extent processing into a bch2_fiemap_extent() helper that takes the current key and formats the bch_fiemap_extent appropriately for the fill function. No functional changes intended by this patch. Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-24bcachefs: track current fiemap offset in start variableBrian Foster
Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-24bcachefs: drop duplicate fiemap sync flagBrian Foster
FIEMAP_FLAG_SYNC handling was deliberately moved into core code in commit 45dd052e67ad ("fs: handle FIEMAP_FLAG_SYNC in fiemap_prep"), released in kernel v5.8. Update bcachefs accordingly. Signed-off-by: Brian Foster <bfoster@redhat.com>
2025-04-24bcachefs: Fix btree_iter_peek_prev() at end of inodeKent Overstreet
At the end of the inode, on an extents iterator, peek_slot() has to advance to the next position to avoid returning a 0 size extent, which is not allowed. Changing iter->pos confuses peek_prev(), but we don't need to call peek_slot() in this case. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: Make btree_iter_peek_prev() assert more preciseKent Overstreet
The issue this assert is guarding against is that in BTREE_ITER_filter_snapshots mode we only want to be iterating within a single inode number - if we iterate into another inode number with keys for a different snapshot tree, we'll loop arbitrarily long before finding a key we can return. This comes up in the unit tests, where we're using inode 0 for our test keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: Unit test fixesKent Overstreet
The peek_end() tests expect an empty btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: Print mount opts earlierKent Overstreet
If we aren't mounting with the correct degraded option, it's helpful to know that before we fail to mount degraded. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: unlink: casefold d_invalidateKent Overstreet
casefolding results in additional aliases on lookup for the non-casefolded names - these need invalidating on unlink. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: Fix casefold lookupsKent Overstreet
Add casefolding to bch2_lookup_trans: During the delay between when casefolding was written and when it was merged, the main filesystem lookup path grew self healing - which meant it was no longer using bch2_dirent_lookup_trans(), where casefolding on lookups happens. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-24bcachefs: Casefold is now a regular opts.h optionKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-21bcachefs: Implement fileattr_(get|set)Kent Overstreet
inode_operations.fileattr_(get|set) didn't exist when the various flag ioctls were implemented - but they do now, which means we can delete a bunch of ioctl code in favor of standard VFS level wrappers. Closes: https://lore.kernel.org/linux-bcachefs/7ltgrgqgfummyrlvw7hnfhnu42rfiamoq3lpcvrjnlyytldmzp@yazbhusnztqn/ Cc: Petr Vorel <pvorel@suse.cz> Cc: Andrea Cervesato <andrea.cervesato@suse.de> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-21bcachefs: Allocator now copes with unaligned bucketsKent Overstreet
We had a buggy release of bcachefs-tools that wasn't properly aligning bucket sizes. We can't ask users to reformat - and it's easy to teach the allocator to make sure writes are properly aligned. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-21bcachefs: Start copygc, rebalance threads earlierKent Overstreet
Previously, copygc and rebalance weren't started until the very end of mounting, after all recovery passes have finished. But copygc really should be started earlier, since it may be needed for allocations to make forward progress. Additionally, we've been seeing occasional bug reports where starting the kthread fails due to a pending signal - i.e. we're getting timed out by systemd (during a version upgrade), but we're not seeing the signal until mount is about to complete. Additionally, we now have copygc/rebalance explicitly wait for check_snapshots to complete (if being run); they require that for snapshot_is_ancestor() in the data move path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-21bcachefs: Refactor bch2_run_recovery_passes()Kent Overstreet
Don't use a continue; this simplifies the next patch where run_recovery_passes() will be responsible for waking up copygc and rebalance at the appropriate time. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: bch2_copygc_wakeup()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: Fix ref leak in write_super()Kent Overstreet
found with the new enumerated_ref code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: Change __journal_entry_close() assert to EROKent Overstreet
We've got some reports of this happening in the wild, and need a bit more info to debug it: https://github.com/koverstreet/bcachefs/issues/854 https://www.reddit.com/r/bcachefs/comments/1k28kjm/surprise_soft_lockup/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: Ensure journal space is block size alignedKent Overstreet
We don't require that bucket size is block size aligned (although it should be!) - so we need to handle this in the journal code. This fixes an assertion pop in journal_entry_close(), where the journal entry overruns available space - after rounding it up to block size. Fixes: https://github.com/koverstreet/bcachefs/issues/854 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
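The arithmetic behind this overrun can be shown with a short sketch. It assumes block size is a power of two and that sizes are counted in sectors; the helper names (`round_down_pow2()`, `round_up_pow2()`, `usable_journal_sectors()`, `entry_fits()`) are hypothetical and not taken from the journal code. Rounding the available space down to a block multiple ensures that an entry which fits before padding still fits after being rounded up to block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical: space the journal may use in a bucket that is not block aligned. */
static uint64_t usable_journal_sectors(uint64_t bucket_sectors, uint64_t block_sectors)
{
	return round_down_pow2(bucket_sectors, block_sectors);
}

/* An entry is written padded to block size, so compare the padded size. */
static bool entry_fits(uint64_t entry_sectors, uint64_t avail_sectors,
		       uint64_t block_sectors)
{
	return round_up_pow2(entry_sectors, block_sectors) <= avail_sectors;
}
```

With a 1001-sector bucket and 8-sector blocks, an entry of 1001 sectors appears to fit the raw space but pads to 1008 sectors; after rounding the available space down to 1000 sectors, such an entry would never be opened at that size.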
2025-04-20bcachefs: Stricter checks on "key allowed in this btree"Kent Overstreet
Syzbot managed to come up with a filesystem where check/repair got rather confused at finding a reflink pointer in the inodes btree. Currently, the "key allowed in this btree" checks only apply at commit time, not read time - for forwards compatibility. It seems this is too loose. Now, strict key type allowed checks apply:
- at commit time (no forward compatibility issues)
- for btree node pointers
- if it's a known btree, known key type, and the key type has the "BKEY_TYPE_strict_btree_checks" flag.
This means we still have the option of using generic key types - e.g. KEY_TYPE_error, KEY_TYPE_set - on more existing btrees in the future, while most key types that are intended for only a specific btree get stricter checks. Reported-by: syzbot+baee8591f336cab0958b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
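The three conditions above can be read as a small decision function. The sketch below is a simplified model with hypothetical names and a made-up table: the enums, `struct model_key_info`, the allowed-btree masks and `model_key_allowed()` are all assumptions for illustration, and treating the node pointer case as "the key must be a btree pointer type" is this sketch's reading, not something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical key types and btrees; anything >= the _NR value is "unknown". */
enum model_key_type {
	MODEL_KEY_error, MODEL_KEY_set, MODEL_KEY_inode,
	MODEL_KEY_reflink_p, MODEL_KEY_btree_ptr, MODEL_KEY_NR
};
enum model_btree { MODEL_BTREE_extents, MODEL_BTREE_inodes, MODEL_BTREE_NR };

struct model_key_info {
	bool		strict_btree_checks;	/* models the strict-checks flag */
	unsigned	allowed_btrees;		/* bitmask indexed by btree id */
};

/* Made-up table: generic types are not strict, btree-specific types are. */
static const struct model_key_info model_key_info[MODEL_KEY_NR] = {
	[MODEL_KEY_error]	= { false, ~0U },
	[MODEL_KEY_set]		= { false, 1U << MODEL_BTREE_extents },
	[MODEL_KEY_inode]	= { true,  1U << MODEL_BTREE_inodes },
	[MODEL_KEY_reflink_p]	= { true,  1U << MODEL_BTREE_extents },
	[MODEL_KEY_btree_ptr]	= { true,  0 },
};

static bool model_key_allowed(unsigned btree, unsigned type,
			      bool at_commit, bool is_node_ptr)
{
	bool known = btree < MODEL_BTREE_NR && type < MODEL_KEY_NR;
	bool strict = at_commit || is_node_ptr ||
		(known && model_key_info[type].strict_btree_checks);

	if (!strict)
		return true;	/* tolerated at read time for forward compatibility */
	if (!known)
		return false;
	if (is_node_ptr)
		return type == MODEL_KEY_btree_ptr;
	return (model_key_info[type].allowed_btrees >> btree) & 1;
}
```

In this model a reflink pointer in the inodes btree is rejected even at read time, while a generic `set` key in an unexpected btree is still tolerated on read and only rejected at commit time.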
2025-04-20bcachefs: Error ratelimiting is no longer only during fsckKent Overstreet
We now more often do repair automatically, without the user invoking fsck - and sometimes that can involve fixing lots of errors, so let's avoid flooding the dmesg log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: Fix null ptr deref in bch2_snapshot_tree_oldest_subvol()Kent Overstreet
Reported-by: syzbot+baee8591f336cab0958b@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-20bcachefs: Fix early startup error pathKent Overstreet
Don't set JOURNAL_running until we're also calling journal_space_available() for the first time. If JOURNAL_running is set, shutdown will write an empty journal entry - but this will hit an assert in journal_entry_open() if we've never called journal_space_available(). Reported-by: syzbot+53bb24d476ef8368a7f0@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-17Merge tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Usual set of small fixes/logging improvements. One bigger user reported fix, for inode <-> dirent inconsistencies reported in fsck, after moving a subvolume that had been snapshotted"
* tag 'bcachefs-2025-04-17' of git://evilpiepirate.org/bcachefs:
  bcachefs: Fix snapshotting a subvolume, then renaming it
  bcachefs: Add missing READ_ONCE() for metadata replicas
  bcachefs: snapshot_node_missing is now autofix
  bcachefs: Log message when incompat version requested but not enabled
  bcachefs: Print version_incompat_allowed on startup
  bcachefs: Silence extent_poisoned error messages
  bcachefs: btree_root_unreadable_and_scan_found_nothing now AUTOFIX
  bcachefs: fix bch2_dev_usage_full_read_fast()
  bcachefs: Don't print data read retry success on non-errors
  bcachefs: Add missing error handling
  bcachefs: Prevent granting write refs when filesystem is read-only
2025-04-17bcachefs: Fix snapshotting a subvolume, then renaming itKent Overstreet
Subvolume roots and the dirents that point to them are special; they don't obey the normal snapshot versioning rules because they cross snapshot boundaries. We don't keep around older versions of subvolume dirents on rename - we don't need to, because subvolume dirents are only visible in the parent subvolume, and we wouldn't be able to match up the different dirent and inode versions due to crossing the snapshot ID boundary. That means that when we rename a subvolume that's been snapshotted, the older version of the subvolume root will become dangling - it won't have a dirent that points to it. That's expected, we just need to tell fsck that this is ok. Fixes: https://github.com/koverstreet/bcachefs/issues/856 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-16bcachefs: Add missing READ_ONCE() for metadata replicasKent Overstreet
If we race with the user changing the metadata_replicas setting, this could cause us to get an incorrectly sized disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
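The pattern behind this fix is to read the shared setting once into a local and derive everything from that copy. Below is a minimal userspace sketch, assuming a GCC/Clang compiler for `__typeof__`; `MODEL_READ_ONCE()` approximates the kernel's `READ_ONCE()` for scalar types, and `struct model_opts`, `struct model_reservation` and `model_reserve()` are hypothetical names, not the bcachefs structures.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace approximation of READ_ONCE() for scalars: forces exactly one load. */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical options struct that another thread may update at any time. */
struct model_opts {
	unsigned	metadata_replicas;
};

struct model_reservation {
	unsigned	nr_replicas;
	uint64_t	sectors;
};

/*
 * Take one snapshot of the setting and use it for both fields, so the
 * replica count and the reserved size cannot disagree if the option changes
 * while we are computing the reservation.
 */
static struct model_reservation model_reserve(struct model_opts *opts,
					      uint64_t sectors_per_replica)
{
	unsigned nr = MODEL_READ_ONCE(opts->metadata_replicas);

	assert(nr > 0);		/* sanity check for this sketch only */

	return (struct model_reservation) {
		.nr_replicas	= nr,
		.sectors	= sectors_per_replica * nr,
	};
}
```

Without the single snapshot, the compiler is free to load `opts->metadata_replicas` twice, and a concurrent change between the two loads could yield a replica count that does not match the reserved sector count.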
2025-04-15bcachefs: snapshot_node_missing is now autofixKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-15bcachefs: Log message when incompat version requested but not enabledKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-15bcachefs: Print version_incompat_allowed on startupKent Overstreet
Let users know if incompatible features aren't enabled Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>