git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message	Author
8 days	bcachefs: btree read retry fixes	Kent Overstreet
	Fix btree node read retries after validate errors: __btree_err() is the wrong place to flag a topology error: that is done by btree_lost_data(). Additionally, some calls to bch2_bkey_pick_read_device() were not updated in the 6.16 rework for improved log messages; we were failing to signal that we still had a retry. Cc: Nikita Ofitserov <himikof@gmail.com> Cc: Alan Huang <mmpgouride@gmail.com> Reported-and-tested-by: Edoardo Codeglia <bcachefs@404.blue> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-16	bcachefs: Kill unused tracepoints	Kent Overstreet
	Dead code cleanup. Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-04	bcachefs: Run may_delete_deleted_inode() checks in bch2_inode_rm()	Kent Overstreet
	We had a bug where bch2_evict_inode() incorrectly called bch2_inode_rm() - the journal clearly showed the inode was not unlinked. We've got checks that we use in recovery when cleaning up deleted inodes, lift them to bch2_inode_rm() as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-06-02	bcachefs: bch2_require_recovery_pass()	Kent Overstreet
	Add a helper for requiring that a recovery pass has already run: either run it directly, if we're still in recovery, or if we're not in recovery check if it has run recently and schedule it if it hasn't. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-30	bcachefs: Mark bch_errcode helpers __attribute__((const))	Kent Overstreet
	These don't access global memory or dereference pointer arguments - this enables CSE optimizations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
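As an illustration of the attribute described above, here is a minimal sketch. The error-code range and the helper names (`err_is_private`, `count_private_twice`) are hypothetical and are not the bcachefs helpers themselves; only `__attribute__((const))` is the real GCC/Clang feature.

```c
#include <assert.h>

/* Hypothetical private error-code range; not the real bcachefs values. */
enum { ERR_START = 2048, ERR_MAX = 4096 };

/*
 * The result depends only on the argument value: no global memory is read
 * and no pointer is dereferenced, so the function can be marked const.
 */
__attribute__((const))
static int err_is_private(int err)
{
	return err <= -ERR_START && err > -ERR_MAX;
}

/*
 * Both calls take the same argument, so the compiler is allowed to
 * evaluate err_is_private() once and reuse the result (CSE).
 */
int count_private_twice(int err)
{
	return err_is_private(err) + err_is_private(err);
}

/* Small usage example. */
void const_attr_example(void)
{
	assert(count_private_twice(-2048) == 2);
	assert(count_private_twice(-5) == 0);
}
```

Whether the second call is actually eliminated depends on the compiler and optimization level; the attribute only gives permission.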
2025-05-21	bcachefs: bch2_run_explicit_recovery_pass() cleanup	Kent Overstreet
	Consolidate the run_explicit_recovery_pass() interfaces by adding a flags parameter; this will also let us add a RUN_RECOVERY_PASS_ratelimit flag. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: bch2_fsck_err_opt()	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Async object debugging	Kent Overstreet
	Debugging infrastructure for async objs: this lets us easily create fast_lists for various object types so they'll be visible in debugfs. Add new object types to the BCH_ASYNC_OBJS_TYPES() enum, and drop a pretty-printer wrapper in async_objs.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: BCH_FEATURE_small_image	Kent Overstreet
	We can't go RW if it's an image file that hasn't been resized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: BCH_FEATURE_no_alloc_info	Kent Overstreet
	If a filesystem is going to only be used read-only, and will be a deployable image, we can strip out alloc info for a substantial reduction in metadata size - around half, due to backpointers. Alloc info will be regenerated on first read-write mount. Remounting RW is disallowed for now, since we don't yet have check_allocations running in RW mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: Single device mode	Kent Overstreet
	Single device filesystems are now identified by the block device name, not the UUID - and single device filesystems with the same UUID can be mounted simultaneously, without any special options. This allocates a new bit in the superblock, BCH_SB_MULTI_DEVICE, which indicates whether a filesystem has ever been multi device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21	bcachefs: early return for negative values when parsing BCH_OPT_UINT	Integral
	Currently, when passing a negative integer as argument, the error message is "too big" due to casting to an unsigned integer: > bcachefs format --block_size=-1 bcachefs.img invalid option: block_size: too big (max 65536) When a negative value is detected in the argument, return early before calling bch2_opt_validate(). A new error code `BCH_ERR_option_negative` is added. Signed-off-by: Integral <integral@archlinuxcn.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
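The early-return idea can be sketched in userspace C as follows. This is not the bcachefs option parser: `parse_uint_opt` and the `OPT_*` codes are hypothetical names, and the sketch uses the standard `strtoull()`, which accepts a leading minus sign and silently wraps the value, reproducing the misleading "too big" outcome unless the sign is checked first.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum opt_err { OPT_OK = 0, OPT_ERR_NEGATIVE, OPT_ERR_INVALID, OPT_ERR_TOO_BIG };

enum opt_err parse_uint_opt(const char *s, uint64_t max, uint64_t *out)
{
	char *end;
	unsigned long long v;

	/* strtoull() skips blanks and accepts a sign, so check for '-' first. */
	while (*s == ' ' || *s == '\t')
		s++;
	if (*s == '-')
		return OPT_ERR_NEGATIVE;	/* early return, before range validation */

	errno = 0;
	v = strtoull(s, &end, 10);
	if (end == s || *end != '\0')
		return OPT_ERR_INVALID;
	if (errno == ERANGE || v > max)
		return OPT_ERR_TOO_BIG;

	*out = v;
	return OPT_OK;
}

/* Small usage example, mirroring the block_size case from the message. */
void parse_uint_opt_example(void)
{
	uint64_t v = 0;

	assert(parse_uint_opt("-1", 65536, &v) == OPT_ERR_NEGATIVE);
	assert(parse_uint_opt("4096", 65536, &v) == OPT_OK && v == 4096);
}
```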
2025-05-07	bcachefs: journal_shutdown is EROFS, not EIO	Kent Overstreet
	We often filter out EROFS errors to avoid log spew after an emergency shutdown - journal_shutdown is just another emergency shutdown error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
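One way to picture "journal_shutdown is EROFS" is a private error code whose parent class is a standard errno, so that a filter on EROFS also catches it. The sketch below is an assumption-laden illustration: the enum values, the `err_parent` table and `err_matches()` are hypothetical and use positive codes for simplicity; they are not the bcachefs definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical private error codes, numbered above the standard errno range. */
enum {
	ERR_START = 2048,
	ERR_journal_shutdown = ERR_START,
	ERR_data_read_csum_err,
	ERR_MAX
};

/* Parent class of each private code (a standard errno in this sketch). */
static const int err_parent[ERR_MAX - ERR_START] = {
	[ERR_journal_shutdown - ERR_START]   = EROFS,
	[ERR_data_read_csum_err - ERR_START] = EIO,
};

/* Return 1 if err is cls or descends from cls, else 0. */
int err_matches(int err, int cls)
{
	while (err >= ERR_START && err < ERR_MAX) {
		if (err == cls)
			return 1;
		err = err_parent[err - ERR_START];
	}
	return err == cls;
}

/* Small usage example: an EROFS filter also catches journal_shutdown. */
void err_matches_example(void)
{
	assert(err_matches(ERR_journal_shutdown, EROFS));
	assert(!err_matches(ERR_journal_shutdown, EIO));
}
```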
2025-04-15	bcachefs: Silence extent_poisoned error messages	Kent Overstreet
	extent poisoning is partly so that we don't keep spewing the dmesg log when we've got unreadable data - we don't want to print these. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28	bcachefs: Fix WARN() in bch2_bkey_pick_read_device()	Kent Overstreet
	syzbot discovered that this one is possible: we have pointers, but none of them are to valid devices. Reported-by: syzbot+336a6e6a2dbb7d4dba9a@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-28	bcachefs: Recovery no longer holds state_lock	Kent Overstreet
	state_lock guards against devices coming or leaving, changing state, or the filesystem changing between ro <-> rw. But it's not necessary for running recovery passes, and holding it blocks asynchronous events that would cause us to go RO or kick out devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24	bcachefs: EIO cleanup	Kent Overstreet
	Replace these with proper private error codes, so that when we get an error message we're not sifting through the entire codebase to see where it came from. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24	bcachefs: Kill JOURNAL_ERRORS()	Kent Overstreet
	Convert these to standard error codes, which means we can pass them outside the journal code, they're easier to pass to tracepoints, etc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-24	bcachefs: Validate bch_sb.offset field	Kent Overstreet
	This was missed - but it needs to be correct for the superblock recovery tool that scans the start and end of the device for backup superblocks: we don't want to pick up superblocks that belong to a different partition that starts at a different offset. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16	bcachefs: Checksum errors get additional retries	Kent Overstreet
	It's possible for checksum errors to be transient - e.g. flakey controller or cable, thus we need additional retries (besides retrying from different replicas) before we can definitely return an error. This is particularly important for the next patch, which will allow the data move path to move extents with checksum errors - we don't want to accidentally introduce bitrot due to a transient error! - bch2_bkey_pick_read_device() is substantially reworked, and bch2_dev_io_failures is expanded to record more information about the type of failure (i.e. number of checksum errors). It now returns an error code that describes more precisely the reason for the failure - checksum error, io error, or offline device, instead of the previous generic "insufficient devices". This is important for the next patches that add poisoning, as we only want to poison extents when we've got real checksum errors (or perhaps IO errors?) - not because a device was offline. - Add a new option and superblock field for the number of checksum retries. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
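The behaviour described above (skip replicas that have failed, allow extra attempts after checksum errors, and report why no replica could be picked) can be sketched roughly as follows. This is not `bch2_bkey_pick_read_device()`: the struct, the enum and the retry semantics (a replica stays eligible until its checksum error count exceeds `csum_retries`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes: the failure reason is reported precisely. */
enum pick_result {
	PICK_ok,
	PICK_err_csum,
	PICK_err_io,
	PICK_err_offline,
	PICK_err_no_devices,
};

/* Hypothetical per-replica failure record. */
struct replica {
	bool		online;
	bool		io_failed;
	unsigned	csum_errors;
};

enum pick_result pick_replica(const struct replica *r, size_t nr,
			      unsigned csum_retries, size_t *picked)
{
	bool saw_csum = false, saw_io = false, saw_offline = false;

	for (size_t i = 0; i < nr; i++) {
		if (!r[i].online) {
			saw_offline = true;
		} else if (r[i].io_failed) {
			saw_io = true;
		} else if (r[i].csum_errors > csum_retries) {
			saw_csum = true;	/* retries exhausted on this replica */
		} else {
			*picked = i;
			return PICK_ok;
		}
	}

	/* Nothing usable: report the most specific reason observed. */
	if (saw_csum)
		return PICK_err_csum;
	if (saw_io)
		return PICK_err_io;
	if (saw_offline)
		return PICK_err_offline;
	return PICK_err_no_devices;
}

/* Small usage example: one checksum error still leaves retries available. */
void pick_replica_example(void)
{
	struct replica r[] = { { true, false, 1 }, { false, false, 0 } };
	size_t picked = 99;

	assert(pick_replica(r, 2, 3, &picked) == PICK_ok && picked == 0);
}
```

Distinguishing the offline case matters for the poisoning use mentioned in the message: a caller could poison only on `PICK_err_csum`.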
2025-03-16	bcachefs: BCH_ERR_data_read_buffer_too_small	Kent Overstreet
	Now that the read path uses proper error codes, we can get rid of the weird rbio->hole signalling to the move path that the read didn't happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-16	bcachefs: Convert read path to standard error codes	Kent Overstreet
	Kill the READ_ERR/READ_RETRY/READ_RETRY_AVOID enums, and add standard error codes that describe precisely which error occurred. This is going to be used for the data move path, to move but poison extents with checksum errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: trace_stripe_create	Kent Overstreet
	Add a simple tracepoint for stripe creation, we'll want to expand this later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: Finish bch2_account_io_completion() conversions	Kent Overstreet
	More prep work for automatically kicking devices out after too many IO errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: data_update now checks for extents that can't be moved	Kent Overstreet
	If a device is ro or failed, we might not have anywhere to move a replica. Check for this early, before doing the read and attempting to write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: bcachefs_metadata_version_extent_flags	Kent Overstreet
	This implements a new extent field bitflags that apply to the whole extent. There's been a couple things we've wanted this for in the past, but the immediate need is extent poisoning, to solve a rebalance issue. Unknown extent fields can't be parsed (we won't know their size, so we can't advance to the next field), so this is an incompat feature, and using it prevents the filesystem from being mounted by old versions. This also adds the BCH_EXTENT_poisoned flag; this indicates that the data is known to be bad (i.e. there was a checksum error, and we had to write a new checksum) and reads will return errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
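The reason an unknown field forces an incompat feature can be shown with a toy parser in which each field's size is implied by its type. The field types, sizes and one-byte type tag below are hypothetical and do not reflect the real bcachefs on-disk extent encoding; the point is only that a parser which does not know a type has no way to skip past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field types known to this (old) parser. */
enum field_type { FIELD_ptr, FIELD_crc, FIELD_NR };

/* Size in bytes of each known field type, including the type byte. */
static const size_t field_size[FIELD_NR] = {
	[FIELD_ptr] = 8,
	[FIELD_crc] = 16,
};

/* Count the fields in buf; return -1 if a field cannot be parsed. */
int count_fields(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int nr = 0;

	while (off < len) {
		uint8_t type = buf[off];

		/* Unknown type: its size is unknown, so we cannot advance. */
		if (type >= FIELD_NR)
			return -1;
		if (field_size[type] > len - off)
			return -1;	/* truncated field */

		off += field_size[type];
		nr++;
	}
	return nr;
}

/* Small usage example: a ptr field followed by a crc field. */
void count_fields_example(void)
{
	uint8_t buf[24];

	memset(buf, 0, sizeof(buf));
	buf[0] = FIELD_ptr;
	buf[8] = FIELD_crc;
	assert(count_fields(buf, sizeof(buf)) == 2);
}
```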
2025-03-14	bcachefs: bch2_request_incompat_feature() now returns error code	Kent Overstreet
	For future usage, we'll want a dedicated error code for better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: bch2_btree_node_scrub()	Kent Overstreet
	Add a function for scrubbing btree nodes - reading them in, and kicking off a rewrite if there's an error. The btree_node_read_done() checks have to be duplicated because we're not using a pointer to a struct btree - the btree node might already be in cache, and we need to check a specific replica, which might not be the one we previously read from. This will be used in the next patch implementing high-level scrub. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: Bail out early on alloc_nowait data updates	Kent Overstreet
	If a data update doesn't want to block on allocations (promotes, self healing on read error) - check if the allocation would fail before kicking off the data update and calling into the write path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: bch2_update_unwritten_extent() no longer depends on wbio	Kent Overstreet
	Prep work for improving bch2_data_update_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-14	bcachefs: btree_node_(rewrite|update_key) cleanup	Kent Overstreet
	Factor out get_iter_to_node() and use it for btree_node_rewrite_get_iter(), to be used for fixing btree node write error behaviour. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09	bcachefs: Don't rely on snapshot_tree.master_subvol for reattaching	Kent Overstreet
	Previously, fsck used the snapshot tree's master subvol for finding the root inode number - but the master subvol might have been deleted, and setting a new one should be a user operation; meaning we can't rely on it existing. Fortunately, for finding the root inode number in a tree of snapshots, finding any associated subvolume works. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29	bcachefs: check_extents_to_backpointers() now only checks buckets with mismatches	Kent Overstreet
	Instead of walking every extent and every backpointer it points to, first sum up backpointers in each bucket and check for mismatches, and only look for missing backpointers if mismatches were detected, and only check extents in those buckets. This is a major fsck scalability improvement, since the two backpointers passes (backpointers -> extents and extents -> backpointers) are the most expensive fsck passes by far. Additionally, to speed up the upgrade for backpointer bucket gens, or in situations when we have to rebuild alloc info, add a special case for when no backpointers are found in a bucket - don't check each individual backpointer (in particular, avoiding the write buffer flushes), just recreate them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
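The two-phase approach (cheap per-bucket sums first, expensive per-extent checks only for mismatched buckets) can be sketched as below. The message does not say what the sums are compared against, so the sketch assumes a hypothetical `expected[]` array of per-bucket sector totals; `struct backpointer`, `NR_BUCKETS` and `find_mismatched_buckets()` are likewise illustrative names, not bcachefs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 4	/* hypothetical, tiny for illustration */

/* Hypothetical backpointer: which bucket it lives in and how many sectors. */
struct backpointer {
	unsigned	bucket;
	uint32_t	sectors;
};

/*
 * Phase 1: sum backpointer sectors per bucket and compare against the
 * expected per-bucket totals. Returns the number of mismatched buckets and
 * flags them in mismatch[]; only those need the per-extent pass.
 */
unsigned find_mismatched_buckets(const struct backpointer *bps, size_t nr_bps,
				 const uint64_t expected[NR_BUCKETS],
				 bool mismatch[NR_BUCKETS])
{
	uint64_t sum[NR_BUCKETS] = { 0 };
	unsigned nr = 0;

	for (size_t i = 0; i < nr_bps; i++) {
		assert(bps[i].bucket < NR_BUCKETS);
		sum[bps[i].bucket] += bps[i].sectors;
	}

	for (unsigned b = 0; b < NR_BUCKETS; b++) {
		mismatch[b] = sum[b] != expected[b];
		nr += mismatch[b];
	}
	return nr;
}
```

A caller would then walk only the extents that point into buckets flagged in `mismatch[]`, which is where the scalability gain described in the message would come from.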
2024-12-21	bcachefs: Use proper errcodes for inode unpack errors	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Kill unnecessary mark_lock usage	Kent Overstreet
	We can't hold mark_lock while calling fsck_err() - that's a deadlock, mark_lock is meant to be a leaf node lock. It's also unnecessary for gc_bucket() and bucket_gen(); rcu suffices since the bucket_gens array describes its size, and we can't race with device removal or resize during gc/fsck since that takes state lock. Reported-by: syzbot+38641fcbda1aaffefdd4@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: BCH_ERR_insufficient_journal_devices	Kent Overstreet
	Kill another use of a standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: bch2_inum_to_path()	Kent Overstreet
	Add a function for walking backpointers to find a path from a given inode number, and convert various error messages to use it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Issue a transaction restart after commit in repair	Kent Overstreet
	transaction commits invalidate pointers to btree values, and they also downgrade intent locks. This breaks the interior btree update path, which takes intent locks and then calls into the allocator. This isn't an ideal solution: we can't unconditionally issue a restart after a transaction commit, because that would break other codepaths. Reported-by: syzbot+78d82470c16a49702682@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Don't try to en/decrypt when encryption not available	Kent Overstreet
	If a btree node says it's encrypted, but the superblock never had an encryption key - whoops, that needs to be handled. Reported-by: syzbot+026f1857b12f5eb3f9e9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: errcode cleanup: journal errors	Kent Overstreet
	Instead of throwing standard error codes, we should be throwing dedicated private error codes; this greatly improves debuggability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: BCH_ERR_btree_node_read_error_cached	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Don't BUG_ON() when superblock feature wasn't set for compressed data	Kent Overstreet
	We don't allocate the mempools for compression/decompression unless we need them - but that means there's an inconsistency to check for. Reported-by: syzbot+cb3fbcfb417448cfd278@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Don't use a shared decompress workspace mempool	Kent Overstreet
	gzip and zstd require different decompress workspace sizes, and if we start with one and then start using the other at runtime we may not get the correct size. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Implement bch2_btree_iter_prev_min()	Kent Overstreet
	A user contributed a filesystem dump, where the dump was actually corrupted (due to being taken while the filesystem was online), but which exposed an interesting bug in fsck - reconstruct_inode(). When iterating in BTREE_ITER_filter_snapshots mode, it's required to give an end position for the iteration and it can't span inode numbers; continuing into the next inode might mean we start seeing keys from a different snapshot tree, that the is_ancestor() checks always filter, thus we're never able to return a key and stop iterating. Backwards iteration never implemented the end position because nothing else needed it - except for reconstruct_inode(). Additionally, backwards iteration is now able to overlay keys from the journal, which will be useful if we ever decide to start doing journal replay in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: bch2_run_explicit_recovery_pass() returns different error when not in recovery	Kent Overstreet
	If we're not in recovery then there's no way to rewind recovery - give this a different errcode so that any error messages will give us a better idea of what happened. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21	bcachefs: Delete dead code	Alan Huang
	lock_fail_root_changed has not been used since commit 0d7009d7ca99 ("bcachefs: Delete old deadlock avoidance code") Remove it. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-11-07	bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket	Pei Xiao
	bio_kmalloc() may return NULL, which would cause a NULL pointer dereference. Add a NULL check on the return value of bio_kmalloc() in journal_read_bucket(). Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-29	bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get()	Gaosheng Cui
	The function ec_new_stripe_head_alloc() returns nullptr if kzalloc() fails. It is crucial to verify its return value before dereferencing it to avoid a potential nullptr dereference. Fixes: 035d72f72c91 ("bcachefs: bch2_ec_stripe_head_get() now checks for change in rw devices") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-24	bcachefs: Fix invalid shift in validate_sb_layout()	Gianfranco Trad
	Add check on layout->sb_max_size_bits against BCH_SB_LAYOUT_SIZE_BITS_MAX to prevent UBSAN shift-out-of-bounds in validate_sb_layout(). Reported-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=089fad5a3a5e77825426 Fixes: 03ef80b469d5 ("bcachefs: Ignore unknown mount options") Tested-by: syzbot+089fad5a3a5e77825426@syzkaller.appspotmail.com Signed-off-by: Gianfranco Trad <gianf.trad@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
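The pattern behind this fix is to bound a shift count read from disk before using it. A minimal sketch, assuming a hypothetical limit of 16 bits and a hypothetical helper name (`sb_max_sectors`); it is not the code in validate_sb_layout():

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed limit for this sketch; the real constant lives in the bcachefs headers. */
#define SB_LAYOUT_SIZE_BITS_MAX 16

/*
 * Convert an on-disk "size bits" value to a sector count. The value is
 * untrusted, so it is validated before being used as a shift count;
 * shifting by the width of the type or more is undefined behaviour.
 */
int sb_max_sectors(uint8_t size_bits, uint64_t *sectors)
{
	if (size_bits > SB_LAYOUT_SIZE_BITS_MAX)
		return -EINVAL;

	*sectors = UINT64_C(1) << size_bits;
	return 0;
}

/* Small usage example: an out-of-range value is rejected, not shifted. */
void sb_max_sectors_example(void)
{
	uint64_t sectors = 0;

	assert(sb_max_sectors(11, &sectors) == 0 && sectors == 2048);
	assert(sb_max_sectors(200, &sectors) == -EINVAL);
}
```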
2024-10-09	bcachefs: Disk accounting device validation fixes	Kent Overstreet
	- Fix failure to validate that accounting replicas entries point to valid devices: this wasn't a real bug since they'd be cleaned up by GC, but is still something we should know about - Fix failure to validate that dev_data_type entries point to valid devices: this does fix a real bug, since bch2_accounting_read() would then try to copy the counters to that device and pop an inconsistent error when the device didn't exist - Remove accounting entries that are zeroed or invalid: if we're not validating them we need to get rid of them: they might not exist in the superblock, so we need to trigger the superblock mark path when they're re-added. This fixes the replication.ktest rereplicate test, which was failing with "superblock not marked for replicas..." Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>