summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-01-01bcachefs: BCH_IOCTL_FSCK_OFFLINEKent Overstreet
This adds a new ioctl for running fsck on a list of devices. Normally, if we wish to use the kernel's implementation of fsck we'd run it at mount time with -o fsck. This ioctl lets us run fsck without mounting, so that userspace bcachefs-tools can transparently switch to the kernel's implementation of fsck when appropriate - primarily if the kernel version of bcachefs better matches the filesystem on disk. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_run_online_recovery_passes()Kent Overstreet
Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Mark recovery passses that are safe to run onlineKent Overstreet
Online fsck is coming, and many of our recovery/fsck passes are already safe to run while the filesystem is in use - mark which ones. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add ability to redirect log outputKent Overstreet
Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: thread_with_fileKent Overstreet
Abstract out a new helper from the data job code, for connecting a kthread to a file descriptor. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: c->ro_refKent Overstreet
Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve error message when finding wrong btree nodeKent Overstreet
single_device.merge_torture_flakey is, very rarely, finding a btree node that doesn't match the key that points to it: this patch improves the error message to print out more fields from the btree node header, so that we can see what else does or does not match the key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: return from fsync on writeback error to avoid early shutdownBrian Foster
When investigating transient failures of generic/441 on bcachefs, it was determined that the cause of the failure was a combination of unconditional emergency shutdown and racing between background journal activity and the test switchover from a working device mapper table to an error injecting table. Part of the reason for this sequence of events is that bcachefs aggressively flushes as much as possible during fsync(), regardless of errors. While this is reasonable behavior, it is technically unnecessary because once an error is returned from fsync(), the caller cannot make any assumptions about the resilience of data. Tweak the bch2_fsync() logic to return an error on failure of any of the steps involved in the flush. Note that this change alone does not prevent generic/441 failure, but in combination with a test tweak to avoid racing during the dm-error table switchover it avoids the unnecessary shutdowns and allows the test to pass reliably on bcachefs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: BCH_ERR_opt_parse_errorKent Overstreet
Continuing the project of replacing generic error codes with more specific ones. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Refactor trans->paths_allocated to be standard bitmapKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Move reflink_p triggers into reflink.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Remove obsolete comment about zstdRichard Davies
Remove obsolete comment about zstd, since approach changed during development of commit bbc3a46065d08f9ab3412b1f26bbfa778c444833 Signed-off-by: Richard Davies <richard@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Include btree_trans in more tracepointsKent Overstreet
This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: remove sb lock and flags update on explicit shutdownBrian Foster
bcachefs grabs s_umount and sets SB_RDONLY when the fs is shutdown via the ioctl() interface. This has a couple issues related to interactions between shutdown and freeze: 1. The flags == FSOP_GOING_FLAGS_DEFAULT case is a deadlock vector because freeze_bdev() calls into freeze_super(), which also acquires s_umount. 2. If an explicit shutdown occurs while the sb is frozen, SB_RDONLY alters the thaw path as if the sb was read-only at freeze time. This effectively leaks the frozen state and leaves the sb frozen indefinitely. The usage of SB_RDONLY here goes back to the initial bcachefs commit and AFAICT is simply historical behavior. This behavior is unique to bcachefs relative to the handful of other filesystems that support the shutdown ioctl(). Typically, SB_RDONLY is reserved for the proper remount path, which itself is restricted from modifying frozen superblocks in reconfigure_super(). Drop the unnecessary sb lock and flags update bch2_ioc_goingdown() to address both of these issues. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Make backpointer fsck wb flush check more rigorousKent Overstreet
backpointers fsck now always runs in rw mode - the btree is being modified while it runs, by e.g. copygc, rebalance, the discard worker, the invalidate worker. We could find a missing backpointer, flush the btree write buffer, and then on the next iteration find a new key at the exact same position - which will most likely need another write buffer flush. Hence, we have to check for an exact match on last_flushed, not just the pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: On missing backpointer to interior node, flush interior updatesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: remove redundant condition from data_update_index_updateDaniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: copygc shouldn't try moving buckets on errorDaniel Hill
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Explicity go RW for fsckKent Overstreet
This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: copygc should wakeup on shutdown if disabledDaniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: rebalance should wakeup on shutdown if disabledDaniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: remove dead bch2_evacuate_bucket()Daniel Hill
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Replace zero-length arrays with flexible-array membersGustavo A. R. Silva
Fake flexible arrays (zero-length and one-element arrays) are deprecated, and should be replaced by flexible-array members. So, replace zero-length arrays with flexible-array members in multiple structures. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: more write buffer refactoringKent Overstreet
prep work for big rewrite - no functional changes in this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: wb_flush_one_slowpath()Kent Overstreet
A bit of refactoring for better inlining in the main btree write buffer flush path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: ONLY_SPECIFIED_DEVS doesn't mean ignore durability anymoreKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Don't open code bch2_dev_exists2()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve trace_trans_restart_would_deadlockKent Overstreet
In the CI, we're seeing tests failing due to excessive would_deadlock transaction restarts - the tracepoint now includes the lock cycle that occured. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve trace_trans_restart_too_many_iters()Kent Overstreet
We now include the list of paths in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: count_event()Kent Overstreet
Small helper for event counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()Kent Overstreet
More accurate naming. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_write_buffer_flush_locked()Kent Overstreet
Minor refactoring - improved naming, and move the responsibility for flush_lock to the caller instead of having it be shared. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Clean up btree write buffer write ref handlingKent Overstreet
__bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: delete useless commit_do()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill journal->preres_waitKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve btree write buffer tracepointsKent Overstreet
- add a tracepoint for write_buffer_flush_sync; this is expensive - fix the write_buffer_flush_slowpath tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: No need to allocate keys for write bufferKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroKent Overstreet
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill journal_seq/gc args to bch2_dev_usage_update_m()Kent Overstreet
This is only used by gc (fsck). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Refactor bch2_check_alloc_to_lru_ref()Kent Overstreet
This code was somewhat convoluted - because originally bch2_lru_set() could modify the LRU index if there was a collision. That's no longer the case, so the "create LRU entry" path has no reason to update the alloc key, so we can separate the handling of the two fsck errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add a rebalance, data_update tracepointsKent Overstreet
Add a tracepoint for rebalance, printing out - the target option - the compression option - the key being rebalanced Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Print durability in member_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve sysfs compression_statsKent Overstreet
Break it out by compression type, and include average extent size. Also, format into a nice table. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill dev_usage->buckets_ecKent Overstreet
This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_dev_usage_to_text()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: New bucket sector count helpersKent Overstreet
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(), prep work for separately accounting stripe sectors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: BCH_IOCTL_DEV_USAGE_V2Kent Overstreet
BCH_IOCTL_DEV_USAGE mistakenly put the per-data-type array in struct bch_ioctl_dev_usage; since ioctl numbers encode the size of the arg, that means adding new data types breaks the ioctl. This adds a new version that includes the number of data types as a parameter: the old version is fixed at 10 so as to not break when adding new types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Simplify check_bucket_ref()Kent Overstreet
We only need the sector count being modified. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: six locks: Simplify optimistic spinningKent Overstreet
osq lock maintainers don't want it to be used outside of kernel/locking/ - but, we can do better. Since we have lock handoff signalled via waitlist entries, there's no reason for optimistic spinning to have to look at the lock at all - aside from checking lock-owner; we can just spin looking at our waitlist entry. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01powerpc: Export kvm_guest static key, for bcachefs six locksKent Overstreet
bcachefs's six locks need kvm_guest, via ower_on_cpu() -> vcpu_is_preempted() -> is_kvm_guest() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)