Age | Commit message (Collapse) | Author |
|
Add a tracepoint for any time we return an error and unwind.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The new guard(), scoped_guard() allow for more natural code.
Some of the uses with creative flow control have been left.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More stack usage work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
with typical config options, variables in different inline functions
aren't sharing stack space - and these are slowpaths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want recovery.curr_pass to be private to the recovery passes code,
for better showing recovery pass status; also, it may rewind and is
generally not the correct member to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch_fs has gotten obnoxiously big, let's start organizing thins a bit
better.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out the per-device calculations, for better introspection.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was an oversight, we want bch2_alloc_sectors_append_ptrs_inlined()
fully inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There is req->ec = erasure_code above.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.
But we should always be printing with the helpers that take a filesystem
object, if we're in fsck they direct output to the userspace process
controlling fsck instead of the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a struct for common state for satisfying an on disk allocation,
instead of passing the same long list of items to every function.
This will help with stack usage, performance, and perhaps enable some
code cleanups.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There was a buggy version of bcachefs-tools which picked misaligned
bucket sizes when formatting, and we're also about to do dynamic block
sizes - which will allow picking logical block size or physical block
size of the device per-write, allowing for better compression ratios at
the cost of slightly worse write performance (i.e. forcing the device to
do RMW or extra buffering).
To account for this, tweak bch2_alloc_sectors_start() to properly align
open_buckets to the blocksize of the write we're about to do.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a buggy release of bcachefs-tools that wasn't properly aligning
bucket sizes.
We can't ask users to reformat - and it's easy to teach the allocator to
make sure writes are properly aligned.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
All the fastpaths that need device usage don't need the sector totals or
fragmentation, just bucket counts.
Split bch_dev_usage up into two different versions, the normal one with
just bucket counts.
This is also a stack usage improvement, since we have a bch_dev_usage on
the stack in the allocation path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
For striping across devices, we maintain "clocks", and we advance them
by the inverse of "how much free space this device has left", so that we
round robin biased in favor of devices with more free space.
This code was originally trying to do EWMA-ish stuff when originally
written, ~10 years ago, and was never properly cleaned up when it was
realized that an EWMA is not the right approach here.
That left a bug, when we rescale to keep all the clocks in the correct
range and prevent overflow.
It was assumed that we'd always be allocated from the device with the
smallest clock hand, but that's actually not correct: with the target
options, allocations will be first tried from a subset of devices, and
then the entire filesystem if that fails.
Thus, the rescale from the first allocation - allocating from a subset
of devices - can pick the wrong rescale value and cause the rest of the
clocks to go to 0, losing information.
This resuls in incorrect striping behaviour when the desired number of
replicas doesn't fit on the foreground target.
Link: https://www.reddit.com/r/bcachefs/comments/1jn3t26/replica_allocation_not_evenly_distributed_among/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dev_usage_read() is fairly expensive, we should optimize this more.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Replace these with proper private error codes, so that when we get an
error message we're not sifting through the entire codebase to see where
it came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The uppercase/lowercase style is nice for making the namespace explicit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The discard path is supposed to issue journal flushes when there's too
many buckets empty buckets that need a journal commit before they can be
written to again, but at some point this code seems to have been lost.
Bring it back with a new optimization to make sure we don't issue too
many journal flushes: the journal now tracks the sequence number of the
most recent flush in progress, which the discard path uses when deciding
which buckets need a journal flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't hold mark_lock while calling fsck_err() - that's a deadlock,
mark_lock is meant to be a leaf node lock.
It's also unnecessary for gc_bucket() and bucket_gen(); rcu suffices
since the bucket_gens array describes its size, and we can't race with
device removal or resize during gc/fsck since that takes state lock.
Reported-by: syzbot+38641fcbda1aaffefdd4@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Check open buckets and buckets waiting for journal commit before doing
other expensive lookups.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This lets us use darray macros on dev_alloc_list (and it will become a
darray eventually, when we increase the maximum number of devices).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The function bch2_bucket_alloc_trans() lacked a description for the
nowait parameter in its documentation comment block. This patch adds the
missing description to ensure all parameters are properly documented.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=12179
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When calling check_discard_freeespace_key from the allocator, we can't
repair without recursing - run it asynchronously instead.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The early-early allocation path, bch2_bucket_alloc_new_fs(), is no
longer needed - and inconsistencies around new_fs_bucket_idx have been a
frequent source of bugs.
Reported-by: syzbot+592425844580a6598410@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
try_alloc_bucket() has a "safety" check, which avoids allocating a
bucket if there's any backpointers present.
But backpointers are not the source of truth for live data in a bucket,
the bucket sector counts are; this check was fairly useless, and we're
also deferring backpointers checks from fsck to runtime in the near
future.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
check_discard_freespace_key() was doing all the same checks as
try_alloc_bucket(), but with repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for converting try_alloc_bucket() to use
bch2_check_discard_freespace_key().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Open buckets on the partial list should not count as allocated when
we're trying to allocate from the partial list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
these are an important source of stranded buckets we need to be able to
watch
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Using commit_do() to call alloc_sectors_start_trans() breaks when we're
randomly injecting transaction restarts - the restart in the commit
causes us to leak the lock that alloc_sectorS_start_trans() takes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
No reason for it not to be where it's needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
this was an oversight: rebalance is moving data to a specific device, so
we don't want it falling back to the full filesystem
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
rebalance writes must be BCH_WRITE_ALLOC_NOWAIT because they don't
allocate from the full filesystem - but we don't want spurious
allocation failures due to open buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|