| field | value | date |
|---|---|---|
| author | Kent Overstreet <kent.overstreet@linux.dev> | 2025-07-01 13:36:51 -0400 |
| committer | Kent Overstreet <kent.overstreet@linux.dev> | 2025-07-01 19:33:46 -0400 |
| commit | c6e8d51b37d2ca37dee63753fd240bcbc6402ad3 | |
| tree | 94b949fb2ce238793b38e3d3d16317a5057d517b /rust/kernel/alloc.rs | |
| parent | fbf913cb72a52559ae98951fb4311b81d7b0650e | |
bcachefs: Work around deadlock to btree node rewrites in journal replay

Don't mark btree nodes for rewrite if they are (or would be) degraded while journal replay hasn't finished, to avoid a deadlock.
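
The gating rule above can be sketched as a small predicate. This is a hypothetical, simplified model: `struct fs_state`, `struct node_info`, `node_degraded()` and `should_mark_for_rewrite()` are illustrative names, not bcachefs's actual types or helpers, and the "would be degraded" case is assumed to be folded into the replica count by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: these names are illustrative, not bcachefs code. */
struct fs_state {
	bool journal_replay_done;	/* has journal replay finished? */
};

struct node_info {
	unsigned nr_replicas;		/* replicas the node has (or would have) */
	unsigned want_replicas;		/* replicas the node should have */
};

/* A node is degraded if it has fewer replicas than wanted. */
static bool node_degraded(const struct node_info *n)
{
	return n->nr_replicas < n->want_replicas;
}

/*
 * Decide whether to mark a node for rewrite. Degraded nodes are skipped
 * entirely until journal replay has finished, accepting a lingering
 * degraded node in exchange for avoiding the open_buckets deadlock.
 * other_reason stands in for any non-degraded trigger for a rewrite.
 */
static bool should_mark_for_rewrite(const struct fs_state *fs,
				    const struct node_info *n,
				    bool other_reason)
{
	if (node_degraded(n) && !fs->journal_replay_done)
		return false;
	return other_reason || node_degraded(n);
}
```

Note that in this sketch a degraded node is skipped during replay even when something else also wants it rewritten, matching the "don't mark" wording; healthy nodes are unaffected by the replay check.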

This is because btree node rewrites generate more updates from the interior update path (alloc, backpointers); if those updates touch new nodes, they can trigger more rewrites in turn, and we can only have so many interior btree updates in flight before we deadlock on open_buckets.
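
A toy model of why a bounded pool deadlocks: if each interior update holds its reservation until the update it triggered completes, a long enough chain exhausts the pool before anything can be released. The pool size, per-update cost and `simulate_chain()` are all assumptions for illustration; real open_bucket accounting in bcachefs is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not bcachefs code; both constants are assumed. */
#define OPEN_BUCKETS_TOTAL	8u	/* size of the bucket pool */
#define BUCKETS_PER_UPDATE	2u	/* reservation per interior update */

/*
 * Start a chain of chain_len interior updates, each holding its buckets
 * until its successor completes (release is not modelled). Returns how
 * many updates got started; *deadlocked is set if the pool ran dry first.
 */
static unsigned simulate_chain(unsigned chain_len, bool *deadlocked)
{
	unsigned free_buckets = OPEN_BUCKETS_TOTAL;
	unsigned depth = 0;

	*deadlocked = false;
	while (depth < chain_len) {
		if (free_buckets < BUCKETS_PER_UPDATE) {
			/* Every in-flight update waits on one that can't start. */
			*deadlocked = true;
			break;
		}
		free_buckets -= BUCKETS_PER_UPDATE;
		depth++;
	}
	return depth;
}
```

With these numbers, at most four updates fit in flight; a fifth link in the chain can never start, so none of the earlier ones can finish.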

The biggest cause is that we don't use the btree write buffer for the backpointer updates; fixing that needs some real thought about locking.

The problem with this workaround (not doing the rewrite for degraded nodes during journal replay) is that those degraded nodes persist, and we don't want that. It is a real bug when a btree node write completes with fewer replicas than we wanted and leaves a degraded node because of device _removal_, i.e. the device went away mid-write.

It's less of a bug here, but still a problem, because we don't yet have a way of tracking degraded data: we need another index (all extents/btree nodes, by replicas entry) to fix this properly, i.e. to re-replicate degraded data at the earliest possible time.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Diffstat (limited to 'rust/kernel/alloc.rs')
0 files changed, 0 insertions, 0 deletions
