| author | Mingzhe Zou <mingzhe.zou@easystack.cn> | 2025-05-27 13:16:01 +0800 | 
|---|---|---|
| committer | Jens Axboe <axboe@kernel.dk> | 2025-05-27 07:38:19 -0600 | 
| commit | 208c1559c5b18894e3380b3807b6364bd14f7584 (patch) | |
| tree | 0c6f5fcbae657d7be3eb86db8c48c027264f7dc7 /scripts/gdb/linux/interrupts.py | |
| parent | 5a08e49f2359a14629f27da99aaf0f1c3a68b850 (diff) | |
bcache: reserve more RESERVE_BTREE buckets to prevent allocator hang

We observed an IO hang and an unrecoverable error in our testing
environment. After investigation, we found that bch_allocator_thread is
stuck. The call stack is as follows:
[<0>] __switch_to+0xbc/0x108
[<0>] __closure_sync+0x7c/0xbc [bcache]
[<0>] bch_prio_write+0x430/0x448 [bcache]
[<0>] bch_allocator_thread+0xb44/0xb70 [bcache]
[<0>] kthread+0x124/0x130
[<0>] ret_from_fork+0x10/0x18

Moreover, the RESERVE_BTREE bucket slots are empty and journal_full
occurs at the same time.

When the cache disk is first used, sb.nJournal_buckets defaults to 0, so
only 8 RESERVE_BTREE buckets are reserved. If the RESERVE_BTREE buckets
are used up, or btree_check_reserve() fails while a request is handling a
btree split, the request is retried repeatedly and waits for the alloc
thread to refill the reserve.
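
The sizing described above can be illustrated with a minimal sketch. It
assumes the reserve is taken from the journal bucket count and falls back
to 8 when that count is 0, which is one reading of the paragraph above.
The helper names, the parameter names and the min_reserve floor are
hypothetical and are not the kernel code or the actual patch.

```c
#include <stddef.h>

/* Fallback reserve when no journal buckets are configured (value from the text). */
#define DEFAULT_BTREE_RESERVE 8

/*
 * Hypothetical helper, not a kernel function: models a RESERVE_BTREE
 * size that follows the journal bucket count and falls back to 8 when
 * the count is 0 (assumed behaviour, see the lead-in).
 */
static size_t btree_reserve_size(size_t njournal_buckets)
{
	return njournal_buckets ? njournal_buckets : DEFAULT_BTREE_RESERVE;
}

/*
 * Hypothetical variant with a caller-chosen floor, to illustrate the
 * idea of reserving more buckets. The floor value is an assumption.
 */
static size_t btree_reserve_size_with_floor(size_t njournal_buckets,
					    size_t min_reserve)
{
	size_t n = btree_reserve_size(njournal_buckets);

	return n < min_reserve ? min_reserve : n;
}
```

With a journal bucket count of 0, the first helper yields the 8-bucket
reserve mentioned above. The second shows how a larger floor would apply
only when the computed reserve is smaller than that floor.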

After the alloc thread fills the buckets, it calls bch_prio_write(). If
journal_full occurs at the same time, journal_reclaim() and
btree_flush_write() are called in sequence, and journal_write cannot
complete.

This is a low-probability event. We believe that reserving more
RESERVE_BTREE buckets can avoid the worst case.
Fixes: 682811b3ce1a ("bcache: fix for allocator and register thread race")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@kernel.org>
Link: https://lore.kernel.org/r/20250527051601.74407-4-colyli@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
