diff options
| author | Qu Wenruo <wqu@suse.com> | 2025-07-25 20:33:25 +0930 | 
|---|---|---|
| committer | David Sterba <dsterba@suse.com> | 2025-08-07 17:07:15 +0200 | 
| commit | 4289b494ac553e74e86fed1c66b2bf9530bc1082 (patch) | |
| tree | b7bd4c001e6865fee4df662e1965fb9d73bd81a0 /scripts/gdb/linux/pgtable.py | |
| parent | fc5799986fbca957e2e3c0480027f249951b7bcf (diff) | |
btrfs: do not allow relocation of partially dropped subvolumes
[BUG]
There is an internal report that balance triggered transaction abort,
with the following call trace:
  item 85 key (594509824 169 0) itemoff 12599 itemsize 33
          extent refs 1 gen 197740 flags 2
          ref#0: tree block backref root 7
  item 86 key (594558976 169 0) itemoff 12566 itemsize 33
          extent refs 1 gen 197522 flags 2
          ref#0: tree block backref root 7
 ...
 BTRFS error (device loop0): extent item not found for insert, bytenr 594526208 num_bytes 16384 parent 449921024 root_objectid 934 owner 1 offset 0
 BTRFS error (device loop0): failed to run delayed ref for logical 594526208 num_bytes 16384 type 182 action 1 ref_mod 1: -117
 ------------[ cut here ]------------
 BTRFS: Transaction aborted (error -117)
 WARNING: CPU: 1 PID: 6963 at ../fs/btrfs/extent-tree.c:2168 btrfs_run_delayed_refs+0xfa/0x110 [btrfs]
And btrfs check doesn't report anything wrong related to the extent
tree.
[CAUSE]
The cause is a little complex, firstly the extent tree indeed doesn't
have the backref for 594526208.
The extent tree only have the following two backrefs around that bytenr
on-disk:
        item 65 key (594509824 METADATA_ITEM 0) itemoff 13880 itemsize 33
                refs 1 gen 197740 flags TREE_BLOCK
                tree block skinny level 0
                (176 0x7) tree block backref root CSUM_TREE
        item 66 key (594558976 METADATA_ITEM 0) itemoff 13847 itemsize 33
                refs 1 gen 197522 flags TREE_BLOCK
                tree block skinny level 0
                (176 0x7) tree block backref root CSUM_TREE
But the such missing backref item is not an corruption on disk, as the
offending delayed ref belongs to subvolume 934, and that subvolume is
being dropped:
        item 0 key (934 ROOT_ITEM 198229) itemoff 15844 itemsize 439
                generation 198229 root_dirid 256 bytenr 10741039104 byte_limit 0 bytes_used 345571328
                last_snapshot 198229 flags 0x1000000000001(RDONLY) refs 0
                drop_progress key (206324 EXTENT_DATA 2711650304) drop_level 2
                level 2 generation_v2 198229
And that offending tree block 594526208 is inside the dropped range of
that subvolume.  That explains why there is no backref item for that
bytenr and why btrfs check is not reporting anything wrong.
But this also shows another problem, as btrfs will do all the orphan
subvolume cleanup at a read-write mount.
So half-dropped subvolume should not exist after an RW mount, and
balance itself is also exclusive to subvolume cleanup, meaning we
shouldn't hit a subvolume half-dropped during relocation.
The root cause is, there is no orphan item for this subvolume.
In fact there are 5 subvolumes from around 2021 that have the same
problem.
It looks like the original report has some older kernels running, and
caused those zombie subvolumes.
Thankfully upstream commit 8d488a8c7ba2 ("btrfs: fix subvolume/snapshot
deletion not triggered on mount") has long fixed the bug.
[ENHANCEMENT]
For repairing such old fs, btrfs-progs will be enhanced.
Considering how delayed the problem will show up (at run delayed ref
time) and at that time we have to abort transaction already, it is too
late.
Instead here we reject any half-dropped subvolume for reloc tree at the
earliest time, preventing confusion and extra time wasted on debugging
similar bugs.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'scripts/gdb/linux/pgtable.py')
0 files changed, 0 insertions, 0 deletions
