author | Qu Wenruo <wqu@suse.com> | 2025-04-26 08:06:49 +0930 |
---|---|---|
committer | David Sterba <dsterba@suse.com> | 2025-05-15 14:30:55 +0200 |
commit | 8e4f21f2b13d6cc29b59ad7faaac7aafbf3569e3 (patch) | |
tree | 7dcaeaa4c4cd67ff3d75c560bc2a9d0d31af1827 /scripts/lib/kdoc/kdoc_files.py | |
parent | 3649833a58b6cae14651900629e74e9a710e0fb6 (diff) |
btrfs: handle unaligned EOF truncation correctly for subpage cases
[BUG]
The following fsx sequence will fail on btrfs with 64K page size and 4K
fs block size:
#fsx -d -e 1 -N 4 $mnt/junk -S 36386
READ BAD DATA: offset = 0xe9ba, size = 0x6dd5, fname = /mnt/btrfs/junk
OFFSET GOOD BAD RANGE
0xe9ba 0x0000 0x03ac 0x0
operation# (mod 256) for the bad data may be 3
...
LOG DUMP (4 total operations):
1( 1 mod 256): WRITE 0x6c62 thru 0x1147d (0xa81c bytes) HOLE ***WWWW
2( 2 mod 256): TRUNCATE DOWN from 0x1147e to 0x5448 ******WWWW
3( 3 mod 256): ZERO 0x1c7aa thru 0x28fe2 (0xc839 bytes)
4( 4 mod 256): MAPREAD 0xe9ba thru 0x1578e (0x6dd5 bytes) ***RRRR***
[CAUSE]
Only 2 operations are really involved in this case:
3 pollute_eof 0x5448 thru 0xffff (0xabb8 bytes)
3 zero from 0x1c7aa to 0x28fe3, (0xc839 bytes)
4 mapread 0xe9ba thru 0x1578e (0x6dd5 bytes)
At operation 3, fsx pollutes the range beyond EOF. It does so by
mmap()ing the file and writing into the mapped range beyond EOF.
Such a write fills the range beyond EOF, but it never reaches disk,
as ranges beyond EOF are marked neither dirty nor uptodate.
Then we zero_range for [0x1c7aa, 0x28fe3], and since the range is beyond
our isize (which was 0x5448), we should zero out any range beyond
EOF (0x5448).
During btrfs_zero_range(), we call btrfs_truncate_block() to dirty the
unaligned head block.
But that function only really zeroes out the block at [0x5000, 0x5fff]. It
doesn't touch any range other than that block, since those ranges will
not be marked dirty nor written back.
So the range [0x6000, 0xffff] is still polluted, and later mapread()
will return the poisoned value.
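
The block and page boundaries quoted above follow from plain alignment arithmetic. The sketch below is only an illustration of that arithmetic: `struct eof_layout`, `round_down_pow2()` and `compute_eof_layout()` are hypothetical names invented here, not kernel code, and it assumes power-of-two block and page sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for illustration, not a kernel structure. */
struct eof_layout {
	uint64_t block_start;  /* first byte of the fs block containing EOF */
	uint64_t block_end;    /* last byte of that block (inclusive) */
	uint64_t page_end;     /* last byte of the page containing EOF */
	uint64_t polluted_len; /* bytes from EOF up to the end of the page */
};

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Compute where the EOF block and the EOF page end for a given isize. */
static struct eof_layout compute_eof_layout(uint64_t isize,
					    uint64_t block_size,
					    uint64_t page_size)
{
	struct eof_layout l;

	l.block_start = round_down_pow2(isize, block_size);
	l.block_end = l.block_start + block_size - 1;
	l.page_end = round_down_pow2(isize, page_size) + page_size - 1;
	l.polluted_len = l.page_end + 1 - isize;
	return l;
}
```

With isize = 0x5448, a 4K block size and a 64K page size, this yields the block [0x5000, 0x5fff] and a page end of 0xffff, so [0x6000, 0xffff] is the part left untouched. The 0xabb8 bytes from EOF to the page end match the pollute_eof line in the log.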
[FIX]
Enhance btrfs_truncate_block() by:
- Pass a @start/@end pair to indicate the full truncation range
This is to handle the following truncation case:
Page size is 64K, fs block size is 4K, truncate range is
[6K, 60K]
0 32K 64K
| |///////////////////////////////////| |
6K 60K
The range is not aligned for its head block, so we need to call
btrfs_truncate_block() with @from = 6K, @front = 0, @len = 0.
But with that information we only know to zero the range [6K, 8K).
If we zeroed out the range [6K, 64K) instead, the last block would also
be zeroed, causing data loss.
So here we need the full range we're truncating, so that we can avoid
over-truncation.
- Rename @from to @offset
As now the parameter is only utilized to locate a block, it's not
really carrying the old @from meaning well.
- Remove @front parameter
With the full truncate range passed in, we can determine if the
@offset is at the head or tail block.
- Skip truncation if @offset is in neither the head nor the tail block
The call site in hole punch unconditionally calls
btrfs_truncate_block() without even checking whether the range is
aligned.
If the @offset is neither in the head nor in tail block, it means we can
safely ignore it.
- Skip truncation if the range inside the target block is already aligned
- Make btrfs_truncate_block() zero all blocks beyond EOF
Since we have the original range, we know exactly if we're doing
truncation beyond EOF (the @end will be (u64)-1).
If we're doing truncation beyond EOF, then enlarge the truncation
range to the folio end, to address the possibly polluted ranges.
Otherwise keep the zero range inside the block. Large data folios are
coming soon, and always truncating every block inside the same folio
can be costly for large folios.
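
The rules in the list above can be sketched as a small, self-contained decision function. This is not the kernel implementation: `struct zero_plan` and `plan_truncate_block()` are hypothetical names, @end is assumed to be inclusive with UINT64_MAX standing in for (u64)-1, and block and folio sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: which bytes (inclusive) should be zeroed. */
struct zero_plan {
	bool skip;      /* true when nothing needs zeroing */
	uint64_t start; /* first byte to zero */
	uint64_t end;   /* last byte to zero (inclusive) */
};

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * offset locates the target block; [start, end] is the full truncation
 * range. end == UINT64_MAX is taken to mean truncation beyond EOF.
 */
static struct zero_plan plan_truncate_block(uint64_t offset, uint64_t start,
					    uint64_t end, uint64_t block_size,
					    uint64_t folio_size)
{
	struct zero_plan plan = { .skip = true, .start = 0, .end = 0 };
	uint64_t block_start = round_down_u64(offset, block_size);
	uint64_t block_end = block_start + block_size - 1;
	bool in_head = round_down_u64(start, block_size) == block_start;
	bool in_tail = round_down_u64(end, block_size) == block_start;
	uint64_t zero_start, zero_end;

	assert(start <= end);

	/* The target block is neither the head nor the tail block: skip. */
	if (!in_head && !in_tail)
		return plan;

	/* Clamp the truncation range to the target block. */
	zero_start = start > block_start ? start : block_start;
	zero_end = end < block_end ? end : block_end;

	/* The range covers the whole block, i.e. it is already aligned. */
	if (zero_start == block_start && zero_end == block_end)
		return plan;

	/* Beyond EOF: extend zeroing to the end of the containing folio. */
	if (end == UINT64_MAX)
		zero_end = round_down_u64(offset, folio_size) + folio_size - 1;

	plan.skip = false;
	plan.start = zero_start;
	plan.end = zero_end;
	return plan;
}
```

Under these assumptions, the [6K, 60K] example with a 4K block size zeroes only [6K, 8K) for the head block, and an offset of 32K in the same range is skipped. For the fsx case (offset and start at 0x5448, end at UINT64_MAX, 64K folio), the plan extends to 0xffff and so covers the polluted range.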
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>