summaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python/gecko.py
diff options
context:
space:
mode:
authorGao Xiang <hsiangkao@linux.alibaba.com>2025-03-10 17:54:57 +0800
committerGao Xiang <hsiangkao@linux.alibaba.com>2025-03-17 14:02:15 +0800
commitefb2aef569b35b415c232c4e9fdecd0e540e1f60 (patch)
treeabef043830258d9dca99c16c44cb5cb8dc5fe8ab /tools/perf/scripts/python/gecko.py
parentfe1e57d44d7f106df9048e815e4862cf63921220 (diff)
erofs: add encoded extent on-disk definition
Previously, EROFS provided both (non-)compact compressed indexes to keep necessary hints for each logical block, enabling O(1) random indexing. This approach was originally designed for small compression units (e.g., 4KiB), where compressed data is strictly block-aligned via fixed-sized output compression. However, EROFS now supports big pclusters up to 1MiB and many users use large configurations to minimize image sizes. For such configurations, the total number of extents decreases significantly (e.g., only 1,024 extents for a 1GiB file using 1MiB pclusters), then runtime metadata overhead becomes negligible compared to data I/O and decoding costs. Additionally, some popular compression algorithm (mainly Zstd) still lacks native fixed-sized output compression support (although it's planned by their authors). Instead of just waiting for compressor improvements, let's adopt byte-oriented extents, allowing these compressors to retain their current methods. For example, it speeds up Zstd compression a lot: Processor: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz * 96 Dataset: enwik9 Build time Size Type Command Line 3m52.339s 266653696 FO -C524288 -zzstd,22 3m48.549s 266174464 FO -E48bit -C524288 -zzstd,22 0m12.821s 272134144 FI -E48bit -C1048576 --max-extent-bytes=1048576 -zzstd,22 0m14.528s 248987648 FO -C1048576 -zlzma,9 0m14.605s 248504320 FO -E48bit -C1048576 -zlzma,9 Encoded extents are structured as an array of `struct z_erofs_extent`, sorted by logical address in ascending order: __le32 plen // encoded length, algorithm id and flags __le32 pstart_lo // physical offset LSB __le32 pstart_hi // physical offset MSB __le32 lstart_lo // logical offset __le32 lstart_hi // logical offset MSB .. Note that prefixed reduced records can be used to minimize metadata for specific cases (e.g. lstart less than 32 bits, then 32 to 16 bytes). If the logical lengths of all encoded extents are the same, 4-byte (plen) and 8-byte (plen, pstart_lo) records can be used. Or, 16-byte (plen .. lstart_lo) and 32-byte full records have to be used instead. If 16-byte and 32-byte records are used, the total number of extents is kept in `struct z_erofs_map_header`, and binary search can be applied on them. Note that `eytzinger order` is not considerd because data sequential access is important. If 4-byte records are used, 8-byte start physical offset is between `struct z_erofs_map_header` and the `plen` array. In addition, 64-bit physical offsets can be applied with new encoded extent format to match full 48-bit block addressing. Remove redundant comments around `struct z_erofs_lcluster_index` too. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20250310095459.2620647-8-hsiangkao@linux.alibaba.com
Diffstat (limited to 'tools/perf/scripts/python/gecko.py')
0 files changed, 0 insertions, 0 deletions