diff options
| author | Samuel Holland <samuel.holland@sifive.com> | 2025-04-09 10:14:51 -0700 | 
|---|---|---|
| committer | Palmer Dabbelt <palmer@dabbelt.com> | 2025-06-05 11:09:43 -0700 | 
| commit | be17c0df67959fe4f88dac75dc26ed9252d4b133 (patch) | |
| tree | f00beb7f8a1b319cc911f400b1b945cc731eebb4 /tools/docs/lib/parse_data_structs.py | |
| parent | 881dadf0792c02d7d156e0665dc5eaf18701cd5a (diff) | |
riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.
Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919fb ("arm64/module: Optimize module load time by
optimizing PLT counting").
Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Diffstat (limited to 'tools/docs/lib/parse_data_structs.py')
0 files changed, 0 insertions, 0 deletions
