summaryrefslogtreecommitdiff
path: root/tools/docs/lib/parse_data_structs.py
diff options
context:
space:
mode:
authorSamuel Holland <samuel.holland@sifive.com>2025-04-09 10:14:51 -0700
committerPalmer Dabbelt <palmer@dabbelt.com>2025-06-05 11:09:43 -0700
commitbe17c0df67959fe4f88dac75dc26ed9252d4b133 (patch)
treef00beb7f8a1b319cc911f400b1b945cc731eebb4 /tools/docs/lib/parse_data_structs.py
parent881dadf0792c02d7d156e0665dc5eaf18701cd5a (diff)
riscv: module: Optimize PLT/GOT entry counting
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent inside module_frob_arch_sections(). This is because amdgpu.ko contains about 300000 relocations in its .rela.text section, and the algorithm in count_max_entries() takes quadratic time. Apply two optimizations from the arm64 code, which together reduce the total execution time by 99.58%. First, sort the relocations so duplicate entries are adjacent. Second, reduce the number of relocations that must be sorted by filtering to only relocations that need PLT/GOT entries, as done in commit d4e0340919fb ("arm64/module: Optimize module load time by optimizing PLT counting"). Unlike the arm64 code, here the filtering and sorting is done in a scratch buffer, because the HI20 relocation search optimization in apply_relocate_add() depends on the original order of the relocations. This allows accumulating PLT/GOT relocations across sections so sorting and counting is only done once per module. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Diffstat (limited to 'tools/docs/lib/parse_data_structs.py')
0 files changed, 0 insertions, 0 deletions