diff options
author | Eric Biggers <ebiggers@google.com> | 2024-10-13 21:24:47 -0700 |
---|---|---|
committer | Herbert Xu <herbert@gondor.apana.org.au> | 2024-10-26 14:41:59 +0800 |
commit | 84dd048cf89557e1e4badd20c9522f7aaa0275fe (patch) | |
tree | 4d9265071e477b93b29b494bcf65bfb76b7fc08c /scripts/lib/kdoc/kdoc_parser.py | |
parent | eebcadfa21e66a893846ec168fb7c26a46a510cd (diff) |
crypto: x86/crc32c - eliminate jump table and excessive unrolling
crc32c-pcl-intel-asm_64.S has a loop with 1 to 127 iterations fully
unrolled and uses a jump table to jump into the correct location. This
optimization is misguided, as it bloats the binary code size and
introduces an indirect call. x86_64 CPUs can predict loops well, so it
is fine to just use a loop instead. Loop bookkeeping instructions can
compete with the crc instructions for the ALUs, but this is easily
mitigated by unrolling the loop by a smaller amount, such as 4 times.
Therefore, re-roll the loop and make related tweaks to the code.
This reduces the binary code size of crc_pclmul() from 4546 bytes to 418
bytes, a 91% reduction. In general it also makes the code faster, with
some large improvements seen when retpoline is enabled.
More detailed performance results are shown below. They are given as
percent improvement in throughput (negative means regressed) for CPU
microarchitecture vs. input length in bytes. E.g. an improvement from
40 GB/s to 50 GB/s would be listed as 25%.
Table 1: Results with retpoline enabled (the default):
| 512 | 833 | 1024 | 2000 | 3173 | 4096 |
---------------------+-------+-------+-------+------ +-------+-------+
Intel Haswell | 35.0% | 20.7% | 17.8% | 9.7% | -0.2% | 4.4% |
Intel Emerald Rapids | 66.8% | 45.2% | 36.3% | 19.3% | 0.0% | 5.4% |
AMD Zen 2 | 29.5% | 17.2% | 13.5% | 8.6% | -0.5% | 2.8% |
Table 2: Results with retpoline disabled:
| 512 | 833 | 1024 | 2000 | 3173 | 4096 |
---------------------+-------+-------+-------+------ +-------+-------+
Intel Haswell | 3.3% | 4.8% | 4.5% | 0.9% | -2.9% | 0.3% |
Intel Emerald Rapids | 7.5% | 6.4% | 5.2% | 2.3% | -0.0% | 0.6% |
AMD Zen 2 | 11.8% | 1.4% | 0.2% | 1.3% | -0.9% | -0.2% |
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Diffstat (limited to 'scripts/lib/kdoc/kdoc_parser.py')
0 files changed, 0 insertions, 0 deletions