author    Nicolas Pitre <npitre@baylibre.com>    2024-10-03 17:16:14 -0400
committer Arnd Bergmann <arnd@arndb.de>          2024-10-28 21:44:28 +0000
commit    00a31dd3acea0f88f947fc71e268ebb34b59f218 (patch)
tree      26f7fb60c6ab8a359e82eaa1c159d29703fe0789
parent    1dc82675cb79200d5e140520efd7ce88b38ea56d (diff)
asm-generic/div64: optimize/simplify __div64_const32()
Several years later I just realized that this code could be greatly
simplified.
First, let's formalize the need for overflow handling in
__arch_xprod64(). Assuming n = UINT64_MAX, there are two cases where
an overflow may occur (both identities are verified by the sketch
right after this list):
1) If a bias must be added, we have m_lo * n_lo + m, i.e.
m_lo * 0xffffffff + ((m_hi << 32) + m_lo), i.e.
((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo), i.e.
(m_lo + m_hi) << 32, which must be < (1 << 64). So the criterion for
no overflow is m_lo + m_hi < (1 << 32).
2) The cross product m_lo * n_hi + m_hi * n_lo, i.e.
m_lo * 0xffffffff + m_hi * 0xffffffff, i.e.
((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi). Adding to this the top
half of the result from case 1, i.e. m_lo + m_hi, gives
(m_lo + m_hi) << 32 again, so the same criterion applies.
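Not part of the patch: a throwaway check of the two identities above,
written for this explanation and using the compiler's
unsigned __int128 as a reference. The value of m is an arbitrary
example that satisfies the criterion.

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
      /* arbitrary example with m_lo + m_hi < 1 << 32 */
      uint64_t m = 0x0000000340000000ULL;
      uint64_t m_lo = m & 0xffffffff;
      uint64_t m_hi = m >> 32;
      unsigned __int128 bias_sum, cross_sum;

      /* case 1: m_lo * n_lo + m with n_lo = 0xffffffff */
      bias_sum = (unsigned __int128)m_lo * 0xffffffff + m;
      assert(bias_sum == (unsigned __int128)(m_lo + m_hi) << 32);

      /* case 2: cross product plus the top half of case 1's result */
      cross_sum = (unsigned __int128)m_lo * 0xffffffff
                + (unsigned __int128)m_hi * 0xffffffff
                + (m_lo + m_hi);
      assert(cross_sum == (unsigned __int128)(m_lo + m_hi) << 32);

      /* neither sum exceeds 64 bits when m_lo + m_hi < 1 << 32 */
      assert((bias_sum >> 64) == 0 && (cross_sum >> 64) == 0);
      return 0;
  }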
So let's have a straight and simpler version for when this criterion
holds. Otherwise, some reordering allows possible overflows to be
handled without any actual conditionals. And to prevent both code
variants from being generated, the straight version is considered only
when the compiler perceives m as constant.
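For illustration only, here is a standalone sketch of such a
two-variant helper. It follows the structure described above for
__arch_xprod64() in include/asm-generic/div64.h, but the name
xprod64() and the details are reconstructed for this note, not the
literal kernel code.

  #include <stdbool.h>
  #include <stdint.h>

  /* Returns the high 64 bits of m * n, plus m when bias is set. */
  static uint64_t xprod64(const uint64_t m, uint64_t n, bool bias)
  {
      uint32_t m_lo = m, m_hi = m >> 32;
      uint32_t n_lo = n, n_hi = n >> 32;
      uint64_t x, y;

      if (__builtin_constant_p(m) &&
          (m >> 32) + (m & 0xffffffff) < 0x100000000ULL) {
          /* straight version: no partial sum can overflow */
          x = (uint64_t)m_lo * n_lo + (bias ? m : 0);
          x = (uint64_t)m_lo * n_hi + (uint64_t)m_hi * n_lo + (x >> 32);
          x = (uint64_t)m_hi * n_hi + (x >> 32);
      } else {
          /* reordered version: carries absorbed without conditionals */
          x = (uint64_t)m_lo * n_lo + (bias ? m_lo : 0);
          y = (uint64_t)m_lo * n_hi + (uint32_t)(x >> 32) +
              (bias ? m_hi : 0);
          x = (uint64_t)m_hi * n_hi + (uint32_t)(y >> 32);
          y = (uint64_t)m_hi * n_lo + (uint32_t)y;
          x += (uint32_t)(y >> 32);
      }
      return x;
  }

With a constant m the __builtin_constant_p() test folds away, so only
one of the two bodies survives in the generated code for a given call
site.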
This, in turn, allows for greatly simplifying __div64_const32(). The
"special case" can go away, as the regular case works just fine
without needing a bias. Then the reduction should be applied all the
time, as minimizing m is the key.
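Again illustrative only: a quick self-test that compiles together with
the xprod64() sketch above and checks it against 128-bit arithmetic,
including the n = UINT64_MAX worst case from the analysis. Since m is
a runtime value here, only the reordered variant is exercised.

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
      static const uint64_t samples[] = {
          0, 1, 0xffffffffULL, 0x100000000ULL,
          0xfffffffeffffffffULL, UINT64_MAX,
      };
      const unsigned count = sizeof(samples) / sizeof(samples[0]);

      for (unsigned i = 0; i < count; i++) {
          for (unsigned j = 0; j < count; j++) {
              for (int bias = 0; bias <= 1; bias++) {
                  uint64_t m = samples[i], n = samples[j];
                  unsigned __int128 ref = (unsigned __int128)m * n
                                        + (bias ? m : 0);
                  assert(xprod64(m, n, bias) == (uint64_t)(ref >> 64));
              }
          }
      }
      return 0;
  }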
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>