path: root/tools/perf/scripts/python/task-analyzer.py
author: Nicolas Pitre <npitre@baylibre.com> 2024-10-03 17:16:14 -0400
committer: Arnd Bergmann <arnd@arndb.de> 2024-10-28 21:44:28 +0000
commit: 00a31dd3acea0f88f947fc71e268ebb34b59f218
tree: 26f7fb60c6ab8a359e82eaa1c159d29703fe0789 /tools/perf/scripts/python/task-analyzer.py
parent: 1dc82675cb79200d5e140520efd7ce88b38ea56d
asm-generic/div64: optimize/simplify __div64_const32()
Several years later I just realized that this code could be greatly
simplified.

First, let's formalize the need for overflow handling in
__arch_xprod64(). Assuming n = UINT64_MAX, there are 2 cases where
an overflow may occur:

1) If a bias must be added, we have m_lo * n_lo + m or
   m_lo * 0xffffffff + ((m_hi << 32) + m_lo) or
   ((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo) or
   (m_lo + m_hi) << 32 which must be < (1 << 64). So the criteria for no
   overflow is m_lo + m_hi < (1 << 32).

2) The cross product m_lo * n_hi + m_hi * n_lo or
   m_lo * 0xffffffff + m_hi * 0xffffffff or
   ((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi). Assuming the top
   result from the previous step (m_lo + m_hi) that must be added to
   this, we get (m_lo + m_hi) << 32 again.

So let's have a straight and simpler version when this is true.
Otherwise some reordering allows for taking care of possible overflows
without any actual conditionals. And prevent from generating both code
variants by making sure this is considered only if m is perceived as
constant by the compiler.

This, in turn, allows for greatly simplifying __div64_const32(). The
"special case" may go as well as the regular case works just fine
without needing a bias. Then reduction should be applied all the time as
minimizing m is the key.

Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
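
The two overflow cases in the message above can be checked numerically. The following is a minimal Python sketch, not the kernel code: it uses Python's unbounded integers to evaluate both intermediate sums for the worst case n = UINT64_MAX and compares them with the stated criterion m_lo + m_hi < (1 << 32). The helper names (split, bias_step_fits, cross_step_fits, simple_path_ok) are hypothetical and exist only for this illustration.

```python
MASK32 = 0xFFFFFFFF
UINT64_MAX = (1 << 64) - 1
LIMIT64 = 1 << 64


def split(x):
    """Return the (low, high) 32-bit halves of a 64-bit value."""
    return x & MASK32, x >> 32


def bias_step_fits(m, n=UINT64_MAX):
    """Case 1: does m_lo * n_lo + m (the bias step) fit in 64 bits?"""
    m_lo, _ = split(m)
    n_lo, _ = split(n)
    return m_lo * n_lo + m < LIMIT64


def cross_step_fits(m, n=UINT64_MAX):
    """Case 2: does the cross product plus the top of case 1 fit in 64 bits?"""
    m_lo, m_hi = split(m)
    n_lo, n_hi = split(n)
    # Top part carried over from the bias step, computed without truncation.
    top = (m_lo * n_lo + m) >> 32
    return m_lo * n_hi + m_hi * n_lo + top < LIMIT64


def simple_path_ok(m):
    """Criterion from the commit message: m_lo + m_hi < (1 << 32)."""
    m_lo, m_hi = split(m)
    return m_lo + m_hi < (1 << 32)


if __name__ == "__main__":
    samples = [0, 0x0000000100000001, 0x800000007FFFFFFF,
               0xFFFFFFFF00000001, UINT64_MAX]
    for m in samples:
        print(hex(m), simple_path_ok(m), bias_step_fits(m), cross_step_fits(m))
```

For every sample the three predicates agree, which matches the claim that a single test on m_lo + m_hi covers both overflow cases when n is at its maximum. Smaller values of n can only make the sums smaller, so the criterion is sufficient for all n.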
Diffstat (limited to 'tools/perf/scripts/python/task-analyzer.py')
0 files changed, 0 insertions, 0 deletions