author    Nicolas Pitre <npitre@baylibre.com>    2024-10-03 17:16:14 -0400
committer Arnd Bergmann <arnd@arndb.de>          2024-10-28 21:44:28 +0000
commit    00a31dd3acea0f88f947fc71e268ebb34b59f218 (patch)
tree      26f7fb60c6ab8a359e82eaa1c159d29703fe0789
parent    1dc82675cb79200d5e140520efd7ce88b38ea56d (diff)
asm-generic/div64: optimize/simplify __div64_const32()
Several years later I just realized that this code could be greatly
simplified.
First, let's formalize the need for overflow handling in
__arch_xprod64(). Assuming n = UINT64_MAX, there are two cases where
an overflow may occur (both identities are verified by the sketch
right after this list):
1) If a bias must be added, we have m_lo * n_lo + m, i.e.
m_lo * 0xffffffff + ((m_hi << 32) + m_lo), i.e.
((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo), i.e.
(m_lo + m_hi) << 32, which must be < (1 << 64). So the criterion for
no overflow is m_lo + m_hi < (1 << 32).
2) The cross product m_lo * n_hi + m_hi * n_lo, i.e.
m_lo * 0xffffffff + m_hi * 0xffffffff, i.e.
((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi). Adding to this the top
half of the result from case 1, i.e. m_lo + m_hi, gives
(m_lo + m_hi) << 32 again, so the same criterion applies.
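Not part of the patch: a throwaway check of the two identities above,
written for this explanation and using the compiler's
unsigned __int128 as a reference. The value of m is an arbitrary
example that satisfies the criterion.

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
      /* arbitrary example with m_lo + m_hi < 1 << 32 */
      uint64_t m = 0x0000000340000000ULL;
      uint64_t m_lo = m & 0xffffffff;
      uint64_t m_hi = m >> 32;
      unsigned __int128 bias_sum, cross_sum;

      /* case 1: m_lo * n_lo + m with n_lo = 0xffffffff */
      bias_sum = (unsigned __int128)m_lo * 0xffffffff + m;
      assert(bias_sum == (unsigned __int128)(m_lo + m_hi) << 32);

      /* case 2: cross product plus the top half of case 1's result */
      cross_sum = (unsigned __int128)m_lo * 0xffffffff
                + (unsigned __int128)m_hi * 0xffffffff
                + (m_lo + m_hi);
      assert(cross_sum == (unsigned __int128)(m_lo + m_hi) << 32);

      /* neither sum exceeds 64 bits when m_lo + m_hi < 1 << 32 */
      assert((bias_sum >> 64) == 0 && (cross_sum >> 64) == 0);
      return 0;
  }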
So let's have a straight and simpler version for when this criterion
holds. Otherwise, some reordering allows possible overflows to be
handled without any actual conditionals. And to prevent both code
variants from being generated, the straight version is considered only
when the compiler perceives m as constant.
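For illustration only, here is a standalone sketch of such a
two-variant helper. It follows the structure described above for
__arch_xprod64() in include/asm-generic/div64.h, but the name
xprod64() and the details are reconstructed for this note, not the
literal kernel code.

  #include <stdbool.h>
  #include <stdint.h>

  /* Returns the high 64 bits of m * n, plus m when bias is set. */
  static uint64_t xprod64(const uint64_t m, uint64_t n, bool bias)
  {
      uint32_t m_lo = m, m_hi = m >> 32;
      uint32_t n_lo = n, n_hi = n >> 32;
      uint64_t x, y;

      if (__builtin_constant_p(m) &&
          (m >> 32) + (m & 0xffffffff) < 0x100000000ULL) {
          /* straight version: no partial sum can overflow */
          x = (uint64_t)m_lo * n_lo + (bias ? m : 0);
          x = (uint64_t)m_lo * n_hi + (uint64_t)m_hi * n_lo + (x >> 32);
          x = (uint64_t)m_hi * n_hi + (x >> 32);
      } else {
          /* reordered version: carries absorbed without conditionals */
          x = (uint64_t)m_lo * n_lo + (bias ? m_lo : 0);
          y = (uint64_t)m_lo * n_hi + (uint32_t)(x >> 32) +
              (bias ? m_hi : 0);
          x = (uint64_t)m_hi * n_hi + (uint32_t)(y >> 32);
          y = (uint64_t)m_hi * n_lo + (uint32_t)y;
          x += (uint32_t)(y >> 32);
      }
      return x;
  }

With a constant m the __builtin_constant_p() test folds away, so only
one of the two bodies survives in the generated code for a given call
site.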
This, in turn, allows for greatly simplifying __div64_const32(). The
"special case" can go away, as the regular case works just fine
without needing a bias. Then the reduction should be applied all the
time, as minimizing m is the key.
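Again illustrative only: a quick self-test that compiles together with
the xprod64() sketch above and checks it against 128-bit arithmetic,
including the n = UINT64_MAX worst case from the analysis. Since m is
a runtime value here, only the reordered variant is exercised.

  #include <assert.h>
  #include <stdint.h>

  int main(void)
  {
      static const uint64_t samples[] = {
          0, 1, 0xffffffffULL, 0x100000000ULL,
          0xfffffffeffffffffULL, UINT64_MAX,
      };
      const unsigned count = sizeof(samples) / sizeof(samples[0]);

      for (unsigned i = 0; i < count; i++) {
          for (unsigned j = 0; j < count; j++) {
              for (int bias = 0; bias <= 1; bias++) {
                  uint64_t m = samples[i], n = samples[j];
                  unsigned __int128 ref = (unsigned __int128)m * n
                                        + (bias ? m : 0);
                  assert(xprod64(m, n, bias) == (uint64_t)(ref >> 64));
              }
          }
      }
      return 0;
  }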
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>