linux.git - Linus' kernel tree

diff options

author	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-01-13 19:48:47 +0100
committer	Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-01-20 17:17:02 +0100
commit	13ed5c4a6d9c91755227b7b0fab7b2543f6adfd2 (patch)
tree	88f3b082f20015bfd78a86a0fe521f72dce50286 /tools/perf/scripts/python/syscall-counts.py
parent	d619b5cc678024fa5ed7eb3702c3991a2aa96823 (diff)

cpuidle: teo: Skip getting the sleep length if wakeups are very frequent

Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") attempted to reduce the governor overhead in some cases by making it avoid obtaining the sleep length (the time till the next timer event) which may be costly. Among other things, after the above commit, tick_nohz_get_sleep_length() was not called any more when idle state 0 was to be returned, which turned out to be problematic and the previous behavior in that respect was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non- existent intercepts"). However, commit 6da8f9ba5a87 also caused the governor to avoid calling tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling" one (that is, it is not really an idle state, but a loop continuously executed by the CPU) when the target residency of the idle state to be returned was low enough, so there was no practical need to refine the idle state selection in any way. This change was not removed by the other commit, so now on systems where idle state 0 is a "polling" one, tick_nohz_get_sleep_length() is called when idle state 0 is to be returned, but it is not called when a deeper idle state with sufficiently low target residency is to be returned. That is arguably confusing and inconsistent. Moreover, there is no specific reason why the behavior in question should depend on whether or not idle state 0 is a "polling" one. One way to address this would be to make the governor always call tick_nohz_get_sleep_length() to obtain the sleep length, but that would effectively mean reverting commit 6da8f9ba5a87 and restoring the latency issue that was the reason for doing it. This approach is thus not particularly attractive. To address it differently, notice that if a CPU is woken up very often, this is not likely to be caused by timers in the first place (user space has a default timer slack of 50 us and there are relatively few timers with a deadline shorter than several microseconds in the kernel) and even if it were the case, the potential benefit from using a deep idle state would then be questionable for latency reasons. Therefore, if the majority of CPU wakeups occur within several microseconds, it can be assumed that all wakeups in that range are non-timer and the sleep length need not be determined. Accordingly, introduce a new metric for counting wakeups with the measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle state selection to skip the tick_nohz_get_sleep_length() invocation if idle state 0 has been selected or the target residency of the candidate idle state is below RESIDENCY_THRESHOLD_NS and the value of the new metric is at least 1/2 of the total event count. Since the above requires the measured idle duration to be determined every time, except for the cases when one of the safety nets has triggered in which the wakeup is counted as a hit in the deepest idle state idle residency range, update the handling of those cases to avoid skipping the idle duration computation when the CPU wakeup is "genuine". Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3851791.kQq0lBPeGt@rjwysocki.net Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> [ rjw: Renamed a struct field ] [ rjw: Fixed typo in the subject and one in a comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Diffstat (limited to 'tools/perf/scripts/python/syscall-counts.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: