diff options
author | Rafael J. Wysocki <rafael.j.wysocki@intel.com> | 2025-01-13 19:48:47 +0100 |
---|---|---|
committer | Rafael J. Wysocki <rafael.j.wysocki@intel.com> | 2025-01-20 17:17:02 +0100 |
commit | 13ed5c4a6d9c91755227b7b0fab7b2543f6adfd2 (patch) | |
tree | 88f3b082f20015bfd78a86a0fe521f72dce50286 /tools/perf/scripts/python/syscall-counts.py | |
parent | d619b5cc678024fa5ed7eb3702c3991a2aa96823 (diff) |
cpuidle: teo: Skip getting the sleep length if wakeups are very frequent
Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length()
call in some cases") attempted to reduce the governor overhead in some
cases by making it avoid obtaining the sleep length (the time till the
next timer event) which may be costly.
Among other things, after the above commit, tick_nohz_get_sleep_length()
was not called any more when idle state 0 was to be returned, which
turned out to be problematic and the previous behavior in that respect
was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-
existent intercepts").
However, commit 6da8f9ba5a87 also caused the governor to avoid calling
tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling"
one (that is, it is not really an idle state, but a loop continuously
executed by the CPU) when the target residency of the idle state to be
returned was low enough, so there was no practical need to refine the
idle state selection in any way. This change was not removed by the
other commit, so now on systems where idle state 0 is a "polling" one,
tick_nohz_get_sleep_length() is called when idle state 0 is to be
returned, but it is not called when a deeper idle state with
sufficiently low target residency is to be returned. That is arguably
confusing and inconsistent.
Moreover, there is no specific reason why the behavior in question
should depend on whether or not idle state 0 is a "polling" one.
One way to address this would be to make the governor always call
tick_nohz_get_sleep_length() to obtain the sleep length, but that would
effectively mean reverting commit 6da8f9ba5a87 and restoring the latency
issue that was the reason for doing it. This approach is thus not
particularly attractive.
To address it differently, notice that if a CPU is woken up very often,
this is not likely to be caused by timers in the first place (user space
has a default timer slack of 50 us and there are relatively few timers
with a deadline shorter than several microseconds in the kernel) and
even if it were the case, the potential benefit from using a deep idle
state would then be questionable for latency reasons. Therefore, if the
majority of CPU wakeups occur within several microseconds, it can be
assumed that all wakeups in that range are non-timer and the sleep
length need not be determined.
Accordingly, introduce a new metric for counting wakeups with the
measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle
state selection to skip the tick_nohz_get_sleep_length() invocation if
idle state 0 has been selected or the target residency of the candidate
idle state is below RESIDENCY_THRESHOLD_NS and the value of the new
metric is at least 1/2 of the total event count.
Since the above requires the measured idle duration to be determined
every time, except for the cases when one of the safety nets has
triggered in which the wakeup is counted as a hit in the deepest
idle state idle residency range, update the handling of those cases
to avoid skipping the idle duration computation when the CPU wakeup
is "genuine".
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/3851791.kQq0lBPeGt@rjwysocki.net
Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
[ rjw: Renamed a struct field ]
[ rjw: Fixed typo in the subject and one in a comment ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Diffstat (limited to 'tools/perf/scripts/python/syscall-counts.py')
0 files changed, 0 insertions, 0 deletions