author | Tejun Heo <tj@kernel.org> | 2024-06-18 10:09:19 -1000 |
---|---|---|
committer | Tejun Heo <tj@kernel.org> | 2024-06-18 10:09:19 -1000 |
commit | 22a920209ab6aa4f8ec960ed81041643fddeaec6 (patch) | |
tree | f394f255753658be2b04fd6f0c163bccc8d7566a /tools/sched_ext/scx_central.c | |
parent | 1c29f8541e178c590c2b9b66b9681e6ccab84cea (diff) |
sched_ext: Implement tickless support
Allow BPF schedulers to indicate tickless operation by setting p->scx.slice
to SCX_SLICE_INF. A CPU whose current task has an infinite slice goes into
tickless operation.
scx_central is updated to use tickless operations for all tasks and
instead use a BPF timer to expire slices. This also uses the SCX_ENQ_PREEMPT
and task state tracking added by the previous patches.
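The BPF side of scx_central is not part of this diff view. As a rough illustration of timer-driven slice expiry, the following is a user-space model only. It assumes the timer callback compares a per-CPU start timestamp (the diff below resizes a `cpu_started_at` array) against a fixed slice length. The function names `time_before64` and `find_expired` and the `slice_ns` parameter are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" comparison for 64-bit timestamps. */
static bool time_before64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Model of one timer invocation: scan the per-CPU start timestamps and
 * mark every CPU whose task has run for at least slice_ns.
 * started_at[cpu] == 0 is treated as "nothing tracked on this CPU".
 * Returns the number of CPUs marked in expired[].
 */
static size_t find_expired(const uint64_t *started_at, size_t nr_cpus,
			   uint64_t now, uint64_t slice_ns, bool *expired)
{
	size_t cpu, nr = 0;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		expired[cpu] = started_at[cpu] &&
			       !time_before64(now, started_at[cpu] + slice_ns);
		if (expired[cpu])
			nr++;
	}
	return nr;
}
```

In a real scheduler the marked CPUs would then be preempted. This sketch stops at the bookkeeping because the preemption call is specific to the BPF program.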
Currently, there is no way to pin the timer on the central CPU, so it may
end up on one of the worker CPUs; however, outside of that, the worker CPUs
can go tickless both while running sched_ext tasks and idling.
With schbench running, scx_central shows:
root@test ~# grep ^LOC /proc/interrupts; sleep 10; grep ^LOC /proc/interrupts
LOC: 142024 656 664 449 Local timer interrupts
LOC: 161663 663 665 449 Local timer interrupts
Without it:
root@test ~ [SIGINT]# grep ^LOC /proc/interrupts; sleep 10; grep ^LOC /proc/interrupts
LOC: 188778 3142 3793 3993 Local timer interrupts
LOC: 198993 5314 6323 6438 Local timer interrupts
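To make the comparison concrete, the per-CPU deltas over the 10 second window can be computed from the two samples. With tickless operation the four columns grow by 19639, 7, 1 and 0. Without it they grow by 10215, 2172, 2530 and 2445. The sketch below does that subtraction for a pair of `LOC:` lines. The helper names and the `MAX_CPUS` bound are hypothetical, and it assumes both lines come from the same machine and have the same number of columns.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 64	/* illustrative upper bound, not from the patch */

/*
 * Parse one "LOC:" line from /proc/interrupts into per-CPU counts.
 * Returns the number of counts parsed (at most max), or -1 if the
 * line does not start with "LOC:".
 */
static int parse_loc_line(const char *line, unsigned long long *counts, int max)
{
	const char *p = line;
	char *end;
	int n = 0;

	if (strncmp(p, "LOC:", 4) != 0)
		return -1;
	p += 4;

	while (n < max) {
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)	/* reached the trailing description text */
			break;
		counts[n++] = v;
		p = end;
	}
	return n;
}

/*
 * Compute after - before for each CPU column.
 * Returns the number of columns, or -1 if the lines do not match up.
 */
static int loc_deltas(const char *before, const char *after,
		      unsigned long long *deltas, int max)
{
	unsigned long long a[MAX_CPUS], b[MAX_CPUS];
	int na, nb, i;

	if (max > MAX_CPUS)
		max = MAX_CPUS;
	na = parse_loc_line(before, a, max);
	nb = parse_loc_line(after, b, max);
	if (na < 0 || na != nb)
		return -1;
	for (i = 0; i < na; i++)
		deltas[i] = b[i] - a[i];
	return na;
}
```

Passing the two lines of each sample above to `loc_deltas()` yields the figures quoted in the lead-in.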
While scx_central itself is too bare-bones to be useful as a
production scheduler, a more featureful central scheduler can be built using
the same approach. Google's experience shows that such an approach can have
significant benefits for certain applications such as VM hosting.
v4: Allow operation even if BPF_F_TIMER_CPU_PIN is not available.
v3: Pin the central scheduler's timer on the central_cpu using
BPF_F_TIMER_CPU_PIN.
v2: Convert to BPF inline iterators.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
Diffstat (limited to 'tools/sched_ext/scx_central.c')
-rw-r--r-- | tools/sched_ext/scx_central.c | 29 |
1 files changed, 28 insertions, 1 deletions
diff --git a/tools/sched_ext/scx_central.c b/tools/sched_ext/scx_central.c
index 5f09fc666a63..fb3f50886552 100644
--- a/tools/sched_ext/scx_central.c
+++ b/tools/sched_ext/scx_central.c
@@ -48,6 +48,7 @@ int main(int argc, char **argv)
 	struct bpf_link *link;
 	__u64 seq = 0;
 	__s32 opt;
+	cpu_set_t *cpuset;
 
 	libbpf_set_print(libbpf_print_fn);
 	signal(SIGINT, sigint_handler);
@@ -77,10 +78,35 @@ int main(int argc, char **argv)
 
 	/* Resize arrays so their element count is equal to cpu count. */
 	RESIZE_ARRAY(skel, data, cpu_gimme_task, skel->rodata->nr_cpu_ids);
+	RESIZE_ARRAY(skel, data, cpu_started_at, skel->rodata->nr_cpu_ids);
 
 	SCX_OPS_LOAD(skel, central_ops, scx_central, uei);
+
+	/*
+	 * Affinitize the loading thread to the central CPU, as:
+	 * - That's where the BPF timer is first invoked in the BPF program.
+	 * - We probably don't want this user space component to take up a core
+	 *   from a task that would benefit from avoiding preemption on one of
+	 *   the tickless cores.
+	 *
+	 * Until BPF supports pinning the timer, it's not guaranteed that it
+	 * will always be invoked on the central CPU. In practice, this
+	 * suffices the majority of the time.
+	 */
+	cpuset = CPU_ALLOC(skel->rodata->nr_cpu_ids);
+	SCX_BUG_ON(!cpuset, "Failed to allocate cpuset");
+	CPU_ZERO(cpuset);
+	CPU_SET(skel->rodata->central_cpu, cpuset);
+	SCX_BUG_ON(sched_setaffinity(0, sizeof(cpuset), cpuset),
+		   "Failed to affinitize to central CPU %d (max %d)",
+		   skel->rodata->central_cpu, skel->rodata->nr_cpu_ids - 1);
+	CPU_FREE(cpuset);
+
 	link = SCX_OPS_ATTACH(skel, central_ops, scx_central);
 
+	if (!skel->data->timer_pinned)
+		printf("WARNING : BPF_F_TIMER_CPU_PIN not available, timer not pinned to central\n");
+
 	while (!exit_req && !UEI_EXITED(skel, uei)) {
 		printf("[SEQ %llu]\n", seq++);
 		printf("total :%10" PRIu64 " local:%10" PRIu64 " queued:%10" PRIu64 " lost:%10" PRIu64 "\n",
@@ -88,7 +114,8 @@ int main(int argc, char **argv)
 		       skel->bss->nr_locals,
 		       skel->bss->nr_queued,
 		       skel->bss->nr_lost_pids);
-		printf(" dispatch:%10" PRIu64 " mismatch:%10" PRIu64 " retry:%10" PRIu64 "\n",
+		printf("timer :%10" PRIu64 " dispatch:%10" PRIu64 " mismatch:%10" PRIu64 " retry:%10" PRIu64 "\n",
+		       skel->bss->nr_timers,
 		       skel->bss->nr_dispatches,
 		       skel->bss->nr_mismatches,
 		       skel->bss->nr_retries);
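One detail in the affinity hunk is worth a closer look. `cpuset` is a `cpu_set_t *`, so `sizeof(cpuset)` is the size of a pointer rather than the size of the allocated set. In addition, `CPU_ZERO()` and `CPU_SET()` operate on a full fixed-size `cpu_set_t`, which may be larger than what `CPU_ALLOC()` returned. For dynamically allocated sets, glibc provides `CPU_ALLOC_SIZE()` and the `_S` macro variants. The following is a minimal sketch of that pattern, not the code in this commit. The helper name `pin_to_cpu` and its parameters are hypothetical, with `nr_cpu_ids` standing in for `skel->rodata->nr_cpu_ids`.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/*
 * Pin the calling thread to @cpu using a dynamically sized CPU set.
 * @nr_cpu_ids is the number of possible CPU ids (hypothetical parameter).
 * Returns 0 on success, -1 on failure.
 */
static int pin_to_cpu(int cpu, int nr_cpu_ids)
{
	cpu_set_t *cpuset;
	size_t size;
	int ret;

	if (cpu < 0 || cpu >= nr_cpu_ids)
		return -1;

	cpuset = CPU_ALLOC(nr_cpu_ids);
	if (!cpuset)
		return -1;

	/* Size in bytes of the allocated set, not of the pointer. */
	size = CPU_ALLOC_SIZE(nr_cpu_ids);

	/* The _S variants only touch the first @size bytes of the set. */
	CPU_ZERO_S(size, cpuset);
	CPU_SET_S(cpu, size, cpuset);

	ret = sched_setaffinity(0, size, cpuset);
	CPU_FREE(cpuset);
	return ret;
}
```

A caller would invoke `pin_to_cpu(central_cpu, nr_cpu_ids)` once after loading the scheduler and treat a non-zero return as fatal, matching the `SCX_BUG_ON()` usage in the diff.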