git.armlinux.org.uk/linux.git - Linus' kernel tree


author	Tejun Heo <tj@kernel.org>	2025-11-11 09:18:06 -1000
committer	Tejun Heo <tj@kernel.org>	2025-11-12 06:43:44 -1000
commit	61debc251c1c9150c7bdfd5c028bc2d078e17d22 (patch)
tree	ae23364e9ca058952288b1de78e049bcc486720f /tools/lib/python/abi/helpers.py
parent	3546119f18647d7ddbba579737d8a222b430cb1c (diff)

sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode

Bypass mode routes tasks through fallback dispatch queues. Originally a single global DSQ, b7b3b2dbae73 ("sched_ext: Split the global DSQ per NUMA node") changed this to per-node DSQs to resolve NUMA-related livelocks.

Dan Schatzberg found per-node DSQs can still livelock when many threads are pinned to different small CPU subsets: each CPU must scan many incompatible tasks to find runnable ones, causing severe contention with high CPU counts.

Switch to per-CPU bypass DSQs. Each task queues on its current CPU. Default idle CPU selection and direct dispatch handle most cases well.

This introduces a failure mode when tasks concentrate on one CPU in over-saturated systems. If the BPF scheduler severely skews placement before triggering bypass, that CPU's queue may be too long to drain, causing RCU stalls. A load balancer in a future patch will address this. The bypass DSQ is separate from local DSQ to enable load balancing: local DSQs use rq locks, preventing efficient scanning and transfer across CPUs, especially problematic when systems are already contended.

v2: Clarified why bypass DSQ is separate from local DSQ (Andrea Righi).

Reported-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Diffstat (limited to 'tools/lib/python/abi/helpers.py')

0 files changed, 0 insertions, 0 deletions

