diff options
| author | Dietmar Eggemann <dietmar.eggemann@arm.com> | 2025-03-14 16:13:45 +0100 | 
|---|---|---|
| committer | Ingo Molnar <mingo@kernel.org> | 2025-03-15 10:34:27 +0100 | 
| commit | 76f970ce51c80f625eb6ddbb24e9cb51b977b598 (patch) | |
| tree | a28063a6e1921cd4c8f26943caa2f81839f0e37e /rust/kernel | |
| parent | f3fa0e40df175acd60b71036b9a1fd62310aec03 (diff) | |
Revert "sched/core: Reduce cost of sched_move_task when config autogroup"
This reverts commit eff6c8ce8d4d7faef75f66614dd20bb50595d261.
Hazem reported a 30% drop in UnixBench spawn test with commit
eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config
autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM
(aarch64) (single level MC sched domain):
  https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com
There is an early bail from sched_move_task() if p->sched_task_group is
equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
(Ubuntu '22.04.5 LTS').
So in:
  do_exit()
    sched_autogroup_exit_task()
      sched_move_task()
        if sched_get_task_group(p) == p->sched_task_group
          return
        /* p is enqueued */
        dequeue_task()              \
        sched_change_group()        |
          task_change_group_fair()  |
            detach_task_cfs_rq()    |                              (1)
            set_task_rq()           |
            attach_task_cfs_rq()    |
        enqueue_task()              /
(1) isn't called for p anymore.
Turns out that the regression is related to sgs->group_util in
group_is_overloaded() and group_has_capacity(). If (1) isn't called for
all the 'spawn' tasks then sgs->group_util is ~900 and
sgs->group_capacity = 1024 (single CPU sched domain) and this leads to
group_is_overloaded() returning true (2) and group_has_capacity() false
(3) much more often compared to the case when (1) is called.
I.e. there are much more cases of 'group_is_overloaded' and
'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu() which
then returns much more often a CPU != smp_processor_id() (5).
This isn't good for these extremely short running tasks (FORK + EXIT)
and also involves calling sched_balance_find_dst_group_cpu() unnecessary
(single CPU sched domain).
Instead if (1) is called for 'p->flags & PF_EXITING' then the path
(4),(6) is taken much more often.
  select_task_rq_fair(..., wake_flags = WF_FORK)
    cpu = smp_processor_id()
    new_cpu = sched_balance_find_dst_cpu(..., cpu, ...)
      group = sched_balance_find_dst_group(..., cpu)
        do {
          update_sg_wakeup_stats()
            sgs->group_type = group_classify()
              if group_is_overloaded()                             (2)
                return group_overloaded
              if !group_has_capacity()                             (3)
                return group_fully_busy
              return group_has_spare                               (4)
        } while group
        if local_sgs.group_type > idlest_sgs.group_type
          return idlest                                            (5)
        case group_has_spare:
          if local_sgs.idle_cpus >= idlest_sgs.idle_cpus
            return NULL                                            (6)
Unixbench Tests './Run -c 4 spawn' on:
(a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4')
    and Ubuntu 22.04.5 LTS (aarch64).
    Shell & test run in '/user.slice/user-1000.slice/session-1.scope'.
    w/o patch	w/ patch
    21005	27120
(b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and
    Ubuntu 22.04.5 LTS (x86_64).
    Shell & test run in '/A'.
    w/o patch	w/ patch
    67675	88806
CONFIG_SCHED_AUTOGROUP=y & /sys/proc/kernel/sched_autogroup_enabled equal
0 or 1.
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Hagar Hemdan <hagarhem@amazon.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com
Diffstat (limited to 'rust/kernel')
0 files changed, 0 insertions, 0 deletions
