From 4301065920b0cbde3986519582347e883b166f3e Mon Sep 17 00:00:00 2001 From: Peter Williams Date: Thu, 9 Aug 2007 11:16:46 +0200 Subject: sched: simplify move_tasks() The move_tasks() function is currently multiplexed with two distinct capabilities: 1. attempt to move a specified amount of weighted load from one run queue to another; and 2. attempt to move a specified number of tasks from one run queue to another. The first of these capabilities is used in two places, load_balance() and load_balance_idle(), and in both of these cases the return value of move_tasks() is used purely to decide if tasks/load were moved and no notice of the actual number of tasks moved is taken. The second capability is used in exactly one place, active_load_balance(), to attempt to move exactly one task and, as before, the return value is only used as an indicator of success or failure. This multiplexing of sched_task() was introduced, by me, as part of the smpnice patches and was motivated by the fact that the alternative, one function to move specified load and one to move a single task, would have led to two functions of roughly the same complexity as the old move_tasks() (or the new balance_tasks()). However, the new modular design of the new CFS scheduler allows a simpler solution to be adopted and this patch addresses that solution by: 1. adding a new function, move_one_task(), to be used by active_load_balance(); and 2. making move_tasks() a single purpose function that tries to move a specified weighted load and returns 1 for success and 0 for failure. One of the consequences of these changes is that neither move_one_task() or the new move_tasks() care how many tasks sched_class.load_balance() moves and this enables its interface to be simplified by returning the amount of load moved as its result and removing the load_moved pointer from the argument list. This helps simplify the new move_tasks() and slightly reduces the amount of work done in each of sched_class.load_balance()'s implementations. Further simplification, e.g. changes to balance_tasks(), are possible but (slightly) complicated by the special needs of load_balance_fair() so I've left them to a later patch (if this one gets accepted). NB Since move_tasks() gets called with two run queue locks held even small reductions in overhead are worthwhile. [ mingo@elte.hu ] this change also reduces code size nicely: text data bss dec hex filename 39216 3618 24 42858 a76a sched.o.before 39173 3618 24 42815 a73f sched.o.after Signed-off-by: Peter Williams Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 002fcf8d3f64..2b0626a43cb8 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -172,15 +172,16 @@ static struct task_struct *load_balance_next_rt(void *arg) return p; } -static int +static unsigned long load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest, unsigned long max_nr_move, unsigned long max_load_move, struct sched_domain *sd, enum cpu_idle_type idle, - int *all_pinned, unsigned long *load_moved) + int *all_pinned) { int this_best_prio, best_prio, best_prio_seen = 0; int nr_moved; struct rq_iterator rt_rq_iterator; + unsigned long load_moved; best_prio = sched_find_first_bit(busiest->rt.active.bitmap); this_best_prio = sched_find_first_bit(this_rq->rt.active.bitmap); @@ -203,11 +204,11 @@ load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest, rt_rq_iterator.arg = busiest; nr_moved = balance_tasks(this_rq, this_cpu, busiest, max_nr_move, - max_load_move, sd, idle, all_pinned, load_moved, + max_load_move, sd, idle, all_pinned, &load_moved, this_best_prio, best_prio, best_prio_seen, &rt_rq_iterator); - return nr_moved; + return load_moved; } static void task_tick_rt(struct rq *rq, struct task_struct *p) -- cgit From a4ac01c36e286dd1b9a1d5cd7422c5af51dc55f8 Mon Sep 17 00:00:00 2001 From: Peter Williams Date: Thu, 9 Aug 2007 11:16:46 +0200 Subject: sched: fix bug in balance_tasks() There are two problems with balance_tasks() and how it used: 1. The variables best_prio and best_prio_seen (inherited from the old move_tasks()) were only required to handle problems caused by the active/expired arrays, the order in which they were processed and the possibility that the task with the highest priority could be on either. These issues are no longer present and the extra overhead associated with their use is unnecessary (and possibly wrong). 2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same this_best_prio variable needs to be used by all scheduling classes or there is a risk of moving too much load. E.g. if the highest priority task on this at the beginning is a fairly low priority task and the rt class migrates a task (during its turn) then that moved task becomes the new highest priority task on this_rq but when the sched_fair class initializes its copy of this_best_prio it will get the priority of the original highest priority task as, due to the run queue locks being held, the reschedule triggered by pull_task() will not have taken place. This could result in inappropriate overriding of skip_for_load and excessive load being moved. The attached patch addresses these problems by deleting all reference to best_prio and best_prio_seen and making this_best_prio a reference parameter to the various functions involved. load_balance_fair() has also been modified so that this_best_prio is only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should preserve the effect of helping spread groups' higher priority tasks around the available CPUs while improving system performance when CONFIG_FAIR_GROUP_SCHED isn't set. Signed-off-by: Peter Williams Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 19 ++----------------- 1 file changed, 2 insertions(+), 17 deletions(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 2b0626a43cb8..5b559e8c8aa6 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -176,26 +176,12 @@ static unsigned long load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest, unsigned long max_nr_move, unsigned long max_load_move, struct sched_domain *sd, enum cpu_idle_type idle, - int *all_pinned) + int *all_pinned, int *this_best_prio) { - int this_best_prio, best_prio, best_prio_seen = 0; int nr_moved; struct rq_iterator rt_rq_iterator; unsigned long load_moved; - best_prio = sched_find_first_bit(busiest->rt.active.bitmap); - this_best_prio = sched_find_first_bit(this_rq->rt.active.bitmap); - - /* - * Enable handling of the case where there is more than one task - * with the best priority. If the current running task is one - * of those with prio==best_prio we know it won't be moved - * and therefore it's safe to override the skip (based on load) - * of any task we find with that prio. - */ - if (busiest->curr->prio == best_prio) - best_prio_seen = 1; - rt_rq_iterator.start = load_balance_start_rt; rt_rq_iterator.next = load_balance_next_rt; /* pass 'busiest' rq argument into @@ -205,8 +191,7 @@ load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest, nr_moved = balance_tasks(this_rq, this_cpu, busiest, max_nr_move, max_load_move, sd, idle, all_pinned, &load_moved, - this_best_prio, best_prio, best_prio_seen, - &rt_rq_iterator); + this_best_prio, &rt_rq_iterator); return load_moved; } -- cgit From d281918d7c135c555d9cebcf73d4320efa8177dc Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 9 Aug 2007 11:16:47 +0200 Subject: sched: remove 'now' use from assignments change all 'now' timestamp uses in assignments to rq->clock. ( this is an identity transformation that causes no functionality change: all such new rq->clock is necessarily preceded by an update_rq_clock() call. ) Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 5b559e8c8aa6..5fbd87ad0f56 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -15,14 +15,14 @@ static inline void update_curr_rt(struct rq *rq, u64 now) if (!task_has_rt_policy(curr)) return; - delta_exec = now - curr->se.exec_start; + delta_exec = rq->clock - curr->se.exec_start; if (unlikely((s64)delta_exec < 0)) delta_exec = 0; schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec)); curr->se.sum_exec_runtime += delta_exec; - curr->se.exec_start = now; + curr->se.exec_start = rq->clock; } static void @@ -89,7 +89,7 @@ static struct task_struct *pick_next_task_rt(struct rq *rq, u64 now) queue = array->queue + idx; next = list_entry(queue->next, struct task_struct, run_list); - next->se.exec_start = now; + next->se.exec_start = rq->clock; return next; } -- cgit From f1e14ef64d3e1bdcb3437f1e926fe5a7f861aa0a Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 9 Aug 2007 11:16:48 +0200 Subject: sched: remove the 'u64 now' parameter from update_curr_rt() remove the 'u64 now' parameter from update_curr_rt(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 5fbd87ad0f56..fa5a46273b79 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -7,7 +7,7 @@ * Update the current task's runtime statistics. Skip current tasks that * are not in our scheduling class. */ -static inline void update_curr_rt(struct rq *rq, u64 now) +static inline void update_curr_rt(struct rq *rq) { struct task_struct *curr = rq->curr; u64 delta_exec; @@ -42,7 +42,7 @@ dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep, u64 now) { struct rt_prio_array *array = &rq->rt.active; - update_curr_rt(rq, now); + update_curr_rt(rq); list_del(&p->run_list); if (list_empty(array->queue + p->prio)) @@ -96,7 +96,7 @@ static struct task_struct *pick_next_task_rt(struct rq *rq, u64 now) static void put_prev_task_rt(struct rq *rq, struct task_struct *p, u64 now) { - update_curr_rt(rq, now); + update_curr_rt(rq); p->se.exec_start = 0; } -- cgit From fd390f6a04f22fb457d6fd1855964f79536525de Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 9 Aug 2007 11:16:48 +0200 Subject: sched: remove the 'u64 now' parameter from ->enqueue_task() remove the 'u64 now' parameter from ->enqueue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index fa5a46273b79..1edaa99e0d3d 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -25,8 +25,7 @@ static inline void update_curr_rt(struct rq *rq) curr->se.exec_start = rq->clock; } -static void -enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup, u64 now) +static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup) { struct rt_prio_array *array = &rq->rt.active; -- cgit From f02231e51a280f1a0fee4d03ad8f50048e06cced Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 9 Aug 2007 11:16:48 +0200 Subject: sched: remove the 'u64 now' parameter from ->dequeue_task() remove the 'u64 now' parameter from ->dequeue_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 1edaa99e0d3d..60591e2512b1 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -36,8 +36,7 @@ static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup) /* * Adding/removing a task to/from a priority array: */ -static void -dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep, u64 now) +static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep) { struct rt_prio_array *array = &rq->rt.active; -- cgit From fb8d47240246e20f864f0724a23a7220cd1c59ac Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 9 Aug 2007 11:16:48 +0200 Subject: sched: remove the 'u64 now' parameter from ->pick_next_task() remove the 'u64 now' parameter from ->pick_next_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index 60591e2512b1..c0b0d6237bb6 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -73,7 +73,7 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p) resched_task(rq->curr); } -static struct task_struct *pick_next_task_rt(struct rq *rq, u64 now) +static struct task_struct *pick_next_task_rt(struct rq *rq) { struct rt_prio_array *array = &rq->rt.active; struct task_struct *next; -- cgit From 31ee529cc2254e8b62880535ec8f21a4c5e1c091 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 9 Aug 2007 11:16:49 +0200 Subject: sched: remove the 'u64 now' parameter from ->put_prev_task() remove the 'u64 now' parameter from ->put_prev_task(). ( identity transformation that causes no change in functionality. ) Signed-off-by: Ingo Molnar --- kernel/sched_rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/sched_rt.c') diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c index c0b0d6237bb6..dcdcad632fd9 100644 --- a/kernel/sched_rt.c +++ b/kernel/sched_rt.c @@ -92,7 +92,7 @@ static struct task_struct *pick_next_task_rt(struct rq *rq) return next; } -static void put_prev_task_rt(struct rq *rq, struct task_struct *p, u64 now) +static void put_prev_task_rt(struct rq *rq, struct task_struct *p) { update_curr_rt(rq); p->se.exec_start = 0; -- cgit