summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-29drm: io_remap_pfn_range() sets VM_IO...Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29sparc: __pci_mmap_set_flags() is uselessAl Viro
io_remap_pfn_range() does all we need Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29mn10300: don't bother with VM_IOAl Viro
io_remap_pfn_range() sets it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29hose_mmap_page_range(): io_remap_pfn_range() will set all those flags...Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29samsung: don't bother with setting VM_IOAl Viro
io_remap_pfn_range() will set it just fine Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29consolidate io_remap_pfn_range definitionsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29UBIFS: fix a horrid bugArtem Bityutskiy
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage, but this may corrupt memory and lead to all kinds of problems like crashes an security holes. This patch fixes the problem by using the 'file->f_version' field, which '->llseek()' always unconditionally sets to zero. We set it to 1 in 'ubifs_readdir()' and whenever we detect that it became 0, we know there was a seek and it is time to clear the state saved in 'file->private_data'. I tested this patch by writing a user-space program which runds readdir and seek in parallell. I could easily crash the kernel without these patches, but could not crash it with these patches. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29UBIFS: prepare to fix a horrid bugArtem Bityutskiy
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are in the middle of 'ubifs_readdir()'. First of all, this means that 'file->private_data' can be freed while 'ubifs_readdir()' uses it. But this particular patch does not fix the problem. This patch is only a preparation, and the fix will follow next. In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly, because 'file->f_pos' can be changed by '->llseek()' at any point. This may lead 'ubifs_readdir()' to returning inconsistent data: directory entry names may correspond to incorrect file positions. So here we introduce a local variable 'pos', read 'file->f_pose' once at very the beginning, and then stick to 'pos'. The result of this is that when 'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of 'ubifs_readdir()', the latter "wins". Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-28mn10300: Use early_param() to parse "mem=" parameterAkira Takeuchi
This fixes the problem that "init=" options may not be passed to kernel correctly. parse_mem_cmdline() of mn10300 arch gets rid of "mem=" string from redboot_command_line. Then init_setup() parses the "init=" options from static_command_line, which is a copy of redboot_command_line, and keeps the pointer to the init options in execute_command variable. Since the commit 026cee0 upstream (params: <level>_initcall-like kernel parameters), static_command_line becomes overwritten by saved_command_line at do_initcall_level(). Notice that saved_command_line is a command line which includes "mem=" string. As a result, execute_command may point to weird string by the length of "mem=" parameter. I noticed this problem when using the command line like this: mem=128M console=ttyS0,115200 init=/bin/sh Here is the processing flow of command line parameters. start_kernel() setup_arch(&command_line) parse_mem_cmdline(cmdline_p) * strcpy(boot_command_line, redboot_command_line); * Remove "mem=xxx" from redboot_command_line. * *cmdline_p = redboot_command_line; setup_command_line(command_line) <-- command_line is redboot_command_line * strcpy(saved_command_line, boot_command_line) * strcpy(static_command_line, command_line) parse_early_param() strlcpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE); parse_early_options(tmp_cmdline); parse_args("early options", cmdline, NULL, 0, 0, 0, do_early_param); parse_args("Booting ..", static_command_line, ...); init_setup() <-- save the pointer in execute_command rest_init() kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND); At this point, execute_command points to "/bin/sh" string. kernel_init() kernel_init_freeable() do_basic_setup() do_initcalls() do_initcall_level() (*) strcpy(static_command_line, saved_command_line); Here, execute_command gets to point to "200" string !! Signed-off-by: David Howells <dhowells@redhat.com>
2013-06-28mn10300: Allow to pass array name to get_user()Akira Takeuchi
This fixes the following compile error: CC block/scsi_ioctl.o block/scsi_ioctl.c: In function 'sg_scsi_ioctl': block/scsi_ioctl.c:449: error: invalid initializer Signed-off-by: David Howells <dhowells@redhat.com>
2013-06-28softirq: Use _RET_IP_Davidlohr Bueso
Use the already defined macro to pass the function return address. Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1367347569.1784.3.camel@buesod1.americas.hpqcorp.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28sched/debug: Remove CONFIG_FAIR_GROUP_SCHED maskAlex Shi
Now that we are using runnable load avg in sched balance, we don't need to keep it under CONFIG_FAIR_GROUP_SCHED. Also align the code style to #ifdef instead of #if defined() and reorder the tg output info. Signed-off-by: Alex Shi <alex.shi@intel.com> Cc: pjt@google.com Cc: kamalesh@linux.vnet.ibm.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1372417835-4698-1-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28genirq: Add the generic chip to the genirq docbookThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Randy Dunlap <rdunlap@infradead.org>
2013-06-28genirq: generic-chip: Export some irq_gc_ functionsFabio Estevam
When building imx_v6_v7_defconfig with imx-drm drivers selected as modules, we get the following build errors: ERROR: "irq_gc_mask_clr_bit" [drivers/staging/imx-drm/ipu-v3/imx-ipu-v3.ko] undefined! ERROR: "irq_gc_mask_set_bit" [drivers/staging/imx-drm/ipu-v3/imx-ipu-v3.ko] undefined! ERROR: "irq_gc_ack_set_bit" [drivers/staging/imx-drm/ipu-v3/imx-ipu-v3.ko] undefined! Export the required functions to avoid this problem. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: shawn.guo@linaro.org Cc: kernel@pengutronix.de Link: http://lkml.kernel.org/r/1372389789-7048-1-git-send-email-festevam@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28genirq: Fix can_request_irq() for IRQs without an actionBen Hutchings
Commit 02725e7471b8 ('genirq: Use irq_get/put functions'), inadvertently changed can_request_irq() to return 0 for IRQs that have no action. This causes pcibios_lookup_irq() to select only IRQs that already have an action with IRQF_SHARED set, or to fail if there are none. Change can_request_irq() to return 1 for IRQs that have no action (if the first two conditions are met). Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is> Tested-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is> (against 3.2) Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: 709647@bugs.debian.org Cc: stable@vger.kernel.org # 2.6.39+ Link: http://bugs.debian.org/709647 Link: http://lkml.kernel.org/r/1372383630.23847.40.camel@deadeye.wl.decadent.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28irqchip: exynos-combiner: Staticize combiner_initSachin Kamat
combiner_init() is referenced only in this file. Make it static. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kgene.kim@samsung.com Cc: t.figa@samsung.com Cc: arnd@arndb.de Cc: patches@linaro.org Link: http://lkml.kernel.org/r/1372246597-32323-2-git-send-email-sachin.kamat@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-28sched/debug: Fix formatting of /proc/<PID>/schedKamalesh Babulal
This patch alters format string's width, to align all statistics at par with the longest struct sched_statistic member name under /proc/<PID>/sched. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/20130627165005.GA15583@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28x86/ioremap: Correct function name outputBorislav Petkov
Infact, let the compiler enter the function name so that there are no discrepancies. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1372369996-20556-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28Merge tag 'please-pull-mce' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE cleanup from Tony Luck: "Changes to simplify the SDM means that we can also simplify the code for SRAR (software recoverable action required) errors." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28x86/tboot: Provide debugfs interfaces to access TXT logQiaowei Ren
These logs come from tboot (Trusted Boot, an open source, pre-kernel/VMM module that uses Intel TXT to perform a measured and verified launch of an OS kernel/VMM.). Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Gang Wei <gang.wei@intel.com> Link: http://lkml.kernel.org/r/1372053333-21788-1-git-send-email-qiaowei.ren@intel.com [ Beautified the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28drm/qxl: add missing access check for execbuffer ioctlDave Airlie
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-06-28powerpc/eeh: Add eeh_dev to the cache during bootThadeu Lima de Souza Cascardo
commit f8f7d63fd96ead101415a1302035137a866f8998 ("powerpc/eeh: Trace eeh device from I/O cache") broke EEH on pseries for devices that were present during boot and have not been hotplugged/DLPARed. eeh_check_failure will get the eeh_dev from the cache, and will get NULL. eeh_addr_cache_build adds the addresses to the cache, but eeh_dev for the giving pci_device is not set yet. Just reordering the call to eeh_addr_cache_insert_dev works fine. The ordering is similar to the one in eeh_add_device_late. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-27Merge tag 'msm-clock-for-3.11b' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm into next/late From David Brown: MSM clock updates for 3.11. Per Stephen Boyd's coverletter: Resending to collect higher level maintainer acks per Olof's request. The plan is to push this patchset through MSM to the arm-soc tree. This patchset moves the existing MSM clock code and affected drivers to the common clock framework. A prerequisite of moving to the common clock framework is to use clk_prepare() and clk_enable() so the first few patches migrate drivers to that call (clk_prepare() is a no-op on MSM right now). It also removes some custom clock APIs that MSM provides and finally moves the proc_comm clock code to the common struct clk. This patch series will be used as the foundation of the MSM 8660/8960 clock code that I plan to send out after this series. * tag 'msm-clock-for-3.11b' of git://git.kernel.org/pub/scm/linux/kernel/git/davidb/linux-msm: ARM: msm: Migrate to common clock framework ARM: msm: Make proc_comm clock control into a platform driver ARM: msm: Prepare clk_get() users in mach-msm for clock-pcom driver ARM: msm: Remove clock-7x30.h include file ARM: msm: Remove custom clk_set_{max,min}_rate() API ARM: msm: Remove custom clk_set_flags() API msm: iommu: Use clk_set_rate() instead of clk_set_min_rate() msm: iommu: Convert to clk_prepare/unprepare msm_sdcc: Convert to clk_prepare/unprepare usb: otg: msm: Convert to clk_prepare/unprepare msm_serial: Use devm_clk_get() and properly return errors msm_serial: Convert to clk_prepare/unprepare Acked-by: Chris Ball <cjb@laptop.org> # for msm_sdcc.c Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-27Merge branch 'msm/fixes' into next/lateOlof Johansson
Merging in msm/fixes to avoid silly conflicts at top level. Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-27x86/mce: Update MCE severity condition checkChen Gong
Update some SRAR severity conditions check to make it clearer, according to latest Intel SDM Vol 3(June 2013), table 15-20. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27GFS2: Reserve journal space for quota change in do_growBob Peterson
If a GFS2 file system is mounted with quotas and a file is grown in such a way that its free blocks for the allocation are represented in a secondary bitmap, GFS2 ran out of blocks in the transaction. That resulted in "fatal: assertion "tr->tr_num_buf <= tr->tr_blocks". This patch reserves extra blocks for the quota change so the transaction has enough space. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-27Merge tag 'davinci-for-v3.11/dt' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into next/late From Sekhar Nori: Device Tree updates for DaVinci This patch set updates da850 DTS files to enable use of C pre-processor. Also updates pinctrl-single DT data to go with changes done in that module to enable a single register to service configuration of multiple pins. * tag 'davinci-for-v3.11/dt' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: davinci: da850: adopt to pinctrl-single change for configuring multiple pins ARM: davinci: da850: Use #include for all device trees Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-27Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into ↵Arnd Bergmann
next/fixes-non-critical From Nicolas Ferre: Several fixes for: - external irq on non-DT boards - cpuidle code in some circumstances - PMC code in relation with PLLB/PLL_UTMI/USB: mainly for SAMA5D3 and AT91SAM9N12 * tag 'at91-fixes' of git://github.com/at91linux/linux-at91: ARM: at91/PMC: use at91_usb_rate() for UTMI PLL ARM: at91/PMC: fix at91sam9n12 USB FS init ARM: at91/PMC: at91sam9n12 family has a PLLB ARM: at91/PMC: sama5d3 family doesn't have a PLLB ARM: at91: cpuidle: Fix target_residency ARM: at91: fix at91_extern_irq usage for non-dt boards Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-27ARM: ux500: bail out on alien cpusLinus Walleij
This makes the l2x0 initialization fail gracefully on non-ux500 systems. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2013-06-27rbd: send snapshot context with writesJosh Durgin
Sending the right snapshot context with each write is required for snapshots to work. Due to the ordering of calls, the snapshot context is never set for any requests. This causes writes to the current version of the image to be reflected in all snapshots, which are supposed to be read-only. This happens because rbd_osd_req_format_write() sets the snapshot context based on obj_request->img_request. At this point, however, obj_request->img_request has not been set yet, to the snapshot context is set to NULL. Fix this by moving rbd_img_obj_request_add(), which sets obj_request->img_request, before the osd request formatting calls. This resolves: http://tracker.ceph.com/issues/5465 Reported-by: Karol Jurak <karol.jurak@gmail.com> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2013-06-27Merge tag 'renesas-sh-sci-for-v3.11' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/late Renesas sh-sci updates for v3.11 HSCIF support by Ulrich Hecht. * tag 'renesas-sh-sci-for-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: serial: sh-sci: Initialise variables before access in sci_set_termios() ARM: shmobile: r8a7790: don't use external clock for SCIFs ARM: shmobile: r8a7790: HSCIF support serial: sh-sci: HSCIF support Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-27Merge branch 'renesas/boards2' into next/lateArnd Bergmann
Conflicts: arch/arm/mach-shmobile/setup-r8a7778.c This is a dependency for the Renesas sh-sci updates. Signedf-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-27ARM: integrator: let pciv3 use mem/premem from device treeLinus Walleij
Instead of relying on the hard-coded mem/premem bases for the PCI side, read in these from the device tree in the DT probe path. Hard-code the old values on the non-DT probe path. Introduce some static locals to hold these addresses instead of the earlier static #defines. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-27ARM: integrator: set local side PCI addresses rightLinus Walleij
This alters the local side address of the iospace to zero, non prefetchable memory local side address to 0x00000000 and prefetchable memory local side address to 0x10000000, so as to match the values actually poked in by the driver. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-06-27sched: Fix typo in struct sched_avg member descriptionKamalesh Babulal
Remove extra 'for' from the description about member of struct sched_avg. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: pjt@google.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/20130627060409.GB18582@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched/fair: Fix typo describing flags in enqueue_entityKamalesh Babulal
Fix spelling of 'calling' in description of se flags in enqueue_entity(). Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/20130627055418.GA18582@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched/debug: Add load-tracking statistics to taskKamalesh Babulal
At present we print per-entity load-tracking statistics for cfs_rq of cgroups/runqueues. Given that per task statistics is maintained, it can be used to know the contribution made by the task to its parenting cfs_rq level. This patch adds per-task load-tracking statistics to /proc/<PID>/sched. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20130625080336.GA20175@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Change get_rq_runnable_load() to static and inlineAlex Shi
Based-on-patch-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-14-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched/tg: Remove tg.load_weightAlex Shi
Since no one use it. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Paul Turner <pjt@google.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-13-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched/cfs_rq: Change atomic64_t removed_load to atomic_long_tAlex Shi
Similar to runnable_load_avg, blocked_load_avg variable, long type is enough for removed_load in 64 bit or 32 bit machine. Then we avoid the expensive atomic64 operations on 32 bit machine. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Paul Turner <pjt@google.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-12-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched/tg: Use 'unsigned long' for load variable in task groupAlex Shi
Since tg->load_avg is smaller than tg->load_weight, we don't need a atomic64_t variable for load_avg in 32 bit machine. The same reason for cfs_rq->tg_load_contrib. The atomic_long_t/unsigned long variable type are more efficient and convenience for them. Signed-off-by: Alex Shi <alex.shi@intel.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-11-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Change cfs_rq load avg to unsigned longAlex Shi
Since the 'u64 runnable_load_avg, blocked_load_avg' in cfs_rq struct are smaller than 'unsigned long' cfs_rq->load.weight. We don't need u64 vaiables to describe them. unsigned long is more efficient and convenience. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Paul Turner <pjt@google.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-10-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Consider runnable load average in move_tasks()Alex Shi
Aside from using runnable load average in background, move_tasks is also the key function in load balance. We need consider the runnable load average in it in order to make it an apple to apple load comparison. Morten had caught a div u64 bug on ARM, thanks! Thanks-to: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-8-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_taskAlex Shi
They are the base values in load balance, update them with rq runnable load average, then the load balance will consider runnable load avg naturally. We also try to include the blocked_load_avg as cpu load in balancing, but that cause kbuild performance drop 6% on every Intel machine, and aim7/oltp drop on some of 4 CPU sockets machines. Or only add blocked_load_avg into get_rq_runable_load, hackbench still drop a little on NHM EX. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-7-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Update cpu load after task_tickAlex Shi
To get the latest runnable info, we need do this cpuload update after task_tick. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-6-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Fix sleep time double accounting in enqueue entityAlex Shi
The woken migrated task will __synchronize_entity_decay(se); in migrate_task_rq_fair, then it needs to set `se->avg.last_runnable_update -= (-se->avg.decay_count) << 20' before update_entity_load_avg, in order to avoid sleep time is updated twice for se.avg.load_avg_contrib in both __syncchronize and update_entity_load_avg. However if the sleeping task is woken up from the same cpu, it miss the last_runnable_update before update_entity_load_avg(se, 0, 1), then the sleep time was used twice in both functions. So we need to remove the double sleep time accounting. Paul also contributed some code comments in this commit. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-5-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Set an initial value of runnable avg for new forked taskAlex Shi
We need to initialize the se.avg.{decay_count, load_avg_contrib} for a new forked task. Otherwise random values of above variables cause a mess when a new task is enqueued: enqueue_task_fair enqueue_entity enqueue_entity_load_avg and make fork balancing imbalance due to incorrect load_avg_contrib. Further more, Morten Rasmussen notice some tasks were not launched at once after created. So Paul and Peter suggest giving a start value for new task runnable avg time same as sched_slice(). PeterZ said: > So the 'problem' is that our running avg is a 'floating' average; ie. it > decays with time. Now we have to guess about the future of our newly > spawned task -- something that is nigh impossible seeing these CPU > vendors keep refusing to implement the crystal ball instruction. > > So there's two asymptotic cases we want to deal well with; 1) the case > where the newly spawned program will be 'nearly' idle for its lifetime; > and 2) the case where its cpu-bound. > > Since we have to guess, we'll go for worst case and assume its > cpu-bound; now we don't want to make the avg so heavy adjusting to the > near-idle case takes forever. We want to be able to quickly adjust and > lower our running avg. > > Now we also don't want to make our avg too light, such that it gets > decremented just for the new task not having had a chance to run yet -- > even if when it would run, it would be more cpu-bound than not. > > So what we do is we make the initial avg of the same duration as that we > guess it takes to run each task on the system at least once -- aka > sched_slice(). > > Of course we can defeat this with wakeup/fork bombs, but in the 'normal' > case it should be good enough. Paul also contributed most of the code comments in this commit. Signed-off-by: Alex Shi <alex.shi@intel.com> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Paul Turner <pjt@google.com> [peterz; added explanation of sched_slice() usage] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-4-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27sched: Move a few runnable tg variables into CONFIG_SMPAlex Shi
The following 2 variables are only used under CONFIG_SMP, so its better to move their definiation into CONFIG_SMP too. atomic64_t load_avg; atomic_t runnable_avg; Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1371694737-29336-3-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-27Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for ↵Alex Shi
load-tracking" Remove CONFIG_FAIR_GROUP_SCHED that covers the runnable info, then we can use runnable load variables. Also remove 2 CONFIG_FAIR_GROUP_SCHED setting which is not in reverted patch(introduced in 9ee474f), but also need to revert. Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51CA76A3.3050207@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26Merge tag 'fcoe1' into fixesJames Bottomley
This patch fixes a critical bug that was introduced in 3.9 related to VLAN tagging FCoE frames.