summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-06-20genirq: Remove bogus restriction in irq_move_mask_irq()Thomas Gleixner
If an interrupt is marked with the no balancing flag, we still allow setting the affinity for such an interrupt from the kernel itself, but for interrupts which move the affinity from interrupt context via irq_move_mask_irq() this runs into a check for the no balancing flag, which in turn ends up with an endless storm of stack dumps because the move pending flag is not reset. Allow the move for interrupts which have the no balancing flag set and clear the move pending bit before checking for interrupts with the per cpu flag set. Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506201002570.4107@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-20x86/hpet: Check for irq==0 when allocating hpet MSI interruptsJiang Liu
irq == 0 is not a valid irq for a irqdomain MSI allocation, but hpet code checks only for negative return values. Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/558447AF.30703@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-20Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Replace CTRL+z with 'f' as hotkey for enable/disable events (Arnaldo Carvalho de Melo) - Do not exit when 'f' is pressed in 'report' mode (Arnaldo Carvalho de Melo) - Tell the user how to unfreeze events after pressing 'f' in 'perf top' (Arnaldo Carvalho de Melo) - React to unassigned hotkey pressing in 'top/report' (Arnaldo Carvalho de Melo) - Display total number of samples with --show-total-period in 'annotate' (Martin Liška) - Add timeout to make procfs mmap processing more robust (Kan Liang) - Fix sort__sym_cmp to also compare end of symbol (Yannick Brosseau) Infrastructure changes: - Ensure thread-stack is flushed (Adrian Hunter) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-20Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fix from Arnaldo Carvalho de Melo: - Fix build breakage if prefix= is specified, this introduced a regression for a build idiom used by the Debian "linux-tools" package, that does: MAKE_PERF := $(MAKE) prefix=/usr V=1 ARCH=$(KERNEL_ARCH_PERF) ... Fix it. (Lukas Wunner) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19perf tools: Configurable per thread proc map processing time outKan Liang
The time out to limit the individual proc map processing was hard code to 500ms. This patch introduce a new option --proc-map-timeout to make the time limit configurable. Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ying Huang <ying.huang@intel.com> Link: http://lkml.kernel.org/r/1434549071-25611-2-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf tools: Add time out to force stop proc map processingKan Liang
System wide sampling like 'perf top' or 'perf record -a' read all threads /proc/xxx/maps before sampling. If there are any threads which generating a keeping growing huge maps, perf will do infinite loop during synthesizing. Nothing will be sampled. This patch fixes this issue by adding per-thread timeout to force stop this kind of endless proc map processing. PERF_RECORD_MISC_PROC_MAP_PARSE_TIME_OUT is introduced to indicate that the mmap record are truncated by time out. User will get warning notification when truncated mmap records are detected. Reported-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ying Huang <ying.huang@intel.com> Link: http://lkml.kernel.org/r/1434549071-25611-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf report: Fix sort__sym_cmp to also compare end of symbolYannick Brosseau
When using a map file from a JIT, due to memory reuse, we can obtain multiple symbols with the same start address but a different length. The symbols__find does check for the end so not doing it in sort__sym_cmp was causing the hist_entry in the annotate part of a report to match to the wrong entry, causing a fatal error. Signed-off-by: Yannick Brosseau <scientist@fb.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/1434584470-17771-1-git-send-email-scientist@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf hists browser: React to unassigned hotkey pressingArnaldo Carvalho de Melo
When that happens we were just ignoring the key press, now this message is presented in the bottom line (the help line): "Press '?' for help on key bindings" Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-iyma2j5kj3q9i1stl4mfh90n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf top: Tell the user how to unfreeze events after pressing 'f'Arnaldo Carvalho de Melo
When the user presses 'f' to disable events the visual cues are, well, the percentages not changing and the number of events freezing. Be more explicit by changing the help line at the bottom of the screen to show the following messages when 'f' is pressed: "Press 'f' again to re-enable the events" And then, when 'f' is pressed again: "Press 'f' to disable the events or 'h' Suggested-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-uhiswg9a9rxm5gxg7ptjskjn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf hists browser: Honour the help line provided by builtin-{top,report}.cArnaldo Carvalho de Melo
The hists_browser was replacing whatever helpline provided by 'top' or 'report' with a static "Press '?' for help on key bindings", fix it. Now the message passed by top appears at the bottom of the screen: "For a higher level overview, try: perf top --sort comm,dso" As well the message that will be added when the user presses 'f' to disable the events, something along the lines of "press f again to re-enable...". Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-dacaja70mbfz3a0yj1n180gx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf hists browser: Do not exit when 'f' is pressed in 'report' modeArnaldo Carvalho de Melo
The 'f' hotkey is only used when in 'top', dynamic mode, to enable/disable events, currently not making sense in the 'report', static mode, where we can't go from showing the histogram entries created from a perf.data file to adding more events after recreating the evlist created from the perf.data file, albeit possible, this is not implemented right now. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-lholzf472pu98dkkijggwx2m@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf top: Replace CTRL+z with 'f' as hotkey for enable/disable eventsArnaldo Carvalho de Melo
I.e. 'freeze'/'unfreeze', this is because CTRL+z has a well known action, i.e. suspend the app, perf needs to follow that convention, that will be done on a separate patch, tho. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-oedcl6ovohara4koig14ayip@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf tools: Fix build breakage if prefix= is specifiedLukas Wunner
Invoking Makefile.perf with prefix= breaks the build since Makefile.perf hands that variable down to Makefile.build where it overrides prefix := $(subst ./,,$(OUTPUT)$(dir)/) leading to errors like this: No rule to make target '/usrabspath.o', needed by '/usrlibperf-in.o' Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Fixes: c819e2cf2eb6f65d3208d195d7a0edef6108d5 Link: http://lkml.kernel.org/r/5582c48a.84a22b0a.a918.5285SMTPIN_ADDED_MISSING@mx.google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf annotate: Rename source_line_percent to source_line_samplesArnaldo Carvalho de Melo
To better reflect the purpose of this struct, that is to hold info about samples, its total number and is percentage. Cc: Martin Liska <mliska@suse.cz> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/n/tip-6bf8gwcl975uurl0ttpvtk69@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf annotate: Display total number of samples with --show-total-periodMartin Liška
To compare two records on an instruction base, with --show-total-period option provided, display total number of samples that belong to a line in assembly language. New hot key 't' is introduced for 'perf annotate' TUI. Signed-off-by: Martin Liska <mliska@suse.cz> Cc: Andi Kleen <andi@firstfloor.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5583E26D.1040407@suse.cz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19perf tools: Ensure thread-stack is flushedAdrian Hunter
The thread-stack represents a thread's current stack. When a thread exits there can still be many functions on the stack e.g. exit() can be called many levels deep, so all the callers will never return. To get that information output, the thread-stack must be flushed. Previously it was assumed the thread-stack would be flushed when the struct thread was deleted. With thread ref-counting it is no longer clear when that will be, if ever. So instead explicitly flush all the thread-stacks at the end of a session. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1432906425-9911-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-19Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Michael Turquette: "Very late clk regression fixes for the ARM-based AT91 platform. These went unnoticed by me until recently, hence the late pull request" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: at91: fix h32mx prototype inclusion in pmc header clk: at91: trivial: typo in peripheral clock description clk: at91: fix PERIPHERAL_MAX_SHIFT definition clk: at91: pll: fix input range validity check
2015-06-19Merge tag 'sound-4.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Nothing looks scary, just a few usual HD-audio regression fixes and fixup, in addition to a minor Kconfig dependency fix for the old MIPS drivers" * tag 'sound-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix unused label skip_i915 ALSA: hda - Fix noisy outputs on Dell XPS13 (2015 model) ALSA: mips: let SND_SGI_O2 select SND_PCM ALSA: hda - Fix audio crackles on Dell Latitude E7x40 ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine
2015-06-19Merge branch 'ccf/atmel-fixes-for-4.1' of ↵Michael Turquette
https://github.com/bbrezillon/linux-at91 into clk-fixes
2015-06-19crypto: marvell/cesa - add DT bindings documentationBoris BREZILLON
Add DT bindings documentation for the new marvell-cesa driver. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add support for Kirkwood and Dove SoCsArnaud Ebalard
Add the Kirkwood and Dove SoC descriptions, and control the allhwsupport module parameter to avoid probing the CESA IP when the old CESA driver is enabled (unless it is explicitly requested to do so). Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add support for Orion SoCsBoris BREZILLON
Add the Orion SoC description, and select this implementation by default to support non-DT probing: Orion is the only platform where non-DT boards are declaring the CESA block. Control the allhwsupport module parameter to avoid probing the CESA IP when the old CESA driver is enabled (unless it is explicitly requested to do so). Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add allhwsupport module parameterBoris BREZILLON
The old and new marvell CESA drivers both support Orion and Kirkwood SoCs. Add a module parameter to choose whether these SoCs should be attached to the new or the old driver. The default policy is to keep attaching those IPs to the old driver if it is enabled, until we decide the new CESA driver is stable/secure enough. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add support for all armada SoCsBoris BREZILLON
Add CESA IP description for all the missing armada SoCs (XP, 375 and 38x). Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add SHA256 supportArnaud Ebalard
Add support for SHA256 operations. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add MD5 supportArnaud Ebalard
Add support for MD5 operations. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add Triple-DES supportArnaud Ebalard
Add support for Triple-DES operations. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add DES supportBoris BREZILLON
Add support for DES operations. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add TDMA supportBoris BREZILLON
The CESA IP supports CPU offload through a dedicated DMA engine (TDMA) which can control the crypto block. When you use this mode, all the required data (operation metadata and payload data) are transferred using DMA, and the results are retrieved through DMA when possible (hash results are not retrieved through DMA yet), thus reducing the involvement of the CPU and providing better performances in most cases (for small requests, the cost of DMA preparation might exceed the performance gain). Note that some CESA IPs do not embed this dedicated DMA, hence the activation of this feature on a per platform basis. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: marvell/cesa - add a new driver for Marvell's CESABoris BREZILLON
The existing mv_cesa driver supports some features of the CESA IP but is quite limited, and reworking it to support new features (like involving the TDMA engine to offload the CPU) is almost impossible. This driver has been rewritten from scratch to take those new features into account. This commit introduce the base infrastructure allowing us to add support for DMA optimization. It also includes support for one hash (SHA1) and one cipher (AES) algorithm, and enable those features on the Armada 370 SoC. Other algorithms and platforms will be added later on. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: mv_cesa - explicitly define kirkwood and dove compatible stringsBoris BREZILLON
We are about to add a new driver to support new features like using the TDMA engine to offload the CPU. Orion, Dove and Kirkwood platforms are already using the mv_cesa driver, but Orion SoCs do not embed the TDMA engine, which means we will have to differentiate them if we want to get TDMA support on Dove and Kirkwood. In the other hand, the migration from the old driver to the new one is not something all people are willing to do without first auditing the new driver. Hence we have to support the new compatible in the mv_cesa driver so that new platforms with updated DTs can still attach their crypto engine device to this driver. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: mv_cesa - use gen_pool to reserve the SRAM memory regionBoris BREZILLON
The mv_cesa driver currently expects the SRAM memory region to be passed as a platform device resource. This approach implies two drawbacks: - the DT representation is wrong - the only one that can access the SRAM is the crypto engine The last point is particularly annoying in some cases: for example on armada 370, a small region of the crypto SRAM is used to implement the cpuidle, which means you would not be able to enable both cpuidle and the CESA driver. To address that problem, we explicitly define the SRAM device in the DT and then reference the sram node from the crypto engine node. Also note that the old way of retrieving the SRAM memory region is still supported, or in other words, backward compatibility is preserved. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19crypto: mv_cesa - document the clocks propertyBoris BREZILLON
On Dove platforms, the crypto engine requires a clock. Document this clocks property in the mv_cesa bindings doc. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-06-19Merge branch 'mvebu/drivers' of ↵Herbert Xu
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Merge the mvebu/drivers branch of the arm-soc tree which contains just a single patch bfa1ce5f38938cc9e6c7f2d1011f88eba2b9e2b2 ("bus: mvebu-mbus: add mv_mbus_dram_info_nooverlap()") that happens to be a prerequisite of the new marvell/cesa crypto driver.
2015-06-19x86/boot: Fix overflow warning with 32-bit binutilsBorislav Petkov
When building the kernel with 32-bit binutils built with support only for the i386 target, we get the following warning: arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31) The problem is that in that case, binutils' internal type representation is 32-bit wide and the shift range overflows. In order to fix this, manipulate the shift expression which creates the 4GiB constant to not overflow the shift count. Suggested-by: Michael Matz <matz@suse.de> Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19clk: at91: fix h32mx prototype inclusion in pmc headerNicolas Ferre
Trivial fix that prevents to compile this pmc clock driver if h32mx clock is present but smd clock isn't. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Fixes: bcc5fd49a0fd ("clk: at91: add a driver for the h32mx clock") Cc: <stable@vger.kernel.org> # 3.18+
2015-06-19clk: at91: trivial: typo in peripheral clock descriptionNicolas Ferre
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2015-06-19timer: Minimize nohz off overheadThomas Gleixner
If nohz is disabled on the kernel command line the [hr]timer code still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty pointless exercise. Cache nohz_active in [hr]timer per cpu bases and avoid the overhead. Before: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu After: 48.73% hog [.] main 15.36% [kernel] [k] _raw_spin_lock_irqsave 9.77% [kernel] [k] _raw_spin_unlock_irqrestore 6.61% [kernel] [k] lock_timer_base.isra.38 6.42% [kernel] [k] mod_timer 3.90% [kernel] [k] detach_if_pending 3.76% [kernel] [k] del_timer 2.41% [kernel] [k] internal_add_timer 1.39% [kernel] [k] __internal_add_timer 0.76% [kernel] [k] timerfn We probably should have a cached value for nohz full in the per cpu bases as well to avoid the cpumask check. The base cache line is hot already, the cpumask not necessarily. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Reduce timer migration overhead if disabledThomas Gleixner
Eric reported that the timer_migration sysctl is not really nice performance wise as it needs to check at every timer insertion whether the feature is enabled or not. Further the check does not live in the timer code, so we have an extra function call which checks an extra cache line to figure out that it is disabled. We can do better and store that information in the per cpu (hr)timer bases. I pondered to use a static key, but that's a nightmare to update from the nohz code and the timer base cache line is hot anyway when we select a timer base. The old logic enabled the timer migration unconditionally if CONFIG_NO_HZ was set even if nohz was disabled on the kernel command line. With this modification, we start off with migration disabled. The user visible sysctl is still set to enabled. If the kernel switches to NOHZ migration is enabled, if the user did not disable it via the sysctl prior to the switch. If nohz=off is on the kernel command line, migration stays disabled no matter what. Before: 47.76% hog [.] main 14.84% [kernel] [k] _raw_spin_lock_irqsave 9.55% [kernel] [k] _raw_spin_unlock_irqrestore 6.71% [kernel] [k] mod_timer 6.24% [kernel] [k] lock_timer_base.isra.38 3.76% [kernel] [k] detach_if_pending 3.71% [kernel] [k] del_timer 2.50% [kernel] [k] internal_add_timer 1.51% [kernel] [k] get_nohz_timer_target 1.28% [kernel] [k] __internal_add_timer 0.78% [kernel] [k] timerfn 0.48% [kernel] [k] wake_up_nohz_cpu After: 48.10% hog [.] main 15.25% [kernel] [k] _raw_spin_lock_irqsave 9.76% [kernel] [k] _raw_spin_unlock_irqrestore 6.50% [kernel] [k] mod_timer 6.44% [kernel] [k] lock_timer_base.isra.38 3.87% [kernel] [k] detach_if_pending 3.80% [kernel] [k] del_timer 2.67% [kernel] [k] internal_add_timer 1.33% [kernel] [k] __internal_add_timer 0.73% [kernel] [k] timerfn 0.54% [kernel] [k] wake_up_nohz_cpu Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Stats: Simplify the flags handlingThomas Gleixner
Simplify the handling of the flag storage for the timer statistics. No intermediate storage anymore. Just hand over the flags field. I left the printout of 'deferrable' for now because changing this would be an ABI update and I have no idea how strong people feel about that. OTOH, I wonder whether we should kill the whole timer stats stuff because all of that information can be retrieved via ftrace/perf as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224512.046626248@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Replace timer base by a cpu indexThomas Gleixner
Instead of storing a pointer to the per cpu tvec_base we can simply cache a CPU index in the timer_list and use that to get hold of the correct per cpu tvec_base. This is only used in lock_timer_base() and the slightly larger code is peanuts versus the spinlock operation and the d-cache foot print of the timer wheel. Aside of that this allows to get rid of following nuisances: - boot_tvec_base That statically allocated 4k bss data is just kept around so the timer has a home when it gets statically initialized. It serves no other purpose. With the CPU index we assign the timer to CPU0 at static initialization time and therefor can avoid the whole boot_tvec_base dance. That also simplifies the init code, which just can use the per cpu base. Before: text data bss dec hex filename 17491 9201 4160 30852 7884 ../build/kernel/time/timer.o After: text data bss dec hex filename 17440 9193 0 26633 6809 ../build/kernel/time/timer.o - Overloading the base pointer with various flags The CPU index has enough space to hold the flags (deferrable, irqsafe) so we can get rid of the extra masking and bit fiddling with the base pointer. As a benefit we reduce the size of struct timer_list on 64 bit machines. 4 - 8 bytes, a size reduction up to 15% per struct timer_list, which is a real win as we have tons of them embedded in other structs. This changes also the newly added deferrable printout of the timer start trace point to capture and print all timer->flags, which allows us to decode the target cpu of the timer as well. We might have used bitfields for this, but that would change the static initializers and the init function for no value to accomodate big endian bitfields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Badhri Jagan Sridharan <Badhri@google.com> Link: http://lkml.kernel.org/r/20150526224511.950084301@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Use hlist for the timer wheel hash bucketsThomas Gleixner
This reduces the size of struct tvec_base by 50% and results in slightly smaller code as well. Before: struct tvec_base: size: 8256, cachelines: 129 text data bss dec hex filename 17698 13297 8256 39251 9953 ../build/kernel/time/timer.o After: struct tvec_base: 4160, cachelines: 65 text data bss dec hex filename 17491 9201 4160 30852 7884 ../build/kernel/time/timer.o Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224511.854731214@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timer: Remove FIFO "guarantee"Thomas Gleixner
The FIFO guarantee is only there if two timers are queued into the same bucket at the same jiffie on the same cpu: - The slack value depends on the delta between expiry and enqueue time, so the resulting expiry time can be different for timers which are queued in different jiffies. - Timers which are queued into the secondary array end up after a later queued timer which was queued into the primary array due to cascading. - Timers can end up on different cpus due to the NOHZ target moving around. Obviously there is no guarantee of expiry ordering between cpus. So anything which relies on FIFO behaviour of the timer wheel is broken already. This is a preparatory patch for converting the timer wheel to hlist which reduces the memory foot print of the wheel by 50%. It's a seperate patch so any (unlikely to happen) regression caused by this can be identified clearly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Cc: George Spelvin <linux@horizon.com> Link: http://lkml.kernel.org/r/20150526224511.757520403@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19timers: Sanitize catchup_timer_jiffies() usageThomas Gleixner
catchup_timer_jiffies() has been applied blindly to several functions without looking for possible better ways to do it. 1) internal_add_timer() Move the update to base->all_timers before we actually insert the timer into the wheel. 2) detach_if_pending() Again the update to base->all_timers allows us to explicitely do the timer_jiffies update in place, if this was the last timer which got removed. 3) __run_timers() We only check on entry, which is silly, because base->timer_jiffies can be behind - especially on NOHZ kernels - and if there is a single deferrable timer somewhere between base->timer_jiffies and jiffies we expire it and then loop until base->timer_jiffies == jiffies. Move it into the loop. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Joonwoo Park <joonwoop@codeaurora.org> Cc: Wenbo Wang <wenbo.wang@memblaze.com> Link: http://lkml.kernel.org/r/20150526224511.662994644@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19clk: at91: fix PERIPHERAL_MAX_SHIFT definitionBoris Brezillon
Fix the PERIPHERAL_MAX_SHIFT definition (3 instead of 4) and adapt the round_rate and set_rate logic accordingly. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: "Wu, Songjun" <Songjun.Wu@atmel.com>
2015-06-19clk: at91: pll: fix input range validity checkBoris Brezillon
The PLL impose a certain input range to work correctly, but it appears that this input range does not apply on the input clock (or parent clock) but on the input clock after it has passed the PLL divisor. Fix the implementation accordingly. Cc: <stable@vger.kernel.org> # v3.14+ Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Reported-by: Jonas Andersson <jonas@microbit.se>
2015-06-19sched/deadline: Remove needless parameter in dl_runtime_exceeded()Zhiqiang Zhang
Sine commit 269ad8015a6b ("sched/deadline: Avoid double-accounting in case of missed deadlines), parameter 'rq' is no longer used, so remove it. Signed-off-by: Zhiqiang Zhang <zhangzhiqiang.zhang@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <juri.lelli@gmail.com> Cc: <luca.abeni@unitn.it> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1434338120-43773-1-git-send-email-zhangzhiqiang.zhang@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched: Remove superfluous resetting of the p->dl_throttled flagWanpeng Li
Resetting the p->dl_throttled flag in rt_mutex_setprio() (for a task that is going to be boosted) is superfluous, as the natural place to do so is in replenish_dl_entity(). If the task was on the runqueue and it is boosted by a DL task, it will be enqueued back with ENQUEUE_REPLENISH flag set, which can guarantee that dl_throttled is reset in replenish_dl_entity(). This patch drops the resetting of throttled status in function rt_mutex_setprio(). Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431496867-4194-6-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/deadline: Drop duplicate init_sched_dl_class() declarationWanpeng Li
There are two init_sched_dl_class() declarations, this patch drops the duplicate. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431496867-4194-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19sched/deadline: Reduce rq lock contention by eliminating locking of ↵Wanpeng Li
non-feasible target This patch adds a check that prevents futile attempts to move DL tasks to a CPU with active tasks of equal or earlier deadline. The same behavior as commit 80e3d87b2c55 ("sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target") for rt class. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431496867-4194-3-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>