summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-10-23perf ui progress: Per progress bar stateArnaldo Carvalho de Melo
That will ease using a progress bar across multiple functions, like in the upcoming patches that will present a progress bar when collapsing histograms. Based on a previous patch by Namhyung Kim. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-cr7lq7ud9fj21bg7wvq27w1u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf ui: Rename ui_progress to ui_progress_opsArnaldo Carvalho de Melo
Reserving 'struct ui_progress' to the per progress instances, not to the particular set of operations used to implmenet a progress bar in the current UI (GTK, TUI, etc). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-zjqbfp9gx3yo45s0rp9uv42n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23MAINTAINERS: add to ioatdma maintainer listDave Jiang
Signed-off-by: Dave Jiang <dave.jiang@intel.com> [djbw: add dmaengine list] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-23MAINTAINERS: add the new dmaengine mailing listVinod Koul
We have a new mailing list hosted by vger for dmaengine Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-10-23perf tools: Fix non-debug buildAdrian Hunter
In the absence of s DEBUG variable definition on the command line perf tools was building without optimization. Fix by assigning DEBUG if it is not defined. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf evlist: Validate that mmap_pages is not too bigAdrian Hunter
Amend perf_evlist__parse_mmap_pages() to check that the mmap_pages entered via the --mmap_pages/-m option is not too big. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-15-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf tools: Do not accept parse_tag_value() overflowAdrian Hunter
parse_tag_value() accepts an "unsigned long" and multiplies it according to a tag character. Do not accept the value if the multiplication overflows. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-14-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf inject: Do not repipe attributes to a perf.data fileAdrian Hunter
perf.data files contain the attributes separately, do not put them in the event stream as well. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-6-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf script: Make perf_script a local variableAdrian Hunter
Change perf_script from being global to being local. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-4-git-send-email-adrian.hunter@intel.com [ Made the minor consistency changes suggested by David Ahern ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf sched: Optimize build timeAdrian Hunter
builtin-sched.c took a log time to build with -O6 optimization. This turned out to be caused by: .curr_pid = { [0 ... MAX_CPUS - 1] = -1 }, Fix by initializing curr_pid programmatically. This addresses the problem cured in f36f83f947ed using a smaller hammer. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf sched: Make struct perf_sched sched a local variableAdrian Hunter
Change "struct perf_sched sched" from being global to being local. The build slowdown cured by f36f83f947ed is dealt with in the following patch, by programatically setting perf_sched.curr_pid. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382427258-17495-12-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23[SCSI] sd: call blk_pm_runtime_init before add_diskAaron Lu
Sujit has found a race condition that would make q->nr_pending unbalanced, it occurs as Sujit explained: " sd_probe_async() -> add_disk() -> disk_add_event() -> schedule(disk_events_workfn) sd_revalidate_disk() blk_pm_runtime_init() return; Let's say the disk_events_workfn() calls sd_check_events() which tries to send test_unit_ready() and because of sd_revalidate_disk() trying to send another commands the test_unit_ready() might be re-queued as the tagged command queuing is disabled. So the race condition is - Thread 1 | Thread 2 sd_revalidate_disk() | sd_check_events() ...nr_pending = 0 as q->dev = NULL| scsi_queue_insert() blk_runtime_pm_init() | blk_pm_requeue_request() -> | nr_pending = -1 since | q->dev != NULL " The problem is, the test_unit_ready request doesn't get counted the first time it is queued, so the later decrement of q->nr_pending in blk_pm_requeue_request makes it unbalanced. Fix this by calling blk_pm_runtime_init before add_disk so that all requests initiated there will all be counted. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org> Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-23[SCSI] qla2xxx: Fix request queue null dereference.Chad Dupuis
If an invalid IOCB is returned on the response queue then the index into the request queue map could be invalid and could return to us a bogus value. This could cause us to try to deference an invalid pointer and cause an exception. If we encounter this condition, simply return as no context can be established for this response. Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-10-23perf bench: Change the procps visible command-name of invididual benchmark ↵Ingo Molnar
tests plus cleanups Before this patch, looking at 'perf bench sched pipe' behavior over 'top' only told us that something related to perf is running: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 19934 mingo 20 0 54836 1296 952 R 18.6 0.0 0:00.56 perf 19935 mingo 20 0 54836 384 36 S 18.6 0.0 0:00.56 perf After the patch it's clearly visible what's going on: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 19744 mingo 20 0 125m 3536 2644 R 68.2 0.0 0:01.12 sched-pipe 19745 mingo 20 0 125m 1172 276 R 68.2 0.0 0:01.12 sched-pipe The benchmark-subsystem name is concatenated with the individual testcase name. Unfortunately 'perf top' does not show the reconfigured name, possibly because it caches ->comm[] values and does not recognize changes to them? Also clean up a few bits in builtin-bench.c while at it and reorganize the code and the output strings to be consistent. Use iterators to access the various arrays. Rename 'suites' concept to 'benchmark collection' and the 'bench_suite' to 'benchmark/bench'. The many repetitions of 'suite' made the code harder to read and understand. The new output is: comet:~/tip/tools/perf> ./perf bench Usage: perf bench [<common options>] <collection> <benchmark> [<options>] # List of all available benchmark collections: sched: Scheduler and IPC benchmarks mem: Memory access benchmarks numa: NUMA scheduling and MM benchmarks all: All benchmarks comet:~/tip/tools/perf> ./perf bench sched # List of available benchmarks for collection 'sched': messaging: Benchmark for scheduling and IPC pipe: Benchmark for pipe() between two processes all: Test all scheduler benchmarks comet:~/tip/tools/perf> ./perf bench mem # List of available benchmarks for collection 'mem': memcpy: Benchmark for memcpy() memset: Benchmark for memset() tests all: Test all memory benchmarks comet:~/tip/tools/perf> ./perf bench numa # List of available benchmarks for collection 'numa': mem: Benchmark for NUMA workloads all: Test all NUMA benchmarks Individual benchmark modules were not touched. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20131023123756.GA17871@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf probe: Find fentry mcount fuzzed parameter locationMasami Hiramatsu
At this point, --fentry (mcount function entry) option for gcc fuzzes the debuginfo variable locations by skipping the mcount instruction offset (on x86, this is a 5 byte call instruction). This makes variable searching fail at the entry of functions which are mcount'ed. e.g.) Available variables at vfs_read @<vfs_read+0> (No matched variables) This patch adds additional location search at the function entry point to solve this issue, which tries to find the earliest address for the variable location. Note that this only works with function parameters (formal parameters) because any local variables should not exist on the function entry address (those are not initialized yet). With this patch, perf probe shows correct parameters if possible; # perf probe --vars vfs_read Available variables at vfs_read @<vfs_read+0> char* buf loff_t* pos size_t count struct file* file Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20131011071025.15557.13275.stgit@udc4-manage.rcp.hitachi.co.jp Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf probe: Support "$vars" meta argument syntax for local variablesMasami Hiramatsu
Support "$vars" meta argument syntax for tracing all local variables at probe point. Now you can trace all available local variables (including function parameters) at the probe point by passing $vars. # perf probe --add foo $vars This automatically finds all local variables at foo() and adds it as probe arguments. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20131011071023.15557.51770.stgit@udc4-manage.rcp.hitachi.co.jp Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf tools: Stop using 'self' in some more placesArnaldo Carvalho de Melo
As suggested by tglx, 'self' should be replaced by something that is more useful. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-fmblhc6tbb99tk1q8vowtsbj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf test: Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" testArnaldo Carvalho de Melo
[root@sandy ~]# perf test -v 22 22: Test sample parsing : --- start --- sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating ---- end ---- Test sample parsing: FAILED! [root@sandy ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-cx83wuzz30m10m4s1xt0ocyq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23perf test: Clarify the "sample parsing" test entryArnaldo Carvalho de Melo
Before: [root@sandy ~]# perf test -v 22 22: Test sample parsing : --- start --- sample format has changed - test needs updating ---- end ---- Test sample parsing: FAILED! [root@sandy ~]# After: [root@sandy ~]# perf test -v 22 22: Test sample parsing : --- start --- sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating ---- end ---- Test sample parsing: FAILED! [root@sandy ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-8cazc2fpmk70jcbww8c0cobx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-23clockevents: Sanitize ticks to nsec conversionThomas Gleixner
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use clockevents_config_and_register() where possible" caused a regression for some of the converted subarchs. The reason is, that the clockevents core code converts the minimal hardware tick delta to a nanosecond value for core internal usage. This conversion is affected by integer math rounding loss, so the backwards conversion to hardware ticks will likely result in a value which is less than the configured hardware limitation. The affected subarchs used their own workaround (SIGH!) which got lost in the conversion. The solution for the issue at hand is simple: adding evt->mult - 1 to the shifted value before the integer divison in the core conversion function takes care of it. But this only works for the case where for the scaled math mult/shift pair "mult <= 1 << shift" is true. For the case where "mult > 1 << shift" we can apply the rounding add only for the minimum delta value to make sure that the backward conversion is not less than the given hardware limit. For the upper bound we need to omit the rounding add, because the backwards conversion is always larger than the original latch value. That would violate the upper bound of the hardware device. Though looking closer at the details of that function reveals another bogosity: The upper bounds check is broken as well. Checking for a resulting "clc" value greater than KTIME_MAX after the conversion is pointless. The conversion does: u64 clc = (latch << evt->shift) / evt->mult; So there is no sanity check for (latch << evt->shift) exceeding the 64bit boundary. The latch argument is "unsigned long", so on a 64bit arch the handed in argument could easily lead to an unnoticed shift overflow. With the above rounding fix applied the calculation before the divison is: u64 clc = (latch << evt->shift) + evt->mult - 1; So we need to make sure, that neither the shift nor the rounding add is overflowing the u64 boundary. [ukl: move assignment to rnd after eventually changing mult, fix build issue and correct comment with the right math] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: nicolas.ferre@atmel.com Cc: Marc Pignat <marc.pignat@hevs.ch> Cc: john.stultz@linaro.org Cc: kernel@pengutronix.de Cc: Ronald Wahl <ronald.wahl@raritan.com> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2013-10-23Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: * Convert callchain children list to rbtree, greatly reducing the time taken for callchain processing, from Namhyung Kim. * Add --max-stack option to limit callchain stack scan in 'top' and 'report', improving callchain processing when reducing the stack depth is an option, from Waiman Long. * Compare dso's also when comparing symbols, to avoid grouping together symbols with the same name but on different DSOs, fix from Namhyung Kim. * 'perf trace' now can can use a 'perf probe' wannabe tracepoint to hook into the userspace -> kernel pathname copy so that it can map fds to pathnames without reading /proc/pid/fd/ symlinks. From Arnaldo Carvalho de Melo. * 'perf trace' now emits hints as to why tracing is not possible, helping the user to setup the system to allow tracing in the desired permission granularity, telling if the problem is due to debugfs not being mounted or with not enough permission for !root, /proc/sys/kernel/perf_event_paranoit value, etc. From Arnaldo Carvalho de Melo. * Add missing 'mmap2' in evsel debug print, from Adrian Hunter. * Add missing decrement in id sample parsing, not a fix per se, just to avoid a problem whem somebody adds another field, from Adrian Hunter. * Improve write_output error message in 'perf record', from Adrian Hunter. * Add missing sample flush for piped events, fix from Adrian Hunter. * Add missing members to perf_event__attr_swap(), fix from Adrian Hunter. * Assorted fixes for 32-bit build, from Adrian Hunter * Print addr by default for BTS in 'perf script', from Adrian Juntmer * Separating data file properties from session, code reorganization from Jiri Olsa. * Show error in 'perf list' if tracepoints not available, from Pekka Enberg. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Several last minute bug fixes. Two of them are on the larger side for rc7, the dasd format patch for older storage devices and the store-clock-fast patch where we have been to optimistic with an optimization" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/time: correct use of store clock fast s390/vmlogrdr: fix array access in vmlogrdr_open() s390/compat,signal: fix return value of copy_siginfo_(to|from)_user32() s390/dasd: check for availability of prefix command during format s390/mm,kvm: fix software dirty bits vs. kvm for old machines
2013-10-23Merge branch 'for-rc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management fixes from Zhang Rui: "These includes several commits that are necessary to properly fix regression for TMU test MUX address setting after reset, for exynos thermal driver. Specifics: - fix a regression that the removal of setting a certain field at TMU configuration setting results in immediately shutdown after reset on Exynos4412 SoC. - revert a patch which tries to link the thermal_zone device and its hwmon node but breaks libsensors. - fix a deadlock/lockdep warning issue in x86_pkg_temp thermal driver, which can be reproduced on a buggy platform only. - fix ti-soc-thermal driver to fall back on bandgap reading when reading from PCB temperature sensor fails" * 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: Revert "drivers: thermal: parent virtual hwmon with thermal zone" drivers: thermal: allow ti-soc-thermal run without pcb zone thermal: exynos: Provide initial setting for TMU's test MUX address at Exynos4412 thermal: exynos: Provide separate TMU data for Exynos4412 thermal: exynos: Remove check for thermal device pointer at exynos_report_trigger() Thermal: x86_pkg_temp: change spin lock
2013-10-23platform/x86: fix asus-wmi build errorRandy Dunlap
Fix build error in asus_wmi.c when ASUS_WMI=y and ACPI_VIDEO=m by preventing that combination. drivers/built-in.o: In function `asus_wmi_probe': asus-wmi.c:(.text+0x65ddb4): undefined reference to `acpi_video_unregister' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-23bcache: Fixed incorrect order of arguments to bio_alloc_bioset()Kent Overstreet
Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-23Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - Compilation fixes for GCC < 4.4.6 - one Kbuild dependency select fix (selecting videobuf on msi3101) - driver fixes on tda10071, e4000, msi3101, soc_camera, s5p-jpeg, saa7134 and adv7511 - some device quirks needed to make them work properly - some videobuf2 core regression fixes for some features used only on embedded drivers * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] saa7134: Fix crash when device is closed before streamoff [media] adv7511: fix error return code in adv7511_probe() [media] ths8200: fix compilation with GCC < 4.4.6 [media] ad9389b: fix compilation with GCC < 4.4.6 [media] adv7511: fix compilation with GCC < 4.4.6 [media] adv7842: fix compilation with GCC < 4.4.6 [media] s5p-jpeg: Initialize vfd_decoder->vfl_dir field [media] videobuf2-dc: Fix support for mappings without struct page in userptr mode [media] vb2: Allow queuing OUTPUT buffers with zeroed 'bytesused' [media] mx3-camera: locking cleanup in mx3_videobuf_queue() [media] sh_vou: almost forever loop in sh_vou_try_fmt_vid_out() [media] tda10071: change firmware download condition [media] msi3101: correct max videobuf2 alloc [media] Add HCL T12Rg-H to STK webcam upside-down table [media] msi3101: Kconfig select VIDEOBUF2_VMALLOC [media] msi3101: msi3101_ioctl_ops can be static [media] e4000: fix PLL calc bug on 32-bit arch [media] uvcvideo: quirk PROBE_DEF for Microsoft Lifecam NX-3000 [media] uvcvideo: quirk PROBE_DEF for Dell SP2008WFP monitor
2013-10-23Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband bugfix from Roland Dreier: "Disable not-quite-ready userspace ABI for IB flow steering" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/core: Temporarily disable create_flow/destroy_flow uverbs
2013-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Sorry I let so much accumulate, I was in Buffalo and wanted a few things to cook in my tree for a while before sending to you. Anyways, it's a lot of little things as usual at this stage in the game" 1) Make bonding MAINTAINERS entry reflect reality, from Andy Gospodarek. 2) Fix accidental sock_put() on timewait mini sockets, from Eric Dumazet. 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6 addresses, from François CACHEREUL. 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan Carpenter. 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric Dumazet. 6) SFC driver bug fixes from Ben Hutchings. 7) Fix TX packet scheduling wedge after channel change in ath9k driver, from Felix Fietkau. 8) Fix user after free in BPF JIT code, from Alexei Starovoitov. 9) Source address selection test is reversed in __ip_route_output_key(), fix from Jiri Benc. 10) VLAN and CAN layer mis-size netlink attributes, from Marc Kleine-Budde. 11) Fix permission checks in sysctls to use current_euid() instead of current_uid(). From Eric W Biederman. 12) IPSEC policies can go away while a timer is still pending for them, add appropriate ref-counting to fix, from Steffen Klassert. 13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth chips, from Nguyen Hong Ky and Simon Horman. 14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai. 15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama Ghorbel. 16) Fix typo in fq_change(), we were assigning "initial quantum" to "quantum". From Eric Dumazet. 17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise FQ packet scheduler does not pace those flows properly. Also from Eric Dumazet. 18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland. 19) l2tp_xmit_skb() can be called from process context, not just softirq context, so we must always make sure to BH disable around it. From Eric Dumazet. 20) On qdisc reset, we forget to purge the RB tree of SKBs in netem packet scheduler. From Stephen Hemminger. 21) Fix info leak in farsync WAN driver ioctl() handler, from Dan Carpenter and Salva Peiró. 22) Fix PHY reset and other issues in dm9000 driver, from Nikita Kiryanov and Michael Abbott. 23) When hardware can do SCTP crc32 checksums, we accidently don't disable the csum offload when IPSEC transformations have been applied. From Fan Du and Vlad Yasevich. 24) Tail loss probing in TCP leaves the socket in the wrong congestion avoidance state. From Yuchung Cheng. 25) In CPSW driver, enable NAPI before interrupts are turned on, from Markus Pargmann. 26) Integer underflow and dual-assignment in YAM hamradio driver, from Dan Carpenter. 27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must unclone it. This fixes various hard to track down crashes in drivers where the SKBs ->gso_segs was changing right from underneath the driver during TX queueing. From Eric Dumazet. 28) Fix the handling of VLAN IDs, and in particular the special IDs 0 and 4095, in the bridging layer. From Toshiaki Makita. 29) Another info leak, this time in wanxl WAN driver, from Salva Peiró. 30) Fix race in socket credential passing, from Daniel Borkmann. 31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly, from Seif Mazareeb. 32) Fix identification of fragmented frames in ipv4/ipv6 UDP Fragmentation Offload output paths, from Jiri Pirko. 33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel Elior. 34) When we removed the explicit neighbour pointer from ipv6 routes a slight regression was introduced for users such as IPVS, xt_TEE, and raw sockets. We mix up the users requested destination address with the routes assigned nexthop/gateway. From Julian Anastasov and Simon Horman. 35) Fix stack overruns in rt6_probe(), the issue is that can end up doing two full packet xmit paths at the same time when emitting neighbour discovery messages. From Hannes Frederic Sowa. 36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from Mariusz Ceier. 37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT sample, from Neal Cardwell. 38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from Steffen Klassert. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits) ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter ax88179_178a: Correct the RX error definition in RX header Revert "bridge: only expire the mdb entry when query is received" tcp: initialize passive-side sk_pacing_rate after 3WHS davinci_emac.c: Fix IFF_ALLMULTI setup mac802154: correct a typo in ieee802154_alloc_device() prototype ipv6: probe routes asynchronous in rt6_probe netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper ipv6: fill rt6i_gateway with nexthop address ipv6: always prefer rt6i_gateway if present bnx2x: Set NETIF_F_HIGHDMA unconditionally bnx2x: Don't pretend during register dump bnx2x: Lock DMAE when used by statistic flow bnx2x: Prevent null pointer dereference on error flow bnx2x: Fix config when SR-IOV and iSCSI are enabled bnx2x: Fix Coalescing configuration bnx2x: Unlock VF-PF channel on MAC/VLAN config error bnx2x: Prevent an illegal pointer dereference during panic bnx2x: Fix Maximum CoS estimation for VFs drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing ...
2013-10-22ax88179_178a: Add VID:DID for Samsung USB Ethernet AdapterFreddy Xin
Add VID:DID for Samsung USB Ethernet Adapter. Signed-off-by: Freddy Xin <freddy@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22ax88179_178a: Correct the RX error definition in RX headerFreddy Xin
Correct the definition of AX_RXHDR_CRC_ERR and AX_RXHDR_DROP_ERR. They are BIT29 and BIT31 in pkt_hdr seperately. Signed-off-by: Freddy Xin <freddy@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22Revert "bridge: only expire the mdb entry when query is received"Linus Lüssing
While this commit was a good attempt to fix issues occuring when no multicast querier is present, this commit still has two more issues: 1) There are cases where mdb entries do not expire even if there is a querier present. The bridge will unnecessarily continue flooding multicast packets on the according ports. 2) Never removing an mdb entry could be exploited for a Denial of Service by an attacker on the local link, slowly, but steadily eating up all memory. Actually, this commit became obsolete with "bridge: disable snooping if there is no querier" (b00589af3b) which included fixes for a few more cases. Therefore reverting the following commits (the commit stated in the commit message plus three of its follow up fixes): ==================== Revert "bridge: update mdb expiration timer upon reports." This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc. Revert "bridge: do not call setup_timer() multiple times" This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1. Revert "bridge: fix some kernel warning in multicast timer" This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1. Revert "bridge: only expire the mdb entry when query is received" This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b. ==================== CC: Cong Wang <amwang@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22sony-laptop: don't change keyboard backlight settingsMattia Dongili
Do not touch keyboard backlight unless explicitly passed a module parameter. In this way we won't make wrong assumptions about what are good default values since they actually are different from model to model. The only side effect is that we won't know what is the current value until set via the sysfs attributes. Signed-off-by: Mattia Dongili <malattia@linux.it> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22vfs: fix new kernel-doc warningsRandy Dunlap
Move kernel-doc notation to immediately before its function to eliminate kernel-doc warnings introduced by commit db14fc3abcd5 ("vfs: add d_walk()") Warning(fs/dcache.c:1343): No description found for parameter 'data' Warning(fs/dcache.c:1343): No description found for parameter 'dentry' Warning(fs/dcache.c:1343): Excess function parameter 'parent' description in 'check_mount' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22fs/namei.c: fix new kernel-doc warningRandy Dunlap
Add @path parameter to fix kernel-doc warning. Also fix a spello/typo. Warning(fs/namei.c:2304): No description found for parameter 'path' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-22Merge tag 'sound-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The pending last-minute ASoC fixes, all of which are driver-local (tlv320aic3x, rcar, pcm1681, pcm1792a, omap, fsl) and should be pretty safe to apply" * tag 'sound-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: Add MAINTAINERS entry for dmaengine helpers ASoC: pcm1792a: Fix max_register setting ASoC: pcm1681: Fix max_register setting ASoC: pcm1681: Fix max_register setting ASoC: rcar: fixup generation checker ASoC: tlv320aic3x: Connect 'Left Line1R Mux' and 'Right Line1L Mux' ASoC: fsl: imx-ssi: fix probe on imx31 ASoC: omap: Fix incorrect ARM dependency ASoC: fsl: Fix sound on mx31moboard ASoC: fsl_ssi: Fix irq_of_parse_and_map() return value check
2013-10-22Merge tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggyLinus Torvalds
Pull jfs bugfix from David Kleikamp: "Just a patch to fix an oops in an error path" * tag 'jfs-3.12' of git://github.com/kleikamp/linux-shaggy: jfs: fix error path in ialloc
2013-10-22Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Travelling slowed down getting these out. Two vmwgfx fixes, a radeon revert to avoid a regression, i915 fixes, and some ioctl sizing issues fixed with 32 on 64" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/radeon/audio: don't set speaker allocation on DCE4+ drm/radeon: rework audio option drm/radeon/audio: don't set speaker allocation on DCE3.2 drm/radeon: make missing smc ucode non-fatal (CI) drm/radeon: make missing smc ucode non-fatal (r7xx-SI) drm/radeon/uvd: revert lower msg&fb buffer requirements on UVD3 drm/radeon: stop the leaks in cik_ib_test drm/radeon/atom: workaround vbios bug in transmitter table on rs780 drm/i915: Disable GGTT PTEs on GEN6+ suspend drm/i915: Make PTE valid encoding optional drm: Pad drm_mode_get_connector to 64-bit boundary drm: Prevent overwriting from userspace underallocating core ioctl structs drm/vmwgfx: Don't kill clients on VT switch drm/vmwgfx: Don't put resources with invalid id's on lru list drm/i915: disable LVDS clock gating on CPT v2
2013-10-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - a partial revert of exponent parsing changes to make "Unit" exponent item work properly again, by Nikolai Kondrashov - a few new device IDs additions piggy-backing, by AceLan Kao and David Herrmann * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: wiimote: add LEGO-wiimote VID HID: Fix unit exponent parsing again HID: usbhid: quirk for SiS Touchscreen HID: usbhid: quirk for Synaptics Large Touchccreen
2013-10-22Merge branch 'for-3.12-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "The only interesting bit is ata_eh_qc_retry() update which fixes a problem where a SG_IO command may fail across suspend/resume cycle without the command actually being at fault. Other changes are low level driver specific and fairly low impact" * 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libahci: fix turning on LEDs in ahci_start_port() libata: make ata_eh_qc_retry() bump scmd->allowed on bogus failures ahci_platform: use dev_info() instead of printk() ahci: use dev_info() instead of printk() pata_isapnp: Don't use invalid I/O ports
2013-10-22Merge branch 'for-3.12-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two late fixes for cgroup. One fixes descendant walk introduced during this rc1 cycle. The other fixes a post 3.9 bug during task attach which can lead to hang. Both fixes are critical and the fixes are relatively straight-forward" * 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix to break the while loop in cgroup_attach_task() correctly cgroup: fix cgroup post-order descendant walk of empty subtree
2013-10-22s390/time: correct use of store clock fastMartin Schwidefsky
The result of the store-clock-fast (STCKF) instruction is a bit fuzzy. It can happen that the value stored on one CPU is smaller than the value stored on another CPU, although the order of the stores is the other way around. This can cause deltas of get_tod_clock() values to become negative when they should not be. We need to be more careful with store-clock-fast, this patch partially reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time: always use stckf instead of stck if available". The get_tod_clock() function now uses the store-clock-extended (STCKE) instruction. get_tod_clock_fast() can be used if the fuzziness of store-clock-fast is acceptable e.g. for wait loops local to a CPU. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-22Merge branch 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes Most just regression fixes for audio, dpm, and uvd, plus a resource leak fix for cik. * 'drm-fixes-3.12' of git://people.freedesktop.org/~agd5f/linux: drm/radeon/audio: don't set speaker allocation on DCE4+ drm/radeon: rework audio option drm/radeon/audio: don't set speaker allocation on DCE3.2 drm/radeon: make missing smc ucode non-fatal (CI) drm/radeon: make missing smc ucode non-fatal (r7xx-SI) drm/radeon/uvd: revert lower msg&fb buffer requirements on UVD3 drm/radeon: stop the leaks in cik_ib_test drm/radeon/atom: workaround vbios bug in transmitter table on rs780
2013-10-22Merge tag 'drm-intel-fixes-2013-10-21' of ↵Dave Airlie
git://people.freedesktop.org/~danvet/drm-intel into drm-fixes Just an lvds clock gating fix and a pte clearing hack for hsw to avoid memory corruption when hibernating - something doesn't seem to switch off properly, we're still investigating. * tag 'drm-intel-fixes-2013-10-21' of git://people.freedesktop.org/~danvet/drm-intel: (96 commits) drm/i915: Disable GGTT PTEs on GEN6+ suspend drm/i915: Make PTE valid encoding optional drm/i915: disable LVDS clock gating on CPT v2
2013-10-22intel_pstate: Correct calculation of min pstate valueDirk Brandewie
The minimum pstate is supposed to be a percentage of the maximum P state available. Calculate min using max pstate and not the current max which may have been limited by the user Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-22intel_pstate: Improve accuracy by not truncating until final resultBrennan Shacklett
This patch addresses Bug 60727 (https://bugzilla.kernel.org/show_bug.cgi?id=60727) which was due to the truncation of intermediate values in the calculations, which causes the code to consistently underestimate the current cpu frequency, specifically 100% cpu utilization was truncated down to the setpoint of 97%. This patch fixes the problem by keeping the results of all intermediate calculations as fixed point numbers rather scaling them back and forth between integers and fixed point. References: https://bugzilla.kernel.org/show_bug.cgi?id=60727 Signed-off-by: Brennan Shacklett <bpshacklett@gmail.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-21tcp: initialize passive-side sk_pacing_rate after 3WHSNeal Cardwell
For passive TCP connections, upon receiving the ACK that completes the 3WHS, make sure we set our pacing rate after we get our first RTT sample. On passive TCP connections, when we receive the ACK completing the 3WHS we do not take an RTT sample in tcp_ack(), but rather in tcp_synack_rtt_meas(). So upon receiving the ACK that completes the 3WHS, tcp_ack() leaves sk_pacing_rate at its initial value. Originally the initial sk_pacing_rate value was 0, so passive-side connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs made in the first RTT. With a default initial cwnd of 10 packets, this happened to be correct for RTTs 5ms or bigger, so it was hard to see problems in WAN or emulated WAN testing. Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the initial sk_pacing_rate is 0xffffffff. So after that change, passive TCP connections were keeping this value (and using large numbers of segments per skbuff) until receiving an ACK for data. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21davinci_emac.c: Fix IFF_ALLMULTI setupMariusz Ceier
When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't, emac_dev_mcast_set should only enable RX of multicasts and reset MACHASH registers. It does this, but afterwards it either sets up multicast MACs filtering or disables RX of multicasts and resets MACHASH registers again, rendering IFF_ALLMULTI flag useless. This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set. Tested with kernel 2.6.37. Signed-off-by: Mariusz Ceier <mceier+kernel@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21mac802154: correct a typo in ieee802154_alloc_device() prototypeAlexandre Belloni
This has no other impact than a cosmetic one. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21ipv6: probe routes asynchronous in rt6_probeHannes Frederic Sowa
Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21Merge branch 'rt6i_gateway'David S. Miller
Julian Anastasov says: ==================== ipv6: use rt6i_gateway as nexthop The following patchset makes sure that rt6i_gateway contains valid nexthop information in all cases, so that we can use different nexthop for sending. The first patch is a simple fix that makes IPVS, TEE, RAW(hdrincl) and RTF_DYNAMIC(without RTF_GATEWAY) work as before 3.9. There is a single corner case not solved by this patch: RAW(hdrincl) or TEE using local address for nexthop, a silly feature, I guess. In this case we see zeroes in rt6i_gateway because we get route that is not cloned. This is solved only with patch 2. The second patch is an optimization that makes sure all resulting routes have rt6i_gateway filled, so that we can avoid the complex ipv6_addr_any() call added to rt6_nexthop() by patch 1. And it sets rt6i_gateway for local routes, a case not handled by patch 1. The third patch uses the new rt6_nexthop() function to fix the matching of gateways in the same way as commit bbb5823cf742a7 ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper") fixes nf_conntrack_h323_main.c for IPv4. Currently, it depends on the new definition of rt6_nexthop() in patch 2. Actually, if patch 2 is applied, patch 3 becomes a cosmetic change. I see the following two alternatives for applying these patches: 1. Linger patch 2 in net-next to avoid surprises in the upcoming release. In this case patch 3 can be reworked not to depend on the new rt6_nexthop() definition in patch 2. I guess this is a better option, so that patch 2 can be reviewed and tested for longer time. 2. Include all 3 patches in net tree - more risky because this is my first attempt to change IPv6. Here is the situation as handled by patch 2: In IPv6 the resolved routes are always host routes (/128 with DST_HOST), mostly cloned ones. We allow routes in FIB to contain rt6i_gateway with zeroes (eg. for local subnets) but on cloning we can fill the rt6i_gateway field in result. This works even without this patchset. There is a single special case where dst is provided as skb_dst directly without a routing call: icmp6_dst_alloc(). It is a private dst allocated just for the particular ICMP packet. Patch 2 fills rt6i_gateway in this case, needed for the new rt6_nexthop() simplification. The last case is addrconf_dst_alloc(), it can put in FIB local/anycast routes when addresses are added. Patch 2 needs to fill rt6i_gateway in this case because such routes are returned without cloning. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>