summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-07-12test_sysctl: test against int proc_dointvec() array supportLuis R. Rodriguez
Add a few initial respective tests for an array: o Echoing values separated by spaces works o Echoing only first elements will set first elements o Confirm PAGE_SIZE limit still applies even if an array is used Link: http://lkml.kernel.org/r/20170630224431.17374-7-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12test_sysctl: add simple proc_douintvec() caseLuis R. Rodriguez
Test against a simple proc_douintvec() case. While at it, add a test against UINT_MAX. Make sure UINT_MAX works, and UINT_MAX+1 will fail and that negative values are not accepted. Link: http://lkml.kernel.org/r/20170630224431.17374-6-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12test_sysctl: add simple proc_dointvec() caseLuis R. Rodriguez
Test against a simple proc_dointvec() case. While at it, add a test against INT_MAX. Make sure INT_MAX works, and INT_MAX+1 will fail. Also test negative values work. Link: http://lkml.kernel.org/r/20170630224431.17374-5-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12test_sysctl: test against PAGE_SIZE for intLuis R. Rodriguez
Add the following tests to ensure we do not regress: o Test using a buffer full of space (PAGE_SIZE-1) followed by a single digit works o Test using a buffer full of spaces (PAGE_SIZE or over) will fail As tests increase instead of unloading the module and reloading it we can just do a shell reset_vals() with a reset to values we know are set at init on the driver. Link: http://lkml.kernel.org/r/20170630224431.17374-4-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12test_sysctl: add generic script to expand on testsLuis R. Rodriguez
This adds a generic script to let us more easily add more tests cases. Since we really have only two types of tests cases just fold them into the one file. Each test unit is now identified into its separate function: # ./sysctl.sh -l Test ID list: TEST_ID x NUM_TEST TEST_ID: Test ID NUM_TESTS: Number of recommended times to run the test 0001 x 1 - tests proc_dointvec_minmax() 0002 x 1 - tests proc_dostring() For now we start off with what we had before, and run only each test once. We can now watch a test case until it fails: ./sysctl.sh -w 0002 We can also run a test case x number of times, say we want to run a test case 100 times: ./sysctl.sh -c 0001 100 To run a test case only once, for example: ./sysctl.sh -s 0002 The default settings are specified at the top of sysctl.sh. Link: http://lkml.kernel.org/r/20170630224431.17374-3-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12test_sysctl: add dedicated proc sysctl test driverLuis R. Rodriguez
The existing tools/testing/selftests/sysctl/ tests include two test cases, but these use existing production kernel sysctl interfaces. We want to expand test coverage but we can't just be looking for random safe production values to poke at, that's just insane! Instead just dedicate a test driver for debugging purposes and port the existing scripts to use it. This will make it easier for further tests to be added. Subsequent patches will extend our test coverage for sysctl. The stress test driver uses a new license (GPL on Linux, copyleft-next outside of Linux). Linus was fine with this [0] and later due to Ted's and Alans's request ironed out an "or" language clause to use [1] which is already present upstream. [0] https://lkml.kernel.org/r/CA+55aFyhxcvD+q7tp+-yrSFDKfR0mOHgyEAe=f_94aKLsOu0Og@mail.gmail.com [1] https://lkml.kernel.org/r/1495234558.7848.122.camel@linux.intel.com Link: http://lkml.kernel.org/r/20170630224431.17374-2-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12sysctl: add unsigned int range supportLuis R. Rodriguez
To keep parity with regular int interfaces provide the an unsigned int proc_douintvec_minmax() which allows you to specify a range of allowed valid numbers. Adding proc_douintvec_minmax_sysadmin() is easy but we can wait for an actual user for that. Link: http://lkml.kernel.org/r/20170519033554.18592-6-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12sysctl: simplify unsigned int supportLuis R. Rodriguez
Commit e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields") added proc_douintvec() to start help adding support for unsigned int, this however was only half the work needed. Two fixes have come in since then for the following issues: o Printing the values shows a negative value, this happens since do_proc_dointvec() and this uses proc_put_long() This was fixed by commit 5380e5644afbba9 ("sysctl: don't print negative flag for proc_douintvec"). o We can easily wrap around the int values: UINT_MAX is 4294967295, if we echo in 4294967295 + 1 we end up with 0, using 4294967295 + 2 we end up with 1. o We echo negative values in and they are accepted This was fixed by commit 425fffd886ba ("sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec"). It still also failed to be added to sysctl_check_table()... instead of adding it with the current implementation just provide a proper and simplified unsigned int support without any array unsigned int support with no negative support at all. Historically sysctl proc helpers have supported arrays, due to the complexity this adds though we've taken a step back to evaluate array users to determine if its worth upkeeping for unsigned int. An evaluation using Coccinelle has been done to perform a grammatical search to ask ourselves: o How many sysctl proc_dointvec() (int) users exist which likely should be moved over to proc_douintvec() (unsigned int) ? Answer: about 8 - Of these how many are array users ? Answer: Probably only 1 o How many sysctl array users exist ? Answer: about 12 This last question gives us an idea just how popular arrays: they are not. Array support should probably just be kept for strings. The identified uint ports are: drivers/infiniband/core/ucma.c - max_backlog drivers/infiniband/core/iwcm.c - default_backlog net/core/sysctl_net_core.c - rps_sock_flow_sysctl() net/netfilter/nf_conntrack_timestamp.c - nf_conntrack_timestamp -- bool net/netfilter/nf_conntrack_acct.c nf_conntrack_acct -- bool net/netfilter/nf_conntrack_ecache.c - nf_conntrack_events -- bool net/netfilter/nf_conntrack_helper.c - nf_conntrack_helper -- bool net/phonet/sysctl.c proc_local_port_range() The only possible array users is proc_local_port_range() but it does not seem worth it to add array support just for this given the range support works just as well. Unsigned int support should be desirable more for when you *need* more than INT_MAX or using int min/max support then does not suffice for your ranges. If you forget and by mistake happen to register an unsigned int proc entry with an array, the driver will fail and you will get something as follows: sysctl table check failed: debug/test_sysctl//uint_0002 array now allowed CPU: 2 PID: 1342 Comm: modprobe Tainted: G W E <etc> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS <etc> Call Trace: dump_stack+0x63/0x81 __register_sysctl_table+0x350/0x650 ? kmem_cache_alloc_trace+0x107/0x240 __register_sysctl_paths+0x1b3/0x1e0 ? 0xffffffffc005f000 register_sysctl_table+0x1f/0x30 test_sysctl_init+0x10/0x1000 [test_sysctl] do_one_initcall+0x52/0x1a0 ? kmem_cache_alloc_trace+0x107/0x240 do_init_module+0x5f/0x200 load_module+0x1867/0x1bd0 ? __symbol_put+0x60/0x60 SYSC_finit_module+0xdf/0x110 SyS_finit_module+0xe/0x10 entry_SYSCALL_64_fastpath+0x1e/0xad RIP: 0033:0x7f042b22d119 <etc> Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields") Link: http://lkml.kernel.org/r/20170519033554.18592-5-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Suggested-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Liping Zhang <zlpnobody@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12sysctl: fold sysctl_writes_strict checks into helperLuis R. Rodriguez
The mode sysctl_writes_strict positional checks keep being copy and pasted as we add new proc handlers. Just add a helper to avoid code duplication. Link: http://lkml.kernel.org/r/20170519033554.18592-4-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Suggested-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12sysctl: kdoc'ify sysctl_writes_strictLuis R. Rodriguez
Document the different sysctl_writes_strict modes in code. Link: http://lkml.kernel.org/r/20170519033554.18592-3-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12sysctl: fix lax sysctl_check_table() sanity checkLuis R. Rodriguez
Patch series "sysctl: few fixes", v5. I've been working on making kmod more deterministic, and as I did that I couldn't help but notice a few issues with sysctl. My end goal was just to fix unsigned int support, which back then was completely broken. Liping Zhang has sent up small atomic fixes, however it still missed yet one more fix and Alexey Dobriyan had also suggested to just drop array support given its complexity. I have inspected array support using Coccinelle and indeed its not that popular, so if in fact we can avoid it for new interfaces, I agree its best. I did develop a sysctl stress driver but will hold that off for another series. This patch (of 5): Commit 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks") improved sanity checks considerbly, however the enhancements on sysctl_check_table() meant adding a functional change so that only the last table entry's sanity error is propagated. It also changed the way errors were propagated so that each new check reset the err value, this means only last sanity check computed is used for an error. This has been in the kernel since v3.4 days. Fix this by carrying on errors from previous checks and iterations as we traverse the table and ensuring we keep any error from previous checks. We keep iterating on the table even if an error is found so we can complain for all errors found in one shot. This works as -EINVAL is always returned on error anyway, and the check for error is any non-zero value. Fixes: 7c60c48f58a7 ("sysctl: Improve the sysctl sanity checks") Link: http://lkml.kernel.org/r/20170519033554.18592-2-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12kexec/kdump: minor Documentation updates for arm64 and ImageBharat Bhushan
Minor updates in Documentation for arm64 as relocatable kernel. Also this patch updates documentation for using uncompressed image "Image" which is used for ARM64. Link: http://lkml.kernel.org/r/1495104793-6563-1-git-send-email-Bharat.Bhushan@nxp.com Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Pratyush Anand <panand@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12kdump: protect vmcoreinfo data under the crash memoryXunlei Pang
Currently vmcoreinfo data is updated at boot time subsys_initcall(), it has the risk of being modified by some wrong code during system is running. As a result, vmcore dumped may contain the wrong vmcoreinfo. Later on, when using "crash", "makedumpfile", etc utility to parse this vmcore, we probably will get "Segmentation fault" or other unexpected errors. E.g. 1) wrong code overwrites vmcoreinfo_data; 2) further crashes the system; 3) trigger kdump, then we obviously will fail to recognize the crash context correctly due to the corrupted vmcoreinfo. Now except for vmcoreinfo, all the crash data is well protected(including the cpu note which is fully updated in the crash path, thus its correctness is guaranteed). Given that vmcoreinfo data is a large chunk prepared for kdump, we better protect it as well. To solve this, we relocate and copy vmcoreinfo_data to the crash memory when kdump is loading via kexec syscalls. Because the whole crash memory will be protected by existing arch_kexec_protect_crashkres() mechanism, we naturally protect vmcoreinfo_data from write(even read) access under kernel direct mapping after kdump is loaded. Since kdump is usually loaded at the very early stage after boot, we can trust the correctness of the vmcoreinfo data copied. On the other hand, we still need to operate the vmcoreinfo safe copy when crash happens to generate vmcoreinfo_note again, we rely on vmap() to map out a new kernel virtual address and update to use this new one instead in the following crash_save_vmcoreinfo(). BTW, we do not touch vmcoreinfo_note, because it will be fully updated using the protected vmcoreinfo_data after crash which is surely correct just like the cpu crash note. Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Juergen Gross <jgross@suse.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12powerpc/fadump: use the correct VMCOREINFO_NOTE_SIZE for phdrXunlei Pang
vmcoreinfo_max_size stands for the vmcoreinfo_data, the correct one we should use is vmcoreinfo_note whose total size is VMCOREINFO_NOTE_SIZE. Like explained in commit 77019967f06b ("kdump: fix exported size of vmcoreinfo note"), it should not affect the actual function, but we better fix it, also this change should be safe and backward compatible. After this, we can get rid of variable vmcoreinfo_max_size, let's use the corresponding macros directly, fewer variables means more safety for vmcoreinfo operation. [xlpang@redhat.com: fix build warning] Link: http://lkml.kernel.org/r/1494830606-27736-1-git-send-email-xlpang@redhat.com Link: http://lkml.kernel.org/r/1493281021-20737-2-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Juergen Gross <jgross@suse.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12kexec: move vmcoreinfo out of the kernel's .bss sectionXunlei Pang
As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section." This patch allocates extra pages for these vmcoreinfo_XXX variables, one advantage is that it enhances some safety of vmcoreinfo, because vmcoreinfo now is kept far away from other kernel data structures. Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12kernel/fork.c: virtually mapped stacks: do not disable interruptsChristoph Lameter
The reason to disable interrupts seems to be to avoid switching to a different processor while handling per cpu data using individual loads and stores. If we use per cpu RMV primitives we will not have to disable interrupts. Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12mm/memory.c: mark create_huge_pmd() inline to prevent build failureGeert Uytterhoeven
With gcc 4.1.2: mm/memory.o: In function `create_huge_pmd': memory.c:(.text+0x93e): undefined reference to `do_huge_pmd_anonymous_page' Interestingly, create_huge_pmd() is emitted in the assembler output, but never called. Converting transparent_hugepage_enabled() from a macro to a static inline function reduced the ability of the compiler to remove unused code. Fix this by marking create_huge_pmd() inline. Fixes: 16981d763501c0e0 ("mm: improve readability of transparent_hugepage_enabled()") Link: http://lkml.kernel.org/r/1499842660-10665-1-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12kernel.h: handle pointers to arrays better in container_of()Ian Abbott
If the first parameter of container_of() is a pointer to a non-const-qualified array type (and the third parameter names a non-const-qualified array member), the local variable __mptr will be defined with a const-qualified array type. In ISO C, these types are incompatible. They work as expected in GNU C, but some versions will issue warnings. For example, GCC 4.9 produces the warning "initialization from incompatible pointer type". Here is an example of where the problem occurs: ------------------------------------------------------- #include <linux/kernel.h> #include <linux/module.h> MODULE_LICENSE("GPL"); struct st { int a; char b[16]; }; static int __init example_init(void) { struct st t = { .a = 101, .b = "hello" }; char (*p)[16] = &t.b; struct st *x = container_of(p, struct st, b); printk(KERN_DEBUG "%p %p\n", (void *)&t, (void *)x); return 0; } static void __exit example_exit(void) { } module_init(example_init); module_exit(example_exit); ------------------------------------------------------- Building the module with gcc-4.9 results in these warnings (where '{m}' is the module source and '{k}' is the kernel source): ------------------------------------------------------- In file included from {m}/example.c:1:0: {m}/example.c: In function `example_init': {k}/include/linux/kernel.h:854:48: warning: initialization from incompatible pointer type const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ {m}/example.c:14:17: note: in expansion of macro `container_of' struct st *x = container_of(p, struct st, b); ^ {k}/include/linux/kernel.h:854:48: warning: (near initialization for `x') const typeof( ((type *)0)->member ) *__mptr = (ptr); \ ^ {m}/example.c:14:17: note: in expansion of macro `container_of' struct st *x = container_of(p, struct st, b); ^ ------------------------------------------------------- Replace the type checking performed by the macro to avoid these warnings. Make sure `*(ptr)` either has type compatible with the member, or has type compatible with `void`, ignoring qualifiers. Raise compiler errors if this is not true. This is stronger than the previous behaviour, which only resulted in compiler warnings for a type mismatch. [arnd@arndb.de: fix new warnings for container_of()] Link: http://lkml.kernel.org/r/20170620200940.90557-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170525120316.24473-7-abbotti@mev.co.uk Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12include/linux/dcache.h: use unsigned chars in struct name_snapshotStephen Rothwell
"kernel.h: handle pointers to arrays better in container_of()" triggers: In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from include/uapi/linux/posix_types.h:4, from include/uapi/linux/types.h:13, from include/linux/types.h:5, from include/linux/syscalls.h:71, from fs/dcache.c:17: fs/dcache.c: In function 'release_dentry_name_snapshot': include/linux/compiler.h:542:38: error: call to '__compiletime_assert_305' declared with attribute error: pointer type mismatch in container_of() _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:525:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^ include/linux/compiler.h:542:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/build_bug.h:46:37: note: in expansion of macro 'compiletime_assert' #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) ^ include/linux/kernel.h:860:2: note: in expansion of macro 'BUILD_BUG_ON_MSG' BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \ ^ fs/dcache.c:305:7: note: in expansion of macro 'container_of' p = container_of(name->name, struct external_name, name[0]); Switch name_snapshot to use unsigned chars, matching struct qstr and struct external_name. Link: http://lkml.kernel.org/r/20170710152134.0f78c1e6@canb.auug.org.au Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12kokr/memory-barriers.txt: Fix obsolete link to atomic_ops.txtSeongJae Park
Obsolete links to atomic_ops.txt exist in ko_KR/memory-barriers.txt though the file has moved to core-api/atomic_ops.rst. This commit fixes the obsolete links. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12memory-barriers.txt: Fix broken link to atomic_ops.txtSeongJae Park
Few obsolete links to atomic_ops.txt exist in memory-barriers.txt though the file has moved to core-api/atomic_ops.rst. This commit fixes the obsolete links. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12docs: Turn off section numbering for the input docsJonathan Corbet
The input docs enable section numbering at multiple levels, leading to a lot of bright-red "nested numbered toctree" warnings in newer Sphinx versions. Just take that directive out for now to help alleviate the global red-pixel shortage. Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12docs: Include uaccess docs from the right fileJonathan Corbet
Documentation/core-api/kernel-api.rst was including kerneldoc comments from arch/x86/include/asm/uaccess_32.h, but the relevant comments moved to .../uaccess.h some time ago. Correct the include to pick up the comments and eliminate a warning. Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-07-12net: stmmac: revert "support future possible different internal phy mode"LABBE Corentin
Since internal phy-mode is reserved for non-xMII protocol we cannot use it with dwmac-sun8i. Furthermore, all DT patchs which comes with this patch were cleaned, so the current state is broken. This reverts commit 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode") Fixes: 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode") Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12sfc: don't read beyond unicast address listBert Kenward
If we have more than 32 unicast MAC addresses assigned to an interface we will read beyond the end of the address table in the driver when adding filters. The next 256 entries store multicast addresses, so we will end up attempting to insert duplicate filters, which is mostly harmless. If we add more than 288 unicast addresses we will then read past the multicast address table, which is likely to be more exciting. Fixes: 12fb0da45c9a ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12Merge branch 'net-doc-fixes'David S. Miller
Stephen Hemminger says: ==================== minor net kernel-doc fixes Fix a couple of small errors in kernel-doc for networking ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12datagram: fix kernel-doc commentsstephen hemminger
An underscore in the kernel-doc comment section has special meaning and mis-use generates an errors. ./net/core/datagram.c:207: ERROR: Unknown target name: "msg". ./net/core/datagram.c:379: ERROR: Unknown target name: "msg". ./net/core/datagram.c:816: ERROR: Unknown target name: "t". Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12socket: add documentation for missing elementsstephen hemminger
Fill in missing kernel-doc for missing elements in struct sock. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12smsc911x: Add check for ioremap_nocache() return codeAlexey Khoroshilov
There is no check for return code of smsc911x_drv_probe() in smsc911x_drv_probe(). The patch adds one. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-12Merge branch 'for-rc' of ↵Rafael J. Wysocki
https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq Pull devfreq changes for v4.13 from MyungJoo Ham. * 'for-rc' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq: PM / devfreq: constify attribute_group structures. PM / devfreq: tegra: fix error return code in tegra_devfreq_probe() PM / devfreq: rk3399_dmc: fix error return code in rk3399_dmcfreq_probe()
2017-07-12Merge branch 'next' into for-linusDmitry Torokhov
Prepare second round of input updates for 4.13 merge window.
2017-07-12rtc: Remove wrong deprecation commentAlexandre Belloni
rtc_time_to_tm and rtc_tm_to_time are not deprecated and make perfect sense for RTCs that are simple 32bit counters. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2017-07-12PCI / PM: Restore PME Enable after config space restorationRafael J. Wysocki
Commit dc15e71eefc7 (PCI / PM: Restore PME Enable if skipping wakeup setup) introduced a mechanism by which the PME Enable bit can be restored by pci_enable_wake() if dev->wakeup_prepared is set in case it has been overwritten by PCI config space restoration. However, that commit overlooked the fact that on some systems (Dell XPS13 9360 in particular) the AML handling wakeup events checks PME Status and PME Enable and it won't trigger a Notify() for devices where those bits are not set while it is running. That happens during resume from suspend-to-idle when pci_restore_state() invoked by pci_pm_default_resume_early() clears PME Enable before the wakeup events are processed by AML, effectively causing those wakeup events to be ignored. Fix this issue by restoring the PME Enable configuration right after pci_restore_state() has been called instead of doing that in pci_enable_wake(). Fixes: dc15e71eefc7 (PCI / PM: Restore PME Enable if skipping wakeup setup) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2017-07-12platform/x86: silead_dmi: Add entry for Ployer Momo7w tablet touchscreenHans de Goede
This Ployer Momo7w revision has the same hardware as the Trekstor ST70416-6, so we re-use the surftab_wintron70_st70416_6_data. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-07-12KVM: trigger uevents when creating or destroying a VMClaudio Imbrenda
This patch adds a few lines to the KVM common code to fire a KOBJ_CHANGE uevent whenever a KVM VM is created or destroyed. The event carries five environment variables: CREATED indicates how many times a new VM has been created. It is useful for example to trigger specific actions when the first VM is started COUNT indicates how many VMs are currently active. This can be used for logging or monitoring purposes PID has the pid of the KVM process that has been started or stopped. This can be used to perform process-specific tuning. STATS_PATH contains the path in debugfs to the directory with all the runtime statistics for this VM. This is useful for performance monitoring and profiling. EVENT described the type of event, its value can be either "create" or "destroy" Specific udev rules can be then set up in userspace to deal with the creation or destruction of VMs as needed. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: SVM: Enable Virtual VMLOAD VMSAVE featureJanakarajan Natarajan
Enable the Virtual VMLOAD VMSAVE feature. This is done by setting bit 1 at position B8h in the vmcb. The processor must have nested paging enabled, be in 64-bit mode and have support for the Virtual VMLOAD VMSAVE feature for the bit to be set in the vmcb. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: SVM: Add Virtual VMLOAD VMSAVE feature definitionJanakarajan Natarajan
Define a new cpufeature definition for Virtual VMLOAD VMSAVE. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: SVM: Rename lbr_ctl field in the vmcb control areaJanakarajan Natarajan
Rename the lbr_ctl variable to better reflect the purpose of the field - provide support for virtualization extensions. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: SVM: Prepare for new bit definition in lbr_ctlJanakarajan Natarajan
The lbr_ctl variable in the vmcb control area is used to enable or disable Last Branch Record (LBR) virtualization. However, this is to be done using only bit 0 of the variable. To correct this and to prepare for a new feature, change the current usage to work only on a particular bit. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: SVM: handle singlestep exception when skipping emulated instructionsLadi Prosek
kvm_skip_emulated_instruction handles the singlestep debug exception which is something we almost always want. This commit (specifically the change in rdmsr_interception) makes the debug.flat KVM unit test pass on AMD. Two call sites still call skip_emulated_instruction directly: * In svm_queue_exception where it's used only for moving the rip forward * In task_switch_interception which is analogous to handle_task_switch in VMX Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: x86: take slots_lock in kvm_free_pitRadim Krčmář
kvm_vm_release() did not have slots_lock when calling kvm_io_bus_unregister_dev() and this went unnoticed until 4a12f9517728 ("KVM: mark kvm->busses as rcu protected") added dynamic checks. Luckily, there should be no race at that point: ============================= WARNING: suspicious RCU usage 4.12.0.kvm+ #0 Not tainted ----------------------------- ./include/linux/kvm_host.h:479 suspicious rcu_dereference_check() usage! lockdep_rcu_suspicious+0xc5/0x100 kvm_io_bus_unregister_dev+0x173/0x190 [kvm] kvm_free_pit+0x28/0x80 [kvm] kvm_arch_sync_events+0x2d/0x30 [kvm] kvm_put_kvm+0xa7/0x2a0 [kvm] kvm_vm_release+0x21/0x30 [kvm] Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definitionGleb Fotengauer-Malinovskiy
In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE), it also writes back a return value (which constitutes _IOC_READ) making this an _IOWR ioctl instead of _IOW. Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes") Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12kvm: vmx: Properly handle machine check during VM-entryJim Mattson
vmx_complete_atomic_exit should call kvm_machine_check for any VM-entry failure due to a machine-check event. Such an exit should be recognized solely by its basic exit reason (i.e. the low 16 bits of the VMCS exit reason field). None of the other VMCS exit information fields contain valid information when the VM-exit is due to "VM-entry failure due to machine-check event". Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com> [Changed VM_EXIT_INTR_INFO condition to better describe its reason.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12KVM: x86: update master clock before computing kvmclock_offsetRadim Krčmář
kvm master clock usually has a different frequency than the kernel boot clock. This is not a problem until the master clock is updated; update uses the current kernel boot clock to compute new kvm clock, which erases any kvm clock cycles that might have built up due to frequency difference over a long period. KVM_SET_CLOCK is one of places where we can safely update master clock as the guest-visible clock is going to be shifted anyway. The problem with current code is that it updates the kvm master clock after updating the offset. If the master clock was enabled before calling KVM_SET_CLOCK, then it might have built up a significant delta from kernel boot clock. In the worst case, the time set by userspace would be shifted by so much that it couldn't have been set at any point during KVM_SET_CLOCK. To fix this, move kvm_gen_update_masterclock() before computing kvmclock_offset, which means that the master clock and kernel boot clock will be sufficiently close together. Another solution would be to replace get_kvmclock_ns() with "ktime_get_boot_ns() + ka->kvmclock_offset", which is marginally more accurate, but would break symmetry with KVM_GET_CLOCK. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-07-12nfsd4: factor ctime into change attributeJ. Bruce Fields
Factoring ctime into the nfsv4 change attribute gives us better properties than just i_version alone. Eventually we'll likely also expose this (as opposed to raw i_version) to userspace, at which point we'll want to move it to a common helper, called from either userspace or individual filesystems. For now, nfsd is the only user. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir fieldChuck Lever
Clean up: No need to save the I/O direction. The functions that release svc_rdma_chunk_ctxt already know what direction to use. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: use offset_in_page() macroChuck Lever
Clean up: Use offset_in_page() macro instead of open-coding. Reported-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw APIChuck Lever
Clean up: Registration mode details are now handled by the rdma_rw API, and thus can be removed from svcrdma. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Clean-up svc_rdma_unmap_dmaChuck Lever
There's no longer a need to compare each SGE's lkey with the PD's local_dma_lkey. Now that FRWR is gone, all DMA mappings are for pages that were registered with this key. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-07-12svcrdma: Remove frmr cacheChuck Lever
Clean up: Now that the svc_rdma_recvfrom path uses the rdma_rw API, the details of Read sink buffer registration are dealt with by the kernel's RDMA core. This cache is no longer used, and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>