summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-11-26virtio_ring: cache whether we will use DMA APITiwei Bie
Cache whether we will use DMA API, instead of doing the check every time. We are going to check whether DMA API is used more often in packed ring. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: extract split ring handling from ring creationTiwei Bie
Introduce a specific function to create the split ring. And also move the DMA allocation and size information to the .split sub-structure. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: allocate desc state for split ring separatelyTiwei Bie
Put the split ring's desc state into the .split sub-structure, and allocate desc state for split ring separately, this makes the code more readable and more consistent with what we will do for packed ring. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: introduce helper for indirect featureTiwei Bie
Introduce a helper to check whether we will use indirect feature. It will be used by packed ring too. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: introduce debug helpersTiwei Bie
Introduce debug helpers for last_add_time update, check and invalid. They will be used by packed ring too. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: put split ring fields in a sub structTiwei Bie
Put the split ring specific fields in a sub-struct named as "split" to avoid misuse after introducing packed ring. There is no functional change. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: put split ring functions togetherTiwei Bie
Put the xxx_split() functions together to make the code more readable and avoid misuse after introducing the packed ring. There is no functional change. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio_ring: add _split suffix for split ring functionsTiwei Bie
Add _split suffix for split ring specific functions. This is a preparation for introducing the packed ring support. There is no functional change. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26virtio: add packed ring types and macrosTiwei Bie
Add types and macros for packed ring. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26Merge branch 'libbpf-versioning-doc'Alexei Starovoitov
Andrey Ignatov says: ==================== This patch set adds ABI versioning and documentation to libbpf. Patch 1 renames btf_get_from_id to btf__get_from_id to follow naming convention. Patch 2 adds version script and has more details on ABI versioning. Patch 3 adds simple check that all global symbols are versioned. Patch 4 documents a few aspects of libbpf API and ABI in dev process. v1->v2: * add patch from Martin KaFai Lau <kafai@fb.com> to rename btf_get_from_id; * add documentation for libbpf API and ABI. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26libbpf: Document API and ABI conventionsAndrey Ignatov
Document API and ABI for libbpf: naming convention, symbol visibility, ABI versioning. This is just a starting point. Documentation can be significantly extended in the future to cover more topics. ABI versioning section touches only a few basic points with a link to more comprehensive documentation from Ulrich Drepper. This section can be extended in the future when there is better understanding what works well and what not so well in libbpf development process and production usage. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26libbpf: Verify versioned symbolsAndrey Ignatov
Since ABI versioning info is kept separately from the code it's easy to forget to update it while adding a new API. Add simple verification that all global symbols exported with LIBBPF_API are versioned in libbpf.map version script. The idea is to check that number of global symbols in libbpf-in.o, that is the input to the linker, matches with number of unique versioned symbols in libbpf.so, that is the output of the linker. If these numbers don't match, it may mean some symbol was not versioned and make will fail. "Unique" means that if a symbol is present in more than one version of ABI due to ABI changes, it'll be counted once. Another option to calculate number of global symbols in the "input" could be to count number of LIBBPF_ABI entries in C headers but it seems to be fragile. Example of output when a symbol is missing in version script: ... LD libbpf-in.o LINK libbpf.a LINK libbpf.so Warning: Num of global symbols in libbpf-in.o (115) does NOT match with num of versioned symbols in libbpf.so (114). Please make sure all LIBBPF_API symbols are versioned in libbpf.map. make: *** [check_abi] Error 1 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26libbpf: Add version script for DSOAndrey Ignatov
More and more projects use libbpf and one day it'll likely be packaged and distributed as DSO and that requires ABI versioning so that both compatible and incompatible changes to ABI can be introduced in a safe way in the future without breaking executables dynamically linked with a previous version of the library. Usual way to do ABI versioning is version script for the linker. Add such a script for libbpf. All global symbols currently exported via LIBBPF_API macro are added to the version script libbpf.map. The version name LIBBPF_0.0.1 is constructed from the name of the library + version specified by $(LIBBPF_VERSION) in Makefile. Version script does not duplicate the work done by LIBBPF_API macro, it rather complements it. The macro is used at compile time and can be used by compiler to do optimization that can't be done at link time, it is purely about global symbol visibility. The version script, in turn, is used at link time and takes care of ABI versioning. Both techniques are described in details in [1]. Whenever ABI is changed in the future, version script should be changed appropriately. [1] https://www.akkadia.org/drepper/dsohowto.pdf Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26libbpf: Name changing for btf_get_from_idMartin KaFai Lau
s/btf_get_from_id/btf__get_from_id/ to restore the API naming convention. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26xtensa: fix coprocessor part of ptrace_{get,set}xregsMax Filippov
Layout of coprocessor registers in the elf_xtregs_t and xtregs_coprocessor_t may be different due to alignment. Thus it is not always possible to copy data between the xtregs_coprocessor_t structure and the elf_xtregs_t and get correct values for all registers. Use a table of offsets and sizes of individual coprocessor register groups to do coprocessor context copying in the ptrace_getxregs and ptrace_setxregs. This fixes incorrect coprocessor register values reading from the user process by the native gdb on an xtensa core with multiple coprocessors and registers with high alignment requirements. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-11-26xtensa: fix coprocessor context offset definitionsMax Filippov
Coprocessor context offsets are used by the assembly code that moves coprocessor context between the individual fields of the thread_info::xtregs_cp structure and coprocessor registers. This fixes coprocessor context clobbering on flushing and reloading during normal user code execution and user process debugging in the presence of more than one coprocessor in the core configuration. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-11-26xtensa: enable coprocessors that are being flushedMax Filippov
coprocessor_flush_all may be called from a context of a thread that is different from the thread being flushed. In that case contents of the cpenable special register may not match ti->cpenable of the target thread, resulting in unhandled coprocessor exception in the kernel context. Set cpenable special register to the ti->cpenable of the target register for the duration of the flush and restore it afterwards. This fixes the following crash caused by coprocessor register inspection in native gdb: (gdb) p/x $w0 Illegal instruction in kernel: sig: 9 [#1] PREEMPT Call Trace: ___might_sleep+0x184/0x1a4 __might_sleep+0x41/0xac exit_signals+0x14/0x218 do_exit+0xc9/0x8b8 die+0x99/0xa0 do_illegal_instruction+0x18/0x6c common_exception+0x77/0x77 coprocessor_flush+0x16/0x3c arch_ptrace+0x46c/0x674 sys_ptrace+0x2ce/0x3b4 system_call+0x54/0x80 common_exception+0x77/0x77 note: gdb[100] exited with preempt_count 1 Killed Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-11-26ia64: export node_distance functionMatias Bjørling
The numa_slit variable used by node_distance is available to a module as long as it is linked at compile-time. However, it is not available to loadable modules. Leading to errors such as: ERROR: "numa_slit" [drivers/nvme/host/nvme-core.ko] undefined! The error above is caused by the nvme multipath code that makes use of node_distance for its path calculation. When the patch was added, the lightnvm subsystem would select nvme and always compile it in, leading to the node_distance call to always succeed. However, when this requirement was removed, nvme could be compiled in as a module, which exposed this bug. This patch extracts node_distance to a function and exports it. Since ACPI is depending on node_distance being a simple lookup to numa_slit, the previous behavior is exposed as slit_distance and its users updated. Fixes: f333444708f82 "nvme: take node locality into account when selecting a path" Fixes: 73569e11032f "lightnvm: remove dependencies on BLK_DEV_NVME and PCI" Signed-off-by: Matias Bjøring <mb@lightnvm.io> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-26bpf, doc: add entries of who looks over which jitsDaniel Borkmann
Make the high-level BPF JIT entry a general 'catch-all' and add architecture specific entries to make it more clear who actively maintains which BPF JIT compiler. The list (L) address implies that this eventually lands in the bpf patchwork bucket. Goal is that this set of responsible developers listed here is always up to date and a point of contact for helping out in e.g. feature development, fixes, review or testing patches in order to help long-term in ensuring quality of the BPF JITs and therefore BPF core under a given architecture. Every new JIT in future /must/ have an entry here as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Sandipan Das <sandipan@linux.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Paul Burton <paul.burton@mips.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26Merge branch 'non-jit-btf-func_info'Alexei Starovoitov
Yonghong Song says: ==================== Commit 838e96904ff3 ("bpf: Introduce bpf_func_info") added bpf func info support. The userspace is able to get better ksym's for bpf programs with jit, and is able to print out func prototypes. For a program containing func-to-func calls, the existing implementation returns user specified number of function calls and BTF types if jit is enabled. If the jit is not enabled, it only returns the type for the main function. This is undesirable. Interpreter may still be used and we should keep feature identical regardless of whether jit is enabled or not. This patch fixed this discrepancy. The following example shows bpftool output for the bpf program in selftests test_btf_haskv.o when jit is disabled: $ bpftool prog dump xlated id 1490 int _dummy_tracepoint(struct dummy_tracepoint_args * arg): 0: (85) call pc+2#__bpf_prog_run_args32 1: (b7) r0 = 0 2: (95) exit int test_long_fname_1(struct dummy_tracepoint_args * arg): 3: (85) call pc+1#__bpf_prog_run_args32 4: (95) exit int test_long_fname_2(struct dummy_tracepoint_args * arg): 5: (b7) r2 = 0 6: (63) *(u32 *)(r10 -4) = r2 7: (79) r1 = *(u64 *)(r1 +8) 8: (15) if r1 == 0x0 goto pc+9 9: (bf) r2 = r10 10: (07) r2 += -4 11: (18) r1 = map[id:1173] 13: (85) call bpf_map_lookup_elem#77088 14: (15) if r0 == 0x0 goto pc+3 15: (61) r1 = *(u32 *)(r0 +4) 16: (07) r1 += 1 17: (63) *(u32 *)(r0 +4) = r1 18: (95) exit $ bpftool prog dump jited id 1490 no instructions returned ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26tools/bpf: change selftest test_btf for both jit and non-jitYonghong Song
The selftest test_btf is changed to test both jit and non-jit. The test result should be the same regardless of whether jit is enabled or not. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf: btf: support proper non-jit func infoYonghong Song
Commit 838e96904ff3 ("bpf: Introduce bpf_func_info") added bpf func info support. The userspace is able to get better ksym's for bpf programs with jit, and is able to print out func prototypes. For a program containing func-to-func calls, the existing implementation returns user specified number of function calls and BTF types if jit is enabled. If the jit is not enabled, it only returns the type for the main function. This is undesirable. Interpreter may still be used and we should keep feature identical regardless of whether jit is enabled or not. This patch fixed this discrepancy. Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info") Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26sparc: Correct ctx->saw_frame_pointer logic.David Miller
We need to initialize the frame pointer register not just if it is seen as a source operand, but also if it is seen as the destination operand of a store or an atomic instruction (which effectively is a source operand). This is exercised by test_verifier's "non-invalid fp arithmetic" Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26sparc: Fix JIT fused branch convergance.David Miller
On T4 and later sparc64 cpus we can use the fused compare and branch instruction. However, it can only be used if the branch destination is in the range of a signed 10-bit immediate offset. This amounts to 1024 instructions forwards or backwards. After the commit referenced in the Fixes: tag, the largest possible size program seen by the JIT explodes by a significant factor. As a result of this convergance takes many more passes since the expanded "BPF_LDX | BPF_MSH | BPF_B" code sequence, for example, contains several embedded branch on condition instructions. On each pass, as suddenly new fused compare and branch instances become valid, this makes thousands more in range for the next pass. And so on and so forth. This is most greatly exemplified by "BPF_MAXINSNS: exec all MSH" which takes 35 passes to converge, and shrinks the image by about 64K. To decrease the cost of this number of convergance passes, do the convergance pass before we have the program image allocated, just like other JITs (such as x86) do. Fixes: e0cea7ce988c ("bpf: implement ld_abs/ld_ind in native bpf") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26Merge branch 'arm64-jit-fixes'Alexei Starovoitov
Daniel Borkmann says: ==================== This set contains a fix for arm64 BPF JIT. First patch generalizes ppc64 way of retrieving subprog into bpf_jit_get_func_addr() as core code and uses the same on arm64 in second patch. Tested on both arm64 and ppc64. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf, arm64: fix getting subprog addr from aux for callsDaniel Borkmann
The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF to BPF call offset can be too far away from core kernel in that relative encoding into imm is not sufficient and could potentially be truncated, see also fd045f6cd98e ("arm64: add support for module PLTs") which adds spill-over space for module_alloc() and therefore bpf_jit_binary_alloc(). Therefore, use the recently added bpf_jit_get_func_addr() helper for properly fetching the address through prog->aux->func[off]->bpf_func instead. This also has the benefit to optimize normal helper calls since their address can use the optimized emission. Tested on Cavium ThunderX CN8890. Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26bpf, ppc64: generalize fetching subprog into bpf_jit_get_func_addrDaniel Borkmann
Make fetching of the BPF call address from ppc64 JIT generic. ppc64 was using a slightly different variant rather than through the insns' imm field encoding as the target address would not fit into that space. Therefore, the target subprog number was encoded into the insns' offset and fetched through fp->aux->func[off]->bpf_func instead. Given there are other JITs with this issue and the mechanism of fetching the address is JIT-generic, move it into the core as a helper instead. On the JIT side, we get information on whether the retrieved address is a fixed one, that is, not changing through JIT passes, or a dynamic one. For the former, JITs can optimize their imm emission because this doesn't change jump offsets throughout JIT process. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-27netfilter: nf_conncount: remove wrong condition check routineTaehee Yoo
All lists that reach the tree_nodes_free() function have both zero counter and true dead flag. The reason for this is that lists to be release are selected by nf_conncount_gc_list() which already decrements the list counter and sets on the dead flag. Therefore, this if statement in tree_nodes_free() is unnecessary and wrong. Fixes: 31568ec09ea0 ("netfilter: nf_conncount: fix list_del corruption in conn_free") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: nat: fix double register in masquerade modulesTaehee Yoo
There is a reference counter to ensure that masquerade modules register notifiers only once. However, the existing reference counter approach is not safe, test commands are: while : do modprobe ip6t_MASQUERADE & modprobe nft_masq_ipv6 & modprobe -rv ip6t_MASQUERADE & modprobe -rv nft_masq_ipv6 & done numbers below represent the reference counter. -------------------------------------------------------- CPU0 CPU1 CPU2 CPU3 CPU4 [insmod] [insmod] [rmmod] [rmmod] [insmod] -------------------------------------------------------- 0->1 register 1->2 returns 2->1 returns 1->0 0->1 register <-- unregister -------------------------------------------------------- The unregistation of CPU3 should be processed before the registration of CPU4. In order to fix this, use a mutex instead of reference counter. splat looks like: [ 323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381] [ 323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n] [ 323.869574] irq event stamp: 194074 [ 323.898930] hardirqs last enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c [ 323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c [ 323.898930] softirqs last enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b [ 323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0 [ 323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27 [ 323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240 [ 323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488 [ 323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 [ 323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000 [ 323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0 [ 323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001 [ 323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000 [ 323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0 [ 323.898930] FS: 00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 323.898930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0 [ 323.898930] Call Trace: [ 323.898930] ? atomic_notifier_chain_register+0x2d0/0x2d0 [ 323.898930] ? down_read+0x150/0x150 [ 323.898930] ? sched_clock_cpu+0x126/0x170 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] register_netdevice_notifier+0xbb/0x790 [ 323.898930] ? __dev_close_many+0x2d0/0x2d0 [ 323.898930] ? __mutex_unlock_slowpath+0x17f/0x740 [ 323.898930] ? wait_for_completion+0x710/0x710 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 323.898930] ? up_write+0x6c/0x210 [ 323.898930] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables] [ 324.127073] nft_chain_filter_init+0x1e/0xe8a [nf_tables] [ 324.127073] nf_tables_module_init+0x37/0x92 [nf_tables] [ ... ] Fixes: 8dd33cc93ec9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables") Fixes: be6b635cd674 ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: add missing error handling code for register functionsTaehee Yoo
register_{netdevice/inetaddr/inet6addr}_notifier may return an error value, this patch adds the code to handle these error paths. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-27netfilter: ipv6: Preserve link scope traffic original oifAlin Nastac
When ip6_route_me_harder is invoked, it resets outgoing interface of: - link-local scoped packets sent by neighbor discovery - multicast packets sent by MLD host - multicast packets send by MLD proxy daemon that sets outgoing interface through IPV6_PKTINFO ipi6_ifindex Link-local and multicast packets must keep their original oif after ip6_route_me_harder is called. Signed-off-by: Alin Nastac <alin.nastac@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26bpf: Avoid unnecessary instruction in convert_bpf_ld_abs()David Miller
'offset' is constant and if it is zero, no need to subtract it from BPF_REG_TMP. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-11-26 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Extend BTF to support function call types and improve the BPF symbol handling with this info for kallsyms and bpftool program dump to make debugging easier, from Martin and Yonghong. 2) Optimize LPM lookups by making longest_prefix_match() handle multiple bytes at a time, from Eric. 3) Adds support for loading and attaching flow dissector BPF progs from bpftool, from Stanislav. 4) Extend the sk_lookup() helper to be supported from XDP, from Nitin. 5) Enable verifier to support narrow context loads with offset > 0 to adapt to LLVM code generation (currently only offset of 0 was supported). Add test cases as well, from Andrey. 6) Simplify passing device functions for offloaded BPF progs by adding callbacks to bpf_prog_offload_ops instead of ndo_bpf. Also convert nfp and netdevsim to make use of them, from Quentin. 7) Add support for sock_ops based BPF programs to send events to the perf ring-buffer through perf_event_output helper, from Sowmini and Daniel. 8) Add read / write support for skb->tstamp from tc BPF and cg BPF programs to allow for supporting rate-limiting in EDT qdiscs like fq from BPF side, from Vlad. 9) Extend libbpf API to support map in map types and add test cases for it as well to BPF kselftests, from Nikita. 10) Account the maximum packet offset accessed by a BPF program in the verifier and use it for optimizing nfp JIT, from Jiong. 11) Fix error handling regarding kprobe_events in BPF sample loader, from Daniel T. 12) Add support for queue and stack map type in bpftool, from David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26Merge tag 'hwmon-for-v4.20-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - fix temp4_type attribute permissions in w83795 driver - fix tacho fault detection in mlxreg-fan driver - fix current value calculations in ina2xx driver - fix initial notification/warning in raspberrypi driver - fix a NULL pointer access in ina2xx * tag 'hwmon-for-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (w83795) temp4_type has writable permission hwmon: (mlxreg-fan) Fix macros for tacho fault reading hwmon: (ina2xx) Fix current value calculation hwmon: (raspberrypi) Fix initial notify hwmon (ina2xx) Fix NULL id pointer in probe()
2018-11-26netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, tooFlorian Westphal
syzbot was able to trigger the WARN in cttimeout_default_get() by passing UDPLITE as l4protocol. Alias UDPLITE to UDP, both use same timeout values. Furthermore, also fetch GRE timeouts. GRE is a bit more complicated, as it still can be a module and its netns_proto_gre struct layout isn't visible outside of the gre module. Can't move timeouts around, it appears conntrack sysctl unregister assumes net_generic() returns nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead. A followup nf-next patch could make gre tracker be built-in as well if needed, its not that large. Last, make the WARN() mention the missing protocol value in case anything else is missing. Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com Fixes: 8866df9264a3 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notfXin Long
ip_vs_dst_event is supposed to clean up all dst used in ipvs' destinations when a net dev is going down. But it works only when the dst's dev is the same as the dev from the event. Now with the same priority but late registration, ip_vs_dst_notifier is always called later than ipv6_dev_notf where the dst's dev is set to lo for NETDEV_DOWN event. As the dst's dev lo is not the same as the dev from the event in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work. Also as these dst have to wait for dest_trash_timer to clean them up. It would cause some non-permanent kernel warnings: unregister_netdevice: waiting for br0 to become free. Usage count = 3 To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5. Note that for ipv4 route fib_netdev_notifier doesn't set dst's dev to lo in NETDEV_DOWN event, so this fix is only needed when IP_VS_IPV6 is defined. Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-11-25 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix an off-by-one bug when adjusting subprog start offsets after patching, from Edward. 2) Fix several bugs such as overflow in size allocation in queue / stack map creation, from Alexei. 3) Fix wrong IPv6 destination port byte order in bpf_sk_lookup_udp helper, from Andrey. 4) Fix several bugs in bpftool such as preventing an infinite loop in get_fdinfo, error handling and man page references, from Quentin. 5) Fix a warning in bpf_trace_printk() that wasn't catching an invalid format string, from Martynas. 6) Fix a bug in BPF cgroup local storage where non-atomic allocation was used in atomic context, from Roman. 7) Fix a NULL pointer dereference bug in bpftool from reallocarray() error handling, from Jakub and Wen. 8) Add a copy of pkt_cls.h and tc_bpf.h uapi headers to the tools include infrastructure so that bpftool compiles on older RHEL7-like user space which does not ship these headers, from Yonghong. 9) Fix BPF kselftests for user space where to get ping test working with ping6 and ping -6, from Li. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26bpf: align map type names formatting.David Calavera
Make the formatting for map_type_name array consistent. Signed-off-by: David Calavera <david.calavera@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26bpf: btf: fix spelling mistake "Memmber" -> "Member"Colin Ian King
There is a spelling mistake in a btf_verifier_log_member message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-26bpf, tags: Fix DEFINE_PER_CPU expansionRustam Kovhaev
Building tags produces warning: ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1" Let's use the same fix as in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), even though it violates the usual code style. Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-25Linux 4.20-rc4v4.20-rc4Linus Torvalds
2018-11-25net: remove unsafe skb_insert()Eric Dumazet
I do not see how one can effectively use skb_insert() without holding some kind of lock. Otherwise other cpus could have changed the list right before we have a chance of acquiring list->lock. Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this one probably meant to use __skb_insert() since it appears nesqp->pau_list is protected by nesqp->pau_lock. This looks like nesqp->pau_lock could be removed, since nesqp->pau_list.lock could be used instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Faisal Latif <faisal.latif@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma <linux-rdma@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25net: bridge: remove redundant checks for null p->dev and p->brColin Ian King
A recent change added a null check on p->dev after p->dev was being dereferenced by the ns_capable check on p->dev. It turns out that neither the p->dev and p->br null checks are necessary, and can be removed, which cleans up a static analyis warning. As Nikolay Aleksandrov noted, these checks can be removed because: "My reasoning of why it shouldn't be possible: - On port add new_nbp() sets both p->dev and p->br before creating kobj/sysfs - On port del (trickier) del_nbp() calls kobject_del() before call_rcu() to destroy the port which in turn calls sysfs_remove_dir() which uses kernfs_remove() which deactivates (shouldn't be able to open new files) and calls kernfs_drain() to drain current open/mmaped files in the respective dir before continuing, thus making it impossible to open a bridge port sysfs file with p->dev and p->br equal to NULL. So I think it's safe to remove those checks altogether. It'd be nice to get a second look over my reasoning as I might be missing something in sysfs/kernfs call path." Thanks to Nikolay Aleksandrov's suggestion to remove the check and David Miller for sanity checking this. Detected by CoverityScan, CID#751490 ("Dereference before null check") Fixes: a5f3ea54f3cc ("net: bridge: add support for raw sysfs port options") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25Merge branch 'r8169-xmit_more'David S. Miller
Heiner Kallweit says: ==================== r8169: make use of xmit_more and __netdev_sent_queue This series adds helper __netdev_sent_queue to the core and makes use of it in the r8169 driver. Heiner Kallweit (2): net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue r8169: make use of xmit_more and __netdev_sent_queue v2: - fix minor style issue ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25r8169: make use of xmit_more and __netdev_sent_queueHeiner Kallweit
Make use of xmit_more and add the functionality introduced with 3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()"). I used the mlx4 driver as template. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queueHeiner Kallweit
Similar to netdev_sent_queue add helper __netdev_sent_queue as variant of __netdev_tx_sent_queue. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25Merge tag 'kvm-ppc-fixes-4.20-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM fixes for 4.20 This has a single 1-line patch which fixes a bug in the recently-merged nested HV KVM support.
2018-11-25Merge tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: "Two dma-direct / swiotlb regressions fixes: - zero is a valid physical address on some arm boards, we can't use it as the error value - don't try to cache flush the error return value (no matter what it is)" * tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: Skip cache maintenance on map error dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB
2018-11-25Merge tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - Fix a NFSv4 state manager deadlock when returning a delegation - NFSv4.2 copy do not allocate memory under the lock - flexfiles: Use the correct stateid for IO in the tightly coupled case * tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: flexfiles: use per-mirror specified stateid for IO NFSv4.2 copy do not allocate memory under the lock NFSv4: Fix a NFSv4 state manager deadlock
2018-11-25MAINTAINERS: change Sparse's maintainerLuc Van Oostenryck
I'm taking over the maintainance of Sparse so add myself as maintainer and move Christopher's info to CREDITS. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>