summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-05-03selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()Petr Machata
Add two helper functions: sysctl_set() to change the value of a given sysctl setting, and sysctl_restore() to change it back to what it was. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"Colin Ian King
Trivial fix to spelling mistake in netdev_err error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03Merge branch 'selftests-forwarding-Two-enhancements'David S. Miller
Ido Schimmel says: ==================== selftests: forwarding: Two enhancements First patch increases the maximum deviation in the multipath tests which proved to be too low in some cases. Second patch allows user to run only specific tests from each file using the TESTS environment variable. This granularity is needed in setups where not all the tests can pass. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03selftests: forwarding: Allow running specific testsIdo Schimmel
Similar to commit a511858c7536 ("selftests: fib_tests: Allow user to run a specific test"), allow user to run only a subset of the tests using the TESTS environment variable. This is useful when not all the tests can pass on a given system. Example: # export TESTS="ping_ipv4 ping_ipv6" # ./bridge_vlan_aware.sh TEST: ping [PASS] TEST: ping6 [PASS] Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03selftests: forwarding: Increase maximum deviation in multipath testIdo Schimmel
We sometimes observe failures in the test due to too large discrepancy between the measured and expected ratios. For example: TEST: ECMP [FAIL] Too large discrepancy between expected and measured ratios INFO: Expected ratio 1.00 Measured ratio 1.11 Fix this by allowing an up to 15% deviation between both ratios. Another possibility is to increase the number of generated flows, but this will prolong the execution time of the test, which is already quite high. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03cxgb4: update latest firmware version supportedGanesh Goudar
Change t4fw_version.h to update latest firmware version number to 1.19.1.0. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03Merge tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fix from Christoph Hellwig: "Fix an incorrect warning selection introduced in the last merge window" * tag 'dma-mapping-4.17-4' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix inversed DMA_ATTR_NO_WARN test
2018-05-03bpf, x86_32: add eBPF JIT compiler for ia32Wang YanQing
The JIT compiler emits ia32 bit instructions. Currently, It supports eBPF only. Classic BPF is supported because of the conversion by BPF core. Almost all instructions from eBPF ISA supported except the following: BPF_ALU64 | BPF_DIV | BPF_K BPF_ALU64 | BPF_DIV | BPF_X BPF_ALU64 | BPF_MOD | BPF_K BPF_ALU64 | BPF_MOD | BPF_X BPF_STX | BPF_XADD | BPF_W BPF_STX | BPF_XADD | BPF_DW It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment. IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI. I use EAX|EDX|ECX|EBX as temporary registers to simulate instructions in eBPF ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding, all others eBPF registers, R0-R10, are simulated through scratch space on stack. The reasons behind the hardware registers allocation policy are: 1:MUL need EAX:EDX, shift operation need ECX, so they aren't fit for general eBPF 64bit register simulation. 2:We need at least 4 registers to simulate most eBPF ISA operations on registers operands instead of on register&memory operands. 3:We need to put BPF_REG_AX on hardware registers, or constant blinding will degrade jit performance heavily. Tested on PC (Intel(R) Core(TM) i5-5200U CPU). Testing results on i5-5200U: 1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed] 2) test_progs: Summary: 83 PASSED, 0 FAILED. 3) test_lpm: OK 4) test_lru_map: OK 5) test_verifier: Summary: 828 PASSED, 0 FAILED. Above tests are all done in following two conditions separately: 1:bpf_jit_enable=1 and bpf_jit_harden=0 2:bpf_jit_enable=1 and bpf_jit_harden=2 Below are some numbers for this jit implementation: Note: I run test_progs in kselftest 100 times continuously for every condition, the numbers are in format: total/times=avg. The numbers that test_bpf reports show almost the same relation. a:jit_enable=0 and jit_harden=0 b:jit_enable=1 and jit_harden=0 test_pkt_access:PASS:ipv4:15622/100=156 test_pkt_access:PASS:ipv4:10674/100=106 test_pkt_access:PASS:ipv6:9130/100=91 test_pkt_access:PASS:ipv6:4855/100=48 test_xdp:PASS:ipv4:240198/100=2401 test_xdp:PASS:ipv4:138912/100=1389 test_xdp:PASS:ipv6:137326/100=1373 test_xdp:PASS:ipv6:68542/100=685 test_l4lb:PASS:ipv4:61100/100=611 test_l4lb:PASS:ipv4:37302/100=373 test_l4lb:PASS:ipv6:101000/100=1010 test_l4lb:PASS:ipv6:55030/100=550 c:jit_enable=1 and jit_harden=2 test_pkt_access:PASS:ipv4:10558/100=105 test_pkt_access:PASS:ipv6:5092/100=50 test_xdp:PASS:ipv4:131902/100=1319 test_xdp:PASS:ipv6:77932/100=779 test_l4lb:PASS:ipv4:38924/100=389 test_l4lb:PASS:ipv6:57520/100=575 The numbers show we get 30%~50% improvement. See Documentation/networking/filter.txt for more information. Changelog: Changes v5-v6: 1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for consistence reason. 2:Clean up non-standard comments, reported by Daniel Borkmann. 3:Fix a memory leak issue, repoted by Daniel Borkmann. Changes v4-v5: 1:Delete is_on_stack, BPF_REG_AX is the only one on real hardware registers, so just check with it. 2:Apply commit 1612a981b766 ("bpf, x64: fix JIT emission for dead code"), suggested by Daniel Borkmann. Changes v3-v4: 1:Fix changelog in commit. I install llvm-6.0, then test_progs willn't report errors. I submit another patch: "bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform" to fix another problem, after that patch, test_verifier willn't report errors too. 2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation. Changes v2-v3: 1:Move BPF_REG_AX to real hardware registers for performance reason. 3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann. 4:Delete partial codes in 1c2a088a6626, suggested by Daniel Borkmann. 5:Some bug fixes and comments improvement. Changes v1-v2: 1:Fix bug in emit_ia32_neg64. 2:Fix bug in emit_ia32_arsh_r64. 3:Delete filename in top level comment, suggested by Thomas Gleixner. 4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner. 5:Rewrite some words in changelog. 6:CodingSytle improvement and a little more comments. Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-03tcp: restore autocorkingEric Dumazet
When adding rb-tree for TCP retransmit queue, we inadvertently broke TCP autocorking. tcp_should_autocork() should really check if the rtx queue is not empty. Tested: Before the fix : $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 540000 262144 500 10.00 2682.85 2.47 1.59 3.618 2.329 TcpExtTCPAutoCorking 33 0.0 // Same test, but forcing TCP_NODELAY $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 540000 262144 500 10.00 1408.75 2.44 2.96 6.802 8.259 TcpExtTCPAutoCorking 1 0.0 After the fix : $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 540000 262144 500 10.00 5472.46 2.45 1.43 1.761 1.027 TcpExtTCPAutoCorking 361293 0.0 // With TCP_NODELAY option $ nstat -n;./netperf -H 10.246.7.152 -Cc -- -D -m 500;nstat | grep AutoCork MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET : nodelay Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 540000 262144 500 10.00 5454.96 2.46 1.63 1.775 1.174 TcpExtTCPAutoCorking 315448 0.0 Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Michael Wenig <mwenig@vmware.com> Tested-by: Michael Wenig <mwenig@vmware.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Michael Wenig <mwenig@vmware.com> Tested-by: Michael Wenig <mwenig@vmware.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03ip6_gre: correct the function name in ip6gre_tnl_addr_conflict() commentSun Lianwen
The function name is wrong in ip6gre_tnl_addr_conflict() comment, which use ip6_tnl_addr_conflict instead of ip6gre_tnl_addr_conflict. Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03rds: do not leak kernel memory to user landEric Dumazet
syzbot/KMSAN reported an uninit-value in put_cmsg(), originating from rds_cmsg_recv(). Simply clear the structure, since we have holes there, or since rx_traces might be smaller than RDS_MSG_RX_DGRAM_TRACE_MAX. BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline] BUG: KMSAN: uninit-value in put_cmsg+0x600/0x870 net/core/scm.c:242 CPU: 0 PID: 4459 Comm: syz-executor582 Not tainted 4.16.0+ #87 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157 kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199 copy_to_user include/linux/uaccess.h:184 [inline] put_cmsg+0x600/0x870 net/core/scm.c:242 rds_cmsg_recv net/rds/recv.c:570 [inline] rds_recvmsg+0x2db5/0x3170 net/rds/recv.c:657 sock_recvmsg_nosec net/socket.c:803 [inline] sock_recvmsg+0x1d0/0x230 net/socket.c:810 ___sys_recvmsg+0x3fb/0x810 net/socket.c:2205 __sys_recvmsg net/socket.c:2250 [inline] SYSC_recvmsg+0x298/0x3c0 net/socket.c:2262 SyS_recvmsg+0x54/0x80 net/socket.c:2257 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 3289025aedc0 ("RDS: add receive message trace used by application") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: linux-rdma <linux-rdma@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03qmi_wwan: do not steal interfaces from class driversBjørn Mork
The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that the { vendorid, productid, interfacenumber } set uniquely identifies one specific function. This has proven to fail for some configurable devices. One example is the Quectel EM06/EP06 where the same interface number can be either QMI or MBIM, without the device ID changing either. Fix by requiring the vendor-specific class for interface number based matching. Functions of other classes can and should use class based matching instead. Fixes: 03304bcb5ec4 ("net: qmi_wwan: use fixed interface number matching") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03Merge branch 'act_csum-get_fill_size'David S. Miller
Craig Dillabaugh says: ==================== Update csum tc action for batch operation. This patchset includes two patches the first updating act_csum.c to include the get_fill_size routine required for batch operation, and the second including updated TDC tests for the feature. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03tc-testing: Updated csum action tests batch create w/wo cookies.Craig Dillabaugh
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-03net sched: Implemented get_fill_size routine for act_csum.Craig Dillabaugh
Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02Merge tag 'trace-v4.17-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fixes in tracing: - Tracepoints should not give warning on OOM failures - Use special field for function pointer in trace event - Fix igrab issues in uprobes - Fixes to the new histogram triggers" * tag 'trace-v4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracepoint: Do not warn on ENOMEM tracing: Add field modifier parsing hist error for hist triggers tracing: Add field parsing hist error for hist triggers tracing: Restore proper field flag printing when displaying triggers tracing: initcall: Ordered comparison of function pointers tracing: Remove igrab() iput() call from uprobes.c tracing: Fix bad use of igrab in trace_uprobe.c
2018-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "Just a few driver fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: atmel_mxt_ts - add missing compatible strings to OF device table Input: atmel_mxt_ts - fix the firmware update Input: atmel_mxt_ts - add touchpad button mapping for Samsung Chromebook Pro MAINTAINERS: Rakesh Iyer can't be reached anymore Input: hideep_ts - fix a typo in Kconfig Input: alps - fix reporting pressure of v3 trackstick Input: leds - fix out of bound access Input: synaptics-rmi4 - fix an unchecked out of memory error path
2018-05-02ipv4: fix fnhe usage by non-cached routesJulian Anastasov
Allow some non-cached routes to use non-expired fnhe: 1. ip_del_fnhe: moved above and now called by find_exception. The 4.5+ commit deed49df7390 expires fnhe only when caching routes. Change that to: 1.1. use fnhe for non-cached local output routes, with the help from (2) 1.2. allow __mkroute_input to detect expired fnhe (outdated fnhe_gw, for example) when do_cache is false, eg. when itag!=0 for unicast destinations. 2. __mkroute_output: keep fi to allow local routes with orig_oif != 0 to use fnhe info even when the new route will not be cached into fnhe. After commit 839da4d98960 ("net: ipv4: set orig_oif based on fib result for local traffic") it means all local routes will be affected because they are not cached. This change is used to solve a PMTU problem with IPVS (and probably Netfilter DNAT) setups that redirect local clients from target local IP (local route to Virtual IP) to new remote IP target, eg. IPVS TUN real server. Loopback has 64K MTU and we need to create fnhe on the local route that will keep the reduced PMTU for the Virtual IP. Without this change fnhe_pmtu is updated from ICMP but never exposed to non-cached local routes. This includes routes with flowi4_oif!=0 for 4.6+ and with flowi4_oif=any for 4.14+). 3. update_or_create_fnhe: make sure fnhe_expires is not 0 for new entries Fixes: 839da4d98960 ("net: ipv4: set orig_oif based on fib result for local traffic") Fixes: d6d5e999e5df ("route: do not cache fib route info on local routes with oif") Fixes: deed49df7390 ("route: check and remove route cache when we get route") Cc: David Ahern <dsahern@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small bug fixes: an illegally overlapping memcmp in target code, a potential infinite loop in isci under certain rare phy conditions and an ATA queue depth (performance) correction for storvsc" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: Fix fortify_panic kernel exception scsi: isci: Fix infinite loop in while loop scsi: storvsc: Set up correct queue depth values for IDE devices
2018-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-05-03 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Several BPF sockmap fixes mostly related to bugs in error path handling, that is, a bug in updating the scatterlist length / offset accounting, a missing sk_mem_uncharge() in redirect error handling, and a bug where the outstanding bytes counter sg_size was not zeroed, from John. 2) Fix two memory leaks in the x86-64 BPF JIT, one in an error path where we still don't converge after image was allocated and another one where BPF calls are used and JIT passes don't converge, from Daniel. 3) Minor fix in BPF selftests where in test_stacktrace_build_id() we drop useless args in urandom_read and we need to add a missing newline in a CHECK() error message, from Song. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02Merge branch 'bpf-sockmap-fixes'Alexei Starovoitov
John Fastabend says: ==================== When I added the test_sockmap to selftests I mistakenly changed the test logic a bit. The result of this was on redirect cases we ended up choosing the wrong sock from the BPF program and ended up sending to a socket that had no receive handler. The result was the actual receive handler, running on a different socket, is timing out and closing the socket. This results in errors (-EPIPE to be specific) on the sending side. Typically happening if the sender does not complete the send before the receive side times out. So depending on timing and the size of the send we may get errors. This exposed some bugs in the sockmap error path handling. This series fixes the errors. The primary issue is we did not do proper memory accounting in these cases which resulted in missing a sk_mem_uncharge(). This happened in the redirect path and in one case on the normal send path. See the three patches for the details. The other take-away from this is we need to fix the test_sockmap and also add more negative test cases. That will happen in bpf-next. Finally, I tested this using the existing test_sockmap program, the older sockmap sample test script, and a few real use cases with Cilium. All of these seem to be in working correctly. v2: fix compiler warning, drop iterator variable 'i' that is no longer used in patch 3. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf: sockmap, fix error handling in redirect failuresJohn Fastabend
When a redirect failure happens we release the buffers in-flight without calling a sk_mem_uncharge(), the uncharge is called before dropping the sock lock for the redirecte, however we missed updating the ring start index. When no apply actions are in progress this is OK because we uncharge the entire buffer before the redirect. But, when we have apply logic running its possible that only a portion of the buffer is being redirected. In this case we only do memory accounting for the buffer slice being redirected and expect to be able to loop over the BPF program again and/or if a sock is closed uncharge the memory at sock destruct time. With an invalid start index however the program logic looks at the start pointer index, checks the length, and when seeing the length is zero (from the initial release and failure to update the pointer) aborts without uncharging/releasing the remaining memory. The fix for this is simply to update the start index. To avoid fixing this error in two locations we do a small refactor and remove one case where it is open-coded. Then fix it in the single function. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf: sockmap, zero sg_size on error when buffer is releasedJohn Fastabend
When an error occurs during a redirect we have two cases that need to be handled (i) we have a cork'ed buffer (ii) we have a normal sendmsg buffer. In the cork'ed buffer case we don't currently support recovering from errors in a redirect action. So the buffer is released and the error should _not_ be pushed back to the caller of sendmsg/sendpage. The rationale here is the user will get an error that relates to old data that may have been sent by some arbitrary thread on that sock. Instead we simple consume the data and tell the user that the data has been consumed. We may add proper error recovery in the future. However, this patch fixes a bug where the bytes outstanding counter sg_size was not zeroed. This could result in a case where if the user has both a cork'ed action and apply action in progress we may incorrectly call into the BPF program when the user expected an old verdict to be applied via the apply action. I don't have a use case where using apply and cork at the same time is valid but we never explicitly reject it because it should work fine. This patch ensures the sg_size is zeroed so we don't have this case. In the normal sendmsg buffer case (no cork data) we also do not zero sg_size. Again this can confuse the apply logic when the logic calls into the BPF program when the BPF programmer expected the old verdict to remain. So ensure we set sg_size to zero here as well. And additionally to keep the psock state in-sync with the sk_msg_buff release all the memory as well. Previously we did this before returning to the user but this left a gap where psock and sk_msg_buff states were out of sync which seems fragile. No additional overhead is taken here except for a call to check the length and realize its already been freed. This is in the error path as well so in my opinion lets have robust code over optimized error paths. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf: sockmap, fix scatterlist update on error path in send with applyJohn Fastabend
When the call to do_tcp_sendpage() fails to send the complete block requested we either retry if only a partial send was completed or abort if we receive a error less than or equal to zero. Before returning though we must update the scatterlist length/offset to account for any partial send completed. Before this patch we did this at the end of the retry loop, but this was buggy when used while applying a verdict to fewer bytes than in the scatterlist. When the scatterlist length was being set we forgot to account for the apply logic reducing the size variable. So the result was we chopped off some bytes in the scatterlist without doing proper cleanup on them. This results in a WARNING when the sock is tore down because the bytes have previously been charged to the socket but are never uncharged. The simple fix is to simply do the accounting inside the retry loop subtracting from the absolute scatterlist values rather than trying to accumulate the totals and subtract at the end. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02net_sched: fq: take care of throttled flows before reuseEric Dumazet
Normally, a socket can not be freed/reused unless all its TX packets left qdisc and were TX-completed. However connect(AF_UNSPEC) allows this to happen. With commit fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for reused flows") we cleared f->time_next_packet but took no special action if the flow was still in the throttled rb-tree. Since f->time_next_packet is the key used in the rb-tree searches, blindly clearing it might break rb-tree integrity. We need to make sure the flow is no longer in the rb-tree to avoid this problem. Fixes: fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for reused flows") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"Ido Schimmel
This reverts commit edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6"). Eric reported a division by zero in rt6_multipath_rebalance() which is caused by above commit that considers identical local routes to be siblings. The division by zero happens because a nexthop weight is not set for local routes. Revert the commit as it does not fix a bug and has side effects. To reproduce: # ip -6 address add 2001:db8::1/64 dev dummy0 # ip -6 address add 2001:db8::1/64 dev dummy1 Fixes: edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02Merge branch 'r8169-series-with-further-improvements'David S. Miller
Heiner Kallweit says: ==================== r8169: series with further improvements I thought I'm more or less done with the basic refactoring. But again I stumbled across things that can be improved / simplified. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: replace get_protocol with vlan_get_protocolHeiner Kallweit
This patch is basically the same as 6e74d1749a33 ("r8152: replace get_protocol with vlan_get_protocol"). Use vlan_get_protocol instead of duplicating the functionality. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: avoid potentially misaligned access when getting mac addressHeiner Kallweit
Interpreting a member of an u16 array as u32 may result in a misaligned access. Also it's not really intuitive to define a mac address variable as array of three u16 words. Therefore use an array of six bytes that is properly aligned for 32 bit access. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: improve PCI config space accessHeiner Kallweit
Some chips have a non-zero function id, however instead of hardcoding the id's (CSIAR_FUNC_NIC and CSIAR_FUNC_NIC2) we can get them dynamically via PCI_FUNC(pci_dev->devfn). This way we can get rid of the csi_ops. In general csi is just a fallback mechanism for PCI config space access in case no native access is supported. Therefore let's try native access first. I checked with Realtek regarding the functionality of config space byte 0x070f and according to them it controls the L0s/L1 entrance latency. Currently ASPM is disabled in general and therefore this value isn't used. However we may introduce a whitelist for chips where ASPM is known to work, therefore let's keep this code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: drop rtl_generic_opHeiner Kallweit
Only two places are left where rtl_generic_op() is used, so we can inline it and simplify the code a little. This change also avoids the overhead of unlocking/locking in case the respective operation isn't set. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: replace longer if statements with switch statementsHeiner Kallweit
Some longer if statements can be simplified by using switch statements instead. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: simplify code by using ranges in switch clausesHeiner Kallweit
Several switch statements can be significantly simplified by using case ranges. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: drop member pll_power_ops from struct rtl8169_privateHeiner Kallweit
After merging r810x_pll_power_down/up and r8168_pll_power_down/up we don't need member pll_power_ops any longer and can drop it, thus simplifying the code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: merge r810x_pll_power_down/up into r8168_pll_power_down/upHeiner Kallweit
r810x_pll_power_down/up and r8168_pll_power_down/up have a lot in common, so we can simplify the code by merging the former into the latter. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: remove 810x_phy_power_up/downHeiner Kallweit
The functionality of 810x_phy_power_up/down is covered by the default clause in 8168_phy_power_up/down. Therefore we don't need these functions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02r8169: remove unneeded check in r8168_pll_power_downHeiner Kallweit
RTL_GIGA_MAC_VER_23/24 are configured by rtl_hw_start_8168cp_2() and rtl_hw_start_8168cp_3() respectively which both apply CPCMD_QUIRK_MASK, thus clearing bit ASF. Bit ASF isn't set at any other place in the driver, therefore this check can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02parisc: Fix section mismatchesHelge Deller
Fix three section mismatches: 1) Section mismatch in reference from the function ioread8() to the function .init.text:pcibios_init_bridge() 2) Section mismatch in reference from the function free_initmem() to the function .init.text:map_pages() 3) Section mismatch in reference from the function ccio_ioc_init() to the function .init.text:count_parisc_driver() Signed-off-by: Helge Deller <deller@gmx.de>
2018-05-02parisc: drivers.c: Fix section mismatchesHelge Deller
Fix two section mismatches in drivers.c: 1) Section mismatch in reference from the function alloc_tree_node() to the function .init.text:create_tree_node(). 2) Section mismatch in reference from the function walk_native_bus() to the function .init.text:alloc_pa_dev(). Signed-off-by: Helge Deller <deller@gmx.de>
2018-05-02Merge branch 'x86-bpf-jit-fixes'Alexei Starovoitov
Daniel Borkmann says: ==================== Fix two memory leaks in x86 JIT. For details, please see individual patches in this series. Thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf, x64: fix memleak when not converging on callsDaniel Borkmann
The JIT logic in jit_subprogs() is as follows: for all subprogs we allocate a bpf_prog_alloc(), populate it (prog->is_func = 1 here), and pass it to bpf_int_jit_compile(). If a failure occurred during JIT and prog->jited is not set, then we bail out from attempting to JIT the whole program, and punt to the interpreter instead. In case JITing went successful, we fixup BPF call offsets and do another pass to bpf_int_jit_compile() (extra_pass is true at that point) to complete JITing calls. Given that requires to pass JIT context around addrs and jit_data from x86 JIT are freed in the extra_pass in bpf_int_jit_compile() when calls are involved (if not, they can be freed immediately). However, if in the original pass, the JIT image didn't converge then we leak addrs and jit_data since image itself is NULL, the prog->is_func is set and extra_pass is false in that case, meaning both will become unreachable and are never cleaned up, therefore we need to free as well on !image. Only x64 JIT is affected. Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02bpf, x64: fix memleak when not converging after imageDaniel Borkmann
While reviewing x64 JIT code, I noticed that we leak the prior allocated JIT image in the case where proglen != oldproglen during the JIT passes. Prior to the commit e0ee9c12157d ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") we would just break out of the loop, and using the image as the JITed prog since it could only shrink in size anyway. After e0ee9c12157d, we would bail out to out_addrs label where we free addrs and jit_data but not the image coming from bpf_jit_binary_alloc(). Fixes: e0ee9c12157d ("x86: bpf_jit: fix two bugs in eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-02Merge branch 'net-smc-small-features'David S. Miller
Ursula Braun says: ==================== net/smc: small features 2018/04/30 here are 4 smc patches for net-next covering small new features in different areas: * link health check * diagnostics for IPv6 smc sockets * ioctl * improvement for vlan determination v2 changes: * better title * patch 2 - remove compile problem for disabled CONFIG_IPV6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02net/smc: determine vlan_id of stacked net_deviceUrsula Braun
An SMC link group is bound to a specific vlan_id. Its link uses the RoCE-GIDs established for the specific vlan_id. This patch makes sure the appropriate vlan_id is determined for stacked scenarios like for instance a master bonding device with vlan devices enslaved. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSDUrsula Braun
SIOCINQ returns the amount of unread data in the RMB. SIOCOUTQ returns the amount of unsent or unacked sent data in the send buffer. SIOCOUTQNSD returns the amount of data prepared for sending, but not yet sent. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02net/smc: ipv6 support for smc_diag.cKarsten Graul
Update smc_diag.c to support ipv6 addresses on the diagnosis interface. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02net/smc: periodic testlink supportKarsten Graul
Add periodic LLC testlink support to ensure the link is still active. The interval time is initialized using the value of sysctl_tcp_keepalive_time. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02net/smc: restrict non-blocking connect finishUrsula Braun
The smc_poll code tries to finish connect() if the socket is in state SMC_INIT and polling of the internal CLC-socket returns with EPOLLOUT. This makes sense for a select/poll call following a connect call, but not without preceding connect(). With this patch smc_poll starts connect logic only, if the CLC-socket is no longer in its initial state TCP_CLOSE. In addition, a poll error on the internal CLC-socket is always propagated to the SMC socket. With this patch the code path mentioned by syzbot https://syzkaller.appspot.com/bug?extid=03faa2dc16b8b64be396 is no longer possible. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reported-by: syzbot+03faa2dc16b8b64be396@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-028139too: Use disable_irq_nosync() in rtl8139_poll_controller()Ingo Molnar
Use disable_irq_nosync() instead of disable_irq() as this might be called in atomic context with netpoll. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02Merge branch 'mlxsw-Reject-unsupported-FIB-configurations'David S. Miller
Ido Schimmel says: ==================== mlxsw: Reject unsupported FIB configurations Recently it became possible for listeners of the FIB notification chain to veto operations such as addition of routes and rules. Adjust the mlxsw driver to take advantage of it and return an error for unsupported FIB rules and for routes configured after the abort mechanism was triggered (due to exceeded resources for example). v2: * Change error code in first patch to -EOPNOTSUPP (David Ahern). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>