linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2017-09-08	bpf: make error reporting in bpf_warn_invalid_xdp_action more clear	Daniel Borkmann
	Differ between illegal XDP action code and just driver unsupported one to provide better feedback when we throw a one-time warning here. Reason is that with 814abfabef3c ("xdp: add bpf_redirect helper function") not all drivers support the new XDP return code yet and thus they will fall into their 'default' case when checking for return codes after program return, which then triggers a bpf_warn_invalid_xdp_action() stating that the return code is illegal, but from XDP perspective it's not. I decided not to place something like a XDP_ACT_MAX define into uapi i) given we don't have this either for all other program types, ii) future action codes could have further encoding there, which would render such define unsuitable and we wouldn't be able to rip it out again, and iii) we rarely add new action codes. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	Revert "mdio_bus: Remove unneeded gpiod NULL check"	Florian Fainelli
	This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus: Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB checks for NULL descriptors, so it's safe to drop them, but it is not when CONFIG_GPIOLIB is disabled in the kernel. If we do call gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings coming from the inline stubs declared in include/linux/gpio/consumer.h. Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check") Reported-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	Merge branch 'xdp-bpf-fixes'	David S. Miller
	John Fastabend says: ==================== net: Fixes for XDP/BPF The following fixes, UAPI updates, and small improvement, i. XDP needs to be called inside RCU with preempt disabled. ii. Not strictly a bug fix but we have an attach command in the sockmap UAPI already to avoid having a single kernel released with only the attach and not the detach I'm pushing this into net branch. Its early in the RC cycle so I think this is OK (not ideal but better than supporting a UAPI with a missing detach forever). iii. Final patch replace cpu_relax with cond_resched in devmap. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	bpf: devmap, use cond_resched instead of cpu_relax	John Fastabend
	Be a bit more friendly about waiting for flush bits to complete. Replace the cpu_relax() with a cond_resched(). Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	bpf: add support for sockmap detach programs	John Fastabend
	The bpf map sockmap supports adding programs via attach commands. This patch adds the detach command to keep the API symmetric and allow users to remove previously added programs. Otherwise the user would have to delete the map and re-add it to get in this state. This also adds a series of additional tests to capture detach operation and also attaching/detaching invalid prog types. API note: socks will run (or not run) programs depending on the state of the map at the time the sock is added. We do not for example walk the map and remove programs from previously attached socks. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	net: rcu lock and preempt disable missing around generic xdp	John Fastabend
	do_xdp_generic must be called inside rcu critical section with preempt disabled to ensure BPF programs are valid and per-cpu variables used for redirect operations are consistent. This patch ensures this is true and fixes the splat below. The netif_receive_skb_internal() code path is now broken into two rcu critical sections. I decided it was better to limit the preempt_enable/disable block to just the xdp static key portion and the fallout is more rcu_read_lock/unlock calls. Seems like the best option to me. [ 607.596901] ============================= [ 607.596906] WARNING: suspicious RCU usage [ 607.596912] 4.13.0-rc4+ #570 Not tainted [ 607.596917] ----------------------------- [ 607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage! [ 607.596927] [ 607.596927] other info that might help us debug this: [ 607.596927] [ 607.596933] [ 607.596933] rcu_scheduler_active = 2, debug_locks = 1 [ 607.596938] 2 locks held by pool/14624: [ 607.596943] #0: (rcu_read_lock_bh){......}, at: [<ffffffff95445ffd>] ip_finish_output2+0x14d/0x890 [ 607.596973] #1: (rcu_read_lock_bh){......}, at: [<ffffffff953c8e3a>] __dev_queue_xmit+0x14a/0xfd0 [ 607.597000] [ 607.597000] stack backtrace: [ 607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570 [ 607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 03/01/2017 [ 607.597016] Call Trace: [ 607.597027] dump_stack+0x67/0x92 [ 607.597040] lockdep_rcu_suspicious+0xdd/0x110 [ 607.597054] do_xdp_generic+0x313/0xa50 [ 607.597068] ? time_hardirqs_on+0x5b/0x150 [ 607.597076] ? mark_held_locks+0x6b/0xc0 [ 607.597088] ? netdev_pick_tx+0x150/0x150 [ 607.597117] netif_rx_internal+0x205/0x3f0 [ 607.597127] ? do_xdp_generic+0xa50/0xa50 [ 607.597144] ? lock_downgrade+0x2b0/0x2b0 [ 607.597158] ? __lock_is_held+0x93/0x100 [ 607.597187] netif_rx+0x119/0x190 [ 607.597202] loopback_xmit+0xfd/0x1b0 [ 607.597214] dev_hard_start_xmit+0x127/0x4e0 Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices") Fixes: b5cdae3291f7 ("net: Generic XDP") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	bpf: don't select potentially stale ri->map from buggy xdp progs	Daniel Borkmann
	We can potentially run into a couple of issues with the XDP bpf_redirect_map() helper. The ri->map in the per CPU storage can become stale in several ways, mostly due to misuse, where we can then trigger a use after free on the map: i) prog A is calling bpf_redirect_map(), returning XDP_REDIRECT and running on a driver not supporting XDP_REDIRECT yet. The ri->map on that CPU becomes stale when the XDP program is unloaded on the driver, and a prog B loaded on a different driver which supports XDP_REDIRECT return code. prog B would have to omit calling to bpf_redirect_map() and just return XDP_REDIRECT, which would then access the freed map in xdp_do_redirect() since not cleared for that CPU. ii) prog A is calling bpf_redirect_map(), returning a code other than XDP_REDIRECT. prog A is then detached, which triggers release of the map. prog B is attached which, similarly as in i), would just return XDP_REDIRECT without having called bpf_redirect_map() and thus be accessing the freed map in xdp_do_redirect() since not cleared for that CPU. iii) prog A is attached to generic XDP, calling the bpf_redirect_map() helper and returning XDP_REDIRECT. xdp_do_generic_redirect() is currently not handling ri->map (will be fixed by Jesper), so it's not being reset. Later loading a e.g. native prog B which would, say, call bpf_xdp_redirect() and then returns XDP_REDIRECT would find in xdp_do_redirect() that a map was set and uses that causing use after free on map access. Fix thus needs to avoid accessing stale ri->map pointers, naive way would be to call a BPF function from drivers that just resets it to NULL for all XDP return codes but XDP_REDIRECT and including XDP_REDIRECT for drivers not supporting it yet (and let ri->map being handled in xdp_do_generic_redirect()). There is a less intrusive way w/o letting drivers call a reset for each BPF run. The verifier knows we're calling into bpf_xdp_redirect_map() helper, so it can do a small insn rewrite transparent to the prog itself in the sense that it fills R4 with a pointer to the own bpf_prog. We have that pointer at verification time anyway and R4 is allowed to be used as per calling convention we scratch R0 to R5 anyway, so they become inaccessible and program cannot read them prior to a write. Then, the helper would store the prog pointer in the current CPUs struct redirect_info. Later in xdp_do_*_redirect() we check whether the redirect_info's prog pointer is the same as passed xdp_prog pointer, and if that's the case then all good, since the prog holds a ref on the map anyway, so it is always valid at that point in time and must have a reference count of at least 1. If in the unlikely case they are not equal, it means we got a stale pointer, so we clear and bail out right there. Also do reset map and the owning prog in bpf_xdp_redirect(), so that bpf_xdp_redirect_map() and bpf_xdp_redirect() won't get mixed up, only the last call should take precedence. A tc bpf_redirect() doesn't use map anywhere yet, so no need to clear it there since never accessed in that layer. Note that in case the prog is released, and thus the map as well we're still under RCU read critical section at that time and have preemption disabled as well. Once we commit with the __dev_map_insert_ctx() from xdp_do_redirect_map() and set the map to ri->map_to_flush, we still wait for a xdp_do_flush_map() to finish in devmap dismantle time once flush_needed bit is set, so that is fine. Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	net: tulip: Constify tulip_tbl	Kees Cook
	It looks like all users of tulip_tbl are reads, so mark this table as read-only. $ git grep tulip_tbl # edited to avoid line-wraps... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs&~RxPollInt, ... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ... interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs \| TimerInt, pnic.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7); tulip.h: extern struct tulip_chip_table tulip_tbl[]; tulip_core.c:struct tulip_chip_table tulip_tbl[] = { tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5); tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7); tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer, tulip_core.c:const char *chip_name = tulip_tbl[chip_idx].chip_name; tulip_core.c:if (pci_resource_len (pdev, 0) < tulip_tbl[chip_idx].io_size) tulip_core.c:ioaddr = pci_iomap(..., tulip_tbl[chip_idx].io_size); tulip_core.c:tp->flags = tulip_tbl[chip_idx].flags; tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer, tulip_core.c:INIT_WORK(&tp->media_work, tulip_tbl[tp->chip_id].media_task); Cc: "David S. Miller" <davem@davemloft.net> Cc: Jarod Wilson <jarod@redhat.com> Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Cc: netdev@vger.kernel.org Cc: linux-parisc@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	net: ethernet: ti: netcp_core: no need in netif_napi_del	Ivan Khoronzhuk
	Don't remove rx_napi specifically just before free_netdev(), it's supposed to be done in it and is confusing w/o tx_napi deletion. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	davicom: Display proper debug level up to 6	Mathieu Malaterre
	This will make it explicit some messages are of the form: dm9000_dbg(db, 5, ... Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	net: phy: sfp: rename dt properties to match the binding	Baruch Siach
	Make the Rx rate select control gpio property name match the documented binding. This would make the addition of 'rate-select1-gpios' for SFP+ support more natural. Also, make the MOD-DEF0 gpio property name match the documentation. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	dt-binding: net: sfp binding documentation	Baruch Siach
	Add device-tree binding documentation SFP transceivers. Support for SFP transceivers has been recently introduced (drivers/net/phy/sfp.c). Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	dt-bindings: add SFF vendor prefix	Baruch Siach
	Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	dt-bindings: net: don't confuse with generic PHY property	Baruch Siach
	This complements commit 9a94b3a4bd (dt-binding: phy: don't confuse with Ethernet phy properties). The generic PHY 'phys' property sometime appears in the same node with the Ethernet PHY 'phy' or 'phy-handle' properties. Add a warning in ethernet.txt to reduce confusion. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	ip6_tunnel: fix setting hop_limit value for ipv6 tunnel	Haishuang Yan
	Similar to vxlan/geneve tunnel, if hop_limit is zero, it should fall back to ip6_dst_hoplimt(). Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	ip_tunnel: fix setting ttl and tos value in collect_md mode	Haishuang Yan
	ttl and tos variables are declared and assigned, but are not used in iptunnel_xmit() function. Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel") Cc: Alexei Starovoitov <ast@fb.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	ipv6: fix typo in fib6_net_exit()	Eric Dumazet
	IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ. Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	tcp: fix a request socket leak	Eric Dumazet
	While the cited commit fixed a possible deadlock, it added a leak of the request socket, since reqsk_put() must be called if the BPF filter decided the ACK packet must be dropped. Fixes: d624d276d1dd ("tcp: fix possible deadlock in TCP stack vs BPF filter") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	David S. Miller
	Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Fix SCTP connection setup when IPVS module is loaded and any scheduler is registered, from Xin Long. 2) Don't create a SCTP connection from SCTP ABORT packets, also from Xin Long. 3) WARN_ON() and drop packet, instead of BUG_ON() races when calling nf_nat_setup_info(). This is specifically a longstanding problem when br_netfilter with conntrack support is in place, patch from Florian Westphal. 4) Avoid softlock splats via iptables-restore, also from Florian. 5) Revert NAT hashtable conversion to rhashtable, semantics of rhlist are different from our simple NAT hashtable, this has been causing problems in the recent Linux kernel releases. From Florian. 6) Add per-bucket spinlock for NAT hashtable, so at least we restore one of the benefits we got from the previous rhashtable conversion. 7) Fix incorrect hashtable size in memory allocation in xt_hashlimit, from Zhizhou Tian. 8) Fix build/link problems with hashlimit and 32-bit arches, to address recent fallout from a new hashlimit mode, from Vishwanath Pai. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	Merge tag 'wireless-drivers-for-davem-2017-09-08' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.14 Few fixes to regressions introduced in the last one or two releases. The iwlwifi fix is for a regression reported by Linus. rtlwifi * fix two antenna selection related bugs iwlwifi * fix regression with older firmwares brcmfmac * workaround firmware crash for bcm4345 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	sctp: fix missing wake ups in some situations	Marcelo Ricardo Leitner
	Commit fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") minimized the number of wake ups that are triggered in case the association receives a packet with multiple data chunks on it and/or when io_events are enabled and then commit 0970f5b36659 ("sctp: signal sk_data_ready earlier on data chunks reception") moved the wake up to as soon as possible. It thus relies on the state machine running later to clean the flag that the event was already generated. The issue is that there are 2 call paths that calls sctp_ulpq_tail_event() outside of the state machine, causing the flag to linger and possibly omitting a needed wake up in the sequence. One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when partial reliability triggers removal of chunks from the send queue when the application calls sendmsg(). This commit fixes it by not setting the flag in case the socket is not owned by the user, as it won't be cleaned later. This works for user-initiated calls and also for rx path processing. Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible") Reported-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	netfilter: xt_hashlimit: fix build error caused by 64bit division	Vishwanath Pai
	64bit division causes build/link errors on 32bit architectures. It prints out error messages like: ERROR: "__aeabi_uldivmod" [net/netfilter/xt_hashlimit.ko] undefined! The value of avg passed through by userspace in BYTE mode cannot exceed U32_MAX. Which means 64bit division in user2rate_bytes is unnecessary. To fix this I have changed the type of param 'user' to u32. Since anything greater than U32_MAX is an invalid input we error out in hashlimit_mt_check_common() when this is the case. Changes in v2: Making return type as u32 would cause an overflow for small values of 'user' (for example 2, 3 etc). To avoid this I bumped up 'r' to u64 again as well as the return type. This is OK since the variable that stores the result is u64. We still avoid 64bit division here since 'user' is u32. Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode") Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: xt_hashlimit: alloc hashtable with right size	Zhizhou Tian
	struct xt_byteslimit_htable used hlist_head, but memory allocation is done through sizeof(struct list_head). Signed-off-by: Zhizhou Tian <zhizhou.tian@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: core: remove erroneous warn_on	Florian Westphal
	kernel test robot reported: WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 __nf_hook_entries_try_shrink+0x49/0xcd [..] After allowing batching in nf_unregister_net_hooks its possible that an earlier call to __nf_hook_entries_try_shrink already compacted the list. If this happens we don't need to do anything. Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks synchronize_net calls") Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: nat: use keyed locks	Florian Westphal
	no need to serialize on a single lock, we can partition the table and add/delete in parallel to different slots. This restores one of the advantages that got lost with the rhlist revert. Cc: Ivan Babrou <ibobrik@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"	Florian Westphal
	This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1. It was not a good idea. The custom hash table was a much better fit for this purpose. A fast lookup is not essential, in fact for most cases there is no lookup at all because original tuple is not taken and can be used as-is. What needs to be fast is insertion and deletion. rhlist removal however requires a rhlist walk. We can have thousands of entries in such a list if source port/addresses are reused for multiple flows, if this happens removal requests are so expensive that deletions of a few thousand flows can take several seconds(!). The advantages that we got from rhashtable are: 1) table auto-sizing 2) multiple locks 1) would be nice to have, but it is not essential as we have at most one lookup per new flow, so even a million flows in the bysource table are not a problem compared to current deletion cost. 2) is easy to add to custom hash table. I tried to add hlist_node to rhlist to speed up rhltable_remove but this isn't doable without changing semantics. rhltable_remove_fast will check that the to-be-deleted object is part of the table and that requires a list walk that we want to avoid. Furthermore, using hlist_node increases size of struct rhlist_head, which in turn increases nf_conn size. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821 Reported-by: Ivan Babrou <ibobrik@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: xtables: add scheduling opportunity in get_counters	Florian Westphal
	There are reports about spurious softlockups during iptables-restore, a backtrace i saw points at get_counters -- it uses a sequence lock and also has unbounded restart loop. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: nf_nat: don't bug when mapping already exists	Florian Westphal
	It seems preferrable to limp along if we have a conflicting mapping, its certainly better than a BUG(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	ipv6: fix memory leak with multiple tables during netns destruction	Sabrina Dubroca
	fib6_net_exit only frees the main and local tables. If another table was created with fib6_alloc_table, we leak it when the netns is destroyed. Fix this in the same way ip_fib_net_exit cleans up tables, by walking through the whole hashtable of fib6_table's. We can get rid of the special cases for local and main, since they're also part of the hashtable. Reproducer: ip netns add x ip -net x -6 rule add from 6003:1::/64 table 100 ip netns del x Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08	netfilter: ipvs: do not create conn for ABORT packet in sctp_conn_schedule	Xin Long
	There's no reason for ipvs to create a conn for an ABORT packet even if sysctl_sloppy_sctp is set. This patch is to accept it without creating a conn, just as ipvs does for tcp's RST packet. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	netfilter: ipvs: fix the issue that sctp_conn_schedule drops non-INIT packet	Xin Long
	Commit 5e26b1b3abce ("ipvs: support scheduling inverse and icmp SCTP packets") changed to check packet type early. It introduced a side effect: if it's not a INIT packet, ports will be set as NULL, and the packet will be dropped later. It caused that sctp couldn't create connection when ipvs module is loaded and any scheduler is registered on server. Li Shuang reproduced it by running the cmds on sctp server: # ipvsadm -A -t 1.1.1.1:80 -s rr # ipvsadm -D -t 1.1.1.1:80 then the server could't work any more. This patch is to return 1 when it's not an INIT packet. It means ipvs will accept it without creating a conn for it, just like what it does for tcp. Fixes: 5e26b1b3abce ("ipvs: support scheduling inverse and icmp SCTP packets") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-08	brcmfmac: feature check for multi-scheduled scan fails on bcm4345 devices	Ian W MORRISON
	The firmware feature check introduced for multi-scheduled scan is also failing for bcm4345 devices resulting in a firmware crash. The reason for this crash has not yet been root cause so this patch avoids the feature check for those device as a short-term fix. Fixes: 9fe929aaace6 ("brcmfmac: add firmware feature detection for gscan feature") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Ian W MORRISON <ianwmorrison@gmail.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2017-09-07	rds: Fix incorrect statistics counting	Håkon Bugge
	In rds_send_xmit() there is logic to batch the sends. However, if another thread has acquired the lock and has incremented the send_gen, it is considered a race and we yield. The code incrementing the s_send_lock_queue_raced statistics counter did not count this event correctly. This commit counts the race condition correctly. Changes from v1: - Removed check for someone_on_xmit() - Fixed incorrect indentation Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Knut Omang <knut.omang@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-07	isdn: isdnloop: fix logic error in isdnloop_sendbuf	Arnd Bergmann
	gcc-7 found an ancient bug in the loop driver, leading to a condition that is always false, meaning we ignore the contents of 'card->flags' here: drivers/isdn/isdnloop/isdnloop.c:412:37: error: ?: using integer constants in boolean context, the expression will always evaluate to 'true' [-Werror=int-in-bool-context] This changes the braces in the expression to ensure we actually compare the flag bits, rather than comparing a constant. As Joe Perches pointed out, an earlier patch of mine incorrectly assumed this was a false-positive warning. Cc: Joe Perches <joe@perches.com> Link: https://patchwork.kernel.org/patch/9840289/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-07	udp: drop head states only when all skb references are gone	Paolo Abeni
	After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing") we clear the skb head state as soon as the skb carrying them is first processed. Since the same skb can be processed several times when MSG_PEEK is used, we can end up lacking the required head states, and eventually oopsing. Fix this clearing the skb head state only when processing the last skb reference. Reported-by: Eric Dumazet <edumazet@google.com> Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-07	ip6_gre: update mtu properly in ip6gre_err	Xin Long
	Now when probessing ICMPV6_PKT_TOOBIG, ip6gre_err only subtracts the offset of gre header from mtu info. The expected mtu of gre device should also subtract gre header. Otherwise, the next packets still can't be sent out. Jianlin found this issue when using the topo: client(ip6gre)<---->(nic1)route(nic2)<----->(ip6gre)server and reducing nic2's mtu, then both tcp and sctp's performance with big size data became 0. This patch is to fix it by also subtracting grehdr (tun->tun_hlen) from mtu info when updating gre device's mtu in ip6gre_err(). It also needs to subtract ETH_HLEN if gre dev'type is ARPHRD_ETHER. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-07	net: sched: fix memleak for chain zero	Jiri Pirko
	There's a memleak happening for chain 0. The thing is, chain 0 needs to be always present, not created on demand. Therefore tcf_block_get upon creation of block calls the tcf_chain_create function directly. The chain is created with refcnt == 1, which is not correct in this case and causes the memleak. So move the refcnt increment into tcf_chain_get function even for the case when chain needs to be created. Reported-by: Jakub Kicinski <kubakici@wp.pl> Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-07	Merge tag 'mac80211-for-davem-2017-09-07' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Back from a long absence, so we have a number of things: * a remain-on-channel fix from Avi * hwsim TX power fix from Beni * null-PTR dereference with iTXQ in some rare configurations (Chunho) * 40 MHz custom regdomain fixes (Emmanuel) * look at right place in HT/VHT capability parsing (Igor) * complete A-MPDU teardown properly (Ilan) * Mesh ID Element ordering fix (Liad) * avoid tracing warning in ht_dbg() (Sharon) * fix print of assoc/reassoc (Simon) * fix encrypted VLAN with iTXQ (myself) * fix calling context of TX queue wake (myself) * fix a deadlock with ath10k aggregation (myself) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-07	iwlwifi: mvm: only send LEDS_CMD when the FW supports it	Luca Coelho
	The LEDS_CMD command is only supported in some newer FW versions (e.g. iwlwifi-8000C-31.ucode), so we can't send it to older versions (such as iwlwifi-8000C-27.ucode). To fix this, check for a new bit in the FW capabilities TLV that tells when the command is supported. Note that the current version of -31.ucode in linux-firmware.git (31.532993.0) does not have this capability bit set, so the LED won't work, even though this version should support it. But we will update this firmware soon, so it won't be a problem anymore. Fixes: 7089ae634c50 ("iwlwifi: mvm: use firmware LED command where applicable") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2017-09-07	rtlwifi: btcoexist: Fix antenna selection code	Larry Finger
	In commit 87d8a9f35202 ("rtlwifi: btcoex: call bind to setup btcoex"), the code turns on a call to exhalbtc_bind_bt_coex_withadapter(). This routine contains a bug that causes incorrect antenna selection for those HP laptops with only one antenna and an incorrectly programmed EFUSE. These boxes are the ones that need the ant_sel module parameter. Fixes: 87d8a9f35202 ("rtlwifi: btcoex: call bind to setup btcoex") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Ping-Ke Shih <pkshih@realtek.com> Cc: Yan-Hsuan Chuang <yhchuang@realtek.com> Cc: Birming Chiu <birming@realtek.com> Cc: Shaofu <shaofu@realtek.com> Cc: Steven Ting <steventing@realtek.com> Cc: Stable <stable@vger.kernel.org> # 4.13+ Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2017-09-07	rtlwifi: btcoexist: Fix breakage of ant_sel for rtl8723be	Larry Finger
	In commit bcd37f4a0831 ("rtlwifi: btcoex: 23b 2ant: let bt transmit when hw initialisation done"), there is an additional error when the module parameter ant_sel is used to select the auxilary antenna. The error is that the antenna selection is not checked when writing the antenna selection register. Fixes: bcd37f4a0831 ("rtlwifi: btcoex: 23b 2ant: let bt transmit when hw initialisation done") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Ping-Ke Shih <pkshih@realtek.com> Cc: Yan-Hsuan Chuang <yhchuang@realtek.com> Cc: Birming Chiu <birming@realtek.com> Cc: Shaofu <shaofu@realtek.com> Cc: Steven Ting <steventing@realtek.com> Cc: Stable <stable@vger.kernel.org> # 4.12+ Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2017-09-06	tipc: remove unnecessary call to dev_net()	Kleber Sacilotto de Souza
	The net device is already stored in the 'net' variable, so no need to call dev_net() again. Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06	netlink: access nlk groups safely in netlink bind and getname	Xin Long
	Now there is no lock protecting nlk ngroups/groups' accessing in netlink bind and getname. It's safe from nlk groups' setting in netlink_release, but not from netlink_realloc_groups called by netlink_setsockopt. netlink_lock_table is needed in both netlink bind and getname when accessing nlk groups. Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06	netlink: fix an use-after-free issue for nlk groups	Xin Long
	ChunYu found a netlink use-after-free issue by syzkaller: [28448.842981] BUG: KASAN: use-after-free in __nla_put+0x37/0x40 at addr ffff8807185e2378 [28448.969918] Call Trace: [...] [28449.117207] __nla_put+0x37/0x40 [28449.132027] nla_put+0xf5/0x130 [28449.146261] sk_diag_fill.isra.4.constprop.5+0x5a0/0x750 [netlink_diag] [28449.176608] __netlink_diag_dump+0x25a/0x700 [netlink_diag] [28449.202215] netlink_diag_dump+0x176/0x240 [netlink_diag] [28449.226834] netlink_dump+0x488/0xbb0 [28449.298014] __netlink_dump_start+0x4e8/0x760 [28449.317924] netlink_diag_handler_dump+0x261/0x340 [netlink_diag] [28449.413414] sock_diag_rcv_msg+0x207/0x390 [28449.432409] netlink_rcv_skb+0x149/0x380 [28449.467647] sock_diag_rcv+0x2d/0x40 [28449.484362] netlink_unicast+0x562/0x7b0 [28449.564790] netlink_sendmsg+0xaa8/0xe60 [28449.661510] sock_sendmsg+0xcf/0x110 [28449.865631] __sys_sendmsg+0xf3/0x240 [28450.000964] SyS_sendmsg+0x32/0x50 [28450.016969] do_syscall_64+0x25c/0x6c0 [28450.154439] entry_SYSCALL64_slow_path+0x25/0x25 It was caused by no protection between nlk groups' free in netlink_release and nlk groups' accessing in sk_diag_dump_groups. The similar issue also exists in netlink_seq_show(). This patch is to defer nlk groups' free in deferred_put_nlk_sk. Reported-by: ChunYu Wang <chunwang@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06	sched: Use __qdisc_drop instead of kfree_skb in sch_prio and sch_qfq	Gao Feng
	The commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock is released) made a big change of tc for performance. There are two points left in sch_prio and sch_qfq which are not changed with that commit. Now enhance them now with __qdisc_drop. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06	dt-binding: phy: don't confuse with Ethernet phy properties	Baruch Siach
	The generic PHY 'phys' property sometime appears in the same node with the Ethernet PHY 'phy' or 'phy-handle' properties. Add a warning in phy-bindings.txt to reduce confusion. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-06	Merge branch 'linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.14: API: - Defer scompress scratch buffer allocation to first use. - Add __crypto_xor that takes separte src and dst operands. - Add ahash multiple registration interface. - Revamped aead/skcipher algif code to fix async IO properly. Drivers: - Add non-SIMD fallback code path on ARM for SVE. - Add AMD Security Processor framework for ccp. - Add support for RSA in ccp. - Add XTS-AES-256 support for CCP version 5. - Add support for PRNG in sun4i-ss. - Add support for DPAA2 in caam. - Add ARTPEC crypto support. - Add Freescale RNGC hwrng support. - Add Microchip / Atmel ECC driver. - Add support for STM32 HASH module" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: af_alg - get_page upon reassignment to TX SGL crypto: cavium/nitrox - Fix an error handling path in 'nitrox_probe()' crypto: inside-secure - fix an error handling path in safexcel_probe() crypto: rockchip - Don't dequeue the request when device is busy crypto: cavium - add release_firmware to all return case crypto: sahara - constify platform_device_id MAINTAINERS: Add ARTPEC crypto maintainer crypto: axis - add ARTPEC-6/7 crypto accelerator driver crypto: hash - add crypto_(un)register_ahashes() dt-bindings: crypto: add ARTPEC crypto crypto: algif_aead - fix comment regarding memory layout crypto: ccp - use dma_mapping_error to check map error lib/mpi: fix build with clang crypto: sahara - Remove leftover from previous used spinlock crypto: sahara - Fix dma unmap direction crypto: af_alg - consolidation of duplicate code crypto: caam - Remove unused dentry members crypto: ccp - select CONFIG_CRYPTO_RSA crypto: ccp - avoid uninitialized variable warning crypto: serpent - improve __serpent_setkey with UBSAN ...
2017-09-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds
	Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-06	Merge tag 'wberr-v4.14-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull writeback error handling updates from Jeff Layton: "This pile continues the work from last cycle on better tracking writeback errors. In v4.13 we added some basic errseq_t infrastructure and converted a few filesystems to use it. This set continues refining that infrastructure, adds documentation, and converts most of the other filesystems to use it. The main exception at this point is the NFS client" * tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: ecryptfs: convert to file_write_and_wait in ->fsync mm: remove optimizations based on i_size in mapping writeback waits fs: convert a pile of fsync routines to errseq_t based reporting gfs2: convert to errseq_t based writeback error reporting for fsync fs: convert sync_file_range to use errseq_t based error-tracking mm: add file_fdatawait_range and file_write_and_wait fuse: convert to errseq_t based error tracking for fsync mm: consolidate dax / non-dax checks for writeback Documentation: add some docs for errseq_t errseq: rename __errseq_set to errseq_set
2017-09-06	Merge tag 'locks-v4.14-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "This pile just has a few file locking fixes from Ben Coddington. There are a couple of cleanup patches + an attempt to bring sanity to the l_pid value that is reported back to userland on an F_GETLK request. After a few gyrations, he came up with a way for filesystems to communicate to the VFS layer code whether the pid should be translated according to the namespace or presented as-is to userland" * tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: restore a warn for leaked locks on close fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks fs/locks: Use allocation rather than the stack in fcntl_getlk()