summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-10-14Merge tag 'mac80211-for-davem-2015-10-13' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Like last time, we have two small fixes: * fast-xmit was not doing powersave filter clearing correctly, disable fast-xmit while any such operations are still pending * a debugfs file was broken due to some infrastructure changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14tipc: eliminate risk of stalled link synchronizationJon Paul Maloy
In commit 6e498158a827 ("tipc: move link synch and failover to link aggregation level") we introduced a new mechanism for performing link failover and synchronization. We have now detected a bug in this mechanism. During link synchronization we use the arrival of any packet on the tunnel link to trig a check for whether it has reached the synchronization point or not. This has turned out to be too permissive, since it may cause an arriving non-last SYNCH packet to end the synch state, just to see the next SYNCH packet initiate a new synch state with a new, higher synch point. This is not fatal, but should be avoided, because it may significantly extend the synchronization period, while at the same time we are not allowed to send NACKs if packets are lost. In the worst case, a low-traffic user may see its traffic stall until a LINK_PROTOCOL state message trigs the link to leave synchronization state. At the same time, LINK_PROTOCOL packets which happen to have a (non- valid) sequence number lower than the tunnel link's rcv_nxt value will be consistently dropped, and will never be able to resolve the situation described above. We fix this by exempting LINK_PROTOCOL packets from the sequence number check, as they should be. We also reduce (but don't completely eliminate) the risk of entering multiple synchronization states by only allowing the (logically) first SYNCH packet to initiate a synchronization state. This works independently of actual packet arrival order. Fixes: commit 6e498158a827 ("tipc: move link synch and failover to link aggregation level") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13Merge tag 'nfsd-4.3-2' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Two nfsd fixes, one for an RDMA crash, one for a pnfs/block protocol bug" * tag 'nfsd-4.3-2' of git://linux-nfs.org/~bfields/linux: svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE nfsd/blocklayout: accept any minlength
2015-10-13netfilter: sync with packet rx also after removing queue entriesFlorian Westphal
We need to sync packet rx again after flushing the queue entries. Otherwise, the following race could happen: cpu1: nf_unregister_hook(H) called, H unliked from lists, calls synchronize_net() to wait for packet rx completion. Problem is that while no new nf_queue_entry structs that use H can be allocated, another CPU might receive a verdict from userspace just before cpu1 calls nf_queue_nf_hook_drop to remove this entry: cpu2: receive verdict from userspace, lock queue cpu2: unlink nf_queue_entry struct E, which references H, from queue list cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock cpu2: unlock queue cpu1: nf_queue_nf_hook_drop drops affected queue entries cpu2: call nf_reinject for E cpu1: kfree(H) cpu2: potential use-after-free for H Cc: Eric W. Biederman <ebiederm@xmission.com> Fixes: 085db2c04557 ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-13ipv6: Don't call with rt6_uncached_list_flush_devEric W. Biederman
As originally written rt6_uncached_list_flush_dev makes no sense when called with dev == NULL as it attempts to flush all uncached routes regardless of network namespace when dev == NULL. Which is simply incorrect behavior. Furthermore at the point rt6_ifdown is called with dev == NULL no more network devices exist in the network namespace so even if the code in rt6_uncached_list_flush_dev were to attempt something sensible it would be meaningless. Therefore remove support in rt6_uncached_list_flush_dev for handling network devices where dev == NULL, and only call rt6_uncached_list_flush_dev when rt6_ifdown is called with a network device. Fixes: 8d0b94afdca8 ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13switchdev: check if the vlan id is in the proper vlan rangeNikolay Aleksandrov
VLANs 0 and 4095 are reserved and shouldn't be used, add checks to switchdev similar to the bridge. Also make sure ids above 4095 cannot be passed either. Fixes: 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13mac80211: Fix hwflags debugfs file formatMohammed Shafi Shajakhan
Commit 30686bf7f5b3 ("mac80211: convert HW flags to unsigned long bitmap") accidentally removed the newline delimiter from the hwflags debugfs file. Fix this by adding back the newline between the HW flags. Cc: stable@vger.kernel.org [4.2] Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> [fix commit log] Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-12svcrdma: Fix NFS server crash triggered by 1MB NFS WRITEChuck Lever
Now that the NFS server advertises a maximum payload size of 1MB for RPC/RDMA again, it crashes in svc_process_common() when NFS client sends a 1MB NFS WRITE on an NFS/RDMA mount. The server has set up a 259 element array of struct page pointers in rq_pages[] for each incoming request. The last element of the array is NULL. When an incoming request has been completely received, rdma_read_complete() attempts to set the starting page of the incoming page vector: rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count]; and the page to use for the reply: rqstp->rq_respages = &rqstp->rq_arg.pages[page_no]; But the value of page_no has already accounted for head->hdr_count. Thus rq_respages now points past the end of the incoming pages. For NFS WRITE operations smaller than the maximum, this is harmless. But when the NFS WRITE operation is as large as the server's max payload size, rq_respages now points at the last entry in rq_pages, which is NULL. Fixes: cc9a903d915c ('svcrdma: Change maximum server payload . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-10-12netfilter: ipt_rpfilter: remove the nh_scope test in rpfilter_lookup_reverselucien
--accept-local option works for res.type == RTN_LOCAL, which should be from the local table, but there, the fib_info's nh->nh_scope = RT_SCOPE_NOWHERE ( > RT_SCOPE_HOST). in fib_create_info(). if (cfg->fc_scope == RT_SCOPE_HOST) { struct fib_nh *nh = fi->fib_nh; /* Local address is added. */ if (nhs != 1 || nh->nh_gw) goto err_inval; nh->nh_scope = RT_SCOPE_NOWHERE; <=== nh->nh_dev = dev_get_by_index(net, fi->fib_nh->nh_oif); err = -ENODEV; if (!nh->nh_dev) goto failure; but in our rpfilter_lookup_reverse(): if (dev_match || flags & XT_RPFILTER_LOOSE) return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST; if nh->nh_scope > RT_SCOPE_HOST, it will fail. --accept-local option will never be passed. it seems the test is bogus and can be removed to fix this issue. if (dev_match || flags & XT_RPFILTER_LOOSE) return FIB_RES_NH(res).nh_scope <= RT_SCOPE_HOST; ipv6 does not have this issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-11ipv6: drop frames with attached skb->sk in forwardingHannes Frederic Sowa
This is a clone of commit 2ab957492d13b ("ip_forward: Drop frames with attached skb->sk") for ipv6. This commit has exactly the same reasons as the above mentioned commit, namely to prevent panics during netfilter reload or a misconfigured stack. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11ipv6: gre: setup default multicast routes over PtP linksHannes Frederic Sowa
GRE point-to-point interfaces should also support ipv6 multicast. Setting up default multicast routes on interface creation was forgotten. Add it. Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=103231> Cc: Julien Muchembled <jm@jmuchemb.eu> Cc: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dumazet <ndumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11sch_hhf: fix return value of hhf_drop()WANG Cong
Similar to commit c0afd9ce4d6a ("fq_codel: fix return value of fq_codel_drop()") ->drop() is supposed to return the number of bytes it dropped, but hhf_drop () returns the id of the bucket where it drops a packet from. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09Merge tag 'nfsd-4.3-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfix from Bruce Fields: "Just one RDMA bugfix" * tag 'nfsd-4.3-1' of git://linux-nfs.org/~bfields/linux: svcrdma: handle rdma read with a non-zero initial page offset
2015-10-08bpf: clear sender_cpu before xmitAlexei Starovoitov
Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding") the skb->sender_cpu needs to be cleared before xmit. Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-08act_mirred: clear sender cpu before sending to txWANG Cong
Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding") the skb->sender_cpu needs to be cleared when moving from Rx Tx, otherwise kernel could crash. Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices") Cc: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMITJoe Stringer
Previously, the CT_ATTR_FLAGS attribute, when nested under the OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the semantics of the ct action. It's more extensible to just represent each flag as a nested attribute, and this requires no additional error checking to reject flags that aren't currently supported. Suggested-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Extend ct_state match field to 32 bitsJoe Stringer
The ct_state field was initially added as an 8-bit field, however six of the bits are already being used and use cases are already starting to appear that may push the limits of this field. This patch extends the field to 32 bits while retaining the internal representation of 8 bits. This should cover forward compatibility of the ABI for the foreseeable future. This patch also reorders the OVS_CS_F_* bits to be sequential. Suggested-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Reject ct_state unsupported bitsJoe Stringer
Previously, if userspace specified ct_state bits in the flow key which are currently undefined (and therefore unsupported), then they would be ignored. This could cause unexpected behaviour in future if userspace is extended to support additional bits but attempts to communicate with the current version of the kernel. This patch rectifies the situation by rejecting such ct_state bits. Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Ensure flow is valid before executing ctJoe Stringer
The ct action uses parts of the flow key, so we need to ensure that it is valid before executing that action. Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: Fix skb leak in ovs_fragment()Joe Stringer
If ovs_fragment() was unable to fragment the skb due to an L2 header that exceeds the supported length, skbs would be leaked. Fix the bug. Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: dsa: exit probe if no switch were foundNeil Armstrong
If no switch were found in dsa_setup_dst, return -ENODEV and exit the dsa_probe cleanly. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: dsa: switch to devm_ calls and remove kfree callsNeil Armstrong
Now the kfree calls exists in the the remove functions, remove them in all places except the of_probe functions and replace allocation calls with their devm_ counterparts. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: dsa: complete dsa_switch_destroyNeil Armstrong
When unbinding dsa, complete the dsa_switch_destroy to unregister the fixed link phy then cleanly unregister and destroy the net devices. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: dsa: add missing dsa_switch mdiobus removeNeil Armstrong
To prevent memory leakage on unbinding, add missing mdiobus unregister and unallocation calls. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: dsa: add missing kfree on removeNeil Armstrong
To prevent memory leakage on unbinding, add missing kfree calls. Includes minor cosmetic change to make patch clean. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Fix vti use case with oif in dst lookups for IPv6David Ahern
It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes the oif to be considered in lookups which is known to break vti. This explains why 58189ca7b274 did not the IPv6 change at the time it was submitted. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-06xprtrdma: Don't require LOCAL_DMA_LKEY support for fastregSagi Grimberg
There is no need to require LOCAL_DMA_LKEY support as the PD allocation makes sure that there is a local_dma_lkey. Also correctly set a return value in error path. This caused a NULL pointer dereference in mlx5 which removed the support for LOCAL_DMA_LKEY. Fixes: bb6c96d72879 ("xprtrdma: Replace global lkey with lkey local to PD") Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-05openvswitch: Fix ovs_vport_get_stats()Pravin B Shelar
Not every device has dev->tstats set. So when OVS tries to calculate vport stats it causes kernel panic. Following patch fixes it by using standard API to get net-device stats. ---8<--- Unable to handle kernel paging request at virtual address 766b4008 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: vport_vxlan vxlan ip6_udp_tunnel udp_tunnel tun bridge stp llc openvswitch ipv6 CPU: 7 PID: 1108 Comm: ovs-vswitchd Not tainted 4.3.0-rc3+ #82 PC is at ovs_vport_get_stats+0x150/0x1f8 [openvswitch] <snip> Call trace: [<ffffffbffc0859f8>] ovs_vport_get_stats+0x150/0x1f8 [openvswitch] [<ffffffbffc07cdb0>] ovs_vport_cmd_fill_info+0x140/0x1e0 [openvswitch] [<ffffffbffc07cf0c>] ovs_vport_cmd_dump+0xbc/0x138 [openvswitch] [<ffffffc00045a5ac>] netlink_dump+0xb8/0x258 [<ffffffc00045ace0>] __netlink_dump_start+0x120/0x178 [<ffffffc00045dd9c>] genl_family_rcv_msg+0x2d4/0x308 [<ffffffc00045de58>] genl_rcv_msg+0x88/0xc4 [<ffffffc00045cf24>] netlink_rcv_skb+0xd4/0x100 [<ffffffc00045dab0>] genl_rcv+0x30/0x48 [<ffffffc00045c830>] netlink_unicast+0x154/0x200 [<ffffffc00045cc9c>] netlink_sendmsg+0x308/0x364 [<ffffffc00041e10c>] sock_sendmsg+0x14/0x2c [<ffffffc000420d58>] SyS_sendto+0xbc/0xf0 Code: aa1603e1 f94037a4 aa1303e2 aa1703e0 (f9400465) Reported-by: Tomasz Sawicki <tomasz.sawicki@objectiveintegration.uk> Fixes: 8c876639c98 ("openvswitch: Remove vport stats.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05ovs: do not allocate memory from offline numa nodeKonstantin Khlebnikov
When openvswitch tries allocate memory from offline numa node 0: stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0) It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h This patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05bpf: fix panic in SO_GET_FILTER with native ebpf programsDaniel Borkmann
When sockets have a native eBPF program attached through setsockopt(sk, SOL_SOCKET, SO_ATTACH_BPF, ...), and then try to dump these over getsockopt(sk, SOL_SOCKET, SO_GET_FILTER, ...), the following panic appears: [49904.178642] BUG: unable to handle kernel NULL pointer dereference at (null) [49904.178762] IP: [<ffffffff81610fd9>] sk_get_filter+0x39/0x90 [49904.182000] PGD 86fc9067 PUD 531a1067 PMD 0 [49904.185196] Oops: 0000 [#1] SMP [...] [49904.224677] Call Trace: [49904.226090] [<ffffffff815e3d49>] sock_getsockopt+0x319/0x740 [49904.227535] [<ffffffff812f59e3>] ? sock_has_perm+0x63/0x70 [49904.228953] [<ffffffff815e2fc8>] ? release_sock+0x108/0x150 [49904.230380] [<ffffffff812f5a43>] ? selinux_socket_getsockopt+0x23/0x30 [49904.231788] [<ffffffff815dff36>] SyS_getsockopt+0xa6/0xc0 [49904.233267] [<ffffffff8171b9ae>] entry_SYSCALL_64_fastpath+0x12/0x71 The underlying issue is the very same as in commit b382c0865600 ("sock, diag: fix panic in sock_diag_put_filterinfo"), that is, native eBPF programs don't store an original program since this is only needed in cBPF ones. However, sk_get_filter() wasn't updated to test for this at the time when eBPF could be attached. Just throw an error to the user to indicate that eBPF cannot be dumped over this interface. That way, it can also be known that a program _is_ attached (as opposed to just return 0), and a different (future) method needs to be consulted for a dump. Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05openvswitch: Rename LABEL->LABELSJoe Stringer
Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name for these to be consistent with conntrack. Fixes: c2ac667 "openvswitch: Allow matching on conntrack label" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05net/unix: fix logic about sk_peek_offsetAndrey Vagin
Now send with MSG_PEEK can return data from multiple SKBs. Unfortunately we take into account the peek offset for each skb, that is wrong. We need to apply the peek offset only once. In addition, the peek offset should be used only if MSG_PEEK is set. Cc: "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING Cc: Eric Dumazet <edumazet@google.com> (commit_signer:1/14=7%) Cc: Aaron Conole <aconole@bytheb.org> Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag") Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05act_mirred: always release tcf hashWANG Cong
Align with other tc actions. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05act_mirred: fix a race condition on mirred_listWANG Cong
After commit 1ce87720d456 ("net: sched: make cls_u32 lockless") we began to release tc actions in a RCU callback. However, mirred action relies on RTNL lock to protect the global mirred_list, therefore we could have a race condition between RCU callback and netdevice event, which caused a list corruption as reported by Vinson. Instead of relying on RTNL lock, introduce a spinlock to protect this list. Note, in non-bind case, it is still called with RTNL lock, therefore should disable BH too. Reported-by: Vinson Lee <vlee@twopensource.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05ipv4: fix reply_dst leakage on arp replyJiri Benc
There are cases when the created metadata reply is not used. Ensure the allocated memory is freed also in such cases. Fixes: 63d008a4e9ee ("ipv4: send arp replies to the correct tunnel") Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05inet: fix race in reqsk_queue_unlink()Eric Dumazet
reqsk_timer_handler() tests if icsk_accept_queue.listen_opt is NULL at its beginning. By the time it calls inet_csk_reqsk_queue_drop() and reqsk_queue_unlink(), listener might have been closed and inet_csk_listen_stop() had called reqsk_queue_yank_acceptq() which sets icsk_accept_queue.listen_opt to NULL We therefore need to correctly check listen_opt being NULL after holding syn_wait_lock for proper synchronization. Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Fixes: b357a364c57c ("inet: fix possible panic in reqsk_queue_unlink()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-02Merge tag 'nfs-rdma-for-4.3-2' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust
NFS: NFSoRDMA bugfix Fixes a use-after-free bug. Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
2015-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix regression in SKB partial checksum handling, from Pravin B Shalar. 2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse Brandeburg. 3) Cure softlockups during accept() in SCTP, from Karl Heiss. 4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from Aaron Conole. 5) IPV6 erroneously ignores output interface specifier in lookup key for route lookups, fix from David Ahern. 6) In Marvell DSA driver, forward unknown frames to CPU port, from Andrew Lunn. 7) Mission flow flag initializations in some code paths, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: Initialize flow flags in input path net: dsa: fix preparation of a port STP update testptp: Silence compiler warnings on ppc64 net/mlx4: Handle return codes in mlx4_qp_attach_common dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port skbuff: Fix skb checksum partial check. net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set net sysfs: Print link speed as signed integer bna: fix error handling af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag af_unix: Convert the unix_sk macro to an inline function for type safety net: sctp: Don't use 64 kilobyte lookup table for four elements l2tp: protect tunnel->del_work by ref_count net/ibm/emac: bump version numbers for correct work with ethtool sctp: Prevent soft lockup when sctp_accept() is called during a timeout event sctp: Whitespace fix i40e/i40evf: check for stopped admin queue i40e: fix VLAN inside VXLAN r8169: fix handling rtl_readphy result net: hisilicon: fix handling platform_get_irq result
2015-10-01Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: - Fixes for mlx5 related issues - Fixes for ipoib multicast handling * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: increase the max mcast backlog queue IB/ipoib: Make sendonly multicast joins create the mcast group IB/ipoib: Expire sendonly multicast joins IB/mlx5: Remove pa_lkey usages IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY IB/iser: Add module parameter for always register memory xprtrdma: Replace global lkey with lkey local to PD
2015-10-01netfilter: fix Kconfig dependencies for nf_dup_ipv{4,6}Pablo Neira Ayuso
net/built-in.o: In function `nf_dup_ipv4': (.text+0xed24d): undefined reference to `nf_conntrack_untracked' net/built-in.o: In function `nf_dup_ipv4': (.text+0xed267): undefined reference to `nf_conntrack_untracked' net/built-in.o: In function `nf_dup_ipv6': (.text+0x158aef): undefined reference to `nf_conntrack_untracked' net/built-in.o: In function `nf_dup_ipv6': (.text+0x158b09): undefined reference to `nf_conntrack_untracked' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-29net: Initialize flow flags in input pathDavid Ahern
The fib_table_lookup tracepoint found 2 places where the flowi4_flags is not initialized. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: dsa: fix preparation of a port STP updateVivien Didelot
Because of the default 0 value of ret in dsa_slave_port_attr_set, a driver may return -EOPNOTSUPP from the commit phase of a STP state, which triggers a WARN() from switchdev. This happened on a 6185 switch which does not support hardware bridging. Fixes: 3563606258cf ("switchdev: convert STP update to switchdev attr set") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29skbuff: Fix skb checksum partial check.Pravin B Shelar
Earlier patch 6ae459bda tried to detect void ckecksum partial skb by comparing pull length to checksum offset. But it does not work for all cases since checksum-offset depends on updates to skb->data. Following patch fixes it by validating checksum start offset after skb-data pointer is updated. Negative value of checksum offset start means there is no need to checksum. Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is setDavid Ahern
Wolfgang reported that IPv6 stack is ignoring oif in output route lookups: With ipv6, ip -6 route get always returns the specific route. $ ip -6 r 2001:db8:e2::1 dev enp2s0 proto kernel metric 256 2001:db8:e2::/64 dev enp2s0 metric 1024 2001:db8:e3::1 dev enp3s0 proto kernel metric 256 2001:db8:e3::/64 dev enp3s0 metric 1024 fe80::/64 dev enp3s0 proto kernel metric 256 default via 2001:db8:e3::255 dev enp3s0 metric 1024 $ ip -6 r get 2001:db8:e2::100 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache $ ip -6 r get 2001:db8:e2::100 oif enp3s0 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache The stack does consider the oif but a mismatch in rt6_device_match is not considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags. Cc: Wolfgang Nothdurft <netdev@linux-dude.de> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29net sysfs: Print link speed as signed integerAlexander Stein
Otherwise 4294967295 (MBit/s) (-1) will be printed when there is no link. Documentation/ABI/testing/sysfs-class-net does not state if this shall be signed or unsigned. Also remove the now unused variable fmt_udec. Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29af_unix: return data from multiple SKBs on recv() with MSG_PEEK flagAaron Conole
AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag is set. This is referenced in kernel bugzilla #12323 @ https://bugzilla.kernel.org/show_bug.cgi?id=12323 As described both in the BZ and lkml thread @ http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an AF_UNIX socket only reads a single skb, where the desired effect is to return as much skb data has been queued, until hitting the recv buffer size (whichever comes first). The modified MSG_PEEK path will now move to the next skb in the tree and jump to the again: label, rather than following the natural loop structure. This requires duplicating some of the loop head actions. This was tested using the python socketpair python code attached to the bugzilla issue. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29svcrdma: handle rdma read with a non-zero initial page offsetSteve Wise
The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions were not taking into account the initial page_offset when determining the rdma read length. This resulted in a read who's starting address and length exceeded the base/bounds of the frmr. The server gets an async error from the rdma device and kills the connection, and the client then reconnects and resends. This repeats indefinitely, and the application hangs. Most work loads don't tickle this bug apparently, but one test hit it every time: building the linux kernel on a 16 core node with 'make -j 16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA. This bug seems to only be tripped with devices having small fastreg page list depths. I didn't see it with mlx4, for instance. Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic') Signed-off-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-09-29xfrm: Fix state threshold configuration from userspaceMichael Rossberg
Allow to change the replay threshold (XFRMA_REPLAY_THRESH) and expiry timer (XFRMA_ETIMER_THRESH) of a state without having to set other attributes like replay counter and byte lifetime. Changing these other values while traffic flows will break the state. Signed-off-by: Michael Rossberg <michael.rossberg@tu-ilmenau.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-09-28net: sctp: Don't use 64 kilobyte lookup table for four elementsDenys Vlasenko
Seemingly innocuous sctp_trans_state_to_prio_map[] array is way bigger than it looks, since "[SCTP_UNKNOWN] = 2" expands into "[0xffff] = 2" ! This patch replaces it with switch() statement. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Neil Horman <nhorman@tuxdriver.com> CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> CC: linux-sctp@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-28l2tp: protect tunnel->del_work by ref_countAlexander Couzens
There is a small chance that tunnel_free() is called before tunnel->del_work scheduled resulting in a zero pointer dereference. Signed-off-by: Alexander Couzens <lynxis@fe80.eu> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>