summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-08-10ieee802154: 6lowpan: fix error frag handlingAlexander Aring
This patch fixes the error handling for lowpan_xmit_fragment by replace "-PTR_ERR" to "PTR_ERR". PTR_ERR returns already a negative errno code. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10ieee802154: add ack request default handlingAlexander Aring
This patch introduce a new mib entry which isn't part of 802.15.4 but useful as default behaviour to set the ack request bit or not if we don't know if the ack request bit should set. This is currently used for stacks like IEEE 802.15.4 6LoWPAN. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10mac802154: change frame_retries behaviourAlexander Aring
This patch changes the default minimum value of frame_retries to 0 and changes the frame_retries default value to 3 which is also 802.15.4 default. We don't use the frame_retries "-1" value as indicator for no-aret mode anymore, instead we checking on the ack request bit inside the 802.15.4 frame control field. This allows a acknowledge handling per frame. This checking is done by transceiver or inside xmit callback of driver layer. If a transceiver doesn't support ARET handling the transmit functionality ignores ack frames then, which isn't well but should not effect anything of current functionality. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10mac802154: cfg: remove test and set checksAlexander Aring
This patch removes several checks if a value is really changed. This makes only sense if we have another layer call e.g. calling the driver_ops which is done by callbacks like "set_channel". For MAC settings which need to be set by phy registers (if the phy supports that handling) this is set by doing an interface up currently and are not direct driver_ops calls, so we remove the checks from these configuration callbacks. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10mac802154: fix wpan mac setting while lowpan is thereAlexander Aring
If we currently change the mac address inside the wpan interface while we have a lowpan interface on top of the wpan interface, the mac address setting doesn't reach the lowpan interface. The effect would be that the IPv6 lowpan interface has the old SLAAC address and isn't working anymore because the lowpan interface use in internal mechanism sometimes dev->addr which is the old mac address of the wpan interface. This patch checks if a wpan interface belongs to lowpan interface, if yes then we need to check if the lowpan interface is down and change the mac address also at the lowpan interface. When the lowpan interface will be set up afterwards, it will use the correct SLAAC address which based on the updated mac address setting. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Tested-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10ieee802154: 6lowpan: remove multiple lowpan per wpan supportAlexander Aring
We currently supports multiple lowpan interfaces per wpan interface. I never saw any use case into such functionality. We drop this feature now because it's much easier do deal with address changes inside the under laying wpan interface. This patch removes the multiple lowpan interface and adds a lowpan_dev netdev pointer into the wpan_dev, if this pointer isn't null the wpan interface belongs to the assigned lowpan interface. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Tested-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-106lowpan: Fix extraction of flow label fieldLukasz Duda
The lowpan_fetch_skb function is used to fetch the first byte, which also increments the data pointer in skb structure, making subsequent array lookup of byte 0 actually being byte 1. To decompress the first byte of the Flow Label when the TF flag is set to 0x01, the second half of the first byte is needed. The patch fixes the extraction of the Flow Label field. Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Lukasz Duda <lukasz.duda@nordicsemi.no> Signed-off-by: Glenn Ruben Bakke <glenn.ruben.bakke@nordicsemi.no> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10Bluetooth: Fix breakage in amp_write_rem_assoc_frag()Dan Carpenter
We should be passing the pointer itself instead of the address of the pointer. This was a copy and paste bug when we replaced the calls to hci_send_cmd(). Originally, the arguments were "len, cp" but we overwrote them with "sizeof(cp), &cp" by mistake. Fixes: b3d3914006a0 ('Bluetooth: Move amp assoc read/write completed callback to amp.c') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-10netlink: make sure -EBUSY won't escape from netlink_insertDaniel Borkmann
Linus reports the following deadlock on rtnl_mutex; triggered only once so far (extract): [12236.694209] NetworkManager D 0000000000013b80 0 1047 1 0x00000000 [12236.694218] ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018 [12236.694224] ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff [12236.694230] ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80 [12236.694235] Call Trace: [12236.694250] [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10 [12236.694257] [<ffffffff815d133a>] ? schedule+0x2a/0x70 [12236.694263] [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10 [12236.694271] [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0 [12236.694280] [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30 [12236.694291] [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30 [12236.694299] [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180 [12236.694309] [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190 [12236.694319] [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210 [12236.694331] [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70 [12236.694339] [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30 [12236.694346] [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0 [12236.694354] [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30 [12236.694360] [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180 [12236.694367] [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0 [12236.694376] [<ffffffff810a236f>] ? __wake_up+0x2f/0x50 [12236.694387] [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40 [12236.694396] [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240 [12236.694405] [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0 [12236.694415] [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210 [12236.694423] [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0 [12236.694429] [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60 [12236.694434] [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70 [12236.694440] [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a It seems so far plausible that the recursive call into rtnetlink_rcv() looks suspicious. One way, where this could trigger is that the senders NETLINK_CB(skb).portid was wrongly 0 (which is rtnetlink socket), so the rtnl_getlink() request's answer would be sent to the kernel instead to the actual user process, thus grabbing rtnl_mutex() twice. One theory would be that netlink_autobind() triggered via netlink_sendmsg() internally overwrites the -EBUSY error to 0, but where it is wrongly originating from __netlink_insert() instead. That would reset the socket's portid to 0, which is then filled into NETLINK_CB(skb).portid later on. As commit d470e3b483dc ("[NETLINK]: Fix two socket hashing bugs.") also puts it, -EBUSY should not be propagated from netlink_insert(). It looks like it's very unlikely to reproduce. We need to trigger the rhashtable_insert_rehash() handler under a situation where rehashing currently occurs (one /rare/ way would be to hit ht->elasticity limits while not filled enough to expand the hashtable, but that would rather require a specifically crafted bind() sequence with knowledge about destination slots, seems unlikely). It probably makes sense to guard __netlink_insert() in any case and remap that error. It was suggested that EOVERFLOW might be better than an already overloaded ENOMEM. Reference: http://thread.gmane.org/gmane.linux.network/372676 Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10netfilter: SYNPROXY: fix sending window update to clientPhil Sutter
Upon receipt of SYNACK from the server, ipt_SYNPROXY first sends back an ACK to finish the server handshake, then calls nf_ct_seqadj_init() to initiate sequence number adjustment of forwarded packets to the client and finally sends a window update to the client to unblock it's TX queue. Since synproxy_send_client_ack() does not set synproxy_send_tcp()'s nfct parameter, no sequence number adjustment happens and the client receives the window update with incorrect sequence number. Depending on client TCP implementation, this leads to a significant delay (until a window probe is being sent). Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-10netfilter: ip6t_SYNPROXY: fix NULL pointer dereferencePhil Sutter
This happens when networking namespaces are enabled. Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-09net: ethernet: Fix double word "the the" in eth.cMasanari Iida
This patch fix double word "the the" in Documentation/DocBook/networking/API-eth-get-headlen.html Documentation/DocBook/networking/netdev.html Documentation/DocBook/networking.xml These files are generated from comment in source, so I have to fix comment in net/ethernet/eth.c. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09mpls: Enforce payload type of traffic sent using explicit NULLRobert Shearman
RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only label on the stack, then after popping the resulting packet must be treated as a IPv4 packet and forwarded based on the IPv4 header. The same is true for IPv6 Explicit NULL with an IPv6 packet following. Therefore, when installing the IPv4/IPv6 Explicit NULL label routes, add an attribute that specifies the expected payload type for use at forwarding time for determining the type of the encapsulated packet instead of inspecting the first nibble of the packet. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09net: dsa: add support for switchdev FDB objectsVivien Didelot
Remove the fdb_{add,del,getnext} function pointer in favor of new port_fdb_{add,del,getnext}. Implement the switchdev_port_obj_{add,del,dump} functions in DSA to support the SWITCHDEV_OBJ_PORT_FDB objects. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09net: switchdev: support static FDB addressesVivien Didelot
This patch adds a is_static boolean to the switchdev_obj_fdb structure, in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09net: switchdev: change fdb addr for a byte arrayVivien Didelot
The address in the switchdev_obj_fdb structure is currently represented as a pointer. Replacing it for a 6-byte array allows switchdev to carry addresses directly read from hardware registers, not stored by the switch chip driver (as in Rocker). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-09net:wimax: Fix doucble word "the the" in networking.xmlMasanari Iida
This patch fix a double word "the the" in Documentation/DocBook/networking.xml and Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html. These files are generated from comment in source, so I had to fix the typo in net/wimax/io-rfkill.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07net: Fix race condition in store_rps_mapTom Herbert
There is a race condition in store_rps_map that allows jump label count in rps_needed to go below zero. This can happen when concurrently attempting to set and a clear map. Scenario: 1. rps_needed count is zero 2. New map is assigned by setting thread, but rps_needed count _not_ yet incremented (rps_needed count still zero) 2. Map is cleared by second thread, old_map set to that just assigned 3. Second thread performs static_key_slow_dec, rps_needed count now goes negative Fix is to increment or decrement rps_needed under the spinlock. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Antonio Quartulli says: ==================== Included changes: - prevent DAT from replying on behalf of local clients and confuse L2 bridges - fix crash on double list removal of TT objects (tt_local_entry) - fix crash due to missing NULL checks - initialize bw values for new GWs objects to prevent memory leak ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07openvswitch: Make 100 percents packets sampled when sampling rate is 1.Wenyu Zhang
When sampling rate is 1, the sampling probability is UINT32_MAX. The packet should be sampled even the prandom32() generate the number of UINT32_MAX. And none packet need be sampled when the probability is 0. Signed-off-by: Wenyu Zhang <wenyuz@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATAAlexei Starovoitov
IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA, so combine them into single IFLA_VXLAN_COLLECT_METADATA flag. 'flowbased' doesn't convey real meaning of the vxlan tunnel mode. This mode can be used by routing, tc+bpf and ovs. Only ovs is strictly flow based, so 'collect metadata' is a better name for this tunnel mode. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.Sowmini Varadhan
Register pernet subsys init/stop functions that will set up and tear down per-net RDS-TCP listen endpoints. Unregister pernet subusys functions on 'modprobe -r' to clean up these end points. Enable keepalive on both accept and connect socket endpoints. The keepalive timer expiration will ensure that client socket endpoints will be removed as appropriate from the netns when an interface is removed from a namespace. Register a device notifier callback that will clean up all sockets (and thus avoid the need to wait for keepalive timeout) when the loopback device is unregistered from the netns indicating that the netns is getting deleted. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than ↵Sowmini Varadhan
init_net Open the sockets calling sock_create_kern() with the correct struct net pointer, and use that struct net pointer when verifying the address passed to rds_bind(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07netfilter: nfacct: per network namespace supportAndreas Schultz
- Move the nfnl_acct_list into the network namespace, initialize and destroy it per namespace - Keep track of refcnt on nfacct objects, the old logic does not longer work with a per namespace list - Adjust xt_nfacct to pass the namespace when registring objects Signed-off-by: Andreas Schultz <aschultz@tpip.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_limit: add per-byte limitingPablo Neira Ayuso
This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type of limiting. Contrary to per-packet limiting, the cost is calculated from the packet path since this depends on the packet length. The burst attribute indicates the number of bytes in which the rate can be exceeded. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_limit: constant token cost per packetPablo Neira Ayuso
The cost per packet can be calculated from the control plane path since this doesn't ever change. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_limit: add burst parameterPablo Neira Ayuso
This patch adds the burst parameter. This burst indicates the number of packets that can exceed the limit. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_limit: factor out shared code with per-byte limitingPablo Neira Ayuso
This patch prepares the introduction of per-byte limiting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_limit: convert to token-based limiting at nanosecond granularityPablo Neira Ayuso
Rework the limit expression to use a token-based limiting approach that refills the bucket gradually. The tokens are calculated at nanosecond granularity instead jiffies to improve precision. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_limit: rename to nft_limit_pktsPablo Neira Ayuso
To prepare introduction of bytes ratelimit support. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nf_tables: add nft_dup expressionPablo Neira Ayuso
This new expression uses the nf_dup engine to clone packets to a given gateway. Unlike xt_TEE, we use an index to indicate output interface which should be fine at this stage. Moreover, change to the preemtion-safe this_cpu_read(nf_skb_duplicated) from nf_dup_ipv{4,6} to silence a lockdep splat. Based on the original tee expression from Arturo Borrero Gonzalez, although this patch has diverted quite a bit from this initial effort due to the change to support maps. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: factor out packet duplication for IPv4/IPv6Pablo Neira Ayuso
Extracted from the xtables TEE target. This creates two new modules for IPv4 and IPv6 that are shared between the TEE target and the new nf_tables dup expressions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: xt_TEE: get rid of WITH_CONNTRACK definitionPablo Neira Ayuso
Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-07netfilter: nft_counter: convert it to use per-cpu countersPablo Neira Ayuso
This patch converts the existing seqlock to per-cpu counters. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-06bridge: netlink: account for the IFLA_BRPORT_PROXYARP_WIFI attribute size ↵Nikolay Aleksandrov
and policy The attribute size wasn't accounted for in the get_slave_size() callback (br_port_get_slave_size) when it was introduced, so fix it now. Also add a policy entry for it in br_port_policy. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 842a9ae08a25 ("bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06bridge: netlink: account for the IFLA_BRPORT_PROXYARP attribute size and policyNikolay Aleksandrov
The attribute size wasn't accounted for in the get_slave_size() callback (br_port_get_slave_size) when it was introduced, so fix it now. Also add a policy entry for it in br_port_policy. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06net: pktgen: don't abuse current->state in pktgen_thread_worker()Oleg Nesterov
Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop code in pktgen_thread_worker()" removed (in particular) the final __set_current_state(TASK_RUNNING) and I didn't notice the previous set_current_state(TASK_INTERRUPTIBLE). This triggers the warning in __might_sleep() after return. Afaics, we can simply remove both set_current_state()'s, and we could do this a long ago right after ef87979c273a2 "pktgen: better scheduler friendliness" which changed pktgen_thread_worker() to use wait_event_interruptible_timeout(). Reported-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06af_mpls: add null dev check in find_outdevRoopa Prabhu
This patch adds null dev check for the 'cfg->rc_via_table == NEIGH_LINK_TABLE or dev_get_by_index() failed' case Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06mpls: small cleanup in inet/inet6_fib_lookup_dev()Dan Carpenter
We recently changed this code from returning NULL to returning ERR_PTR. There are some left over NULL assignments which we can remove. We can preserve the error code from ip_route_output() instead of always returning -ENODEV. Also these functions use a mix of gotos and direct returns. There is no cleanup necessary so I changed the gotos to direct returns. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06net: Fix skb_set_peeked use-after-free bugHerbert Xu
The commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ("net: Clone skb before setting peeked flag") introduced a use-after-free bug in skb_recv_datagram. This is because skb_set_peeked may create a new skb and free the existing one. As it stands the caller will continue to use the old freed skb. This patch fixes it by making skb_set_peeked return the new skb (or the old one if unchanged). Fixes: 738ac1ebb96d ("net: Clone skb before setting peeked flag") Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06Bluetooth: fix MGMT_EV_NEW_LONG_TERM_KEY eventJakub Pawlowski
This patch fixes how MGMT_EV_NEW_LONG_TERM_KEY event is build. Right now val vield is filled with only 1 byte, instead of whole value. This bug was introduced in commit 1fc62c526a57 ("Bluetooth: Fix exposing full value of shortened LTKs") Before that patch, if you paired with device using bluetoothd using simple pairing, and then restarted bluetoothd, you would be able to re-connect, but device would fail to establish encryption and would terminate connection. After this patch connecting after bluetoothd restart works fine. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-05xprtrdma: take HCA driver refcount at clientDevesh Sharma
This is a rework of the following patch sent almost a year back: http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html In presence of active mount if someone tries to rmmod vendor-driver, the command remains stuck forever waiting for destruction of all rdma-cm-id. in worst case client can crash during shutdown with active mounts. The existing code assumes that ia->ri_id->device cannot change during the lifetime of a transport. xprtrdma do not have support for DEVICE_REMOVAL event either. Lifting that assumption and adding support for DEVICE_REMOVAL event is a long chain of work, and is in plan. The community decided that preventing the hang right now is more important than waiting for architectural changes. Thus, this patch introduces a temporary workaround to acquire HCA driver module reference count during the mount of a nfs-rdma mount point. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Count RDMA_NOMSG type callsChuck Lever
RDMA_NOMSG type calls are less efficient than RDMA_MSG. Count NOMSG calls so administrators can tell if they happen to be used more than expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Clean up xprt_rdma_print_stats()Chuck Lever
checkpatch.pl complained about the seq_printf() format string split across lines and the use of %Lu. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Fix large NFS SYMLINK callsChuck Lever
Repair how rpcrdma_marshal_req() chooses which RDMA message type to use for large non-WRITE operations so that it picks RDMA_NOMSG in the correct situations, and sets up the marshaling logic to SEND only the RPC/RDMA header. Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS server XDR decoder for NFSv2 SYMLINK does not handle having the pathname argument arrive in a separate buffer. The decoder could be fixed, but this is simpler and RDMA_NOMSG can be used in a variety of other situations. Ensure that the Linux client continues to use "RDMA_MSG + read list" when sending large NFSv3 SYMLINK requests, which is more efficient than using RDMA_NOMSG. Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG + read list" just like NFSv3 (see Section 5 of RFC 5667). Before, these did not work at all. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Fix XDR tail buffer marshallingChuck Lever
Currently xprtrdma appends an extra chunk element to the RPC/RDMA read chunk list of each NFSv4 WRITE compound. The extra element contains the final GETATTR operation in the compound. The result is an extra RDMA READ operation to transfer a very short piece of each NFS WRITE compound (typically 16 bytes). This is inefficient. It is also incorrect. The client is sending the trailing GETATTR at the same Position as the preceding WRITE data payload. Whether or not RFC 5667 allows the GETATTR to appear in a read chunk, RFC 5666 requires that these two separate RPC arguments appear at two distinct Positions. It can also be argued that the GETATTR operation is not bulk data, and therefore RFC 5667 forbids its appearance in a read chunk at all. Although RFC 5667 is not precise about when using a read list with NFSv4 COMPOUND is allowed, the intent is that only data arguments not touched by NFS (ie, read and write payloads) are to be sent using RDMA READ or WRITE. The NFS client constructs GETATTR arguments itself, and therefore is required to send the trailing GETATTR operation as additional inline content, not as a data payload. NB: This change is not backwards compatible. Some older servers do not accept inline content following the read list. The Linux NFS server should handle this content correctly as of commit a97c331f9aa9 ("svcrdma: Handle additional inline content"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Don't provide a reply chunk when expecting a short replyChuck Lever
Currently Linux always offers a reply chunk, even when the reply can be sent inline (ie. is smaller than 1KB). On the client, registering a memory region can be expensive. A server may choose not to use the reply chunk, wasting the cost of the registration. This is a change only for RPC replies smaller than 1KB which the server constructs in the RPC reply send buffer. Because the elements of the reply must be XDR encoded, a copy-free data transfer has no benefit in this case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Always provide a write list when sending NFS READChuck Lever
The client has been setting up a reply chunk for NFS READs that are smaller than the inline threshold. This is not efficient: both the server and client CPUs have to copy the reply's data payload into and out of the memory region that is then transferred via RDMA. Using the write list, the data payload is moved by the device and no extra data copying is necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Account for RPC/RDMA header size when deciding to inlineChuck Lever
When the size of the RPC message is near the inline threshold (1KB), the client would allow messages to be sent that were a few bytes too large. When marshaling RPC/RDMA requests, ensure the combined size of RPC/RDMA header and RPC header do not exceed the inline threshold. Endpoints typically reject RPC/RDMA messages that exceed the size of their receive buffers. The two server implementations I test with (Linux and Solaris) use receive buffers that are larger than the client’s inline threshold. Thus so far this has been benign, observed only by code inspection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-08-05xprtrdma: Remove logic that constructs RDMA_MSGP type callsChuck Lever
RDMA_MSGP type calls insert a zero pad in the middle of the RPC message to align the RPC request's data payload to the server's alignment preferences. A server can then "page flip" the payload into place to avoid a data copy in certain circumstances. However: 1. The client has to have a priori knowledge of the server's preferred alignment 2. Requests eligible for RDMA_MSGP are requests that are small enough to have been sent inline, and convey a data payload at the _end_ of the RPC message Today 1. is done with a sysctl, and is a global setting that is copied during mount. Linux does not support CCP to query the server's preferences (RFC 5666, Section 6). A small-ish NFSv3 WRITE might use RDMA_MSGP, but no NFSv4 compound fits bullet 2. Thus the Linux client currently leaves RDMA_MSGP disabled. The Linux server handles RDMA_MSGP, but does not use any special page flipping, so it confers no benefit. Clean up the marshaling code by removing the logic that constructs RDMA_MSGP type calls. This also reduces the maximum send iovec size from four to just two elements. /proc/sys/sunrpc/rdma_inline_write_padding is a kernel API, and thus is left in place. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>