summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-12-03inet: Add a count to struct inet_listen_hashbucketMartin KaFai Lau
This patch adds a count to the 'struct inet_listen_hashbucket'. It counts how many sk is hashed to a bucket. It will be used to decide if the (to-be-added) portaddr listener's hashtable should be used during inet[6]_lookup_listener(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03enic: add sw timestamp supportGovindarajulu Varadarajan
Add ethtool ops to advertise sw timestamping. Call skb_tx_timestamp() just before ringing the wq doorbell. Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge branch 'hv_netvsc-minor-optimizations'David S. Miller
Stephen Hemminger says: ==================== hv_netvsc: minor optimizations These are a set of local optimizations the Hyper-V networking driver. Also include a vmbus patch in this set, because it depends on the netvsc that last used that function. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03vmbus: make hv_get_ringbuffer_availbytes localStephen Hemminger
The last use of hv_get_ringbuffer_availbytes in drivers is now gone. Only used by the debug info routine so make it static. Also, add READ_ONCE() to avoid any possible issues with potentially volatile index values. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03hv_netvsc: optimize initialization of RNDIS headerStephen Hemminger
The memset of the whole maximum possible RNDIS header is unnecessary. For the main part of the header use a structure assignment. No need to memset the whole per packet info. Instead rely on caller to set what it wants. Also get rid of cast to void and signed/unsigned conversion. Now return pointer to per packet data (rather than the header) which simplifies use by code setting up the packet data. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03hv_netvsc: use reciprocal divide to speed up percent calculationStephen Hemminger
Every packet sent checks the available ring space. The calculation can be sped up by using reciprocal divide which is multiplication. Since ring_size can only be configured by module parameter, so it doesn't have to be passed around everywhere. Also it should be unsigned since it is number of pages. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03hv_netvsc: replace divide with mask when computing paddingStephen Hemminger
Packet alignment is always a power of 2 therefore modulus can be replaced with a faster and operation Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03hv_netvsc: don't need local xmit_moreStephen Hemminger
Since skb is always non-NULL in the copy portion of netvsc_send do not need local variable. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03hv_netvsc: drop unused macrosStephen Hemminger
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03rxrpc: Fix the MAINTAINERS recordDavid Howells
Fix the MAINTAINERS record so that it's more obvious who the maintainer for AF_RXRPC is. Reported-by: Joe Perches <joe@perches.com> Reported-by: David Miller <davem@davemloft.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03rxrpc: Use correct netns source in rxrpc_release_sock()David Howells
In rxrpc_release_sock() there may be no rx->local value to access, so we can't unconditionally follow it to the rxrpc network namespace information to poke the connection reapers. Instead, use the socket's namespace pointer to find the namespace. This unfixed code causes the following static checker warning: net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock() error: we previously assumed 'rx->local' could be null (see line 887) Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03liquidio: fix incorrect indentation of assignment statementColin Ian King
Remove one extraneous level of indentation on assignment statement. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge tag 'linux-can-fixes-for-4.15-20171201' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-12-01 this is a pull for net consisting of nine patches. The first three patches are by Jimmy Assarsson for the kvaser_usb driver and add the missing free()s in some error path, a signed/unsigned comparison and ratelimit the error messages in case of incomplete messages. Oliver Stäbler's patch for the ti_hecc driver fix the napi poll function's return value. The return values of the probe function of the peak_canfd and peak_pci PCI drivers are fixed by Stephane Grosjean's patch. Two patches by me for the flexcan driver update the bugs/features/quirks overview table and fix the error state transition for the VF610 SoC. The two patches by Martin Kelly for the mcba_usb driver fix a typo and a device disconnect bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge tag 'at24-4.15-fixes-for-wolfram' of ↵Wolfram Sang
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current Please consider pulling the following fixes for v4.15. While it doesn't fix any regression introduced in the v4.15 merge window, we have a feature in at24 since linux v4.8 - reading the mac address block from at24mac series - which turned out to be not working. This pull request contains changes that fix it together with a patch that hardens the read and write argument sanitization with out-of-bounds checks that were missing.
2017-12-03stmmac: reset last TSO segment size after device openLars Persson
The mss variable tracks the last max segment size sent to the TSO engine. We do not update the hardware as long as we receive skb:s with the same value in gso_size. During a network device down/up cycle (mapped to stmmac_release() and stmmac_open() callbacks) we issue a reset to the hardware and it forgets the setting for mss. However we did not zero out our mss variable so the next transmission of a gso packet happens with an undefined hardware setting. This triggers a hang in the TSO engine and eventuelly the netdev watchdog will bark. Fixes: f748be531d70 ("stmmac: support new GMAC4") Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03ipvlan: Add new func ipvlan_is_valid_dev instead of duplicated codesGao Feng
There are multiple duplicated condition checks in the current codes, so I add the new func ipvlan_is_valid_dev instead of the duplicated codes to check if the netdev is real ipvlan dev. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03ipvlan: Add the skb->mark as flow4's member to lookup routeGao Feng
Current codes don't use skb->mark to assign flowi4_mark, it would make the policy route rule with fwmark doesn't work as expected. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge branch 'realtek-phy-improvements'David S. Miller
Martin Blumenstingl says: ==================== Realtek Ethernet PHY driver improvements This series provides some small improvements and cleanups for the Realtek Ethernet PHY driver. None of the patches in this series should change any functionality. The goal is to make the code a bit easier to read by: - re-using the BIT and GENMASK macros (which makes it easier to compare the #defines in the kernel with the values from the datasheets) - rename a #define from a generic name to a PHY-specific name since it's only used for one specific PHY - logically group the register #defines and their register bit #defines together - indentation cleanups - removed some code duplicating for reading/writing registers on a Realtek specific "page" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03net: phy: realtek: add utility functions to read/write page addressesMartin Blumenstingl
Realtek PHYs implement the concept of so-called "extension pages". The reason for this is probably because these PHYs expose more registers than available in the standard address range. After all read/write operations on such a page are done the driver should switch back to page 0 where the standard MII registers (such as MII_BMCR) are available. When referring to such a register the datasheets of RTL8211E and RTL8211F always specify: - the page / "ext. page" which has to be written to RTL821x_PAGE_SELECT - an address (sometimes also called reg) These new utility functions make the existing code easier to read since it removes some duplication (switching back to page 0 is done within the new helpers for example). No functional changes are intended. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03net: phy: realtek: use the same indentation for all #definesMartin Blumenstingl
This simply makes the code easier to read. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03net: phy: realtek: group all register bit #defines for RTL821x_INERMartin Blumenstingl
This simply moves all register bit #defines which describe the (PHY specific) bits in the RTL821x_INER right below the RTL821x_INER register definition. This makes it easier to spot which registers and bits belong together. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03net: phy: realtek: rename RTL821x_INER_INIT to RTL8211B_INER_INITMartin Blumenstingl
This macro is only used by the RTL8211B code. RTL8211E and RTL8211F both use other bits to initialize the RTL821x_INER register. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03net: phy: realtek: use the BIT and GENMASK macrosMartin Blumenstingl
This makes it easier to compare the #defines with the datasheets. No functional changes. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 's390-qeth-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2017-12-01 please apply the following three fixes for 4.15. These should also go back to stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02s390/qeth: build max size GSO skbs on L2 devicesJulian Wiedmann
The current GSO skb size limit was copy&pasted over from the L3 path, where it is needed due to a TSO limitation. As L2 devices don't offer TSO support (and thus all GSO skbs are segmented before they reach the driver), there's no reason to restrict the stack in how large it may build the GSO skbs. Fixes: d52aec97e5bc ("qeth: enable scatter/gather in layer 2 mode") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02s390/qeth: fix GSO throughput regressionJulian Wiedmann
Using GSO with small MTUs currently results in a substantial throughput regression - which is caused by how qeth needs to map non-linear skbs into its IO buffer elements: compared to a linear skb, each GSO-segmented skb effectively consumes twice as many buffer elements (ie two instead of one) due to the additional header-only part. This causes the Output Queue to be congested with low-utilized IO buffers. Fix this as follows: If the MSS is low enough so that a non-SG GSO segmentation produces order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is where we anticipate the biggest savings, since an SG-enabled GSO segmentation produces skbs that always consume at least two buffer elements. Larger MSS values continue to get a SG-enabled GSO segmentation, since 1) the relative overhead of the additional header-only buffer element becomes less noticeable, and 2) the linearization overhead increases. With the throughput regression fixed, re-enable NETIF_F_SG by default to reap the significant CPU savings of GSO. Fixes: 5722963a8e83 ("qeth: do not turn on SG per default") Reported-by: Nils Hoppmann <niho@de.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02s390/qeth: fix thinko in IPv4 multicast address trackingJulian Wiedmann
Commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") reworked how secondary addresses are managed for qeth devices. Instead of dropping & subsequently re-adding all addresses on every ndo_set_rx_mode() call, qeth now keeps track of the addresses that are currently registered with the HW. On a ndo_set_rx_mode(), we thus only need to do (de-)registration requests for the addresses that have actually changed. On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong hashtable - and thus never finds a match. As a result, we first delete *all* such addresses, and then re-add them again. So each set_rx_mode() causes a short period where the IPv4 Multicast addresses are not registered, and the card stops forwarding inbound traffic for them. Fix this by setting the ->is_multicast flag on the lookup object, thus enabling qeth_l3_ip_from_hash() to search the correct hashtable and find a match there. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 'vhost-skb-leaks'David S. Miller
Wei Xu says: ==================== vhost: fix a few skb leaks Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html v4: - fix zero iov iterator count in tap/tap_do_read()(Jason) - don't put tun in case of EBADFD(Jason) - Replace msg->msg_control with new 'skb' when calling tun/tap_do_read() v3: - move freeing skb from vhost to tun/tap recvmsg() to not confuse the callers. v2: - add Matthew as the reporter, thanks matthew. - moving zero headcount check ahead instead of defer consuming skb due to jason and mst's comment. - add freeing skb in favor of recvmsg() fails. ==================== Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02tap: free skb if flags errorWei Xu
tap_recvmsg() supports accepting skb by msg_control after commit 3b4ba04acca8 ("tap: support receiving skb from msg_control"), the skb if presented should be freed within the function, otherwise it would be leaked. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02tun: free skb in early errorsWei Xu
tun_recvmsg() supports accepting skb by msg_control after commit ac77cfd4258f ("tun: support receiving skb through msg_control"), the skb if presented should be freed no matter how far it can go along, otherwise it would be leaked. This patch fixes several missed cases. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02vhost: fix skb leak in handle_rx()Wei Xu
Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html Eventually we figured out that it was a skb leak in handle_rx() when sending packets to the VM. This usually happens when a guest can not drain out vq as fast as vhost fills in, afterwards it sets off the traffic jam and leaks skb(s) which occurs as no headcount to send on the vq from vhost side. This can be avoided by making sure we have got enough headcount before actually consuming a skb from the batched rx array while transmitting, which is simply done by moving checking the zero headcount a bit ahead. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 'bnxt_en-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: Fixes. A shutdown fix for SMARTNIC, 2 fixes related to TC Flower vxlan filters, and the last one fixes an out-of-scope variable when sending short firmware messages. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()Vasundhara Volam
short_input variable is assigned to another data pointer which is referred out of its scope. Fix it by moving short_input definition to the beginning of bnxt_hwrm_do_send_msg() function. No failure has been reported so far due to this issue. Fixes: e605db801bde ("bnxt_en: Support for Short Firmware Message") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: fix dst/src fid for vxlan encap/decap actionsSathya Perla
For flows that involve a vxlan encap action, the vxlan sock interface may be specified as the outgoing interface. The driver must resolve the outgoing PF interface used by this socket and use the dst_fid of the PF in the hwrm_cfa_encap_record_alloc cmd. Similarily for flows that have a vxlan decap action, the fid of the incoming PF interface must be used as the src_fid in the hwrm_cfa_decap_filter_alloc cmd. Fixes: 8c95f773b4a3 ("bnxt_en: add support for Flower based vxlan encap/decap offload") Signed-off-by: Sathya Perla <sathya.perla@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: wildcard smac while creating tunnel decap filterSunil Challa
While creating a decap filter the tunnel smac need not (and must not) be specified as we cannot ascertain the neighbor in the recv path. 'ttl' match is also not needed for the decap filter and must be wild-carded. Fixes: f484f6782e01 ("bnxt_en: add hwrm FW cmds for cfa_encap_record and decap_filter") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdownRay Jui
The current 'bnxt_shutdown' implementation only invokes 'bnxt_ulp_shutdown' to shut down RoCE in the case when the system is in the path of power off (SYSTEM_POWER_OFF). While this may work in most cases, it does not work in the smart NIC case, when Linux 'reboot' command is initiated from the Linux that runs on the ARM cores of the NIC card. In this particular case, Linux 'reboot' results in a system 'L3' level reset where the entire ARM and associated subsystems are being reset, but at the same time, Nitro core is being kept in sane state (to allow external PCIe connected servers to continue to work). Without properly shutting down RoCE and freeing all associated resources, it results in the ARM core to hang immediately after the 'reboot' By always invoking 'bnxt_ulp_shutdown' in 'bnxt_shutdown', it fixes the above issue Fixes: 0efd2fc65c92 ("bnxt_en: Add a callback to inform RDMA driver during PCI shutdown.") Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 'dsa-cross-chip-FDB-support'David S. Miller
Vivien Didelot says: ==================== net: dsa: cross-chip FDB support DSA can have interconnected switches. For instance, the ZII Dev Rev B board described in arch/arm/boot/dts/vf610-zii-dev-rev-b.dts has a switch fabric composed of 3 switch devices like this: lan4 lan6 CPU (eth1) | lan5 | lan7 | | | | | [0 1 2 3 4 6 5]---[6 0 1 2 3 4 5]---[9 0 1 2 3 4 5 6 7 8] | | | | | | | lan0 | lan2 lan3 lan8 | optical4 lan1 optical3 One current issue with DSA is cross-chip FDB. If we add a static MAC address on lan3, only its parent switch 1 (the one in the middle) will be programmed. That is not correct in a cross-chip environment, because the DSA ports connecting to switch 1 of adjacent switch 0 (on the left) and switch 2 (on the right) must be programmed too. Without this patchset, a dump of the hardware FDB of switches 0, 1 and 2 after programming a MAC address on lan3 looks like this (*): # bridge fdb add 11:22:33:44:55:66 dev lan3 # cat /sys/kernel/debug/mv88e6xxx/sw*/atu/0 | grep -v FID 0 ff:ff:ff:ff:ff:ff MC_STATIC n 0 1 2 3 4 5 6 0 11:22:33:44:55:66 MC_STATIC_MGMT_PO n 0 - - - - - - 0 ff:ff:ff:ff:ff:ff MC_STATIC n 0 1 2 3 4 5 6 0 ff:ff:ff:ff:ff:ff MC_STATIC n 0 1 2 3 4 5 6 7 8 9 With this patchset applied, adjacent DSA ports get programmed too: # bridge fdb add 11:22:33:44:55:66 dev lan3 # cat /sys/kernel/debug/mv88e6xxx/sw*/atu/0 | grep -v FID 0 11:22:33:44:55:66 MC_STATIC_MGMT_PO n - - - - - 5 - 0 ff:ff:ff:ff:ff:ff MC_STATIC n 0 1 2 3 4 5 6 0 11:22:33:44:55:66 MC_STATIC_MGMT_PO n 0 - - - - - - 0 ff:ff:ff:ff:ff:ff MC_STATIC n 0 1 2 3 4 5 6 0 11:22:33:44:55:66 MC_STATIC_MGMT_PO n - - - - - - - - - 9 0 ff:ff:ff:ff:ff:ff MC_STATIC n 0 1 2 3 4 5 6 7 8 9 In order to do that, the first commit introduces a dsa_towards_port() helper which returns the local port of a switch which must be used to reach an arbitrary switch port (local or from an adjacent switch.) The second patch uses this helper to configure the port reaching the target port for every switches of the fabric. (*) a patch for squashed debugfs interface which applies on top of this patchset is available here: https://github.com/vivien/linux/commit/f8e6ba34c68a72d3bf42f4dea79abacb2e61a3cc.patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02net: dsa: support cross-chip FDB operationsVivien Didelot
When a MAC address is added to or removed from a switch port in the fabric, the target switch must program its port and adjacent switches must program their local DSA port used to reach the target switch. For this purpose, use the dsa_towards_port() helper to identify the local switch port which must be programmed. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02net: dsa: introduce dsa_towards_port helperVivien Didelot
Add a new helper returning the local port used to reach an arbitrary switch port in the fabric. Its only user at the moment is the dsa_upstream_port helper, which returns the local port reaching the dedicated CPU port, but it will be used in cross-chip FDB operations. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02Merge branch 'dsa-simplify-switchdev-prepare-phase'David S. Miller
Vivien Didelot says: ==================== net: dsa: simplify switchdev prepare phase This patch series brings no functional changes. It removes the unused switchdev_trans arguments from the dsa_switch_ops for both MDB and VLAN operations, and provides functions to prepare and add these objects for a given bitmap of ports. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02net: dsa: add switch mdb bitmap functionsVivien Didelot
This patch brings no functional changes. It moves out the MDB code iterating on a multicast group into new dsa_switch_mdb_{prepare,add}_bitmap() functions. This gives us a better isolation of the two switchdev phases. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02net: dsa: add switch vlan bitmap functionsVivien Didelot
This patch brings no functional changes. It moves out the VLAN code iterating on a list of VLAN members into new dsa_switch_vlan_{prepare,add}_bitmap() functions. This gives us a better isolation of the two switchdev phases. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02net: dsa: remove trans argument from mdb opsVivien Didelot
The DSA switch MDB ops pass the switchdev_trans structure down to the drivers, but no one is using them and they aren't supposed to anyway. Remove the trans argument from MDB prepare and add operations. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02net: dsa: remove trans argument from vlan opsVivien Didelot
The DSA switch VLAN ops pass the switchdev_trans structure down to the drivers, but no one is using them and they aren't supposed to anyway. Remove the trans argument from VLAN prepare and add operations. At the same time, fix the following checkpatch warning: WARNING: line over 80 characters #74: FILE: drivers/net/dsa/dsa_loop.c:177: + const struct switchdev_obj_port_vlan *vlan) Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-02openvswitch: do not propagate headroom updates to internal portPaolo Abeni
After commit 3a927bc7cf9d ("ovs: propagate per dp max headroom to all vports") the need_headroom for the internal vport is updated accordingly to the max needed headroom in its datapath. That avoids the pskb_expand_head() costs when sending/forwarding packets towards tunnel devices, at least for some scenarios. We still require such copy when using the ovs-preferred configuration for vxlan tunnels: br_int / \ tap vxlan (remote_ip:X) br_phy \ NIC where the route towards the IP 'X' is via 'br_phy'. When forwarding traffic from the tap towards the vxlan device, we will call pskb_expand_head() in vxlan_build_skb() because br-phy->needed_headroom is equal to tun->needed_headroom. With this change we avoid updating the internal vport needed_headroom, so that in the above scenario no head copy is needed, giving 5% performance improvement in UDP throughput test. As a trade-off, packets sent from the internal port towards a tunnel device will now experience the head copy overhead. The rationale is that the latter use-case is less relevant performance-wise. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03Merge branch 'bpf-xdp-stack-uninit-and-offload-tests'Daniel Borkmann
Jakub Kicinski says: ==================== The purpose of this series is to add a software model of BPF offloads to make it easier for everyone to test them and make some of the more arcane rules and assumptions more clear. The series starts with 3 patches aiming to make XDP handling in the drivers less error prone. Currently driver authors have to remember to free XDP programs if XDP is active during unregister. With this series the core will disable XDP on its own. It will take place after close, drivers are not expected to perform reconfiguration when disabling XDP on a downed device. Next two patches add the software netdev driver, followed by a python test which exercises all the corner cases which came to my mind. Test needs to be run as root. It will print basic information to stdout, but can also create a more detailed log of all commands when --log option is passed. Log is in Emacs Org-mode format. ./tools/testing/selftests/bpf/test_offload.py --log /tmp/log Last two patches replace the SR-IOV API implementation of dummy. v3: - move the freeing of vfs to release (Phil). v2: - free device from the release function; - use bus-based name generatin instead of netdev name. v1: - replace the SR-IOV API implementation of dummy; - make the dev_xdp_uninstall() also handle the XDP generic (Daniel). ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03net: dummy: remove fake SR-IOV functionalityJakub Kicinski
netdevsim driver seems like a better place for fake SR-IOV functionality. Remove the code previously added to dummy. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03netdevsim: add SR-IOV functionalityJakub Kicinski
dummy driver was extended with VF-related netdev APIs for testing SR-IOV-related software. netdevsim did not exist back then. Implement SR-IOV functionality in netdevsim. Notable difference is that since netdevsim has no module parameters, we will actually create a device with sriov_numvfs attribute for each netdev. The zero MAC address is accepted as some HW use it to mean any address is allowed. Link state is also now validated. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03selftests/bpf: add offload test based on netdevsimJakub Kicinski
Add a test of BPF offload control path interfaces based on just-added netdevsim driver. Perform various checks of both the stack and the expected driver behaviour. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03netdevsim: add bpf offload supportJakub Kicinski
Add support for loading programs for netdevsim devices and expose the related information via DebugFS. Both offload of XDP and cls_bpf programs is supported. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>