summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-08-30i40e: use correct length for strncpyMitch Williams
Caught by GCC 8. When we provide a length for strncpy, we should not include the terminating null. So we must tell it one less than the size of the destination buffer. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-30i40evf: Validate the number of queues a PF sendsPaul M Stillwell Jr
A PF can send any number of queues to the VF and the VF may not be able to support that many. Check to see that the number of queues is less than or equal to the max number of queues the VF can have. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-30i40evf: Change a VF mac without reloading the VF driverPaweł Jabłoński
Add possibility to change a VF mac address from host side without reloading the VF driver on the guest side. Without this patch it is not possible to change the VF mac because executing i40evf_virtchnl_completion function with VIRTCHNL_OP_GET_VF_RESOURCES opcode resets the VF mac address to previous value. Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-30i40evf: update ethtool stats code and use helper functionsJacob Keller
Fix a bug in the way we handled VF queues, by always showing stats for the maximum number of queues, even if they aren't allocated. It is not safe to change the number of strings reported to ethtool, as grabbing statistics occurs over multiple ethtool ops for which the rtnl_lock() cannot be held the entire time. Avoid this by always reporting queue stats for the maximum number of queues in the netdevice. Share some of the helper functionality for adding stats with the PF code in i40e_ethtool_stats.h This should reduce the chance of potential future bugs, and make adding new statistics easier. Note for the queue stats, unlike the PF driver we do not keep an array of queue pointers, but an array of queues, so care must be taken to avoid accessing queue memory that hasn't yet been allocated. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-30i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.hJacob Keller
Move the boiler plate structures and helper functions we recently added into their own header file, so that the complete collection is located together. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-30i40e: convert queue stats to i40e_stats arrayJacob Keller
Use an i40e_stats array to handle the queue stats, instead of coding similar functionality separately. Because of how the queue stats are accessed on some kernels, we can't easily use i40e_add_ethtool_stats. Instead, implement a separate helper, i40e_add_queue_stats, which we'll use instead. This helper will correctly implement the u64_stats_fetch_begin_irq logic and allow retries until successful. We share the most complex code by re-using i40e_add_one_ethtool_stat. This logic additionally easily supports skipping disabled rings by using a ternary operator before calling the u64_stats_fetch_begin_irq() function, so that we correctly zero-out the stats values without having to perform two separate sections of code. This significantly reduces the boiler plate code in i40e_get_ethtool_stats, and helps keep the complex logic contained to as few functions as possible. With this patch, we've finally converted all the statistics to use the helpers and the i40e_stats function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-30xsk: include XDP meta data in AF_XDP framesBjörn Töpel
Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did not include XDP meta data in the data buffers copied out to the user application. In this commit, we check if meta data is available, and if so, it is prepended to the frame. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30Merge branch 'bpf-bpffs-bpftool-dump-with-btf'Daniel Borkmann
Yonghong Song says: ==================== Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") and Commit 699c86d6ec21 ("bpf: btf: add pretty print for hash/lru_hash maps") added bpffs pretty print for array, hash and lru hash maps. The pretty print gives users a structurally formatted dump for keys/values which much easy to understand than raw bytes. This patch set implemented bpffs pretty print support for percpu arraymap, percpu hashmap and percpu lru hashmap. For complex key/value types, the pretty print here is even more useful due to: . large volumne of data making it even harder to correlate bytes to a particular field in a particular cpu. . kernel rounds the value size for each cpu to multiple of 8. User has to be aware of this otherwise wrong value may be derived from cpu 1/2/... For example, we may have a bpffs pretty print like below: 43602: { cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} } for a percpu map. This patch also added percpu formatted print on bpftool. For example, bpftool may print like below: { "key": 0, "values": [{ "cpu": 0, "value": { "ui32": 0, "ui16": 0, } },{ "cpu": 1, "value": { "ui32": 1, "ui16": 0, } },{ "cpu": 2, "value": { "ui32": 2, "ui16": 0, } },{ "cpu": 3, "value": { "ui32": 3, "ui16": 0, } } ] } Patch #1 implemented bpffs pretty print for percpu arraymap/hash/lru_hash in kernel. Patch #2 added the test case in tools bpf selftest test_btf. Patch #3 added percpu map btf based dump. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30tools/bpf: bpftool: add btf percpu map formated dumpYonghong Song
The btf pretty print is added to percpu arraymap, percpu hashmap and percpu lru hashmap. For each <key, value> pair, the following will be added to plain/json output: { "key": <pretty_print_key>, "values": [{ "cpu": 0, "value": <pretty_print_value_on_cpu0> },{ "cpu": 1, "value": <pretty_print_value_on_cpu1> },{ .... },{ "cpu": n, "value": <pretty_print_value_on_cpun> } ] } For example, the following could be part of plain or json formatted output: { "key": 0, "values": [{ "cpu": 0, "value": { "ui32": 0, "ui16": 0, } },{ "cpu": 1, "value": { "ui32": 1, "ui16": 0, } },{ "cpu": 2, "value": { "ui32": 2, "ui16": 0, } },{ "cpu": 3, "value": { "ui32": 3, "ui16": 0, } } ] } Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30tools/bpf: add bpffs percpu map pretty print tests in test_btfYonghong Song
The bpf selftest test_btf is extended to test bpffs percpu map pretty print for percpu array, percpu hash and percpu lru hash. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-30bpf: add bpffs pretty print for percpu arraymap/hash/lru_hashYonghong Song
Added bpffs pretty print for percpu arraymap, percpu hashmap and percpu lru hashmap. For each map <key, value> pair, the format is: <key_value>: { cpu0: <value_on_cpu0> cpu1: <value_on_cpu1> ... cpun: <value_on_cpun> } For example, on my VM, there are 4 cpus, and for test_btf test in the next patch: cat /sys/fs/bpf/pprint_test_percpu_hash You may get: ... 43602: { cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO} } 72847: { cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE} } ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-29Merge tag 'mac80211-next-for-davem-2018-08-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Only a few changes at this point: * new channels in 60 GHz * clarify (average) ACK signal reporting API * expose ieee80211_send_layer2_update() for all drivers * start/stop mac80211's TXQs properly when required * avoid regulatory restore with IE ignoring * spelling: contidion -> condition * fully implement WFA Multi-AP backhaul ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29vxlan: reduce dirty cache line in vxlan_find_macLi RongQing
vxlan_find_mac() unconditionally set f->used for every packet, this causes a cache miss for every packet, since remote, hlist and used of vxlan_fdb share the same cache line, which are accessed when send every packets. so f->used is set only if not equal to jiffies, to reduce dirty cache line times, this gives 3% speed-up with small packets. Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29Merge branch 'liquidio-improve-soft-command-response-handling'David S. Miller
Weilin Chang says: ==================== liquidio: improve soft command/response handling Change soft command handling to fix the possible race condition when the process handles a response of a soft command that was already freed by an application which got timeout for this request. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29liquidio: remove obsolete functions and data structuresFelix Manlunas
1. Remove unused functions and data structures. 2. Change the sending of the remaining soft commands to synchronous. Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29liquidio: change octnic_ctrl_pkt to do synchronous soft commandsFelix Manlunas
1. Change struct octnic_ctrl_pkt to support synchronous operation. 2. Change code which use structure octnic_ctrl_pkt to send sc's synchronously. Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29liquidio: make soft command calls synchronousFelix Manlunas
1. Add wait_for_sc_completion_timeout() for waiting the response and handling common response errors 2. Send sc's synchronously: remove unused callback function, and context structure; use wait_for_sc_completion_timeout() to wait its response. Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29liquidio: improve soft command handlingFelix Manlunas
1. Set LIO_SC_MAX_TMO_MS as the maximum timeout value for a soft command (sc). All sc's use this value as a hard timeout value. Add expiry_time in struct octeon_soft_command to keep the hard timeout value. The field wait_time and timeout in struct octeon_soft_command will be obsoleted in the last patch of this patch series. 2. Add processing a synchronous sc in sc response thread lio_process_ordered_list. The memory allocated for a synchronous sc will be freed by lio_process_ordered_list() to the sc pool. 3. Add two response lists for lio_process_ordered_list to process the storage allocated for sc's: OCTEON_DONE_SC_LIST response list keeps all sc's which will be freed to the pool after their requestors have finished processing the responses. OCTEON_ZOMBIE_SC_LIST response list keeps all sc's which have got LIO_SC_MAX_TMO_MS timeout. When an sc gets a hard timeout, lio_process_order_list() will recheck its status 1 ms later. If the status has not updated by the firmware at that time, the sc will be removed from OCTEON_DONE_SC_LIST response list to OCTEON_ZOMBIE_SC_LIST response list. The sc's in the OCTEON_ZOMBIE_SC_LIST response list will be freed when the driver is unloaded. Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net/tls: Calculate nsg for zerocopy path without skb_cow_data.Doron Roberts-Kedes
decrypt_skb fails if the number of sg elements required to map it is greater than MAX_SKB_FRAGS. nsg must always be calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy case. The new function skb_nsg calculates the number of scatterlist elements required to map the skb without the extra overhead of skb_cow_data. This patch reduces memcpy by 50% on my encrypted NBD benchmarks. Reported-by: Vakul Garg <Vakul.garg@nxp.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Tested-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net: nixge: Add support for 64-bit platformsMoritz Fischer
Add support for 64-bit platforms to driver. The hardware only supports 32-bit register accesses so the accesses need to be split up into two writes when setting the current and tail descriptor values. Signed-off-by: Moritz Fischer <mdf@kernel.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29selftests/net: add ip_defrag selftestPeter Oskolkov
This test creates a raw IPv4 socket, fragments a largish UDP datagram and sends the fragments out of order. Then repeats in a loop with different message and fragment lengths. Then does the same with overlapping fragments (with overlapping fragments the expectation is that the recv times out). Tested: root@<host># time ./ip_defrag.sh ipv4 defrag PASS ipv4 defrag with overlaps PASS real 1m7.679s user 0m0.628s sys 0m2.242s A similar test for IPv6 is to follow. Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29ip: fail fast on IP defrag errorsPeter Oskolkov
The current behavior of IP defragmentation is inconsistent: - some overlapping/wrong length fragments are dropped without affecting the queue; - most overlapping fragments cause the whole frag queue to be dropped. This patch brings consistency: if a bad fragment is detected, the whole frag queue is dropped. Two major benefits: - fail fast: corrupted frag queues are cleared immediately, instead of by timeout; - testing of overlapping fragments is now much easier: any kind of random fragment length mutation now leads to the frag queue being discarded (IP packet dropped); before this patch, some overlaps were "corrected", with tests not seeing expected packet drops. Note that in one case (see "if (end&7)" conditional) the current behavior is preserved as there are concerns that this could be legitimate padding. Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29liquidio: fix race condition in instruction completion processingRick Farrington
In lio_enable_irq, the pkt_in_done count register was being cleared to zero. However, there could be some completed instructions which were not yet processed due to budget and limit constraints. So, only write this register with the number of actual completions that were processed. Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29liquidio: remove unnecessary delay when processing IQ responsesRick Farrington
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29Merge branch 'ethtool-drop-get_settings-and-set_settings-ops'David S. Miller
Michal Kubecek says: ==================== ethtool: drop get_settings and set_settings ops As Andrew Lunn pointed out in recent discussion, there is only one in tree driver left which still defines deprecated callbacks get_settings() and set_settings() in ethtool_ops. First patch converts this driver to get_link_ksettings() and set_link_ksettings(). Second patch then removes the deprecated callbacks from struct ethtool_ops and ethtool code which falls back to them. This doesn't break old versions of ethtool or any other userspace code using ETHTOOL_{G,S}SET. We still implement both (old) ETHTOOL_{G,S}SET and (new) ETHTOOL_{G,S}LINKSETTINGS ioctl commands but after this series both will be implemented only using {g,s}et_link_ksettings(). The only affected code would be out of tree NIC drivers which have not been converted yet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29ethtool: drop get_settings and set_settings callbacksMichal Kubecek
Since [gs]et_settings ethtool_ops callbacks have been deprecated in February 2016, all in tree NIC drivers have been converted to provide [gs]et_link_ksettings() and out of tree drivers have had enough time to do the same. Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-298390/etherh: convert to ethtool_{get, set}_link_ksettingsMichal Kubecek
This is the last in-tree driver using the old {get,set}_settings API. Note: this is only build tested. I don't have the hardware at hand; as it's 10Mb/s half duplex device and driver can be built only for one subplatform of 32-bit ARM (Acorn RiscPC), it may be difficult to find someone who does. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net: thunderbolt: Convert to use SPDX identifierMika Westerberg
This gets rid of the licence boilerblate in favor of SPDX identifier which only takes a single line comment. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29genetlink: constify genl_err_attr() argumentMichal Kubecek
genl_err_attr() sets netlink_ext_ack::bad_attr which is a pointer to const struct nlattr so make the attr argument also const. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net: ethernet: Convert to using %pOFn instead of device_node.nameRob Herring
In preparation to remove the node name pointer from struct device_node, convert printf users to use the %pOFn format specifier. Cc: "David S. Miller" <davem@davemloft.net> Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Cc: Salil Mehta <salil.mehta@huawei.com> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Cc: Felix Fietkau <nbd@openwrt.org> Cc: John Crispin <john@phrozen.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Nelson Chang <nelson.chang@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Wingman Kwok <w-kwok2@ti.com> Cc: Murali Karicheri <m-karicheri2@ti.com> Cc: netdev@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29net/ncsi: remove duplicated include from ncsi-netlink.cYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29Merge branch 'verifier-liveness-simplification'Alexei Starovoitov
Edward Cree says: ==================== The first patch is a simplification of register liveness tracking by using a separate parentage chain for each register and stack slot, thus avoiding the need for logic to handle callee-saved registers when applying read marks. In the future this idea may be extended to form use-def chains. The second patch adds information about misc/zero data on the stack to the state dumps emitted to the log at various points; this information was found essential in debugging the first patch, and may be useful elsewhere. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29bpf/verifier: display non-spill stack slot types in print_verifier_stateEdward Cree
If a stack slot does not hold a spilled register (STACK_SPILL), then each of its eight bytes could potentially have a different slot_type. This information can be important for debugging, and previously we either did not print anything for the stack slot, or just printed fp-X=0 in the case where its first byte was STACK_ZERO. Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC) or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor entirely STACK_INVALID. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29bpf/verifier: per-register parent pointersEdward Cree
By giving each register its own liveness chain, we elide the skip_callee() logic. Instead, each register's parent is the state it inherits from; both check_func_call() and prepare_func_exit() automatically connect reg states to the correct chain since when they copy the reg state across (r1-r5 into the callee as args, and r0 out as the return value) they also copy the parent pointer. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29Merge branch 'AF_XDP-zerocopy-for-i40e'Alexei Starovoitov
Björn Töpel says: ==================== This patch set introduces zero-copy AF_XDP support for Intel's i40e driver. In the first preparatory patch we also add support for XDP_REDIRECT for zero-copy allocated frames so that XDP programs can redirect them. This was a ToDo from the first AF_XDP zero-copy patch set from early June. Special thanks to Alex Duyck and Jesper Dangaard Brouer for reviewing earlier versions of this patch set. The i40e zero-copy code is located in its own file i40e_xsk.[ch]. Note that in the interest of time, to get an AF_XDP zero-copy implementation out there for people to try, some code paths have been copied from the XDP path to the zero-copy path. It is out goal to merge the two paths in later patch sets. In contrast to the implementation from beginning of June, this patch set does not require any extra HW queues for AF_XDP zero-copy TX. Instead, the XDP TX HW queue is used for both XDP_REDIRECT and AF_XDP zero-copy TX. Jeff, given that most of changes are in i40e, it is up to you how you would like to route these patches. The set is tagged bpf-next, but if taking it via the Intel driver tree is easier, let us know. We have run some benchmarks on a dual socket system with two Broadwell E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14 cores which gives a total of 28, but only two cores are used in these experiments. One for TR/RX and one for the user space application. The memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is 8192MB and with 8 of those DIMMs in the system we have 64 GB of total memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The NIC is Intel I40E 40Gbit/s using the i40e driver. Below are the results in Mpps of the I40E NIC benchmark runs for 64 and 1500 byte packets, generated by a commercial packet generator HW outputing packets at full 40 Gbit/s line rate. The results are with retpoline and all other spectre and meltdown fixes, so these results are not comparable to the ones from the zero-copy patch set in June. AF_XDP performance 64 byte packets. Benchmark XDP_SKB XDP_DRV XDP_DRV with zerocopy rxdrop 2.6 8.2 15.0 txpush 2.2 - 21.9 l2fwd 1.7 2.3 11.3 AF_XDP performance 1500 byte packets: Benchmark XDP_SKB XDP_DRV XDP_DRV with zerocopy rxdrop 2.0 3.3 3.3 l2fwd 1.3 1.7 3.1 XDP performance on our system as a base line: 64 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 18.4M 0 1500 byte packets: XDP stats CPU pps issue-pps XDP-RX CPU 16 3.3M 0 The structure of the patch set is as follows: Patch 1: Add support for XDP_REDIRECT of zero-copy allocated frames Patches 2-4: Preparatory patches to common xsk and net code Patches 5-7: Preparatory patches to i40e driver code for RX Patch 8: i40e zero-copy support for RX Patch 9: Preparatory patch to i40e driver code for TX Patch 10: i40e zero-copy support for TX Patch 11: Add flags to sample application to force zero-copy/copy mode ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29samples/bpf: add -c/--copy -z/--zero-copy flags to xdpsockBjörn Töpel
The -c/--copy -z/--zero-copy flags enforces either copy or zero-copy mode. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29i40e: add AF_XDP zero-copy Tx supportMagnus Karlsson
This patch adds zero-copy Tx support for AF_XDP sockets. It implements the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a NAPI context. This means pulling egress packets from the Tx ring, placing the frames on the NIC HW descriptor ring and completing sent frames back to the application via the completion ring. The regular XDP Tx ring is used for AF_XDP as well. This rationale for this is as follows: XDP_REDIRECT guarantees mutual exclusion between different NAPI contexts based on CPU id. In other words, a netdev can XDP_REDIRECT to another netdev with a different NAPI context, since the operation is bound to a specific core and each core has its own hardware ring. As the AF_XDP Tx action is running in the same NAPI context and using the same ring, it will also be protected from XDP_REDIRECT actions with the exact same mechanism. As with AF_XDP Rx, all AF_XDP Tx specific functions are added to i40e_xsk.c. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29i40e: move common Tx functions to i40e_txrx_common.hMagnus Karlsson
This patch prepares for the upcoming zero-copy Tx functionality, by moving common functions and refactor chunks of code into re-usable functions, used both by the regular path and zero-copy path. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29i40e: add AF_XDP zero-copy Rx supportBjörn Töpel
This patch adds zero-copy Rx support for AF_XDP sockets. Instead of allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain queue. All AF_XDP specific functions are added to a new file, i40e_xsk.c. Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS will allocate a new buffer and copy the zero-copy frame prior passing it to the kernel stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29i40e: move common Rx functions to i40e_txrx_common.hBjörn Töpel
This patch prepares for the upcoming zero-copy Rx functionality, by moving/changing linkage of common functions, used both by the regular path and zero-copy path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29i40e: refactor Rx path for re-useBjörn Töpel
In this commit, the Rx path is refactored some, as a step torwards the introduction AF_XDP Rx zero-copy. The page re-use counter is moved into the i40e_reuse_rx_page, instead of bumping the counter in many places. The Rx buffer page clearing is moved for better readability. Lastely, functions to update statistics and bump the XDP Tx ring are introduced. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29i40e: added queue pair disable/enable functionsBjörn Töpel
Add functions for queue pair enable/disable. Instead of resetting the whole device, only the affected queue pair is disabled or enabled. This plumbing is used in a later commit, when zero-copy AF_XDP support is introduced. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29net: add napi_if_scheduled_mark_missedMagnus Karlsson
The function napi_if_scheduled_mark_missed is used to check if the NAPI context is scheduled, if so set NAPIF_STATE_MISSED and return true. Used by the AF_XDP zero-copy i40e Tx code implementation in order to make sure that irq affinity is honored by the napi context. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xsk: expose xdp_umem_get_{data,dma} to driversBjörn Töpel
Move the xdp_umem_get_{data,dma} functions to include/net/xdp_sock.h, so that the upcoming zero-copy implementation in the Ethernet drivers can utilize them. Also, supply some dummy function implementations for CONFIG_XDP_SOCKETS=n configs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xdp: export xdp_rxq_info_unreg_mem_modelBjörn Töpel
Export __xdp_rxq_info_unreg_mem_model as xdp_rxq_info_unreg_mem_model, so it can be used from netdev drivers. Also, add additional checks for the memory type. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29xdp: implement convert_to_xdp_frame for MEM_TYPE_ZERO_COPYBjörn Töpel
This commit adds proper MEM_TYPE_ZERO_COPY support for convert_to_xdp_frame. Converting a MEM_TYPE_ZERO_COPY xdp_buff to an xdp_frame is done by transforming the MEM_TYPE_ZERO_COPY buffer into a MEM_TYPE_PAGE_ORDER0 frame. This is costly, and in the future it might make sense to implement a more sophisticated thread-safe alloc/free scheme for MEM_TYPE_ZERO_COPY, so that no allocation and copy is required in the fast-path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29bpf: use --cgroup in test_suite if suppliedJohn Fastabend
If the user supplies a --cgroup value in the arguments when running the test_suite go ahaead and run the self tests there. I use this to test with multiple cgroup users. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-29bpf: sockmap test remove shutdown() callsJohn Fastabend
Currently, we do a shutdown(sk, SHUT_RDWR) on both peer sockets and a shutdown on the sender as well. However, this is incorrect and can occasionally cause issues if you happen to have bad timing. First peer1 or peer2 may still be in use depending on the test and timing. Second we really should only be closing the read side and/or write side depending on if the test is receiving or sending. But, really none of this is needed just remove the shutdown calls. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-29bpf: remove duplicated include from syscall.cYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-29cfg80211: clarify frames covered by average ACK signal reportBalaji Pothunoori
Modify the API to include all ACK frames in average ACK signal strength reporting, not just ACKs for data frames. Make exposing the data conditional on implementing the extended feature flag. This is how it was really implemented in mac80211, update the code there to use the new defines and clean up some of the setting code. Keep nl80211.h source compatibility by keeping the old names. Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org> [rewrite commit log, change compatibility to be old=new instead of the other way around, update kernel-doc, roll in mac80211 changes, make mac80211 depend on valid bit instead of HW flag] Signed-off-by: Johannes Berg <johannes.berg@intel.com>