Age  Commit message  Author
2020-08-31  net: phy: add Lynx PCS module  (Ioana Ciornei)
Add a Lynx PCS module which exposes the necessary operations to drive the PCS using phylink. The majority of the code is extracted from the Felix DSA driver, which will also be modified in a later patch, and exposed as a separate module for code reusability. As such, this aims at feature and bug parity with the existing Felix DSA driver, and thus USXGMII, SGMII, QSGMII and 2500Base-X (only without in-band AN) are supported by the Lynx PCS module, since these were also supported by Felix. The module can only be enabled by the drivers that need it and is not user selectable. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: mdiobus: add clause 45 mdiobus write accessor  (Ioana Ciornei)
Add the locked variant of the clause 45 mdiobus write accessor - mdiobus_c45_write(). Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
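For context, a minimal sketch of what such a locked clause 45 write accessor typically looks like, mirroring the existing mdiobus_c45_read() and assuming the mdiobus_c45_addr() helper that packs the device address and register number into a MII_ADDR_C45-flagged register value (names and placement are assumptions, not a quote of the patch):

    /* include/linux/mdio.h (sketch) */
    static inline int mdiobus_c45_write(struct mii_bus *bus, int prtad, int devad,
                                        u16 regnum, u16 val)
    {
            /* mdiobus_write() takes the mdio_lock internally */
            return mdiobus_write(bus, prtad, mdiobus_c45_addr(devad, regnum), val);
    }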
2020-08-31  net: phylink: consider QSGMII interface mode in phylink_mii_c22_pcs_get_state  (Ioana Ciornei)
The same link partner advertisement word is used for both QSGMII and SGMII, thus treat both interface modes using the same phylink_decode_sgmii_word() function. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
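An illustrative sketch (not a quote of the patch) of the kind of change described, assuming the usual switch on state->interface inside phylink_mii_c22_pcs_get_state():

    switch (state->interface) {
    case PHY_INTERFACE_MODE_SGMII:
    case PHY_INTERFACE_MODE_QSGMII:
            /* QSGMII reuses the SGMII link partner word layout */
            phylink_decode_sgmii_word(state, lpa);
            break;
    /* ... other interface modes unchanged ... */
    default:
            break;
    }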
2020-08-31  net: phylink: add helper function to decode USXGMII word  (Ioana Ciornei)
With the new addition of the USXGMII link partner ability constants we can now introduce a phylink helper that decodes the USXGMII word and populates the appropriate fields in the phylink_link_state structure based on them. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
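A rough sketch of what such a decode helper can look like; the MDIO_USXGMII_* names below are the link partner ability constants this series refers to and are assumed here rather than verified:

    static void phylink_decode_usxgmii_word(struct phylink_link_state *state,
                                            uint16_t lpa)
    {
            switch (lpa & MDIO_USXGMII_SPD_MASK) {
            case MDIO_USXGMII_10:
                    state->speed = SPEED_10;
                    break;
            case MDIO_USXGMII_100:
                    state->speed = SPEED_100;
                    break;
            case MDIO_USXGMII_1000:
                    state->speed = SPEED_1000;
                    break;
            case MDIO_USXGMII_2500:
                    state->speed = SPEED_2500;
                    break;
            default:
                    state->link = false;
                    return;
            }

            state->duplex = (lpa & MDIO_USXGMII_FULL_DUPLEX) ?
                            DUPLEX_FULL : DUPLEX_HALF;
    }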
2020-08-31  net/wan/fsl_ucc_hdlc: Add MODULE_DESCRIPTION  (YueHaibing)
Add missing MODULE_DESCRIPTION. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: hns: Remove unused macro AE_NAME_PORT_ID_IDX  (YueHaibing)
There is no caller in tree. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: dl2k: Remove unused macro DRV_NAME  (YueHaibing)
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: wan: slic_ds26522: Remove unused macro DRV_NAME  (YueHaibing)
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  tipc: Remove unused macro TIPC_NACK_INTV  (YueHaibing)
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  tipc: Remove unused macro TIPC_FWD_MSG  (YueHaibing)
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  mptcp: Remove unused macro MPTCP_SAME_STATE  (YueHaibing)
There is no caller in tree any more. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: clean up codestyle  (Miaohe Lin)
This is a pure codestyle cleanup patch. No functional change intended. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: Use helper macro IP_MAX_MTU in __ip_append_data()  (Miaohe Lin)
What 0xFFFF means here is actually the maximum MTU of an IP packet. Use the helper macro IP_MAX_MTU here. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
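For illustration, the change is of roughly this shape (approximate, not a quote of the diff), with IP_MAX_MTU being 0xFFFFU:

    /* in __ip_append_data(), sketch */
    maxnonfragsize = ip_sk_ignore_df(sk) ? IP_MAX_MTU : mtu;    /* was 0xFFFF */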
2020-08-31  net: ethernet: ti: am65-cpts: fix i2083 genf (and estf) Reconfiguration Issue  (Grygorii Strashko)
The new bit TX_GENF_CLR_EN has been added in AM65x SR2.0 to fix the i2083 errata; it can simply be set unconditionally for all SoCs. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  Merge branch 'sfc-clean-up-some-W-1-build-warnings'  (David S. Miller)
Edward Cree says: ==================== sfc: clean up some W=1 build warnings A collection of minor fixes to issues flagged up by W=1. After this series, the only remaining warnings in the sfc driver are some 'member missing in kerneldoc' warnings from ptp.c. Tested by building on x86_64 and running 'ethtool -p' on an EF10 NIC; there was no error, but I couldn't observe the actual LED as I'm working remotely. [ Incidentally, ethtool_phys_id()'s behaviour on an error return looks strange — if I'm reading it right, it will break out of the inner loop but not the outer one, and eventually return the rc from the last run of the inner loop. Is this intended? ] ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  sfc: return errors from efx_mcdi_set_id_led, and de-indirect  (Edward Cree)
W=1 warnings indicated that 'rc' was unused in efx_mcdi_set_id_led(); change the function to return int instead of void and plumb the rc through the caller efx_ethtool_phys_id(). Since (post-Falcon) all sfc NICs use MCDI for this, there's no point in indirecting through a nic_type method, so remove that and just call efx_mcdi_set_id_led() directly. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  sfc: fix kernel-doc on struct efx_loopback_state  (Edward Cree)
Missing 'struct' keyword caused "cannot understand function prototype" warnings. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
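A generic illustration of the fix (the member name here is a placeholder, not the real struct layout): kernel-doc needs the "struct" keyword on the first line, otherwise it tries to parse the comment as a function prototype and emits the warning quoted above:

    /**
     * struct efx_loopback_state - persistent state during a loopback selftest
     * @flush: placeholder member, purely illustrative
     */
    struct efx_loopback_state {
            bool flush;
    };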
2020-08-31  sfc: fix unused-but-set-variable warning in efx_farch_filter_remove_safe  (Edward Cree)
Thanks to some past refactor, 'spec' is not actually used in this function; the code using it moved to the callee efx_farch_filter_remove. Remove the variable to fix a W=1 warning. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  sfc: fix W=1 warnings in efx_farch_handle_rx_not_ok  (Edward Cree)
Some of these RX-event flags aren't used at all, so remove them. Others are used only #ifdef DEBUG to log a message; suppress the unused-var warnings #ifndef DEBUG with a void cast. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
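A generic sketch of the void-cast technique (the variable name is hypothetical): when the only consumer of a value is compiled out, a cast to void keeps the assignment for the DEBUG build while silencing -Wunused-but-set-variable otherwise:

    bool rx_ev_example_flag = false;        /* hypothetical flag read from the event */

    #ifndef DEBUG
            (void)rx_ev_example_flag;       /* only consumed by netif_dbg() below */
    #endif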
2020-08-31  Merge branch 'Add-ip6_fragment-in-ipv6_stub'  (David S. Miller)
wenxu says: ==================== Add ip6_fragment in ipv6_stub Add ip6_fragment to ipv6_stub and use it in openvswitch. This version adds the default function eafnosupport_ipv6_fragment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  openvswitch: using ip6_fragment in ipv6_stub  (wenxu)
Use ipv6_stub->ipv6_fragment to avoid the netfilter dependency. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  ipv6: add ipv6_fragment hook in ipv6_stub  (wenxu)
Add an ipv6_fragment hook to ipv6_stub to avoid calling netfilter when accessing ip6_fragment. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
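A hedged sketch of the shape of such a hook, based on the signature of ip6_fragment(); the exact struct layout and stub body are assumptions:

    /* include/net/ipv6_stubs.h (sketch): new member of struct ipv6_stub */
    int (*ipv6_fragment)(struct net *net, struct sock *sk, struct sk_buff *skb,
                         int (*output)(struct net *, struct sock *, struct sk_buff *));

    /* default used when IPv6 is not available (sketch) */
    static int eafnosupport_ipv6_fragment(struct net *net, struct sock *sk,
                                          struct sk_buff *skb,
                                          int (*output)(struct net *, struct sock *,
                                                        struct sk_buff *))
    {
            kfree_skb(skb);
            return -EAFNOSUPPORT;
    }

IPv6 then fills the hook with .ipv6_fragment = ip6_fragment, and callers such as openvswitch go through ipv6_stub->ipv6_fragment() instead of depending on netfilter.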
2020-08-31  Merge branch 'gtp-minor-enhancements'  (David S. Miller)
Nicolas Dichtel says: ==================== gtp: minor enhancements The first patch removes a useless rcu lock and the second relaxes the allocation constraint when a PDP context is added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  gtp: relax alloc constraint when adding a pdp  (Nicolas Dichtel)
When a PDP context is added, the rtnl lock is held, so there is no need to force GFP_ATOMIC. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
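For illustration (a generic hedged shape rather than the exact gtp diff): allocations done while holding only the rtnl lock may sleep, so GFP_KERNEL is sufficient:

    ASSERT_RTNL();                                  /* rtnl held by the netlink path */
    pctx = kmalloc(sizeof(*pctx), GFP_KERNEL);      /* was GFP_ATOMIC */
    if (!pctx)
            return -ENOMEM;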
2020-08-31  gtp: remove useless rcu_read_lock()  (Nicolas Dichtel)
The rtnl lock is taken just the line above; there is no need to also take the RCU read lock. Fixes: 1788b8569f5d ("gtp: fix use-after-free in gtp_encap_destroy()") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  net: phylink: avoid oops during initialisation  (Russell King)
If we intend to use PCS operations, mac_pcs_get_state() will not be implemented, so it will be NULL. If we also intend to register the PCS operations in mac_prepare() or mac_config(), this leads to an attempt to call a NULL function pointer during phylink_start(). Avoid this, but in that case we must report the link as down. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
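A sketch of the kind of guard this implies (function and field names approximate, not a quote of the patch): fall back to reporting link down when no mac_pcs_get_state() implementation is available yet:

    /* in phylink's state readback path, sketch */
    if (pl->mac_ops->mac_pcs_get_state)
            pl->mac_ops->mac_pcs_get_state(pl->config, state);
    else
            state->link = false;    /* PCS not registered yet: report link down */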
2020-08-31  Merge branch 'hinic-add-debugfs-support'  (David S. Miller)
Luo bin says: ==================== hinic: add debugfs support Add debugfs nodes for querying SQ/RQ info and the function table. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  hinic: add support to query function table  (Luo bin)
Add a debugfs node for querying the function table, for example: cat /sys/kernel/debug/hinic/0000:15:00.0/func_table/valid Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
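Not the hinic code itself, but a generic sketch of how a read-only debugfs node such as .../func_table/valid is commonly wired up with DEFINE_SHOW_ATTRIBUTE(); the driver-private type is hypothetical:

    static int valid_show(struct seq_file *s, void *unused)
    {
            struct example_priv *priv = s->private;     /* hypothetical driver state */

            seq_printf(s, "%u\n", priv->valid);
            return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(valid);

    /* during probe / debugfs init */
    struct dentry *dir = debugfs_create_dir("func_table", parent_dir);

    debugfs_create_file("valid", 0400, dir, priv, &valid_fops);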
2020-08-31  hinic: add support to query rq info  (Luo bin)
Add a debugfs node for querying RQ info, for example: cat /sys/kernel/debug/hinic/0000:15:00.0/RQs/0x0/rq_hw_pi Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  hinic: add support to query sq info  (Luo bin)
Add a debugfs node for querying SQ info, for example: cat /sys/kernel/debug/hinic/0000:15:00.0/SQs/0x0/sq_pi Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31  xsk: Documentation for XDP_SHARED_UMEM between queues and netdevs  (Magnus Karlsson)
Add documentation for the XDP_SHARED_UMEM feature when a UMEM is shared between different queues and/or netdevs. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-16-git-send-email-magnus.karlsson@intel.com
2020-08-31  samples/bpf: Add new sample xsk_fwd.c  (Cristian Dumitrescu)
This sample code illustrates packet forwarding between multiple AF_XDP sockets in a multi-threaded environment. All the threads and sockets share a common buffer pool, with each socket having its own private buffer cache. The sockets are created with the xsk_socket__create_shared() function, which allows multiple AF_XDP sockets to share the same UMEM object. Example 1: Single thread handling two sockets. Packets received from socket A (on top of interface IFA, queue QA) are forwarded to socket B (on top of interface IFB, queue QB) and vice-versa. The thread is affinitized to CPU core C: ./xsk_fwd -i IFA -q QA -i IFB -q QB -c C Example 2: Two threads, each handling two sockets. Packets from socket A are sent to socket B (by thread X), packets from socket B are sent to socket A (by thread X); packets from socket C are sent to socket D (by thread Y), packets from socket D are sent to socket C (by thread Y). The two threads are bound to CPU cores CX and CY: ./xsk_fwd -i IFA -q QA -i IFB -q QB -i IFC -q QC -i IFD -q QD -c CX -c CY Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-15-git-send-email-magnus.karlsson@intel.com
2020-08-31  libbpf: Support shared umems between queues and devices  (Magnus Karlsson)
Add support for shared umems between hardware queues and devices to the AF_XDP part of libbpf, so that zero-copy can be achieved in applications that want to send and receive packets between HW queues on one device or between different devices/netdevs. In order to create sockets that share a umem between hardware queues and devices, a new function has been added called xsk_socket__create_shared(). It takes the same arguments as xsk_socket__create() plus references to a fill ring and a completion ring. So for every socket that shares a umem, you need one more set of fill and completion rings; this is in order to maintain the single-producer single-consumer semantics of the rings. You can create all the sockets via the new xsk_socket__create_shared() call, or create the first one with xsk_socket__create() and the rest with xsk_socket__create_shared(). Both methods work. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-14-git-send-email-magnus.karlsson@intel.com
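A hedged usage sketch of the new call (interface name, queue id and config are placeholders); note that each socket sharing the umem brings its own fill and completion rings:

    struct xsk_ring_cons rx;
    struct xsk_ring_prod tx;
    struct xsk_ring_prod fill;
    struct xsk_ring_cons comp;
    struct xsk_socket *xsk;
    int err;

    /* umem and cfg set up earlier; every shared socket gets its own fill/comp rings */
    err = xsk_socket__create_shared(&xsk, "eth0", 1 /* queue id */, umem,
                                    &rx, &tx, &fill, &comp, &cfg);
    if (err)
            return err;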
2020-08-31  xsk: Add shared umem support between devices  (Magnus Karlsson)
Add support to share a umem between different devices. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same device. Note that when sharing a umem between devices, just as in the case of sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket (with two setsockopts, one for each ring) before you bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics of the rings can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-13-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Add shared umem support between queue ids  (Magnus Karlsson)
Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com
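For readers of the two entries above, a hedged sketch of what the raw-socket sequence looks like when not going through libbpf: create the per-socket fill and completion rings with two setsockopts, then bind with XDP_SHARED_UMEM pointing at the socket that owns the umem (ifindex, queue id and file descriptors are placeholders):

    int ring_sz = 2048;

    setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING, &ring_sz, sizeof(ring_sz));
    setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &ring_sz, sizeof(ring_sz));

    struct sockaddr_xdp sxdp = {
            .sxdp_family = AF_XDP,
            .sxdp_ifindex = ifindex,
            .sxdp_queue_id = queue_id,
            .sxdp_flags = XDP_SHARED_UMEM,
            .sxdp_shared_umem_fd = umem_owner_fd,   /* fd of the first socket */
    };
    bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));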
2020-08-31  xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better performance  (Magnus Karlsson)
Test for dma_need_sync earlier to increase performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as parameter and from that the xsk_buff_pool reference is dug out. Perf shows that this dereference causes a lot of cache misses. But as the buffer pool is now sent down to the driver at zero-copy initialization time, we might as well use this pointer directly, instead of going via the xsk_buff, and we can do so already in xsk_buff_dma_sync_for_cpu() instead of in xp_dma_sync_for_cpu(). This gets rid of these cache misses. Throughput increases by 3% for the xdpsock l2fwd sample application on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-11-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Rearrange internal structs for better performance  (Magnus Karlsson)
Rearrange the xdp_sock, xdp_umem and xsk_buff_pool structures so that they get smaller and align better to the cache lines. In the previous commits of this patch set, these structs have been reordered with the focus on functionality and simplicity, not performance. This patch improves throughput performance by around 3%. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-10-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Enable sharing of dma mappings  (Magnus Karlsson)
Enable sharing of DMA mappings by moving them out of the buffer pool. Instead, we put each DMA-mapped umem region in a list in the umem structure. If DMA has already been mapped for this umem and device, it is not mapped again and the existing DMA mappings are reused. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move addrs from buffer pool to umem  (Magnus Karlsson)
Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move xsk_tx_list and its lock to buffer pool  (Magnus Karlsson)
Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This is so that we can share the umem between multiple HW queues in a later commit. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move queue_id, dev and need_wakeup to buffer pool  (Magnus Karlsson)
Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This is so that we can share the umem between multiple HW queues in a later commit. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. need_wakeup is also something that is set at a per-napi level, so there is usually one per device and queue id. So move this to the buffer pool too. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: Move fill and completion rings to buffer pool  (Magnus Karlsson)
Move the fill and completion rings from the umem to the buffer pool. This is so that we can share the umem between multiple HW queue ids in a later commit. In that case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id, this is a natural place for them, and one umem structure can be shared between these buffer pools. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com
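Pulling together the moves described in the last few entries, a hedged sketch of where the per-device, per-queue-id state ends up; field names are approximate and the real struct has more members:

    struct xsk_buff_pool {
            struct xsk_queue *fq;           /* fill ring, moved from the umem */
            struct xsk_queue *cq;           /* completion ring, moved from the umem */
            struct net_device *netdev;      /* moved from the umem */
            u16 queue_id;                   /* moved from the umem */
            bool uses_need_wakeup;          /* moved from the umem */
            struct list_head xsk_tx_list;   /* one list per dev/queue id */
            spinlock_t xsk_tx_list_lock;
            struct xdp_umem *umem;          /* shared between buffer pools */
            /* ... */
    };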
2020-08-31  xsk: Create and free buffer pool independently from umem  (Magnus Karlsson)
Create and free the buffer pool independently from the umem. Move these operations, which are performed on the buffer pool, from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This is so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also eradicate the back pointer from the umem to the buffer pool, as this will not work once we introduce the possibility of having multiple buffer pools per umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfaces  (Magnus Karlsson)
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly this is about replacing the umem name in the function names with xsk_buff and also having them take a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they follow the same naming convention as the internal functions in xsk_queue.h. This is so that it will be clearer what they do and also for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
2020-08-31  xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem  (Magnus Karlsson)
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This is in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only a umem reference has been added to the buffer pool struct, but later commits will add other entities to it. These are going to be entities that differ between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-08-31  netlink: policy: correct validation type check  (Johannes Berg)
In the policy export for binary attributes I erroneously used a != NLA_VALIDATE_NONE comparison instead of checking for the two possible values, which meant that if a validation function pointer ended up aliasing the min/max as negatives, we'd hit a warning in nla_get_range_unsigned(). Fix this to correctly check for only the two types that should be handled here, i.e. range with or without warn-too-long. Reported-by: syzbot+353df1490da781637624@syzkaller.appspotmail.com Fixes: 8aa26c575fb3 ("netlink: make NLA_BINARY validation more flexible") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
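A hedged sketch of the corrected check (the validation types are the ones named in the message; surrounding code is approximate): only the two range validation types carry a real min/max for a binary attribute, so only they should be exported:

    if (pt->validation_type == NLA_VALIDATE_RANGE ||
        pt->validation_type == NLA_VALIDATE_RANGE_WARN_TOO_LONG) {
            /* safe to read pt->min / pt->max here */
    } else {
            /* e.g. NLA_VALIDATE_FUNCTION: the union holds a pointer, not a range */
    }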
2020-08-31  bpf: Fix build without BPF_LSM.  (Alexei Starovoitov)
resolve_btfids doesn't like an empty set. Add an unused ID when BPF_LSM is off. Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Reported-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200831163132.66521-1-alexei.starovoitov@gmail.com
2020-08-31  bpf: Fix build without BPF_SYSCALL, but with BPF_JIT.  (Alexei Starovoitov)
When CONFIG_BPF_SYSCALL is not set, but CONFIG_BPF_JIT=y, the kernel build fails: In file included from ../kernel/bpf/trampoline.c:11: ../kernel/bpf/trampoline.c: In function ‘bpf_trampoline_update’: ../kernel/bpf/trampoline.c:220:39: error: ‘call_rcu_tasks_trace’ undeclared ../kernel/bpf/trampoline.c: In function ‘__bpf_prog_enter_sleepable’: ../kernel/bpf/trampoline.c:411:2: error: implicit declaration of function ‘rcu_read_lock_trace’ ../kernel/bpf/trampoline.c: In function ‘__bpf_prog_exit_sleepable’: ../kernel/bpf/trampoline.c:416:2: error: implicit declaration of function ‘rcu_read_unlock_trace’ This is due to: obj-$(CONFIG_BPF_JIT) += trampoline.o obj-$(CONFIG_BPF_JIT) += dispatcher.o There are a number of functions that arch/x86/net/bpf_jit_comp.c uses from these two files, but none of them will be used when only cBPF is on (which is the case for BPF_SYSCALL=n BPF_JIT=y). Add the rcu_trace functions to rcupdate_trace.h. The JITed code won't execute them and the BPF trampoline logic won't be used without BPF_SYSCALL. Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/bpf/20200831155155.62754-1-alexei.starovoitov@gmail.com
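One plausible shape of the rcupdate_trace.h part of this fix, written as a hedged sketch (the exact guard and whether the stubs are empty are assumptions; the point is only that the symbols must exist to link, since the JITed code never executes them without BPF_SYSCALL):

    #ifndef CONFIG_TASKS_TRACE_RCU
    static inline void rcu_read_lock_trace(void) { }
    static inline void rcu_read_unlock_trace(void) { }
    #endif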
2020-08-28  Merge branch 'bpf-sleepable'  (Daniel Borkmann)
Alexei Starovoitov says: ==================== v2->v3: - switched to a minimal allowlist approach. Essentially that means that syscall entry, a few btrfs allow_error_inject functions, should_fail_bio(), and two LSM hooks (file_mprotect and bprm_committed_creds) are the only hooks that allow attaching of sleepable BPF programs. Once a comprehensive analysis of LSM hooks has been done, this allowlist will be extended. - added patch 1 that fixes the prototypes of two mm functions so they work reliably with error injection. It's also necessary for the resolve_btfids tool to recognize these two funcs, but that's secondary. v1->v2: - split the fmod_ret fix into a separate patch - added a denylist v1: This patch set introduces minimal viable support for sleepable BPF programs. In this patch set, only fentry/fexit/fmod_ret and lsm progs can be sleepable. Only array and pre-allocated hash and LRU maps are allowed. Here is the 'perf report' difference of sleepable vs non-sleepable: 3.86% bench [k] __srcu_read_unlock 3.22% bench [k] __srcu_read_lock 0.92% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.50% bench [k] bpf_trampoline_10297 0.26% bench [k] __bpf_prog_exit_sleepable 0.21% bench [k] __bpf_prog_enter_sleepable vs 0.88% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry 0.84% bench [k] bpf_trampoline_10297 0.13% bench [k] __bpf_prog_enter 0.12% bench [k] __bpf_prog_exit vs 0.79% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.72% bench [k] bpf_trampoline_10381 0.31% bench [k] __bpf_prog_exit_sleepable 0.29% bench [k] __bpf_prog_enter_sleepable Sleepable vs non-sleepable program invocation overhead is only marginally higher due to rcu_trace. The srcu approach is much slower. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-08-28  selftests/bpf: Add sleepable tests  (Alexei Starovoitov)
Modify a few tests to sanity-test sleepable BPF functionality. Running 'bench trig-fentry-sleep' vs 'bench trig-fentry' and 'perf report': sleepable with SRCU: 3.86% bench [k] __srcu_read_unlock 3.22% bench [k] __srcu_read_lock 0.92% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.50% bench [k] bpf_trampoline_10297 0.26% bench [k] __bpf_prog_exit_sleepable 0.21% bench [k] __bpf_prog_enter_sleepable sleepable with RCU_TRACE: 0.79% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.72% bench [k] bpf_trampoline_10381 0.31% bench [k] __bpf_prog_exit_sleepable 0.29% bench [k] __bpf_prog_enter_sleepable non-sleepable with RCU: 0.88% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry 0.84% bench [k] bpf_trampoline_10297 0.13% bench [k] __bpf_prog_enter 0.12% bench [k] __bpf_prog_exit Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-6-alexei.starovoitov@gmail.com