summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-06-20netlabel: Reorder fields in 'struct netlbl_domaddr6_map'Christophe JAILLET
Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct netlbl_domaddr6_map' from 72 to 64 bytes. It saves a few bytes of memory and is more cache-line friendly. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/aa109847260e51e174c823b6d1441f75be370f01.1687083361.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20mptcp: Reorder fields in 'struct mptcp_pm_add_entry'Christophe JAILLET
Group some variables based on their sizes to reduce hole and avoid padding. On x86_64, this shrinks the size of 'struct mptcp_pm_add_entry' from 136 to 128 bytes. It saves a few bytes of memory and is more cache-line friendly. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/e47b71de54fd3e580544be56fc1bb2985c77b0f4.1687081558.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20netfilter: nf_tables: Fix for deleting base chains with payloadPhil Sutter
When deleting a base chain, iptables-nft simply submits the whole chain to the kernel, including the NFTA_CHAIN_HOOK attribute. The new code added by fixed commit then turned this into a chain update, destroying the hook but not the chain itself. Detect the situation by checking if the chain type is either netdev or inet/ingress. Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nfnetlink_osf: fix module autoloadPablo Neira Ayuso
Move the alias from xt_osf to nfnetlink_osf. Fixes: f9324952088f ("netfilter: nfnetlink_osf: extract nfnetlink_subsystem code from xt_osf.c") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: drop module reference after updating chainPablo Neira Ayuso
Otherwise the module reference counter is leaked. Fixes b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: disallow timeout for anonymous setsPablo Neira Ayuso
Never used from userspace, disallow these parameters. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: disallow updates of anonymous setsPablo Neira Ayuso
Disallow updates of set timeout and garbage collection parameters for anonymous sets. Fixes: 123b99619cca ("netfilter: nf_tables: honor set timeout and garbage collection updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: reject unbound chain set before commit phasePablo Neira Ayuso
Use binding list to track set transaction and to check for unbound chains before entering the commit phase. Bail out if chain binding remain unused before entering the commit step. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: reject unbound anonymous set before commit phasePablo Neira Ayuso
Add a new list to track set transaction and to check for unbound anonymous sets before entering the commit phase. Bail out at the end of the transaction handling if an anonymous set remains unbound. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: disallow element updates of bound anonymous setsPablo Neira Ayuso
Anonymous sets come with NFT_SET_CONSTANT from userspace. Although API allows to create anonymous sets without NFT_SET_CONSTANT, it makes no sense to allow to add and to delete elements for bound anonymous sets. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: fix underflow in object reference counterPablo Neira Ayuso
Since ("netfilter: nf_tables: drop map element references from preparation phase"), integration with commit protocol is better, therefore drop the workaround that b91d90368837 ("netfilter: nf_tables: fix leaking object reference count") provides. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nft_set_pipapo: .walk does not deal with generationsPablo Neira Ayuso
The .walk callback iterates over the current active set, but it might be useful to iterate over the next generation set. Use the generation mask to determine what set view (either current or next generation) is use for the walk iteration. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: drop map element references from preparation phasePablo Neira Ayuso
set .destroy callback releases the references to other objects in maps. This is very late and it results in spurious EBUSY errors. Drop refcount from the preparation phase instead, update set backend not to drop reference counter from set .destroy path. Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the reference counter because the transaction abort path releases the map references for each element since the set is unbound. The abort path also deals with releasing reference counter for new elements added to unbound sets. Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chainPablo Neira Ayuso
Add a new state to deal with rule expressions deactivation from the newrule error path, otherwise the anonymous set remains in the list in inactive state for the next generation. Mark the set/chain transaction as unbound so the abort path releases this object, set it as inactive in the next generation so it is not reachable anymore from this transaction and reference counter is dropped. Fixes: 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: nf_tables: fix chain binding transaction logicPablo Neira Ayuso
Add bound flag to rule and chain transactions as in 6a0a8d10a366 ("netfilter: nf_tables: use-after-free in failing rule with bound set") to skip them in case that the chain is already bound from the abort path. This patch fixes an imbalance in the chain use refcnt that triggers a WARN_ON on the table and chain destroy path. This patch also disallows nested chain bindings, which is not supported from userspace. The logic to deal with chain binding in nft_data_hold() and nft_data_release() is not correct. The NFT_TRANS_PREPARE state needs a special handling in case a chain is bound but next expressions in the same rule fail to initialize as described by 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE"). The chain is left bound if rule construction fails, so the objects stored in this chain (and the chain itself) are released by the transaction records from the abort path, follow up patch ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain") completes this error handling. When deleting an existing rule, chain bound flag is set off so the rule expression .destroy path releases the objects. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-20netfilter: ipset: Replace strlcpy with strscpyAzeem Shaikh
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). Direct replacement is safe here since return value from all callers of STRLCPY macro were ignored. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230613003437.3538694-1-azeemshaikh38@gmail.com
2023-06-20Merge tag 'ipsec-2023-06-20' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ipsec-2023-06-20
2023-06-20net: dsa: introduce preferred_default_local_cpu_port and use on MT7530Vladimir Oltean
Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20Merge tag 'ieee802154-for-net-2023-06-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan Stefan Schmidt says: ==================== An update from ieee802154 for your *net* tree: Two small fixes and MAINTAINERS update this time. Azeem Shaikh ensured consistent use of strscpy through the tree and fixed the usage in our trace.h. Chen Aotian fixed a potential memory leak in the hwsim simulator for ieee802154. Miquel Raynal updated the MAINATINERS file with the new team git tree locations and patchwork URLs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-19NFS: add sysfs shutdown knobBenjamin Coddington
Within each nfs_server sysfs tree, add an entry named "shutdown". Writing 1 to this file will set the cl_shutdown bit on the rpc_clnt structs associated with that mount. If cl_shutdown is set, the task scheduler immediately returns -EIO for new tasks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19NFS: Add sysfs links to sunrpc clients for nfs_clientsBenjamin Coddington
For the general and state management nfs_client under each mount, create symlinks to their respective rpc_client sysfs entries. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19ipv6: exthdrs: Remove redundant skb_headlen() check in ip6_parse_tlv().Kuniyuki Iwashima
ipv6_destopt_rcv() and ipv6_parse_hopopts() pulls these data - Hop-by-Hop/Destination Options Header : 8 - Hdr Ext Len : skb_transport_header(skb)[1] << 3 and calls ip6_parse_tlv(), so it need not check if skb_headlen() is less than skb_transport_offset(skb) + (skb_transport_header(skb)[1] << 3). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-19ipv6: exthdrs: Reload hdr only when needed in ipv6_srh_rcv().Kuniyuki Iwashima
We need not reload hdr in ipv6_srh_rcv() unless we call pskb_expand_head(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-19ipv6: exthdrs: Replace pskb_pull() with skb_pull() in ipv6_srh_rcv().Kuniyuki Iwashima
ipv6_rthdr_rcv() pulls these data - Segment Routing Header : 8 - Hdr Ext Len : skb_transport_header(skb)[1] << 3 needed by ipv6_srh_rcv(), so pskb_pull() in ipv6_srh_rcv() never fails and can be replaced with skb_pull(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-19ipv6: rpl: Remove redundant multicast tests in ipv6_rpl_srh_rcv().Kuniyuki Iwashima
ipv6_rpl_srh_rcv() checks if ipv6_hdr(skb)->daddr or ohdr->rpl_segaddr[i] is the multicast address with ipv6_addr_type(). We have the same check for ipv6_hdr(skb)->daddr in ipv6_rthdr_rcv(), so we need not recheck it in ipv6_rpl_srh_rcv(). Also, we should use ipv6_addr_is_multicast() for ohdr->rpl_segaddr[i] instead of ipv6_addr_type(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-19ipv6: rpl: Remove pskb(_may)?_pull() in ipv6_rpl_srh_rcv().Kuniyuki Iwashima
As Eric Dumazet pointed out [0], ipv6_rthdr_rcv() pulls these data - Segment Routing Header : 8 - Hdr Ext Len : skb_transport_header(skb)[1] << 3 needed by ipv6_rpl_srh_rcv(). We can remove pskb_may_pull() and replace pskb_pull() with skb_pull() in ipv6_rpl_srh_rcv(). Link: https://lore.kernel.org/netdev/CANn89iLboLwLrHXeHJucAqBkEL_S0rJFog68t7wwwXO-aNf5Mg@mail.gmail.com/ [0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-19SUNRPC: Add a TCP-with-TLS RPC transport classChuck Lever
Use the new TLS handshake API to enable the SunRPC client code to request a TLS handshake. This implements support for RFC 9289, only on TCP sockets. Upper layers such as NFS use RPC-with-TLS to protect in-transit traffic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Capture CMSG metadata on client-side receiveChuck Lever
kTLS sockets use CMSG to report decryption errors and the need for session re-keying. For RPC-with-TLS, an "application data" message contains a ULP payload, and that is passed along to the RPC client. An "alert" message triggers connection reset. Everything else is discarded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Ignore data_ready callbacks during TLS handshakesChuck Lever
The RPC header parser doesn't recognize TLS handshake traffic, so it will close the connection prematurely with an error. To avoid that, shunt the transport's data_ready callback when there is a TLS handshake in progress. The XPRT_SOCK_IGNORE_RECV flag will be toggled by code added in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Add RPC client support for the RPC_AUTH_TLS auth flavorChuck Lever
The new authentication flavor is used only to discover peer support for RPC-over-TLS. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Trace the rpc_create_argsChuck Lever
Pass the upper layer's rpc_create_args to the rpc_clnt_new() tracepoint so additional parts of the upper layer's request can be recorded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: Plumb an API for setting transport layer securityChuck Lever
Add an initial set of policies along with fields for upper layers to pass the requested policy down to the transport layer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: attempt to reach rpcbind with an abstract socket nameNeilBrown
NFS is primarily name-spaced using network namespaces. However it contacts rpcbind (and gss_proxy) using AF_UNIX sockets which are name-spaced using the mount namespaces. This requires a container using NFSv3 (the form that requires rpcbind) to manage both network and mount namespaces, which can seem an unnecessary burden. As NFS is primarily a network service it makes sense to use network namespaces as much as possible, and to prefer to communicate with an rpcbind running in the same network namespace. This can be done, while preserving the benefits of AF_UNIX sockets, by using an abstract socket address. An abstract address has a nul at the start of sun_path, and a length that is exactly the complete size of the sockaddr_un up to the end of the name, NOT including any trailing nul (which is not part of the address). Abstract addresses are local to a network namespace - regular AF_UNIX path names a resolved in the mount namespace ignoring the network namespace. This patch causes rpcb to first try an abstract address before continuing with regular AF_UNIX and then IP addresses. This ensures backwards compatibility. Choosing the name needs some care as the same address will be configured for rpcbind, and needs to be built in to libtirpc for this enhancement to be fully successful. There is no formal standard for choosing abstract addresses. The defacto standard appears to be to use a path name similar to what would be used for a filesystem AF_UNIX address - but with a leading nul. In that case "\0/var/run/rpcbind.sock" seems like the best choice. However at this time /var/run is deprecated in favour of /run, so "\0/run/rpcbind.sock" might be better. Though as we are deliberately moving away from using the filesystem it might seem more sensible to explicitly break the connection and just have "\0rpcbind.socket" using the same name as the systemd unit file.. This patch chooses the second option, which seems least likely to raise objections. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19SUNRPC: support abstract unix socket addressesNeilBrown
An "abtract" address for an AF_UNIX socket start with a nul and can contain any bytes for the given length, but traditionally doesn't contain other nuls. When reported, the leading nul is replaced by '@'. sunrpc currently rejects connections to these addresses and reports them as an empty string. To provide support for future use of these addresses, allow them for outgoing connections and report them more usefully. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19wifi: mac80211: add eht_capa debugfs fieldBen Greear
Output looks like this: [root@ct523c-0b29 ~]# cat /debug/ieee80211/wiphy6/netdev\:wlan6/stations/50\:28\:4a\:bd\:f4\:a7/eht_capa EHT supported MAC-CAP: 0x82 0x00 PHY-CAP: 0x0c 0x00 0x00 0x00 0x00 0x48 0x00 0x00 0x00 OM-CONTROL MAX-MPDU-LEN: 11454 242-TONE-RU-GT20MHZ NDP-4-EHT-LFT-32-GI BEAMFORMEE-80-NSS: 0 BEAMFORMEE-160-NSS: 0 BEAMFORMEE-320-NSS: 0 SOUNDING-DIM-80-NSS: 0 SOUNDING-DIM-160-NSS: 0 SOUNDING-DIM-320-NSS: 0 MAX_NC: 0 PPE_THRESHOLD_PRESENT NOMINAL_PKT_PAD: 0us MAX-NUM-SUPP-EHT-LTF: 1 SUPP-EXTRA-EHT-LTF MCS15-SUPP-MASK: 0 EHT bw <= 80 MHz, max NSS for MCS 8-9: Rx=2, Tx=2 EHT bw <= 80 MHz, max NSS for MCS 10-11: Rx=2, Tx=2 EHT bw <= 80 MHz, max NSS for MCS 12-13: Rx=2, Tx=2 EHT bw <= 160 MHz, max NSS for MCS 8-9: Rx=0, Tx=0 EHT bw <= 160 MHz, max NSS for MCS 10-11: Rx=0, Tx=0 EHT bw <= 160 MHz, max NSS for MCS 12-13: Rx=0, Tx=0 EHT bw <= 320 MHz, max NSS for MCS 8-9: Rx=0, Tx=0 EHT bw <= 320 MHz, max NSS for MCS 10-11: Rx=0, Tx=0 EHT bw <= 320 MHz, max NSS for MCS 12-13: Rx=0, Tx=0 EHT PPE Thresholds: 0xc1 0x0e 0xe0 0x00 0x00 Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20230517184428.999384-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19ipvs: align inner_mac_header for encapsulationTerin Stock
When using encapsulation the original packet's headers are copied to the inner headers. This preserves the space for an inner mac header, which is not used by the inner payloads for the encapsulation types supported by IPVS. If a packet is using GUE or GRE encapsulation and needs to be segmented, flow can be passed to __skb_udp_tunnel_segment() which calculates a negative tunnel header length. A negative tunnel header length causes pskb_may_pull() to fail, dropping the packet. This can be observed by attaching probes to ip_vs_in_hook(), __dev_queue_xmit(), and __skb_udp_tunnel_segment(): perf probe --add '__dev_queue_xmit skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header' perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen' perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \ skb->inner_network_header skb->mac_header skb->network_header' These probes the headers and tunnel header length for packets which traverse the IPVS encapsulation path. A TCP packet can be forced into the segmentation path by being smaller than a calculated clamped MSS, but larger than the advertised MSS. probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52 probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32 probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2 When using veth-based encapsulation, the interfaces are set to be mac-less, which does not preserve space for an inner mac header. This prevents this issue from occurring. In our real-world testing of sending a 32KB file we observed operation time increasing from ~75ms for veth-based encapsulation to over 1.5s using IPVS encapsulation due to retries from dropped packets. This changeset modifies the packet on the encapsulation path in ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac header offset. This fixes UDP segmentation for both encapsulation types, and corrects the inner headers for any IPIP flows that may use it. Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation") Signed-off-by: Terin Stock <terin@cloudflare.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-19bpf: Centralize permissions checks for all BPF map typesAndrii Nakryiko
This allows to do more centralized decisions later on, and generally makes it very explicit which maps are privileged and which are not (e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants, as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit and easy to verify). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230613223533.3689589-4-andrii@kernel.org
2023-06-19wifi: mac80211: check EHT basic MCS/NSS setJohannes Berg
Check that all the NSS in the EHT basic MCS/NSS set are actually supported, otherwise disable EHT for the connection. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.737827c906c9.I0c11a3cd46ab4dcb774c11a5bbc30aecfb6fce11@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: cfg80211: search all RNR elements for colocated APsBenjamin Berg
An AP reporting colocated APs may send more than one reduced neighbor report element. As such, iterate all elements instead of only parsing the first one when looking for colocated APs. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.ffe2c014f478.I372a4f96c88f7ea28ac39e94e0abfc465b5330d4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: cfg80211: stop parsing after allocation failureBenjamin Berg
The error handling code would break out of the loop incorrectly, causing the rest of the message to be misinterpreted. Fix this by also jumping out of the surrounding while loop, which will trigger the error detection code. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.0ffac98475cf.I6f5c08a09f5c9fced01497b95a9841ffd1b039f8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: update multi-link element STA reconfigJohannes Berg
Update the MLE STA reconfig sub-type to 802.11be D3.0 format, which includes the operation update field. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.2e1383b31f07.I8055a111c8fcf22e833e60f5587a4d8d21caca5b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: mac80211: agg-tx: prevent start/stop raceJohannes Berg
There were crashes reported in this code, and the timer_shutdown() warning in one of the previous patches indicates that the timeout timer for the AP response (addba_resp_timer) is still armed while we're stopping the aggregation session. After a very long deliberation of the code, so far the only way I could find that might cause this would be the following sequence: - session start requested - session start indicated to driver, but driver returns IEEE80211_AMPDU_TX_START_DELAY_ADDBA - session stop requested, sets HT_AGG_STATE_WANT_STOP - session stop worker runs ___ieee80211_stop_tx_ba_session(), sets HT_AGG_STATE_STOPPING From here on, the order doesn't matter exactly, but: 1. driver calls ieee80211_start_tx_ba_cb_irqsafe(), setting HT_AGG_STATE_START_CB 2. driver calls ieee80211_stop_tx_ba_cb_irqsafe(), setting HT_AGG_STATE_STOP_CB 3. the worker will run ieee80211_start_tx_ba_cb() for HT_AGG_STATE_START_CB 4. the worker will run ieee80211_stop_tx_ba_cb() for HT_AGG_STATE_STOP_CB (the order could also be 1./3./2./4.) This will cause ieee80211_start_tx_ba_cb() to send out the AddBA request frame to the AP and arm the timer, but we're already in the middle of stopping and so the ieee80211_stop_tx_ba_cb() will no longer assume it needs to stop anything. Prevent this by checking for WANT_STOP/STOPPING in the start CB, and warn if we're sending a frame on a stopping session. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.e5b52777462a.I0b2ed6658e81804279f5d7c9c1918cb1f6626bf2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: mac80211: agg-tx: add a few locking assertionsJohannes Berg
This is all true today, but difficult to understand since the callers are in other files etc. Add two new lockdep assertions to make things easier to read. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.7f03dec6a90b.I762c11e95da005b80fa0184cb1173b99ec362acf@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: mac80211: Support link removal using Reconfiguration ML elementIlan Peer
Add support for handling link removal indicated by the Reconfiguration Multi-Link element. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.d8a046dc0c1a.I4dcf794da2a2d9f4e5f63a4b32158075d27c0660@changeid [use cfg80211_links_removed() API instead] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: mac80211: add set_active_links variant not locking sdataBenjamin Berg
There are cases where keeping sdata locked for an operation. Add a variant that does not take sdata lock to permit these usecases. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: mac80211: add ___ieee80211_disconnect variant not locking sdataBenjamin Berg
There are cases where keeping sdata locked for an operation. Add a variant that does not take sdata lock to permit these usecases. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: cfg80211/nl80211: Add support to indicate STA MLD setup links removalVeerendranath Jakkam
STA MLD setup links may get removed if AP MLD remove the corresponding affiliated APs with Multi-Link reconfiguration as described in P802.11be_D3.0, section 35.3.6.2.2 Removing affiliated APs. Currently, there is no support to notify such operation to cfg80211 and userspace. Add support for the drivers to indicate STA MLD setup links removal to cfg80211 and notify the same to userspace. Upon receiving such indication from the driver, clear the MLO links information of the removed links in the WDEV. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20230317142153.237900-1-quic_vjakkam@quicinc.com [rename function and attribute, fix kernel-doc] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: cfg80211: do not scan disabled links on 6GHzBenjamin Berg
If a link is disabled on 6GHz, we should not send a probe request on the channel to resolve it. Simply skip such RNR entries so that the link is ignored. Userspace can still see the link in the RNR and may generate an ML probe request in order to associate to the (currently) disabled link. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.4f7384006471.Iff8f1081e76a298bd25f9468abb3a586372cddaa@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: cfg80211: handle BSS data contained in ML probe responsesBenjamin Berg
The basic multi-link element within an multi-link probe response will contain full information about BSSes that are part of an MLD AP. This BSS information may be used to associate with a link of an MLD AP without having received a beacon from the BSS itself. This patch adds parsing of the data and adding/updating the BSS using the received elements. Doing this means that userspace can discover the BSSes using an ML probe request and request association on these links. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.29593bd0ae1f.Ic9a67b8f022360aa202b870a932897a389171b14@changeid [swap loop conditions smatch complained about] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-06-19wifi: cfg80211: use structs for TBTT information accessBenjamin Berg
Make the data access a bit nicer overall by using structs. There is a small change here to also accept a TBTT information length of eight bytes as we do not require the 20 MHz PSD information. This also fixes a bug reading the short SSID on big endian machines. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230618214436.4c3f8901c1bc.Ic3e94fd6e1bccff7948a252ad3bb87e322690a17@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>