summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2024-11-18net/udp: Add a new struct for hash2 slotPhilo Lu
Preparing for udp 4-tuple hash (uhash4 for short). To implement uhash4 without cache line missing when lookup, hslot2 is used to record the number of hashed sockets in hslot4. Thus adding a new struct udp_hslot_main with field hash4_cnt, which is used by hash2. The new struct is used to avoid doubling the size of udp_hslot. Before uhash4 lookup, firstly checking hash4_cnt to see if there are hashed sks in hslot4. Because hslot2 is always used in lookup, there is no cache line miss. Related helpers are updated, and use the helpers as possible. uhash4 is implemented in following patches. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18Merge tag 'ipsec-next-2024-11-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-11-15 1) Add support for RFC 9611 per cpu xfrm state handling. 2) Add inbound and outbound xfrm state caches to speed up state lookups. 3) Convert xfrm to dscp_t. From Guillaume Nault. 4) Fix error handling in build_aevent. From Everest K.C. 5) Replace strncpy with strscpy_pad in copy_to_user_auth. From Daniel Yang. 6) Fix an uninitialized symbol during acquire state insertion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-15Merge tag 'for-net-next-2024-11-14' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - btusb: add Foxconn 0xe0fc for Qualcomm WCN785x - btmtk: Fix ISO interface handling - Add quirk for ATS2851 - btusb: Add RTL8852BE device 0489:e123 - ISO: Do not emit LE PA/BIG Create Sync if previous is pending - btusb: Add USB HW IDs for MT7920/MT7925 - btintel_pcie: Add handshake between driver and firmware - btintel_pcie: Add recovery mechanism - hci_conn: Use disable_delayed_work_sync - SCO: Use kref to track lifetime of sco_conn - ISO: Use kref to track lifetime of iso_conn - btnxpuart: Add GPIO support to power save feature - btusb: Add 0x0489:0xe0f3 and 0x13d3:0x3623 for Qualcomm WCN785x * tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (51 commits) Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNC Bluetooth: fix use-after-free in device_for_each_child() Bluetooth: btintel: Direct exception event to bluetooth stack Bluetooth: hci_core: Fix calling mgmt_device_connected Bluetooth: hci_bcm: Use the devm_clk_get_optional() helper Bluetooth: ISO: Send BIG Create Sync via hci_sync Bluetooth: hci_conn: Remove alloc from critical section Bluetooth: ISO: Use kref to track lifetime of iso_conn Bluetooth: SCO: Use kref to track lifetime of sco_conn Bluetooth: HCI: Add IPC(11) bus type Bluetooth: btusb: Add 3 HWIDs for MT7925 Bluetooth: btusb: Add new VID/PID 0489/e124 for MT7925 Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slave Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending Bluetooth: ISO: Fix matching parent socket for BIS slave Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending Bluetooth: btrtl: Decrease HCI_OP_RESET timeout from 10 s to 2 s Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name() Bluetooth: btusb: Add new VID/PID 0489/e111 for MT7925 Bluetooth: btmtk: adjust the position to init iso data anchor ... ==================== Link: https://patch.msgid.link/20241114214731.1994446-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15Merge tag 'nf-next-24-11-15' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Extended netlink error reporting if nfnetlink attribute parser fails, from Donald Hunter. 2) Incorrect request_module() module, from Simon Horman. 3) A series of patches to reduce memory consumption for set element transactions. Florian Westphal says: "When doing a flush on a set or mass adding/removing elements from a set, each element needs to allocate 96 bytes to hold the transactional state. In such cases, virtually all the information in struct nft_trans_elem is the same. Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem can hold multiple set element pointers. The number of elements that can be stored in one nft_trans_elem is limited by the slab allocator, this series limits the compaction to at most 62 elements as it caps the reallocation to 2048 bytes of memory." 4) A series of patches to prepare the transition to dscp_t in .flowi_tos. From Guillaume Nault. 5) Support for bitwise operations with two source registers, from Jeremy Sowden. * tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bitwise: add support for doing AND, OR and XOR directly netfilter: bitwise: rename some boolean operation functions netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t. netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t. netfilter: rpfilter: Convert rpfilter_mt() to dscp_t. netfilter: flow_offload: Convert nft_flow_route() to dscp_t. netfilter: ipv4: Convert ip_route_me_harder() to dscp_t. netfilter: nf_tables: allocate element update information dynamically netfilter: nf_tables: switch trans_elem to real flex array netfilter: nf_tables: prepare nft audit for set element compaction netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure netfilter: nf_tables: add nft_trans_commit_list_add_elem helper netfilter: bpf: Pass string literal as format argument of request_module() netfilter: nfnetlink: Report extack policy errors for batched ops ==================== Link: https://patch.msgid.link/20241115133207.8907-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNCLuiz Augusto von Dentz
This adds the initial implementation of MGMT_OP_HCI_CMD_SYNC as documented in mgmt-api (BlueZ tree): Send HCI command and wait for event Command =========================================== Command Code: 0x005B Controller Index: <controller id> Command Parameters: Opcode (2 Octets) Event (1 Octet) Timeout (1 Octet) Parameter Length (2 Octets) Parameter (variable) Return Parameters: Response (1-variable Octets) This command may be used to send a HCI command and wait for an (optional) event. The HCI command is specified by the Opcode, any arbitrary is supported including vendor commands, but contrary to the like of Raw/User channel it is run as an HCI command send by the kernel since it uses its command synchronization thus it is possible to wait for a specific event as a response. Setting event to 0x00 will cause the command to wait for either HCI Command Status or HCI Command Complete. Timeout is specified in seconds, setting it to 0 will cause the default timeout to be used. Possible errors: Failed Invalid Parameters Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: HCI: Add IPC(11) bus typeLuiz Augusto von Dentz
Zephyr(1) has been using the same bus defines as Linux so tools likes of btmon, etc, are able to decode the bus used by the driver to transport HCI packets. Link: https://github.com/zephyrproject-rtos/zephyr/pull/80808 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slaveIulia Tanasescu
Currently, hci_conn_hash_lookup_big only checks for BIS master connections, by filtering out connections with the destination address set. This commit updates this function to also consider BIS slave connections, since it is also used for a Broadcast Receiver to set an available BIG handle before issuing the LE BIG Create Sync command. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pendingIulia Tanasescu
The Bluetooth Core spec does not allow a LE BIG Create sync command to be sent to Controller if another one is pending (Vol 4, Part E, page 2586). In order to avoid this issue, the HCI_CONN_CREATE_BIG_SYNC was added to mark that the LE BIG Create Sync command has been sent for a hcon. Once the BIG Sync Established event is received, the hcon flag is erased and the next pending hcon is handled. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pendingIulia Tanasescu
The Bluetooth Core spec does not allow a LE PA Create sync command to be sent to Controller if another one is pending (Vol 4, Part E, page 2493). In order to avoid this issue, the HCI_CONN_CREATE_PA_SYNC was added to mark that the LE PA Create Sync command has been sent for a hcon. Once the PA Sync Established event is received, the hcon flag is erased and the next pending hcon is handled. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: Add new quirks for ATS2851Danil Pylaev
This adds quirks for broken extended create connection, and write auth payload timeout. Signed-off-by: Danil Pylaev <danstiv404@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore 252e01e68241 ("selftests: net: add netlink-dumps to .gitignore") be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw") https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/ drivers/net/phy/phylink.c 671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled") 7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"") Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14Merge tag 'net-6.12-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Quite calm week. No new regression under investigation. Current release - regressions: - eth: revert "igb: Disable threaded IRQ for igb_msix_other" Current release - new code bugs: - bluetooth: btintel: direct exception event to bluetooth stack Previous releases - regressions: - core: fix data-races around sk->sk_forward_alloc - netlink: terminate outstanding dump on socket close - mptcp: error out earlier on disconnect - vsock: fix accept_queue memory leak - phylink: ensure PHY momentary link-fails are handled - eth: mlx5: - fix null-ptr-deref in add rule err flow - lock FTE when checking if active - eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol Previous releases - always broken: - sched: fix u32's systematic failure to free IDR entries for hnodes. - sctp: fix possible UAF in sctp_v6_available() - eth: bonding: add ns target multicast address to slave device - eth: mlx5: fix msix vectors to respect platform limit - eth: icssg-prueth: fix 1 PPS sync" * tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) net: sched: u32: Add test case for systematic hnode IDR leaks selftests: bonding: add ns multicast group testing bonding: add ns target multicast address to slave device net: ti: icssg-prueth: Fix 1 PPS sync stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines net: Make copy_safe_from_sockptr() match documentation net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol ipmr: Fix access to mfc_cache_list without lock held samples: pktgen: correct dev to DEV net: phylink: ensure PHY momentary link-fails are handled mptcp: pm: use _rcu variant under rcu_read_lock mptcp: hold pm lock when deleting entry mptcp: update local address flags when setting it net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. MAINTAINERS: Re-add cancelled Renesas driver sections Revert "igb: Disable threaded IRQ for igb_msix_other" Bluetooth: btintel: Direct exception event to bluetooth stack Bluetooth: hci_core: Fix calling mgmt_device_connected virtio/vsock: Improve MSG_ZEROCOPY error handling vsock: Fix sk_error_queue memory leak ...
2024-11-14netfilter: nf_tables: allocate element update information dynamicallyFlorian Westphal
Move the timeout/expire/flag members from nft_trans_one_elem struct into a dybamically allocated structure, only needed when timeout update was requested. This halves size of nft_trans_one_elem struct and allows to compact up to 124 elements in one transaction container rather than 62. This halves memory requirements for a large flush or insert transaction, where ->update remains NULL. Care has to be taken to release the extra data in all spots, including abort path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-14netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structureFlorian Westphal
Add helpers to release the individual elements contained in the trans_elem container structure. No functional change intended. Followup patch will add 'nelems' member and will turn 'priv' into a flexible array. These helpers can then loop over all elements. Care needs to be taken to handle a mix of new elements and existing elements that are being updated (e.g. timeout refresh). Before this patch, NEWSETELEM transaction with update is released early so nft_trans_set_elem_destroy() won't get called, so we need to skip elements marked as update. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-14bonding: add ns target multicast address to slave deviceHangbin Liu
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves") tried to resolve the issue where backup slaves couldn't be brought up when receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only worked for drivers that receive all multicast messages, such as the veth interface. For standard drivers, the NS multicast message is silently dropped because the slave device is not a member of the NS target multicast group. To address this, we need to make the slave device join the NS target multicast group, ensuring it can receive these IPv6 NS messages to validate the slave’s status properly. There are three policies before joining the multicast group: 1. All settings must be under active-backup mode (alb and tlb do not support arp_validate), with backup slaves and slaves supporting multicast. 2. We can add or remove multicast groups when arp_validate changes. 3. Other operations, such as enslaving, releasing, or setting NS targets, need to be guarded by arp_validate. Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-13net: simplify eeecfg_mac_can_tx_lpiHeiner Kallweit
Simplify the function. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/f9a4623b-b94c-466c-8733-62057c6d9a17@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13Merge tag 'wireless-next-2024-11-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.13 Most likely the last -next pull request for v6.13. Most changes are in Realtek and Qualcomm drivers, otherwise not really anything noteworthy. Major changes: mac80211 * EHT 1024 aggregation size for transmissions ath12k * switch to using wiphy_lock() and remove ar->conf_mutex * firmware coredump collection support * add debugfs support for a multitude of statistics ath11k * dt: document WCN6855 hardware inputs ath9k * remove include/linux/ath9k_platform.h ath5k * Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support rtw88: * 8821au and 8812au USB adapters support rtw89 * thermal protection * firmware secure boot for WiFi 6 chip * tag 'wireless-next-2024-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (154 commits) Revert "wifi: iwlegacy: do not skip frames with bad FCS" wifi: mac80211: pass MBSSID config by reference wifi: mac80211: Support EHT 1024 aggregation size in TX net: rfkill: gpio: Add check for clk_enable() wifi: brcmfmac: Fix oops due to NULL pointer dereference in brcmf_sdiod_sglist_rw() wifi: Switch back to struct platform_driver::remove() wifi: ipw2x00: libipw_rx_any(): fix bad alignment wifi: brcmfmac: release 'root' node in all execution paths wifi: iwlwifi: mvm: don't call power_update_mac in fast suspend wifi: iwlwifi: s/IWL_MVM_INVALID_STA/IWL_INVALID_STA wifi: iwlwifi: bump minimum API version in BZ/SC to 92 wifi: iwlwifi: move IWL_LMAC_*_INDEX to fw/api/context.h wifi: iwlwifi: be less noisy if the NIC is dead in S3 wifi: iwlwifi: mvm: tell iwlmei when we finished suspending wifi: iwlwifi: allow fast resume on ax200 wifi: iwlwifi: mvm: support new initiator and responder command version wifi: iwlwifi: mvm: use wiphy locked debugfs for low-latency wifi: iwlwifi: mvm: MLO scan upon channel condition degradation wifi: iwlwifi: mvm: support new versions of the wowlan APIs wifi: iwlwifi: mvm: allow always calling iwl_mvm_get_bss_vif() ... ==================== Link: https://patch.msgid.link/20241113172918.A8A11C4CEC3@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye) - Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS context retrieval (Zijian Zhang) - Fix BPF bits iterator selftest for s390x (Hou Tao) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6 bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx selftests/bpf: Use -4095 as the bad address for bits iterator
2024-11-12net: ip: make ip_route_use_hint() return drop reasonsMenglong Dong
In this commit, we make ip_route_use_hint() return drop reasons. The drop reasons that we return are similar to what we do in ip_route_input_slow(), and no drop reasons are added in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_mkroute_input/__mkroute_input return drop reasonsMenglong Dong
In this commit, we make ip_mkroute_input() and __mkroute_input() return drop reasons. The drop reason "SKB_DROP_REASON_ARP_PVLAN_DISABLE" is introduced for the case: the packet which is not IP is forwarded to the in_dev, and the proxy_arp_pvlan is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_route_input() return drop reasonsMenglong Dong
In this commit, we make ip_route_input() return skb drop reasons that come from ip_route_input_noref(). Meanwhile, adjust all the call to it. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_route_input_noref() return drop reasonsMenglong Dong
In this commit, we make ip_route_input_noref() return drop reasons, which come from ip_route_input_rcu(). We need adjust the callers of ip_route_input_noref() to make sure the return value of ip_route_input_noref() is used properly. The errno that ip_route_input_noref() returns comes from ip_route_input and bpf_lwt_input_reroute in the origin logic, and we make them return -EINVAL on error instead. In the following patch, we will make ip_route_input() returns drop reasons too. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_route_input_slow() return drop reasonsMenglong Dong
In this commit, we make ip_route_input_slow() return skb drop reasons, and following new skb drop reasons are added: SKB_DROP_REASON_IP_INVALID_DEST The only caller of ip_route_input_slow() is ip_route_input_rcu(), and we adjust it by making it return -EINVAL on error. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_mc_validate_source() return drop reasonMenglong Dong
Make ip_mc_validate_source() return drop reason, and adjust the call of it in ip_route_input_mc(). Another caller of it is ip_rcv_finish_core->udp_v4_early_demux, and the errno is not checked in detail, so we don't do more adjustment for it. The drop reason "SKB_DROP_REASON_IP_LOCALNET" is added in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make fib_validate_source() support drop reasonsMenglong Dong
In this commit, we make fib_validate_source() and __fib_validate_source() return -reason instead of errno on error. The return value of fib_validate_source can be -errno, 0, and 1. It's hard to make fib_validate_source() return drop reasons directly. The fib_validate_source() will return 1 if the scope of the source(revert) route is HOST. And the __mkroute_input() will mark the skb with IPSKB_DOREDIRECT in this case (combine with some other conditions). And then, a REDIRECT ICMP will be sent in ip_forward() if this flag exists. We can't pass this information to __mkroute_input if we make fib_validate_source() return drop reasons. Therefore, we introduce the wrapper fib_validate_source_reason() for fib_validate_source(), which will return the drop reasons on error. In the origin logic, LINUX_MIB_IPRPFILTER will be counted if fib_validate_source() return -EXDEV. And now, we need to adjust it by checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take effect only after the patch "net: ip: make ip_route_input_noref() return drop reasons", as we can't pass the drop reasons from fib_validate_source() to ip_rcv_finish_core() in this patch. Following new drop reasons are added in this patch: SKB_DROP_REASON_IP_LOCAL_SOURCE SKB_DROP_REASON_IP_INVALID_SOURCE Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-11net: Add control functions for irq suspensionMartin Karsten
The napi_suspend_irqs routine bootstraps irq suspension by elongating the defer timeout to irq_suspend_timeout. The napi_resume_irqs routine effectively cancels irq suspension by forcing the napi to be scheduled immediately. Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Joe Damato <jdamato@fastly.com> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://patch.msgid.link/20241109050245.191288-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Register rtnl_dellink() and rtnl_setlink() with ↵Kuniyuki Iwashima
RTNL_FLAG_DOIT_PERNET_WIP. Currently, rtnl_setlink() and rtnl_dellink() cannot be fully converted to per-netns RTNL due to a lack of handling peer/lower/upper devices in different netns. For example, when we change a device in rtnl_setlink() and need to propagate that to its upper devices, we want to avoid acquiring all netns locks, for which we do not know the upper limit. The same situation happens when we remove a device. rtnl_dellink() could be transformed to remove a single device in the requested netns and delegate other devices to per-netns work, and rtnl_setlink() might be ? Until we come up with a better idea, let's use a new flag RTNL_FLAG_DOIT_PERNET_WIP for rtnl_dellink() and rtnl_setlink(). This will unblock converting RTNL users where such devices are not related. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241108004823.29419-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Add peer_type in struct rtnl_link_ops.Kuniyuki Iwashima
In ops->newlink(), veth, vxcan, and netkit call rtnl_link_get_net() with a net pointer, which is the first argument of ->newlink(). rtnl_link_get_net() could return another netns based on IFLA_NET_NS_PID and IFLA_NET_NS_FD in the peer device's attributes. We want to get it and fill rtnl_nets->nets[] in advance in rtnl_newlink() for per-netns RTNL. All of the three get the peer netns in the same way: 1. Call rtnl_nla_parse_ifinfomsg() 2. Call ops->validate() (vxcan doesn't have) 3. Call rtnl_link_get_net_tb() Let's add a new field peer_type to struct rtnl_link_ops and prefetch netns in the peer ifla to add it to rtnl_nets in rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241108004823.29419-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Remove __rtnl_link_register()Kuniyuki Iwashima
link_ops is protected by link_ops_mutex and no longer needs RTNL, so we have no reason to have __rtnl_link_register() separately. Let's remove it and call rtnl_link_register() from ifb.ko and dummy.ko. Note that both modules' init() work on init_net only, so we need not export pernet_ops_rwsem and can use rtnl_net_lock() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Protect link_ops by mutex.Kuniyuki Iwashima
rtnl_link_unregister() holds RTNL and calls synchronize_srcu(), but rtnl_newlink() will acquire SRCU frist and then RTNL. Then, we need to unlink ops and call synchronize_srcu() outside of RTNL to avoid the deadlock. rtnl_link_unregister() rtnl_newlink() ---- ---- lock(rtnl_mutex); lock(&ops->srcu); lock(rtnl_mutex); sync(&ops->srcu); Let's move as such and add a mutex to protect link_ops. Now, link_ops is protected by its dedicated mutex and rtnl_link_register() no longer needs to hold RTNL. While at it, we move the initialisation of ops->dellink and ops->srcu out of the mutex scope. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Remove __rtnl_link_unregister().Kuniyuki Iwashima
rtnl_link_unregister() holds RTNL and calls __rtnl_link_unregister(), where we call synchronize_srcu() to wait inflight RTM_NEWLINK requests for per-netns RTNL. We put synchronize_srcu() in __rtnl_link_unregister() due to ifb.ko and dummy.ko. However, rtnl_newlink() will acquire SRCU before RTNL later in this series. Then, lockdep will detect the deadlock: rtnl_link_unregister() rtnl_newlink() ---- ---- lock(rtnl_mutex); lock(&ops->srcu); lock(rtnl_mutex); sync(&ops->srcu); To avoid the problem, we must call synchronize_srcu() before RTNL in rtnl_link_unregister(). As a preparation, let's remove __rtnl_link_unregister(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: netlink: add nla_get_*_default() accessorsJohannes Berg
There are quite a number of places that use patterns such as if (attr) val = nla_get_u16(attr); else val = DEFAULT; Add nla_get_u16_default() and friends like that to not have to type this out all the time. Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241108114145.acd2aadb03ac.I3df6aac71d38a5baa1c0a03d0c7e82d4395c030e@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Create netdev->neighbour associationGilad Naaman
Create a mapping between a netdev and its neighoburs, allowing for much cheaper flushes. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241107160444.2913124-7-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Remove bare neighbour::next pointerGilad Naaman
Remove the now-unused neighbour::next pointer, leaving struct neighbour solely with the hlist_node implementation. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-6-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Convert iteration to use hlist+macroGilad Naaman
Remove all usage of the bare neighbour::next pointer, replacing them with neighbour::hash and its for_each macro. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-5-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Define neigh_for_each_in_bucketGilad Naaman
Introduce neigh_for_each_in_bucket in neighbour.h, to help iterate over the neighbour table more succinctly. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-3-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Add hlist_node to struct neighbourGilad Naaman
Add a doubly-linked node to neighbours, so that they can be deleted without iterating the entire bucket they're in. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-2-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09net: mctp: Expose transport binding identifier via IFLA attributeKhang Nguyen
MCTP control protocol implementations are transport binding dependent. Endpoint discovery is mandatory based on transport binding. Message timing requirements are specified in each respective transport binding specification. However, we currently have no means to get this information from MCTP links. Add a IFLA_MCTP_PHYS_BINDING netlink link attribute, which represents the transport type using the DMTF DSP0239-defined type numbers, returned as part of RTM_GETLINK data. We get an IFLA_MCTP_PHYS_BINDING attribute for each MCTP link, for example: - 0x00 (unspec) for loopback interface; - 0x01 (SMBus/I2C) for mctpi2c%d interfaces; and - 0x05 (serial) for mctpserial%d interfaces. Signed-off-by: Khang Nguyen <khangng@os.amperecomputing.com> Reviewed-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20241105071915.821871-1-khangng@os.amperecomputing.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07ipv4: Prepare ip_route_output() to future .flowi4_tos conversion.Guillaume Nault
Convert the "tos" parameter of ip_route_output() to dscp_t. This way we'll have a dscp_t value directly available when .flowi4_tos will eventually be converted to dscp_t. All ip_route_output() callers but one set this "tos" parameter to 0 and therefore don't need to be adapted to the new prototype. Only br_nf_pre_routing_finish() needs conversion. It can just use ip4h_dscp() to get the DSCP field from the IPv4 header. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/0f10d031dd44c70aae9bc6e19391cb30d5c2fe71.1730928699.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc7). Conflicts: drivers/net/ethernet/freescale/enetc/enetc_pf.c e15c5506dd39 ("net: enetc: allocate vf_state during PF probes") 3774409fd4c6 ("net: enetc: build enetc_pf_common.c as a separate module") https://lore.kernel.org/20241105114100.118bd35e@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/am65-cpsw-nuss.c de794169cf17 ("net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7") 4a7b2ba94a59 ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07wifi: mac80211: fix description of ieee80211_set_active_links() for new sequenceZong-Zhe Yang
The sequence of calls has changed, but the description is inconsistent. So, fix the description. Fixes: 188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links") Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Link: https://patch.msgid.link/20241101082143.11138-1-kevin_yang@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07Merge tag 'nf-next-24-11-07' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following series contains Netfilter updates for net-next: 1) Make legacy xtables configs user selectable, from Breno Leitao. 2) Fix a few sparse warnings related to percpu, from Uros Bizjak. 3) Use strscpy_pad, from Justin Stitt. 4) Use nft_trans_elem_alloc() in catchall flush, from Florian Westphal. 5) A series of 7 patches to fix false positive with CONFIG_RCU_LIST=y. Florian also sees possible issue with 10 while module load/removal when requesting an expression that is available via module. As for patch 11, object is being updated so reference on the module already exists so I don't see any real issue. Florian says: "Unfortunately there are many more errors, and not all are false positives. First patches pass lockdep_commit_lock_is_held() to the rcu list traversal macro so that those splats are avoided. The last two patches are real code change as opposed to 'pass the transaction mutex to relax rcu check': Those two lists are not protected by transaction mutex so could be altered in parallel. This targets nf-next because these are long-standing issues." netfilter pull request 24-11-07 * tag 'nf-next-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: must hold rcu read lock while iterating object type list netfilter: nf_tables: must hold rcu read lock while iterating expression type list netfilter: nf_tables: avoid false-positive lockdep splats with basechain hook netfilter: nf_tables: avoid false-positive lockdep splats in set walker netfilter: nf_tables: avoid false-positive lockdep splats with flowtables netfilter: nf_tables: avoid false-positive lockdep splats with sets netfilter: nf_tables: avoid false-positive lockdep splat on rule deletion netfilter: nf_tables: prefer nft_trans_elem_alloc helper netfilter: nf_tables: replace deprecated strncpy with strscpy_pad netfilter: nf_tables: Fix percpu address space issues in nf_tables_api.c netfilter: Make legacy configs user selectable ==================== Link: https://patch.msgid.link/20241106234625.168468-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07netfilter: nf_tables: wait for rcu grace period on net_device removalPablo Neira Ayuso
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed synchronize_net() call when unregistering basechain hook, however, net_device removal event handler for the NFPROTO_NETDEV was not updated to wait for RCU grace period. Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") does not remove basechain rules on device removal, I was hinted to remove rules on net_device removal later, see 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on netdevice removal"). Although NETDEV_UNREGISTER event is guaranteed to be handled after synchronize_net() call, this path needs to wait for rcu grace period via rcu callback to release basechain hooks if netns is alive because an ongoing netlink dump could be in progress (sockets hold a reference on the netns). Note that nf_tables_pre_exit_net() unregisters and releases basechain hooks but it is possible to see NETDEV_UNREGISTER at a later stage in the netns exit path, eg. veth peer device in another netns: cleanup_net() default_device_exit_batch() unregister_netdevice_many_notify() notifier_call_chain() nf_tables_netdev_event() __nft_release_basechain() In this particular case, same rule of thumb applies: if netns is alive, then wait for rcu grace period because netlink dump in the other netns could be in progress. Otherwise, if the other netns is going away then no netlink dump can be in progress and basechain hooks can be released inmediately. While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain validation, which should not ever happen. Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-07net: nfc: Propagate ISO14443 type A target ATS to userspace via netlinkJuraj Šarinay
Add a 20-byte field ats to struct nfc_target and expose it as NFC_ATTR_TARGET_ATS via the netlink interface. The payload contains 'historical bytes' that help to distinguish cards from one another. The information is commonly used to assemble an emulated ATR similar to that reported by smart cards with contacts. Add a 20-byte field target_ats to struct nci_dev to hold the payload obtained in nci_rf_intf_activated_ntf_packet() and copy it to over to nfc_target.ats in nci_activate_target(). The approach is similar to the handling of 'general bytes' within ATR_RES. Replace the hard-coded size of rats_res within struct activation_params_nfca_poll_iso_dep by the equal constant NFC_ATS_MAXSIZE now defined in nfc.h Within NCI, the information corresponds to the 'RATS Response' activation parameter that omits the initial length byte TL. This loses no information and is consistent with our handling of SENSB_RES that also drops the first (constant) byte. Tested with nxp_nci_i2c on a few type A targets including an ICAO 9303 compliant passport. I refrain from the corresponding change to digital_in_recv_ats() to have the few drivers based on digital.h fill nfc_target.ats, as I have no way to test it. That class of drivers appear not to set NFC_ATTR_TARGET_SENSB_RES either. Consider a separate patch to propagate (all) the parameters. Signed-off-by: Juraj Šarinay <juraj@sarinay.com> Link: https://patch.msgid.link/20241103124525.8392-1-juraj@sarinay.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-06bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rxZijian Zhang
As the introduction of the support for vsock and unix sockets in sockmap, tls_sw_has_ctx_tx/rx cannot presume the socket passed in must be IS_ICSK. vsock and af_unix sockets have vsock_sock and unix_sock instead of inet_connection_sock. For these sockets, tls_get_ctx may return an invalid pointer and cause page fault in function tls_sw_ctx_rx. BUG: unable to handle page fault for address: 0000000000040030 Workqueue: vsock-loopback vsock_loopback_work RIP: 0010:sk_psock_strp_data_ready+0x23/0x60 Call Trace: ? __die+0x81/0xc3 ? no_context+0x194/0x350 ? do_page_fault+0x30/0x110 ? async_page_fault+0x3e/0x50 ? sk_psock_strp_data_ready+0x23/0x60 virtio_transport_recv_pkt+0x750/0x800 ? update_load_avg+0x7e/0x620 vsock_loopback_work+0xd0/0x100 process_one_work+0x1a7/0x360 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x112/0x130 ? __kthread_cancel_work+0x40/0x40 ret_from_fork+0x1f/0x40 v2: - Add IS_ICSK check v3: - Update the commits in Fixes Fixes: 634f1a7110b4 ("vsock: support sockmap") Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20241106003742.399240-1-zijianzhang@bytedance.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-11-06xfrm: Convert struct xfrm_dst_lookup_params -> tos to dscp_t.Guillaume Nault
Add type annotation to the "tos" field of struct xfrm_dst_lookup_params, to ensure that the ECN bits aren't mistakenly taken into account when doing route lookups. Rename that field (tos -> dscp) to make that change explicit. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-05netfilter: nf_tables: avoid false-positive lockdep splats with flowtablesFlorian Westphal
The transaction mutex prevents concurrent add/delete, its ok to iterate those lists outside of rcu read side critical sections. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-03Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-10-31 We've added 13 non-merge commits during the last 16 day(s) which contain a total of 16 files changed, 710 insertions(+), 668 deletions(-). The main changes are: 1) Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it, from Puranjay Mohan. 2) Rewrite and migrate the test_tcp_check_syncookie.sh BPF selftest into test_progs so that it can be run in BPF CI, from Alexis Lothoré. 3) Two BPF sockmap selftest fixes, from Zijian Zhang. 4) Small XDP synproxy BPF selftest cleanup to remove IP_DF check, from Vincent Li. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Add a selftest for bpf_csum_diff() selftests/bpf: Don't mask result of bpf_csum_diff() in test_verifier bpf: bpf_csum_diff: Optimize and homogenize for all archs net: checksum: Move from32to16() to generic header selftests/bpf: remove xdp_synproxy IP_DF check selftests/bpf: remove test_tcp_check_syncookie selftests/bpf: test MSS value returned with bpf_tcp_gen_syncookie selftests/bpf: add ipv4 and dual ipv4/ipv6 support in btf_skc_cls_ingress selftests/bpf: get rid of global vars in btf_skc_cls_ingress selftests/bpf: add missing ns cleanups in btf_skc_cls_ingress selftests/bpf: factorize conn and syncookies tests in a single runner selftests/bpf: Fix txmsg_redir of test_txmsg_pull in test_sockmap selftests/bpf: Fix msg_verify_data in test_sockmap ==================== Link: https://patch.msgid.link/20241031221543.108853-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net/tcp: Add missing lockdep annotations for TCP-AO hlist traversalsDmitry Safonov
Under CONFIG_PROVE_RCU_LIST + CONFIG_RCU_EXPERT hlist_for_each_entry_rcu() provides very helpful splats, which help to find possible issues. I missed CONFIG_RCU_EXPERT=y in my testing config the same as described in a3e4bf7f9675 ("configs/debug: make sure PROVE_RCU_LIST=y takes effect"). The fix itself is trivial: add the very same lockdep annotations as were used to dereference ao_info from the socket. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20241028152645.35a8be66@kernel.org/ Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20241030-tcp-ao-hlist-lockdep-annotate-v1-1-bf641a64d7c6@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31netlink: add NLA_POLICY_MAX_LEN macroAntonio Quartulli
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy with a maximum length value. The netlink generator for YAML specs has been extended accordingly. Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20241029-b4-ovpn-v11-1-de4698c73a25@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>