summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-04net: hibmcge: Add support for abnormal irq handling featureJijie Shao
the hardware error was reported by interrupt, and need be fixed by doing function reset, but the whole reset flow takes a long time, should not do it in irq handler, so do it in scheduled task. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: hibmcge: Add support for checksum offloadJijie Shao
This patch implements the rx checksum offload feature. The tx checksum offload processing in .ndo_start_xmit() has been accepted. This patch also adds the tx checksum feature, including NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: hibmcge: Add support for dump statisticsJijie Shao
The driver supports many hw statistics. This patch supports dump statistics through ethtool_ops and ndo.get_stats64(). The type of hw statistics register is u32, To prevent the statistics register from overflowing, the driver dump the statistics every 30 seconds. in a scheduled task. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04Merge branch 'introduce-flowtable-hw-offloading-in-airoha_eth-driver'Paolo Abeni
Lorenzo Bianconi says: ==================== Introduce flowtable hw offloading in airoha_eth driver Introduce netfilter flowtable integration in airoha_eth driver to offload 5-tuple flower rules learned by the PPE module if the user accelerates them using a nft configuration similar to the one reported below: table inet filter { flowtable ft { hook ingress priority filter devices = { lan1, lan2, lan3, lan4, eth1 } flags offload; } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Packet Processor Engine (PPE) module available on EN7581 SoC populates the PPE table with 5-tuples flower rules learned from traffic forwarded between the GDM ports connected to the Packet Switch Engine (PSE) module. airoha_eth driver configures and collects data from the PPE module via a Network Processor Unit (NPU) RISC-V module available on the EN7581 SoC. Move airoha_eth driver in a dedicated folder (drivers/net/ethernet/airoha). v7: https://lore.kernel.org/r/20250224-airoha-en7581-flowtable-offload-v7-0-b4a22ad8364e@kernel.org v6: https://lore.kernel.org/r/20250221-airoha-en7581-flowtable-offload-v6-0-d593af0e9487@kernel.org v5: https://lore.kernel.org/r/20250217-airoha-en7581-flowtable-offload-v5-0-28be901cb735@kernel.org v4: https://lore.kernel.org/r/20250213-airoha-en7581-flowtable-offload-v4-0-b69ca16d74db@kernel.org v3: https://lore.kernel.org/r/20250209-airoha-en7581-flowtable-offload-v3-0-dba60e755563@kernel.org v2: https://lore.kernel.org/r/20250207-airoha-en7581-flowtable-offload-v2-0-3a2239692a67@kernel.org v1: https://lore.kernel.org/r/20250205-airoha-en7581-flowtable-offload-v1-0-d362cfa97b01@kernel.org ==================== Link: https://patch.msgid.link/20250228-airoha-en7581-flowtable-offload-v8-0-01dc1653f46e@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Introduce PPE debugfs supportLorenzo Bianconi
Similar to PPE support for Mediatek devices, introduce PPE debugfs in order to dump binded and unbinded flows. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Add loopback support for GDM2Lorenzo Bianconi
Enable hw redirection for traffic received on GDM2 port to GDM{3,4}. This is required to apply Qdisc offloading (HTB or ETS) for traffic to and from GDM{3,4} port. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Introduce flowtable offload supportLorenzo Bianconi
Introduce netfilter flowtable integration in order to allow airoha_eth driver to offload 5-tuple flower rules learned by the PPE module if the user accelerates them using a nft configuration similar to the one reported below: table inet filter { flowtable ft { hook ingress priority filter devices = { lan1, lan2, lan3, lan4, eth1 } flags offload; } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Tested-by: Sayantan Nandy <sayantan.nandy@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Introduce Airoha NPU supportLorenzo Bianconi
Packet Processor Engine (PPE) module available on EN7581 SoC populates the PPE table with 5-tuples flower rules learned from traffic forwarded between the GDM ports connected to the Packet Switch Engine (PSE) module. The airoha_eth driver can enable hw acceleration of learned 5-tuples rules if the user configure them in netfilter flowtable (netfilter flowtable support will be added with subsequent patches). airoha_eth driver configures and collects data from the PPE module via a Network Processor Unit (NPU) RISC-V module available on the EN7581 SoC. Introduce basic support for Airoha NPU module. Tested-by: Sayantan Nandy <sayantan.nandy@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04dt-bindings: net: airoha: Add airoha,npu phandle propertyLorenzo Bianconi
Introduce the airoha,npu property for the NPU node available on EN7581 SoC. The airoha Network Processor Unit (NPU) is used to offload network traffic forwarded between Packet Switch Engine (PSE) ports. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04dt-bindings: net: airoha: Add the NPU node for EN7581 SoCLorenzo Bianconi
This patch adds the NPU document binding for EN7581 SoC. The Airoha Network Processor Unit (NPU) provides a configuration interface to implement wired and wireless hardware flow offloading programming Packet Processor Engine (PPE) flow table. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Rename airoha_set_gdm_port_fwd_cfg() in ↵Lorenzo Bianconi
airoha_set_vip_for_gdm_port() Rename airoha_set_gdm_port() in airoha_set_vip_for_gdm_port(). Get rid of airoha_set_gdm_ports routine. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Move REG_GDM_FWD_CFG() initialization in airoha_dev_init()Lorenzo Bianconi
Move REG_GDM_FWD_CFG() register initialization in airoha_dev_init routine. Moreover, always send traffic PPE module in order to be processed by hw accelerator. This is a preliminary patch to enable netfilter flowtable hw offloading on EN7581 SoC. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Enable support for multiple net_devicesLorenzo Bianconi
In the current codebase airoha_eth driver supports just a single net_device connected to the Packet Switch Engine (PSE) lan port (GDM1). As shown in commit 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC"), PSE can switch packets between four GDM ports. Enable the capability to create a net_device for each GDM port of the PSE module. Moreover, since the QDMA blocks can be shared between net_devices, do not stop TX/RX DMA in airoha_dev_stop() if there are active net_devices for this QDMA block. This is a preliminary patch to enable flowtable hw offloading for EN7581 SoC. Co-developed-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: dsa: mt7530: Enable Rx sptag for EN7581 SoCLorenzo Bianconi
Packet Processor Engine (PPE) module used for hw acceleration on EN7581 mac block, in order to properly parse packets, requires DSA untagged packets on TX side and read DSA tag from DMA descriptor on RX side. For this reason, enable RX Special Tag (SPTAG) for EN7581 SoC. This is a preliminary patch to enable netfilter flowtable hw offloading on EN7581 SoC. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Move DSA tag in DMA descriptorLorenzo Bianconi
Packet Processor Engine (PPE) module reads DSA tags from the DMA descriptor and requires untagged DSA packets to properly parse them. Move DSA tag in the DMA descriptor on TX side and read DSA tag from DMA descriptor on RX side. In order to avoid skb reallocation, store tag in skb_dst on RX side. This is a preliminary patch to enable netfilter flowtable hw offloading on EN7581 SoC. Tested-by: Sayantan Nandy <sayantan.nandy@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Move register definitions in airoha_regs.hLorenzo Bianconi
Move common airoha_eth register definitions in airoha_regs.h in order to reuse them for Packet Processor Engine (PPE) codebase. PPE module is used to enable support for flowtable hw offloading in airoha_eth driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Move reg/write utility routines in airoha_eth.hLorenzo Bianconi
This is a preliminary patch to introduce flowtable hw offloading support for airoha_eth driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Move definitions in airoha_eth.hLorenzo Bianconi
Move common airoha_eth definitions in airoha_eth.h in order to reuse them for Packet Processor Engine (PPE) codebase. PPE module is used to enable support for flowtable hw offloading in airoha_eth driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: airoha: Move airoha_eth driver in a dedicated folderLorenzo Bianconi
The airoha_eth driver has no codebase shared with mtk_eth_soc one. Moreover, the upcoming features (flowtable hw offloading, PCS, ..) will not reuse any code from MediaTek driver. Move the Airoha driver in a dedicated folder. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04Merge branch 'net-notify-users-when-an-iface-cannot-change-its-netns'Paolo Abeni
Nicolas Dichtel says: ==================== net: notify users when an iface cannot change its netns This series adds a way to see if an interface cannot be moved to another netns. Documentation/netlink/specs/rt_link.yaml | 3 ++ .../networking/net_cachelines/net_device.rst | 2 +- Documentation/networking/switchdev.rst | 2 +- drivers/net/amt.c | 2 +- drivers/net/bonding/bond_main.c | 2 +- drivers/net/ethernet/adi/adin1110.c | 2 +- .../net/ethernet/marvell/prestera/prestera_main.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 2 +- drivers/net/ethernet/rocker/rocker_main.c | 2 +- drivers/net/ethernet/ti/cpsw_new.c | 2 +- drivers/net/loopback.c | 2 +- drivers/net/net_failover.c | 2 +- drivers/net/team/team_core.c | 2 +- drivers/net/vrf.c | 2 +- include/linux/netdevice.h | 9 +++-- include/uapi/linux/if_link.h | 1 + net/batman-adv/soft-interface.c | 2 +- net/bridge/br_device.c | 2 +- net/core/dev.c | 45 +++++++++++++++++----- net/core/rtnetlink.c | 5 ++- net/hsr/hsr_device.c | 2 +- net/ieee802154/6lowpan/core.c | 2 +- net/ieee802154/core.c | 10 ++--- net/ipv4/ip_tunnel.c | 2 +- net/ipv4/ipmr.c | 2 +- net/ipv6/ip6_gre.c | 2 +- net/ipv6/ip6_tunnel.c | 2 +- net/ipv6/ip6mr.c | 2 +- net/ipv6/sit.c | 2 +- net/openvswitch/vport-internal_dev.c | 2 +- net/wireless/core.c | 10 ++--- tools/testing/selftests/net/forwarding/README | 2 +- 34 files changed, 86 insertions(+), 53 deletions(-) Comments are welcome. Regards, Nicolas ==================== Link: https://patch.msgid.link/20250228102144.154802-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: plumb extack in __dev_change_net_namespace()Nicolas Dichtel
It could be hard to understand why the netlink command fails. For example, if dev->netns_immutable is set, the error is "Invalid argument". Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: advertise netns_immutable property via netlinkNicolas Dichtel
Since commit 05c1280a2bcf ("netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local"), there is no way to see if the netns_immutable property s set on a device. Let's add a netlink attribute to advertise it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: rename netns_local to netns_immutableNicolas Dichtel
The name 'netns_local' is confusing. A following commit will export it via netlink, so let's use a more explicit name. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04Merge branch 'some-pktgen-fixes-improvments-part-ii'Paolo Abeni
Peter Seiderer says: ==================== Some pktgen fixes/improvments (part II) While taking a look at '[PATCH net] pktgen: Avoid out-of-range in get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds access in get_imix_entries' ([2], [3]) and doing some tests and code review I detected that the /proc/net/pktgen/... parsing logic does not honour the user given buffer bounds (resulting in out-of-bounds access). This can be observed e.g. by the following simple test (sometimes the old/'longer' previous value is re-read from the buffer): $ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0 $ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0 Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0 Result: OK: min_pkt_size=12345 $ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0 Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0 Result: OK: min_pkt_size=12345 $ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0 Params: count 1000 min_pkt_size: 123 max_pkt_size: 0 Result: OK: min_pkt_size=123 So fix the out-of-bounds access (and some minor findings) and add a simple proc_net_pktgen selftest... Patch set splited into part I (now already applied to net-next) - net: pktgen: replace ENOTSUPP with EOPNOTSUPP - net: pktgen: enable 'param=value' parsing - net: pktgen: fix hex32_arg parsing for short reads - net: pktgen: fix 'rate 0' error handling (return -EINVAL) - net: pktgen: fix 'ratep 0' error handling (return -EINVAL) - net: pktgen: fix ctrl interface command parsing - net: pktgen: fix access outside of user given buffer in pktgen_thread_write() nd part II (this one): - net: pktgen: use defines for the various dec/hex number parsing digits lengths - net: pktgen: fix mix of int/long - net: pktgen: remove extra tmp variable (re-use len instead) - net: pktgen: remove some superfluous variable initializing - net: pktgen: fix mpls maximum labels list parsing - net: pktgen: fix access outside of user given buffer in pktgen_if_write() - net: pktgen: fix mpls reset parsing - net: pktgen: remove all superfluous index assignements - selftest: net: add proc_net_pktgen [1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@red-soft.ru/ [2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/ [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76201b5979768500bca362871db66d77cb4c225e ==================== Link: https://patch.msgid.link/20250227135604.40024-1-ps.report@gmx.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04selftest: net: add proc_net_pktgenPeter Seiderer
Add some test for /proc/net/pktgen/... interface. - enable 'CONFIG_NET_PKTGEN=m' in tools/testing/selftests/net/config Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove all superfluous index assignementsPeter Seiderer
Remove all superfluous index ('i += len') assignements (value not used afterwards). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mpls reset parsingPeter Seiderer
Fix mpls list reset parsing to work as describe in Documentation/networking/pktgen.rst: pgset "mpls 0" turn off mpls (or any invalid argument works too!) - before the patch $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 mpls: 00000001, 00000002 Result: OK: mpls=00000001,00000002 $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls 0" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 mpls: 00000000 Result: OK: mpls=00000000 $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls invalid" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 Result: OK: mpls= - after the patch $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 mpls: 00000001, 00000002 Result: OK: mpls=00000001,00000002 $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls 0" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 Result: OK: mpls= $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls invalid" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 Result: OK: mpls= Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix access outside of user given buffer in pktgen_if_write()Peter Seiderer
Honour the user given buffer size for the hex32_arg(), num_arg(), strn_len(), get_imix_entries() and get_labels() calls (otherwise they will access memory outside of the user given buffer). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mpls maximum labels list parsingPeter Seiderer
Fix mpls maximum labels list parsing up to MAX_MPLS_LABELS entries (instead of up to MAX_MPLS_LABELS - 1). Addresses the following: $ echo "mpls 00000f00,00000f01,00000f02,00000f03,00000f04,00000f05,00000f06,00000f07,00000f08,00000f09,00000f0a,00000f0b,00000f0c,00000f0d,00000f0e,00000f0f" > /proc/net/pktgen/lo\@0 -bash: echo: write error: Argument list too long Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove some superfluous variable initializingPeter Seiderer
Remove some superfluous variable initializing before hex32_arg call (as the same init is done here already). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove extra tmp variable (re-use len instead)Peter Seiderer
Remove extra tmp variable in pktgen_if_write (re-use len instead). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mix of int/longPeter Seiderer
Fix mix of int/long (and multiple conversion from/to) by using consequently size_t for i and max and ssize_t for len and adjust function signatures of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly. Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-03net: sfp: add quirk for FS SFP-10GM-T copper SFP+ moduleMartin Schiller
Add quirk for a copper SFP that identifies itself as "FS" "SFP-10GM-T". It uses RollBall protocol to talk to the PHY and needs 4 sec wait before probing the PHY. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://patch.msgid.link/20250227071058.1520027-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03mptcp: Remove unused declaration mptcp_set_owner_r()Yue Haibing
Commit 6639498ed85f ("mptcp: cleanup mem accounting") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228095148.4003065-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03Merge branch 'add-sock_kmemdup-helper'Jakub Kicinski
Geliang Tang says: ==================== add sock_kmemdup helper While developing MPTCP BPF path manager [1], I found it's useful to add a new sock_kmemdup() helper. My use case is this: In mptcp_userspace_pm_append_new_local_addr() function (see patch 3 in this patchset), it uses sock_kmalloc() to allocate an address entry "e", then immediately duplicate the input "entry" to it: ''' e = sock_kmalloc(sk, sizeof(*e), GFP_ATOMIC); if (!e) { ret = -ENOMEM; goto append_err; } *e = *entry; ''' When I implemented MPTCP BPF path manager, I needed to implement a code similar to this in BPF. The kfunc sock_kmalloc() can be easily invoked in BPF to allocate an entry "e", but the code "*e = *entry;" that assigns "entry" to "e" is not easy to implemented. I had to implement such a "copy entry" helper in BPF: ''' static void mptcp_pm_copy_addr(struct mptcp_addr_info *dst, struct mptcp_addr_info *src) { dst->id = src->id; dst->family = src->family; dst->port = src->port; if (src->family == AF_INET) { dst->addr.s_addr = src->addr.s_addr; } else if (src->family == AF_INET6) { dst->addr6.s6_addr32[0] = src->addr6.s6_addr32[0]; dst->addr6.s6_addr32[1] = src->addr6.s6_addr32[1]; dst->addr6.s6_addr32[2] = src->addr6.s6_addr32[2]; dst->addr6.s6_addr32[3] = src->addr6.s6_addr32[3]; } } static void mptcp_pm_copy_entry(struct mptcp_pm_addr_entry *dst, struct mptcp_pm_addr_entry *src) { mptcp_pm_copy_addr(&dst->addr, &src->addr); dst->flags = src->flags; dst->ifindex = src->ifindex; } ''' And add "write permission" for BPF to each field of mptcp_pm_addr_entry: ''' @@ static int bpf_mptcp_pm_btf_struct_access(struct bpf_verifier_log *log, case offsetof(struct mptcp_pm_addr_entry, addr.port): end = offsetofend(struct mptcp_pm_addr_entry, addr.port); break; #if IS_ENABLED(CONFIG_MPTCP_IPV6) case offsetof(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[0]): end = offsetofend(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[0]); break; case offsetof(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[1]): end = offsetofend(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[1]); break; case offsetof(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[2]): end = offsetofend(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[2]); break; case offsetof(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[3]): end = offsetofend(struct mptcp_pm_addr_entry, addr.addr6.s6_addr32[3]); break; #else case offsetof(struct mptcp_pm_addr_entry, addr.addr.s_addr): end = offsetofend(struct mptcp_pm_addr_entry, addr.addr.s_addr); break; #endif ''' But if there's a sock_kmemdup() helper, it will become much simpler, only need to call kfunc sock_kmemdup() instead in BPF. So this patchset adds this new helper and uses it in several places. [1] https://lore.kernel.org/mptcp/cover.1738924875.git.tanggeliang@kylinos.cn/ ==================== Link: https://patch.msgid.link/cover.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03mptcp: use sock_kmemdup for address entryGeliang Tang
Instead of using sock_kmalloc() to allocate an address entry "e" and then immediately duplicate the input "entry" to it, the newly added sock_kmemdup() helper can be used in mptcp_userspace_pm_append_new_local_addr() to simplify the code. More importantly, the code "*e = *entry;" that assigns "entry" to "e" is not easy to implemented in BPF if we use the same code to implement an append_new_local_addr() helper of a BFP path manager. This patch avoids this type of memory assignment operation. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/3e5a307aed213038a87e44ff93b5793229b16279.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03net: use sock_kmemdup for ip_optionsGeliang Tang
Instead of using sock_kmalloc() to allocate an ip_options and then immediately duplicate another ip_options to the newly allocated one in ipv6_dup_options(), mptcp_copy_ip_options() and sctp_v4_copy_ip_options(), the newly added sock_kmemdup() helper can be used to simplify the code. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/91ae749d66600ec6fb679e0e518fda6acb5c3e6f.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03sock: add sock_kmemdup helperGeliang Tang
This patch adds the sock version of kmemdup() helper, named sock_kmemdup(), to duplicate the input "src" memory block using the socket's option memory buffer. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/f828077394c7d1f3560123497348b438c875b510.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03Merge branch 'tcp-misc-changes'Jakub Kicinski
Eric Dumazet says: ==================== tcp: misc changes Minor changes, following recent changes in TCP stack. ==================== Link: https://patch.msgid.link/20250301201424.2046477-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: tcp_set_window_clamp() cleanupEric Dumazet
Remove one indentation level. Use max_t() and clamp() macros. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: remove READ_ONCE(req->ts_recent)Eric Dumazet
After commit 8d52da23b6c6 ("tcp: Defer ts_recent changes until req is owned"), req->ts_recent is not changed anymore. It is set once in tcp_openreq_init(), bpf_sk_assign_tcp_reqsk() or cookie_tcp_reqsk_alloc() before the req can be seen by other cpus/threads. This completes the revert of eba20811f326 ("tcp: annotate data-races around tcp_rsk(req)->ts_recent"). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wang Hai <wanghai38@huawei.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03net: gro: convert four dev_net() callsEric Dumazet
tcp4_check_fraglist_gro(), tcp6_check_fraglist_gro(), udp4_gro_lookup_skb() and udp6_gro_lookup_skb() assume RCU is held so that the net structure does not disappear. Use dev_net_rcu() instead of dev_net() to get LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: convert to dev_net_rcu()Eric Dumazet
TCP uses of dev_net() are under RCU protection, change them to dev_net_rcu() to get LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: add four drop reasons to tcp_check_req()Eric Dumazet
Use two existing drop reasons in tcp_check_req(): - TCP_RFC7323_PAWS - TCP_OVERWINDOW Add two new ones: - TCP_RFC7323_TSECR (corresponds to LINUX_MIB_TSECRREJECTED) - TCP_LISTEN_OVERFLOW (when a listener accept queue is full) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: add a drop_reason pointer to tcp_check_req()Eric Dumazet
We want to add new drop reasons for packets dropped in 3WHS in the following patches. tcp_rcv_state_process() has to set reason to TCP_FASTOPEN, because tcp_check_req() will conditionally overwrite the drop_reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03Merge branch 'ipv4-fib-convert-rtm_newroute-and-rtm_delroute-to-per-netns-rtnl'Jakub Kicinski
Kuniyuki Iwashima says: ==================== ipv4: fib: Convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Patch 1 is misc cleanup. Patch 2 ~ 8 converts two fib_info hash tables to per-netns. Patch 9 ~ 12 converts rtnl_lock() to rtnl_net_lcok(). v2: https://lore.kernel.org/20250226192556.21633-1-kuniyu@amazon.com v1: https://lore.kernel.org/20250225182250.74650-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250228042328.96624-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL.Kuniyuki Iwashima
We converted fib_info hash tables to per-netns one and now ready to convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Let's hold rtnl_net_lock() in inet_rtm_newroute() and inet_rtm_delroute(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().Kuniyuki Iwashima
fib_valid_key_len() is called in the beginning of fib_table_insert() or fib_table_delete() to check if the prefix length is valid. fib_table_insert() and fib_table_delete() are called from 3 paths - ip_rt_ioctl() - inet_rtm_newroute() / inet_rtm_delroute() - fib_magic() In the first ioctl() path, rtentry_to_fib_config() checks the prefix length with bad_mask(). Also, fib_magic() always passes the correct prefix: 32 or ifa->ifa_prefixlen, which is already validated. Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config(). While at it, 2 direct returns in rtm_to_fib_config() are changed to goto to match other places in the same function Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Hold rtnl_net_lock() in ip_rt_ioctl().Kuniyuki Iwashima
ioctl(SIOCADDRT/SIOCDELRT) calls ip_rt_ioctl() to add/remove a route in the netns of the specified socket. Let's hold rtnl_net_lock() there. Note that rtentry_to_fib_config() can be called without rtnl_net_lock() if we convert rtentry.dev handling to RCU later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Hold rtnl_net_lock() for ip_fib_net_exit().Kuniyuki Iwashima
ip_fib_net_exit() requires RTNL and is called from fib_net_init() and fib_net_exit_batch(). Let's hold rtnl_net_lock() before ip_fib_net_exit(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>