summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-23Merge branch 'bridge-mc-per-vlan-qquery'David S. Miller
Yong Wang says: ==================== bridge: multicast: per vlan query improvement when port or vlan state changes The current implementation of br_multicast_enable_port() only operates on port's multicast context, which doesn't take into account in case of vlan snooping, one downside is the port's igmp query timer will NOT resume when port state gets changed from BR_STATE_BLOCKING to BR_STATE_FORWARDING etc. Such code flow will briefly look like: 1.vlan snooping --> br_multicast_port_query_expired with per vlan port_mcast_ctx --> port in BR_STATE_BLOCKING state --> then one-shot timer discontinued The port state could be changed by STP daemon or kernel STP, taking mstpd as example: 2.mstpd --> netlink_sendmsg --> br_setlink --> br_set_port_state with non blocking states, i.e. BR_STATE_LEARNING or BR_STATE_FORWARDING --> br_port_state_selection --> br_multicast_enable_port --> enable multicast with port's multicast_ctx Here for per vlan snooping, the vlan context of the port should be used instead of port's multicast_ctx. The first patch corrects such behavior. Similarly, vlan state change also impacts multicast behavior, the 2nd patch adds function to update the corresponding multicast context when vlan state changes. The 3rd patch adds the selftests to confirm that IGMP/MLD query does happen when the STP state becomes forwarding. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-23selftests: net/bridge : add tests for per vlan snooping with stp state changesYong Wang
Change ALL_TESTS definition to "test-per-line". Add the test case of per vlan snooping with port stp state change to forwarding and also vlan equivalent case in both bridge_igmp.sh and bridge_mld.sh. Signed-off-by: Yong Wang <yongwang@nvidia.com> Reviewed-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-23net: bridge: mcast: update multicast contex when vlan state is changedYong Wang
When the vlan STP state is changed, which could be manipulated by "bridge vlan" commands, similar to port STP state, this also impacts multicast behaviors such as igmp query. In the scenario of per-VLAN snooping, there's a need to update the corresponding multicast context to re-arm the port query timer when vlan state becomes "forwarding" etc. Update br_vlan_set_state() function to enable vlan multicast context in such scenario. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <yongwang@nvidia.com> Reviewed-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-23net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functionsYong Wang
When a bridge port STP state is changed from BLOCKING/DISABLED to FORWARDING, the port's igmp query timer will NOT re-arm itself if the bridge has been configured as per-VLAN multicast snooping. Solve this by choosing the correct multicast context(s) to enable/disable port multicast based on whether per-VLAN multicast snooping is enabled or not, i.e. using per-{port, VLAN} context in case of per-VLAN multicast snooping by re-implementing br_multicast_enable_port() and br_multicast_disable_port() functions. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <yongwang@nvidia.com> Reviewed-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-22xdp: create locked/unlocked instances of xdp redirect target settersJoshua Washington
Commit 03df156dd3a6 ("xdp: double protect netdev->xdp_flags with netdev->lock") introduces the netdev lock to xdp_set_features_flag(). The change includes a _locked version of the method, as it is possible for a driver to have already acquired the netdev lock before calling this helper. However, the same applies to xdp_features_(set|clear)_redirect_flags(), which ends up calling the unlocked version of xdp_set_features_flags() leading to deadlocks in GVE, which grabs the netdev lock as part of its suspend, reset, and shutdown processes: [ 833.265543] WARNING: possible recursive locking detected [ 833.270949] 6.15.0-rc1 #6 Tainted: G E [ 833.276271] -------------------------------------------- [ 833.281681] systemd-shutdow/1 is trying to acquire lock: [ 833.287090] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: xdp_set_features_flag+0x29/0x90 [ 833.295470] [ 833.295470] but task is already holding lock: [ 833.301400] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve] [ 833.309508] [ 833.309508] other info that might help us debug this: [ 833.316130] Possible unsafe locking scenario: [ 833.316130] [ 833.322142] CPU0 [ 833.324681] ---- [ 833.327220] lock(&dev->lock); [ 833.330455] lock(&dev->lock); [ 833.333689] [ 833.333689] *** DEADLOCK *** [ 833.333689] [ 833.339701] May be due to missing lock nesting notation [ 833.339701] [ 833.346582] 5 locks held by systemd-shutdow/1: [ 833.351205] #0: ffffffffa9c89130 (system_transition_mutex){+.+.}-{4:4}, at: __se_sys_reboot+0xe6/0x210 [ 833.360695] #1: ffff93b399e5c1b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xb4/0x1f0 [ 833.369144] #2: ffff949d19a471b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xc2/0x1f0 [ 833.377603] #3: ffffffffa9eca050 (rtnl_mutex){+.+.}-{4:4}, at: gve_shutdown+0x33/0x90 [gve] [ 833.386138] #4: ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve] Introduce xdp_features_(set|clear)_redirect_target_locked() versions which assume that the netdev lock has already been acquired before setting the XDP feature flag and update GVE to use the locked version. Fixes: 03df156dd3a6 ("xdp: double protect netdev->xdp_flags with netdev->lock") Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250422011643.3509287-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch 'implement-udp-tunnel-port-for-txgbe'Jakub Kicinski
Jiawen Wu says: ==================== Implement udp tunnel port for txgbe v3: https://lore.kernel.org/20250417080328.426554-1-jiawenwu@trustnetic.com v2: https://lore.kernel.org/20250414091022.383328-1-jiawenwu@trustnetic.com v1: https://lore.kernel.org/20250410074456.321847-1-jiawenwu@trustnetic.com ==================== Link: https://patch.msgid.link/20250421022956.508018-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: wangxun: restrict feature flags for tunnel packetsJiawen Wu
Implement ndo_features_check to restrict Tx checksum offload flags, since there are some inner layer length and protocols unsupported. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20250421022956.508018-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: txgbe: Support to set UDP tunnel portJiawen Wu
Tunnel types VXLAN/VXLAN_GPE/GENEVE are supported for txgbe devices. The hardware supports to set only one port for each tunnel type. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250421022956.508018-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch 'net-followup-series-for-exit_rtnl'Jakub Kicinski
Kuniyuki Iwashima says: ==================== net: Followup series for ->exit_rtnl(). Patch 1 drops the hold_rtnl arg in ops_undo_list() as suggested by Jakub. Patch 2 & 3 apply ->exit_rtnl() to pfcp and ppp. v1: https://lore.kernel.org/20250415022258.11491-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250418003259.48017-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22ppp: Split ppp_exit_net() to ->exit_rtnl().Kuniyuki Iwashima
ppp_exit_net() unregisters devices related to the netns under RTNL and destroys lists and IDR. Let's use ->exit_rtnl() for the device unregistration part to save RTNL dances for each netns. Note that we delegate the for_each_netdev_safe() part to default_device_exit_batch() and replace unregister_netdevice_queue() with ppp_nl_dellink() to align with bond, geneve, gtp, and pfcp. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418003259.48017-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22pfcp: Convert pfcp_net_exit() to ->exit_rtnl().Kuniyuki Iwashima
pfcp_net_exit() holds RTNL and cleans up all devices in the netns and other devices tied to sockets in the netns. We can use ->exit_rtnl() to save RTNL dance for all dying netns. Note that we delegate the for_each_netdev() part to default_device_exit_batch() to avoid a list corruption splat like the one reported in commit 4ccacf86491d ("gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418003259.48017-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: Drop hold_rtnl arg from ops_undo_list().Kuniyuki Iwashima
ops_undo_list() first iterates over ops_list for ->pre_exit(). Let's check if any of the ops has ->exit_rtnl() there and drop the hold_rtnl argument. Note that nexthop uses ->exit_rtnl() and is built-in, so hold_rtnl is always true for setup_net() and cleanup_net() for now. Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/netdev/20250414170148.21f3523c@kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418003259.48017-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: stmmac: visconti: convert to set_clk_tx_rate() methodRussell King (Oracle)
Convert visconti to use the set_clk_tx_rate() method. By doing so, the GMAC control register will already have been updated (unlike with the fix_mac_speed() method) so this code can be removed while porting to the set_clk_tx_rate() method. There is also no need for the spinlock, and has never been - neither fix_mac_speed() nor set_clk_tx_rate() can be called by more than one thread at a time, so the lock does nothing useful. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u5SiQ-001I0B-OQ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: dsa: rzn1_a5psw: Make the read-only array offsets static constColin Ian King
Don't populate the read-only array offsets on the stack at run time, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://patch.msgid.link/20250417161353.490219-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: ethernet: mtk_eth_soc: net: revise NETSYSv3 hardware configurationBo-Cun Chen
Change hardware configuration for the NETSYSv3. - Enable PSE dummy page mechanism for the GDM1/2/3 - Enable PSE drop mechanism when the WDMA Rx ring full - Enable PSE no-drop mechanism for packets from the WDMA Tx - Correct PSE free drop threshold - Correct PSE CDMA high threshold Fixes: 1953f134a1a8b ("net: ethernet: mtk_eth_soc: add NETSYS_V3 version support") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/b71f8fd9d4bb69c646c4d558f9331dd965068606.1744907886.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch 'add-gbeth-glue-layer-driver-for-renesas-rz-v2h-p-soc'Jakub Kicinski
Lad Prabhakar says: ==================== Add GBETH glue layer driver for Renesas RZ/V2H(P) SoC This patch series adds support for the GBETH (Gigabit Ethernet) glue layer driver for the Renesas RZ/V2H(P) SoC. The GBETH IP is integrated with the Synopsys DesignWare MAC (version 5.20). The changes include updating the device tree bindings, documenting the GBETH bindings, and adding the DWMAC glue layer for the Renesas GBETH. v1: https://lore.kernel.org/20250302181808.728734-1-prabhakar.mahadev-lad.rj@bp.renesas.com ==================== Link: https://patch.msgid.link/20250417084015.74154-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22MAINTAINERS: Add entry for Renesas RZ/V2H(P) DWMAC GBETH glue layer driverLad Prabhakar
Add a new MAINTAINERS entry for the Renesas RZ/V2H(P) DWMAC GBETH glue layer driver. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250417084015.74154-5-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: stmmac: Add DWMAC glue layer for Renesas GBETHLad Prabhakar
Add the DWMAC glue layer for the GBETH IP found in the Renesas RZ/V2H(P) SoC. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://patch.msgid.link/20250417084015.74154-4-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22dt-bindings: net: Document support for Renesas RZ/V2H(P) GBETHLad Prabhakar
GBETH IP on the Renesas RZ/V2H(P) SoC is integrated with Synopsys DesignWare MAC (version 5.20). Document the device tree bindings for the GBETH glue layer. Generic compatible string 'renesas,rzv2h-gbeth' is added since this module is identical on both the RZ/V2H(P) and RZ/G3E SoCs. The Rx/Tx clocks supplied for GBETH on the RZ/V2H(P) SoC is depicted below: Rx / Tx -------+------------- on / off ------- | | Rx-180 / Tx-180 +---- not ---- on / off ------- Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250417084015.74154-3-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22dt-bindings: net: dwmac: Increase 'maxItems' for 'interrupts' and ↵Lad Prabhakar
'interrupt-names' Increase the `maxItems` value for the `interrupts` and `interrupt-names` properties to 11 to support additional per-channel Tx/Rx completion interrupts on the Renesas RZ/V2H(P) SoC, which features the `snps,dwmac-5.20` IP. Refactor the `interrupt-names` property by replacing repeated `enum` entries with a `oneOf` list. Add support for per-channel receive and transmit completion interrupts using regex patterns. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250417084015.74154-2-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22tipc: fix NULL pointer dereference in tipc_mon_reinit_self()Tung Nguyen
syzbot reported: tipc: Node number set to 1055423674 Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 3 UID: 0 PID: 6017 Comm: kworker/3:5 Not tainted 6.15.0-rc1-syzkaller-00246-g900241a5cc15 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Workqueue: events tipc_net_finalize_work RIP: 0010:tipc_mon_reinit_self+0x11c/0x210 net/tipc/monitor.c:719 ... RSP: 0018:ffffc9000356fb68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003ee87cba RDX: 0000000000000000 RSI: ffffffff8dbc56a7 RDI: ffff88804c2cc010 RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007 R13: fffffbfff2111097 R14: ffff88804ead8000 R15: ffff88804ead9010 FS: 0000000000000000(0000) GS:ffff888097ab9000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000f720eb00 CR3: 000000000e182000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tipc_net_finalize+0x10b/0x180 net/tipc/net.c:140 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3319 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400 kthread+0x3c2/0x780 kernel/kthread.c:464 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> ... RIP: 0010:tipc_mon_reinit_self+0x11c/0x210 net/tipc/monitor.c:719 ... RSP: 0018:ffffc9000356fb68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003ee87cba RDX: 0000000000000000 RSI: ffffffff8dbc56a7 RDI: ffff88804c2cc010 RBP: dffffc0000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007 R13: fffffbfff2111097 R14: ffff88804ead8000 R15: ffff88804ead9010 FS: 0000000000000000(0000) GS:ffff888097ab9000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000f720eb00 CR3: 000000000e182000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 There is a racing condition between workqueue created when enabling bearer and another thread created when disabling bearer right after that as follow: enabling_bearer | disabling_bearer --------------- | ---------------- tipc_disc_timeout() | { | bearer_disable() ... | { schedule_work(&tn->work); | tipc_mon_delete() ... | { } | ... | write_lock_bh(&mon->lock); | mon->self = NULL; | write_unlock_bh(&mon->lock); | ... | } tipc_net_finalize_work() | } { | ... | tipc_net_finalize() | { | ... | tipc_mon_reinit_self() | { | ... | write_lock_bh(&mon->lock); | mon->self->addr = tipc_own_addr(net); | write_unlock_bh(&mon->lock); | ... | } | ... | } | ... | } | 'mon->self' is set to NULL in disabling_bearer thread and dereferenced later in enabling_bearer thread. This commit fixes this issue by validating 'mon->self' before assigning node address to it. Reported-by: syzbot+ed60da8d686dc709164c@syzkaller.appspotmail.com Fixes: 46cb01eeeb86 ("tipc: update mon's self addr when node addr generated") Signed-off-by: Tung Nguyen <tung.quang.nguyen@est.tech> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250417074826.578115-1-tung.quang.nguyen@est.tech Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22ptp: Do not enable by default during compile testingKrzysztof Kozlowski
Enabling the compile test should not cause automatic enabling of all drivers, but only allow to choose to compile them. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250417074643.81448-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23crypto: atmel-sha204a - Set hwrng quality to lowest possibleMarek Behún
According to the review by Bill Cox [1], the Atmel SHA204A random number generator produces random numbers with very low entropy. Set the lowest possible entropy for this chip just to be safe. [1] https://www.metzdowd.com/pipermail/cryptography/2014-December/023858.html Fixes: da001fb651b00e1d ("crypto: atmel-i2c - add support for SHA204A random number generator") Cc: <stable@vger.kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-23crypto: scomp - Fix off-by-one bug when calculating last pageHerbert Xu
Fix off-by-one bug in the last page calculation for src and dst. Reported-by: Nhat Pham <nphamcs@gmail.com> Fixes: 2d3553ecb4e3 ("crypto: scomp - Remove support for some non-trivial SG lists") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-04-22virtio-net: disable delayed refill when pausing rxBui Quang Minh
When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call napi_disable() on the receive queue's napi. In delayed refill_work, it also calls napi_disable() on the receive queue's napi. When napi_disable() is called on an already disabled napi, it will sleep in napi_disable_locked while still holding the netdev_lock. As a result, later napi_enable gets stuck too as it cannot acquire the netdev_lock. This leads to refill_work and the pause-then-resume tx are stuck altogether. This scenario can be reproducible by binding a XDP socket to virtio-net interface without setting up the fill ring. As a result, try_fill_recv will fail until the fill ring is set up and refill_work is scheduled. This commit adds virtnet_rx_(pause/resume)_all helpers and fixes up the virtnet_rx_resume to disable future and cancel all inflights delayed refill_work before calling napi_disable() to pause the rx. Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250417072806.18660-2-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phy: leds: fix memory leakQingfang Deng
A network restart test on a router led to an out-of-memory condition, which was traced to a memory leak in the PHY LED trigger code. The root cause is misuse of the devm API. The registration function (phy_led_triggers_register) is called from phy_attach_direct, not phy_probe, and the unregister function (phy_led_triggers_unregister) is called from phy_detach, not phy_remove. This means the register and unregister functions can be called multiple times for the same PHY device, but devm-allocated memory is not freed until the driver is unbound. This also prevents kmemleak from detecting the leak, as the devm API internally stores the allocated pointer. Fix this by replacing devm_kzalloc/devm_kcalloc with standard kzalloc/kcalloc, and add the corresponding kfree calls in the unregister path. Fixes: 3928ee6485a3 ("net: phy: leds: Add support for "link" trigger") Fixes: 2e0bc452f472 ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Hao Guan <hao.guan@siflower.com.cn> Signed-off-by: Qingfang Deng <qingfang.deng@siflower.com.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250417032557.2929427-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22emulex/benet: Annotate flash_cookie as nonstringKees Cook
GCC 15's new -Wunterminated-string-initialization notices that the 32 character "flash_cookie" (which is not used as a C-String) needs to be marked as "nonstring": drivers/net/ethernet/emulex/benet/be_cmds.c:2618:51: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (17 chars into 16 available) [-Wunterminated-string-initialization] 2618 | static char flash_cookie[2][16] = {"*** SE FLAS", "H DIRECTORY *** "}; | ^~~~~~~~~~~~~~~~~~ Add this annotation, avoid using a multidimensional array, but keep the string split (with a comment about why). Additionally mark it const and annotate the "cookie" member that is being memcmp()ed against as nonstring too. Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416221028.work.967-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phylink: mac_link_(up|down)() clarificationsRussell King (Oracle)
As a result of an email from the fbnic author, I reviewed the phylink documentation, and I have decided to clarify the wording in the mac_link_(up|down)() kernel documentation as this was written from the point of view of mvneta/mvpp2 and is misleading. The documentation talks about forcing the link - indeed, this is what is done in the mvneta and mvpp2 drivers but not at the physical layer but the MACs idea, which has the effect of only allowing or stopping packet flow at the MAC. This "link" needs to be controlled when using a PHY or fixed link to start or stop packet flow at the MAC. However, as the MAC and PCS are tightly integrated, if the MACs idea of the link is forced down, it has the side effect that there is no way to determine that the media link has come up - in this mode, the MAC must be allowed to follow its built-in PCS so we can read the link state. Frame the documentation in more generic terms, to avoid the thought that the physical media link to the partner needs in some way to be forced up or down with these calls; it does not. If that were to be done, it would be a self-fulfilling prophecy - e.g. if the media link goes down, then mac_link_down() will be called, and if the media link is then placed into a forced down state, there is no possibility that the media link will ever come up again - clearly this is a wrong interpretation. These methods are notifications to the MAC about what has happened to the media link state - either from the PHY, or a PCS, or whatever mechanism fixed-link is using. Thus, reword them to get away from talking about changing link state to avoid confusion with media link state. This is not a change of any requirements of these methods. Also, remove the obsolete references to EEE for these methods, we now have the LPI functions for configuring the EEE parameters which renders this redundant, and also makes the passing of "phy" to the mac_link_up() function obsolete. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/E1u5Ah5-001GO1-7E@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch ↵Jakub Kicinski
'net-phy-dp83822-add-support-for-changing-the-mac-series-termination' Dimitri Fedrau says: ==================== net: phy: dp83822: Add support for changing the MAC series termination The dp83822 provides the possibility to set the resistance value of the the MAC series termination. Modifying the resistance to an appropriate value can reduce signal reflections and therefore improve signal quality. v2: https://lore.kernel.org/20250408-dp83822-mac-impedance-v2-0-fefeba4a9804@liebherr.com v1: https://lore.kernel.org/20250307-dp83822-mac-impedance-v1-0-bdd85a759b45@liebherr.com ==================== Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-0-028ac426cddb@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phy: dp83822: Add support for changing the MAC terminationDimitri Fedrau
The dp83822 provides the possibility to set the resistance value of the the MAC termination. Modifying the resistance to an appropriate value can reduce signal reflections and therefore improve signal quality. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-4-028ac426cddb@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phy: Add helper for getting MAC termination resistanceDimitri Fedrau
Add helper which returns the MAC termination resistance value. Modifying the resistance to an appropriate value can reduce signal reflections and therefore improve signal quality. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-3-028ac426cddb@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22dt-bindings: net: dp83822: add constraints for mac-termination-ohmsDimitri Fedrau
Property mac-termination-ohms is defined in ethernet-phy.yaml. Add allowed values for the property. Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-2-028ac426cddb@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22dt-bindings: net: ethernet-phy: add property mac-termination-ohmsDimitri Fedrau
Add property mac-termination-ohms in the device tree bindings for selecting the resistance value of the builtin series termination resistors of the PHY. Changing the resistance to an appropriate value can reduce signal reflections and therefore improve signal quality. Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250416-dp83822-mac-impedance-v3-1-028ac426cddb@liebherr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22r8169: use pci_prepare_to_sleep in rtl_shutdownHeiner Kallweit
Use pci_prepare_to_sleep() like PCI core does in pci_pm_suspend_noirq. This aligns setting a low-power mode during shutdown with the handling of the transition to system suspend. Also the transition to runtime suspend uses pci_target_state() instead of setting D3hot unconditionally. Note: pci_prepare_to_sleep() uses device_may_wakeup() to check whether device may generate wakeup events. So we don't lose anything by not passing tp->saved_wolopts any longer. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/f573fdbd-ba6d-41c1-b68f-311d3c88db2c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22octeontx2-af: Remove unused rvu_npc_enable_bcast_entryDr. David Alan Gilbert
The last use of rvu_npc_enable_bcast_entry() was removed in 2021 by commit 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250420225810.171852-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: phylink: fix suspend/resume with WoL enabled and link downRussell King (Oracle)
When WoL is enabled, we update the software state in phylink to indicate that the link is down, and disable the resolver from bringing the link back up. On resume, we attempt to bring the overall state into consistency by calling the .mac_link_down() method, but this is wrong if the link was already down, as phylink strictly orders the .mac_link_up() and .mac_link_down() methods - and this would break that ordering. Fixes: f97493657c63 ("net: phylink: add suspend/resume support") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u55Qf-0016RN-PA@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge tag 'for-6.15-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - subpage mode fixes: - access correct object (folio) when looking up bit offset - fix assertion condition for number of blocks per folio - fix upper boundary of locking range in hole punch - zoned fixes: - fix potential deadlock caught by lockdep when zone reporting and device freeze run in parallel - fix zone write pointer mismatch and NULL pointer dereference when metadata are converted from DUP to RAID1 - fix error handling when reloc inode creation fails - in tree-checker, unify error code for header level check - block layer: add helpers to read zone capacity * tag 'for-6.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: skip reporting zone for new block group block: introduce zone capacity helper btrfs: tree-checker: adjust error code for header level check btrfs: fix invalid inode pointer after failure to create reloc inode btrfs: zoned: return EIO on RAID1 block group write pointer mismatch btrfs: fix the ASSERT() inside GET_SUBPAGE_BITMAP() btrfs: avoid page_lockend underflow in btrfs_punch_hole_lock_range() btrfs: subpage: access correct object when reading bitmap start in subpage_calc_start_bit()
2025-04-22Merge tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linuxLinus Torvalds
Pull integrity fix from Roberto Sassu: "One performance fix to avoid unnecessarily taking the inode lock" * tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linux: ima: process_measurement() needlessly takes inode_lock() on MAY_READ
2025-04-22ima: process_measurement() needlessly takes inode_lock() on MAY_READFrederick Lawler
On IMA policy update, if a measure rule exists in the policy, IMA_MEASURE is set for ima_policy_flags which makes the violation_check variable always true. Coupled with a no-action on MAY_READ for a FILE_CHECK call, we're always taking the inode_lock(). This becomes a performance problem for extremely heavy read-only workloads. Therefore, prevent this only in the case there's no action to be taken. Signed-off-by: Frederick Lawler <fred@cloudflare.com> Acked-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-04-22net: 802: Remove unused p8022 codeDr. David Alan Gilbert
p8022.c defines two external functions, register_8022_client() and unregister_8022_client(), the last use of which was removed in 2018 by commit 7a2e838d28cf ("staging: ipx: delete it from the tree") Remove the p8022.c file, it's corresponding header, and glue surrounding it. There was one place the header was included in vlan.c but it didn't use the functions it declared. There was a comment in net/802/Makefile about checking against net/core/Makefile, but that's at least 20 years old and there's no sign of net/core/Makefile mentioning it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250418011519.145320-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: lwtunnel: disable BHs when requiredJustin Iurman
In lwtunnel_{output|xmit}(), dev_xmit_recursion() may be called in preemptible scope for PREEMPT kernels. This patch disables BHs before calling dev_xmit_recursion(). BHs are re-enabled only at the end, since we must ensure the same CPU is used for both dev_xmit_recursion_inc() and dev_xmit_recursion_dec() (and any other recursion levels in some cases) in order to maintain valid per-cpu counters. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/netdev/CAADnVQJFWn3dBFJtY+ci6oN1pDFL=TzCmNbRgey7MdYxt_AP2g@mail.gmail.com/ Reported-by: Eduard Zingerman <eddyz87@gmail.com> Closes: https://lore.kernel.org/netdev/m2h62qwf34.fsf@gmail.com/ Fixes: 986ffb3a57c5 ("net: lwtunnel: fix recursion loops") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416160716.8823-1-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22net: selftests: initialize TCP header and skb payload with zeroOleksij Rempel
Zero-initialize TCP header via memset() to avoid garbage values that may affect checksum or behavior during test transmission. Also zero-fill allocated payload and padding regions using memset() after skb_put(), ensuring deterministic content for all outgoing test packets. Fixes: 3e1e58d64c3d ("net: add generic selftest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416160125.2914724-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22rtase: Add ndo_setup_tc support for CBS offload in traffic control setupJustin Lai
Add support for ndo_setup_tc to enable CBS offload functionality as part of traffic control configuration for network devices, where CBS is applied from the CPU to the switch. More specifically, CBS is applied at the GMAC in the topmost architecture diagram. Signed-off-by: Justin Lai <justinlai0215@realtek.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250416115757.28156-1-justinlai0215@realtek.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22rxrpc: rxgk: Set error code in rxgk_yfs_decode_ticket()Dan Carpenter
Propagate the error code if key_alloc() fails. Don't return success. Fixes: 9d1d2b59341f ("rxrpc: rxgk: Implement the yfs-rxgk security class (GSSAPI)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/Z_-P_1iLDWksH1ik@stanley.mountain Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22net: phy: microchip: force IRQ polling mode for lan88xxFiona Klute
With lan88xx based devices the lan78xx driver can get stuck in an interrupt loop while bringing the device up, flooding the kernel log with messages like the following: lan78xx 2-3:1.0 enp1s0u3: kevent 4 may have been dropped Removing interrupt support from the lan88xx PHY driver forces the driver to use polling instead, which avoids the problem. The issue has been observed with Raspberry Pi devices at least since 4.14 (see [1], bug report for their downstream kernel), as well as with Nvidia devices [2] in 2020, where disabling interrupts was the vendor-suggested workaround (together with the claim that phylib changes in 4.9 made the interrupt handling in lan78xx incompatible). Iperf reports well over 900Mbits/sec per direction with client in --dualtest mode, so there does not seem to be a significant impact on throughput (lan88xx device connected via switch to the peer). [1] https://github.com/raspberrypi/linux/issues/2447 [2] https://forums.developer.nvidia.com/t/jetson-xavier-and-lan7800-problem/142134/11 Link: https://lore.kernel.org/0901d90d-3f20-4a10-b680-9c978e04ddda@lunn.ch Fixes: 792aec47d59d ("add microchip LAN88xx phy driver") Signed-off-by: Fiona Klute <fiona.klute@gmx.de> Cc: kernel-list@raspberrypi.com Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250416102413.30654-1-fiona.klute@gmx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22Merge branch 'ionic-support-qsfp-cmis'Paolo Abeni
Shannon Nelson says: ==================== ionic: support QSFP CMIS This patchset sets up support for additional pages and better handling of the QSFP CMIS data. v1: https://lore.kernel.org/netdev/20250411182140.63158-1-shannon.nelson@amd.com/ ==================== Link: https://patch.msgid.link/20250415231317.40616-1-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22ionic: add module eeprom channel data to ionic_if and ethtoolShannon Nelson
Make the CMIS module type's page 17 channel data available for ethtool to request. As done previously, carve space for this data from the port_info reserved space. In the future, if additional pages are needed, a new firmware AdminQ command will be added for accessing random pages. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415231317.40616-4-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22ionic: support ethtool get_module_eeprom_by_pageShannon Nelson
Add support for the newer get_module_eeprom_by_page interface. Only the upper half of the 256 byte page is available for reading, and the firmware puts the two sections into the extended sprom buffer, so a union is used over the extended sprom buffer to make clear which page is to be accessed. With get_module_eeprom_by_page implemented there is no need for the older get_module_info or git_module_eeprom interfaces, so remove them. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415231317.40616-3-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22ionic: extend the QSFP module sprom for more pagesShannon Nelson
Some QSFP modules have more eeprom to be read by ethtool than the initial high and low page 0 that is currently available in the DSC's ionic sprom[] buffer. Since the current sprom[] is baked into the middle of an existing API struct, to make the high end of page 1 and page 2 available a block is carved from a reserved space of the existing port_info struct and the ionic_get_module_eeprom() service is taught how to get there. Newer firmware writes the additional QSFP page info here, yet this remains backward compatible because older firmware sets this space to all 0 and older ionic drivers do not use the reserved space. Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415231317.40616-2-shannon.nelson@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-22Merge branch 'vxlan-convert-fdb-table-to-rhashtable'Paolo Abeni
Ido Schimmel says: ==================== vxlan: Convert FDB table to rhashtable The VXLAN driver currently stores FDB entries in a hash table with a fixed number of buckets (256), resulting in reduced performance as the number of entries grows. This patchset solves the issue by converting the driver to use rhashtable which maintains a more or less constant performance regardless of the number of entries. Measured transmitted packets per second using a single pktgen thread with varying number of entries when the transmitted packet always hits the default entry (worst case): Number of entries | Improvement ------------------|------------ 1k | +1.12% 4k | +9.22% 16k | +55% 64k | +585% 256k | +2460% The first patches are preparations for the conversion in the last patch. Specifically, the series is structured as follows: Patch #1 adds RCU read-side critical sections in the Tx path when accessing FDB entries. Targeting at net-next as I am not aware of any issues due to this omission despite the code being structured that way for a long time. Without it, traces will be generated when converting FDB lookup to rhashtable_lookup(). Patch #2-#5 simplify the creation of the default FDB entry (all-zeroes). Current code assumes that insertion into the hash table cannot fail, which will no longer be true with rhashtable. Patches #6-#10 add FDB entries to a linked list for entry traversal instead of traversing over them using the fixed size hash table which is removed in the last patch. Patches #11-#12 add wrappers for FDB lookup that make it clear when each should be used along with lockdep annotations. Needed as a preparation for rhashtable_lookup() that must be called from an RCU read-side critical section. Patch #13 treats dst cache initialization errors as non-fatal. See more info in the commit message. The current code happens to work because insertion into the fixed size hash table is slow enough for the per-CPU allocator to be able to create new chunks of per-CPU memory. Patch #14 adds an FDB key structure that includes the MAC address and source VNI. To be used as rhashtable key. Patch #15 does the conversion to rhashtable. ==================== Link: https://patch.msgid.link/20250415121143.345227-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>