summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-05-17tun: Fix memory leak for detached NAPI queue.Kuniyuki Iwashima
syzkaller reported [0] memory leaks of sk and skb related to the TUN device with no repro, but we can reproduce it easily with: struct ifreq ifr = {} int fd_tun, fd_tmp; char buf[4] = {}; fd_tun = openat(AT_FDCWD, "/dev/net/tun", O_WRONLY, 0); ifr.ifr_flags = IFF_TUN | IFF_NAPI | IFF_MULTI_QUEUE; ioctl(fd_tun, TUNSETIFF, &ifr); ifr.ifr_flags = IFF_DETACH_QUEUE; ioctl(fd_tun, TUNSETQUEUE, &ifr); fd_tmp = socket(AF_PACKET, SOCK_PACKET, 0); ifr.ifr_flags = IFF_UP; ioctl(fd_tmp, SIOCSIFFLAGS, &ifr); write(fd_tun, buf, sizeof(buf)); close(fd_tun); If we enable NAPI and multi-queue on a TUN device, we can put skb into tfile->sk.sk_write_queue after the queue is detached. We should prevent it by checking tfile->detached before queuing skb. Note this must be done under tfile->sk.sk_write_queue.lock because write() and ioctl(IFF_DETACH_QUEUE) can run concurrently. Otherwise, there would be a small race window: write() ioctl(IFF_DETACH_QUEUE) `- tun_get_user `- __tun_detach |- if (tfile->detached) |- tun_disable_queue | `-> false | `- tfile->detached = tun | `- tun_queue_purge |- spin_lock_bh(&queue->lock) `- __skb_queue_tail(queue, skb) Another solution is to call tun_queue_purge() when closing and reattaching the detached queue, but it could paper over another problems. Also, we do the same kind of test for IFF_NAPI_FRAGS. [0]: unreferenced object 0xffff88801edbc800 (size 2048): comm "syz-executor.1", pid 33269, jiffies 4295743834 (age 18.756s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 ...@............ backtrace: [<000000008c16ea3d>] __do_kmalloc_node mm/slab_common.c:965 [inline] [<000000008c16ea3d>] __kmalloc+0x4a/0x130 mm/slab_common.c:979 [<000000003addde56>] kmalloc include/linux/slab.h:563 [inline] [<000000003addde56>] sk_prot_alloc+0xef/0x1b0 net/core/sock.c:2035 [<000000003e20621f>] sk_alloc+0x36/0x2f0 net/core/sock.c:2088 [<0000000028e43843>] tun_chr_open+0x3d/0x190 drivers/net/tun.c:3438 [<000000001b0f1f28>] misc_open+0x1a6/0x1f0 drivers/char/misc.c:165 [<000000004376f706>] chrdev_open+0x111/0x300 fs/char_dev.c:414 [<00000000614d379f>] do_dentry_open+0x2f9/0x750 fs/open.c:920 [<000000008eb24774>] do_open fs/namei.c:3636 [inline] [<000000008eb24774>] path_openat+0x143f/0x1a30 fs/namei.c:3791 [<00000000955077b5>] do_filp_open+0xce/0x1c0 fs/namei.c:3818 [<00000000b78973b0>] do_sys_openat2+0xf0/0x260 fs/open.c:1356 [<00000000057be699>] do_sys_open fs/open.c:1372 [inline] [<00000000057be699>] __do_sys_openat fs/open.c:1388 [inline] [<00000000057be699>] __se_sys_openat fs/open.c:1383 [inline] [<00000000057be699>] __x64_sys_openat+0x83/0xf0 fs/open.c:1383 [<00000000a7d2182d>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<00000000a7d2182d>] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80 [<000000004cc4e8c4>] entry_SYSCALL_64_after_hwframe+0x72/0xdc unreferenced object 0xffff88802f671700 (size 240): comm "syz-executor.1", pid 33269, jiffies 4295743854 (age 18.736s) hex dump (first 32 bytes): 68 c9 db 1e 80 88 ff ff 68 c9 db 1e 80 88 ff ff h.......h....... 00 c0 7b 2f 80 88 ff ff 00 c8 db 1e 80 88 ff ff ..{/............ backtrace: [<00000000e9d9fdb6>] __alloc_skb+0x223/0x250 net/core/skbuff.c:644 [<000000002c3e4e0b>] alloc_skb include/linux/skbuff.h:1288 [inline] [<000000002c3e4e0b>] alloc_skb_with_frags+0x6f/0x350 net/core/skbuff.c:6378 [<00000000825f98d7>] sock_alloc_send_pskb+0x3ac/0x3e0 net/core/sock.c:2729 [<00000000e9eb3df3>] tun_alloc_skb drivers/net/tun.c:1529 [inline] [<00000000e9eb3df3>] tun_get_user+0x5e1/0x1f90 drivers/net/tun.c:1841 [<0000000053096912>] tun_chr_write_iter+0xac/0x120 drivers/net/tun.c:2035 [<00000000b9282ae0>] call_write_iter include/linux/fs.h:1868 [inline] [<00000000b9282ae0>] new_sync_write fs/read_write.c:491 [inline] [<00000000b9282ae0>] vfs_write+0x40f/0x530 fs/read_write.c:584 [<00000000524566e4>] ksys_write+0xa1/0x170 fs/read_write.c:637 [<00000000a7d2182d>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<00000000a7d2182d>] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80 [<000000004cc4e8c4>] entry_SYSCALL_64_after_hwframe+0x72/0xdc Fixes: cde8b15f1aab ("tuntap: add ioctl to attach or detach a file form tuntap device") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-16Merge tag 'ipsec-2023-05-16' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2023-05-16 1) Don't check the policy default if we have an allow policy. Fix from Sabrina Dubroca. 2) Fix netdevice refount usage on offload. From Leon Romanovsky. 3) Use netdev_put instead of dev_puti to correctly release the netdev on failure in xfrm_dev_policy_add. From Leon Romanovsky. 4) Revert "Fix XFRM-I support for nested ESP tunnels" This broke Netfilter policy matching. From Martin Willi. 5) Reject optional tunnel/BEET mode templates in outbound policies on netlink and pfkey sockets. From Tobias Brunner. 6) Check if_id in inbound policy/secpath match to make it symetric to the outbound codepath. From Benedict Wong. * tag 'ipsec-2023-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Check if_id in inbound policy/secpath match af_key: Reject optional tunnel/BEET mode templates in outbound policies xfrm: Reject optional tunnel/BEET mode templates in outbound policies Revert "Fix XFRM-I support for nested ESP tunnels" xfrm: Fix leak of dev tracker xfrm: release all offloaded policy memory xfrm: don't check the default policy if the policy allows the packet ==================== Link: https://lore.kernel.org/r/20230516052405.2677554-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-16Merge tag 'linux-can-fixes-for-6.4-20230515' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-05-15 The first 2 patches are by Oliver Hartkopp and allow the MSG_CMSG_COMPAT flag for isotp and j1939. The next patch is by Oliver Hartkopp, too and adds missing CAN XL support in can_put_echo_skb(). Geert Uytterhoeven's patch let's the bxcan driver depend on ARCH_STM32. The last 5 patches are from Dario Binacchi and also affect the bxcan driver. The bxcan driver hit mainline with v6.4-rc1 and was originally written for IP cores containing 2 CAN interfaces with shared resources. Dario's series updates the DT bindings and driver to support IP cores with a single CAN interface instance as well as adding the bxcan to the stm32f746's device tree. * tag 'linux-can-fixes-for-6.4-20230515' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: ARM: dts: stm32: add CAN support on stm32f746 can: bxcan: add support for single peripheral configuration ARM: dts: stm32: add pin map for CAN controller on stm32f7 ARM: dts: stm32f429: put can2 in secondary mode dt-bindings: net: can: add "st,can-secondary" property can: CAN_BXCAN should depend on ARCH_STM32 can: dev: fix missing CAN XL support in can_put_echo_skb() can: j1939: recvmsg(): allow MSG_CMSG_COMPAT flag can: isotp: recvmsg(): allow MSG_CMSG_COMPAT flag ==================== Link: https://lore.kernel.org/r/20230515204722.1000957-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-16devlink: Fix crash with CONFIG_NET_NS=nIdo Schimmel
'__net_initdata' becomes a no-op with CONFIG_NET_NS=y, but when this option is disabled it becomes '__initdata', which means the data can be freed after the initialization phase. This annotation is obviously incorrect for the devlink net device notifier block which is still registered after the initialization phase [1]. Fix this crash by removing the '__net_initdata' annotation. [1] general protection fault, probably for non-canonical address 0xcccccccccccccccc: 0000 [#1] PREEMPT SMP CPU: 3 PID: 117 Comm: (udev-worker) Not tainted 6.4.0-rc1-custom-gdf0acdc59b09 #64 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 RIP: 0010:notifier_call_chain+0x58/0xc0 [...] Call Trace: <TASK> dev_set_mac_address+0x85/0x120 dev_set_mac_address_user+0x30/0x50 do_setlink+0x219/0x1270 rtnl_setlink+0xf7/0x1a0 rtnetlink_rcv_msg+0x142/0x390 netlink_rcv_skb+0x58/0x100 netlink_unicast+0x188/0x270 netlink_sendmsg+0x214/0x470 __sys_sendto+0x12f/0x1a0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: e93c9378e33f ("devlink: change per-devlink netdev notifier to static one") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/netdev/600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com/ Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230515162925.1144416-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-16net: bcmgenet: Restore phy_stop() depending upon suspend/closeFlorian Fainelli
Removing the phy_stop() from bcmgenet_netif_stop() ended up causing warnings from the PHY library that phy_start() is called from the RUNNING state since we are no longer stopping the PHY state machine during bcmgenet_suspend(). Restore the call to phy_stop() but make it conditional on being called from the close or suspend path. Fixes: c96e731c93ff ("net: bcmgenet: connect and disconnect from the PHY state machine") Fixes: 93e0401e0fc0 ("net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop()") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230515025608.2587012-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-15Merge patch series "can: bxcan: add support for single peripheral configuration"Marc Kleine-Budde
Dario Binacchi <dario.binacchi@amarulasolutions.com> says: The series adds support for managing bxCAN controllers in single peripheral configuration. Unlike stm32f4 SOCs, where bxCAN controllers are only in dual peripheral configuration, stm32f7 SOCs contain three CAN peripherals, CAN1 and CAN2 in dual peripheral configuration and CAN3 in single peripheral configuration: - Dual CAN peripheral configuration: * CAN1: Primary bxCAN for managing the communication between a secondary bxCAN and the 512-byte SRAM memory. * CAN2: Secondary bxCAN with no direct access to the SRAM memory. This means that the two bxCAN cells share the 512-byte SRAM memory and CAN2 can't be used without enabling CAN1. - Single CAN peripheral configuration: * CAN3: Primary bxCAN with dedicated Memory Access Controller unit and 512-byte SRAM memory. The driver has been tested on the stm32f769i-discovery board with a kernel version 5.19.0-rc2 in loopback + silent mode: | ip link set can[0-2] type can bitrate 125000 loopback on listen-only on | ip link set up can[0-2] | candump can[0-2] -L & | cansend can[0-2] 300#AC.AB.AD.AE.75.49.AD.D1 Changes in v2: - s/fiter/filter/ in the commit message - Replace struct bxcan_mb::primary with struct bxcan_mb::cfg. - Move after the patch "can: bxcan: add support for single peripheral configuration". - Add node gcan3. - Rename gcan as gcan1. - Add property "st,can-secondary" to can2 node. - Drop patch "dt-bindings: mfd: stm32f7: add binding definition for CAN3" because it has been accepted. - Add patch "ARM: dts: stm32f429: put can2 in secondary mode". - Add patch "dt-bindings: net: can: add "st,can-secondary" property". v1: https://lore.kernel.org/all/20230423172528.1398158-1-dario.binacchi@amarulasolutions.com Link: https://lore.kernel.org/all/20230427204540.3126234-1-dario.binacchi@amarulasolutions.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15ARM: dts: stm32: add CAN support on stm32f746Dario Binacchi
Add support for bxcan (Basic eXtended CAN controller) to STM32F746. The chip contains three CAN peripherals, CAN1 and CAN2 in dual peripheral configuration and CAN3 in single peripheral configuration: - Dual CAN peripheral configuration: * CAN1: Primary bxCAN for managing the communication between a secondary bxCAN and the 512-byte SRAM memory. * CAN2: Secondary bxCAN with no direct access to the SRAM memory. This means that the two bxCAN cells share the 512-byte SRAM memory and CAN2 can't be used without enabling CAN1. - Single CAN peripheral configuration: * CAN3: Primary bxCAN with dedicated Memory Access Controller unit and 512-byte SRAM memory. ------------------------------------------------------------------------- | features | CAN1 | CAN2 | CAN 3 | ------------------------------------------------------------------------- | SRAM | 512-byte shared between CAN1 & CAN2 | 512-byte | ------------------------------------------------------------------------- | Filters | 26 filters shared between CAN1 & CAN2 | 14 filters | ------------------------------------------------------------------------- Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Link: https://lore.kernel.org/all/20230427204540.3126234-6-dario.binacchi@amarulasolutions.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15can: bxcan: add support for single peripheral configurationDario Binacchi
Add support for bxCAN controller in single peripheral configuration: - primary bxCAN - dedicated Memory Access Controller unit - 512-byte SRAM memory - 14 filter banks Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Link: https://lore.kernel.org/all/20230427204540.3126234-5-dario.binacchi@amarulasolutions.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15ARM: dts: stm32: add pin map for CAN controller on stm32f7Dario Binacchi
Add pin configurations for using CAN controller on stm32f7. Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Link: https://lore.kernel.org/all/20230427204540.3126234-4-dario.binacchi@amarulasolutions.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15ARM: dts: stm32f429: put can2 in secondary modeDario Binacchi
This is a preparation patch for the upcoming support to manage CAN peripherals in single configuration. The addition ensures backwards compatibility. Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Link: https://lore.kernel.org/all/20230427204540.3126234-3-dario.binacchi@amarulasolutions.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15dt-bindings: net: can: add "st,can-secondary" propertyDario Binacchi
On the stm32f7 Socs the can peripheral can be in single or dual configuration. In the dual configuration, in turn, it can be in primary or secondary mode. The addition of the 'st,can-secondary' property allows you to specify this mode in the dual configuration. CAN peripheral nodes in single configuration contain neither "st,can-primary" nor "st,can-secondary". Signed-off-by: Dario Binacchi <dario.binacchi@amarulasolutions.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/all/20230427204540.3126234-2-dario.binacchi@amarulasolutions.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15can: CAN_BXCAN should depend on ARCH_STM32Geert Uytterhoeven
The STMicroelectronics STM32 basic extended CAN Controller (bxCAN) is only present on STM32 SoCs. Hence drop the "|| OF" part from its dependency rule, to prevent asking the user about this driver when configuring a kernel without STM32 SoC support. Fixes: f00647d8127be4d3 ("can: bxcan: add support for ST bxCAN controller") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/all/40095112efd1b2214e4223109fd9f0c6d0158a2d.1680609318.git.geert+renesas@glider.be Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15can: dev: fix missing CAN XL support in can_put_echo_skb()Oliver Hartkopp
can_put_echo_skb() checks for the enabled IFF_ECHO flag and the correct ETH_P type of the given skbuff. When implementing the CAN XL support the new check for ETH_P_CANXL has been forgotten. Fixes: fb08cba12b52 ("can: canxl: update CAN infrastructure for CAN XL frames") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230506184515.39241-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15can: j1939: recvmsg(): allow MSG_CMSG_COMPAT flagOliver Hartkopp
The control message provided by J1939 support MSG_CMSG_COMPAT but blocked recvmsg() syscalls that have set this flag, i.e. on 32bit user space on 64 bit kernels. Link: https://github.com/hartkopp/can-isotp/issues/59 Cc: Oleksij Rempel <o.rempel@pengutronix.de> Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Link: https://lore.kernel.org/20230505110308.81087-3-mkl@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15can: isotp: recvmsg(): allow MSG_CMSG_COMPAT flagOliver Hartkopp
The control message provided by isotp support MSG_CMSG_COMPAT but blocked recvmsg() syscalls that have set this flag, i.e. on 32bit user space on 64 bit kernels. Link: https://github.com/hartkopp/can-isotp/issues/59 Cc: Oleksij Rempel <o.rempel@pengutronix.de> Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Fixes: 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") Link: https://lore.kernel.org/20230505110308.81087-2-mkl@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-05-15net: phylink: fix ksettings_set() ethtool callRussell King (Oracle)
While testing a Fiberstore SFP-10G-T module (which uses 10GBASE-R with rate adaption) in a Clearfog platform (which can't do that) it was found that the PHYs advertisement was not limited according to the hosts capabilities when using ethtool to change it. Fix this by ensuring that we mask the advertisement with the computed support mask as the very first thing we do. Fixes: cbc1bb1e4689 ("net: phylink: simplify phy case for ksettings_set method") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15Merge branch 'tipc-fixes'David S. Miller
Xin Long says: ==================== tipc: fix the mtu update in link mtu negotiation This patchset fixes a crash caused by a too small MTU carried in the activate msg. Note that as such malicious packet does not exist in the normal env, the fix won't break any application The 1st patch introduces a function to calculate the minimum MTU for the bearer, and the 2nd patch fixes the crash with this helper. While at it, the 3rd patch fixes the udp bearer mtu update by netlink with this helper. ==================== Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15tipc: check the bearer min mtu properly when setting it by netlinkXin Long
Checking the bearer min mtu with tipc_udp_mtu_bad() only works for IPv4 UDP bearer, and IPv6 UDP bearer has a different value for the min mtu. This patch checks with encap_hlen + TIPC_MIN_BEARER_MTU for min mtu, which works for both IPv4 and IPv6 UDP bearer. Note that tipc_udp_mtu_bad() is still used to check media min mtu in __tipc_nl_media_set(), as m->mtu currently is only used by the IPv4 UDP bearer as its default mtu value. Fixes: 682cd3cf946b ("tipc: confgiure and apply UDP bearer MTU on running links") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15tipc: do not update mtu if msg_max is too small in mtu negotiationXin Long
When doing link mtu negotiation, a malicious peer may send Activate msg with a very small mtu, e.g. 4 in Shuang's testing, without checking for the minimum mtu, l->mtu will be set to 4 in tipc_link_proto_rcv(), then n->links[bearer_id].mtu is set to 4294967228, which is a overflow of '4 - INT_H_SIZE - EMSG_OVERHEAD' in tipc_link_mss(). With tipc_link.mtu = 4, tipc_link_xmit() kept printing the warning: tipc: Too large msg, purging xmit list 1 5 0 40 4! tipc: Too large msg, purging xmit list 1 15 0 60 4! And with tipc_link_entry.mtu 4294967228, a huge skb was allocated in named_distribute(), and when purging it in tipc_link_xmit(), a crash was even caused: general protection fault, probably for non-canonical address 0x2100001011000dd: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.3.0.neta #19 RIP: 0010:kfree_skb_list_reason+0x7e/0x1f0 Call Trace: <IRQ> skb_release_data+0xf9/0x1d0 kfree_skb_reason+0x40/0x100 tipc_link_xmit+0x57a/0x740 [tipc] tipc_node_xmit+0x16c/0x5c0 [tipc] tipc_named_node_up+0x27f/0x2c0 [tipc] tipc_node_write_unlock+0x149/0x170 [tipc] tipc_rcv+0x608/0x740 [tipc] tipc_udp_recv+0xdc/0x1f0 [tipc] udp_queue_rcv_one_skb+0x33e/0x620 udp_unicast_rcv_skb.isra.72+0x75/0x90 __udp4_lib_rcv+0x56d/0xc20 ip_protocol_deliver_rcu+0x100/0x2d0 This patch fixes it by checking the new mtu against tipc_bearer_min_mtu(), and not updating mtu if it is too small. Fixes: ed193ece2649 ("tipc: simplify link mtu negotiation") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15tipc: add tipc_bearer_min_mtu to calculate min mtuXin Long
As different media may requires different min mtu, and even the same media with different net family requires different min mtu, add tipc_bearer_min_mtu() to calculate min mtu accordingly. This API will be used to check the new mtu when doing the link mtu negotiation in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15net: mdio: i2c: fix rollball accessorsRussell King (Oracle)
Commit 87e3bee0f247 ("net: mdio: i2c: Separate C22 and C45 transactions") separated the non-rollball bus accessors, but left the rollball accessors as is. As rollball accessors are clause 45, this results in the rollball protocol being completely non-functional. Fix this. Fixes: 87e3bee0f247 ("net: mdio: i2c: Separate C22 and C45 transactions") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15virtio_net: Fix error unwinding of XDP initializationFeng Liu
When initializing XDP in virtnet_open(), some rq xdp initialization may hit an error causing net device open failed. However, previous rqs have already initialized XDP and enabled NAPI, which is not the expected behavior. Need to roll back the previous rq initialization to avoid leaks in error unwinding of init code. Also extract helper functions of disable and enable queue pairs. Use newly introduced disable helper function in error unwinding and virtnet_close. Use enable helper function in virtnet_open. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15net: fec: remove the xdp_return_frame when lack of tx BDsShenwei Wang
In the implementation, the sent_frame count does not increment when transmit errors occur. Therefore, bq_xmit_all() will take care of returning the XDP frames. Fixes: 26312c685ae0 ("net: fec: correct the counting of XDP sent frames") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-15net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment()Dong Chenchen
As the call trace shows, skb_panic was caused by wrong skb->mac_header in nsh_gso_segment(): invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 3 PID: 2737 Comm: syz Not tainted 6.3.0-next-20230505 #1 RIP: 0010:skb_panic+0xda/0xe0 call Trace: skb_push+0x91/0xa0 nsh_gso_segment+0x4f3/0x570 skb_mac_gso_segment+0x19e/0x270 __skb_gso_segment+0x1e8/0x3c0 validate_xmit_skb+0x452/0x890 validate_xmit_skb_list+0x99/0xd0 sch_direct_xmit+0x294/0x7c0 __dev_queue_xmit+0x16f0/0x1d70 packet_xmit+0x185/0x210 packet_snd+0xc15/0x1170 packet_sendmsg+0x7b/0xa0 sock_sendmsg+0x14f/0x160 The root cause is: nsh_gso_segment() use skb->network_header - nhoff to reset mac_header in skb_gso_error_unwind() if inner-layer protocol gso fails. However, skb->network_header may be reset by inner-layer protocol gso function e.g. mpls_gso_segment. skb->mac_header reset by the inaccurate network_header will be larger than skb headroom. nsh_gso_segment nhoff = skb->network_header - skb->mac_header; __skb_pull(skb,nsh_len) skb_mac_gso_segment mpls_gso_segment skb_reset_network_header(skb);//skb->network_header+=nsh_len return -EINVAL; skb_gso_error_unwind skb_push(skb, nsh_len); skb->mac_header = skb->network_header - nhoff; // skb->mac_header > skb->headroom, cause skb_push panic Use correct mac_offset to restore mac_header and get rid of nhoff. Fixes: c411ed854584 ("nsh: add GSO support") Reported-by: syzbot+632b5d9964208bfef8c0@syzkaller.appspotmail.com Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13Merge branch 'hns3-fixes'David S. Miller
Hao Lan says: ==================== net: hns3: fix some bug for hns3 There are some bugfixes for the HNS3 ethernet driver. patch#1 fix miss checking for rx packet. patch#2 fixes VF promisc mode not update when mac table full bug, and patch#3 fixes a nterrupts not initialization in VF FLR bug. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: hns3: fix reset timeout when enable full VFJijie Shao
The timeout of the cmdq reset command has been increased to resolve the reset timeout issue in the full VF scenario. The timeout of other cmdq commands remains unchanged. Fixes: 8d307f8e8cf1 ("net: hns3: create new set of unified hclge_comm_cmd_send APIs") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: hns3: fix reset delay time to avoid configuration timeoutJie Wang
Currently the hns3 vf function reset delays 5000ms before vf rebuild process. In product applications, this delay is too long for application configurations and causes configuration timeout. According to the tests, 500ms delay is enough for reset process except PF FLR. So this patch modifies delay to 500ms in these scenarios. Fixes: 6988eb2a9b77 ("net: hns3: Add support to reset the enet/ring mgmt layer") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: hns3: fix sending pfc frames after reset issueJijie Shao
To prevent the system from abnormally sending PFC frames after an abnormal reset. The hns3 driver notifies the firmware to disable pfc before reset. Fixes: 35d93a30040c ("net: hns3: adjust the process of PF reset") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: hns3: fix output information incomplete for dumping tx queue info with ↵Jie Wang
debugfs In function hns3_dump_tx_queue_info, The print buffer is not enough when the tx BD number is configured to 32760. As a result several BD information wouldn't be displayed. So fix it by increasing the tx queue print buffer length. Fixes: 630a6738da82 ("net: hns3: adjust string spaces of some parameters of tx bd info in debugfs") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Hao Lan <lanhao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13Merge branch 'dsa-rzn1-a5psw-stp'David S. Miller
Alexis Lothoré says: ==================== net: dsa: rzn1-a5psw: fix STP states handling This small series fixes STP support and while adding a new function to enable/disable learning, use that to disable learning on standalone ports at switch setup as reported by Vladimir Oltean. This series was initially submitted on net-next by Clement Leger, but some career evolutions has made him hand me over those topics. Also, this new revision is submitted on net instead of net-next for V1 based on Vladimir Oltean's suggestion Changes since v2: - fix commit split by moving A5PSW_MGMT_CFG_ENABLE in relevant commit - fix reverse christmas tree ordering in a5psw_port_stp_state_set Changes since v1: - fix typos in commit messages and doc - re-split STP states handling commit - add Fixes: tag and new Signed-off-by - submit series as fix on net instead of net-next - split learning and blocking setting functions - remove unused define A5PSW_PORT_ENA_TX_SHIFT - add boolean for tx/rx enabled for clarity ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: dsa: rzn1-a5psw: disable learning for standalone portsClément Léger
When ports are in standalone mode, they should have learning disabled to avoid adding new entries in the MAC lookup table which might be used by other bridge ports to forward packets. While adding that, also make sure learning is enabled for CPU port. Fixes: 888cdb892b61 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: dsa: rzn1-a5psw: fix STP states handlingAlexis Lothoré
stp_set_state() should actually allow receiving BPDU while in LEARNING mode which is not the case. Additionally, the BLOCKEN bit does not actually forbid sending forwarded frames from that port. To fix this, add a5psw_port_tx_enable() function which allows to disable TX. However, while its name suggest that TX is totally disabled, it is not and can still allow to send BPDUs even if disabled. This can be done by using forced forwarding with the switch tagging mechanism but keeping "filtering" disabled (which is already the case in the rzn1-a5sw tag driver). With these fixes, STP support is now functional. Fixes: 888cdb892b61 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13net: dsa: rzn1-a5psw: enable management frames for CPU portClément Léger
Currently, management frame were discarded before reaching the CPU port due to a misconfiguration of the MGMT_CONFIG register. Enable them by setting the correct value in this register in order to correctly receive management frame and handle STP. Fixes: 888cdb892b61 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-13erspan: get the proto with the md version for collect_mdXin Long
In commit 20704bd1633d ("erspan: build the header with the right proto according to erspan_ver"), it gets the proto with t->parms.erspan_ver, but t->parms.erspan_ver is not used by collect_md branch, and instead it should get the proto with md->version for collect_md. Thanks to Kevin for pointing this out. Fixes: 20704bd1633d ("erspan: build the header with the right proto according to erspan_ver") Fixes: 94d7d8f29287 ("ip6_gre: add erspan v2 support") Reported-by: Kevin Traynor <ktraynor@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12tcp: fix possible sk_priority leak in tcp_v4_send_reset()Eric Dumazet
When tcp_v4_send_reset() is called with @sk == NULL, we do not change ctl_sk->sk_priority, which could have been set from a prior invocation. Change tcp_v4_send_reset() to set sk_priority and sk_mark fields before calling ip_send_unicast_reply(). This means tcp_v4_send_reset() and tcp_v4_send_ack() no longer have to clear ctl_sk->sk_mark after their call to ip_send_unicast_reply(). Fixes: f6c0f5d209fa ("tcp: honor SO_PRIORITY in TIME_WAIT state") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12vsock: avoid to close connected socket after the timeoutZhuang Shengen
When client and server establish a connection through vsock, the client send a request to the server to initiate the connection, then start a timer to wait for the server's response. When the server's RESPONSE message arrives, the timer also times out and exits. The server's RESPONSE message is processed first, and the connection is established. However, the client's timer also times out, the original processing logic of the client is to directly set the state of this vsock to CLOSE and return ETIMEDOUT. It will not notify the server when the port is released, causing the server port remain. when client's vsock_connect timeout,it should check sk state is ESTABLISHED or not. if sk state is ESTABLISHED, it means the connection is established, the client should not set the sk state to CLOSE Note: I encountered this issue on kernel-4.18, which can be fixed by this patch. Then I checked the latest code in the community and found similar issue. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Zhuang Shengen <zhuangshengen@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12sfc: disable RXFCS and RXALL features by defaultPieter Jansen van Vuuren
By default we would not want RXFCS and RXALL features enabled as they are mainly intended for debugging purposes. This does not stop users from enabling them later on as needed. Fixes: 8e57daf70671 ("sfc_ef100: RX path for EF100") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12ice: Fix undersized tx_flags variableJan Sokolowski
As not all ICE_TX_FLAGS_* fit in current 16-bit limited tx_flags field that was introduced in the Fixes commit, VLAN-related information would be discarded completely. As such, creating a vlan and trying to run ping through would result in no traffic passing. Fix that by refactoring tx_flags variable into flags only and a separate variable that holds VLAN ID. As there is some space left, type variable can fit between those two. Pahole reports no size change to ice_tx_buf struct. Fixes: aa1d3faf71a6 ("ice: Robustify cleaning/completing XDP Tx buffers") Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12MAINTAINERS: exclude wireless drivers from netdevJakub Kicinski
It seems that we mostly get netdev CCed on wireless patches which are written by people who don't know any better and CC everything that get_maintainers spits out. Rather than patches which indeed could benefit from general networking review. Marking them down in patchwork as Awaiting Upstream is a bit tedious. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12nfp: fix NFP_NET_MAX_DSCP definition errorHuayu Chen
The patch corrects the NFP_NET_MAX_DSCP definition in the main.h file. The incorrect definition result DSCP bits not being mapped properly when DCB is set. When NFP_NET_MAX_DSCP was defined as 4, the next 60 DSCP bits failed to be set. Fixes: 9b7fe8046d74 ("nfp: add DCB IEEE support") Cc: stable@vger.kernel.org Signed-off-by: Huayu Chen <huayu.chen@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12MAINTAINERS: don't CC docs@ for netlink spec changesJakub Kicinski
Documentation/netlink/ contains machine-readable protocol specs in YAML. Those are much like device tree bindings, no point CCing docs@ for the changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com>
2023-05-12MAINTAINERS: sctp: move Neil to CREDITSMarcelo Ricardo Leitner
Neil moved away from SCTP related duties. Move him to CREDITS then and while at it, update SCTP project website. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-12net: phy: dp83867: add w/a for packet errors seen with short cablesGrygorii Strashko
Introduce the W/A for packet errors seen with short cables (<1m) between two DP83867 PHYs. The W/A recommended by DM requires FFE Equalizer Configuration tuning by writing value 0x0E81 to DSP_FFE_CFG register (0x012C), surrounded by hard and soft resets as follows: write_reg(0x001F, 0x8000); //hard reset write_reg(DSP_FFE_CFG, 0x0E81); write_reg(0x001F, 0x4000); //soft reset Since DP83867 PHY DM says "Changing this register to 0x0E81, will not affect Long Cable performance.", enable the W/A by default. Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-11net: fec: Better handle pm_runtime_get() failing in .remove()Uwe Kleine-König
In the (unlikely) event that pm_runtime_get() (disguised as pm_runtime_resume_and_get()) fails, the remove callback returned an error early. The problem with this is that the driver core ignores the error value and continues removing the device. This results in a resource leak. Worse the devm allocated resources are freed and so if a callback of the driver is called later the register mapping is already gone which probably results in a crash. Fixes: a31eda65ba21 ("net: fec: fix clock count mis-match") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230510200020.1534610-1-u.kleine-koenig@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11ipv6: remove nexthop_fib6_nh_bh()Eric Dumazet
After blamed commit, nexthop_fib6_nh_bh() and nexthop_fib6_nh() are the same. Delete nexthop_fib6_nh_bh(), and convert /proc/net/ipv6_route to standard rcu to avoid this splat: [ 5723.180080] WARNING: suspicious RCU usage [ 5723.180083] ----------------------------- [ 5723.180084] include/net/nexthop.h:516 suspicious rcu_dereference_check() usage! [ 5723.180086] other info that might help us debug this: [ 5723.180087] rcu_scheduler_active = 2, debug_locks = 1 [ 5723.180089] 2 locks held by cat/55856: [ 5723.180091] #0: ffff9440a582afa8 (&p->lock){+.+.}-{3:3}, at: seq_read_iter (fs/seq_file.c:188) [ 5723.180100] #1: ffffffffaac07040 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire (include/linux/rcupdate.h:326) [ 5723.180109] stack backtrace: [ 5723.180111] CPU: 14 PID: 55856 Comm: cat Tainted: G S I 6.3.0-dbx-DEV #528 [ 5723.180115] Call Trace: [ 5723.180117] <TASK> [ 5723.180119] dump_stack_lvl (lib/dump_stack.c:107) [ 5723.180124] dump_stack (lib/dump_stack.c:114) [ 5723.180126] lockdep_rcu_suspicious (include/linux/context_tracking.h:122) [ 5723.180132] ipv6_route_seq_show (include/net/nexthop.h:?) [ 5723.180135] ? ipv6_route_seq_next (net/ipv6/ip6_fib.c:2605) [ 5723.180140] seq_read_iter (fs/seq_file.c:272) [ 5723.180145] seq_read (fs/seq_file.c:163) [ 5723.180151] proc_reg_read (fs/proc/inode.c:316 fs/proc/inode.c:328) [ 5723.180155] vfs_read (fs/read_write.c:468) [ 5723.180160] ? up_read (kernel/locking/rwsem.c:1617) [ 5723.180164] ksys_read (fs/read_write.c:613) [ 5723.180168] __x64_sys_read (fs/read_write.c:621) [ 5723.180170] do_syscall_64 (arch/x86/entry/common.c:?) [ 5723.180174] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 5723.180177] RIP: 0033:0x7fa455677d2a Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230510154646.370659-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11devlink: change per-devlink netdev notifier to static oneJiri Pirko
The commit 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") changed original per-net notifier to be per-devlink instance. That fixed the issue of non-receiving events of netdev uninit if that moved to a different namespace. That worked fine in -net tree. However, later on when commit ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device") and commit 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") were merged, a deadlock was introduced when removing a namespace with devlink instance with another nested instance. Here there is the bad flow example resulting in deadlock with mlx5: net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem) -> devlink_pernet_pre_exit() -> devlink_reload() -> mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() -> mlx5_detach_device() -> del_adev() -> mlx5e_remove() -> mlx5e_destroy_devlink() -> devlink_free() -> unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem) Steps to reproduce: $ modprobe mlx5_core $ ip netns add ns1 $ devlink dev reload pci/0000:08:00.0 netns ns1 $ ip netns del ns1 Resolve this by converting the notifier from per-devlink instance to a static one registered during init phase and leaving it registered forever. Use this notifier for all devlink port instances created later on. Note what a tree needs this fix only in case all of the cited fixes commits are present. Reported-by: Moshe Shemesh <moshe@nvidia.com> Fixes: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") Fixes: ee75f1fc44dd ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device") Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230510144621.932017-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11Merge branch 'selftests-seg6-make-srv6_end_dt4_l3vpn_test-more-robust'Jakub Kicinski
Andrea Mayer says: ==================== selftests: seg6: make srv6_end_dt4_l3vpn_test more robust This pachset aims to improve and make more robust the selftests performed to check whether SRv6 End.DT4 beahvior works as expected under different system configurations. Some Linux distributions enable Deduplication Address Detection and Reverse Path Filtering mechanisms by default which can interfere with SRv6 End.DT4 behavior and cause selftests to fail. The following patches improve selftests for End.DT4 by taking these two mechanisms into account. Specifically: - patch 1/2: selftests: seg6: disable DAD on IPv6 router cfg for srv6_end_dt4_l3vpn_test - patch 2/2: selftets: seg6: disable rp_filter by default in srv6_end_dt4_l3vpn_test ==================== Link: https://lore.kernel.org/r/20230510111638.12408-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11selftets: seg6: disable rp_filter by default in srv6_end_dt4_l3vpn_testAndrea Mayer
On some distributions, the rp_filter is automatically set (=1) by default on a netdev basis (also on VRFs). In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using the table associated with the VRF bound to that tunnel. During lookup operations, the rp_filter can lead to packet loss when activated on the VRF. Therefore, we chose to make this selftest more robust by explicitly disabling the rp_filter during tests (as it is automatically set by some Linux distributions). Fixes: 2195444e09b4 ("selftests: add selftest for the SRv6 End.DT4 behavior") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11selftests: seg6: disable DAD on IPv6 router cfg for srv6_end_dt4_l3vpn_testAndrea Mayer
The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of several routers (rt-1, rt-2) and hosts. When the IPv6 addresses of rt-{1,2} routers are configured, the Deduplicate Address Detection (DAD) kicks in when enabled in the Linux distros running the selftests. DAD is used to check whether an IPv6 address is already assigned in a network. Such a mechanism consists of sending an ICMPv6 Echo Request and waiting for a reply. As the DAD process could take too long to complete, it may cause the failing of some tests carried out by the srv6_end_dt4_l3vpn_test script. To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers since we configure the virtual network manually and do not need any address deduplication mechanism at all. Fixes: 2195444e09b4 ("selftests: add selftest for the SRv6 End.DT4 behavior") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-11Merge tag 'net-6.4-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - mtk_eth_soc: fix NULL pointer dereference Previous releases - regressions: - core: - skb_partial_csum_set() fix against transport header magic value - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs(). - annotate sk->sk_err write from do_recvmmsg() - add vlan_get_protocol_and_depth() helper - netlink: annotate accesses to nlk->cb_running - netfilter: always release netdev hooks from notifier Previous releases - always broken: - core: deal with most data-races in sk_wait_event() - netfilter: fix possible bug_on with enable_hooks=1 - eth: bonding: fix send_peer_notif overflow - eth: xpcs: fix incorrect number of interfaces - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register" * tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits) af_unix: Fix data races around sk->sk_shutdown. af_unix: Fix a data race of sk->sk_receive_queue->qlen. net: datagram: fix data-races in datagram_poll() net: mscc: ocelot: fix stat counter register values ipvlan:Fix out-of-bounds caused by unclear skb->cb docs: networking: fix x25-iface.rst heading & index order gve: Remove the code of clearing PBA bit tcp: add annotations around sk->sk_shutdown accesses net: add vlan_get_protocol_and_depth() helper net: pcs: xpcs: fix incorrect number of interfaces net: deal with most data-races in sk_wait_event() net: annotate sk->sk_err write from do_recvmmsg() netlink: annotate accesses to nlk->cb_running kselftest: bonding: add num_grat_arp test selftests: forwarding: lib: add netns support for tc rule handle stats get Documentation: bonding: fix the doc of peer_notif_delay bonding: fix send_peer_notif overflow net: ethernet: mtk_eth_soc: fix NULL pointer dereference selftests: nft_flowtable.sh: check ingress/egress chain too selftests: nft_flowtable.sh: monitor result file sizes ...