summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-01-15net: add netdev->up protected by netdev_lock()Jakub Kicinski
Some uAPI (netdev netlink) hide net_device's sub-objects while the interface is down to ensure uniform behavior across drivers. To remove the rtnl_lock dependency from those uAPIs we need a way to safely tell if the device is down or up. Add an indication of whether device is open or closed, protected by netdev->lock. The semantics are the same as IFF_UP, but taking netdev_lock around every write to ->flags would be a lot of code churn. We don't want to blanket the entire open / close path by netdev_lock, because it will prevent us from applying it to specific structures - core helpers won't be able to take that lock from any function called by the drivers on open/close paths. So the state of the flag is "pessimistic", as in it may report false negatives, but never false positives. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: add helpers for lookup and walking netdevs under netdev_lock()Jakub Kicinski
Add helpers for accessing netdevs under netdev_lock(). There's some careful handling needed to find the device and lock it safely, without it getting unregistered, and without taking rtnl_lock (the latter being the whole point of the new locking, after all). Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: make netdev_lock() protect netdev->reg_stateJakub Kicinski
Protect writes to netdev->reg_state with netdev_lock(). From now on holding netdev_lock() is sufficient to prevent the net_device from getting unregistered, so code which wants to hold just a single netdev around no longer needs to hold rtnl_lock. We do not protect the NETREG_UNREGISTERED -> NETREG_RELEASED transition. We'd need to move mutex_destroy(netdev->lock) to .release, but the real reason is that trying to stop the unregistration process mid-way would be unsafe / crazy. Taking references on such devices is not safe, either. So the intended semantics are to lock REGISTERED devices. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: add netdev_lock() / netdev_unlock() helpersJakub Kicinski
Add helpers for locking the netdev instance, use it in drivers and the shaper code. This will make grepping for the lock usage much easier, as we extend the lock to cover more fields. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250115035319.559603-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15inet: ipmr: fix data-racesEric Dumazet
Following fields of 'struct mr_mfc' can be updated concurrently (no lock protection) from ip_mr_forward() and ip6_mr_forward() - bytes - pkt - wrong_if - lastuse They also can be read from other functions. Convert bytes, pkt and wrong_if to atomic_long_t, and use READ_ONCE()/WRITE_ONCE() for lastuse. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250114221049.1190631-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: disallow setup single buffer XDP when tcp-data-split is enabled.Taehee Yoo
When a single buffer XDP is attached, NIC should guarantee only single page packets will be received. tcp-data-split feature splits packets into header and payload. single buffer XDP can't handle it properly. So attaching single buffer XDP should be disallowed when tcp-data-split is enabled. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-6-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add ring parameter filteringTaehee Yoo
While the devmem is running, the tcp-data-split and hds-thresh configuration should not be changed. If user tries to change tcp-data-split and threshold value while the devmem is running, it fails and shows extack message. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-5-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: devmem: add ring parameter filteringTaehee Yoo
If driver doesn't support ring parameter or tcp-data-split configuration is not sufficient, the devmem should not be set up. Before setup the devmem, tcp-data-split should be ON and hds-thresh value should be 0. Tested-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-4-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add support for configuring hds-threshTaehee Yoo
The hds-thresh option configures the threshold value of the header-data-split. If a received packet size is larger than this threshold value, a packet will be split into header and payload. The header indicates TCP and UDP header, but it depends on driver spec. The bnxt_en driver supports HDS(Header-Data-Split) configuration at FW level, affecting TCP and UDP too. So, If hds-thresh is set, it affects UDP and TCP packets. Example: # ethtool -G <interface name> hds-thresh <value> # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 The default/min/max values are not defined in the ethtool so the drivers should define themself. The 0 value means that all TCP/UDP packets' header and payload will be split. Tested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-3-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add hds_config member in ethtool_netdev_stateTaehee Yoo
When tcp-data-split is UNKNOWN mode, drivers arbitrarily handle it. For example, bnxt_en driver automatically enables if at least one of LRO/GRO/JUMBO is enabled. If tcp-data-split is UNKNOWN and LRO is enabled, a driver returns ENABLES of tcp-data-split, not UNKNOWN. So, `ethtool -g eth0` shows tcp-data-split is enabled. The problem is in the setting situation. In the ethnl_set_rings(), it first calls get_ringparam() to get the current driver's config. At that moment, if driver's tcp-data-split config is UNKNOWN, it returns ENABLE if LRO/GRO/JUMBO is enabled. Then, it sets values from the user and driver's current config to kernel_ethtool_ringparam. Last it calls .set_ringparam(). The driver, especially bnxt_en driver receives ETHTOOL_TCP_DATA_SPLIT_ENABLED. But it can't distinguish whether it is set by the user or just the current config. When user updates ring parameter, the new hds_config value is updated and current hds_config value is stored to old_hdsconfig. Driver's .set_ringparam() callback can distinguish a passed tcp-data-split value is came from user explicitly. If .set_ringparam() is failed, hds_config is rollbacked immediately. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-2-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15mptcp: fix for setting remote ipv4mapped addressGeliang Tang
Commit 1c670b39cec7 ("mptcp: change local addr type of subflow_destroy") introduced a bug in mptcp_pm_nl_subflow_destroy_doit(). ipv6_addr_set_v4mapped() should be called to set the remote ipv4 address 'addr_r.addr.s_addr' to the remote ipv6 address 'addr_r.addr6', not 'addr_l.addr.addr6', which is the local ipv6 address. Fixes: 1c670b39cec7 ("mptcp: change local addr type of subflow_destroy") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250114-net-next-mptcp-fix-remote-addr-v1-1-debcd84ea86f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Bluetooth: MGMT: Fix slab-use-after-free Read in mgmt_remove_adv_monitor_syncMazin Al Haddad
This fixes the following crash: ================================================================== BUG: KASAN: slab-use-after-free in mgmt_remove_adv_monitor_sync+0x3a/0xd0 net/bluetooth/mgmt.c:5543 Read of size 8 at addr ffff88814128f898 by task kworker/u9:4/5961 CPU: 1 UID: 0 PID: 5961 Comm: kworker/u9:4 Not tainted 6.12.0-syzkaller-10684-gf1cd565ce577 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: hci0 hci_cmd_sync_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x169/0x550 mm/kasan/report.c:489 kasan_report+0x143/0x180 mm/kasan/report.c:602 mgmt_remove_adv_monitor_sync+0x3a/0xd0 net/bluetooth/mgmt.c:5543 hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:332 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 16026: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4314 kmalloc_noprof include/linux/slab.h:901 [inline] kzalloc_noprof include/linux/slab.h:1037 [inline] mgmt_pending_new+0x65/0x250 net/bluetooth/mgmt_util.c:269 mgmt_pending_add+0x36/0x120 net/bluetooth/mgmt_util.c:296 remove_adv_monitor+0x102/0x1b0 net/bluetooth/mgmt.c:5568 hci_mgmt_cmd+0xc47/0x11d0 net/bluetooth/hci_sock.c:1712 hci_sock_sendmsg+0x7b8/0x11c0 net/bluetooth/hci_sock.c:1832 sock_sendmsg_nosec net/socket.c:711 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:726 sock_write_iter+0x2d7/0x3f0 net/socket.c:1147 new_sync_write fs/read_write.c:586 [inline] vfs_write+0xaeb/0xd30 fs/read_write.c:679 ksys_write+0x18f/0x2b0 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 16022: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2338 [inline] slab_free mm/slub.c:4598 [inline] kfree+0x196/0x420 mm/slub.c:4746 mgmt_pending_foreach+0xd1/0x130 net/bluetooth/mgmt_util.c:259 __mgmt_power_off+0x183/0x430 net/bluetooth/mgmt.c:9550 hci_dev_close_sync+0x6c4/0x11c0 net/bluetooth/hci_sync.c:5208 hci_dev_do_close net/bluetooth/hci_core.c:483 [inline] hci_dev_close+0x112/0x210 net/bluetooth/hci_core.c:508 sock_do_ioctl+0x158/0x460 net/socket.c:1209 sock_ioctl+0x626/0x8e0 net/socket.c:1328 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported-by: syzbot+479aff51bb361ef5aa18@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=479aff51bb361ef5aa18 Tested-by: syzbot+479aff51bb361ef5aa18@syzkaller.appspotmail.com Signed-off-by: Mazin Al Haddad <mazin@getstate.dev> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-15Bluetooth: Allow reset via sysfsHsin-chen Chuang
Allow sysfs to trigger hdev reset. This is required to recover devices that are not responsive from userspace. Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-15Bluetooth: Get rid of cmd_timeout and use the reset callbackHsin-chen Chuang
The hdev->reset is never used now and the hdev->cmd_timeout actually does reset. This patch changes the call path from hdev->cmd_timeout -> vendor_cmd_timeout -> btusb_reset -> hdev->reset , to hdev->reset -> vendor_reset -> btusb_reset Which makes it clear when we export the hdev->reset to a wider usage e.g. allowing reset from sysfs. This patch doesn't introduce any behavior change. Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-15Bluetooth: L2CAP: handle NULL sock pointer in l2cap_sock_allocFedor Pchelkin
A NULL sock pointer is passed into l2cap_sock_alloc() when it is called from l2cap_sock_new_connection_cb() and the error handling paths should also be aware of it. Seemingly a more elegant solution would be to swap bt_sock_alloc() and l2cap_chan_create() calls since they are not interdependent to that moment but then l2cap_chan_create() adds the soon to be deallocated and still dummy-initialized channel to the global list accessible by many L2CAP paths. The channel would be removed from the list in short period of time but be a bit more straight-forward here and just check for NULL instead of changing the order of function calls. Found by Linux Verification Center (linuxtesting.org) with SVACE static analysis tool. Fixes: 7c4f78cdb8e7 ("Bluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-15Bluetooth: hci: Remove deadcodeDr. David Alan Gilbert
hci_bdaddr_list_del_with_flags() was added in 2020's commit 8baaa4038edb ("Bluetooth: Add bdaddr_list_with_flags for classic whitelist") but has remained unused. hci_remove_ext_adv_instance() was added in 2020's commit eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") but has remained unused. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-15Bluetooth: MGMT: Mark LL Privacy as stableLuiz Augusto von Dentz
This marks LL Privacy as stable by removing its experimental UUID and move its functionality to Device Flag (HCI_CONN_FLAG_ADDRESS_RESOLUTION) which can be set by MGMT Device Set Flags so userspace retain control of the feature. Link: https://github.com/bluez/bluez/issues/1028 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-15wifi: cfg80211: adjust allocation of colocated AP dataDmitry Antipov
In 'cfg80211_scan_6ghz()', an instances of 'struct cfg80211_colocated_ap' are allocated as if they would have 'ssid' as trailing VLA member. Since this is not so, extra IEEE80211_MAX_SSID_LEN bytes are not needed. Briefly tested with KUnit. Fixes: c8cb5b854b40 ("nl80211/cfg80211: support 6 GHz scanning") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20250113155417.552587-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-15wifi: mac80211: fix memory leak in ieee80211_mgd_assoc_ml_reconf()Dan Carpenter
Free the "data" allocation before returning on this error path. Fixes: 36e05b0b8390 ("wifi: mac80211: Support dynamic link addition and removal") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/7ad826a7-7651-48e7-9589-7d2dc17417c2@stanley.mountain Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-15saner replacement for debugfs_rename()Al Viro
Existing primitive has several problems: 1) calling conventions are clumsy - it returns a dentry reference that is either identical to its second argument or is an ERR_PTR(-E...); in both cases no refcount changes happen. Inconvenient for users and bug-prone; it would be better to have it return 0 on success and -E... on failure. 2) it allows cross-directory moves; however, no such caller have ever materialized and considering the way debugfs is used, it's unlikely to happen in the future. What's more, any such caller would have fun issues to deal with wrt interplay with recursive removal. It also makes the calling conventions clumsier... 3) tautological rename fails; the callers have no race-free way to deal with that. 4) new name must have been formed by the caller; quite a few callers have it done by sprintf/kasprintf/etc., ending up with considerable boilerplate. Proposed replacement: int debugfs_change_name(dentry, fmt, ...). All callers convert to that easily, and it's simpler internally. IMO debugfs_rename() should go; if we ever get a real-world use case for cross-directory moves in debugfs, we can always look into the right way to handle that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20250112080705.141166-21-viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-14net: netpoll: ensure skb_pool list is always initializedJohn Sperbeck
When __netpoll_setup() is called directly, instead of through netpoll_setup(), the np->skb_pool list head isn't initialized. If skb_pool_flush() is later called, then we hit a NULL pointer in skb_queue_purge_reason(). This can be seen with this repro, when CONFIG_NETCONSOLE is enabled as a module: ip tuntap add mode tap tap0 ip link add name br0 type bridge ip link set dev tap0 master br0 modprobe netconsole netconsole=4444@10.0.0.1/br0,9353@10.0.0.2/ rmmod netconsole The backtrace is: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page ... ... ... Call Trace: <TASK> __netpoll_free+0xa5/0xf0 br_netpoll_cleanup+0x43/0x50 [bridge] do_netpoll_cleanup+0x43/0xc0 netconsole_netdev_event+0x1e3/0x300 [netconsole] unregister_netdevice_notifier+0xd9/0x150 cleanup_module+0x45/0x920 [netconsole] __se_sys_delete_module+0x205/0x290 do_syscall_64+0x70/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Move the skb_pool list setup and initial skb fill into __netpoll_setup(). Fixes: 221a9c1df790 ("net: netpoll: Individualize the skb pool") Signed-off-by: John Sperbeck <jsperbeck@google.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250114011354.2096812-1-jsperbeck@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14socket: Remove unused kernel_sendmsg_lockedDr. David Alan Gilbert
The last use of kernel_sendmsg_locked() was removed in 2023 by commit dc97391e6610 ("sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250112131318.63753-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14mptcp: fix spurious wake-up on under memory pressurePaolo Abeni
The wake-up condition currently implemented by mptcp_epollin_ready() is wrong, as it could mark the MPTCP socket as readable even when no data are present and the system is under memory pressure. Explicitly check for some data being available in the receive queue. Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-2-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14mptcp: be sure to send ack when mptcp-level window re-opensPaolo Abeni
mptcp_cleanup_rbuf() is responsible to send acks when the user-space reads enough data to update the receive windows significantly. It tries hard to avoid acquiring the subflow sockets locks by checking conditions similar to the ones implemented at the TCP level. To avoid too much code duplication - the MPTCP protocol can't reuse the TCP helpers as part of the relevant status is maintained into the msk socket - and multiple costly window size computation, mptcp_cleanup_rbuf uses a rough estimate for the most recently advertised window size: the MPTCP receive free space, as recorded as at last-ack time. Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect a zero to non-zero win change in some corner cases, skipping the tcp_cleanup_rbuf call and leaving the peer stuck. After commit ea66758c1795 ("tcp: allow MPTCP to update the announced window"), MPTCP has actually cheap access to the announced window value. Use it in mptcp_cleanup_rbuf() for a more accurate ack generation. Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250113-net-mptcp-connect-st-flakes-v1-1-0d986ee7b1b6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14Bluetooth: iso: Allow BIG re-syncIulia Tanasescu
A Broadcast Sink might require BIG sync to be terminated and re-established multiple times, while keeping the same PA sync handle active. This can be possible if the configuration of the listening (PA sync) socket is reset once all bound BISes are established and accepted by the user space: 1. The DEFER setup flag needs to be reset on the parent socket, to allow another BIG create sync procedure to be started on socket read. 2. The BT_SK_BIG_SYNC flag needs to be cleared on the parent socket, to allow another BIG create sync command to be sent. 3. The socket state needs to transition from BT_LISTEN to BT_CONNECTED, to mark that the listening process has completed and another one can be started if needed. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-01-14tcp: add LINUX_MIB_PAWS_OLD_ACK SNMP counterEric Dumazet
Prior patch in the series added TCP_RFC7323_PAWS_ACK drop reason. This patch adds the corresponding SNMP counter, for folks using nstat instead of tracing for TCP diagnostics. nstat -az | grep PAWSOldAck Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250113135558.3180360-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14tcp: add TCP_RFC7323_PAWS_ACK drop reasonEric Dumazet
XPS can cause reorders because of the relaxed OOO conditions for pure ACK packets. For hosts not using RFS, what can happpen is that ACK packets are sent on behalf of the cpu processing NIC interrupts, selecting TX queue A for ACK packet P1. Then a subsequent sendmsg() can run on another cpu. TX queue selection uses the socket hash and can choose another queue B for packets P2 (with payload). If queue A is more congested than queue B, the ACK packet P1 could be sent on the wire after P2. A linux receiver when processing P1 (after P2) currently increments LINUX_MIB_PAWSESTABREJECTED (TcpExtPAWSEstab) and use TCP_RFC7323_PAWS drop reason. It might also send a DUPACK if not rate limited. In order to better understand this pattern, this patch adds a new drop_reason : TCP_RFC7323_PAWS_ACK. For old ACKS like these, we no longer increment LINUX_MIB_PAWSESTABREJECTED and no longer sends a DUPACK, keeping credit for other more interesting DUPACK. perf record -e skb:kfree_skb -a perf script ... swapper 0 [148] 27475.438637: skb:kfree_skb: ... location=tcp_validate_incoming+0x4f0 reason: TCP_RFC7323_PAWS_ACK swapper 0 [208] 27475.438706: skb:kfree_skb: ... location=tcp_validate_incoming+0x4f0 reason: TCP_RFC7323_PAWS_ACK swapper 0 [208] 27475.438908: skb:kfree_skb: ... location=tcp_validate_incoming+0x4f0 reason: TCP_RFC7323_PAWS_ACK swapper 0 [148] 27475.439010: skb:kfree_skb: ... location=tcp_validate_incoming+0x4f0 reason: TCP_RFC7323_PAWS_ACK swapper 0 [148] 27475.439214: skb:kfree_skb: ... location=tcp_validate_incoming+0x4f0 reason: TCP_RFC7323_PAWS_ACK swapper 0 [208] 27475.439286: skb:kfree_skb: ... location=tcp_validate_incoming+0x4f0 reason: TCP_RFC7323_PAWS_ACK ... Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250113135558.3180360-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14tcp: add drop_reason support to tcp_disordered_ack()Eric Dumazet
Following patch is adding a new drop_reason to tcp_validate_incoming(). Change tcp_disordered_ack() to not return a boolean anymore, but a drop reason. Change its name to tcp_disordered_ack_check() Refactor tcp_validate_incoming() to ease the code review of the following patch, and reduce indentation level. This patch is a refactor, with no functional change. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250113135558.3180360-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: pse-pd: Split ethtool_get_status into multiple callbacksKory Maincent
The ethtool_get_status callback currently handles all status and PSE information within a single function. This approach has two key drawbacks: 1. If the core requires some information for purposes other than ethtool_get_status, redundant code will be needed to fetch the same data from the driver (like is_enabled). 2. Drivers currently have access to all information passed to ethtool. New variables will soon be added to ethtool status, such as PSE ID, power domain IDs, and budget evaluation strategies, which are meant to be managed solely by the core. Drivers should not have the ability to modify these variables. To resolve these issues, ethtool_get_status has been split into multiple callbacks, with each handling a specific piece of information required by ethtool or the core. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock: prevent null-ptr-deref in vsock_*[has_data|has_space]Stefano Garzarella
Recent reports have shown how we sometimes call vsock_*_has_data() when a vsock socket has been de-assigned from a transport (see attached links), but we shouldn't. Previous commits should have solved the real problems, but we may have more in the future, so to avoid null-ptr-deref, we can return 0 (no space, no data available) but with a warning. This way the code should continue to run in a nearly consistent state and have a warning that allows us to debug future problems. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/Z2K%2FI4nlHdfMRTZC@v4bel-B760M-AORUS-ELITE-AX/ Link: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Link: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Co-developed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Co-developed-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Wongi Lee <qwerty@theori.io> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock: reset socket state when de-assigning the transportStefano Garzarella
Transport's release() and destruct() are called when de-assigning the vsock transport. These callbacks can touch some socket state like sock flags, sk_state, and peer_shutdown. Since we are reassigning the socket to a new transport during vsock_connect(), let's reset these fields to have a clean state with the new transport. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock/virtio: cancel close work in the destructorStefano Garzarella
During virtio_transport_release() we can schedule a delayed work to perform the closing of the socket before destruction. The destructor is called either when the socket is really destroyed (reference counter to zero), or it can also be called when we are de-assigning the transport. In the former case, we are sure the delayed work has completed, because it holds a reference until it completes, so the destructor will definitely be called after the delayed work is finished. But in the latter case, the destructor is called by AF_VSOCK core, just after the release(), so there may still be delayed work scheduled. Refactor the code, moving the code to delete the close work already in the do_close() to a new function. Invoke it during destruction to make sure we don't leave any pending work. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Reported-by: Hyunwoo Kim <v4bel@theori.io> Closes: https://lore.kernel.org/netdev/Z37Sh+utS+iV3+eb@v4bel-B760M-AORUS-ELITE-AX/ Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Tested-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock/bpf: return early if transport is not assignedStefano Garzarella
Some of the core functions can only be called if the transport has been assigned. As Michal reported, a socket might have the transport at NULL, for example after a failed connect(), causing the following trace: BUG: kernel NULL pointer dereference, address: 00000000000000a0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 12faf8067 P4D 12faf8067 PUD 113670067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 15 UID: 0 PID: 1198 Comm: a.out Not tainted 6.13.0-rc2+ RIP: 0010:vsock_connectible_has_data+0x1f/0x40 Call Trace: vsock_bpf_recvmsg+0xca/0x5e0 sock_recvmsg+0xb9/0xc0 __sys_recvfrom+0xb3/0x130 __x64_sys_recvfrom+0x20/0x30 do_syscall_64+0x93/0x180 entry_SYSCALL_64_after_hwframe+0x76/0x7e So we need to check the `vsk->transport` in vsock_bpf_recvmsg(), especially for connected sockets (stream/seqpacket) as we already do in __vsock_connectible_recvmsg(). Fixes: 634f1a7110b4 ("vsock: support sockmap") Cc: stable@vger.kernel.org Reported-by: Michal Luczaj <mhal@rbox.co> Closes: https://lore.kernel.org/netdev/5ca20d4c-1017-49c2-9516-f6f75fd331e9@rbox.co/ Tested-by: Michal Luczaj <mhal@rbox.co> Reported-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/677f84a8.050a0220.25a300.01b3.GAE@google.com/ Tested-by: syzbot+3affdbfc986ecd9200fd@syzkaller.appspotmail.com Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14vsock/virtio: discard packets if the transport changesStefano Garzarella
If the socket has been de-assigned or assigned to another transport, we must discard any packets received because they are not expected and would cause issues when we access vsk->transport. A possible scenario is described by Hyunwoo Kim in the attached link, where after a first connect() interrupted by a signal, and a second connect() failed, we can find `vsk->transport` at NULL, leading to a NULL pointer dereference. Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Cc: stable@vger.kernel.org Reported-by: Hyunwoo Kim <v4bel@theori.io> Reported-by: Wongi Lee <qwerty@theori.io> Closes: https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/ Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: hsr: Create and export hsr_get_port_ndev()MD Danish Anwar
Create an API to get the net_device to the slave port of HSR device. The API will take hsr net_device and enum hsr_port_type for which we want the net_device as arguments. This API can be used by client drivers who support HSR and want to get the net_devcie of slave ports from the hsr device. Export this API for the same. This API needs the enum hsr_port_type to be accessible by the drivers using hsr. Move the enum hsr_port_type from net/hsr/hsr_main.h to include/linux/if_hsr.h for the same. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ti: icssg-prueth: Add Multicast Filtering support for VLAN in MAC modeMD Danish Anwar
Add multicast filtering support for VLAN interfaces in dual EMAC mode for ICSSG driver. The driver uses vlan_for_each() API to get the list of available vlans. The driver then sync mc addr of vlan interface with a locally mainatined list emac->vlan_mcast_list[vid] using __hw_addr_sync_multiple() API. __hw_addr_sync_multiple() is used instead of __hw_addr_sync() to sync vdev->mc with local list because the sync_cnt for addresses in vdev->mc will already be set by the vlan_dev_set_rx_mode() [net/8021q/vlan_dev.c] and __hw_addr_sync() only syncs when the sync_cnt == 0. Whereas __hw_addr_sync_multiple() can sync addresses even if sync_cnt is not 0. Export __hw_addr_sync_multiple() so that driver can use it. Once the local list is synced, driver calls __hw_addr_sync_dev() with the local list, vdev, sync and unsync callbacks. __hw_addr_sync_dev() is used with the local maintained list as the list to synchronize instead of using __dev_mc_sync() on vdev because __dev_mc_sync() on vdev will call __hw_addr_sync_dev() on vdev->mc and sync_cnt for addresses in vdev->mc will already be set by the vlan_dev_set_rx_mode() [net/8021q/vlan_dev.c] and __hw_addr_sync_dev() only syncs if the sync_cnt of addresses in the list (vdev->mc in this case) is 0. Whereas __hw_addr_sync_dev() on local list will work fine as the sync_cnt for addresses in the local list will still be 0. Based on change in addresses in the local list, sync / unsync callbacks are invoked. In the sync / unsync API in driver, based on whether the ndev is vlan or not, driver passes appropriate vid to FDB helper functions. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14Merge tag 'nf-next-25-01-11' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains a small batch of Netfilter/IPVS updates for net-next: 1) Remove unused genmask parameter in nf_tables_addchain() 2) Speed up reads from /proc/net/ip_vs_conn, from Florian Westphal. 3) Skip empty buckets in hashlimit to avoid atomic operations that results in false positive reports by syzbot with lockdep enabled, patch from Eric Dumazet. 4) Add conntrack event timestamps available via ctnetlink, from Florian Westphal. netfilter pull request 25-01-11 * tag 'nf-next-25-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: conntrack: add conntrack event timestamp netfilter: xt_hashlimit: htable_selective_cleanup() optimization ipvs: speed up reads from ip_vs_conn proc file netfilter: nf_tables: remove the genmask parameter ==================== Link: https://patch.msgid.link/20250111230800.67349-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ethtool: add support for structured PHY statisticsJakub Kicinski
Introduce a new way to report PHY statistics in a structured and standardized format using the netlink API. This new method does not replace the old driver-specific stats, which can still be accessed with `ethtool -S <eth name>`. The structured stats are available with `ethtool -S <eth name> --all-groups`. This new method makes it easier to diagnose problems by organizing stats in a consistent and documented way. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ethtool: plumb PHY stats to PHY driversJakub Kicinski
Introduce support for standardized PHY statistics reporting in ethtool by extending the PHYLIB framework. Add the functions phy_ethtool_get_phy_stats() and phy_ethtool_get_link_ext_stats() to provide a consistent interface for retrieving PHY-level and link-specific statistics. These functions are used within the ethtool implementation to avoid direct access to the phy_device structure outside of the PHYLIB framework. A new structure, ethtool_phy_stats, is introduced to standardize PHY statistics such as packet counts, byte counts, and error counters. Drivers are updated to include callbacks for retrieving PHY and link-specific statistics, ensuring values are explicitly set only for supported fields, initialized with ETHTOOL_STAT_NOT_SET to avoid ambiguity. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14ethtool: linkstate: migrate linkstate functions to support multi-PHY setupsOleksij Rempel
Adapt linkstate_get_sqi() and linkstate_get_sqi_max() to take a phy_device argument directly, enabling support for setups with multiple PHYs. The previous assumption of a single PHY attached to a net_device no longer holds. Use ethnl_req_get_phydev() to identify the appropriate PHY device for the operation. Update linkstate_prepare_data() and related logic to accommodate this change, ensuring compatibility with multi-PHY configurations. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14udp: Make rehash4 independent in udp_lib_rehash()Philo Lu
As discussed in [0], rehash4 could be missed in udp_lib_rehash() when udp hash4 changes while hash2 doesn't change. This patch fixes this by moving rehash4 codes out of rehash2 checking, and then rehash2 and rehash4 are done separately. By doing this, we no longer need to call rehash4 explicitly in udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes it. Thus, now udp_lib_hash4() returns directly if the sk is already hashed. Note that uhash4 may fail to work under consecutive connect(<dst address>) calls because rehash() is not called with every connect(). To overcome this, connect(<AF_UNSPEC>) needs to be called after the next connect to a new destination. [0] https://lore.kernel.org/all/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com/ Fixes: 78c91ae2c6de ("ipv4/udp: Add 4-tuple hash for connected socket") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://patch.msgid.link/20250110010810.107145-1-lulie@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: sched: calls synchronize_net() only when neededEric Dumazet
dev_deactivate_many() role is to remove the qdiscs of a network device. When/if a qdisc is dismantled, an rcu grace period is needed to make sure all outstanding qdisc enqueue are done before we proceed with a qdisc reset. Most virtual devices do not have a qdisc. We can call the expensive synchronize_net() only if needed. Note that dev_deactivate_many() does not have to deal with qdisc-less dev_queue_xmit, as an old comment was claiming. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250109171850.2871194-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-13net: cleanup init_dummy_netdev_core()Jakub Kicinski
init_dummy_netdev_core() used to cater to net_devices which did not come from alloc_netdev_mqs(). Since that's no longer supported remove the init logic which duplicates alloc_netdev_mqs(). While at it rename back to init_dummy_netdev(). Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250113003456.3904110-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net: remove init_dummy_netdev()Jakub Kicinski
init_dummy_netdev() can initialize statically declared or embedded net_devices. Such netdevs did not come from alloc_netdev_mqs(). After recent work by Breno, there are the only two cases where we have do that. Switch those cases to alloc_netdev_mqs() and delete init_dummy_netdev(). Dealing with static netdevs is not worth the maintenance burden. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20250113003456.3904110-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/ncsi: fix locking in Get MAC Address handlingPaul Fertser
Obtaining RTNL lock in a response handler is not allowed since it runs in an atomic softirq context. Postpone setting the MAC address by adding a dedicated step to the configuration FSM. Fixes: 790071347a0a ("net/ncsi: change from ndo_set_mac_address to dev_set_mac_address") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20241129-potin-revert-ncsi-set-mac-addr-v1-1-94ea2cb596af@gmail.com Signed-off-by: Paul Fertser <fercerpav@gmail.com> Tested-by: Potin Lai <potin.lai.pt@gmail.com> Link: https://patch.msgid.link/20250109145054.30925-1-fercerpav@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/smc: fix data error when recvmsg with MSG_PEEK flagGuangguan Wang
When recvmsg with MSG_PEEK flag, the data will be copied to user's buffer without advancing consume cursor and without reducing the length of rx available data. Once the expected peek length is larger than the value of bytes_to_rcv, in the loop of do while in smc_rx_recvmsg, the first loop will copy bytes_to_rcv bytes of data from the position local_tx_ctrl.cons, the second loop will copy the min(bytes_to_rcv, read_remaining) bytes from the position local_tx_ctrl.cons again because of the lacking of process with advancing consume cursor and reducing the length of available data. So do the subsequent loops. The data copied in the second loop and the subsequent loops will result in data error, as it should not be copied if no more data arrives and it should be copied from the position advancing bytes_to_rcv bytes from the local_tx_ctrl.cons if more data arrives. This issue can be reproduce by the following python script: server.py: import socket import time server_ip = '0.0.0.0' server_port = 12346 server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server_socket.bind((server_ip, server_port)) server_socket.listen(1) print('Server is running and listening for connections...') conn, addr = server_socket.accept() print('Connected by', addr) while True: data = conn.recv(1024) if not data: break print('Received request:', data.decode()) conn.sendall(b'Hello, client!\n') time.sleep(5) conn.sendall(b'Hello, again!\n') conn.close() client.py: import socket server_ip = '<server ip>' server_port = 12346 resp=b'Hello, client!\nHello, again!\n' client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) client_socket.connect((server_ip, server_port)) request = 'Hello, server!' client_socket.sendall(request.encode()) peek_data = client_socket.recv(len(resp), socket.MSG_PEEK | socket.MSG_WAITALL) print('Peeked data:', peek_data.decode()) client_socket.close() Fixes: 952310ccf2d8 ("smc: receive data from RMBE") Reported-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Link: https://patch.msgid.link/20250104143201.35529-1-guangguan.wang@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13SUNRPC: display total RPC tasks for RPC clientDai Ngo
Display the total number of RPC tasks, including tasks waiting on workqueue and wait queues, for rpc_clnt. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-13SUNRPC: only put task on cl_tasks list after the RPC call slot is reserved.Dai Ngo
Under heavy write load, we've seen the cl_tasks list grows to millions of entries. Even though the list is extremely long, the system still runs fine until the user wants to get the information of all active RPC tasks by doing: When this happens, tasks_start acquires the cl_lock to walk the cl_tasks list, returning one entry at a time to the caller. The cl_lock is held until all tasks on this list have been processed. While the cl_lock is held, completed RPC tasks have to spin wait in rpc_task_release_client for the cl_lock. If there are millions of entries in the cl_tasks list it will take a long time before tasks_stop is called and the cl_lock is released. The spin wait tasks can use up all the available CPUs in the system, preventing other jobs to run, this causes the system to temporarily lock up. This patch fixes this problem by delaying inserting the RPC task on the cl_tasks list until the RPC call slot is reserved. This limits the length of the cl_tasks to the number of call slots available in the system. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-01-13wifi: mac80211: avoid double free in auth/assoc timeoutMiri Korenblit
In case of authentication/association timeout (as detected in ieee80211_iface_work->ieee80211_sta_work), ieee80211_destroy_auth_data is called. At the beginning of it, the pointer to ifmgd::auth_data memory is copied to a local variable. If iface_work is queued again during the execution of the current one, and then the driver is flushing the wiphy_works (for its needs), ieee80211_destroy_auth_data will run again and free auth_data. Then when the execution of the original worker continues, the previously copied pointer will be freed, causing a kernel bug: kernel BUG at mm/slub.c:553! (double free) Same for association timeout (just with ieee80211_destroy_assoc_data and ifmgd::assoc_data) Fix this by NULLifying auth/assoc data right after we copied the pointer to it. That way, even in the scenario above, the code will not handle the same timeout twice. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250102161730.0c3f7f781096.I2b458fb53291b06717077a815755288a81274756@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13wifi: mac80211: ibss: mark IBSS left before leavingJohannes Berg
Mark that we left the IBSS before actually leaving (which requires calling the driver). Otherwise, it's possible to have the driver do some work flushing etc. while leaving, and then get into the work trying to join again while all data is being destroyed. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250102161730.81a6c12b304c.I8484f768371e128152a84aa164854cca9ec1066b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>