summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-05vxlan: Avoid unnecessary updates to FDB 'used' timeIdo Schimmel
Now that the VXLAN driver ages out FDB entries based on their 'updated' time we can remove unnecessary updates of the 'used' time from the Rx path and the control path, so that the 'used' time is only updated by the Tx path. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-8-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05vxlan: Age out FDB entries based on 'updated' timeIdo Schimmel
Currently, the VXLAN driver ages out FDB entries based on their 'used' time which is refreshed by both the Tx and Rx paths. This means that an FDB entry will not age out if traffic is only forwarded to the target host: # ip link add name vx1 up type vxlan id 10010 local 192.0.2.1 dstport 4789 learning ageing 10 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # bridge fdb get 00:11:22:33:44:55 br vx1 self 00:11:22:33:44:55 dev vx1 dst 198.51.100.1 self # mausezahn vx1 -a own -b 00:11:22:33:44:55 -c 0 -p 100 -q & # sleep 20 # bridge fdb get 00:11:22:33:44:55 br vx1 self 00:11:22:33:44:55 dev vx1 dst 198.51.100.1 self This is wrong as an FDB entry will remain present when we no longer have an indication that the host is still behind the current remote. It is also inconsistent with the bridge driver: # ip link add name br1 up type bridge ageing_time $((10 * 100)) # ip link add name swp1 up master br1 type dummy # bridge fdb add 00:11:22:33:44:55 dev swp1 master dynamic # bridge fdb get 00:11:22:33:44:55 br br1 00:11:22:33:44:55 dev swp1 master br1 # mausezahn br1 -a own -b 00:11:22:33:44:55 -c 0 -p 100 -q & # sleep 20 # bridge fdb get 00:11:22:33:44:55 br br1 Error: Fdb entry not found. Solve this by aging out entries based on their 'updated' time, which is not refreshed by the Tx path: # ip link add name vx1 up type vxlan id 10010 local 192.0.2.1 dstport 4789 learning ageing 10 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # bridge fdb get 00:11:22:33:44:55 br vx1 self 00:11:22:33:44:55 dev vx1 dst 198.51.100.1 self # mausezahn vx1 -a own -b 00:11:22:33:44:55 -c 0 -p 100 -q & # sleep 20 # bridge fdb get 00:11:22:33:44:55 br vx1 self Error: Fdb entry not found. But is refreshed by the Rx path: # ip address add 192.0.2.1/32 dev lo # ip link add name vx1 up type vxlan id 10010 local 192.0.2.1 dstport 4789 localbypass # ip link add name vx2 up type vxlan id 20010 local 192.0.2.1 dstport 4789 learning ageing 10 # bridge fdb add 00:11:22:33:44:55 dev vx1 self static dst 127.0.0.1 vni 20010 # mausezahn vx1 -a 00:aa:bb:cc:dd:ee -b 00:11:22:33:44:55 -c 0 -p 100 -q & # sleep 20 # bridge fdb get 00:aa:bb:cc:dd:ee br vx2 self 00:aa:bb:cc:dd:ee dev vx2 dst 127.0.0.1 self # pkill mausezahn # sleep 20 # bridge fdb get 00:aa:bb:cc:dd:ee br vx2 self Error: Fdb entry not found. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-7-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05vxlan: Refresh FDB 'updated' time upon user space updatesIdo Schimmel
When a host migrates to a different remote and a packet is received from the new remote, the corresponding FDB entry is updated and its 'updated' time is refreshed. However, when user space replaces the remote of an FDB entry, its 'updated' time is not refreshed: # ip link add name vx1 up type vxlan id 10010 dstport 4789 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # sleep 10 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 10 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.2 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 10 This can lead to the entry being aged out prematurely and it is also inconsistent with the bridge driver: # ip link add name br1 up type bridge # ip link add name swp1 master br1 up type dummy # ip link add name swp2 master br1 up type dummy # bridge fdb add 00:11:22:33:44:55 dev swp1 master dynamic vlan 1 # sleep 10 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]' 10 # bridge fdb replace 00:11:22:33:44:55 dev swp2 master dynamic vlan 1 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]' 0 Adjust the VXLAN driver to refresh the 'updated' time of an FDB entry whenever one of its attributes is changed by user space: # ip link add name vx1 up type vxlan id 10010 dstport 4789 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # sleep 10 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 10 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.2 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 0 Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-6-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05vxlan: Refresh FDB 'updated' time upon 'NTF_USE'Ido Schimmel
The 'NTF_USE' flag can be used by user space to refresh FDB entries so that they will not age out. Currently, the VXLAN driver implements it by refreshing the 'used' field in the FDB entry as this is the field according to which FDB entries are aged out. Subsequent patches will switch the VXLAN driver to age out entries based on the 'updated' field. Prepare for this change by refreshing the 'updated' field upon 'NTF_USE'. This is consistent with the bridge driver's FDB: # ip link add name br1 up type bridge # ip link add name swp1 master br1 up type dummy # bridge fdb add 00:11:22:33:44:55 dev swp1 master dynamic vlan 1 # sleep 10 # bridge fdb replace 00:11:22:33:44:55 dev swp1 master dynamic vlan 1 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]' 10 # sleep 10 # bridge fdb replace 00:11:22:33:44:55 dev swp1 master use dynamic vlan 1 # bridge -s -j fdb get 00:11:22:33:44:55 br br1 vlan 1 | jq '.[]["updated"]' 0 Before: # ip link add name vx1 up type vxlan id 10010 dstport 4789 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # sleep 10 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 10 # sleep 10 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self use dynamic dst 198.51.100.1 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 20 After: # ip link add name vx1 up type vxlan id 10010 dstport 4789 # bridge fdb add 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # sleep 10 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self dynamic dst 198.51.100.1 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 10 # sleep 10 # bridge fdb replace 00:11:22:33:44:55 dev vx1 self use dynamic dst 198.51.100.1 # bridge -s -j -p fdb get 00:11:22:33:44:55 br vx1 self | jq '.[]["updated"]' 0 Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05vxlan: Always refresh FDB 'updated' time when learning is enabledIdo Schimmel
Currently, when learning is enabled and a packet is received from the expected remote, the 'updated' field of the FDB entry is not refreshed. This will become a problem when we switch the VXLAN driver to age out entries based on the 'updated' field. Solve this by always refreshing an FDB entry when we receive a packet with a matching source MAC address, regardless if it was received via the expected remote or not as it indicates the host is alive. This is consistent with the bridge driver's FDB. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05vxlan: Read jiffies once when updating FDB 'used' timeIdo Schimmel
Avoid two volatile reads in the data path. Instead, read jiffies once and only if an FDB entry was found. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05vxlan: Annotate FDB data racesIdo Schimmel
The 'used' and 'updated' fields in the FDB entry structure can be accessed concurrently by multiple threads, leading to reports such as [1]. Can be reproduced using [2]. Suppress these reports by annotating these accesses using READ_ONCE() / WRITE_ONCE(). [1] BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0: vxlan_xmit+0xb29/0x2380 dev_hard_start_xmit+0x84/0x2f0 __dev_queue_xmit+0x45a/0x1650 packet_xmit+0x100/0x150 packet_sendmsg+0x2114/0x2ac0 __sys_sendto+0x318/0x330 __x64_sys_sendto+0x76/0x90 x64_sys_call+0x14e8/0x1c00 do_syscall_64+0x9e/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2: vxlan_xmit+0xadf/0x2380 dev_hard_start_xmit+0x84/0x2f0 __dev_queue_xmit+0x45a/0x1650 packet_xmit+0x100/0x150 packet_sendmsg+0x2114/0x2ac0 __sys_sendto+0x318/0x330 __x64_sys_sendto+0x76/0x90 x64_sys_call+0x14e8/0x1c00 do_syscall_64+0x9e/0x1a0 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f Reported by Kernel Concurrency Sanitizer on: CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [2] #!/bin/bash set +H echo whitelist > /sys/kernel/debug/kcsan echo !vxlan_xmit > /sys/kernel/debug/kcsan ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1 bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1 taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q & taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q & Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-onlyGeert Uytterhoeven
if CONFIG_NET_IPGRE is enabled, but CONFIG_IPV6 is disabled: net/ipv4/ip_gre.c: In function ‘ipgre_err’: net/ipv4/ip_gre.c:144:22: error: variable ‘data_len’ set but not used [-Werror=unused-but-set-variable] 144 | unsigned int data_len = 0; | ^~~~~~~~ Fix this by moving all data_len processing inside the IPV6-only section that uses its result. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501121007.2GofXmh5-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d09113cfe2bfaca02f3dddf832fb5f48dd20958b.1738704881.git.geert@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'rxrpc-call-state-fixes'Jakub Kicinski
David Howells says: ==================== rxrpc: Call state fixes Here some call state fixes for AF_RXRPC. (1) Fix the state of a call to not treat the challenge-response cycle as part of an incoming call's state set. The problem is that it makes handling received of the final packet in the receive phase difficult as that wants to change the call state - but security negotiations may not yet be complete. (2) Fix a race between the changing of the call state at the end of the request reception phase of a service call, recvmsg() collecting the last data and sendmsg() trying to send the reply before the I/O thread has advanced the call state. Link: https://lore.kernel.org/20250203110307.7265-2-dhowells@redhat.com ==================== Link: https://patch.msgid.link/20250204230558.712536-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05rxrpc: Fix race in call state changing vs recvmsg()David Howells
There's a race in between the rxrpc I/O thread recording the end of the receive phase of a call and recvmsg() examining the state of the call to determine whether it has completed. The problem is that call->_state records the I/O thread's view of the call, not the application's view (which may lag), so that alone is not sufficient. To this end, the application also checks whether there is anything left in call->recvmsg_queue for it to pick up. The call must be in state RXRPC_CALL_COMPLETE and the recvmsg_queue empty for the call to be considered fully complete. In rxrpc_input_queue_data(), the latest skbuff is added to the queue and then, if it was marked as LAST_PACKET, the state is advanced... But this is two separate operations with no locking around them. As a consequence, the lack of locking means that sendmsg() can jump into the gap on a service call and attempt to send the reply - but then get rejected because the I/O thread hasn't advanced the state yet. Simply flipping the order in which things are done isn't an option as that impacts the client side, causing the checks in rxrpc_kernel_check_life() as to whether the call is still alive to race instead. Fix this by moving the update of call->_state inside the skb queue spinlocked section where the packet is queued on the I/O thread side. rxrpc's recvmsg() will then automatically sync against this because it has to take the call->recvmsg_queue spinlock in order to dequeue the last packet. rxrpc's sendmsg() doesn't need amending as the app shouldn't be calling it to send a reply until recvmsg() indicates it has returned all of the request. Fixes: 93368b6bd58a ("rxrpc: Move call state changes from recvmsg to I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250204230558.712536-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05rxrpc: Fix call state set to not include the SERVER_SECURING stateDavid Howells
The RXRPC_CALL_SERVER_SECURING state doesn't really belong with the other states in the call's state set as the other states govern the call's Rx/Tx phase transition and govern when packets can and can't be received or transmitted. The "Securing" state doesn't actually govern the reception of packets and would need to be split depending on whether or not we've received the last packet yet (to mirror RECV_REQUEST/ACK_REQUEST). The "Securing" state is more about whether or not we can start forwarding packets to the application as recvmsg will need to decode them and the decoding can't take place until the challenge/response exchange has completed. Fix this by removing the RXRPC_CALL_SERVER_SECURING state from the state set and, instead, using a flag, RXRPC_CALL_CONN_CHALLENGING, to track whether or not we can queue the call for reception by recvmsg() or notify the kernel app that data is ready. In the event that we've already received all the packets, the connection event handler will poke the app layer in the appropriate manner. Also there's a race whereby the app layer sees the last packet before rxrpc has managed to end the rx phase and change the state to one amenable to allowing a reply. Fix this by queuing the packet after calling rxrpc_end_rx_phase(). Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250204230558.712536-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net: sched: Fix truncation of offloaded action statisticsIdo Schimmel
In case of tc offload, when user space queries the kernel for tc action statistics, tc will query the offloaded statistics from device drivers. Among other statistics, drivers are expected to pass the number of packets that hit the action since the last query as a 64-bit number. Unfortunately, tc treats the number of packets as a 32-bit number, leading to truncation and incorrect statistics when the number of packets since the last query exceeds 0xffffffff: $ tc -s filter show dev swp2 ingress filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 skip_sw in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device swp1) stolen index 1 ref 1 bind 1 installed 58 sec used 0 sec Action statistics: Sent 1133877034176 bytes 536959475 pkt (dropped 0, overlimits 0 requeues 0) [...] According to the above, 2111-byte packets were redirected which is impossible as only 64-byte packets were transmitted and the MTU was 1500. Fix by treating packets as a 64-bit number: $ tc -s filter show dev swp2 ingress filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 skip_sw in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device swp1) stolen index 1 ref 1 bind 1 installed 61 sec used 0 sec Action statistics: Sent 1370624380864 bytes 21416005951 pkt (dropped 0, overlimits 0 requeues 0) [...] Which shows that only 64-byte packets were redirected (1370624380864 / 21416005951 = 64). Fixes: 380407023526 ("net/sched: Enable netdev drivers to update statistics of offloaded actions") Reported-by: Joe Botha <joe@atomic.ac> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250204123839.1151804-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05r8169: don't scan PHY addresses > 0Heiner Kallweit
The PHY address is a dummy, because r8169 PHY access registers don't support a PHY address. Therefore scan address 0 only. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/830637dd-4016-4a68-92b3-618fcac6589d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05tun: revert fix group permission checkWillem de Bruijn
This reverts commit 3ca459eaba1bf96a8c7878de84fa8872259a01e3. The blamed commit caused a regression when neither tun->owner nor tun->group is set. This is intended to be allowed, but now requires CAP_NET_ADMIN. Discussion in the referenced thread pointed out that the original issue that prompted this patch can be resolved in userspace. The relaxed access control may also make a device accessible when it previously wasn't, while existing users may depend on it to not be. This is a clean pure git revert, except for fixing the indentation on the gid_valid line that checkpatch correctly flagged. Fixes: 3ca459eaba1b ("tun: fix group permission check") Link: https://lore.kernel.org/netdev/CAFqZXNtkCBT4f+PwyVRmQGoT3p1eVa01fCG_aNtpt6dakXncUg@mail.gmail.com/ Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Stas Sergeev <stsp2@yandex.ru> Link: https://patch.msgid.link/20250204161015.739430-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net: flush_backlog() small changesEric Dumazet
Add READ_ONCE() around reads of skb->dev->reg_state, because this field can be changed from other threads/cpus. Instead of calling dev_kfree_skb_irq() and kfree_skb() while interrupts are masked and locks held, use a temporary list and use __skb_queue_purge_reason() Use SKB_DROP_REASON_DEV_READY drop reason to better describe why these skbs are dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20250204144825.316785-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05s390/net: Remove LCS driverAswin Karuvally
The original Open Systems Adapter (OSA) was introduced by IBM in the mid-90s. These were then superseded by OSA-Express in 1999 which used Queued Direct IO to greatly improve throughput. The newer cards retained the older, slower non-QDIO (OSE) modes for compatibility with older systems. In Linux, the lcs driver was responsible for cards operating in the older OSE mode and the qeth driver was introduced to allow the OSA-Express cards to operate in the newer QDIO (OSD) mode. For an S390 machine from 1998 or later, there is no reason to use the OSE mode and lcs driver as all OSA cards since 1999 provide the faster OSD mode. As a result, it's been years since we have heard of a customer configuration involving the lcs driver. This patch removes the lcs driver. The technology it supports has been obsolete for past 25+ years and is irrelevant for current use cases. Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250204103135.1619097-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05cxgb4: Avoid a -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Move the conflicting declaration to the end of the structure. Notice that `struct ethtool_dump` is a flexible structure --a structure that contains a flexible-array member. Fix the following warning: ./drivers/net/ethernet/chelsio/cxgb4/cxgb4.h:1215:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/Z6GBZ4brXYffLkt_@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'net_sched-two-security-bug-fixes-and-test-cases'Jakub Kicinski
Cong Wang says: ==================== net_sched: two security bug fixes and test cases This patchset contains two bug fixes reported in security mailing list, and test cases for both of them. ==================== Link: https://patch.msgid.link/20250204005841.223511-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests/tc-testing: Add a test case for qdisc_tree_reduce_backlog()Cong Wang
Integrate the test case provided by Mingi Cho into TDC. All test results: 1..4 ok 1 ca5e - Check class delete notification for ffff: ok 2 e4b7 - Check class delete notification for root ffff: ok 3 33a9 - Check ingress is not searchable on backlog update ok 4 a4b9 - Test class qlen notification Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05netem: Update sch->q.qlen before qdisc_tree_reduce_backlog()Cong Wang
qdisc_tree_reduce_backlog() notifies parent qdisc only if child qdisc becomes empty, therefore we need to reduce the backlog of the child qdisc before calling it. Otherwise it would miss the opportunity to call cops->qlen_notify(), in the case of DRR, it resulted in UAF since DRR uses ->qlen_notify() to maintain its active list. Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc") Cc: Martin Ottens <martin.ottens@fau.de> Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-4-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests/tc-testing: Add a test case for pfifo_head_drop qdisc when limit==0Quang Le
When limit == 0, pfifo_tail_enqueue() must drop new packet and increase dropped packets count of the qdisc. All test results: 1..16 ok 1 a519 - Add bfifo qdisc with system default parameters on egress ok 2 585c - Add pfifo qdisc with system default parameters on egress ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument ok 8 7787 - Add pfifo qdisc on egress with unsupported argument ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size ok 10 3df6 - Replace pfifo qdisc on egress with new queue size ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format ok 12 1298 - Add duplicate bfifo qdisc on egress ok 13 45a0 - Delete nonexistent bfifo qdisc ok 14 972b - Add prio qdisc on egress with invalid format for handles ok 15 4d39 - Delete bfifo qdisc twice ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0 Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05pfifo_tail_enqueue: Drop new packet when sch->limit == 0Quang Le
Expected behaviour: In case we reach scheduler's limit, pfifo_tail_enqueue() will drop a packet in scheduler's queue and decrease scheduler's qlen by one. Then, pfifo_tail_enqueue() enqueue new packet and increase scheduler's qlen by one. Finally, pfifo_tail_enqueue() return `NET_XMIT_CN` status code. Weird behaviour: In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a scheduler that has no packet, the 'drop a packet' step will do nothing. This means the scheduler's qlen still has value equal 0. Then, we continue to enqueue new packet and increase scheduler's qlen by one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen by one and return `NET_XMIT_CN` status code. The problem is: Let's say we have two qdiscs: Qdisc_A and Qdisc_B. - Qdisc_A's type must have '->graft()' function to create parent/child relationship. Let's say Qdisc_A's type is `hfsc`. Enqueue packet to this qdisc will trigger `hfsc_enqueue`. - Qdisc_B's type is pfifo_head_drop. Enqueue packet to this qdisc will trigger `pfifo_tail_enqueue`. - Qdisc_B is configured to have `sch->limit == 0`. - Qdisc_A is configured to route the enqueued's packet to Qdisc_B. Enqueue packet through Qdisc_A will lead to: - hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B) - Qdisc_B->q.qlen += 1 - pfifo_tail_enqueue() return `NET_XMIT_CN` - hfsc_enqueue() check for `NET_XMIT_SUCCESS` and see `NET_XMIT_CN` => hfsc_enqueue() don't increase qlen of Qdisc_A. The whole process lead to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1. Replace 'hfsc' with other type (for example: 'drr') still lead to the same problem. This violate the design where parent's qlen should equal to the sum of its childrens'qlen. Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable. Fixes: 57dbb2d83d10 ("sched: add head drop fifo queue") Reported-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Quang Le <quanglex97@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://patch.msgid.link/20250204005841.223511-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests: mptcp: connect: -f: no reconnectMatthieu Baerts (NGI0)
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE by closing the connection with unread bytes in the receive queue. The xdisconnect() helper was used to stop the connection, but it does more than that: it will shut it down, then wait before reconnecting to the same address. This causes the mptcp_join's "fastclose test" to fail all the time. This failure is due to a recent change, with commit 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect"), but that went unnoticed because the test is currently ignored. The recent modification only shown an existing issue: xdisconnect() doesn't need to be used here, only the shutdown() part is needed. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250204-net-mptcp-sft-conn-f-v1-1-6b470c72fffa@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05bridge: mdb: Allow replace of a host-joined groupPetr Machata
Attempts to replace an MDB group membership of the host itself are currently bounced: # ip link add name br up type bridge vlan_filtering 1 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2 # bridge mdb replace dev br port br grp 239.0.0.1 vid 2 Error: bridge: Group is already joined by host. A similar operation done on a member port would succeed. Ignore the check for replacement of host group memberships as well. The bit of code that this enables is br_multicast_host_join(), which, for already-joined groups only refreshes the MC group expiration timer, which is desirable; and a userspace notification, also desirable. Change a selftest that exercises this code path from expecting a rejection to expecting a pass. The rest of MDB selftests pass without modification. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.1738685648.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05selftests: net: suppress ReST file generation when building selftestsJakub Kicinski
Some selftests need libynl.a. When building it try to skip generating the ReST documentation, libynl.a does not depend on them. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203214850.1282291-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05Merge branch 'net-sysfs-remove-the-rtnl_trylock-restart_syscall-construction'Jakub Kicinski
Antoine Tenart says: ==================== net-sysfs: remove the rtnl_trylock/restart_syscall construction The series initially aimed at improving spins (and thus delays) while accessing net sysfs under rtnl lock contention[1]. The culprit was the trylock/restart_syscall constructions. There wasn't much interest at the time but it got traction recently for other reasons (lowering the rtnl lock pressure). Since v1[2]: - Do not export rtnl_lock_interruptible [Stephen]. - Add netdev_warn_once messages in rx_queue_add_kobject [Jakub]. Since the RFC[1]: - Limit the breaking of the sysfs protection to sysfs_rtnl_lock() only as this is not needed in the whole rtnl locking section thanks to the additional check on dev_isalive(). This simplifies error handling as well as the unlocking path. - Used an interruptible version of rtnl_lock, as done by Jakub in his experiments. - Removed a WARN_ONCE_ONCE [Greg]. - Removed explicit inline markers [Stephen]. Most of the reasoning is explained in comments added in patch 1. This was tested by stress-testing net sysfs attributes (read/write ops) while adding/removing queues and adding/removing veths, all in parallel. I also used an OCP single node cluster, spawning lots of pods. [1] https://lore.kernel.org/all/20231018154804.420823-1-atenart@kernel.org/T/ [2] https://lore.kernel.org/all/20250117102612.132644-1-atenart@kernel.org/T/ ==================== Link: https://patch.msgid.link/20250204170314.146022-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: remove rtnl_trylock from queue attributesAntoine Tenart
Similar to the commit removing remove rtnl_trylock from device attributes we here apply the same technique to networking queues. Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-5-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: prevent uncleared queues from being re-addedAntoine Tenart
With the (upcoming) removal of the rtnl_trylock/restart_syscall logic and because of how Tx/Rx queues are implemented (and their requirements), it might happen that a queue is re-added before having the chance to be cleared. In such rare case, do not complete the queue addition operation. Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-4-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: move queue attribute groups outside the default groupsAntoine Tenart
Rx/tx queues embed their own kobject for registering their per-queue sysfs files. The issue is they're using the kobject default groups for this and entirely rely on the kobject refcounting for releasing their sysfs paths. In order to remove rtnl_trylock calls we need sysfs files not to rely on their associated kobject refcounting for their release. Thus we here move queues sysfs files from the kobject default groups to their own groups which can be removed separately. Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-3-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: remove rtnl_trylock from device attributesAntoine Tenart
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening all paths taking the rtnl lock after the sysfs one (actually kn->active refcount) use rtnl_trylock and return early (using restart_syscall)[3], which can make syscalls to spin for a long time when there is contention on the rtnl lock[4]. There are not many possibilities to improve the above: - Rework the entire net/ locking logic. - Invert two locks in one of the paths — not possible. But here it's actually possible to drop one of the locks safely: the kernfs_node refcount. More details in the code itself, which comes with lots of comments. Note that we check the device is alive in the added sysfs_rtnl_lock helper to disallow sysfs operations to run after device dismantle has started. This also help keeping the same behavior as before. Because of this calls to dev_isalive in sysfs ops were removed. [1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/ [2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/ [3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/ [4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/ Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05ice: init flow director before RDMAMichal Swiatkowski
Flow director needs only one MSI-X. Load it before RDMA to save MSI-X for it. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: simplify VF MSI-X managingMichal Swiatkowski
After implementing pf->msix.max field, base vector for other use cases (like VFs) can be fixed. This simplify code when changing MSI-X amount on particular VF, because there is no need to move a base vector. A fixed base vector allows to reserve vectors from the beginning instead of from the end, which is also simpler in code. Store total and rest value in the same struct as max and min for PF. Move tracking vectors from ice_sriov.c to ice_irq.c as it can be also use for other none PF use cases (SIOV). Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: enable_rdma devlink paramMichal Swiatkowski
Implement enable_rdma devlink parameter to allow user to turn RDMA feature on and off. It is useful when there is no enough interrupts and user doesn't need RDMA feature. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: treat dyn_allowed only as suggestionMichal Swiatkowski
It can be needed to have some MSI-X allocated as static and rest as dynamic. For example on PF VSI. We want to always have minimum one MSI-X on it, because of that it is allocated as a static one, rest can be dynamic if it is supported. Change the ice_get_irq_res() to allow using static entries if they are free even if caller wants dynamic one. Adjust limit values to the new approach. Min and max in limit means the values that are valid, so decrease max and num_static by one. Set vsi::irq_dyn_alloc if dynamic allocation is supported. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice, irdma: move interrupts code to irdmaMichal Swiatkowski
Move responsibility of MSI-X requesting for RDMA feature from ice driver to irdma driver. It is done to allow simple fallback when there is not enough MSI-X available. Change amount of MSI-X used for control from 4 to 1, as it isn't needed to have more than one MSI-X for this purpose. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: get rid of num_lan_msix fieldMichal Swiatkowski
Remove the field to allow having more queues than MSI-X on VSI. As default the number will be the same, but if there won't be more MSI-X available VSI can run with at least one MSI-X. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: remove splitting MSI-X between featuresMichal Swiatkowski
With dynamic approach to alloc MSI-X there is no sense to statically split MSI-X between PF features. Splitting was also calculating needed MSI-X. Move this part to separate function and use as max value. Remove ICE_ESWITCH_MSIX, as there is no need for additional MSI-X for switchdev. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: devlink PF MSI-X max and min parameterMichal Swiatkowski
Use generic devlink PF MSI-X parameter to allow user to change MSI-X range. Add notes about this parameters into ice devlink documentation. Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05Merge tag 'for-6.14-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - add lockdep annotation for relocation root to fix a splat warning while merging roots - fix assertion failure when splitting ordered extent after transaction abort - don't print 'qgroup inconsistent' message when rescan process updates qgroup data sooner than the subvolume deletion process - fix use-after-free (accessing the error number) when attempting to join an aborted transaction - avoid starting new transaction if not necessary when cleaning qgroup during subvolume drop * tag 'for-6.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid starting new transaction when cleaning qgroup during subvolume drop btrfs: fix use-after-free when attempting to join an aborted transaction btrfs: do not output error message if a qgroup has been already cleaned up btrfs: fix assertion failure when splitting ordered extent after transaction abort btrfs: fix lockdep splat while merging a relocation root
2025-02-05net: phy: realtek: use string choices helpersHeiner Kallweit
Use string choices helpers to simplify the code. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501190707.qQS8PGHW-lkp@intel.com/ Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-04r8169: make Kconfig option for LED support user-visibleHeiner Kallweit
Make config option R8169_LEDS user-visible, so that users can remove support if not needed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/d29f0cdb-32bf-435f-b59d-dc96bca1e3ab@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: phy: realtek: make HWMON support a user-visible Kconfig symbolHeiner Kallweit
Make config symbol REALTEK_PHY_HWMON user-visible, so that users can remove support if not needed. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/3466ee92-166a-4b0f-9ae7-42b9e046f333@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04netconsole: selftest: Add test for fragmented messagesBreno Leitao
Add a new selftest to verify netconsole's handling of messages that exceed the packet size limit and require fragmentation. The test sends messages with varying sizes and userdata, validating that: 1. Large messages are correctly fragmented and reassembled 2. Userdata fields are properly preserved across fragments 3. Messages work correctly with and without kernel release version appending The test creates a networking environment using netdevsim, sends messages through /dev/kmsg, and verifies the received fragments maintain message integrity. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: atlantic: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unused flexible-array member `buf` and, with this, fix the following warnings: drivers/net/ethernet/aquantia/atlantic/aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/aquantia/atlantic/hw_atl/../aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Suggested-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Link: https://patch.msgid.link/Z6F3KZVfnAZ2FoJm@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net: warn if NAPI instance wasn't shut downJakub Kicinski
Drivers should always disable a NAPI instance before removing it. If they don't the instance may be queued for polling. Since commit 86e25f40aa1e ("net: napi: Add napi_config") we also remove the NAPI from the busy polling hash table in napi_disable(), so not disabling would leave a stale entry there. Use of busy polling is relatively uncommon so bugs may be lurking in the drivers. Add an explicit warning. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250203215816.1294081-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04ice: count combined queues using Rx/Tx countMichal Swiatkowski
Previous implementation assumes that there is 1:1 matching between vectors and queues. It isn't always true. Get minimum value from Rx/Tx queues to determine combined queues number. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-04cavium/liquidio: Remove unused lio_get_device_idDr. David Alan Gilbert
lio_get_device_id() has been unused since 2018's commit 64fecd3ec512 ("liquidio: remove obsolete functions and data structures") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250203183343.193691-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04mlxsw: spectrum_router: Remove unused functionsDr. David Alan Gilbert
mlxsw_sp_ipip_lb_ul_vr_id() has been unused since 2020's commit acde33bf7319 ("mlxsw: spectrum_router: Reduce mlxsw_sp_ipip_fib_entry_op_gre4()") mlxsw_sp_rif_exists() has been unused since 2023's commit 49c3a615d382 ("mlxsw: spectrum_router: Replay MACVLANs when RIF is made") mlxsw_sp_rif_vid() has been unused since 2023's commit a5b52692e693 ("mlxsw: spectrum_switchdev: Manage RIFs on PVID change") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250203190141.204951-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04net/mlx5: Remove unused mlx5dr_domain_syncDr. David Alan Gilbert
mlx5dr_domain_sync() was added in 2019 by commit 70605ea545e8 ("net/mlx5: DR, Expose APIs for direct rule managing") but hasn't been used. Remove it. mlx5dr_domain_sync() was the only user of mlx5dr_send_ring_force_drain(). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250203185958.204794-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04mlx4: Remove unused functionsDr. David Alan Gilbert
The last use of mlx4_find_cached_mac() was removed in 2014 by commit 2f5bb473681b ("mlx4: Add ref counting to port MAC table for RoCE") mlx4_zone_free_entries() was added in 2014 by commit 7a89399ffad7 ("net/mlx4: Add mlx4_bitmap zone allocator") but hasn't been used. (The _unique version is used) Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20250203185229.204279-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>