summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-05net_sched: sch_fq: add 3 bands and WRR schedulingEric Dumazet
Before Google adopted FQ for its production servers, we had to ensure AF4 packets would get a higher share than BE1 ones. As discussed this week in Netconf 2023 in Paris, it is time to upstream this for public use. After this patch FQ can replace pfifo_fast, with the following differences : - FQ uses WRR instead of strict prio, to avoid starvation of low priority packets. - We make sure each band/prio tracks its own usage against sch->limit. This was done to make sure flood of low priority packets would not prevent AF4 packets to be queued. Contributed by Willem. - priomap can be changed, if needed (default value are the ones coming from pfifo_fast). In this patch, we set default band weights so that : - high prio (band=0) packets get 90% of the bandwidth if they compete with low prio (band=2) packets. - high prio packets get 75% of the bandwidth if they compete with medium prio (band=1) packets. Following patch in this series adds the possibility to tune the per-band weights. As we added many fields in 'struct fq_sched_data', we had to make sure to have the first cache line read-mostly, and avoid wasting precious cache lines. More optimizations are possible but will be sent separately. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05net_sched: export pfifo_fast prio2band[]Eric Dumazet
pfifo_fast prio2band[] is renamed to sch_default_prio2band[] and exported because we want to share it in FQ. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05net_sched: sch_fq: remove q->ktime_cacheEric Dumazet
Now that both enqueue() and dequeue() need to use ktime_get_ns(), there is no point wasting 8 bytes in struct fq_sched_data. This makes room for future fields. ;) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05Merge branch 'net-mana-fix-some-tx-processing-bugs'Paolo Abeni
Haiyang Zhang says: ==================== net: mana: Fix some TX processing bugs Fix TX processing bugs on error handling, tso_bytes calculation, and sge0 size. ==================== Link: https://lore.kernel.org/r/1696020147-14989-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05net: mana: Fix oversized sge0 for GSO packetsHaiyang Zhang
Handle the case when GSO SKB linear length is too large. MANA NIC requires GSO packets to put only the header part to SGE0, otherwise the TX queue may stop at the HW level. So, use 2 SGEs for the skb linear part which contains more than the packet header. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05net: mana: Fix the tso_bytes calculationHaiyang Zhang
sizeof(struct hop_jumbo_hdr) is not part of tso_bytes, so remove the subtraction from header size. Cc: stable@vger.kernel.org Fixes: bd7fc6e1957c ("net: mana: Add new MANA VF performance counters for easier troubleshooting") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05net: mana: Fix TX CQE error handlingHaiyang Zhang
For an unknown TX CQE error type (probably from a newer hardware), still free the SKB, update the queue tail, etc., otherwise the accounting will be wrong. Also, TX errors can be triggered by injecting corrupted packets, so replace the WARN_ONCE to ratelimited error logging. Cc: stable@vger.kernel.org Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-05can: peak_pci: replace deprecated strncpy with strscpyJustin Stitt
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. NUL-padding is not required since card is already zero-initialized: | card = kzalloc(sizeof(*card), GFP_KERNEL); A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20231005-strncpy-drivers-net-can-sja1000-peak_pci-c-v1-1-c36e1702cd56@google.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-10-05wifi: rtlwifi: remove unreachable code in rtl92d_dm_check_edca_turbo()Dmitry Antipov
Since '!(0x5ea42b & 0xffff0000)' is always false, remove unreachable block in 'rtl92d_dm_check_edca_turbo()' and convert EDCA limits to constant variables. Compile tested only. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003043318.11370-1-dmantipov@yandex.ru
2023-10-05wifi: rtw89: debug: txpwr table supports Wi-Fi 7 chipsZong-Zhe Yang
We add TX power table format for Wi-Fi 7 chips. Since Wi-Fi 7 tables are larger, in order to reuse some chunks, we extend code to process nested entries. Now, dbgfs txpwr_table can work with Wi-Fi 7 chips. An output example of dbgfs txpwr_table on Wi-Fi 7 chips is shown below. ... [TX power byrate] << BW20 >> CCK - 1M 2M 5.5M 11M | 20, 20, 20, 20, dBm LEGACY - 6M 9M 12M 18M | 18, 18, 18, 18, dBm LEGACY - 24M 36M 48M 54M | 18, 18, 17, 16, dBm EHT - MCS14 MCS15 | 0, 0, dBm DLRU_EHT - MCS14 MCS15 | 0, 18, dBm MCS_1SS - MCS0 MCS1 MCS2 MCS3 | 18, 18, 18, 18, dBm MCS_1SS - MCS4 MCS5 MCS6 MCS7 | 18, 17, 16, 15, dBm MCS_1SS - MCS8 MCS9 MCS10 MCS11 | 14, 13, 12, 11, dBm MCS_1SS - MCS12 MCS13 | 10, 9, dBm HEDCM_1SS - MCS0 MCS1 MCS3 MCS4 | 18, 18, 18, 18, dBm DLRU_MCS_1SS - MCS0 MCS1 MCS2 MCS3 | 18, 18, 18, 18, dBm DLRU_MCS_1SS - MCS4 MCS5 MCS6 MCS7 | 18, 17, 16, 15, dBm DLRU_MCS_1SS - MCS8 MCS9 MCS10 MCS11 | 14, 13, 12, 11, dBm DLRU_MCS_1SS - MCS12 MCS13 | 10, 9, dBm DLRU_HEDCM_1SS - MCS0 MCS1 MCS3 MCS4 | 18, 18, 18, 18, dBm MCS_2SS - MCS0 MCS1 MCS2 MCS3 | 18, 18, 18, 18, dBm ... [TX power limit] << 1TX >> CCK_20M - NON_BF BF | 0, 0, dBm CCK_40M - NON_BF BF | 0, 0, dBm OFDM - NON_BF BF | 18, 0, dBm MCS_20M_0 - NON_BF BF | 18, 0, dBm MCS_20M_1 - NON_BF BF | 0, 0, dBm ... Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-8-pkshih@realtek.com
2023-10-05wifi: rtw89: debug: show txpwr table according to chip genZong-Zhe Yang
Since current TX power stuffs are for ax chips, add a suffix `_ax` to them. Then, when requested to show txpwr table, select table according to chip generation first. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-7-pkshih@realtek.com
2023-10-05wifi: rtw89: phy: set TX power RU limit according to chip genZong-Zhe Yang
Wi-Fi 6 chips and Wi-Fi 7 chips have different register design for TX power RU limit. We rename original setting stuffs with a suffix `_ax`, concentrate related enum declaration in phy.h, and implement setting flow for Wi-Fi 7 chips. Then, we set TX power RU limit according to chip generation. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-6-pkshih@realtek.com
2023-10-05wifi: rtw89: phy: set TX power limit according to chip genZong-Zhe Yang
Wi-Fi 6 chips and Wi-Fi 7 chips have different register design for TX power limit. We rename original setting stuffs with a suffix `_ax`, concentrate related enum declaration in phy.h, and implement setting flow for Wi-Fi 7 chips. Then, we set TX power limit according to chip generation. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-5-pkshih@realtek.com
2023-10-05wifi: rtw89: phy: set TX power offset according to chip genZong-Zhe Yang
We have a register to control TX power of each rate section to increase or decrease an offset. But, Wi-Fi 6 chips and Wi-Fi 7 chips have different address and format for this control register. We rename original setting stuffs with a suffix `_ax` and implement setting flow for Wi-Fi 7 chips. Then, we set TX power offset according to chip generation. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-4-pkshih@realtek.com
2023-10-05wifi: rtw89: phy: set TX power by rate according to chip genZong-Zhe Yang
Wi-Fi 6 chips and Wi-Fi 7 chips have different register design for TX power by rate. We rename original setting stuffs with a suffix `_ax` and implement setting flow for Wi-Fi 7 chips. Then, we set TX power by rate according to chip generation. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-3-pkshih@realtek.com
2023-10-05wifi: rtw89: mac: get TX power control register according to chip genZong-Zhe Yang
There are two difference between Wi-Fi 6 and Wi-Fi 7 chips. 1. Address range of TX power control register 2. Checking code to get a TX power control register So, separate the implementation of them, access according to chip generation, and rename original things with a suffix `_ax`. Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231003015446.14658-2-pkshih@realtek.com
2023-10-05wifi: ath12k: fix debug messagesKalle Valo
In ath12k the debug messages were broken, no matter setting what value to the debug_mask module parameter would not get the debug messages printed. The issue is that __ath12k_dbg() uses dev_dbg() to print the debug messages which requires either enabling CONFIG_DYNAMIC_DEBUG or DEBUG symbol in the driver. ath12k is supposed to use debug_mask module to control whether debug messages are printed or not. Using both CONFIG_DYNAMIC_DEBUG and debug_mask parameter does not make any sense so switch to using dev_printk(), just like ath11k does. Now it's enough just to debug_mask module parameter to get the debug messages. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20231003150132.187875-1-kvalo@kernel.org
2023-10-05wifi: ath11k: fix Tx power value during active CACAditya Kumar Singh
Tx power is fetched from firmware's pdev stats. However, during active CAC, firmware does not fill the current Tx power and sends the max initialised value filled during firmware init. If host sends this power to user space, this is wrong since in certain situations, the Tx power could be greater than the max allowed by the regulatory. Hence, host should not be fetching the Tx power during an active CAC. Fix this issue by returning -EAGAIN error so that user space knows that there's no valid value available. Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 Fixes: 9a2aa68afe3d ("wifi: ath11k: add get_txpower mac ops") Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230912051857.2284-4-quic_adisi@quicinc.com
2023-10-05wifi: ath11k: fix CAC running state during virtual interface startAditya Kumar Singh
Currently channel definition's primary channel's DFS CAC time as well as primary channel's state i.e usable are used to set the CAC_RUNNING flag for the ath11k radio structure. However, this is wrong since certain channel definition are possbile where primary channel may not be a DFS channel but, secondary channel is a DFS channel. For example - channel 36 with 160 MHz bandwidth. In such cases, the flag will not be set which is wrong. Fix this issue by using cfg80211_chandef_dfs_usable() function from cfg80211 which return trues if at least one channel is in usable state. While at it, modify the CAC running debug log message to print the CAC time as well in milli-seconds. Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1 Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230912051857.2284-3-quic_adisi@quicinc.com
2023-10-04Merge tag 'rtla-v6.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux Pull rtla fixes from Daniel Bristot de Oliveira: "rtla (Real-Time Linux Analysis) tool fixes. Timerlat auto-analysis: - Timerlat is reporting thread interference time without thread noise events occurrence. It was caused because the thread interference variable was not reset after the analysis of a timerlat activation that did not hit the threshold. - The IRQ handler delay is estimated from the delta of the IRQ latency reported by timerlat, and the timestamp from IRQ handler start event. If the delta is near-zero, the drift from the external clock and the trace event and/or the overhead can cause the value to be negative. If the value is negative, print a zero-delay. - IRQ handlers happening after the timerlat thread event but before the stop tracing were being reported as IRQ that happened before the *current* IRQ occurrence. Ignore Previous IRQ noise in this condition because they are valid only for the *next* timerlat activation. Timerlat user-space: - Timerlat is stopping all user-space thread if a CPU becomes offline. Do not stop the entire tool if a CPU is/become offline, but only the thread of the unavailable CPU. Stop the tool only, if all threads leave because the CPUs become/are offline. man-pages: - Fix command line example in timerlat hist man page" * tag 'rtla-v6.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bristot/linux: rtla: fix a example in rtla-timerlat-hist.rst rtla/timerlat: Do not stop user-space if a cpu is offline rtla/timerlat_aa: Fix previous IRQ delay for IRQs that happens after thread sample rtla/timerlat_aa: Fix negative IRQ delay rtla/timerlat_aa: Zero thread sum after every sample analysis
2023-10-04Merge branch 'ynl-makefile-cleanup'Jakub Kicinski
Jakub Kicinski says: ==================== ynl Makefile cleanup While catching up on recent changes I noticed unexpected changes to Makefiles in YNL. Indeed they were not working as intended but the fixes put in place were not what I had in mind :) ==================== Link: https://lore.kernel.org/r/20231003153416.2479808-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04tools: ynl: use uAPI include magic for samplesJakub Kicinski
Makefile.deps provides direct includes in CFLAGS_$(obj). We just need to rewrite the rules to make use of the extra flags, no need to hard-include all of tools/include/uapi. Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231003153416.2479808-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04tools: ynl: don't regen on every makeJakub Kicinski
As far as I can tell the normal Makefile dependency tracking works, generated files get re-generated if the YAML was updated. Let make do its job, don't force the re-generation. make hardclean can be used to force regeneration. Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231003153416.2479808-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04ynl: netdev: drop unnecessary enum-as-flagsJakub Kicinski
enum-as-flags can be used when enum declares bit positions but we want to carry bitmask in an attribute. If the definition is already provided as flags there's no need to indicate the flag-iness of the attribute. Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231003153416.2479808-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04netlink: annotate data-races around sk->sk_errEric Dumazet
syzbot caught another data-race in netlink when setting sk->sk_err. Annotate all of them for good measure. BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg write to 0xffff8881613bb220 of 4 bytes by task 28147 on cpu 0: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff8881613bb220 of 4 bytes by task 28146 on cpu 1: netlink_recvmsg+0x448/0x780 net/netlink/af_netlink.c:1994 sock_recvmsg_nosec net/socket.c:1027 [inline] sock_recvmsg net/socket.c:1049 [inline] __sys_recvfrom+0x1f4/0x2e0 net/socket.c:2229 __do_sys_recvfrom net/socket.c:2247 [inline] __se_sys_recvfrom net/socket.c:2243 [inline] __x64_sys_recvfrom+0x78/0x90 net/socket.c:2243 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00000000 -> 0x00000016 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 28146 Comm: syz-executor.0 Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231003183455.3410550-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04net: skb_queue_purge_reason() optimizationsEric Dumazet
1) Exit early if the list is empty. 2) splice the list into a local list, so that we block hard irqs only once. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231003181920.3280453-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04sctp: update hb timer immediately after users change hb_intervalXin Long
Currently, when hb_interval is changed by users, it won't take effect until the next expiry of hb timer. As the default value is 30s, users have to wait up to 30s to wait its hb_interval update to work. This becomes pretty bad in containers where a much smaller value is usually set on hb_interval. This patch improves it by resetting the hb timer immediately once the value of hb_interval is updated by users. Note that we don't address the already existing 'problem' when sending a heartbeat 'on demand' if one hb has just been sent(from the timer) mentioned in: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg590224.html Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/75465785f8ee5df2fb3acdca9b8fafdc18984098.1696172660.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04sctp: update transport state when processing a dupcook packetXin Long
During the 4-way handshake, the transport's state is set to ACTIVE in sctp_process_init() when processing INIT_ACK chunk on client or COOKIE_ECHO chunk on server. In the collision scenario below: 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] when processing COOKIE_ECHO on 192.168.1.2, as it's in COOKIE_WAIT state, sctp_sf_do_dupcook_b() is called by sctp_sf_do_5_2_4_dupcook() where it creates a new association and sets its transport to ACTIVE then updates to the old association in sctp_assoc_update(). However, in sctp_assoc_update(), it will skip the transport update if it finds a transport with the same ipaddr already existing in the old asoc, and this causes the old asoc's transport state not to move to ACTIVE after the handshake. This means if DATA retransmission happens at this moment, it won't be able to enter PF state because of the check 'transport->state == SCTP_ACTIVE' in sctp_do_8_2_transport_strike(). This patch fixes it by updating the transport in sctp_assoc_update() with sctp_assoc_add_peer() where it updates the transport state if there is already a transport with the same ipaddr exists in the old asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/fd17356abe49713ded425250cc1ae51e9f5846c6.1696172325.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-10-03 (i40e, iavf) This series contains updates to i40e and iavf drivers. Yajun Deng aligns reporting of buffer exhaustion statistics to follow documentation for i40e. Jake removes undesired 'inline' from functions in iavf. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: remove "inline" functions from iavf_txrx.c i40e: Add rx_missed_errors for buffer exhaustion ==================== Link: https://lore.kernel.org/r/20231003223610.2004976-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge branch ↵Jakub Kicinski
'fix-a-couple-recent-instances-of-wincompatible-function-pointer-types-strict-from-mode_get-implementations' Nathan Chancellor says: ==================== Fix a couple recent instances of -Wincompatible-function-pointer-types-strict from ->mode_get() implementations This series fixes a couple of instances of -Wincompatible-function-pointer-types-strict that were introduced by a recent series that added a new type of ops, struct dpll_device_ops, along with implementations of the callback ->mode_get() that had a mismatched mode type. This warning is not currently enabled for any build but I am planning on submitting a patch to add it to W=1 to prevent new instances of the warning from popping up while we try and fix the existing instances in other drivers. This series is based on current net-next but if they need to go into individual maintainer trees, please feel free to take the patches individually. ==================== Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-0-a356a16413cf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04mlx5: Fix type of mode parameter in mlx5_dpll_device_mode_get()Nathan Chancellor
When building with -Wincompatible-function-pointer-types-strict, a warning designed to catch potential kCFI failures at build time rather than run time due to incorrect function pointer types, there is a warning due to a mismatch between the type of the mode parameter in mlx5_dpll_device_mode_get() vs. what the function pointer prototype for ->mode_get() in 'struct dpll_device_ops' expects. drivers/net/ethernet/mellanox/mlx5/core/dpll.c:141:14: error: incompatible function pointer types initializing 'int (*)(const struct dpll_device *, void *, enum dpll_mode *, struct netlink_ext_ack *)' with an expression of type 'int (const struct dpll_device *, void *, u32 *, struct netlink_ext_ack *)' (aka 'int (const struct dpll_device *, void *, unsigned int *, struct netlink_ext_ack *)') [-Werror,-Wincompatible-function-pointer-types-strict] 141 | .mode_get = mlx5_dpll_device_mode_get, | ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Change the type of the mode parameter in mlx5_dpll_device_mode_get() to clear up the warning and avoid kCFI failures at run time. Fixes: 496fd0a26bbf ("mlx5: Implement SyncE support using DPLL infrastructure") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-2-a356a16413cf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04ptp: Fix type of mode parameter in ptp_ocp_dpll_mode_get()Nathan Chancellor
When building with -Wincompatible-function-pointer-types-strict, a warning designed to catch potential kCFI failures at build time rather than run time due to incorrect function pointer types, there is a warning due to a mismatch between the type of the mode parameter in ptp_ocp_dpll_mode_get() vs. what the function pointer prototype for ->mode_get() in 'struct dpll_device_ops' expects. drivers/ptp/ptp_ocp.c:4353:14: error: incompatible function pointer types initializing 'int (*)(const struct dpll_device *, void *, enum dpll_mode *, struct netlink_ext_ack *)' with an expression of type 'int (const struct dpll_device *, void *, u32 *, struct netlink_ext_ack *)' (aka 'int (const struct dpll_device *, void *, unsigned int *, struct netlink_ext_ack *)') [-Werror,-Wincompatible-function-pointer-types-strict] 4353 | .mode_get = ptp_ocp_dpll_mode_get, | ^~~~~~~~~~~~~~~~~~~~~ 1 error generated. Change the type of the mode parameter in ptp_ocp_dpll_mode_get() to clear up the warning and avoid kCFI failures at run time. Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231002-net-wifpts-dpll_mode_get-v1-1-a356a16413cf@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge branch 'r8152-modify-rx_bottom'Jakub Kicinski
Hayes Wang says: ==================== r8152: modify rx_bottom v3: For patch #1, this patch is replaced. The new patch only break the loop, and keep that the driver would queue the rx packets. For patch #2, modify the code depends on patch #1. For work_down < budget, napi_get_frags() and napi_gro_frags() would be used. For the others, nothing is changed. v2: For patch #1, add comment, update commit message, and add Fixes tag. v1: These patches are used to improve rx_bottom(). ==================== Link: https://lore.kernel.org/r/20230926111714.9448-432-nic_swsd@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04r8152: use napi_gro_fragsHayes Wang
Use napi_gro_frags() for the skb of fragments when the work_done is less than budget. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20230926111714.9448-434-nic_swsd@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04r8152: break the loop when the budget is exhaustedHayes Wang
A bulk transfer of the USB may contain many packets. And, the total number of the packets in the bulk transfer may be more than budget. Originally, only budget packets would be handled by napi_gro_receive(), and the other packets would be queued in the driver for next schedule. This patch would break the loop about getting next bulk transfer, when the budget is exhausted. That is, only the current bulk transfer would be handled, and the other bulk transfers would be queued for next schedule. Besides, the packets which are more than the budget in the current bulk trasnfer would be still queued in the driver, as the original method. In addition, a bulk transfer wouldn't contain more than 400 packets, so the check of queue length is unnecessary. Therefore, I replace it with WARN_ON_ONCE(). Fixes: cf74eb5a5bc8 ("eth: r8152: try to use a normal budget") Signed-off-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20230926111714.9448-433-nic_swsd@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge branch 'chelsio-annotate-structs-with-__counted_by'Jakub Kicinski
Kees Cook says: ==================== chelsio: Annotate structs with __counted_by This annotates several chelsio structures with the coming __counted_by attribute for bounds checking of flexible arrays at run-time. For more details, see commit dd06e72e68bc ("Compiler Attributes: Add __counted_by macro"). ==================== Link: https://lore.kernel.org/r/20230929181042.work.990-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04cxgb4: Annotate struct smt_data with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct smt_data. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230929181149.3006432-5-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04cxgb4: Annotate struct sched_table with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct sched_table. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230929181149.3006432-4-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04cxgb4: Annotate struct cxgb4_tc_u32_table with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cxgb4_tc_u32_table. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230929181149.3006432-3-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04cxgb4: Annotate struct clip_tbl with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct clip_tbl. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230929181149.3006432-2-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04chelsio/l2t: Annotate struct l2t_data with __counted_byKees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct l2t_data. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230929181149.3006432-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04tcp: fix delayed ACKs for MSS boundary conditionNeal Cardwell
This commit fixes poor delayed ACK behavior that can cause poor TCP latency in a particular boundary condition: when an application makes a TCP socket write that is an exact multiple of the MSS size. The problem is that there is painful boundary discontinuity in the current delayed ACK behavior. With the current delayed ACK behavior, we have: (1) If an app reads data when > 1*MSS is unacknowledged, then tcp_cleanup_rbuf() ACKs immediately because of: tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss || (2) If an app reads all received data, and the packets were < 1*MSS, and either (a) the app is not ping-pong or (b) we received two packets < 1*MSS, then tcp_cleanup_rbuf() ACKs immediately beecause of: ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED2) || ((icsk->icsk_ack.pending & ICSK_ACK_PUSHED) && !inet_csk_in_pingpong_mode(sk))) && (3) *However*: if an app reads exactly 1*MSS of data, tcp_cleanup_rbuf() does not send an immediate ACK. This is true even if the app is not ping-pong and the 1*MSS of data had the PSH bit set, suggesting the sending application completed an application write. Thus if the app is not ping-pong, we have this painful case where >1*MSS gets an immediate ACK, and <1*MSS gets an immediate ACK, but a write whose last skb is an exact multiple of 1*MSS can get a 40ms delayed ACK. This means that any app that transfers data in one direction and takes care to align write size or packet size with MSS can suffer this problem. With receive zero copy making 4KB MSS values more common, it is becoming more common to have application writes naturally align with MSS, and more applications are likely to encounter this delayed ACK problem. The fix in this commit is to refine the delayed ACK heuristics with a simple check: immediately ACK a received 1*MSS skb with PSH bit set if the app reads all data. Why? If an skb has a len of exactly 1*MSS and has the PSH bit set then it is likely the end of an application write. So more data may not be arriving soon, and yet the data sender may be waiting for an ACK if cwnd-bound or using TX zero copy. Thus we set ICSK_ACK_PUSHED in this case so that tcp_cleanup_rbuf() will send an ACK immediately if the app reads all of the data and is not ping-pong. Note that this logic is also executed for the case where len > MSS, but in that case this logic does not matter (and does not hurt) because tcp_cleanup_rbuf() will always ACK immediately if the app reads data and there is more than an MSS of unACKed data. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Xin Guo <guoxin0309@gmail.com> Link: https://lore.kernel.org/r/20231001151239.1866845-2-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04tcp: fix quick-ack counting to count actual ACKs of new dataNeal Cardwell
This commit fixes quick-ack counting so that it only considers that a quick-ack has been provided if we are sending an ACK that newly acknowledges data. The code was erroneously using the number of data segments in outgoing skbs when deciding how many quick-ack credits to remove. This logic does not make sense, and could cause poor performance in request-response workloads, like RPC traffic, where requests or responses can be multi-segment skbs. When a TCP connection decides to send N quick-acks, that is to accelerate the cwnd growth of the congestion control module controlling the remote endpoint of the TCP connection. That quick-ack decision is purely about the incoming data and outgoing ACKs. It has nothing to do with the outgoing data or the size of outgoing data. And in particular, an ACK only serves the intended purpose of allowing the remote congestion control to grow the congestion window quickly if the ACK is ACKing or SACKing new data. The fix is simple: only count packets as serving the goal of the quickack mechanism if they are ACKing/SACKing new data. We can tell whether this is the case by checking inet_csk_ack_scheduled(), since we schedule an ACK exactly when we are ACKing/SACKing new data. Fixes: fc6415bcb0f5 ("[TCP]: Fix quick-ack decrementing with TSO.") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge tag 'nf-23-10-04' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter patches for net First patch resolves a regression with vlan header matching, this was broken since 6.5 release. From myself. Second patch fixes an ancient problem with sctp connection tracking in case INIT_ACK packets are delayed. This comes with a selftest, both patches from Xin Long. Patch 4 extends the existing nftables audit selftest, from Phil Sutter. Patch 5, also from Phil, avoids a situation where nftables would emit an audit record twice. This was broken since 5.13 days. Patch 6, from myself, avoids spurious insertion failure if we encounter an overlapping but expired range during element insertion with the 'nft_set_rbtree' backend. This problem exists since 6.2. * tag 'nf-23-10-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_set_rbtree: fix spurious insertion failure netfilter: nf_tables: Deduplicate nft_register_obj audit logs selftests: netfilter: Extend nft_audit.sh selftests: netfilter: test for sctp collision processing in nf_conntrack netfilter: handle the connecting collision properly in nf_conntrack_proto_sctp netfilter: nft_payload: rebuild vlan header on h_proto access ==================== Link: https://lore.kernel.org/r/20231004141405.28749-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04Merge tag 'nf-next-23-09-28' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First patch, from myself, is a bug fix. The issue (connect timeout) is ancient, so I think its safe to give this more soak time given the esoteric conditions needed to trigger this. Also updates the existing selftest to cover this. Add netlink extacks when an update references a non-existent table/chain/set. This allows userspace to provide much better errors to the user, from Pablo Neira Ayuso. Last patch adds more policy checks to nf_tables as a better alternative to the existing runtime checks, from Phil Sutter. * tag 'nf-next-23-09-28' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: Utilize NLA_POLICY_NESTED_ARRAY netfilter: nf_tables: missing extended netlink error in lookup functions selftests: netfilter: test nat source port clash resolution interaction with tcp early demux netfilter: nf_nat: undo erroneous tcp edemux lookup after port clash ==================== Link: https://lore.kernel.org/r/20230928144916.18339-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04page_pool: fix documentation typosRandy Dunlap
Correct grammar for better readability. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20231001003846.29541-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04sctp: Spelling s/preceeding/preceding/gGeert Uytterhoeven
Fix a misspelling of "preceding". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/663b14d07d6d716ddc34482834d6b65a2f714cfb.1695903447.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04tipc: fix a potential deadlock on &tx->lockChengfeng Ye
It seems that tipc_crypto_key_revoke() could be be invoked by wokequeue tipc_crypto_work_rx() under process context and timer/rx callback under softirq context, thus the lock acquisition on &tx->lock seems better use spin_lock_bh() to prevent possible deadlock. This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. tipc_crypto_work_rx() <workqueue> --> tipc_crypto_key_distr() --> tipc_bcast_xmit() --> tipc_bcbase_xmit() --> tipc_bearer_bc_xmit() --> tipc_crypto_xmit() --> tipc_ehdr_build() --> tipc_crypto_key_revoke() --> spin_lock(&tx->lock) <timer interrupt> --> tipc_disc_timeout() --> tipc_bearer_xmit_skb() --> tipc_crypto_xmit() --> tipc_ehdr_build() --> tipc_crypto_key_revoke() --> spin_lock(&tx->lock) <deadlock here> Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Link: https://lore.kernel.org/r/20230927181414.59928-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04net: stmmac: dwmac-stm32: fix resume on STM32 MCUBen Wolsieffer
The STM32MP1 keeps clk_rx enabled during suspend, and therefore the driver does not enable the clock in stm32_dwmac_init() if the device was suspended. The problem is that this same code runs on STM32 MCUs, which do disable clk_rx during suspend, causing the clock to never be re-enabled on resume. This patch adds a variant flag to indicate that clk_rx remains enabled during suspend, and uses this to decide whether to enable the clock in stm32_dwmac_init() if the device was suspended. This approach fixes this specific bug with limited opportunity for unintended side-effects, but I have a follow up patch that will refactor the clock configuration and hopefully make it less error prone. Fixes: 6528e02cc9ff ("net: ethernet: stmmac: add adaptation for stm32mp157c.") Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230927175749.1419774-1-ben.wolsieffer@hefring.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04ptp: ocp: fix error code in probe()Dan Carpenter
There is a copy and paste error so this uses a valid pointer instead of an error pointer. Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/5c581336-0641-48bd-88f7-51984c3b1f79@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>