summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-03-04ppp: use IFF_NO_QUEUE in virtual interfacesQingfang Deng
For PPPoE, PPTP, and PPPoL2TP, the start_xmit() function directly forwards packets to the underlying network stack and never returns anything other than 1. So these interfaces do not require a qdisc, and the IFF_NO_QUEUE flag should be set. Introduces a direct_xmit flag in struct ppp_channel to indicate when IFF_NO_QUEUE should be applied. The flag is set in ppp_connect_channel() for relevant protocols. While at it, remove the usused latency member from struct ppp_channel. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250301135517.695809-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04mptcp: pm: exit early with ADD_ADDR echo if possibleMatthieu Baerts (NGI0)
When the userspace PM is used, or when the in-kernel limits are reached, there will be no need to schedule the PM worker to signal new addresses. That corresponds to pm->work_pending set to 0. In this case, an early exit can be done in mptcp_pm_add_addr_echoed() not to hold the PM lock, and iterate over the announced addresses list, not to schedule the worker anyway in this case. This is similar to what is done when a connection or a subflow has been established. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-5-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04mptcp: pm: in-kernel: reduce parameters of set_flagsGeliang Tang
The number of parameters in mptcp_nl_set_flags() can be reduced. Only need to pass a "local" parameter to it instead of "local->addr" and "local->flags". Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-4-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04mptcp: pm: in-kernel: avoid access entry without lockGeliang Tang
In mptcp_pm_nl_set_flags(), "entry" is copied to "local" when pernet->lock is held to avoid direct access to entry without pernet->lock. Therefore, "local->flags" should be passed to mptcp_nl_set_flags instead of "entry->flags" when pernet->lock is not held, so as to avoid access to entry. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Fixes: 145dc6cc4abd ("mptcp: pm: change to fullmesh only for 'subflow'") Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228-net-next-mptcp-coverage-small-opti-v1-3-f933c4275676@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04Merge tag 'wireless-next-2025-03-04-v2' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== First 6.15 material: * cfg80211/mac80211 - remove cooked monitor support - strict mode for better AP testing - basic EPCS support - OMI RX bandwidth reduction support * rtw88 - preparation for RTL8814AU support * rtw89 - use wiphy_lock/wiphy_work - preparations for MLO - BT-Coex improvements - regulatory support in firmware files * iwlwifi - preparations for the new iwlmld sub-driver * tag 'wireless-next-2025-03-04-v2' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (128 commits) wifi: iwlwifi: remove mld/roc.c wifi: mac80211: refactor populating mesh related fields in sinfo wifi: cfg80211: reorg sinfo structure elements for mesh wifi: iwlwifi: Fix spelling mistake "Increate" -> "Increase" wifi: iwlwifi: add Debug Host Command APIs wifi: iwlwifi: add IWL_MAX_NUM_IGTKS macro wifi: iwlwifi: add OMI bandwidth reduction APIs wifi: iwlwifi: remove mvm prefix from iwl_mvm_d3_end_notif wifi: iwlwifi: remember if the UATS table was read successfully wifi: iwlwifi: export iwl_get_lari_config_bitmap wifi: iwlwifi: add support for external 32 KHz clock wifi: iwlwifi: mld: add a debug level for EHT prints wifi: iwlwifi: mld: add a debug level for PTP prints wifi: iwlwifi: remove mvm prefix from iwl_mvm_esr_mode_notif wifi: iwlwifi: use 0xff instead of 0xffffffff for invalid wifi: iwlwifi: location api cleanup wifi: cfg80211: expose update timestamp to drivers wifi: mac80211: add ieee80211_iter_chan_contexts_mtx wifi: mac80211: fix integer overflow in hwmp_route_info_get() wifi: mac80211: Fix possible integer promotion issue ... ==================== Link: https://patch.msgid.link/20250304125605.127914-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04Merge tag 'wireless-2025-03-04' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== bugfixes for 6.14: * regressions from this cycle: - mac80211: fix sparse warning for monitor - nl80211: disable multi-link reconfiguration (needs fixing) * older issues: - cfg80211: reject badly combined cooked monitor, fix regulatory hint validity checks - mac80211: handle TXQ flush w/o driver per-sta flush, fix debugfs for monitor, fix element inheritance - iwlwifi: fix rfkill, dead firmware handling, rate API version, free A-MSDU handling, avoid large allocations, fix string format - brcmfmac: fix power handling on some boards * tag 'wireless-2025-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: nl80211: disable multi-link reconfiguration wifi: cfg80211: regulatory: improve invalid hints checking wifi: brcmfmac: keep power during suspend if board requires it wifi: mac80211: Fix sparse warning for monitor_sdata wifi: mac80211: fix vendor-specific inheritance wifi: mac80211: fix MLE non-inheritance parsing wifi: iwlwifi: Fix A-MSDU TSO preparation wifi: iwlwifi: Free pages allocated when failing to build A-MSDU wifi: iwlwifi: limit printed string from FW file wifi: iwlwifi: mvm: use the right version of the rate API wifi: iwlwifi: mvm: don't try to talk to a dead firmware wifi: iwlwifi: mvm: don't dump the firmware state upon RFKILL while suspend wifi: iwlwifi: mvm: clean up ROC on failure wifi: iwlwifi: fw: avoid using an uninitialized variable wifi: iwlwifi: fw: allocate chained SG tables for dump wifi: mac80211: remove debugfs dir for virtual monitor wifi: mac80211: Cleanup sta TXQs on flush wifi: nl80211: reject cooked mode if it is set along with other flags ==================== Link: https://patch.msgid.link/20250304124435.126272-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04s390: Convert MACHINE_IS_[LPAR|VM|KVM], etc, machine_is_[lpar|vm|kvm]()Heiko Carstens
Move machine type detection to the decompressor and use static branches to implement and use machine_is_[lpar|vm|kvm]() instead of a runtime check via MACHINE_IS_[LPAR|VM|KVM]. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04wifi: nl80211: disable multi-link reconfigurationJohannes Berg
Both the APIs in cfg80211 and the implementation in mac80211 aren't really ready yet, we have a large number of fixes. In addition, it's not possible right now to discover support for this feature from userspace. Disable it for now, there's no rush. Link: https://patch.msgid.link/20250303110538.fbeef42a5687.Iab122c22137e5675ebd99f5c031e30c0e5c7af2e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-04net: plumb extack in __dev_change_net_namespace()Nicolas Dichtel
It could be hard to understand why the netlink command fails. For example, if dev->netns_immutable is set, the error is "Invalid argument". Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: advertise netns_immutable property via netlinkNicolas Dichtel
Since commit 05c1280a2bcf ("netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local"), there is no way to see if the netns_immutable property s set on a device. Let's add a netlink attribute to advertise it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: rename netns_local to netns_immutableNicolas Dichtel
The name 'netns_local' is confusing. A following commit will export it via netlink, so let's use a more explicit name. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove all superfluous index assignementsPeter Seiderer
Remove all superfluous index ('i += len') assignements (value not used afterwards). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mpls reset parsingPeter Seiderer
Fix mpls list reset parsing to work as describe in Documentation/networking/pktgen.rst: pgset "mpls 0" turn off mpls (or any invalid argument works too!) - before the patch $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 mpls: 00000001, 00000002 Result: OK: mpls=00000001,00000002 $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls 0" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 mpls: 00000000 Result: OK: mpls=00000000 $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls invalid" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 Result: OK: mpls= - after the patch $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 mpls: 00000001, 00000002 Result: OK: mpls=00000001,00000002 $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls 0" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 Result: OK: mpls= $ echo "mpls 00000001,00000002" > /proc/net/pktgen/lo\@0 $ echo "mpls invalid" > /proc/net/pktgen/lo\@0 $ grep mpls /proc/net/pktgen/lo\@0 Result: OK: mpls= Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix access outside of user given buffer in pktgen_if_write()Peter Seiderer
Honour the user given buffer size for the hex32_arg(), num_arg(), strn_len(), get_imix_entries() and get_labels() calls (otherwise they will access memory outside of the user given buffer). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mpls maximum labels list parsingPeter Seiderer
Fix mpls maximum labels list parsing up to MAX_MPLS_LABELS entries (instead of up to MAX_MPLS_LABELS - 1). Addresses the following: $ echo "mpls 00000f00,00000f01,00000f02,00000f03,00000f04,00000f05,00000f06,00000f07,00000f08,00000f09,00000f0a,00000f0b,00000f0c,00000f0d,00000f0e,00000f0f" > /proc/net/pktgen/lo\@0 -bash: echo: write error: Argument list too long Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove some superfluous variable initializingPeter Seiderer
Remove some superfluous variable initializing before hex32_arg call (as the same init is done here already). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: remove extra tmp variable (re-use len instead)Peter Seiderer
Remove extra tmp variable in pktgen_if_write (re-use len instead). Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: pktgen: fix mix of int/longPeter Seiderer
Fix mix of int/long (and multiple conversion from/to) by using consequently size_t for i and max and ssize_t for len and adjust function signatures of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly. Signed-off-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-03mptcp: Remove unused declaration mptcp_set_owner_r()Yue Haibing
Commit 6639498ed85f ("mptcp: cleanup mem accounting") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250228095148.4003065-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03mptcp: use sock_kmemdup for address entryGeliang Tang
Instead of using sock_kmalloc() to allocate an address entry "e" and then immediately duplicate the input "entry" to it, the newly added sock_kmemdup() helper can be used in mptcp_userspace_pm_append_new_local_addr() to simplify the code. More importantly, the code "*e = *entry;" that assigns "entry" to "e" is not easy to implemented in BPF if we use the same code to implement an append_new_local_addr() helper of a BFP path manager. This patch avoids this type of memory assignment operation. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/3e5a307aed213038a87e44ff93b5793229b16279.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03net: use sock_kmemdup for ip_optionsGeliang Tang
Instead of using sock_kmalloc() to allocate an ip_options and then immediately duplicate another ip_options to the newly allocated one in ipv6_dup_options(), mptcp_copy_ip_options() and sctp_v4_copy_ip_options(), the newly added sock_kmemdup() helper can be used to simplify the code. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/91ae749d66600ec6fb679e0e518fda6acb5c3e6f.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03sock: add sock_kmemdup helperGeliang Tang
This patch adds the sock version of kmemdup() helper, named sock_kmemdup(), to duplicate the input "src" memory block using the socket's option memory buffer. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/f828077394c7d1f3560123497348b438c875b510.1740735165.git.tanggeliang@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: tcp_set_window_clamp() cleanupEric Dumazet
Remove one indentation level. Use max_t() and clamp() macros. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: remove READ_ONCE(req->ts_recent)Eric Dumazet
After commit 8d52da23b6c6 ("tcp: Defer ts_recent changes until req is owned"), req->ts_recent is not changed anymore. It is set once in tcp_openreq_init(), bpf_sk_assign_tcp_reqsk() or cookie_tcp_reqsk_alloc() before the req can be seen by other cpus/threads. This completes the revert of eba20811f326 ("tcp: annotate data-races around tcp_rsk(req)->ts_recent"). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wang Hai <wanghai38@huawei.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03net: gro: convert four dev_net() callsEric Dumazet
tcp4_check_fraglist_gro(), tcp6_check_fraglist_gro(), udp4_gro_lookup_skb() and udp6_gro_lookup_skb() assume RCU is held so that the net structure does not disappear. Use dev_net_rcu() instead of dev_net() to get LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: convert to dev_net_rcu()Eric Dumazet
TCP uses of dev_net() are under RCU protection, change them to dev_net_rcu() to get LOCKDEP support. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: add four drop reasons to tcp_check_req()Eric Dumazet
Use two existing drop reasons in tcp_check_req(): - TCP_RFC7323_PAWS - TCP_OVERWINDOW Add two new ones: - TCP_RFC7323_TSECR (corresponds to LINUX_MIB_TSECRREJECTED) - TCP_LISTEN_OVERFLOW (when a listener accept queue is full) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03tcp: add a drop_reason pointer to tcp_check_req()Eric Dumazet
We want to add new drop reasons for packets dropped in 3WHS in the following patches. tcp_rcv_state_process() has to set reason to TCP_FASTOPEN, because tcp_check_req() will conditionally overwrite the drop_reason. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250301201424.2046477-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL.Kuniyuki Iwashima
We converted fib_info hash tables to per-netns one and now ready to convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Let's hold rtnl_net_lock() in inet_rtm_newroute() and inet_rtm_delroute(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().Kuniyuki Iwashima
fib_valid_key_len() is called in the beginning of fib_table_insert() or fib_table_delete() to check if the prefix length is valid. fib_table_insert() and fib_table_delete() are called from 3 paths - ip_rt_ioctl() - inet_rtm_newroute() / inet_rtm_delroute() - fib_magic() In the first ioctl() path, rtentry_to_fib_config() checks the prefix length with bad_mask(). Also, fib_magic() always passes the correct prefix: 32 or ifa->ifa_prefixlen, which is already validated. Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config(). While at it, 2 direct returns in rtm_to_fib_config() are changed to goto to match other places in the same function Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Hold rtnl_net_lock() in ip_rt_ioctl().Kuniyuki Iwashima
ioctl(SIOCADDRT/SIOCDELRT) calls ip_rt_ioctl() to add/remove a route in the netns of the specified socket. Let's hold rtnl_net_lock() there. Note that rtentry_to_fib_config() can be called without rtnl_net_lock() if we convert rtentry.dev handling to RCU later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Hold rtnl_net_lock() for ip_fib_net_exit().Kuniyuki Iwashima
ip_fib_net_exit() requires RTNL and is called from fib_net_init() and fib_net_exit_batch(). Let's hold rtnl_net_lock() before ip_fib_net_exit(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Namespacify fib_info hash tables.Kuniyuki Iwashima
We will convert RTM_NEWROUTE and RTM_DELROUTE to per-netns RTNL. Then, we need to have per-netns hash tables for struct fib_info. Let's allocate the hash tables per netns. fib_info_hash, fib_info_hash_bits, and fib_info_cnt are now moved to struct netns_ipv4 and accessed with net->ipv4.fib_XXX. Also, the netns checks are removed from fib_find_info_nh() and fib_find_info(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Add fib_info_hash_grow().Kuniyuki Iwashima
When the number of struct fib_info exceeds the hash table size in fib_create_info(), we try to allocate a new hash table with the doubled size. The allocation is done in fib_create_info(), and if successful, each struct fib_info is moved to the new hash table by fib_info_hash_move(). Let's integrate the allocation and fib_info_hash_move() as fib_info_hash_grow() to make the following change cleaner. While at it, fib_info_hash_grow() is placed near other hash-table-specific functions. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Remove fib_info_hash_size.Kuniyuki Iwashima
We will allocate the fib_info hash tables per netns. There are 5 global variables for fib_info hash tables: fib_info_hash, fib_info_laddrhash, fib_info_hash_size, fib_info_hash_bits, fib_info_cnt. However, fib_info_laddrhash and fib_info_hash_size can be easily calculated from fib_info_hash and fib_info_hash_bits. Let's remove fib_info_hash_size and use (1 << fib_info_hash_bits) instead. Now we need not pass the new hash table size to fib_info_hash_move(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Remove fib_info_laddrhash pointer.Kuniyuki Iwashima
We will allocate the fib_info hash tables per netns. There are 5 global variables for fib_info hash tables: fib_info_hash, fib_info_laddrhash, fib_info_hash_size, fib_info_hash_bits, fib_info_cnt. However, fib_info_laddrhash and fib_info_hash_size can be easily calculated from fib_info_hash and fib_info_hash_bits. Let's remove the fib_info_laddrhash pointer and instead use fib_info_hash + (1 << fib_info_hash_bits). While at it, fib_info_laddrhash_bucket() is moved near other hash-table-specific functions. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Make fib_info_hashfn() return struct hlist_head.Kuniyuki Iwashima
Every time fib_info_hashfn() returns a hash value, we fetch &fib_info_hash[hash]. Let's return the hlist_head pointer from fib_info_hashfn() and rename it to fib_info_hash_bucket() to match a similar function, fib_info_laddrhash_bucket(). Note that we need to move the fib_info_hash assignment earlier in fib_info_hash_move() to use fib_info_hash_bucket() in the for loop. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Allocate fib_info_hash[] during netns initialisation.Kuniyuki Iwashima
We will allocate fib_info_hash[] and fib_info_laddrhash[] for each netns. Currently, fib_info_hash[] is allocated when the first route is added. Let's move the first allocation to a new __net_init function. Note that we must call fib4_semantics_exit() in fib_net_exit_batch() because ->exit() is called earlier than ->exit_batch(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Allocate fib_info_hash[] and fib_info_laddrhash[] by kvcalloc().Kuniyuki Iwashima
Both fib_info_hash[] and fib_info_laddrhash[] are hash tables for struct fib_info and are allocated by kvzmalloc() separately. Let's replace the two kvzmalloc() calls with kvcalloc() to remove the fib_info_laddrhash pointer later. Note that fib_info_hash_alloc() allocates a new hash table based on fib_info_hash_bits because we will remove fib_info_hash_size later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03ipv4: fib: Use cached net in fib_inetaddr_event().Kuniyuki Iwashima
net is available in fib_inetaddr_event(), let's use it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250228042328.96624-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-03llc: do not use skb_get() before dev_queue_xmit()Eric Dumazet
syzbot is able to crash hosts [1], using llc and devices not supporting IFF_TX_SKB_SHARING. In this case, e1000 driver calls eth_skb_pad(), while the skb is shared. Simply replace skb_get() by skb_clone() in net/llc/llc_s_ac.c Note that e1000 driver might have an issue with pktgen, because it does not clear IFF_TX_SKB_SHARING, this is an orthogonal change. We need to audit other skb_get() uses in net/llc. [1] kernel BUG at net/core/skbuff.c:2178 ! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 UID: 0 PID: 16371 Comm: syz.2.2764 Not tainted 6.14.0-rc4-syzkaller-00052-gac9c34d1e45a #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:pskb_expand_head+0x6ce/0x1240 net/core/skbuff.c:2178 Call Trace: <TASK> __skb_pad+0x18a/0x610 net/core/skbuff.c:2466 __skb_put_padto include/linux/skbuff.h:3843 [inline] skb_put_padto include/linux/skbuff.h:3862 [inline] eth_skb_pad include/linux/etherdevice.h:656 [inline] e1000_xmit_frame+0x2d99/0x5800 drivers/net/ethernet/intel/e1000/e1000_main.c:3128 __netdev_start_xmit include/linux/netdevice.h:5151 [inline] netdev_start_xmit include/linux/netdevice.h:5160 [inline] xmit_one net/core/dev.c:3806 [inline] dev_hard_start_xmit+0x9a/0x7b0 net/core/dev.c:3822 sch_direct_xmit+0x1ae/0xc30 net/sched/sch_generic.c:343 __dev_xmit_skb net/core/dev.c:4045 [inline] __dev_queue_xmit+0x13d4/0x43e0 net/core/dev.c:4621 dev_queue_xmit include/linux/netdevice.h:3313 [inline] llc_sap_action_send_test_c+0x268/0x320 net/llc/llc_s_ac.c:144 llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline] llc_sap_next_state net/llc/llc_sap.c:182 [inline] llc_sap_state_process+0x239/0x510 net/llc/llc_sap.c:209 llc_ui_sendmsg+0xd0d/0x14e0 net/llc/af_llc.c:993 sock_sendmsg_nosec net/socket.c:718 [inline] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+da65c993ae113742a25f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/67c020c0.050a0220.222324.0011.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-03netfilter: nft_ct: Use __refcount_inc() for per-CPU nft_ct_pcpu_template.Sebastian Andrzej Siewior
nft_ct_pcpu_template is a per-CPU variable and relies on disabled BH for its locking. The refcounter is read and if its value is set to one then the refcounter is incremented and variable is used - otherwise it is already in use and left untouched. Without per-CPU locking in local_bh_disable() on PREEMPT_RT the read-then-increment operation is not atomic and therefore racy. This can be avoided by using unconditionally __refcount_inc() which will increment counter and return the old value as an atomic operation. In case the returned counter is not one, the variable is in use and we need to decrement counter. Otherwise we can use it. Use __refcount_inc() instead of read and a conditional increment. Fixes: edee4f1e9245 ("netfilter: nft_ct: add zone id set support") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-03wifi: cfg80211: regulatory: improve invalid hints checkingNikita Zhandarovich
Syzbot keeps reporting an issue [1] that occurs when erroneous symbols sent from userspace get through into user_alpha2[] via regulatory_hint_user() call. Such invalid regulatory hints should be rejected. While a sanity check from commit 47caf685a685 ("cfg80211: regulatory: reject invalid hints") looks to be enough to deter these very cases, there is a way to get around it due to 2 reasons. 1) The way isalpha() works, symbols other than latin lower and upper letters may be used to determine a country/domain. For instance, greek letters will also be considered upper/lower letters and for such characters isalpha() will return true as well. However, ISO-3166-1 alpha2 codes should only hold latin characters. 2) While processing a user regulatory request, between reg_process_hint_user() and regulatory_hint_user() there happens to be a call to queue_regulatory_request() which modifies letters in request->alpha2[] with toupper(). This works fine for latin symbols, less so for weird letter characters from the second part of _ctype[]. Syzbot triggers a warning in is_user_regdom_saved() by first sending over an unexpected non-latin letter that gets malformed by toupper() into a character that ends up failing isalpha() check. Prevent this by enhancing is_an_alpha2() to ensure that incoming symbols are latin letters and nothing else. [1] Syzbot report: ------------[ cut here ]------------ Unexpected user alpha2: A� WARNING: CPU: 1 PID: 964 at net/wireless/reg.c:442 is_user_regdom_saved net/wireless/reg.c:440 [inline] WARNING: CPU: 1 PID: 964 at net/wireless/reg.c:442 restore_alpha2 net/wireless/reg.c:3424 [inline] WARNING: CPU: 1 PID: 964 at net/wireless/reg.c:442 restore_regulatory_settings+0x3c0/0x1e50 net/wireless/reg.c:3516 Modules linked in: CPU: 1 UID: 0 PID: 964 Comm: kworker/1:2 Not tainted 6.12.0-rc5-syzkaller-00044-gc1e939a21eb1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events_power_efficient crda_timeout_work RIP: 0010:is_user_regdom_saved net/wireless/reg.c:440 [inline] RIP: 0010:restore_alpha2 net/wireless/reg.c:3424 [inline] RIP: 0010:restore_regulatory_settings+0x3c0/0x1e50 net/wireless/reg.c:3516 ... Call Trace: <TASK> crda_timeout_work+0x27/0x50 net/wireless/reg.c:542 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f2/0x390 kernel/kthread.c:389 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Reported-by: syzbot+e10709ac3c44f3d4e800@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e10709ac3c44f3d4e800 Fixes: 09d989d179d0 ("cfg80211: add regulatory hint disconnect support") Cc: stable@kernel.org Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://patch.msgid.link/20250228134659.1577656-1-n.zhandarovich@fintech.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-02net/tls: use the new scatterwalk functionsEric Biggers
Replace calls to the deprecated function scatterwalk_copychunks() with memcpy_from_scatterwalk(), memcpy_to_scatterwalk(), or scatterwalk_skip() as appropriate. The new functions generally behave more as expected and eliminate the need to call scatterwalk_done() or scatterwalk_pagedone(). However, the new functions intentionally do not advance to the next sg entry right away, which would have broken chain_to_walk() which is accessing the fields of struct scatter_walk directly. To avoid this, replace chain_to_walk() with scatterwalk_get_sglist() which supports the needed functionality. Cc: Boris Pismenny <borisp@nvidia.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-28net: gso: fix ownership in __udp_gso_segmentAntoine Tenart
In __udp_gso_segment the skb destructor is removed before segmenting the skb but the socket reference is kept as-is. This is an issue if the original skb is later orphaned as we can hit the following bug: kernel BUG at ./include/linux/skbuff.h:3312! (skb_orphan) RIP: 0010:ip_rcv_core+0x8b2/0xca0 Call Trace: ip_rcv+0xab/0x6e0 __netif_receive_skb_one_core+0x168/0x1b0 process_backlog+0x384/0x1100 __napi_poll.constprop.0+0xa1/0x370 net_rx_action+0x925/0xe50 The above can happen following a sequence of events when using OpenVSwitch, when an OVS_ACTION_ATTR_USERSPACE action precedes an OVS_ACTION_ATTR_OUTPUT action: 1. OVS_ACTION_ATTR_USERSPACE is handled (in do_execute_actions): the skb goes through queue_gso_packets and then __udp_gso_segment, where its destructor is removed. 2. The segments' data are copied and sent to userspace. 3. OVS_ACTION_ATTR_OUTPUT is handled (in do_execute_actions) and the same original skb is sent to its path. 4. If it later hits skb_orphan, we hit the bug. Fix this by also removing the reference to the socket in __udp_gso_segment. Fixes: ad405857b174 ("udp: better wmem accounting on gso") Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250226171352.258045-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28inet: ping: avoid skb_clone() dance in ping_rcv()Eric Dumazet
ping_rcv() callers currently call skb_free() or consume_skb(), forcing ping_rcv() to clone the skb. After this patch ping_rcv() is now 'consuming' the original skb, either moving to a socket receive queue, or dropping it. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250226183437.1457318-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28ipv4: icmp: do not process ICMP_EXT_ECHOREPLY for broadcast/multicast addressesEric Dumazet
There is no point processing ICMP_EXT_ECHOREPLY for routes which would drop ICMP_ECHOREPLY (RFC 1122 3.2.2.6, 3.2.2.8) This seems an oversight of the initial implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250226183437.1457318-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-28wifi: mac80211: refactor populating mesh related fields in sinfoSarika Sharma
Introduce the sta_set_mesh_sinfo() to populate mesh related fields in sinfo structure for station statistics. This will allow for the simplified population of other fields in the sinfo structure for link level in a subsequent patch to add support for MLO station statistics. No functionality changes added. Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250213171632.1646538-3-quic_sarishar@quicinc.com [reword since it's just an internal thing] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-28wifi: cfg80211: reorg sinfo structure elements for meshSarika Sharma
Currently, as multi-link operation(MLO) is not supported for mesh, reorganize the sinfo structure for mesh-specific fields and embed mesh related NL attributes together in organized view. This will allow for the simplified reorganization of sinfo structure for link level in a subsequent patch to add support for MLO station statistics. No functionality changes added. Pahole summary before the reorg of sinfo structure: - size: 256, cachelines: 4, members: 50 - sum members: 239, holes: 4, sum holes: 17 - paddings: 2, sum paddings: 2 - forced alignments: 1, forced holes: 1, sum forced holes: 1 Pahole summary after the reorg of sinfo structure: - size: 248, cachelines: 4, members: 50 - sum members: 239, holes: 4, sum holes: 9 - paddings: 2, sum paddings: 2 - forced alignments: 1, last cacheline: 56 bytes Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com> Link: https://patch.msgid.link/20250213171632.1646538-2-quic_sarishar@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-27net: sched: Remove newline at the end of a netlink error messageGal Pressman
Netlink error messages should not have a newline at the end of the string. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250226093904.6632-5-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>