summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-09-11net: ethtool: phy: Check the req_info.pdn field for GET commandsMaxime Chevallier
When processing the netlink GET requests to get PHY info, the req_info.pdn pointer is NULL when no PHY matches the requested parameters, such as when the phy_index is invalid, or there's simply no PHY attached to the interface. Therefore, check the req_info.pdn pointer for NULL instead of dereferencing it. Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/netdev/CANn89iKRW0WpGAh1tKqY345D8WkYCPm3Y9ym--Si42JZrQAu1g@mail.gmail.com/T/#mfced87d607d18ea32b3b4934dfa18d7b36669285 Fixes: 17194be4c8e1 ("net: ethtool: Introduce a command to list PHYs on an interface") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240910174636.857352-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11mptcp: pm: Fix uaf in __timer_delete_syncEdward Adam Davis
There are two paths to access mptcp_pm_del_add_timer, result in a race condition: CPU1 CPU2 ==== ==== net_rx_action napi_poll netlink_sendmsg __napi_poll netlink_unicast process_backlog netlink_unicast_kernel __netif_receive_skb genl_rcv __netif_receive_skb_one_core netlink_rcv_skb NF_HOOK genl_rcv_msg ip_local_deliver_finish genl_family_rcv_msg ip_protocol_deliver_rcu genl_family_rcv_msg_doit tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit tcp_v4_do_rcv mptcp_nl_remove_addrs_list tcp_rcv_established mptcp_pm_remove_addrs_and_subflows tcp_data_queue remove_anno_list_by_saddr mptcp_incoming_options mptcp_pm_del_add_timer mptcp_pm_del_add_timer kfree(entry) In remove_anno_list_by_saddr(running on CPU2), after leaving the critical zone protected by "pm.lock", the entry will be released, which leads to the occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1). Keeping a reference to add_timer inside the lock, and calling sk_stop_timer_sync() with this reference, instead of "entry->add_timer". Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock, do not directly access any members of the entry outside the pm lock, which can avoid similar "entry->x" uaf. Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11mptcp: disable active MPTCP in case of blackholeMatthieu Baerts (NGI0)
An MPTCP firewall blackhole can be detected if the following SYN retransmission after a fallback to "plain" TCP is accepted. In case of blackhole, a similar technique to the one in place with TFO is now used: MPTCP can be disabled for a certain period of time, 1h by default. This time period will grow exponentially when more blackhole issues get detected right after MPTCP is re-enabled and will reset to the initial value when the blackhole issue goes away. The blackhole period can be modified thanks to a new sysctl knob: blackhole_timeout. Two new MIB counters help understanding what's happening: - 'Blackhole', incremented when a blackhole is detected. - 'MPCapableSYNTXDisabled', incremented when an MPTCP connection directly falls back to TCP during the blackhole period. Because the technique is inspired by the one used by TFO, an important part of the new code is similar to what can find in tcp_fastopen.c, with some adaptations to the MPTCP case. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11mptcp: fallback to TCP after SYN+MPC dropsMatthieu Baerts (NGI0)
Some middleboxes might be nasty with MPTCP, and decide to drop packets with MPTCP options, instead of just dropping the MPTCP options (or letting them pass...). In this case, it sounds better to fallback to "plain" TCP after 2 retransmissions, and try again. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/477 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-2-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11mptcp: export mptcp_subflow_early_fallback()Matthieu Baerts (NGI0)
This helper will be used outside protocol.h in the following commit. While at it, also add a 'pr_fallback()' debug print, to help identifying fallbacks. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-1-da7ebb4cd2a3@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: hsr: prevent NULL pointer dereference in hsr_proxy_announce()Jeongjun Park
In the function hsr_proxy_annouance() added in the previous commit 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data"), the return value of the hsr_port_get_hsr() function is not checked to be a NULL pointer, which causes a NULL pointer dereference. To solve this, we need to add code to check whether the return value of hsr_port_get_hsr() is NULL. Reported-by: syzbot+02a42d9b1bd395cbcab4@syzkaller.appspotmail.com Fixes: 5f703ce5c981 ("net: hsr: Send supervisory frames to HSR network with ProxyNodeTable data") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240907190341.162289-1-aha310510@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: hsr: Remove interlink_sequence_nr.Eric Dumazet
Remove interlink_sequence_nr which is unused. [ bigeasy: split out from Eric's patch ]. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240906132816.657485-3-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11net: hsr: Use the seqnr lock for frames received via interlink port.Sebastian Andrzej Siewior
syzbot reported that the seqnr_lock is not acquire for frames received over the interlink port. In the interlink case a new seqnr is generated and assigned to the frame. Frames, which are received over the slave port have already a sequence number assigned so the lock is not required. Acquire the hsr_priv::seqnr_lock during in the invocation of hsr_forward_skb() if a packet has been received from the interlink port. Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ Fixes: 5055cccfc2d1c ("net: hsr: Provide RedBox support (HSR-SAN)") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukasz Majewski <lukma@denx.de> Tested-by: Lukasz Majewski <lukma@denx.de> Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12netfilter: nft_socket: make cgroupsv2 matching work with namespacesFlorian Westphal
When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-12netfilter: nft_socket: fix sk refcount leaksFlorian Westphal
We must put 'sk' reference before returning. Fixes: 039b1f4f24ec ("netfilter: nft_socket: fix erroneous socket assignment") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-09-11Merge tag 'wireless-next-2024-09-11' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.12 The last -next "new features" pull request for v6.12. The stack now supports DFS on MLO but otherwise nothing really standing out. Major changes: cfg80211/mac80211 * EHT rate support in AQL airtime * DFS support for MLO rtw89 * complete BT-coexistence code for RTL8852BT * RTL8922A WoWLAN net-detect support * tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits) wifi: brcmfmac: cfg80211: Convert comma to semicolon wifi: rsi: Remove an unused field in struct rsi_debugfs wifi: libertas: Cleanup unused declarations wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe() wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe() wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext() wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors wifi: cfg80211: fix kernel-doc for per-link data wifi: mt76: mt7925: replace chan config with extend txpower config for clc wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc wifi: mt76: mt7615: check devm_kasprintf() returned value wifi: mt76: mt7925: convert comma to semicolon wifi: mt76: mt7925: fix a potential association failure upon resuming wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings wifi: mt76: mt7921: Check devm_kasprintf() returned value wifi: mt76: mt7915: check devm_kasprintf() returned value wifi: mt76: mt7915: avoid long MCU command timeouts during SER wifi: mt76: mt7996: fix uninitialized TLV data ... ==================== Link: https://patch.msgid.link/20240911084147.A205DC4AF0F@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11sock_map: Add a cond_resched() in sock_hash_free()Eric Dumazet
Several syzbot soft lockup reports all have in common sock_hash_free() If a map with a large number of buckets is destroyed, we need to yield the cpu when needed. Fixes: 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240906154449.3742932-1-edumazet@google.com
2024-09-11bpf: Allow bpf_dynptr_from_skb() for tp_btfPhilo Lu
Making tp_btf able to use bpf_dynptr_from_skb(), which is useful for skb parsing, especially for non-linear paged skb data. This is achieved by adding KF_TRUSTED_ARGS flag to bpf_dynptr_from_skb and registering it for TRACING progs. With KF_TRUSTED_ARGS, args from fentry/fexit are excluded, so that unsafe progs like fexit/__kfree_skb are not allowed. We also need the skb dynptr to be read-only in tp_btf. Because may_access_direct_pkt_data() returns false by default when checking bpf_dynptr_from_skb, there is no need to add BPF_PROG_TYPE_TRACING to it explicitly. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240911033719.91468-5-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11net/9p/usbg: fix CONFIG_USB_GADGET dependencyArnd Bergmann
When USB gadget support is in a loadable module, 9pfs cannot link to it as a built-in driver: x86_64-linux-ld: vmlinux.o: in function `usb9pfs_free_func': trans_usbg.c:(.text+0x1070012): undefined reference to `usb_free_all_descriptors' x86_64-linux-ld: vmlinux.o: in function `disable_ep': trans_usbg.c:(.text+0x1070528): undefined reference to `usb_ep_disable' x86_64-linux-ld: vmlinux.o: in function `usb9pfs_func_unbind': trans_usbg.c:(.text+0x10705df): undefined reference to `usb_ep_free_request' x86_64-linux-ld: trans_usbg.c:(.text+0x107061f): undefined reference to `usb_ep_free_request' x86_64-linux-ld: vmlinux.o: in function `usb9pfs_func_bind': trans_usbg.c:(.text+0x107069f): undefined reference to `usb_interface_id' x86_64-linux-ld: trans_usbg.c:(.text+0x10706b5): undefined reference to `usb_string_id' Change the Kconfig dependency to only allow this to be enabled when it can successfully link and work. Fixes: a3be076dc174 ("net/9p/usbg: Add new usb gadget function transport") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240909111745.248952-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-10Merge tag 'ipsec-next-2024-09-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10sch_cake: constify inverse square root cacheDave Taht
sch_cake uses a cache of the first 16 values of the inverse square root calculation for the Cobalt AQM to save some cycles on the fast path. This cache is populated when the qdisc is first loaded, but there's really no reason why it can't just be pre-populated. So change it to be pre-populated with constants, which also makes it possible to constify it. This gives a modest space saving for the module (not counting debug data): .text: -224 bytes .rodata: +80 bytes .bss: -64 bytes Total: -192 bytes Signed-off-by: Dave Taht <dave.taht@gmail.com> [ fixed up comment, rewrote commit message ] Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20240909091630.22177-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10net: dsa: microchip: update tag_ksz masks for KSZ9477 familyPieter Van Trappen
Remove magic number 7 by introducing a GENMASK macro instead. Remove magic number 0x80 by using the BIT macro instead. Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240909134301.75448-1-vtpieter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flagJason Xing
introduce a new flag SOF_TIMESTAMPING_OPT_RX_FILTER in the receive path. User can set it with SOF_TIMESTAMPING_SOFTWARE to filter out rx software timestamp report, especially after a process turns on netstamp_needed_key which can time stamp every incoming skb. Previously, we found out if an application starts first which turns on netstamp_needed_key, then another one only passing SOF_TIMESTAMPING_SOFTWARE could also get rx timestamp. Now we handle this case by introducing this new flag without breaking users. Quoting Willem to explain why we need the flag: "why a process would want to request software timestamp reporting, but not receive software timestamp generation. The only use I see is when the application does request SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE." Similarly, this new flag could also be used for hardware case where we can set it with SOF_TIMESTAMPING_RAW_HARDWARE, then we won't receive hardware receive timestamp. Another thing about errqueue in this patch I have a few words to say: In this case, we need to handle the egress path carefully, or else reporting the tx timestamp will fail. Egress path and ingress path will finally call sock_recv_timestamp(). We have to distinguish them. Errqueue is a good indicator to reflect the flow direction. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10Bluetooth: hci_sync: Ignore errors from HCI_OP_REMOTE_NAME_REQ_CANCELLuiz Augusto von Dentz
This ignores errors from HCI_OP_REMOTE_NAME_REQ_CANCEL since it shouldn't interfere with the stopping of discovery and in certain conditions it seems to be failing. Link: https://github.com/bluez/bluez/issues/575 Fixes: d0b137062b2d ("Bluetooth: hci_sync: Rework init stages") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10Bluetooth: CMTP: Mark BT_CMTP as DEPRECATEDLuiz Augusto von Dentz
This marks BT_CMTP as DEPRECATED in preparation to get it removed. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10Bluetooth: replace deprecated strncpy with strscpy_padJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [0] and as such we should prefer more robust and less ambiguous string interfaces. The CAPI (part II) [1] states that the manufacturer id should be a "zero-terminated ASCII string" and should "always [be] zero-terminated." Much the same for the serial number: "The serial number, a seven-digit number coded as a zero-terminated ASCII string". Along with this, its clear the original author intended for these buffers to be NUL-padded as well. To meet the specification as well as properly NUL-pad, use strscpy_pad(). In doing this, an opportunity to simplify this code is also present. Remove the min_t() and combine the length check into the main if. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [0] Link: https://capi.org/downloads.html [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10Bluetooth: hci_core: Fix sending MGMT_EV_CONNECT_FAILEDLuiz Augusto von Dentz
If HCI_CONN_MGMT_CONNECTED has been set then the event shall be HCI_CONN_MGMT_DISCONNECTED. Fixes: b644ba336997 ("Bluetooth: Update device_connected and device_found events to latest API") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10Bluetooth: Use led_set_brightness() in LED trigger activate() callbackHans de Goede
A LED trigger's activate() callback gets called when the LED trigger gets activated for a specific LED, so that the trigger code can ensure the LED state matches the current state of the trigger condition (LED_FULL when HCI_UP is set in this case). led_trigger_event() is intended for trigger condition state changes and iterates over _all_ LEDs which are controlled by this trigger changing the brightness of each of them. In the activate() case only the brightness of the LED which is being activated needs to change and that LED is passed as an argument to activate(), switch to led_set_brightness() to only change the brightness of the LED being activated. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10Bluetooth: hci_conn: Remove redundant memset after kzallocKuan-Wei Chiu
Since kzalloc already zeroes the allocated memory, the subsequent memset call is unnecessary. This patch removes the redundant memset to clean up the code and enhance efficiency. Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-09-10Merge branch 'linus' into timers/coreThomas Gleixner
To update with the latest fixes.
2024-09-10net/smc: add sysctl for smc_limit_hsD. Wythe
In commit 48b6190a0042 ("net/smc: Limit SMC visits when handshake workqueue congested"), we introduce a mechanism to put constraint on SMC connections visit according to the pressure of SMC handshake process. At that time, we believed that controlling the feature through netlink was sufficient. However, most people have realized now that netlink is not convenient in container scenarios, and sysctl is a more suitable approach. In addition, since commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") had introcuded smc_sysctl_net_init(), it is reasonable for us to initialize limit_smc_hs in it instead of initializing it in smc_pnet_net_int(). Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Link: https://patch.msgid.link/1725590135-5631-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-10Merge branch 'vfs.file' of ↵Vlastimil Babka
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs into slab/for-6.12/kmem_cache_args Merge prerequisities from the vfs git tree for the following series that introduces kmem_cache_args. The vfs.file branch includes the addition of kmem_cache_create_rcu() which was needed in vfs for the filp cache optimization. The following series refactors this code.
2024-09-10memcg: add charging of already allocated slab objectsShakeel Butt
At the moment, the slab objects are charged to the memcg at the allocation time. However there are cases where slab objects are allocated at the time where the right target memcg to charge it to is not known. One such case is the network sockets for the incoming connection which are allocated in the softirq context. Couple hundred thousand connections are very normal on large loaded server and almost all of those sockets underlying those connections get allocated in the softirq context and thus not charged to any memcg. However later at the accept() time we know the right target memcg to charge. Let's add new API to charge already allocated objects, so we can have better accounting of the memory usage. To measure the performance impact of this change, tcp_crr is used from the neper [1] performance suite. Basically it is a network ping pong test with new connection for each ping pong. The server and the client are run inside 3 level of cgroup hierarchy using the following commands: Server: $ tcp_crr -6 Client: $ tcp_crr -6 -c -H ${server_ip} If the client and server run on different machines with 50 GBPS NIC, there is no visible impact of the change. For the same machine experiment with v6.11-rc5 as base. base (throughput) with-patch tcp_crr 14545 (+- 80) 14463 (+- 56) It seems like the performance impact is within the noise. Link: https://github.com/google/neper [1] Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> # net Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-09-09fou: fix initialization of grcMuhammad Usama Anjum
The grc must be initialize first. There can be a condition where if fou is NULL, goto out will be executed and grc would be used uninitialized. Fixes: 7e4196935069 ("fou: Fix null-ptr-deref in GRO.") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240906102839.202798-1-usama.anjum@collabora.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09af_unix: Don't return OOB skb in manage_oob().Kuniyuki Iwashima
syzbot reported use-after-free in unix_stream_recv_urg(). [0] The scenario is 1. send(MSG_OOB) 2. recv(MSG_OOB) -> The consumed OOB remains in recv queue 3. send(MSG_OOB) 4. recv() -> manage_oob() returns the next skb of the consumed OOB -> This is also OOB, but unix_sk(sk)->oob_skb is not cleared 5. recv(MSG_OOB) -> unix_sk(sk)->oob_skb is used but already freed The recent commit 8594d9b85c07 ("af_unix: Don't call skb_get() for OOB skb.") uncovered the issue. If the OOB skb is consumed and the next skb is peeked in manage_oob(), we still need to check if the skb is OOB. Let's do so by falling back to the following checks in manage_oob() and add the test case in selftest. Note that we need to add a similar check for SIOCATMARK. [0]: BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959 Read of size 4 at addr ffff8880326abcc4 by task syz-executor178/5235 CPU: 0 UID: 0 PID: 5235 Comm: syz-executor178 Not tainted 6.11.0-rc5-syzkaller-00742-gfbdaffe41adc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 unix_stream_read_actor+0xa6/0xb0 net/unix/af_unix.c:2959 unix_stream_recv_urg+0x1df/0x320 net/unix/af_unix.c:2640 unix_stream_read_generic+0x2456/0x2520 net/unix/af_unix.c:2778 unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x22f/0x280 net/socket.c:1068 ____sys_recvmsg+0x1db/0x470 net/socket.c:2816 ___sys_recvmsg net/socket.c:2858 [inline] __sys_recvmsg+0x2f0/0x3e0 net/socket.c:2888 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5360d6b4e9 Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff29b3a458 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 00007fff29b3a638 RCX: 00007f5360d6b4e9 RDX: 0000000000002001 RSI: 0000000020000640 RDI: 0000000000000003 RBP: 00007f5360dde610 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007fff29b3a628 R14: 0000000000000001 R15: 0000000000000001 </TASK> Allocated by task 5235: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3988 [inline] slab_alloc_node mm/slub.c:4037 [inline] kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4080 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:667 alloc_skb include/linux/skbuff.h:1320 [inline] alloc_skb_with_frags+0xc3/0x770 net/core/skbuff.c:6528 sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2815 sock_alloc_send_skb include/net/sock.h:1778 [inline] queue_oob+0x108/0x680 net/unix/af_unix.c:2198 unix_stream_sendmsg+0xd24/0xf80 net/unix/af_unix.c:2351 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 5235: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2252 [inline] slab_free mm/slub.c:4473 [inline] kmem_cache_free+0x145/0x350 mm/slub.c:4548 unix_stream_read_generic+0x1ef6/0x2520 net/unix/af_unix.c:2917 unix_stream_recvmsg+0x22b/0x2c0 net/unix/af_unix.c:2996 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x22f/0x280 net/socket.c:1068 __sys_recvfrom+0x256/0x3e0 net/socket.c:2255 __do_sys_recvfrom net/socket.c:2273 [inline] __se_sys_recvfrom net/socket.c:2269 [inline] __x64_sys_recvfrom+0xde/0x100 net/socket.c:2269 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the object at ffff8880326abc80 which belongs to the cache skbuff_head_cache of size 240 The buggy address is located 68 bytes inside of freed 240-byte region [ffff8880326abc80, ffff8880326abd70) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x326ab ksm flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff) page_type: 0xfdffffff(slab) raw: 00fff00000000000 ffff88801eaee780 ffffea0000b7dc80 dead000000000003 raw: 0000000000000000 00000000800c000c 00000001fdffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 4686, tgid 4686 (udevadm), ts 32357469485, free_ts 28829011109 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1493 prep_new_page mm/page_alloc.c:1501 [inline] get_page_from_freelist+0x2e4c/0x2f10 mm/page_alloc.c:3439 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4695 __alloc_pages_node_noprof include/linux/gfp.h:269 [inline] alloc_pages_node_noprof include/linux/gfp.h:296 [inline] alloc_slab_page+0x5f/0x120 mm/slub.c:2321 allocate_slab+0x5a/0x2f0 mm/slub.c:2484 new_slab mm/slub.c:2537 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3723 __slab_alloc+0x58/0xa0 mm/slub.c:3813 __slab_alloc_node mm/slub.c:3866 [inline] slab_alloc_node mm/slub.c:4025 [inline] kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4080 __alloc_skb+0x1c3/0x440 net/core/skbuff.c:667 alloc_skb include/linux/skbuff.h:1320 [inline] alloc_uevent_skb+0x74/0x230 lib/kobject_uevent.c:289 uevent_net_broadcast_untagged lib/kobject_uevent.c:326 [inline] kobject_uevent_net_broadcast+0x2fd/0x580 lib/kobject_uevent.c:410 kobject_uevent_env+0x57d/0x8e0 lib/kobject_uevent.c:608 kobject_synth_uevent+0x4ef/0xae0 lib/kobject_uevent.c:207 uevent_store+0x4b/0x70 drivers/base/bus.c:633 kernfs_fop_write_iter+0x3a1/0x500 fs/kernfs/file.c:334 new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa72/0xc90 fs/read_write.c:590 page last free pid 1 tgid 1 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1094 [inline] free_unref_page+0xd22/0xea0 mm/page_alloc.c:2612 kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:408 apply_to_pte_range mm/memory.c:2797 [inline] apply_to_pmd_range mm/memory.c:2841 [inline] apply_to_pud_range mm/memory.c:2877 [inline] apply_to_p4d_range mm/memory.c:2913 [inline] __apply_to_page_range+0x8a8/0xe50 mm/memory.c:2947 kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:525 purge_vmap_node+0x3e3/0x770 mm/vmalloc.c:2208 __purge_vmap_area_lazy+0x708/0xae0 mm/vmalloc.c:2290 _vm_unmap_aliases+0x79d/0x840 mm/vmalloc.c:2885 change_page_attr_set_clr+0x2fe/0xdb0 arch/x86/mm/pat/set_memory.c:1881 change_page_attr_set arch/x86/mm/pat/set_memory.c:1922 [inline] set_memory_nx+0xf2/0x130 arch/x86/mm/pat/set_memory.c:2110 free_init_pages arch/x86/mm/init.c:924 [inline] free_kernel_image_pages arch/x86/mm/init.c:943 [inline] free_initmem+0x79/0x110 arch/x86/mm/init.c:970 kernel_init+0x31/0x2b0 init/main.c:1476 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Memory state around the buggy address: ffff8880326abb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880326abc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc >ffff8880326abc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880326abd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc ffff8880326abd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb Fixes: 93c99f21db36 ("af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.") Reported-by: syzbot+8811381d455e3e9ec788@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8811381d455e3e9ec788 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240905193240.17565-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09af_unix: Move spin_lock() in manage_oob().Kuniyuki Iwashima
When OOB skb has been already consumed, manage_oob() returns the next skb if exists. In such a case, we need to fall back to the else branch below. Then, we want to keep holding spin_lock(&sk->sk_receive_queue.lock). Let's move it out of if-else branch and add lightweight check before spin_lock() for major use cases without OOB skb. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240905193240.17565-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09af_unix: Rename unlinked_skb in manage_oob().Kuniyuki Iwashima
When OOB skb has been already consumed, manage_oob() returns the next skb if exists. In such a case, we need to fall back to the else branch below. Then, we need to keep two skbs and free them later with consume_skb() and kfree_skb(). Let's rename unlinked_skb accordingly. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240905193240.17565-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09af_unix: Remove single nest in manage_oob().Kuniyuki Iwashima
This is a prep for the later fix. No functional change intended. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240905193240.17565-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09net: remove dev_pick_tx_cpu_id()Jakub Kicinski
dev_pick_tx_cpu_id() has been introduced with two users by commit a4ea8a3dacc3 ("net: Add generic ndo_select_queue functions"). The use in AF_PACKET has been removed in 2019 by commit b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection") The other user was a Netlogic XLP driver, removed in 2021 by commit 47ac6f567c28 ("staging: Remove Netlogic XLP network driver"). It's relatively unlikely that any modern driver will need an .ndo_select_queue implementation which picks purely based on CPU ID and skips XPS, delete dev_pick_tx_cpu_id() Found by code inspection. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240906161059.715546-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()Nathan Chancellor
Clang warns (or errors with CONFIG_WERROR): net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized] 1286 | if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { | ^~~ net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning 1257 | int dir; | ^ | = 0 1 error generated. A recent refactoring removed some assignments to dir because xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is used elsewhere in xfrm_hash_rebuild(), including within loops where it needs to be reloaded for each policy. Restore the assignments before the first use of dir to fix the warning and ensure dir is properly initialized throughout the function. Fixes: 08c2182cf0b4 ("xfrm: policy: use recently added helper in more places") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09xfrm: policy: fix null dereferenceFlorian Westphal
Julian Wiedmann says: > + if (!xfrm_pol_hold_rcu(ret)) Coverity spotted that ^^^ needs a s/ret/pol fix-up: > CID 1599386: Null pointer dereferences (FORWARD_NULL) > Passing null pointer "ret" to "xfrm_pol_hold_rcu", which dereferences it. Ditch the bogus 'ret' variable. Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Closes: https://lore.kernel.org/netdev/06dc2499-c095-4bd4-aee3-a1d0e3ec87c4@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09sctp: Unmask upper DSCP bits in sctp_v4_get_dst()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable holds the full DS field. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: udp_tunnel: Unmask upper DSCP bits in udp_tunnel_dst_lookup()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Note that callers of udp_tunnel_dst_lookup() pass the entire DS field in the 'tos' argument. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09netfilter: nf_dup4: Unmask upper DSCP bits in nf_dup_ipv4_route()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09netfilter: nft_flow_offload: Unmask upper DSCP bits in nft_flow_route()Ido Schimmel
Unmask the upper DSCP bits when calling nf_route() which eventually calls ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: netfilter: Unmask upper DSCP bits in ip_route_me_harder()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_xmit()Ido Schimmel
Unmask the upper DSCP bits when initializing an IPv4 flow key via ip_tunnel_init_flow() before passing it to ip_route_output_key() so that in the future we could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable includes the full DS field. Either the one specified as part of the tunnel parameters or the one inherited from the inner packet. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_tunnel: Unmask upper DSCP bits in ip_md_tunnel_xmit()Ido Schimmel
Unmask the upper DSCP bits when initializing an IPv4 flow key via ip_tunnel_init_flow() before passing it to ip_route_output_key() so that in the future we could perform the FIB lookup according to the full DSCP value. Note that the 'tos' variable includes the full DS field. Either the one specified via the tunnel key or the one inherited from the inner packet. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_tunnel: Unmask upper DSCP bits in ip_tunnel_bind_dev()Ido Schimmel
Unmask the upper DSCP bits when initializing an IPv4 flow key via ip_tunnel_init_flow() before passing it to ip_route_output_key() so that in the future we could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: icmp: Unmask upper DSCP bits in icmp_reply()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09bpf: lwtunnel: Unmask upper DSCP bits in bpf_lwt_xmit_reroute()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_key() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09ipv4: ip_gre: Unmask upper DSCP bits in ipgre_open()Ido Schimmel
Unmask the upper DSCP bits when calling ip_route_output_gre() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09netfilter: br_netfilter: Unmask upper DSCP bits in br_nf_pre_routing_finish()Ido Schimmel
Unmask upper DSCP bits when calling ip_route_output() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-09wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()Dmitry Antipov
Since '__dev_queue_xmit()' should be called with interrupts enabled, the following backtrace: ieee80211_do_stop() ... spin_lock_irqsave(&local->queue_stop_reason_lock, flags) ... ieee80211_free_txskb() ieee80211_report_used_skb() ieee80211_report_ack_skb() cfg80211_mgmt_tx_status_ext() nl80211_frame_tx_status() genlmsg_multicast_netns() genlmsg_multicast_netns_filtered() nlmsg_multicast_filtered() netlink_broadcast_filtered() do_one_broadcast() netlink_broadcast_deliver() __netlink_sendskb() netlink_deliver_tap() __netlink_deliver_tap_skb() dev_queue_xmit() __dev_queue_xmit() ; with IRQS disabled ... spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags) issues the warning (as reported by syzbot reproducer): WARNING: CPU: 2 PID: 5128 at kernel/softirq.c:362 __local_bh_enable_ip+0xc3/0x120 Fix this by implementing a two-phase skb reclamation in 'ieee80211_do_stop()', where actual work is performed outside of a section with interrupts disabled. Fixes: 5061b0c2b906 ("mac80211: cooperate more with network namespaces") Reported-by: syzbot+1a3986bbd3169c307819@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1a3986bbd3169c307819 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20240906123151.351647-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-09-09Revert "xfrm: add SA information to the offloaded packet"Steffen Klassert
This reverts commit e7cd191f83fd899c233dfbe7dc6d96ef703dcbbd. While supporting xfrm interfaces in the packet offload API is needed, this patch does not do the right thing. There are more things to do to really support xfrm interfaces, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>