Age Commit message Author
2025-09-02net: enetc: remove unnecessary CONFIG_FSL_ENETC_PTP_CLOCK checkWei Fang
The ENETC_F_RX_TSTAMP flag of priv->active_offloads can only be set when CONFIG_FSL_ENETC_PTP_CLOCK is enabled. Similarly, rx_ring->ext_en can only be set when CONFIG_FSL_ENETC_PTP_CLOCK is enabled as well. So it is safe to remove unnecessary CONFIG_FSL_ENETC_PTP_CLOCK check. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-12-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02net: enetc: extract enetc_update_ptp_sync_msg() to handle PTP Sync packetsWei Fang
Move PTP Sync packet processing from enetc_map_tx_buffs() to a new helper function, enetc_update_ptp_sync_msg(), to simplify the original function and prepare for upcoming ENETC v4 one-step support. There is no functional change. It is worth mentioning that ENETC_TXBD_TSTAMP is added to replace 0x3fffffff. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-11-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02net: enetc: save the parsed information of PTP packet to skb->cbWei Fang
Currently, the Tx PTP packets are parsed twice in the enetc driver, once in enetc_xmit() and once in enetc_map_tx_buffs(). The latter is duplicate and is unnecessary, since the parsed information can be saved to skb->cb so that enetc_map_tx_buffs() can get the previously parsed data from skb->cb. Therefore, add struct enetc_skb_cb as the format of the data in the skb->cb buffer to save the parsed information of PTP packet. Use saved information in enetc_map_tx_buffs() to avoid parsing data again. In addition, rename variables offset1 and offset2 in enetc_map_tx_buffs() to corr_off and tstamp_off for better readability. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-10-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02MAINTAINERS: add NETC Timer PTP clock driver sectionWei Fang
Add a section entry for NXP NETC Timer PTP clock driver. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-9-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02ptp: netc: add external trigger stamp supportF.S. Peng
The NETC Timer is capable of recording the timestamp on receipt of an external pulse on a GPIO pin. It supports two such external triggers. The recorded value is saved in a 16 entry FIFO accessed by TMR_ETTSa_H/L. An interrupt can be generated when the trigger occurs, when the FIFO reaches a threshold or overflows. Signed-off-by: F.S. Peng <fushi.peng@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-8-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02ptp: netc: add periodic pulse output supportWei Fang
NETC Timer has three pulse channels, all of which support periodic pulse output. Bind the channel to an ALARM register and then set a future time in the ALARM register. When the current time is greater than the ALARM value, the FIPER register is triggered to count down, and when the count reaches 0, the pulse is triggered. The PPS signal is also implemented in this way. On i.MX95, only ALARM1 can be used as the indication for the FIPER to start counting down, whereas on i.MX943 both ALARM1 and ALARM2 can be used. Therefore, only one channel can work on i.MX95, and at most two channels on i.MX943. In addition, change the PPS channel from a fixed number (0) to being dynamically selected, because PTP_CLK_REQ_PEROUT support is added. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-7-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02ptp: netc: add PTP_CLK_REQ_PPS supportWei Fang
The NETC Timer is capable of generating a PPS interrupt to the host. To support this feature, a 64-bit alarm time (which is an integral second of PHC in the future) is set to TMR_ALARM, and the period is set to TMR_FIPER. The alarm time is compared to the current time on each update, then the alarm trigger is used as the indication for TMR_FIPER to start counting down. After the period has passed, the PPS event is generated. According to the NETC block guide, the Timer has three FIPERs, any of which can be used to generate the PPS events, but in the current implementation, we only need one of them to implement the PPS feature, so FIPER 0 is used as the default PPS generator. Also, the Timer has 2 ALARMs; currently, ALARM 0 is used as the default time comparator. However, if the time is adjusted or the integer of period is changed when PPS is enabled, the PPS event will not be generated at an integral second of PHC. The steps suggested by the IP team if time drift happens: 1. Disable FIPER before adjusting the hardware time. 2. Rearm ALARM after the time adjustment to make the next PPS event be generated at an integral second of PHC. 3. Re-enable FIPER. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-6-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
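The three-step rearm sequence from this commit message can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the driver's code: the `timer` dictionary, its keys, and both function names are hypothetical stand-ins for the hardware registers, and the real driver may add a safety margin before the chosen second boundary.

```python
NSEC_PER_SEC = 1_000_000_000


def next_pps_alarm(now_ns: int) -> int:
    """Return the first whole-second boundary strictly after now_ns."""
    return (now_ns // NSEC_PER_SEC + 1) * NSEC_PER_SEC


def rearm_after_time_adjust(timer: dict, new_time_ns: int) -> dict:
    """Model of the suggested sequence (hypothetical register stand-ins)."""
    # 1. Disable FIPER before adjusting the hardware time.
    timer["fiper_enabled"] = False
    timer["time_ns"] = new_time_ns
    # 2. Rearm ALARM so the next event lands on an integral second.
    timer["alarm_ns"] = next_pps_alarm(new_time_ns)
    # 3. Re-enable FIPER.
    timer["fiper_enabled"] = True
    return timer
```

With a one-second FIPER period, an alarm on a whole-second boundary keeps every following pulse on a whole-second boundary as well, which is the property the sequence is meant to restore.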
2025-09-02ptp: netc: add NETC V4 Timer PTP driver supportWei Fang
NETC V4 Timer provides current time with nanosecond resolution, precise periodic pulse, pulse on timeout (alarm), and time capture on external pulse support. It also supports time synchronization as required for IEEE 1588 and IEEE 802.1AS-2020. Inside NETC, ENETC can capture the timestamp of the sent/received packet through the PHC provided by the Timer and record it on the Tx/Rx BD. Through the relevant PHC interfaces provided by the driver, the enetc V4 driver can support PTP time synchronization. In addition, NETC V4 Timer is similar to the QorIQ 1588 timer, but it is not exactly the same. The current ptp-qoriq driver is not compatible with NETC V4 Timer, and most of the code cannot be reused, for the reasons below. 1. The architecture of the ptp-qoriq driver makes the register offsets fixed; however, the offsets of all the high registers and low registers of V4 are swapped, and V4 also adds some new registers, so extending ptp-qoriq to make it compatible with V4 Timer is tantamount to completely rewriting the ptp-qoriq driver. 2. The usage of some functions is somewhat different from the QorIQ timer, such as the setting of TCLK_PERIOD and TMR_ADD, the logic of configuring PPS, etc., so making the driver compatible with V4 Timer will undoubtedly increase the complexity of the code and reduce readability. 3. QorIQ is an expired brand. It is difficult for us to verify whether it works stably on the QorIQ platforms if we refactor the driver, and this will make maintenance difficult, so refactoring the driver obviously does not bring any benefits. Therefore, add this new driver for NETC V4 Timer. Note that the missing features like PEROUT, PPS and EXTTS will be added in subsequent patches. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-5-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02ptp: add helpers to get the phc_index by of_node or devWei Fang
Some Ethernet controllers do not have an integrated PTP timer function. Instead, the PTP timer is a separate device and provides PTP hardware clock to the Ethernet controller to use. Therefore, the Ethernet controller driver needs to obtain the PTP clock's phc_index in its ethtool_ops::get_ts_info(). Currently, most drivers implement this in the following ways. 1. The PTP device driver adds a custom API and exports it to the Ethernet controller driver. 2. The PTP device driver adds private data to its device structure. So the private data structure needs to be exposed to the Ethernet controller driver. When registering the ptp clock, ptp_clock_register() always saves the ptp_clock pointer to the private data of ptp_clock::dev. Therefore, as long as ptp_clock::dev is obtained, the phc_index can be obtained. So the following generic APIs can be added to the ptp driver to obtain the phc_index. 1. ptp_clock_index_by_dev(): Obtain the phc_index by the device pointer of the PTP device. 2. ptp_clock_index_by_of_node(): Obtain the phc_index by the of_node pointer of the PTP device. Also, we can add another API like ptp_clock_index_by_fwnode() to get the phc_index by fwnode of PTP device. However, this API is not used in this patch set, so it is better to add it when needed. Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20250829050615.1247468-4-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02dt-bindings: net: move ptp-timer property to ethernet-controller.yamlWei Fang
For some Ethernet controllers, the PTP timer function is not integrated. Instead, the PTP timer is a separate device and provides PTP Hardware Clock (PHC) to the Ethernet controller to use, such as NXP FMan MAC, ENETC, etc. Therefore, a property is needed to indicate this hardware relationship between the Ethernet controller and the PTP timer. Since this use case is also very common, it is better to add a generic property to ethernet-controller.yaml. According to the existing binding docs, there are two good candidates, one is the "ptp-timer" defined in fsl,fman-dtsec.yaml, and the other is the "ptimer-handle" defined in fsl,fman.yaml. From the perspective of the name, the former is more straightforward, so move the "ptp-timer" from fsl,fman-dtsec.yaml to ethernet-controller.yaml. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250829050615.1247468-3-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-02dt-bindings: ptp: add NETC Timer PTP clockWei Fang
NXP NETC (Ethernet Controller) is a multi-function PCIe Root Complex Integrated Endpoint (RCiEP), the Timer is one of its functions which provides current time with nanosecond resolution, precise periodic pulse, pulse on timeout (alarm), and time capture on external pulse support. And also supports time synchronization as required for IEEE 1588 and IEEE 802.1AS-2020. So add device tree binding doc for the PTP clock based on NETC Timer. NETC Timer has three reference clock sources, but the clock mux is inside the IP. Therefore, the driver will parse the clock name to select the desired clock source. If the clocks property is not present, NETC Timer will use the system clock of NETC IP as its reference clock. Because the Timer is a PCIe function of NETC IP, the system clock of NETC is always available to the Timer. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250829050615.1247468-2-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-01Merge branch 'inet-ping-misc-changes'Jakub Kicinski
Eric Dumazet says: ==================== inet: ping: misc changes First and third patches improve security a bit. Second patch (ping_hash removal) is a cleanup. Fourth patch uses EXPORT_IPV6_MOD[_GPL]. ==================== Link: https://patch.msgid.link/20250829153054.474201-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01inet: ping: use EXPORT_IPV6_MOD[_GPL]()Eric Dumazet
There is no need to export ping symbols when CONFIG_IPV6=y. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250829153054.474201-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01inet: ping: make ping_port_rover per netnsEric Dumazet
Provide isolation between netns for ping idents. Randomize initial ping_port_rover value at netns creation. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250829153054.474201-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01inet: ping: remove ping_hash()Eric Dumazet
There is no point in keeping ping_hash(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250829153054.474201-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01inet: ping: check sock_net() in ping_get_port() and ping_lookup()Eric Dumazet
We need to check socket netns before considering them in ping_get_port(). Otherwise, one malicious netns could 'consume' all ports. Add corresponding check in ping_lookup(). Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Link: https://patch.msgid.link/20250829153054.474201-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01net: stmmac: mdio: update runtime PMRussell King (Oracle)
Commit 3c7826d0b106 ("net: stmmac: Separate C22 and C45 transactions for xgmac") missed a change that happened in commit e2d0acd40c87 ("net: stmmac: using pm_runtime_resume_and_get instead of pm_runtime_get_sync"). Update the two clause 45 functions that didn't get switched to pm_runtime_resume_and_get(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E1urv09-00000000gJ1-3SxO@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
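The practical difference between the two runtime PM calls named here is what happens to the usage counter when resume fails. The Python model below is a sketch of that behaviour only: the `Dev` class, the `_model` function names, and the error value are hypothetical, and the real kernel functions operate on `struct device`.

```python
class Dev:
    """Toy device: a usage counter plus a switch that makes resume fail."""

    def __init__(self, resume_fails: bool = False):
        self.usage_count = 0
        self.resume_fails = resume_fails


def pm_runtime_get_sync_model(dev: Dev) -> int:
    # The counter is bumped even if the resume itself fails,
    # so the caller must drop it again on the error path.
    dev.usage_count += 1
    return -5 if dev.resume_fails else 0  # -5: arbitrary negative errno


def pm_runtime_resume_and_get_model(dev: Dev) -> int:
    # Same resume attempt, but the counter is dropped on failure,
    # so an early return in the caller does not leak a reference.
    ret = pm_runtime_get_sync_model(dev)
    if ret < 0:
        dev.usage_count -= 1
        return ret
    return 0
```

A caller that simply returns on error is therefore balanced with the second helper and unbalanced with the first.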
2025-09-01selftests: net: fix spelling and grammar mistakesPraveen Balakrishnan
Fix several spelling and grammatical mistakes in output messages from the net selftests to improve readability. Only the message strings for the test output have been modified. No changes to the functional logic of the tests have been made. Signed-off-by: Praveen Balakrishnan <praveen.balakrishnan@magd.ox.ac.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250828211100.51019-1-praveen.balakrishnan@magd.ox.ac.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01ptp: Limit time setting of PTP clocksMiroslav Lichvar
Networking drivers implementing PTP clocks and kernel socket code handling hardware timestamps use the 64-bit signed ktime_t type counting nanoseconds. When a PTP clock reaches the maximum value in year 2262, the timestamps returned to applications will overflow into year 1677. The same thing happens when injecting a large offset with clock_adjtime(ADJ_SETOFFSET). The commit 7a8e61f84786 ("timekeeping: Force upper bound for setting CLOCK_REALTIME") limited the maximum accepted value setting the system clock to 30 years before the maximum representable value (i.e. year 2232) to avoid the overflow, assuming the system will not run for more than 30 years. Enforce the same limit for PTP clocks. Don't allow negative values and values closer than 30 years to the maximum value. Drivers may implement an even lower limit if the hardware registers cannot represent the whole interval between years 1970 and 2262 in the required resolution. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <jstultz@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250828103300.1387025-1-mlichvar@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
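The years quoted in this message follow directly from the range of a signed 64-bit nanosecond counter. The Python sketch below reproduces that arithmetic; the constant and function names are illustrative, the 30-year margin is assumed to use 365-day years, and `settime_allowed()` is a model of the described policy rather than the kernel's actual check.

```python
from datetime import datetime, timedelta, timezone

NSEC_PER_SEC = 1_000_000_000
KTIME_MAX_NS = 2**63 - 1                      # largest signed 64-bit value
KTIME_SEC_MAX = KTIME_MAX_NS // NSEC_PER_SEC  # whole seconds that fit
THIRTY_YEARS_SEC = 30 * 365 * 24 * 3600       # assumption: 365-day years
SETTIME_SEC_MAX = KTIME_SEC_MAX - THIRTY_YEARS_SEC

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def settime_allowed(seconds: int) -> bool:
    """Model of the policy: no negative times, none within 30 years of the max."""
    return 0 <= seconds < SETTIME_SEC_MAX


print(to_utc(KTIME_SEC_MAX).year)    # last representable year
print(to_utc(SETTIME_SEC_MAX).year)  # year of the upper bound for setting
```

The first value falls in 2262 and the second in 2232, matching the limits described in the commit message.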
2025-09-01net: ethernet: qualcomm: QCOM_PPE should depend on ARCH_QCOMGeert Uytterhoeven
The Qualcomm Technologies, Inc. Packet Process Engine (PPE) is only present on Qualcomm IPQ SoCs. Hence add a dependency on ARCH_QCOM, to prevent asking the user about this driver when configuring a kernel without Qualcomm platform support. Fixes: 353a0f1d5b27606b ("net: ethernet: qualcomm: Add PPE driver for IPQ9574 SoC") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/eb7bd6e6ce27eb6d602a63184d9daa80127e32bd.1756466786.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01tcp: Remove sk->sk_prot->orphan_count.Kuniyuki Iwashima
TCP tracks the number of orphaned (SOCK_DEAD but not yet destructed) sockets in tcp_orphan_count. In some code that was shared with DCCP, tcp_orphan_count is referenced via sk->sk_prot->orphan_count. Let's reference tcp_orphan_count directly. inet_csk_prepare_for_destroy_sock() is moved to inet_connection_sock.c due to header dependency. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250829215641.711664-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29Merge branch 'net-add-rcu-safety-to-dst-dev'Jakub Kicinski
Eric Dumazet says: ==================== net: add rcu safety to dst->dev Followup of commit 88fe14253e18 ("net: dst: add four helpers to annotate data-races around dst->dev"). Use lockdep enabled helpers to convert our unsafe dst->dev uses one at a time. More to come... ==================== Link: https://patch.msgid.link/20250828195823.3958522-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29ipv4: start using dst_dev_rcu()Eric Dumazet
Change icmpv4_xrlim_allow(), ip_defrag() to prevent possible UAF. Change ipmr_prepare_xmit(), ipmr_queue_fwd_xmit(), ip_mr_output(), ipv4_neigh_lookup() to use lockdep enabled dst_dev_rcu(). Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check()Eric Dumazet
Use RCU to avoid a pair of atomic operations and a potential UAF on dst_dev()->flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29tcp_metrics: use dst_dev_net_rcu()Eric Dumazet
Replace three dst_dev() with a lockdep enabled helper. Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29net: use dst_dev_rcu() in sk_setup_caps()Eric Dumazet
Use RCU to protect accesses to dst->dev from sk_setup_caps() and sk_dst_gso_max_size(). Also use dst_dev_rcu() in ip6_dst_mtu_maybe_forward(), and ip_dst_mtu_maybe_forward(). ip4_dst_hoplimit() can use dst_dev_net_rcu(). Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29ipv6: use RCU in ip6_output()Eric Dumazet
Use RCU in ip6_output() in order to use dst_dev_rcu() to prevent possible UAF. We can remove rcu_read_lock()/rcu_read_unlock() pairs from ip6_finish_output2(). Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29ipv6: use RCU in ip6_xmit()Eric Dumazet
Use RCU in ip6_xmit() in order to use dst_dev_rcu() to prevent possible UAF. Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29ipv6: start using dst_dev_rcu()Eric Dumazet
Refactor icmpv6_xrlim_allow() and ip6_dst_hoplimit() so that we acquire rcu_read_lock() a bit longer to be able to use dst_dev_rcu() instead of dst_dev(). __ip6_rt_update_pmtu() and rt6_do_redirect can directly use dst_dev_rcu() in sections already holding rcu_read_lock(). Small changes to use dst_dev_net_rcu() in ip6_default_advmss(), ipv6_sock_ac_join(), ip6_mc_find_dev() and ndisc_send_skb(). Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29net: dst: introduce dst->dev_rcuEric Dumazet
Followup of commit 88fe14253e18 ("net: dst: add four helpers to annotate data-races around dst->dev"). We want to gradually add explicit RCU protection to dst->dev, including lockdep support. Add a union to alias dst->dev_rcu and dst->dev. Add dst_dev_net_rcu() helper. Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250828195823.3958522-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29Merge branch 'inet_diag-make-dumps-faster-with-simple-filters'Jakub Kicinski
Eric Dumazet says: ==================== inet_diag: make dumps faster with simple filters inet_diag_bc_sk() pulls five cache lines per socket, while most filters only need the two first ones. We can change it to only pull needed cache lines, to make things like "ss -temoi src :21456" much faster. First patches (1-3) are annotating data-races as a first step. ==================== Link: https://patch.msgid.link/20250828102738.2065992-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29inet_diag: avoid cache line misses in inet_diag_bc_sk()Eric Dumazet
inet_diag_bc_sk() pulls five cache lines per socket, while most filters only need the two first ones. Add three booleans to struct inet_diag_dump_data, that are selectively set if a filter needs specific socket fields. - mark_needed /* INET_DIAG_BC_MARK_COND present. */ - cgroup_needed /* INET_DIAG_BC_CGROUP_COND present. */ - userlocks_needed /* INET_DIAG_BC_AUTO present. */ This removes millions of cache lines misses per ss invocation when simple filters are specified on busy servers. offsetof(struct sock, sk_userlocks) = 0xf3 offsetof(struct sock, sk_mark) = 0x20c offsetof(struct sock, sk_cgrp_data) = 0x298 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250828102738.2065992-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
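The offsets listed in this message show why the three fields are expensive to read: each sits in a different cache line, well past the first two. The Python sketch below works that out, assuming 64-byte cache lines (the message does not state the line size); `lines_needed()` is a hypothetical helper that mirrors the three booleans, not the kernel function.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size

# offsetof(struct sock, ...) values quoted in the commit message
OFFSETS = {
    "sk_userlocks": 0xF3,
    "sk_mark": 0x20C,
    "sk_cgrp_data": 0x298,
}


def cache_line_index(offset: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Index of the cache line that contains the given byte offset."""
    return offset // line_bytes


def lines_needed(mark_needed: bool, cgroup_needed: bool,
                 userlocks_needed: bool) -> list:
    """Cache lines touched per socket: the first two, plus any requested extras."""
    lines = {0, 1}
    if userlocks_needed:
        lines.add(cache_line_index(OFFSETS["sk_userlocks"]))
    if mark_needed:
        lines.add(cache_line_index(OFFSETS["sk_mark"]))
    if cgroup_needed:
        lines.add(cache_line_index(OFFSETS["sk_cgrp_data"]))
    return sorted(lines)


print(lines_needed(False, False, False))  # simple filter
print(lines_needed(True, True, True))     # filter using all three fields
```

Under that assumption a simple filter touches two lines per socket, while a filter that needs all three fields touches five, which is consistent with the figures in the message.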
2025-08-29inet_diag: change inet_diag_bc_sk() first argumentEric Dumazet
We want to have access to the inet_diag_dump_data structure in the following patch. This patch removes duplication in callers. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250828102738.2065992-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29inet_diag: annotate data-races in inet_diag_bc_sk()Eric Dumazet
inet_diag_bc_sk() runs with an unlocked socket, annotate potential races with READ_ONCE(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250828102738.2065992-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29tcp: annotate data-races in tcp_req_diag_fill()Eric Dumazet
req->num_retrans and rsk_timer.expires are read locklessly, and can be changed from tcp_rtx_synack(). Add READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250828102738.2065992-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29inet_diag: annotate data-races in inet_diag_msg_common_fill()Eric Dumazet
inet_diag_msg_common_fill() can run without socket lock. Add READ_ONCE() or data_race() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250828102738.2065992-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29microchip: lan865x: add ndo_eth_ioctl handler to enable PHY ioctl supportParthiban Veerasooran
Introduce support for standard MII ioctl operations in the LAN865x Ethernet driver by implementing the .ndo_eth_ioctl callback. This allows PHY-related ioctl commands to be handled via phy_do_ioctl_running() and enables support for ethtool and other user-space tools that rely on ioctl interface to perform PHY register access using commands like SIOCGMIIREG and SIOCSMIIREG. This feature enables improved diagnostics and PHY configuration capabilities from userspace. Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250828114549.46116-1-parthiban.veerasooran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29vsock/test: Remove redundant semicolonsLiao Yuanhong
Remove unnecessary semicolons. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250828083938.400872-1-liaoyuanhong@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29pppoe: drop sock reference counting on fast pathQingfang Deng
Now that PPPoE sockets are freed via RCU (SOCK_RCU_FREE), it is no longer necessary to take a reference count when looking up sockets on the receive path. Readers are protected by RCU, so the socket memory remains valid until after a grace period. Convert fast-path lookups to avoid refcounting: - Replace get_item() and sk_receive_skb() in pppoe_rcv() with __get_item() and __sk_receive_skb(). - Rework get_item_by_addr() into __get_item_by_addr() (no refcount and move RCU lock into pppoe_ioctl) - Remove unnecessary sock_put() calls. This avoids cacheline bouncing from atomic reference counting and improves performance on the receive fast path. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250828012018.15922-2-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-29pppoe: remove rwlock usageQingfang Deng
Like ppp_generic.c, convert the PPPoE socket hash table to use RCU for lookups and a spinlock for updates. This removes rwlock usage and allows lockless readers on the fast path. - Mark hash table and list pointers as __rcu. - Use spin_lock() to protect writers. - Readers use rcu_dereference() under rcu_read_lock(). All known callers of get_item() already hold the RCU read lock, so no additional locking is needed. - get_item() now uses refcount_inc_not_zero() instead of sock_hold() to safely take a reference. This prevents crashes if a socket is already in the process of being freed (sk_refcnt == 0). - Set SOCK_RCU_FREE to defer socket freeing until after an RCU grace period. - Move skb_queue_purge() into sk_destruct callback to ensure purge happens after an RCU grace period. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250828012018.15922-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
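The reason for switching get_item() to refcount_inc_not_zero() can be illustrated with a small model. The Python class below is a single-threaded sketch with hypothetical names; the kernel primitive does the same check-and-increment atomically, which this model does not attempt to reproduce.

```python
class Refcount:
    """Toy reference counter showing inc-not-zero semantics."""

    def __init__(self, value: int = 1):
        self.value = value

    def inc_not_zero(self) -> bool:
        # Refuse to revive an object whose last reference is already gone.
        if self.value == 0:
            return False
        self.value += 1
        return True

    def dec_and_test(self) -> bool:
        # Drop one reference; report whether it was the last one.
        self.value -= 1
        return self.value == 0


def lookup(table: dict, key):
    """Return the entry only if a reference could be taken safely."""
    entry = table.get(key)
    if entry is not None and entry.inc_not_zero():
        return entry
    return None
```

Under RCU a reader can still find a socket in the hash table after its count has reached zero but before the grace period ends; the failed increment makes such a lookup return nothing instead of handing out a dying object.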
2025-08-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.17-rc4). No conflicts. Adjacent changes: drivers/net/ethernet/intel/idpf/idpf_txrx.c 02614eee26fb ("idpf: do not linearize big TSO packets") 6c4e68480238 ("idpf: remove obsolete stashing code") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28Merge tag 'net-6.17-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth. Current release - regressions: - ipv4: fix regression in local-broadcast routes - vsock: fix error-handling regression introduced in v6.17-rc1 Previous releases - regressions: - bluetooth: - mark connection as closed during suspend disconnect - fix set_local_name race condition - eth: - ice: fix NULL pointer dereference on reset - mlx5: fix memory leak in hws_pool_buddy_init error path - bnxt_en: fix stats context reservation logic - hv: fix loss of receive events from host during channel open Previous releases - always broken: - page_pool: fix incorrect mp_ops error handling - sctp: initialize more fields in sctp_v6_from_sk() - eth: - octeontx2-vf: fix max packet length errors - idpf: fix Tx flow scheduling to avoid Tx timeouts - bnxt_en: fix memory corruption during ifdown - ice: fix incorrect counter for buffer allocation failures - mlx5: fix lockdep assertion on sync reset unload event - fbnic: fixup rtnl_lock and devl_lock handling - xgmac: do not enable RX FIFO overflow interrupts - phy: mscc: fix when PTP clock is register and unregister Misc: - add Telit Cinterion LE910C4-WWX new compositions" * tag 'net-6.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) net: ipv4: fix regression in local-broadcast routes net: macb: Disable clocks once fbnic: Move phylink resume out of service_task and into open/close fbnic: Fixup rtnl_lock and devl_lock handling related to mailbox code net: rose: fix a typo in rose_clear_routes() l2tp: do not use sock_hold() in pppol2tp_session_get_sock() sctp: initialize more fields in sctp_v6_from_sk() MAINTAINERS: rmnet: Update email addresses net: rose: include node references in rose_neigh refcount net: rose: convert 'use' field to refcount_t net: rose: split remove and free operations in rose_remove_neigh() net: hv_netvsc: fix loss of early receive events from host during 
channel open. net: stmmac: Set CIC bit only for TX queues with COE net: stmmac: xgmac: Correct supported speed modes net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts net/mlx5e: Set local Xoff after FW update net/mlx5e: Update and set Xon/Xoff upon port speed set net/mlx5e: Update and set Xon/Xoff upon MTU set net/mlx5: Prevent flow steering mode changes in switchdev mode net/mlx5: Nack sync reset when SFs are present ...
2025-08-28Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: split ice_virtchnl.c git-blame friendly way Przemek Kitszel says: Split ice_virtchnl.c into two more files (+headers), in a way that git-blame works better. Then move virtchnl files into a new subdir. No logic changes. I have developed (or discovered ;)) how to split a file in a way that both old and new are nice in terms of git-blame There was not much discussion on [RFC], so I would like to propose to go forward with this approach. There are more commits needed to have it nice, so it forms a git-log vs git-blame tradeoff, but (after the brief moment that this is on the top) we spend orders of magnitude more time looking at the blame output (and commit messages linked from that) - so I find it much better to see actual logic changes instead of "move xx to yy" stuff (typical for "squashed/single-commit splits"). Cherry-picks/rebases work the same with this method as with simple "squashed/single-commit" approach (literally all commits squashed into one (to have better git-log, but shitty git-blame output). Rationale for the split itself is, as usual, "file is big and we want to extend it". * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: finish virtchnl.c split into rss.c ice: extract virt/rss.c: cleanup - p2 ice: extract virt/rss.c: cleanup - p1 ice: split RSS stuff out of virtchnl.c - copy back ice: split RSS stuff out of virtchnl.c - tmp rename ice: finish virtchnl.c split into queues.c ice: extract virt/queues.c: cleanup - p3 ice: extract virt/queues.c: cleanup - p2 ice: extract virt/queues.c: cleanup - p1 ice: split queue stuff out of virtchnl.c - copy back ice: split queue stuff out of virtchnl.c - tmp rename ice: add virt/ and move ice_virtchnl* files there ==================== Link: https://patch.msgid.link/20250827224641.415806-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28eth: mlx5: remove Kconfig co-dependency with VXLANJakub Kicinski
mlx5 has a Kconfig co-dependency on VXLAN, even though it doesn't call any VXLAN function (unlike mlxsw). Perhaps this dates back to the very old days when tunnel ports were fetched directly from VXLAN.

Remove the dependency to allow MLX5=y + VXLAN=m kernel configs. But still avoid compiling in the lib/vxlan code if VXLAN=n.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20250827234319.3504852-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28net: stmmac: mdio: clean up c22/c45 accessor splitRussell King (Oracle)
The C45 accessors were setting the GR (register number) field twice, once with the 16-bit register address truncated to five bits, and then overwritten with the C45 devad. This is harmless since the field was being cleared prior to being updated with the C45 devad, except for the extra work. Remove the redundant code.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/E1urGBn-00000000DCH-3swS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
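The redundancy described above can be illustrated with a small user-space sketch. This is not the stmmac code: the bit positions, the macro names `GR_SHIFT` and `GR_MASK`, and both function names are hypothetical, chosen only to show why writing a truncated register number into a field that is then cleared and rewritten with the devad has no effect on the final value. The sketch assumes the GR bits of the incoming value are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: a 5-bit GR field at bits 20:16. */
#define GR_SHIFT 16
#define GR_MASK  (0x1Fu << GR_SHIFT)

/* Flow with the redundant step: GR is written, cleared, then written again. */
static uint32_t c45_addr_with_redundant_write(uint32_t value, uint16_t reg,
                                              uint8_t devad)
{
    /* The 16-bit register address is truncated to the 5-bit field. */
    value |= ((uint32_t)reg << GR_SHIFT) & GR_MASK;
    /* The field is cleared, discarding what was just written. */
    value &= ~GR_MASK;
    /* The field finally receives the C45 device address. */
    value |= ((uint32_t)devad << GR_SHIFT) & GR_MASK;
    return value;
}

/* Flow without the redundant step: GR only ever receives the devad. */
static uint32_t c45_addr_direct(uint32_t value, uint16_t reg, uint8_t devad)
{
    (void)reg; /* the register address is not placed in GR for C45 */
    /* Assumption of this sketch: the caller passes a value with GR empty. */
    assert((value & GR_MASK) == 0);
    value |= ((uint32_t)devad << GR_SHIFT) & GR_MASK;
    return value;
}
```

Under that assumption both functions return the same word for any register address, which is why removing the first write changes no behavior.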
2025-08-28Merge branch 'net_sched-extend-rcu-use-in-dump-methods-ii'Jakub Kicinski
Eric Dumazet says:

====================
net_sched: extend RCU use in dump() methods (II)

Second series adding RCU dump() to three actions.

First patch removes BH blocking on modules done in the first series.
====================

Link: https://patch.msgid.link/20250827125349.3505302-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28net_sched: act_skbmod: use RCU in tcf_skbmod_dump()Eric Dumazet
Also storing tcf_action into struct tcf_skbmod_params makes sure there is no discrepancy in tcf_skbmod_act().

No longer block BH in tcf_skbmod_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
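Why keeping the action next to its parameters avoids a discrepancy can be sketched in user space. The following is only an analogy built on C11 atomics, not the kernel RCU API and not the act_skbmod code: `struct params`, `struct act`, `act_publish()` and `act_snapshot()` are hypothetical names, and grace-period handling (deferring the free of the old block until readers are done) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical parameter block; the action lives with the fields it governs. */
struct params {
    int action;     /* verdict returned by the action */
    unsigned flags; /* illustrative configuration field */
};

struct act {
    _Atomic(struct params *) params; /* swapped as a single unit */
};

/*
 * Writer: publish a fully initialised block.
 * Returns the previous block; a real RCU user would free it only after
 * a grace period, which this sketch does not model.
 */
static struct params *act_publish(struct act *a, struct params *new_params)
{
    return atomic_exchange_explicit(&a->params, new_params,
                                    memory_order_acq_rel);
}

/*
 * Reader (dump or datapath): one pointer load, then every field is read
 * from that same block, so action and flags always belong together.
 */
static struct params act_snapshot(struct act *a)
{
    const struct params *p =
        atomic_load_explicit(&a->params, memory_order_acquire);

    assert(p != NULL);
    return *p;
}
```

If the action were stored outside the block, a reader could observe the new action with the old flags (or the reverse) while an update is in flight; reading both through one published pointer rules that out.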
2025-08-28net_sched: act_tunnel_key: use RCU in tunnel_key_dump()Eric Dumazet
Also storing tcf_action into struct tcf_tunnel_key_params makes sure there is no discrepancy in tunnel_key_act().

No longer block BH in tunnel_key_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28net_sched: act_vlan: use RCU in tcf_vlan_dump()Eric Dumazet
Also storing tcf_action into struct tcf_vlan_params makes sure there is no discrepancy in tcf_vlan_act().

No longer block BH in tcf_vlan_init() when acquiring tcf_lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-28net_sched: remove BH blocking in eight actionsEric Dumazet
Followup of f45b45cbfae3 ("Merge branch 'net_sched-act-extend-rcu-use-in-dump-methods'")

We never grab tcf_lock from BH context in these modules:

 act_connmark
 act_csum
 act_ct
 act_ctinfo
 act_mpls
 act_nat
 act_pedit
 act_skbedit

No longer block BH when acquiring tcf_lock from init functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250827125349.3505302-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>