summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-19netfilter: flowtable: add CLOSING statePablo Neira Ayuso
tcp rst/fin packet triggers an immediate teardown of the flow which results in sending flows back to the classic forwarding path. This behaviour was introduced by: da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path") b6f27d322a0a ("netfilter: nf_flow_table: tear down TCP flows if RST or FIN was seen") whose goal is to expedite removal of flow entries from the hardware table. Before these patches, the flow was released after the flow entry timed out. However, this approach leads to packet races when restoring the conntrack state as well as late flow re-offload situations when the TCP connection is ending. This patch adds a new CLOSING state that is is entered when tcp rst/fin packet is seen. This allows for an early removal of the flow entry from the hardware table. But the flow entry still remains in software, so tcp packets to shut down the flow are not sent back to slow path. If syn packet is seen from this new CLOSING state, then this flow enters teardown state, ct state is set to TCP_CONNTRACK_CLOSE state and packet is sent to slow path, so this TCP reopen scenario can be handled by conntrack. TCP_CONNTRACK_CLOSE provides a small timeout that aims at quickly releasing this stale entry from the conntrack table. Moreover, skip hardware re-offload from flowtable software packet if the flow is in CLOSING state. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: flowtable: teardown flow if cached mtu is stalePablo Neira Ayuso
Tear down the flow entry in the unlikely case that the interface mtu changes, this gives the flow a chance to refresh the cached mtu, otherwise such refresh does not occur until flow entry expires. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: conntrack: rework offload nf_conn timeout extension logicFlorian Westphal
Offload nf_conn entries may not see traffic for a very long time. To prevent incorrect 'ct is stale' checks during nf_conntrack table lookup, the gc worker extends the timeout nf_conn entries marked for offload to a large value. The existing logic suffers from a few problems. Garbage collection runs without locks, its unlikely but possible that @ct is removed right after the 'offload' bit test. In that case, the timeout of a new/reallocated nf_conn entry will be increased. Prevent this by obtaining a reference count on the ct object and re-check of the confirmed and offload bits. If those are not set, the ct is being removed, skip the timeout extension in this case. Parallel teardown is also problematic: cpu1 cpu2 gc_worker calls flow_offload_teardown() tests OFFLOAD bit, set clear OFFLOAD bit ct->timeout is repaired (e.g. set to timeout[UDP_CT_REPLIED]) nf_ct_offload_timeout() called expire value is fetched <INTERRUPT> -> NF_CT_DAY timeout for flow that isn't offloaded (and might not see any further packets). Use cmpxchg: if ct->timeout was repaired after the 2nd 'offload bit' test passed, then ct->timeout will only be updated of ct->timeout was not altered in between. As we already have a gc worker for flowtable entries, ct->timeout repair can be handled from the flowtable gc worker. This avoids having flowtable specific logic in the conntrack core and avoids checking entries that were never offloaded. This allows to remove the nf_ct_offload_timeout helper. Its safe to use in the add case, but not on teardown. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: conntrack: remove skb argument from nf_ct_refreshFlorian Westphal
Its not used (and could be NULL), so remove it. This allows to use nf_ct_refresh in places where we don't have an skb without having to double-check that skb == NULL would be safe. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nft_flow_offload: update tcp state flags under lockFlorian Westphal
The conntrack entry is already public, there is a small chance that another CPU is handling a packet in reply direction and racing with the tcp state update. Move this under ct spinlock. This is done once, when ct is about to be offloaded, so this should not result in a noticeable performance hit. Fixes: 8437a6209f76 ("netfilter: nft_flow_offload: set liberal tracking mode for tcp") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nft_flow_offload: clear tcp MAXACK flag before moving to slowpathFlorian Westphal
This state reset is racy, no locks are held here. Since commit 8437a6209f76 ("netfilter: nft_flow_offload: set liberal tracking mode for tcp"), the window checks are disabled for normal data packets, but MAXACK flag is checked when validating TCP resets. Clear the flag so tcp reset validation checks are ignored. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: Simplify chain netdev notifierPhil Sutter
With conditional chain deletion gone, callback code simplifies: Instead of filling an nft_ctx object, just pass basechain to the per-chain function. Also plain list_for_each_entry() is safe now. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: Tolerate chains with no remaining hooksPhil Sutter
Do not drop a netdev-family chain if the last interface it is registered for vanishes. Users dumping and storing the ruleset upon shutdown to restore it upon next boot may otherwise lose the chain and all contained rules. They will still lose the list of devices, a later patch will fix that. For now, this aligns the event handler's behaviour with that for flowtables. The controversal situation at netns exit should be no problem here: event handler will unregister the hooks, core nftables cleanup code will drop the chain itself. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: Compare netdev hooks based on stored namePhil Sutter
The 1:1 relationship between nft_hook and nf_hook_ops is about to break, so choose the stored ifname to uniquely identify hooks. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: Use stored ifname in netdev hook dumpsPhil Sutter
The stored ifname and ops.dev->name may deviate after creation due to interface name changes. Prefer the more deterministic stored name in dumps which also helps avoiding inadvertent changes to stored ruleset dumps. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: Store user-defined hook ifnamePhil Sutter
Prepare for hooks with NULL ops.dev pointer (due to non-existent device) and store the interface name and length as specified by the user upon creation. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: Flowtable hook's pf value never variesPhil Sutter
When checking for duplicate hooks in nft_register_flowtable_net_hooks(), comparing ops.pf value is pointless as it is always NFPROTO_NETDEV with flowtable hooks. Dropping the check leaves the search identical to the one in nft_hook_list_find() so call that function instead of open coding. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: br_netfilter: remove unused conditional and dead codeAntoine Tenart
The SKB_DROP_REASON_IP_INADDRERRORS drop reason is never returned from any function, as such it cannot be returned from the ip_route_input call tree. The 'reason != SKB_DROP_REASON_IP_INADDRERRORS' conditional is thus always true. Looking back at history, commit 50038bf38e65 ("net: ip: make ip_route_input() return drop reasons") changed the ip_route_input returned value check in br_nf_pre_routing_finish from -EHOSTUNREACH to SKB_DROP_REASON_IP_INADDRERRORS. It turns out -EHOSTUNREACH could not be returned either from the ip_route_input call tree and this since commit 251da4130115 ("ipv4: Cache ip_error() routes even when not forwarding."). Not a fix as this won't change the behavior. While at it use kfree_skb_reason. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-19netfilter: nf_tables: fix set size with rbtree backendPablo Neira Ayuso
The existing rbtree implementation uses singleton elements to represent ranges, however, userspace provides a set size according to the number of ranges in the set. Adjust provided userspace set size to the number of singleton elements in the kernel by multiplying the range by two. Check if the no-match all-zero element is already in the set, in such case release one slot in the set size. Fixes: 0ed6389c483d ("netfilter: nf_tables: rename set implementations") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-14Merge branch 'arrange-pse-core-and-update-tps23881-driver'Paolo Abeni
Kory Maincent says: ==================== Arrange PSE core and update TPS23881 driver This patch includes several improvements to the PSE core for better implementation and maintainability: - Move the conversion between current limit and power limit from the driver to the PSE core. - Update power and current limit checks. - Split the ethtool_get_status callback into multiple callbacks. - Fix PSE PI of_node detection. - Clean ethtool header of PSE structures. Additionally, the TPS23881 driver has been updated to support power limit and measurement features, aligning with the new PSE core functionalities. This patch series is the first part of the budget evaluation strategy support patch series sent earlier: https://lore.kernel.org/netdev/20250104161622.7b82dfdf@kmaincent-XPS-13-7390/T/#t Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> ==================== Link: https://patch.msgid.link/20250110-b4-feature_poe_arrange-v3-0-142279aedb94@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Clean ethtool header of PSE structuresKory Maincent
Remove PSE-specific structures from the ethtool header to improve code modularity, maintain independent headers, and reduce incremental build time. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Fix missing PI of_node descriptionKory Maincent
The PI of_node was not assigned in the regulator_config structure, leading to failures in resolving the correct supply when different power supplies are assigned to multiple PIs of a PSE controller. This fix ensures that the of_node is properly set in the regulator_config, allowing accurate supply resolution for each PI. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: tps23881: Add support for power limit and measurement featuresKory Maincent
Expand PSE callbacks to support the newly introduced pi_get/set_pw_limit() and pi_get_voltage() functions. These callbacks allow for power limit configuration in the TPS23881 controller. Additionally, the patch includes the pi_get_pw_class() the pi_get_actual_pw(), and the pi_get_pw_limit_ranges') callbacks providing more comprehensive PoE status reporting. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Remove is_enabled callback from driversKory Maincent
The is_enabled callback is now redundant as the admin_state can be obtained directly from the driver and provides the same information. To simplify functionality, the core will handle this internally, making the is_enabled callback unnecessary at the driver level. Remove the callback from all drivers. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Split ethtool_get_status into multiple callbacksKory Maincent
The ethtool_get_status callback currently handles all status and PSE information within a single function. This approach has two key drawbacks: 1. If the core requires some information for purposes other than ethtool_get_status, redundant code will be needed to fetch the same data from the driver (like is_enabled). 2. Drivers currently have access to all information passed to ethtool. New variables will soon be added to ethtool status, such as PSE ID, power domain IDs, and budget evaluation strategies, which are meant to be managed solely by the core. Drivers should not have the ability to modify these variables. To resolve these issues, ethtool_get_status has been split into multiple callbacks, with each handling a specific piece of information required by ethtool or the core. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Use power limit at driver side instead of current limitKory Maincent
The regulator framework uses current limits, but the PSE standard and known PSE controllers rely on power limits. Instead of converting current to power within each driver, perform the conversion in the PSE core. This avoids redundancy in driver implementation and aligns better with the standard, simplifying driver development. Remove at the same time the _pse_ethtool_get_status() function which is not needed anymore. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: tps23881: Add missing configuration register after disableKory Maincent
When setting the PWOFF register, the controller resets multiple configuration registers. This patch ensures these registers are reconfigured as needed following a disable operation. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: tps23881: Use helpers to calculate bit offset for a channelKory Maincent
This driver frequently follows a pattern where two registers are read or written in a single operation, followed by calculating the bit offset for a specific channel. Introduce helpers to streamline this process and reduce code redundancy, making the codebase cleaner and more maintainable. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: tps23881: Simplify function returns by removing redundant checksKory Maincent
Cleaned up several functions in tps23881 by removing redundant checks on return values at the end of functions. These check has been removed, and the return statement now directly returns the function result, reducing the code's complexity and making it more concise. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Kyle Swenson <kyle.swenson@est.tech> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Add power limit checkKory Maincent
Checking only the current limit is not sufficient. According to the standard, voltage can reach up to 57V and current up to 1.92A, which exceeds the power limit described in the standard (99.9W). Add a power limit check to prevent this. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Avoid setting max_uA in regulator constraintsKory Maincent
Setting the max_uA constraint in the regulator API imposes a current limit during the regulator registration process. This behavior conflicts with preserving the maximum PI power budget configuration across reboots. Instead, compare the desired current limit to MAX_PI_CURRENT in the pse_pi_set_current_limit() function to ensure proper handling of the power budget. Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: pse-pd: Remove unused pse_ethtool_get_pw_limit function declarationKory Maincent
Removed the unused pse_ethtool_get_pw_limit() function declaration from pse.h. This function was declared but never implemented or used, making the declaration unnecessary. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Kyle Swenson <kyle.swenson@est.tech> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14Merge branch 'add-multicast-filtering-support-for-vlan-interface'Paolo Abeni
MD Danish Anwar says: ==================== Add Multicast Filtering support for VLAN interface This series adds Multicast filtering support for VLAN interfaces in dual EMAC and HSR offload mode for ICSSG driver. Patch 1/4 - Adds support for VLAN in dual EMAC mode Patch 2/4 - Adds MC filtering support for VLAN in dual EMAC mode Patch 3/4 - Create and export hsr_get_port_ndev() in hsr_device.c Patch 4/4 - Adds MC filtering support for VLAN in HSR mode [1] https://lore.kernel.org/all/20241216100044.577489-2-danishanwar@ti.com/ [2] https://lore.kernel.org/all/202412210336.BmgcX3Td-lkp@intel.com/#t [3] https://lore.kernel.org/all/31bb8a3e-5a1c-4c94-8c33-c0dfd6d643fb@kernel.org/ v1 https://lore.kernel.org/all/20241216100044.577489-1-danishanwar@ti.com/ v2 https://lore.kernel.org/all/20241223092557.2077526-1-danishanwar@ti.com/ v3 https://lore.kernel.org/all/20250103092033.1533374-1-danishanwar@ti.com/ ==================== Link: https://patch.msgid.link/20250110082852.3899027-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ti: icssg-prueth: Add Support for Multicast filtering with VLAN in HSR modeMD Danish Anwar
Add multicast filtering support for VLAN interfaces in HSR offload mode for ICSSG driver. The driver calls vlan_for_each() API on the hsr device's ndev to get the list of available vlans for the hsr device. The driver then sync mc addr of vlan interface with a locally mainatined list emac->vlan_mcast_list[vid] using __hw_addr_sync_multiple() API. The driver then calls the sync / unsync callbacks. In the sync / unsync call back, driver checks if the vdev's real dev is hsr device or not. If the real dev is hsr device, driver gets the per port device using hsr_get_port_ndev() and then driver passes appropriate vid to FDB helper functions. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: hsr: Create and export hsr_get_port_ndev()MD Danish Anwar
Create an API to get the net_device to the slave port of HSR device. The API will take hsr net_device and enum hsr_port_type for which we want the net_device as arguments. This API can be used by client drivers who support HSR and want to get the net_devcie of slave ports from the hsr device. Export this API for the same. This API needs the enum hsr_port_type to be accessible by the drivers using hsr. Move the enum hsr_port_type from net/hsr/hsr_main.h to include/linux/if_hsr.h for the same. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ti: icssg-prueth: Add Multicast Filtering support for VLAN in MAC modeMD Danish Anwar
Add multicast filtering support for VLAN interfaces in dual EMAC mode for ICSSG driver. The driver uses vlan_for_each() API to get the list of available vlans. The driver then sync mc addr of vlan interface with a locally mainatined list emac->vlan_mcast_list[vid] using __hw_addr_sync_multiple() API. __hw_addr_sync_multiple() is used instead of __hw_addr_sync() to sync vdev->mc with local list because the sync_cnt for addresses in vdev->mc will already be set by the vlan_dev_set_rx_mode() [net/8021q/vlan_dev.c] and __hw_addr_sync() only syncs when the sync_cnt == 0. Whereas __hw_addr_sync_multiple() can sync addresses even if sync_cnt is not 0. Export __hw_addr_sync_multiple() so that driver can use it. Once the local list is synced, driver calls __hw_addr_sync_dev() with the local list, vdev, sync and unsync callbacks. __hw_addr_sync_dev() is used with the local maintained list as the list to synchronize instead of using __dev_mc_sync() on vdev because __dev_mc_sync() on vdev will call __hw_addr_sync_dev() on vdev->mc and sync_cnt for addresses in vdev->mc will already be set by the vlan_dev_set_rx_mode() [net/8021q/vlan_dev.c] and __hw_addr_sync_dev() only syncs if the sync_cnt of addresses in the list (vdev->mc in this case) is 0. Whereas __hw_addr_sync_dev() on local list will work fine as the sync_cnt for addresses in the local list will still be 0. Based on change in addresses in the local list, sync / unsync callbacks are invoked. In the sync / unsync API in driver, based on whether the ndev is vlan or not, driver passes appropriate vid to FDB helper functions. Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ti: icssg-prueth: Add VLAN support in EMAC modeMD Danish Anwar
Add support for VLAN filtering in dual EMAC mode. Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14Merge tag 'nf-next-25-01-11' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains a small batch of Netfilter/IPVS updates for net-next: 1) Remove unused genmask parameter in nf_tables_addchain() 2) Speed up reads from /proc/net/ip_vs_conn, from Florian Westphal. 3) Skip empty buckets in hashlimit to avoid atomic operations that results in false positive reports by syzbot with lockdep enabled, patch from Eric Dumazet. 4) Add conntrack event timestamps available via ctnetlink, from Florian Westphal. netfilter pull request 25-01-11 * tag 'nf-next-25-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: conntrack: add conntrack event timestamp netfilter: xt_hashlimit: htable_selective_cleanup() optimization ipvs: speed up reads from ip_vs_conn proc file netfilter: nf_tables: remove the genmask parameter ==================== Link: https://patch.msgid.link/20250111230800.67349-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14Merge branch 'introduce-unified-and-structured-phy'Paolo Abeni
Oleksij Rempel says: ==================== Introduce unified and structured PHY This patch set introduces a unified and well-structured interface for reporting PHY statistics. Instead of relying on arbitrary strings in PHY drivers, this interface provides a consistent and structured way to expose PHY statistics to userspace via ethtool. The initial groundwork for this effort was laid by Jakub Kicinski, who contributed patches to plumb PHY statistics to drivers and added support for structured statistics in ethtool. Building on Jakub's work, I tested the implementation with several PHYs, addressed a few issues, and added support for statistics in two specific PHY drivers. Most of changes are tracked in separate patches. changes v6: - drop ethtool_stat_add patch changes v5: - rebase against latest net-next ==================== Link: https://patch.msgid.link/20250110060517.711683-1-o.rempel@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: phy: dp83tg720: add statistics supportOleksij Rempel
Add support for reporting PHY statistics in the DP83TG720 driver. This includes cumulative tracking of link loss events, transmit/receive packet counts, and error counts. Implemented functions to update and provide statistics via ethtool, with optional polling support enabled through `PHY_POLL_STATS`. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: phy: dp83td510: add statistics supportOleksij Rempel
Add support for reporting PHY statistics in the DP83TD510 driver. This includes cumulative tracking of transmit/receive packet counts, and error counts. Implemented functions to update and provide statistics via ethtool, with optional polling support enabled through `PHY_POLL_STATS`. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: phy: introduce optional polling interface for PHY statisticsOleksij Rempel
Add an optional polling interface for PHY statistics to simplify driver implementation. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14Documentation: networking: update PHY error counter diagnostics in twisted ↵Oleksij Rempel
pair guide Replace generic instructions for monitoring error counters with a procedure using the unified PHY statistics interface (`--all-groups`). Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ethtool: add support for structured PHY statisticsJakub Kicinski
Introduce a new way to report PHY statistics in a structured and standardized format using the netlink API. This new method does not replace the old driver-specific stats, which can still be accessed with `ethtool -S <eth name>`. The structured stats are available with `ethtool -S <eth name> --all-groups`. This new method makes it easier to diagnose problems by organizing stats in a consistent and documented way. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ethtool: plumb PHY stats to PHY driversJakub Kicinski
Introduce support for standardized PHY statistics reporting in ethtool by extending the PHYLIB framework. Add the functions phy_ethtool_get_phy_stats() and phy_ethtool_get_link_ext_stats() to provide a consistent interface for retrieving PHY-level and link-specific statistics. These functions are used within the ethtool implementation to avoid direct access to the phy_device structure outside of the PHYLIB framework. A new structure, ethtool_phy_stats, is introduced to standardize PHY statistics such as packet counts, byte counts, and error counters. Drivers are updated to include callbacks for retrieving PHY and link-specific statistics, ensuring values are explicitly set only for supported fields, initialized with ETHTOOL_STAT_NOT_SET to avoid ambiguity. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14ethtool: linkstate: migrate linkstate functions to support multi-PHY setupsOleksij Rempel
Adapt linkstate_get_sqi() and linkstate_get_sqi_max() to take a phy_device argument directly, enabling support for setups with multiple PHYs. The previous assumption of a single PHY attached to a net_device no longer holds. Use ethnl_req_get_phydev() to identify the appropriate PHY device for the operation. Update linkstate_prepare_data() and related logic to accommodate this change, ensuring compatibility with multi-PHY configurations. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: phy: microchip_t1: depend on PTP_1588_CLOCK_OPTIONALDivya Koppera
When microchip_t1_phy is built in and phyptp is module facing undefined reference issue. This get fixed when microchip_t1_phy made dependent on PTP_1588_CLOCK_OPTIONAL. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501090604.YEoJXCXi-lkp@intel.com Fixes: fa51199c5f34 ("net: phy: microchip_rds_ptp : Add rds ptp library for Microchip phys") Signed-off-by: Divya Koppera <divya.koppera@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20250110054424.16807-1-divya.koppera@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: sched: calls synchronize_net() only when neededEric Dumazet
dev_deactivate_many() role is to remove the qdiscs of a network device. When/if a qdisc is dismantled, an rcu grace period is needed to make sure all outstanding qdisc enqueue are done before we proceed with a qdisc reset. Most virtual devices do not have a qdisc. We can call the expensive synchronize_net() only if needed. Note that dev_deactivate_many() does not have to deal with qdisc-less dev_queue_xmit, as an old comment was claiming. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250109171850.2871194-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-13Merge branch 'mlx5-hw-managed-flow-steering-in-fs-core-level'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 HW-Managed Flow Steering in FS core level This patchset by Moshe follows Yevgeny's patchsets [1][2] on subject "HW-Managed Flow Steering in mlx5 driver". As introduced there in HW managed Flow Steering mode (HWS) the driver is configuring steering rules directly to the HW using WQs with a special new type of WQE (Work Queue Element). This way we can reach higher rule insertion/deletion rate with much lower CPU utilization compared to SW Managed Flow Steering (SWS). This patchset adds API to manage namespace, flow tables, flow groups and prepare FTE (Flow Table Entry) rules. It also adds caching and pool mechanisms for HWS actions to allow sharing of steering actions among different rules. The implementation of this API in FS layer, allows FS core to use HW Managed Flow Steering in addition to the existing FW or SW Managed Flow Steering. Patch 13 of this series adds support for configuring HW Managed Flow Steering mode through devlink param, similar to configuring SW Managed Flow Steering mode: # devlink dev param set pci/0000:08:00.0 name flow_steering_mode \ cmode runtime value hmfs In addition, the series contains 2 HWS patches from Yevgeny that implement flow update support. [1] https://lore.kernel.org/netdev/20240903031948.78006-1-saeed@kernel.org/ [2] https://lore.kernel.org/all/20250102181415.1477316-1-tariqt@nvidia.com/ ==================== Link: https://patch.msgid.link/20250109160546.1733647-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/mlx5: HWS, update flow - support through bigger action RTCYevgeny Kliteynik
This patch is the second part of update flow implementation. Instead of using two action RTCs, we use the same RTC which is twice the size of what was required before the update flow support. This way we always allocate STEs from the same RTC (same pool), which means that update is done similar to how create is done. The bigger size allows us to allocate and write new STEs, and later free the old (pre-update) STEs. Similar to rule creation, STEs are written in reverse order: - write action STEs, while match STE is still pointing to the old action STEs - overwrite the match STE with the new one, which now is pointing to the new action STEs Old action STEs can be freed only once we got completion on the writing of the new match STE. To implement this we added new rule states: UPDATING/UPDATED. Rule is moved to UPDATING state in the beginning of the update flow. Once all completions are received, rule is moved to UPDATED state. At this point old action STEs are freed and rule goes back to CREATED state. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-16-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/mlx5: HWS, update flow - remove the use of dual RTCsYevgeny Kliteynik
This patch is the first part of update flow implementation. Update flow should support rules with single STE (match STE only), as well as rules with multiple STEs (match STE plus action STEs). Supporting the rules with single STE is straightforward: we just overwrite the STE, which is an atomic operation. Supporting the rules with action STEs is a more complicated case. The existing implementation uses two action RTCs per matcher and alternates between the two for each update request. This implementation was unnecessarily complex and lead to some unhandled edge cases, so the support for rule update with multiple STEs wasn't really functional. This patch removes this code, and the next patch adds implementation of a different approach. Note that after applying this patch and before applying the next patch we still have support for update rule with single STE (only match STE w/o action STEs), but update will fail for rules with action STEs. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-15-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/mlx5: fs, add HWS to steering mode optionsMoshe Shemesh
Add HW Steering mode to mlx5 devlink param of steering mode options. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-14-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/mlx5: fs, add HWS get capabilitiesMoshe Shemesh
Add API function get capabilities to HW Steering flow commands. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/mlx5: fs, set create match definer to not supported by HWSMoshe Shemesh
Currently HW Steering does not support the API functions of create and destroy match definer. Return not supported error in case requested. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-13net/mlx5: fs, add support for dest vport HWS actionMoshe Shemesh
Add support for HW Steering action of vport destination. Add dest vport actions cache. Hold action in cache per vport / vport and vhca_id. Add action to cache on demand and remove on namespace closure to reduce actions creation and destroy. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>