summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-16net: stmmac: Set page_pool_params.max_len to a precise sizeFurong Xu
DMA engine will always write no more than dma_buf_sz bytes of a received frame into a page buffer, the remaining spaces are unused or used by CPU exclusively. Setting page_pool_params.max_len to almost the full size of page(s) helps nothing more, but wastes more CPU cycles on cache maintenance. For a standard MTU of 1500, then dma_buf_sz is assigned to 1536, and this patch brings ~16.9% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, from 2.43 Gbits/sec increased to 2.84 Gbits/sec. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-16net: stmmac: Switch to zero-copy in non-XDP RX pathFurong Xu
Avoid memcpy in non-XDP RX path by marking all allocated SKBs to be recycled in the upper network stack. This patch brings ~11.5% driver performance improvement in a TCP RX throughput test with iPerf tool on a single isolated Cortex-A65 CPU core, from 2.18 Gbits/sec increased to 2.43 Gbits/sec. Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-15Merge branch 'net-mlx5e-ct-add-support-for-hardware-steering'Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5e: CT: Add support for hardware steering This series start with one more HWS patch by Yevgeny, followed by patches that add support for connection tracking in hardware steering mode. It consists of: - patch #2 hooks up the CT ops for the new mode in the right places. - patch #3 moves a function into a common file, so it can be reused. - patch #4 uses the HWS API to implement connection tracking. The main advantage of hardware steering compared to software steering is vastly improved performance when adding/removing/updating rules. Using the T-Rex traffic generator to initiate multi-million UDP flows per second, a kernel running with these patches was able to offload ~600K unique UDP flows per second, a number around ~7x larger than software steering was able to achieve on the same hardware (256-thread AMD EPYC, 512 GB RAM, ConnectX 7 b2b). ==================== Link: https://patch.msgid.link/20250114130646.1937192-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5e: CT: Offload connections with hardware steering rulesCosmin Ratiu
This is modeled similar to how software steering works: - a reference-counted matcher is maintained for each combination of nat/no_nat x ipv4/ipv6 x tcp/udp/gre. - adding a rule involves finding+referencing or creating a corresponding matcher, then actually adding a rule. - updating rules is implemented using the bwc_rule update API, which can change a rule's actions without touching the match value. By using a T-Rex traffic generator to initiate multi-million UDP flows per second, a kernel running with these patches on the RX side was able to offload ~600K flows per second, which is about ~7x larger than what software steering could do on the same hardware (256-thread AMD EPYC, 512 GB RAM, ConnectX-7 b2b). Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5e: CT: Make mlx5_ct_fs_smfs_ct_validate_flow_rule reusableCosmin Ratiu
This function checks whether a flow_rule has the right flow dissector keys and masks used for a connection tracking flow offload. It is currently used locally by the tc_ct smfs module, but is about to be used from another place, so this commit moves it to a better place, renames it to mlx5e_tc_ct_is_valid_flow_rule and drops the unused fs argument. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5e: CT: Add initial support for Hardware SteeringCosmin Ratiu
Connection tracking can offload tuple matches to the NIC either via firmware commands (when the steering mode is dmfs or offload support is disabled due to eswitch being set to legacy) or via software-managed flow steering (smfs). This commit adds stub operations for a third mode, hardware-managed flow steering. This is enabled when both CONFIG_MLX5_TC_CT and CONFIG_MLX5_HW_STEERING are enabled. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net/mlx5: HWS, rework the check if matcher size can be increasedYevgeny Kliteynik
When checking if the matcher size can be increased, check both match and action RTCs. Also, consider the increasing step - check that it won't cause the new matcher size to become unsupported. Additionally, since we're using '+ 1' for action RTC size yet again, define it as macro and use in all the required places. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250114130646.1937192-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch 'net-reduce-rtnl-pressure-in-unregister_netdevice'Jakub Kicinski
Eric Dumazet says: ==================== net: reduce RTNL pressure in unregister_netdevice() One major source of RTNL contention resides in unregister_netdevice() Due to RCU protection of various network structures, and unregister_netdevice() being a synchronous function, it is calling potentially slow functions while holding RTNL. I think we can release RTNL in two points, so that three slow functions are called while RTNL can be used by other threads. v1: https://lore.kernel.org/netdev/20250107130906.098fc8d6@kernel.org/T/#m398c95f5778e1ff70938e079d3c4c43c050ad2a6 ==================== Link: https://patch.msgid.link/20250114205531.967841-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2)Eric Dumazet
One synchronize_net() call is currently done while holding RTNL. This is source of RTNL contention in workloads adding and deleting many network namespaces per second, because synchronize_rcu() and synchronize_rcu_expedited() can use 60+ ms in some cases. For cleanup_net() use, temporarily release RTNL while calling the last synchronize_net(). This should be safe, because devices are no longer visible to other threads after unlist_netdevice() call and setting dev->reg_state to NETREG_UNREGISTERING. In any case, the new netdev_lock() / netdev_unlock() infrastructure that we are adding should allow to fix potential issues, with a combination of a per-device mutex and dev->reg_state awareness. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Link: https://patch.msgid.link/20250114205531.967841-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1)Eric Dumazet
Two synchronize_net() calls are currently done while holding RTNL. This is source of RTNL contention in workloads adding and deleting many network namespaces per second, because synchronize_rcu() and synchronize_rcu_expedited() can use 60+ ms in some cases. For cleanup_net() use, temporarily release RTNL while calling the last synchronize_net(). This should be safe, because devices are no longer visible to other threads at this point. In any case, the new netdev_lock() / netdev_unlock() infrastructure that we are adding should allow to fix potential issues, with a combination of a per-device mutex and dev->reg_state awareness. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Link: https://patch.msgid.link/20250114205531.967841-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: no longer hold RTNL while calling flush_all_backlogs()Eric Dumazet
flush_all_backlogs() is called from unregister_netdevice_many_notify() as part of netdevice dismantles. This is currently called under RTNL, and can last up to 50 ms on busy hosts. There is no reason to hold RTNL at this stage, if our caller is cleanup_net() : netns are no more visible, devices are in NETREG_UNREGISTERING state and no other thread could mess our state while RTNL is temporarily released. In order to provide isolation, this patch provides a separate 'net_todo_list' for cleanup_net(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Link: https://patch.msgid.link/20250114205531.967841-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: no longer assume RTNL is held in flush_all_backlogs()Eric Dumazet
flush_all_backlogs() uses per-cpu and static data to hold its temporary data, on the assumption it is called under RTNL protection. Following patch in the series will break this assumption. Use instead a dynamically allocated piece of memory. In the unlikely case the allocation fails, use a boot-time allocated memory. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Link: https://patch.msgid.link/20250114205531.967841-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: expedite synchronize_net() for cleanup_net()Eric Dumazet
cleanup_net() is the single thread responsible for netns dismantles, and a serious bottleneck. Before we can get per-netns RTNL, make sure all synchronize_net() called from this thread are using rcu_synchronize_expedited(). v3: deal with CONFIG_NET_NS=n Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com> Link: https://patch.msgid.link/20250114205531.967841-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch 'net-use-netdev-lock-to-protect-napi'Jakub Kicinski
Jakub Kicinski says: ==================== net: use netdev->lock to protect NAPI We recently added a lock member to struct net_device, with a vague plan to start using it to protect netdev-local state, removing the need to take rtnl_lock for new configuration APIs. Lay some groundwork and use this lock for protecting NAPI APIs. v1: https://lore.kernel.org/20250114035118.110297-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250115035319.559603-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15netdev-genl: remove rtnl_lock protection from NAPI opsJakub Kicinski
NAPI lifetime, visibility and config are all fully under netdev_lock protection now. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-12-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect NAPI config fields with netdev_lock()Jakub Kicinski
Protect the following members of netdev and napi by netdev_lock: - defer_hard_irqs, - gro_flush_timeout, - irq_suspend_timeout. The first two are written via sysfs (which this patch switches to new lock), and netdev genl which holds both netdev and rtnl locks. irq_suspend_timeout is only written by netdev genl. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect napi->irq with netdev_lock()Jakub Kicinski
Take netdev_lock() in netif_napi_set_irq(). All NAPI "control fields" are now protected by that lock (most of the other ones are set during napi add/del). The napi_hash_node is fully protected by the hash spin lock, but close enough for the kdoc... Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect threaded status of NAPI with netdev_lock()Jakub Kicinski
Now that NAPI instances can't come and go without holding netdev->lock we can trivially switch from rtnl_lock() to netdev_lock() for setting netdev->threaded via sysfs. Note that since we do not lock netdev_lock around sysfs calls in the core we don't have to "trylock" like we do with rtnl_lock. Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: make netdev netlink ops hold netdev_lock()Jakub Kicinski
In prep for dropping rtnl_lock, start locking netdev->lock in netlink genl ops. We need to be using netdev->up instead of flags & IFF_UP. We can remove the RCU lock protection for the NAPI since NAPI list is protected by netdev->lock already. Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect NAPI enablement with netdev_lock()Jakub Kicinski
Wrap napi_enable() / napi_disable() with netdev_lock(). Provide the "already locked" flavor of the API. iavf needs the usual adjustment. A number of drivers call napi_enable() under a spin lock, so they have to be modified to take netdev_lock() first, then spin lock then call napi_enable_locked(). Protecting napi_enable() implies that napi->napi_id is protected by netdev_lock(). Acked-by: Francois Romieu <romieu@fr.zoreil.com> # via-velocity Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: protect netdev->napi_list with netdev_lock()Jakub Kicinski
Hold netdev->lock when NAPIs are getting added or removed. This will allow safe access to NAPI instances of a net_device without rtnl_lock. Create a family of helpers which assume the lock is already taken. Switch iavf to them, as it makes extensive use of netdev->lock, already. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: add netdev->up protected by netdev_lock()Jakub Kicinski
Some uAPI (netdev netlink) hide net_device's sub-objects while the interface is down to ensure uniform behavior across drivers. To remove the rtnl_lock dependency from those uAPIs we need a way to safely tell if the device is down or up. Add an indication of whether device is open or closed, protected by netdev->lock. The semantics are the same as IFF_UP, but taking netdev_lock around every write to ->flags would be a lot of code churn. We don't want to blanket the entire open / close path by netdev_lock, because it will prevent us from applying it to specific structures - core helpers won't be able to take that lock from any function called by the drivers on open/close paths. So the state of the flag is "pessimistic", as in it may report false negatives, but never false positives. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: add helpers for lookup and walking netdevs under netdev_lock()Jakub Kicinski
Add helpers for accessing netdevs under netdev_lock(). There's some careful handling needed to find the device and lock it safely, without it getting unregistered, and without taking rtnl_lock (the latter being the whole point of the new locking, after all). Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: make netdev_lock() protect netdev->reg_stateJakub Kicinski
Protect writes to netdev->reg_state with netdev_lock(). From now on holding netdev_lock() is sufficient to prevent the net_device from getting unregistered, so code which wants to hold just a single netdev around no longer needs to hold rtnl_lock. We do not protect the NETREG_UNREGISTERED -> NETREG_RELEASED transition. We'd need to move mutex_destroy(netdev->lock) to .release, but the real reason is that trying to stop the unregistration process mid-way would be unsafe / crazy. Taking references on such devices is not safe, either. So the intended semantics are to lock REGISTERED devices. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250115035319.559603-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: add netdev_lock() / netdev_unlock() helpersJakub Kicinski
Add helpers for locking the netdev instance, use it in drivers and the shaper code. This will make grepping for the lock usage much easier, as we extend the lock to cover more fields. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20250115035319.559603-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: wwan: iosm: Fix hibernation by re-binding the driver around itMaciej S. Szmigiero
Currently, the driver is seriously broken with respect to the hibernation (S4): after image restore the device is back into IPC_MEM_EXEC_STAGE_BOOT (which AFAIK means bootloader stage) and needs full re-launch of the rest of its firmware, but the driver restore handler treats the device as merely sleeping and just sends it a wake-up command. This wake-up command times out but device nodes (/dev/wwan*) remain accessible. However attempting to use them causes the bootloader to crash and enter IPC_MEM_EXEC_STAGE_CD_READY stage (which apparently means "a crash dump is ready"). It seems that the device cannot be re-initialized from this crashed stage without toggling some reset pin (on my test platform that's apparently what the device _RST ACPI method does). While it would theoretically be possible to rewrite the driver to tear down the whole MUX / IPC layers on hibernation (so the bootloader does not crash from improper access) and then re-launch the device on restore this would require significant refactoring of the driver (believe me, I've tried), since there are quite a few assumptions hard-coded in the driver about the device never being partially de-initialized (like channels other than devlink cannot be closed, for example). Probably this would also need some programming guide for this hardware. Considering that the driver seems orphaned [1] and other people are hitting this issue too [2] fix it by simply unbinding the PCI driver before hibernation and re-binding it after restore, much like USB_QUIRK_RESET_RESUME does for USB devices that exhibit a similar problem. Tested on XMM7360 in HP EliteBook 855 G7 both with s2idle (which uses the existing suspend / resume handlers) and S4 (which uses the new code). [1]: https://lore.kernel.org/all/c248f0b4-2114-4c61-905f-466a786bdebb@leemhuis.info/ [2]: https://github.com/xmm7360/xmm7360-pci/issues/211#issuecomment-1804139413 Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Link: https://patch.msgid.link/e60287ebdb0ab54c4075071b72568a40a75d0205.1736372610.git.mail@maciej.szmigiero.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-01-08 (ice) This series contains updates to ice driver only. Przemek reworks implementation so that ice_init_hw() is called before ice_adapter initialization. The motivation is to have ability to act on the number of PFs in ice_adapter initialization. This is not done here but the code is also a bit cleaner. Michal adds priority to be considered when matching recipes for proper differentiation. Konrad adds devlink health reporting for firmware generated events. R Sundar utilizes string helpers over open coded versions. Jake adds implementation to utilize a lower latency interface to program PHY timer when supported. Additional information can be found on the original cover letter: https://lore.kernel.org/intel-wired-lan/20241216145453.333745-1-anton.nadezhdin@intel.com/ Karol adds and allows for different PTP delay values to be used per pin. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add in/out PTP pin delays ice: implement low latency PHY timer updates ice: check low latency PHY timer update firmware capability ice: add lock to protect low latency interface ice: rename TS_LL_READ* macros to REG_LL_PROXY_H_* ice: use read_poll_timeout_atomic in ice_read_phy_tstamp_ll_e810 ice: use string choice helpers ice: add fw and port health reporters ice: add recipe priority check in search ice: ice_probe: init ice_adapter after HW init ice: minor: rename goto labels from err to unroll ice: split ice_init_hw() out from ice_init_dev() ice: c827: move wait for FW to ice_init_hw() ==================== Link: https://patch.msgid.link/20250115000844.714530-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15inet: ipmr: fix data-racesEric Dumazet
Following fields of 'struct mr_mfc' can be updated concurrently (no lock protection) from ip_mr_forward() and ip6_mr_forward() - bytes - pkt - wrong_if - lastuse They also can be read from other functions. Convert bytes, pkt and wrong_if to atomic_long_t, and use READ_ONCE()/WRITE_ONCE() for lastuse. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250114221049.1190631-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch 'bnxt_en-implement-tcp-data-split-and-thresh-option'Jakub Kicinski
Taehee Yoo says: ==================== bnxt_en: implement tcp-data-split and thresh option This series implements hds-thresh ethtool command. This series also implements backend of tcp-data-split and hds-thresh ethtool command for bnxt_en driver. These ethtool commands are mandatory options for device memory TCP. NICs that use the bnxt_en driver support tcp-data-split feature named HDS(header-data-split). But there is no implementation for the HDS to enable by ethtool. Only getting the current HDS status is implemented and the HDS is just automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. The hds_threshold follows the rx-copybreak value but it wasn't changeable. Currently, bnxt_en driver enables tcp-data-split by default but not always work. There is hds_threshold value, which indicates that a packet size is larger than this value, a packet will be split into header and data. hds_threshold value has been 256, which is a default value of rx-copybreak value too. The rx-copybreak value hasn't been allowed to change so the hds_threshold too. This patchset decouples hds_threshold and rx-copybreak first. and make tcp-data-split, rx-copybreak, and hds-thresh configurable independently. But the default configuration is the same. The default value of rx-copybreak is 256 and default hds-thresh is also 256. The behavior of rx-copybreak will probably be changed in almost all drivers. If HDS is not enabled, rx-copybreak copies both header and payload from a page. But if HDS is enabled, rx-copybreak copies only header from the first page. Due to this change, it may need to disable(set to 0) rx-copybreak when the HDS is required. There are several related options. TPA(HW-GRO, LRO), JUMBO, jumbo_thresh(firmware command), and Aggregation Ring. The aggregation ring is fundamental to these all features. When gro/lro/jumbo packets are received, NIC receives the first packet from the normal ring. follow packets come from the aggregation ring. These features are working regardless of HDS. If HDS is enabled, the first packet contains the header only, and the following packets contain only payload. So, HW-GRO/LRO is working regardless of HDS. There is another threshold value, which is jumbo_thresh. This is very similar to hds_thresh, but jumbo thresh doesn't split header and data. It just split the first and following data based on length. When NIC receives 1500 sized packet, and jumbo_thresh is 256(default, but follows rx-copybreak), the first data is 256 and the following packet size is 1500-256. Before this patch, at least if one of GRO, LRO, and JUMBO flags is enabled, the Aggregation ring will be enabled. If the Aggregation ring is enabled, both hds_threshold and jumbo_thresh are set to the default value of rx-copybreak. So, GRO, LRO, JUMBO frames, they larger than 256 bytes, they will be split into header and data if the protocol is TCP or UDP. for the other protocol, jumbo_thresh works instead of hds_thresh. This means that tcp-data-split relies on the GRO, LRO, and JUMBO flags. But by this patch, tcp-data-split no longer relies on these flags. If the tcp-data-split is enabled, the Aggregation ring will be enabled. Also, hds_threshold no longer follows rx-copybreak value, it will be set to the hds-thresh value by user-space, but the default value is still 256. If the protocol is TCP or UDP and the HDS is disabled and Aggregation ring is enabled, a packet will be split into several pieces due to jumbo_thresh. When single buffer XDP is attached, tcp-data-split is automatically disabled. LRO, GRO, and JUMBO are tested with BCM57414, BCM57504 and the firmware version is 230.0.157.0. I couldn't find any specification about minimum and maximum value of hds_threshold, but from my test result, it was about 0 ~ 1023. It means, over 1023 sized packets will be split into header and data if tcp-data-split is enabled regardless of hds_treshold value. When hds_threshold is 1500 and received packet size is 1400, HDS should not be activated, but it is activated. The maximum value of hds-thresh value is 256 because it has been working. It was decided very conservatively. I checked out the tcp-data-split(HDS) works independently of GRO, LRO, JUMBO. Also, I checked out tcp-data-split should be disabled automatically when XDP is attached and disallowed to enable it again while XDP is attached. I tested ranged values from min to max for hds-thresh and rx-copybreak, and it works. hds-thresh from 0 to 256, and rx-copybreak 0 to 256. When testing this patchset, I checked skb->data, skb->data_len, and nr_frags values. By this patchset, bnxt_en driver supports a force enable tcp-data-split, but it doesn't support for disable tcp-data-split. When tcp-data-split is explicitly enabled, HDS works always. When tcp-data-split is unknown, it depends on the current configuration of LRO/GRO/JUMBO. 1/10 patch adds a new hds_config member in the ethtool_netdev_state. It indicates that what tcp-data-split value is really updated from userspace. So the driver can distinguish a passed tcp-data-split value is came from user or driver itself. 2/10 patch adds hds-thresh command in the ethtool. This threshold value indicates if a received packet size is larger than this threshold, the packet's header and payload will be split. Example: # ethtool -G <interface name> hds-thresh <value> This option can not be used when tcp-data-split is disabled or not supported. # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... Current hardware settings: ... TCP data split: on HDS thresh: 256 3/10, 4/10 add condition checks for devmem and ethtool. If tcp-data-split is disabled or threshold value is not zero, setup of devmem will be failed. Also, tcp-data-split and hds-thresh will not be changed while devmem is running. 5/10 add condition checks for netdev core. It disallows setup single buffer XDP program when tcp-data-split is enabled. 6/10 patch implements .{set, get}_tunable() in the bnxt_en. The bnxt_en driver has been supporting the rx-copybreak feature but is not configurable, Only the default rx-copybreak value has been working. So, it changes the bnxt_en driver to be able to configure the rx-copybreak value. 7/10 patch adds an implementation of tcp-data-split ethtool command. The HDS relies on the Aggregation ring, which is automatically enabled when either LRO, GRO, or large mtu is configured. So, if the Aggregation ring is enabled, HDS is automatically enabled by it. 8/10 patch adds the implementation of hds-thresh logic in the bnxt_en driver. The default value is 256, which used to be the default rx-copybreak value. 9/10 add HDS feature implementation for netdevsim. HDS feature is not common so far. Only a few NICs support this feature. There is no way to test HDS core-API unless we have proper hw NIC. In order to test HDS core-API without hw NIC, netdevsim can be used. It implements HDS control and data plane for netdevsim. 10/10 add selftest for HDS(tcp-data-split and HDS-thresh). The tcp-data-split tests are the same with `ethtool -G tcp-data-split <on | auto>` HDS-thresh tests are same with `ethtool -G eth0 hds-thresh <0 - MAX>` This series is tested with BCM57504 and netdevsim. ==================== Link: https://patch.msgid.link/20250114142852.3364986-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15selftest: net-drv: hds: add test for HDS featureTaehee Yoo
HDS/HDS-thresh features were updated/implemented. so add some tests for these features. HDS tests are the same with `ethtool -G eth0 tcp-data-split <on | off | auto >` but `auto` depends on driver specification. So, it doesn't include `auto` case. HDS-thresh tests are same with `ethtool -G eth0 hds-thresh <0 - MAX>` It includes both 0 and MAX cases. It also includes exceed case, MAX + 1. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-11-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15netdevsim: add HDS featureTaehee Yoo
HDS options(tcp-data-split, hds-thresh) have dependencies between other features like XDP. Basic dependencies are checked in the core API. netdevsim is very useful to check basic dependencies. The default tcp-data-split mode is UNKNOWN but netdevsim driver returns ENABLED when ethtool dumps tcp-data-split mode. The default value of HDS threshold is 0 and the maximum value is 1024. ethtool shows like this. ethtool -g eni1np1 Ring parameters for eni1np1: Pre-set maximums: ... HDS thresh: 1024 Current hardware settings: ... TCP data split: on HDS thresh: 0 ethtool -G eni1np1 tcp-data-split on hds-thresh 1024 ethtool -g eni1np1 Ring parameters for eni1np1: Pre-set maximums: ... HDS thresh: 1024 Current hardware settings: ... TCP data split: on HDS thresh: 1024 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-10-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15bnxt_en: add support for hds-thresh ethtool commandTaehee Yoo
The bnxt_en driver has configured the hds_threshold value automatically when TPA is enabled based on the rx-copybreak default value. Now the hds-thresh ethtool command is added, so it adds an implementation of hds-thresh option. Configuration of the hds-thresh is applied only when the tcp-data-split is enabled. The default value of hds-thresh is 256, which is the default value of rx-copybreak, which used to be the hds_thresh value. The maximum hds-thresh is 1023. # Example: # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-9-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15bnxt_en: add support for tcp-data-split ethtool commandTaehee Yoo
NICs that uses bnxt_en driver supports tcp-data-split feature by the name of HDS(header-data-split). But there is no implementation for the HDS to enable by ethtool. Only getting the current HDS status is implemented and The HDS is just automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled. The hds_threshold follows rx-copybreak value. and it was unchangeable. This implements `ethtool -G <interface name> tcp-data-split <value>` command option. The value can be <on> and <auto>. The value is <auto> and one of LRO/GRO/JUMBO is enabled, HDS is automatically enabled and all LRO/GRO/JUMBO are disabled, HDS is automatically disabled. HDS feature relies on the aggregation ring. So, if HDS is enabled, the bnxt_en driver initializes the aggregation ring. This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-8-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15bnxt_en: add support for rx-copybreak ethtool commandTaehee Yoo
The bnxt_en driver supports rx-copybreak, but it couldn't be set by userspace. Only the default value(256) has worked. This patch makes the bnxt_en driver support following command. `ethtool --set-tunable <devname> rx-copybreak <value> ` and `ethtool --get-tunable <devname> rx-copybreak`. By this patch, hds_threshol is set to the rx-copybreak value. But it will be set by `ethtool -G eth0 hds-thresh N` in the next patch. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250114142852.3364986-7-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: disallow setup single buffer XDP when tcp-data-split is enabled.Taehee Yoo
When a single buffer XDP is attached, NIC should guarantee only single page packets will be received. tcp-data-split feature splits packets into header and payload. single buffer XDP can't handle it properly. So attaching single buffer XDP should be disallowed when tcp-data-split is enabled. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-6-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add ring parameter filteringTaehee Yoo
While the devmem is running, the tcp-data-split and hds-thresh configuration should not be changed. If user tries to change tcp-data-split and threshold value while the devmem is running, it fails and shows extack message. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-5-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: devmem: add ring parameter filteringTaehee Yoo
If driver doesn't support ring parameter or tcp-data-split configuration is not sufficient, the devmem should not be set up. Before setup the devmem, tcp-data-split should be ON and hds-thresh value should be 0. Tested-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-4-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add support for configuring hds-threshTaehee Yoo
The hds-thresh option configures the threshold value of the header-data-split. If a received packet size is larger than this threshold value, a packet will be split into header and payload. The header indicates TCP and UDP header, but it depends on driver spec. The bnxt_en driver supports HDS(Header-Data-Split) configuration at FW level, affecting TCP and UDP too. So, If hds-thresh is set, it affects UDP and TCP packets. Example: # ethtool -G <interface name> hds-thresh <value> # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 The default/min/max values are not defined in the ethtool so the drivers should define themself. The 0 value means that all TCP/UDP packets' header and payload will be split. Tested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-3-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add hds_config member in ethtool_netdev_stateTaehee Yoo
When tcp-data-split is UNKNOWN mode, drivers arbitrarily handle it. For example, bnxt_en driver automatically enables if at least one of LRO/GRO/JUMBO is enabled. If tcp-data-split is UNKNOWN and LRO is enabled, a driver returns ENABLES of tcp-data-split, not UNKNOWN. So, `ethtool -g eth0` shows tcp-data-split is enabled. The problem is in the setting situation. In the ethnl_set_rings(), it first calls get_ringparam() to get the current driver's config. At that moment, if driver's tcp-data-split config is UNKNOWN, it returns ENABLE if LRO/GRO/JUMBO is enabled. Then, it sets values from the user and driver's current config to kernel_ethtool_ringparam. Last it calls .set_ringparam(). The driver, especially bnxt_en driver receives ETHTOOL_TCP_DATA_SPLIT_ENABLED. But it can't distinguish whether it is set by the user or just the current config. When user updates ring parameter, the new hds_config value is updated and current hds_config value is stored to old_hdsconfig. Driver's .set_ringparam() callback can distinguish a passed tcp-data-split value is came from user explicitly. If .set_ringparam() is failed, hds_config is rollbacked immediately. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-2-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: loopback: Hold rtnl_net_lock() in blackhole_netdev_init().Kuniyuki Iwashima
blackhole_netdev is the global device in init_net. Let's hold rtnl_net_lock(&init_net) in blackhole_netdev_init(). While at it, the unnecessary dev_net_set() call is removed, which is done in alloc_netdev_mqs(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250114081352.47404-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15selftests/net/forwarding: teamd command not foundAlessandro Zanni
Running "make kselftest TARGETS=net/forwarding" results in multiple ccurrences of the same error: - ./lib.sh: line 787: teamd: command not found This patch adds the variable $REQUIRE_TEAMD in every test that uses the command teamd and checks the $REQUIRE_TEAMD variable in the file "lib.sh" to skip the test if the command is not installed. Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com> Link: https://patch.msgid.link/20250114003323.97207-1-alessandro.zanni87@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch 'eth-fbnic-add-hardware-monitoring-support'Jakub Kicinski
Sanman Pradhan says: ==================== eth: fbnic: Add hardware monitoring support This patch series adds hardware monitoring support to the fbnic driver. It implements support for reading temperature and voltage sensors via firmware requests, and exposes this data through the HWMON interface. The series is structured as follows: Patch 1: Adds completion infrastructure for firmware requests Patch 2: Implements TSENE sensor message handling Patch 3: Adds HWMON interface support Output: $ ls -l /sys/class/hwmon/hwmon1/ total 0 lrwxrwxrwx 1 root root 0 Sep 10 00:00 device -> ../../../0000:01:00.0 -r--r--r-- 1 root root 4096 Sep 10 00:00 in0_input -r--r--r-- 1 root root 4096 Sep 10 00:00 name lrwxrwxrwx 1 root root 0 Sep 10 00:00 subsystem -> ../../../../../../class/hwmon -r--r--r-- 1 root root 4096 Sep 10 00:00 temp1_input -rw-r--r-- 1 root root 4096 Sep 10 00:00 uevent $ cat /sys/class/hwmon/hwmon1/temp1_input 40480 $ cat /sys/class/hwmon/hwmon1/in0_input 750 ==================== Link: https://patch.msgid.link/20250114000705.2081288-1-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15eth: fbnic: Add hardware monitoring support via HWMON interfaceSanman Pradhan
This patch adds support for hardware monitoring to the fbnic driver, allowing for temperature and voltage sensor data to be exposed to userspace via the HWMON interface. The driver registers a HWMON device and provides callbacks for reading sensor data, enabling system admins to monitor the health and operating conditions of fbnic. Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250114000705.2081288-4-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15eth: fbnic: hwmon: Add support for reading temperature and voltage sensorsSanman Pradhan
Add support for reading temperature and voltage sensor data from firmware by implementing a new TSENE message type and response parsing. This adds message handler infrastructure to transmit sensor read requests and parse responses. The sensor data will be exposed through the driver's hwmon interface. Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250114000705.2081288-3-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15eth: fbnic: hwmon: Add completion infrastructure for firmware requestsSanman Pradhan
Add infrastructure to support firmware request/response handling with completions. Add a completion structure to track message state including message type for matching, completion for waiting for response, and result for error propagation. Use existing spinlock to protect the writes. The data from the various response types will be added to the "union u" by subsequent commits. Signed-off-by: Sanman Pradhan <sanman.p211993@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250114000705.2081288-2-sanman.p211993@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15Merge branch 'net-lan969x-add-fdma-support'Jakub Kicinski
Daniel Machon says: ==================== net: lan969x: add FDMA support == Description: This series is the last of a multi-part series, that prepares and adds support for the new lan969x switch driver. The upstreaming efforts has been split into multiple series: 1) Prepare the Sparx5 driver for lan969x (merged) 2) Add support for lan969x (same basic features as Sparx5 provides excl. FDMA and VCAP, merged). 3) Add lan969x VCAP functionality (merged). 4) Add RGMII support (merged). --> 5) Add FDMA support. == FDMA support: The lan969x switch device uses the same FDMA engine as the Sparx5 switch device, with the same number of channels etc. This means we can utilize the newly added FDMA library, that is already in use by the lan966x and sparx5 drivers. As previous lan969x series, the FDMA implementation will hook into the Sparx5 implementation where possible, however both RX and TX handling will be done differently on lan969x and therefore requires a separate implementation of the RX and TX path. Details are in the commit description of the individual patches == Patch breakdown: Patch #1: Enable FDMA support on lan969x Patch #2: Split start()/stop() functions Patch #3: Activate TX FDMA in start() Patch #4: Ops out a few functions that differ on the two platforms Patch #5: Add FDMA implementation for lan969x v1: https://lore.kernel.org/20250109-sparx5-lan969x-switch-driver-5-v1-0-13d6d8451e63@microchip.com ==================== Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-0-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: lan969x: add FDMA implementationDaniel Machon
The lan969x switch device supports manual frame injection and extraction to and from the switch core, using a number of injection and extraction queues. This technique is currently supported, but delivers poor performance compared to Frame DMA (FDMA). This lan969x implementation of FDMA, hooks into the existing FDMA for Sparx5, but requires its own RX and TX handling, as lan969x does not support the same native cache coherency that Sparx5 does. Effectively, this means that we are going to use the DMA mapping API for mapping and unmapping TX buffers. The RX loop will utilize the page pool API for efficient RX handling. Other than that, the implementation is largely the same, and utilizes the FDMA library for DCB and DB handling. Some numbers: Manual injection/extraction (before this series): // iperf3 -c 1.0.1.1 [ ID] Interval Transfer Bitrate [ 5] 0.00-10.02 sec 345 MBytes 289 Mbits/sec sender [ 5] 0.00-10.06 sec 345 MBytes 288 Mbits/sec receiver FDMA (after this series): // iperf3 -c 1.0.1.1 [ ID] Interval Transfer Bitrate [ 5] 0.00-10.03 sec 1.10 GBytes 940 Mbits/sec sender [ 5] 0.00-10.07 sec 1.10 GBytes 936 Mbits/sec receiver Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-5-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: ops out certain FDMA functionsDaniel Machon
We are going to implement the RX and TX paths a bit differently on lan969x and therefore need to introduce new ops for FDMA functions: init, deinit, xmit and poll. Assign the Sparx5 equivalents for these and update the code throughout. Also add a 'struct net_device' argument to the xmit() function, as we will be needing that for lan969x. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-4-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: activate FDMA tx in start()Daniel Machon
The function sparx5_fdma_tx_activate() is responsible for configuring the TX FDMA instance and activating the channel. TX activation has previously been done in the xmit() function, when the first frame is transmitted. Now that we have separate functions for starting and stopping the FDMA, it seems reasonable to move the TX activation to the start function. This change has no implications on the functionality. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-3-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: sparx5: split sparx5_fdma_{start(),stop()}Daniel Machon
The two functions: sparx5_fdma_{start(),stop()} are responsible for a number of things, namely: allocation and initialization of FDMA buffers, activation FDMA channels in hardware and activation of the NAPI instance. This patch splits the buffer allocation and initialization into init and deinit functions, and the channel and NAPI activation into start and stop functions. This serves two purposes: 1) the start() and stop() functions can be reused for lan969x and 2) prepares for future MTU change support, where we must be able to stop and start the FDMA channels and NAPI instance, without free'ing and reallocating the FDMA buffers. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/20250113-sparx5-lan969x-switch-driver-5-v2-2-c468f02fd623@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>