git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-08-23	mlx4: Avoid resetting MLX4_INTFF_BONDING per driver	Petr Pavlu
	The mlx4_core driver has a logic that allows a sub-driver to set the MLX4_INTFF_BONDING flag which then causes that function mlx4_do_bond() asks the sub-driver to fully re-probe a device when its bonding configuration changes. Performing this operation is disallowed in mlx4_register_interface() when it is detected that any mlx4 device is multifunction (SRIOV). The code then resets MLX4_INTFF_BONDING in the driver flags. Move this check directly into mlx4_do_bond(). It provides a better separation as mlx4_core no longer directly modifies the sub-driver flags and it will allow to get rid of explicitly keeping track of all mlx4 devices by the intf.c code when it is switched to an auxiliary bus. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	mlx4: Move the bond work to the core driver	Petr Pavlu
	Function mlx4_en_queue_bond_work() is used in mlx4_en to start a bond reconfiguration. It gathers data about a new port map setting, takes a reference on the netdev that triggered the change and queues a work object on mlx4_en_priv.mdev.workqueue to perform the operation. The scheduled work is mlx4_en_bond_work() which calls mlx4_bond()/mlx4_unbond() and consequently mlx4_do_bond(). At the same time, function mlx4_change_port_types() in mlx4_core might be invoked to change the port type configuration. As part of its logic, it re-registers the whole device by calling mlx4_unregister_device(), followed by mlx4_register_device(). The two operations can result in concurrent access to the data about currently active interfaces on the device. Functions mlx4_register_device() and mlx4_unregister_device() lock the intf_mutex to gain exclusive access to this data. The current implementation of mlx4_do_bond() doesn't do that which could result in an unexpected behavior. An updated version of mlx4_do_bond() for use with an auxiliary bus goes and locks the intf_mutex when accessing a new auxiliary device array. However, doing so can then result in the following deadlock: * A two-port mlx4 device is configured as an Ethernet bond. * One of the ports is changed from eth to ib, for instance, by writing into a mlx4_port<x> sysfs attribute file. * mlx4_change_port_types() is called to update port types. It invokes mlx4_unregister_device() to unregister the device which locks the intf_mutex and starts removing all associated interfaces. * Function mlx4_en_remove() gets invoked and starts destroying its first netdev. This triggers mlx4_en_netdev_event() which recognizes that the configured bond is broken. It runs mlx4_en_queue_bond_work() which takes a reference on the netdev. Removing the netdev now cannot proceed until the work is completed. * Work function mlx4_en_bond_work() gets scheduled. It calls mlx4_unbond() -> mlx4_do_bond(). The latter function tries to lock the intf_mutex but that is not possible because it is held already by mlx4_unregister_device(). This particular case could be possibly solved by unregistering the mlx4_en_netdev_event() notifier in mlx4_en_remove() earlier, but it seems better to decouple mlx4_en more and break this reference order. Avoid then this scenario by recognizing that the bond reconfiguration operates only on a mlx4_dev. The logic to queue and execute the bond work can be moved into the mlx4_core driver. Only a reference on the respective mlx4_dev object is needed to be taken during the work's lifetime. This removes a call from mlx4_en that can directly result in needing to lock the intf_mutex, it remains a privilege of the core driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	mlx4: Get rid of the mlx4_interface.activate callback	Petr Pavlu
	The mlx4_interface.activate callback was introduced in commit 79857cd31fe7 ("net/mlx4: Postpone the registration of net_device"). It dealt with a situation when a netdev notifier received a NETDEV_REGISTER event for a new net_device created by mlx4_en but the same device was not yet visible to mlx4_get_protocol_dev(). The callback can be removed now that mlx4_get_protocol_dev() is gone. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	mlx4: Replace the mlx4_interface.event callback with a notifier	Petr Pavlu
	Use a notifier to implement mlx4_dispatch_event() in preparation to switch mlx4_en and mlx4_ib to be an auxiliary device. A problem is that if the mlx4_interface.event callback was replaced with something as mlx4_adrv.event then the implementation of mlx4_dispatch_event() would need to acquire a lock on a given device before executing this callback. That is necessary because otherwise there is no guarantee that the associated driver cannot get unbound when the callback is running. However, taking this lock is not possible because mlx4_dispatch_event() can be invoked from the hardirq context. Using an atomic notifier allows the driver to accurately record when it wants to receive these events and solves this problem. A handler registration is done by both mlx4_en and mlx4_ib at the end of their mlx4_interface.add callback. This matches the current situation when mlx4_add_device() would enable events for a given device immediately after this callback, by adding the device on the mlx4_priv.list. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	mlx4: Use 'void *' as the event param of mlx4_dispatch_event()	Petr Pavlu
	Function mlx4_dispatch_event() takes an 'unsigned long' as its event parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR), a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit integer (remaining events). In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device, the mlx4_interface.event callback is replaced with a notifier and function mlx4_dispatch_event() gets updated to invoke atomic_notifier_call_chain(). This requires forwarding the input 'param' value from the former function to the latter. A problem is that the notifier call takes 'void ' as its 'param' value, compared to 'unsigned long' used by mlx4_dispatch_event(). Re-passing the value would need either punning it to 'void ' or passing down the address of the input 'param'. Both approaches create a number of unnecessary casts. Change instead the input 'param' of mlx4_dispatch_event() from 'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly, callers using an int value are adjusted to pass its address. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	mlx4: Rename member mlx4_en_dev.nb to netdev_nb	Petr Pavlu
	Rename the mlx4_en_dev.nb notifier_block member to netdev_nb in preparation to add a mlx4 core notifier_block. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	mlx4: Get rid of the mlx4_interface.get_dev callback	Petr Pavlu
	Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev() and the associated mlx4_interface.get_dev callbacks. This is done in preparation to use an auxiliary bus to model the mlx4 driver structure. The change is motivated by the following situation: * The mlx4_en interface is being initialized by mlx4_en_add() and mlx4_en_activate(). * The latter activate function calls mlx4_en_init_netdev() -> register_netdev() to register a new net_device. * A netdev event NETDEV_REGISTER is raised for the device. * The netdev notififier mlx4_ib_netdev_event() is called and it invokes mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() -> mlx4_en_get_netdev() [via mlx4_interface.get_dev]. This chain creates a problem when mlx4_en gets switched to be an auxiliary driver. It contains two device calls which would both need to take a respective device lock. Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer call mlx4_get_protocol_dev() but instead to utilize the information passed in net_device.parent and net_device.dev_port. This data is sufficient to determine that an updated port is one that the mlx4_ib driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to date. Following that, update mlx4_ib_get_netdev() to also not call mlx4_get_protocol_dev() and instead scan all current netdevs to find find a matching one. Note that mlx4_ib_get_netdev() is called early from ib_register_device() and cannot use data tracked in mlx4_ib_dev.iboe.netdevs which is not at that point yet set. Finally, remove function mlx4_get_protocol_dev() and the mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they became unused. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	qed/qede: Remove unused declarations	Yue Haibing
	Commit 8cd160a29415 ("qede: convert to new udp_tunnel_nic infra") removed qede_udp_tunnel_{add,del}() but not the declarations. Commit 0ebcebbef1cc ("qed: Read device port count from the shmem") removed qed_device_num_engines() but not its declaration. Commit 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") declared but never implemented qed_fcoe_set_pf_params(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-23	octeontx2-pf: Use PTP HW timestamp counter atomic update feature	Sai Krishna
	Some of the newer silicon versions in CN10K series supports a feature where in the current PTP timestamp in HW can be updated atomically without losing any cpu cycles unlike read/modify/write register. This patch uses this feature so that PTP accuracy can be improved while adjusting the master offset in HW. There is no need for SW timecounter when using this feature. So removed references to SW timecounter wherever appropriate. Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-22	Merge tag 'nf-next-23-08-22' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next First patch resolves a fortify warning by wrapping the to-be-copied members via struct_group. Second patch replaces array[0] with array[] in ebtables uapi. Both changes from GONG Ruiqi. The largest chunk is replacement of strncpy with strscpy_pad() in netfilter, from Justin Stitt. Last patch, from myself, aborts ruleset validation if a fatal signal is pending, this speeds up process exit. * tag 'nf-next-23-08-22' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow loop termination for pending fatal signal netfilter: xtables: refactor deprecated strncpy netfilter: x_tables: refactor deprecated strncpy netfilter: nft_meta: refactor deprecated strncpy netfilter: nft_osf: refactor deprecated strncpy netfilter: nf_tables: refactor deprecated strncpy netfilter: nf_tables: refactor deprecated strncpy netfilter: ipset: refactor deprecated strncpy netfilter: ebtables: replace zero-length array members netfilter: ebtables: fix fortify warnings in size_entry_mwt() ==================== Link: https://lore.kernel.org/r/20230822154336.12888-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	Merge branch 'mptcp-prepare-mptcp-packet-scheduler-for-bpf-extension'	Jakub Kicinski
	Mat Martineau says: ==================== mptcp: Prepare MPTCP packet scheduler for BPF extension The kernel's MPTCP packet scheduler has, to date, been a one-size-fits all algorithm that is hard-coded. It attempts to balance latency and throughput when transmitting data across multiple TCP subflows, and has some limited tunability through sysctls. It has been a long-term goal of the Linux MPTCP community to support customizable packet schedulers for use cases that need to make different trade-offs regarding latency, throughput, redundancy, and other metrics. BPF is well-suited for configuring customized, per-packet scheduling decisions without having to modify the kernel or manage out-of-tree kernel modules. The first steps toward implementing BPF packet schedulers are to update the existing MPTCP transmit loops to allow more flexible scheduling decisions, and to add infrastructure for swappable packet schedulers. The existing scheduling algorithm remains the default. BPF-related changes will be in a future patch series. This code has been in the MPTCP development tree for quite a while, undergoing testing in our CI and community. Patches 1 and 2 refactor the transmit code and do some related cleanup. Patches 3-9 add infrastructure for registering and calling multiple schedulers. Patch 10 connects the in-kernel default scheduler to the new infrastructure. ==================== Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-0-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: register default scheduler	Geliang Tang
	This patch defines the default packet scheduler mptcp_sched_default. Register it in mptcp_sched_init(), which is invoked in mptcp_proto_init(). Skip deleting this default scheduler in mptcp_unregister_scheduler(). Set msk->sched to the default scheduler when the input parameter of mptcp_init_sched() is NULL. Invoke mptcp_sched_default_get_subflow in get_send() and get_retrans() if the defaut scheduler is set or msk->sched is NULL. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-10-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: use get_retrans wrapper	Geliang Tang
	This patch adds the multiple subflows support for __mptcp_retrans(). Use get_retrans() wrapper instead of mptcp_subflow_get_retrans() in it. Check the subflow scheduled flags to test which subflow or subflows are picked by the scheduler, use them to send data. Move msk_owned_by_me() and fallback checks into get_retrans() wrapper from mptcp_subflow_get_retrans(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-9-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: use get_send wrapper	Geliang Tang
	This patch adds the multiple subflows support for __mptcp_push_pending and __mptcp_subflow_push_pending. Use get_send() wrapper instead of mptcp_subflow_get_send() in them. Check the subflow scheduled flags to test which subflow or subflows are picked by the scheduler, use them to send data. Move msk_owned_by_me() and fallback checks into get_send() wrapper from mptcp_subflow_get_send(). This commit allows the scheduler to set the subflow->scheduled bit in multiple subflows, but it does not allow for sending redundant data. Multiple scheduled subflows will send sequential data on each subflow. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-8-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: add scheduler wrappers	Geliang Tang
	This patch defines two packet scheduler wrappers mptcp_sched_get_send() and mptcp_sched_get_retrans(), invoke get_subflow() of msk->sched in them. Set data->reinject to true in mptcp_sched_get_retrans(), set it false in mptcp_sched_get_send(). If msk->sched is NULL, use default functions mptcp_subflow_get_send() and mptcp_subflow_get_retrans() to send data. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-7-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: add scheduled in mptcp_subflow_context	Geliang Tang
	This patch adds a new member scheduled in struct mptcp_subflow_context, which will be set in the MPTCP scheduler context when the scheduler picks this subflow to send data. Add a new helper mptcp_subflow_set_scheduled() to set this flag using WRITE_ONCE(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-6-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: add sched in mptcp_sock	Geliang Tang
	This patch adds a new struct member sched in struct mptcp_sock. And two helpers mptcp_init_sched() and mptcp_release_sched() to init and release it. Init it with the sysctl scheduler in mptcp_init_sock(), copy the scheduler from the parent in mptcp_sk_clone(), and release it in __mptcp_destroy_sock(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-5-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: add a new sysctl scheduler	Geliang Tang
	This patch adds a new sysctl, named scheduler, to support for selection of different schedulers. Export mptcp_get_scheduler helper to get this sysctl. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: add struct mptcp_sched_ops	Geliang Tang
	This patch defines struct mptcp_sched_ops, which has three struct members, name, owner and list, and four function pointers: init(), release() and get_subflow(). The scheduler function get_subflow() have a struct mptcp_sched_data parameter, which contains a reinject flag for retrans or not, a subflows number and a mptcp_subflow_context array. Add the scheduler registering, unregistering and finding functions to add, delete and find a packet scheduler on the global list mptcp_sched_list. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: drop last_snd and MPTCP_RESET_SCHEDULER	Geliang Tang
	Since the burst check conditions have moved out of the function mptcp_subflow_get_send(), it makes all msk->last_snd useless. This patch drops them as well as the macro MPTCP_RESET_SCHEDULER. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-2-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	mptcp: refactor push_pending logic	Geliang Tang
	To support redundant package schedulers more easily, this patch refactors __mptcp_push_pending() logic from: For each dfrag: While sends succeed: Call the scheduler (selects subflow and msk->snd_burst) Update subflow locks (push/release/acquire as needed) Send the dfrag data with mptcp_sendmsg_frag() Update already_sent, snd_nxt, snd_burst Update msk->first_pending Push/release on final subflow -> While first_pending isn't empty: Call the scheduler (selects subflow and msk->snd_burst) Update subflow locks (push/release/acquire as needed) For each pending dfrag: While sends succeed: Send the dfrag data with mptcp_sendmsg_frag() Update already_sent, snd_nxt, snd_burst Update msk->first_pending Break if required by msk->snd_burst / etc Push/release on final subflow Refactors __mptcp_subflow_push_pending logic from: For each dfrag: While sends succeed: Call the scheduler (selects subflow and msk->snd_burst) Send the dfrag data with mptcp_subflow_delegate(), break Send the dfrag data with mptcp_sendmsg_frag() Update dfrag->already_sent, msk->snd_nxt, msk->snd_burst Update msk->first_pending -> While first_pending isn't empty: Call the scheduler (selects subflow and msk->snd_burst) Send the dfrag data with mptcp_subflow_delegate(), break Send the dfrag data with mptcp_sendmsg_frag() For each pending dfrag: While sends succeed: Send the dfrag data with mptcp_sendmsg_frag() Update already_sent, snd_nxt, snd_burst Update msk->first_pending Break if required by msk->snd_burst / etc Move the duplicate code from __mptcp_push_pending() and __mptcp_subflow_push_pending() into a new helper function, named __subflow_push_pending(). Simplify __mptcp_push_pending() and __mptcp_subflow_push_pending() by invoking this helper. Also move the burst check conditions out of the function mptcp_subflow_get_send(), check them in __subflow_push_pending() in the inner "for each pending dfrag" loop. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-1-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	Merge tag 'mlx5-updates-2023-08-16' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-08-16 1) aRFS ethtool stats Improve aRFS observability by adding new set of counters. Each Rx ring will have this set of counters listed below. These counters are exposed through ethtool -S. 1.1) arfs_add: number of times a new rule has been created. 1.2) arfs_request_in: number of times a rule was requested to move from its current Rx ring to a new Rx ring (incremented on the destination Rx ring). 1.3) arfs_request_out: number of times a rule was requested to move out from its current Rx ring (incremented on source/current Rx ring). 1.4) arfs_expired: number of times a rule has been expired by the kernel and removed from HW. 1.5) arfs_err: number of times a rule creation or modification has failed. 2) Supporting inline WQE when possible in SW steering 3) Misc cleanups and fixups to net-next branch * tag 'mlx5-updates-2023-08-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Devcom, only use devcom after NULL check in mlx5_devcom_send_event() net/mlx5: DR, Supporting inline WQE when possible net/mlx5: Rename devlink port ops struct for PFs/VFs net/mlx5: Remove VPORT_UPLINK handling from devlink_port.c net/mlx5: Call mlx5_esw_offloads_rep_load/unload() for uplink port directly net/mlx5: Update dead links in Kconfig documentation net/mlx5: Remove health syndrome enum duplication net/mlx5: DR, Remove unneeded local variable net/mlx5: DR, Fix code indentation net/mlx5: IRQ, consolidate irq and affinity mask allocation net/mlx5e: Fix spelling mistake "Faided" -> "Failed" net/mlx5e: aRFS, Introduce ethtool stats net/mlx5e: aRFS, Warn if aRFS table does not exist for aRFS rule net/mlx5e: aRFS, Prevent repeated kernel rule migrations requests ==================== Link: https://lore.kernel.org/r/20230821175739.81188-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	vrf: Remove unnecessary RCU-bh critical section	Ido Schimmel
	dev_queue_xmit_nit() already uses rcu_read_lock() / rcu_read_unlock() and nothing suggests that softIRQs should be disabled around it. Therefore, remove the rcu_read_lock_bh() / rcu_read_unlock_bh() surrounding it. Tested using [1] with lockdep enabled. [1] #!/bin/bash ip link add name vrf1 up type vrf table 100 ip link add name veth0 type veth peer name veth1 ip link set dev veth1 master vrf1 ip link set dev veth0 up ip link set dev veth1 up ip address add 192.0.2.1/24 dev veth0 ip address add 192.0.2.2/24 dev veth1 ip rule add pref 32765 table local ip rule del pref 0 tcpdump -i vrf1 -c 20 -w /dev/null & sleep 10 ping -i 0.1 -c 10 -q 192.0.2.2 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230821142339.1889961-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	vxlan: vnifilter: Use GFP_KERNEL instead of GFP_ATOMIC	Ido Schimmel
	The function is not called from an atomic context so use GFP_KERNEL instead of GFP_ATOMIC. The allocation of the per-CPU stats is already performed with GFP_KERNEL. Tested using test_vxlan_vnifiltering.sh with CONFIG_DEBUG_ATOMIC_SLEEP. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230821141923.1889776-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	net: ethernet: ti: Remove unused declarations	Yue Haibing
	Commit e8609e69470f ("net: ethernet: ti: am65-cpsw: Convert to PHYLINK") removed am65_cpsw_nuss_adjust_link() but not its declaration. Commit 84640e27f230 ("net: netcp: Add Keystone NetCP core ethernet driver") declared but never implemented netcp_device_find_module(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Roger Quadros <rogerq@kernel.org> Link: https://lore.kernel.org/r/20230821134029.40084-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	net: microchip: Remove unused declarations	Yue Haibing
	Commit 264a9c5c9dff ("net: sparx5: Remove unused GLAG handling in PGID") removed sparx5_pgid_alloc_glag() but not its declaration. Commit 27d293cceee5 ("net: microchip: sparx5: Add support for rule count by cookie") removed vcap_rule_iter() but not its declaration. Commit 8beef08f4618 ("net: microchip: sparx5: Adding initial VCAP API support") declared but never implemented vcap_api_set_client(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821135556.43224-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	ionic: Remove unused declarations	Yue Haibing
	Commit fbfb8031533c ("ionic: Add hardware init and device commands") declared but never implemented ionic_q_rewind()/ionic_set_dma_mask(). Commit 969f84394604 ("ionic: sync the filters in the work task") declared but never implemented ionic_rx_filters_need_sync(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20230821134717.51936-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	net: mscc: ocelot: Remove unused declarations	Yue Haibing
	Commit 6c30384eb1de ("net: mscc: ocelot: register devlink ports") declared but never implemented ocelot_devlink_init() and ocelot_devlink_teardown(). Commit 2096805497e2 ("net: mscc: ocelot: automatically detect VCAP constants") declared but never implemented ocelot_detect_vcap_constants(). Commit 403ffc2c34de ("net: mscc: ocelot: add support for preemptible traffic classes") declared but never implemented ocelot_port_update_preemptible_tcs(). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230821130218.19096-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	net: dsa: microchip: Remove unused declarations	Yue Haibing
	Commit 91a98917a883 ("net: dsa: microchip: move switch chip_id detection to ksz_common") removed ksz8_switch_detect() but not its declaration. Commit 6ec23aaaac43 ("net: dsa: microchip: move ksz_dev_ops to ksz_common.c") declared but never implemented other functions. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20230821125501.19624-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22	netfilter: nf_tables: allow loop termination for pending fatal signal	Florian Westphal
	abort early so task can exit faster if a fatal signal is pending, no need to continue validation in that case. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: xtables: refactor deprecated strncpy	Justin Stitt
	Prefer `strscpy_pad` as it's a more robust interface whilst maintaing zero-padding behavior. There may have existed a bug here due to both `tbl->repl.name` and `info->name` having a size of 32 as defined below: \| #define XT_TABLE_MAXNAMELEN 32 This may lead to buffer overreads in some situations -- `strscpy` solves this by guaranteeing NUL-termination of the dest buffer. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: x_tables: refactor deprecated strncpy	Justin Stitt
	Prefer `strscpy_pad` to `strncpy`. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: nft_meta: refactor deprecated strncpy	Justin Stitt
	Prefer `strscpy_pad` to `strncpy`. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: nft_osf: refactor deprecated strncpy	Justin Stitt
	Use `strscpy_pad` over `strncpy` for NUL-terminated strings. We can also drop the + 1 from `NFT_OSF_MAXGENRELEN + 1` since `strscpy` will guarantee NUL-termination. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: nf_tables: refactor deprecated strncpy	Justin Stitt
	Prefer `strscpy_pad` over `strncpy`. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: nf_tables: refactor deprecated strncpy	Justin Stitt
	Prefer `strscpy_pad` over `strncpy`. Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: ipset: refactor deprecated strncpy	Justin Stitt
	Use `strscpy_pad` instead of `strncpy`. Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: ebtables: replace zero-length array members	GONG, Ruiqi
	As suggested by Kees[1], replace the old-style 0-element array members of multiple structs in ebtables.h with modern C99 flexible array. [1]: https://lore.kernel.org/all/5E8E0F9C-EE3F-4B0D-B827-DC47397E2A4A@kernel.org/ [ fw@strlen.de: keep struct ebt_entry_target as-is, causes compiler warning: "variable sized type 'struct ebt_entry_target' not at the end of a struct or class is a GNU extension" ] Link: https://github.com/KSPP/linux/issues/21 Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	netfilter: ebtables: fix fortify warnings in size_entry_mwt()	GONG, Ruiqi
	When compiling with gcc 13 and CONFIG_FORTIFY_SOURCE=y, the following warning appears: In function ‘fortify_memcpy_chk’, inlined from ‘size_entry_mwt’ at net/bridge/netfilter/ebtables.c:2118:2: ./include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 592 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The compiler is complaining: memcpy(&offsets[1], &entry->watchers_offset, sizeof(offsets) - sizeof(offsets[0])); where memcpy reads beyong &entry->watchers_offset to copy {watchers,target,next}_offset altogether into offsets[]. Silence the warning by wrapping these three up via struct_group(). Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-22	bonding: update port speed when getting bond speed	Hangbin Liu
	Andrew reported a bonding issue that if we put an active-back bond on top of a 802.3ad bond interface. When the 802.3ad bond's speed/duplex changed dynamically. The upper bonding interface's speed/duplex can't be changed at the same time, which will show incorrect speed. Fix it by updating the port speed when calling ethtool. Reported-by: Andrew Schorr <ajschorr@alumni.princeton.edu> Closes: https://lore.kernel.org/netdev/ZEt3hvyREPVdbesO@Laptop-X1/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20230821101008.797482-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-22	net: remove unnecessary input parameter 'how' in ifdown function	Zhengchao Shao
	When the ifdown function in the dst_ops structure is referenced, the input parameter 'how' is always true. In the current implementation of the ifdown interface, ip6_dst_ifdown does not use the input parameter 'how', xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter 'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-22	alx: fix OOB-read compiler warning	GONG, Ruiqi
	The following message shows up when compiling with W=1: In function ‘fortify_memcpy_chk’, inlined from ‘alx_get_ethtool_stats’ at drivers/net/ethernet/atheros/alx/ethtool.c:297:2: ./include/linux/fortify-string.h:592:4: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 592 \| __read_overflow2_field(q_size_field, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In order to get alx stats altogether, alx_get_ethtool_stats() reads beyond hw->stats.rx_ok. Fix this warning by directly copying hw->stats, and refactor the unnecessarily complicated BUILD_BUG_ON btw. Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821013218.1614265-1-gongruiqi@huaweicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-21	net: pcs: lynxi: implement pcs_disable op	Daniel Golle
	When switching from 10GBase-R/5GBase-R/USXGMII to one of the interface modes provided by mtk-pcs-lynxi we need to make sure to always perform a full configuration of the PHYA. Implement pcs_disable op which resets the stored interface mode to PHY_INTERFACE_MODE_NA to trigger a full reconfiguration once the LynxI PCS driver had previously been deselected in favor of another PCS driver such as the to-be-added driver for the USXGMII PCS found in MT7988. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/f23d1a60d2c9d2fb72e32dcb0eaa5f7e867a3d68.1692327891.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21	Revert "pds_core: Fix some kernel-doc comments"	Jakub Kicinski
	This reverts commit cb39c35783f26892bb1a72b1115c94fa2e77f4c5. Patch was applied to hastily, the problem is already fixed in Alex's vfio tree: https://lore.kernel.org/all/20230821112237.105872b5.alex.williamson@redhat.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21	net/mlx5: Devcom, only use devcom after NULL check in mlx5_devcom_send_event()	Li Zetao
	There is a warning reported by kernel test robot: smatch warnings: drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c:264 mlx5_devcom_send_event() warn: variable dereferenced before IS_ERR check devcom (see line 259) The reason for the warning is that the pointer is used before check, put the assignment to comp after devcom check to silence the warning. Fixes: 88d162b47981 ("net/mlx5: Devcom, Infrastructure changes") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202308041028.AkXYDwJ6-lkp@intel.com/ Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-21	net/mlx5: DR, Supporting inline WQE when possible	Itamar Gozlan
	In WQE (Work Queue Entry), the two types of data segments memories are pointers and inline data, where inline data is passed directly as part of the WQE. For software steering, the maximal inline size should be less than 2*MLX5_SEND_WQE_BB, i.e., the potential data must fit with the required inline WQE headers. Two consecutive blocks (MLX5_SEND_WQE_BB) are not guaranteed to reside on the same memory page. Hence, writes to MLX5_SEND_WQE_BB should be done separately, i.e., each MLX5_SEND_WQE_BB should be obtained using the mlx5_wq_cyc_get_wqe macro. Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-21	net/mlx5: Rename devlink port ops struct for PFs/VFs	Jiri Pirko
	As this struct is only used for devlink ports created for PF/VF, add it to the name of the variable to distinguish from the SF related ops struct. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-21	net/mlx5: Remove VPORT_UPLINK handling from devlink_port.c	Jiri Pirko
	It is not possible that the functions in devlink_port.c are called for uplink port. Remove this leftover code. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-21	net/mlx5: Call mlx5_esw_offloads_rep_load/unload() for uplink port directly	Jiri Pirko
	For uplink port, mlx5_esw_offloads_load/unload_rep() are currently called. There are 2 check inside, which effectively make the functions a simple wrappers of mlx5_esw_offloads_rep_load/unload() for uplink port. So avoid one check and indirection and call mlx5_esw_offloads_rep_load/unload() for uplink port directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-21	net/mlx5: Update dead links in Kconfig documentation	Rahul Rameshbabu
	Point to NVIDIA documentation for device specific information now that the Mellanox documentation site is deprecated. Refer to kernel documentation sources for generic information not specific to mlx5 devices. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>