path: root/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
Age | Commit message | Author
2025-05-13 | net: mlxsw: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set() | Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the mlxsw driver to the new API, so that the ndo_eth_ioctl() path can be removed completely. The UAPI is still ioctl-only, but it's best to remove the "ioctl" mentions from the driver in case a netlink variant appears. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250512154411.848614-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-24 | mlxsw: Add VXLAN bridge ports to same hardware domain as physical bridge ports | Amit Cohen
When hardware floods packets to bridge ports, but flooding to a VXLAN bridge port fails during encapsulation to one of the remote VTEPs, the packets are trapped to the CPU. In such a case, the packets are marked with skb->offload_fwd_mark, which means that the packet was L2-forwarded in hardware. The software data path repeats the flooding, but packets marked with skb->offload_fwd_mark will not be flooded by the bridge to bridge ports which are in the same hardware domain as the ingress port. Currently, mlxsw does not add VXLAN bridge ports to the same hardware domain as physical bridge ports, even though the device is able to forward packets to and from VXLAN tunnels in hardware. In some scenarios (such as the one above) this can result in remote VTEPs receiving duplicate packets: the packets are first flooded by hardware and, after an encapsulation failure, flooded again to all remote VTEPs by software. Solve this by adding VXLAN bridge ports to the same hardware domain as physical bridge ports, so that nbp_switchdev_allowed_egress() will also return false for VXLAN, and packets will not be sent twice from the VXLAN device. switchdev_bridge_port_offload() expects vxlan_dev to be non-const, so some changes are required. Call the switchdev API from mlxsw_sp_bridge_vxlan_{join,leave}(), which handle offload configurations. Reported-by: Vladimir Oltean <olteanv@gmail.com> Closes: https://lore.kernel.org/all/20250210152246.4ajumdchwhvbarik@skbuf/ Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7279056843140fae3a72c2d204c7886b79d03899.1742224300.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-04 | mlxsw: spectrum_router: Remove unused functions | Dr. David Alan Gilbert
mlxsw_sp_ipip_lb_ul_vr_id() has been unused since 2020's commit acde33bf7319 ("mlxsw: spectrum_router: Reduce mlxsw_sp_ipip_fib_entry_op_gre4()") mlxsw_sp_rif_exists() has been unused since 2023's commit 49c3a615d382 ("mlxsw: spectrum_router: Replay MACVLANs when RIF is made") mlxsw_sp_rif_vid() has been unused since 2023's commit a5b52692e693 ("mlxsw: spectrum_switchdev: Manage RIFs on PVID change") Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20250203190141.204951-1-linux@treblig.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17 | mlxsw: Move Tx header handling to PCI driver | Amit Cohen
A Tx header should be added to all packets transmitted from the CPU to Spectrum ASICs. Historically, handling of this header was added as a driver function, as the Tx header differs between Spectrum and SwitchX. See the SwitchX implementation in commit 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support"). Since May 2021, the SwitchX-2 ASIC is no longer supported, and all the relevant code was removed. There is therefore no longer any justification for handling the Tx header as part of spectrum.c; it can be handled in the PCI driver, in skb_transmit(). A future patch set will add XDP support to the mlxsw driver, including the XDP_TX and XDP_REDIRECT actions, for which a Tx header must be added before the packet is transmitted. As preparation, move Tx header handling to the PCI driver, so that XDP code will not have to call an API from spectrum.c. This also improves the code, as the Tx header is now pushed just before transmission rather than from many flows, any of which might miss something. Note that PTP packets require a differently configured Tx header; use the fields from mlxsw_txhdr_info to configure such packets correctly in the PCI driver. Handle VLAN tagging in the switch driver: verify that a packet which should be transmitted as data is tagged, and tag it otherwise. Remove the calls to the txhdr_construct() functions, as this is now done as part of skb_transmit(). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/293a81e6f7d59a8ec9f9592edb7745536649ff11.1737044384.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17 | mlxsw: Initialize txhdr_info according to PTP operations | Amit Cohen
A subsequent patch will construct the Tx header as part of pci.c. The switch driver (mlxsw_spectrum.ko) should encapsulate all the differences between the different ASICs, and the bus driver (mlxsw_pci.ko) should remain unaware of them. As preparation, add the relevant info to the mlxsw_txhdr_info structure, so that the bus driver will later merely construct the Tx header based on information passed from the switch driver. Most packets are transmitted as control packets, but PTP packets in Spectrum-2 and Spectrum-3 should be handled differently. The driver transmits them as data packets, and the default VLAN tag (4095) is added if the packet is not already tagged. Extend the PTP operations to store a boolean which indicates whether packets should be transmitted as data packets. Set it for Spectrum-2 and Spectrum-3 only. Extend mlxsw_txhdr_info to store fields which will later be used to construct the Tx header. Initialize these fields according to the new boolean stored in the PTP operations. Note that for now, the mlxsw_txhdr_info structure is initialized but not used; a subsequent patch will use it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/efcaacd4bedef524e840a0c29f96cebf2c4bc0e0.1737044384.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15 | net: Add struct kernel_ethtool_ts_info | Kory Maincent
In anticipation of adding a new UAPI for hwtstamp, we will be limited by struct ethtool_ts_info, which is currently passed in a fixed binary format through the ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code already started operating on an extensible kernel variant of that structure, similar in concept to struct kernel_hwtstamp_config vs. struct hwtstamp_config. Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, introduce the kernel-only structure in include/linux/ethtool.h. The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO. Acked-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 | mlxsw: Use the same maximum MTU value throughout the driver | Amit Cohen
Currently, the driver uses two different values for the maximum MTU: one is stored in mlxsw_port->dev->max_mtu and the other in mlxsw_port->max_mtu. The second one is set to a value queried from firmware. This value was never tested and, unfortunately, is not really supported. This means that with the existing code, a user can set the MTU to a value X which is not really supported by firmware and which is bigger than the buffer size allocated in pci. To make the driver consistent, use only mlxsw_port->dev->max_mtu as the maximum MTU value; for buffer headroom, add the Ethernet frame headers, which are not included in mlxsw_port->dev->max_mtu. Remove mlxsw_port->max_mtu. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/89fa6f804386b918d337e736e14ac291bb947483.1718275854.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
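As an illustration of the headroom arithmetic described above, here is a minimal sketch. It assumes the "Ethernet frame headers" are the Ethernet header, one 802.1Q tag and the FCS, using their standard sizes of 14, 4 and 4 bytes. The exact set of headers mlxsw adds is an assumption here, and all names are hypothetical.

```c
#include <assert.h>

/* Assumed header sizes (standard Ethernet values); the driver's exact
 * constant is not quoted in the commit message. */
#define SKETCH_ETH_HLEN    14u /* destination MAC + source MAC + EtherType */
#define SKETCH_VLAN_HLEN    4u /* one 802.1Q tag */
#define SKETCH_ETH_FCS_LEN  4u /* frame check sequence */
#define SKETCH_FRAME_HDR (SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN + SKETCH_ETH_FCS_LEN)

/* Hypothetical helper: bytes a buffer must hold for a frame that carries
 * an MTU-sized payload. The MTU excludes L2 framing, so it is added back. */
static unsigned int sketch_frame_size(unsigned int mtu)
{
	return mtu + SKETCH_FRAME_HDR;
}

/* Hypothetical check against the single max_mtu value kept per port. */
static int sketch_mtu_valid(unsigned int mtu, unsigned int max_mtu)
{
	return mtu <= max_mtu;
}
```

With a single maximum, the buffer size only needs to cover `sketch_frame_size(max_mtu)`, and no second, firmware-derived limit can disagree with it.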
2024-03-11 | mlxsw: spectrum: Allow fetch-and-clear of flow counters | Petr Machata
For a report_delta-like interface, such as the one a previous patch added for collecting NH group statistics, it is easiest to read the counter and have the HW clear it right away. Thus, change mlxsw_sp_flow_counter_get() to take a bool indicating whether this should be done. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/6a096ede8ee92d5041e3832242c3bbc137198aba.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
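The fetch-and-clear semantics can be sketched with a plain software counter standing in for the hardware one. This is a minimal illustration, not the driver's code. The struct and function names are hypothetical, and the extra `bool clear` argument only mirrors the idea described in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware flow counter. */
struct sketch_counter {
	uint64_t packets;
	uint64_t bytes;
};

/* Read the counter; when clear is true, reset it as part of the same
 * operation, so the next read reports only traffic seen since this one
 * (a delta), with no window in which increments could be lost. */
static void sketch_counter_get(struct sketch_counter *c, bool clear,
			       uint64_t *packets, uint64_t *bytes)
{
	*packets = c->packets;
	*bytes = c->bytes;
	if (clear) {
		c->packets = 0;
		c->bytes = 0;
	}
}
```

A delta-reporting caller passes `true` on every read. Callers that want cumulative values pass `false` and keep their existing behavior.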
2024-01-30 | mlxsw: spectrum: Query max_lag once | Amit Cohen
The maximum number of LAGs is queried from core several times. It is used to allocate LAG array, and then to iterate over it. In addition, it is used for PGT initialization. To simplify the code, instead of querying it several times, store the value as part of 'mlxsw_sp' and use it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-30 | mlxsw: spectrum: Change mlxsw_sp_upper to LAG structure | Amit Cohen
The structure mlxsw_sp_upper is used only as LAG. Rename it to mlxsw_sp_lag and move it to spectrum.c file, as it is used only there. Move the function mlxsw_sp_lag_get() with the structure. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-15 | mlxsw: spectrum_fid: Add an "any" packet type | Petr Machata
Even before CFF support, flood profiles have been used for the NVE underlay. As is the case with FID flooding, an NVE profile describes at which offset a datum is located for a given traffic type. mlxsw currently only ever uses one KVD entry for NVE lookup, i.e., regardless of traffic type, the offset is always zero. To be able to describe this, add a traffic type enumerator describing "any traffic type". Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-29 | mlxsw: spectrum_fid: Add hooks for RSP table maintenance | Petr Machata
In the CFF flood mode, the driver has to allocate a table within PGT, which holds flood vectors for router subport FIDs. For LAGs, these flood vectors obviously have to be maintained dynamically as port membership in a LAG changes. But even for physical ports, the flood vectors have to be kept valid, and may not contain enabled bits corresponding to non-existent ports. It is therefore not possible to precompute the port part of the RSP table; it has to be maintained as ports come and go due to splits. To support the RSP table maintenance, add two new ops to the FID ops: fid_port_init and fid_port_fini, for when a port comes into existence or joins a LAG, and vice versa. Invoke these ops from mlxsw_sp_port_fids_init() and mlxsw_sp_port_fids_fini(), which are called when a port is added and removed, respectively. Also add two new hooks for LAG maintenance, mlxsw_sp_fid_port_join_lag() / _leave_lag(), which transitively call into the same ops. Later patches will add the op implementations themselves; this patch just adds the scaffolding. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/234398a23540317abb25f74f920a5c8121faecf0.1701183892.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29 | mlxsw: spectrum_fid: Add a not-UC packet type | Petr Machata
In CFF flood mode, the rFID family will allocate two tables. One for unknown UC traffic, one for everything else. Add a traffic type for the everything else traffic. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/8fb968b2d1cc37137cd0110c98cdeb625b03ca99.1701183892.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-29 | mlxsw: spectrum_fid: Privatize FID families | Petr Machata
Currently, mlxsw always uses a "controlled" flood mode on all Nvidia Spectrum generations. The following patches will, however, introduce the possibility of running a "CFF" (for Compressed FID Flooding) mode on newer machines, if the FW supports it. Several operations need to be done differently in controlled mode and in CFF mode. Thus the per-FID-family ops will differ between controlled and CFF, and consequently the FID family array as such will differ depending on whether the mode negotiated with FW is controlled or CFF. The simple approach of having several globally visible arrays for spectrum.c to statically choose from no longer works. Instead, privatize all FID initialization and finalization logic and expose it as ops. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/d3fa390d97cf3dbd2f7a28741be69b311e2059e4.1701183891.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21 | mlxsw: spectrum_router: Add a helper to get subport number from a RIF | Petr Machata
In the CFF flood mode, responsibility for management of the PGT entries for rFIDs is moved from FW to the driver. All rFIDs are based off either a front panel port, or a LAG port. The flood vectors for port-based rFIDs enable just the port itself, the ones for LAG-based rFIDs enable all member ports of the LAG in question. Since all rFIDs based off the same port have the same flood vector, and similarly for LAG-based rFIDs, the flood entries are shared. The PGT address of the flood vector is therefore determined based on the port (or LAG) number of the RIF connected with the rFID. Add a helper to determine subport number given a RIF, to be used in these calculations. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/d7ab43cf5b021f785f363f236e4b6780d10eea93.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-20 | mlxsw: spectrum: Allocate LAG table when in SW LAG mode | Petr Machata
In this patch, if the LAG mode is SW, allocate the LAG table and configure SGCR to indicate where it was allocated. We use the default "DDD" (for dynamic data duplication) layout of the LAG table. In the DDD mode, the membership information for each LAG is copied in 8 PGT entries. This is done for performance reasons. The LAG table then needs to be allocated on an address aligned to 8. Deal with this by moving the LAG init ahead so that the LAG table is allocated at address 0. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
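The DDD layout described above can be illustrated with address arithmetic. This sketch assumes that the 8 copies of one LAG occupy consecutive PGT entries, so LAG n starts at `base + 8 * n`. The commit states only the copy count and the alignment requirement, so that layout and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* From the commit text: each LAG's membership is copied into 8 PGT
 * entries, and the LAG table must start at an address aligned to 8. */
#define SKETCH_LAG_DDD_COPIES 8u

/* Hypothetical: PGT entries needed for a table of max_lag LAGs. */
static unsigned int sketch_lag_table_size(unsigned int max_lag)
{
	return max_lag * SKETCH_LAG_DDD_COPIES;
}

static bool sketch_addr_aligned(unsigned int addr)
{
	return addr % SKETCH_LAG_DDD_COPIES == 0;
}

/* Hypothetical: PGT address of copy `copy` of LAG `lag_id`, assuming
 * the copies of a single LAG are stored contiguously. */
static unsigned int sketch_lag_pgt_addr(unsigned int table_base,
					unsigned int lag_id, unsigned int copy)
{
	assert(sketch_addr_aligned(table_base));
	assert(copy < SKETCH_LAG_DDD_COPIES);
	return table_base + lag_id * SKETCH_LAG_DDD_COPIES + copy;
}
```

Allocating the table at address 0, as the commit does by moving LAG init ahead, satisfies the alignment check trivially.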
2023-10-20 | mlxsw: spectrum_pgt: Generalize PGT allocation | Petr Machata
PGT blocks are allocated through the function mlxsw_sp_pgt_mid_alloc_range(). The interface assumes that the caller knows which piece of PGT exactly they want to get. That was fine while the FID code was the only client allocating blocks of PGT. However for SW-allocated LAG table, there will be an additional client: mlxsw_sp_lag_init(). The interface should therefore be changed to not require particular coordinates, but to take just the requested size, allocate the block wherever, and give back the PGT address. In this patch, change the interface accordingly. Initialize FID family's pgt_base from the result of the PGT allocation (note that mlxsw makes a copy of the family structure, so what gets initialized is not actually the global structure). Drop the now-unnecessary pgt_base initializations and the corresponding defines. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
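The interface change described above amounts to this: the caller asks for a size, and the allocator picks the location and returns the base address. Below is a minimal first-fit sketch over a boolean occupancy array. This is not necessarily how mlxsw tracks PGT occupancy. The table size and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PGT_SIZE 64u /* hypothetical table size */

struct sketch_pgt {
	bool used[SKETCH_PGT_SIZE];
};

/* First-fit: find `count` consecutive free entries, mark them used and
 * return their base address through *p_base. Returns 0 on success, or -1
 * if no free block of that size exists. */
static int sketch_pgt_alloc_range(struct sketch_pgt *pgt, unsigned int count,
				  unsigned int *p_base)
{
	unsigned int base, i;

	if (count == 0 || count > SKETCH_PGT_SIZE)
		return -1;
	for (base = 0; base + count <= SKETCH_PGT_SIZE; base++) {
		for (i = 0; i < count; i++)
			if (pgt->used[base + i])
				break;
		if (i == count) {
			for (i = 0; i < count; i++)
				pgt->used[base + i] = true;
			*p_base = base;
			return 0;
		}
	}
	return -1;
}

static void sketch_pgt_free_range(struct sketch_pgt *pgt, unsigned int base,
				  unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		pgt->used[base + i] = false;
}
```

Because callers no longer hard-code coordinates, a FID family would store the returned address as its pgt_base instead of relying on a fixed define.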
2023-08-14 | mlxsw: spectrum: Stop ignoring learning notifications from redirected traffic | Ido Schimmel
As explained in the previous patch, with the ignore action prepended to the redirect action, it is no longer possible for redirected traffic to generate learning notifications. Therefore, remove the workaround that was added in commit 577fa14d2100 ("mlxsw: spectrum: Do not process learned records with a dummy FID"), as it is no longer needed. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14 | mlxsw: spectrum_flower: Disable learning and security lookup when redirecting | Ido Schimmel
It is possible to add a filter that redirects traffic from the ingress of a bridge port that is locked (i.e., performs security / SMAC lookup) and has learning enabled. For example:

 # ip link add name br0 type bridge
 # ip link set dev swp1 master br0
 # bridge link set dev swp1 learning on locked on mab on
 # tc qdisc add dev swp1 clsact
 # tc filter add dev swp1 ingress pref 1 proto ip flower skip_sw src_ip 192.0.2.1 action mirred egress redirect dev swp2

In the kernel's Rx path, this filter is evaluated before the Rx handler of the bridge, which means that redirected traffic should not be affected by bridge port configuration such as learning. However, the hardware data path is a bit different: the redirect action (FORWARDING_ACTION in hardware) merely attaches a pointer to the packet, which is later used by the L2 lookup stage to understand how to forward the packet. Between both stages (ingress ACL and L2 lookup), learning and security lookup are performed, which means that redirected traffic is affected by bridge port configuration, unlike in the kernel's data path. The learning discrepancy was handled in commit 577fa14d2100 ("mlxsw: spectrum: Do not process learned records with a dummy FID") by simply ignoring learning notifications generated by the redirected traffic. A similar solution is not possible for the security / SMAC lookup since, unlike learning, the CPU is not involved and packets that failed the lookup are dropped by the device. Instead, solve this by prepending the ignore action to the redirect action and using it to instruct the device to disable both learning and the security / SMAC lookup for redirected traffic. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-04 | mlxsw: spectrum: Remove unused function declarations | Yue Haibing
Commit c3d2ed93b14d ("mlxsw: Remove old parsing depth infrastructure") left behind mlxsw_sp_nve_inc_parsing_depth_get()/mlxsw_sp_nve_inc_parsing_depth_put(). And commit 532b49e41e64 ("mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU") removed mlxsw_sp_span_port_mtu_update()/mlxsw_sp_span_speed_update_work() but left the declarations. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230803142047.42660-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28 | mlxsw: spectrum: Drop unused functions mlxsw_sp_port_lower_dev_hold/_put() | Petr Machata
As of commit 151b89f6025a ("mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler"), the functions mlxsw_sp_port_lower_dev_hold() and mlxsw_sp_port_dev_put() have no users. Drop them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/d0adcd7cb4ea19416294a0f861100edba84c9f36.1690471774.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-21 | mlxsw: spectrum_switchdev: Replay switchdev objects on port join | Petr Machata
Currently it never happens that a netdevice that is already a bridge slave would suddenly become an mlxsw upper. The only case where this might be possible, as far as mlxsw is concerned, is with LAG netdevices. But if a LAG has any upper (e.g. is enslaved), enslaving an mlxsw port to that LAG is forbidden. Thus the only way to install a LAG between a bridge and an mlxsw port is by first enslaving the port to the LAG, and then enslaving that LAG to a bridge. At that point there are no bridge objects (such as port VLANs) to replay. Those are added afterwards, and notified as they are created. This holds even for the PVID. However, in the following patches, the requirement that ports be enslaved only to masters without uppers is going to be relaxed. It will therefore be necessary to replay the existing bridge objects. Without this replay, e.g. the mlxsw bridge_port_vlan objects are not instantiated, which causes issues later, as a lot of code relies on their presence. To that end, add a new notifier block whose sole role is to filter out events related to the one relevant upper, and forward those to the existing switchdev notifier block. Pass the new notifier block to switchdev_bridge_port_offload() when the bridge port is created. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-12 | mlxsw: spectrum_flower: Add ability to match on port ranges | Ido Schimmel
Add the ability to match on port ranges by utilizing the previously added port range registers and the port range key element. Up to two port range registers can be used for each filter, one for source port and another for destination port. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/df4385a9592917e9a22ebff339e0463e4a8dfa82.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: spectrum_acl: Pass main driver structure to mlxsw_sp_acl_rulei_destroy() | Ido Schimmel
The main driver structure will be needed in this function by a subsequent patch, so pass it. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/24d96a4e21310e5de2951ace58263db35e44a0df.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: spectrum_port_range: Add devlink resource support | Ido Schimmel
Expose via devlink-resource the maximum number of port range registers and their current occupancy. Besides the observability benefits, this resource will be used by subsequent patches for scale and occupancy tests. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/7945e0c715dc5efb1617f45f7560c1f1bd0bcf8a.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-12 | mlxsw: spectrum_port_range: Add port range core | Ido Schimmel
The Spectrum ASICs have a fixed number of port range registers, each of which maintains the following parameters:

 * Minimum and maximum port.
 * Apply port range for source port, destination port or both.
 * Apply port range for TCP, UDP or both.
 * Apply port range for IPv4, IPv6 or both.

Implement a port range core which takes care of the allocation and configuration of these registers and exposes an API that allows in-driver consumers (e.g., the ACL code) to request matching on a range of either source or destination port. These registers are going to be used for port range matching in the flower classifier, which already matches on EtherType being IPv4 / IPv6 and IP protocol being TCP / UDP. As such, there is no need to limit these registers to a specific EtherType or IP protocol, which will increase the likelihood of a register being shared by multiple flower filters. It is unlikely that a filter will match on the same range of both source and destination ports, which is why each register is only configured to match on either source or destination port. If a filter requires matching on a range of both source and destination ports, it will utilize two port range registers and match on the output of both. For efficient lookup and traversal, use XArray to store the allocated port range registers. The XArray uses RCU and an internal spinlock to synchronise access, so there is no need for a dedicated lock. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/674f00539a0072d455847663b5feb504db51a259.1689092769.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
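The sharing scheme described above (one register per {min, max, direction}, shared by every filter that needs the same range) can be sketched with a reference-counted array. This is a userspace approximation: a plain array replaces the kernel's XArray, no locking is shown, and the register count and all names are hypothetical. Matching on both source and destination ranges would call the get function twice, once per direction.

```c
#include <assert.h>

#define SKETCH_PR_REG_NUM 16 /* hypothetical number of registers */

enum sketch_pr_dir { SKETCH_PR_SRC, SKETCH_PR_DST };

struct sketch_pr_reg {
	unsigned int refcount; /* 0 means the register is free */
	unsigned short min, max;
	enum sketch_pr_dir dir;
};

struct sketch_pr_core {
	struct sketch_pr_reg regs[SKETCH_PR_REG_NUM];
};

/* Return the index of a register matching [min, max] in direction dir,
 * reusing an existing one when possible; -1 if all registers are taken.
 * No EtherType / IP protocol fields are kept, per the text above. */
static int sketch_pr_get(struct sketch_pr_core *core, unsigned short min,
			 unsigned short max, enum sketch_pr_dir dir)
{
	int i, free_idx = -1;

	for (i = 0; i < SKETCH_PR_REG_NUM; i++) {
		struct sketch_pr_reg *reg = &core->regs[i];

		if (!reg->refcount) {
			if (free_idx < 0)
				free_idx = i;
			continue;
		}
		if (reg->min == min && reg->max == max && reg->dir == dir) {
			reg->refcount++;
			return i;
		}
	}
	if (free_idx < 0)
		return -1;
	/* A real driver would also program the hardware register here. */
	core->regs[free_idx] = (struct sketch_pr_reg){
		.refcount = 1, .min = min, .max = max, .dir = dir,
	};
	return free_idx;
}

static void sketch_pr_put(struct sketch_pr_core *core, int idx)
{
	assert(core->regs[idx].refcount > 0);
	core->regs[idx].refcount--;
}
```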
2023-06-14 | mlxsw: spectrum_router: Add a helper specifically for joining a LAG | Petr Machata
Currently, joining a LAG very simply means that the LAG RIF should be joined by the subport representing untagged traffic. If the RIF does not exist, it does not have to be created: if the user wants there to be a RIF for the LAG device, they are supposed to add an IP address, and they are supposed to do it after the LAG becomes an mlxsw upper. We can also assume that the LAG has no uppers, otherwise the enslavement is not allowed. In the future, these ordering dependencies should be removed. That means that joining a LAG will be a more complex operation, possibly involving lazy RIF creation, and possibly joining / lazily creating RIFs for VLAN uppers of the LAG. It will be handy to have a dedicated function that handles all this. The new function mlxsw_sp_router_port_join_lag() is that function. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-12 | mlxsw: spectrum_router: Move here inetaddr validator notifiers | Petr Machata
The validation logic is already in the router code. Move the notifier blocks themselves there as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-07 | mlxsw: spectrum_acl_tcam: Move devlink param to TCAM code | Ido Schimmel
The cited commit added 'DEVLINK_CMD_PARAM_DEL' notifications whenever the network namespace of the devlink instance is changed. Specifically, the notifications are generated after calling reload_down(), but before calling reload_up(). At this stage, the data structures accessed while reading the value of the "acl_region_rehash_interval" devlink parameter are uninitialized, resulting in a use-after-free [1]. Fix by moving the registration and unregistration of the devlink parameter to the TCAM code, where it is actually used. This means that the parameter is unregistered during reload_down() and then re-registered during reload_up(), avoiding the use-after-free between these two operations.

Reproducer:

 # ip netns add test123
 # devlink dev reload pci/0000:06:00.0 netns test123

[1]
BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
Read of size 4 at addr ffff888162fd37d8 by task devlink/1323
[...]
Call Trace:
 <TASK>
 dump_stack_lvl+0x95/0xbd
 print_report+0x181/0x4a1
 kasan_report+0xdb/0x200
 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x32/0x80
 devlink_nl_param_fill.constprop.0+0x29a/0x11e0
 devlink_param_notify.constprop.0+0xb9/0x250
 devlink_notify_unregister+0xbc/0x470
 devlink_reload+0x1aa/0x440
 devlink_nl_cmd_reload+0x559/0x11b0
 genl_family_rcv_msg_doit.isra.0+0x1f8/0x2e0
 genl_rcv_msg+0x558/0x7f0
 netlink_rcv_skb+0x170/0x440
 genl_rcv+0x2d/0x40
 netlink_unicast+0x53f/0x810
 netlink_sendmsg+0x961/0xe80
 __sys_sendto+0x2a4/0x420
 __x64_sys_sendto+0xe5/0x1c0
 do_syscall_64+0x38/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 7d7e9169a3ec ("devlink: move devlink reload notifications back in between _down() and _up() calls") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-09 | mlxsw: spectrum: Add an API to configure security checks | Ido Schimmel
Add an API to enable or disable security checks on a local port. It will be used by subsequent patches when the 'BR_PORT_LOCKED' flag is toggled. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29 | mlxsw: Send PTP packets as data packets to overcome a limitation | Danielle Ratson
In Spectrum-2 and Spectrum-3, the correction field of PTP packets which are sent as control packets is not updated at the egress port. To overcome this limitation, PTP packets which require a time stamp should be sent as data packets with the following details:

 1. FID valid = 1
 2. FID value above the maximum FID
 3. rx_router_port = 1

From Spectrum-4 on, this limitation will be solved. Extend the function which handles the TX header: if the packet is a PTP packet, add a TX header with type=data and all the above-mentioned requirements. Add an operation as part of 'struct mlxsw_sp_ptp_ops', to be able to separate the handling of PTP packets between different ASICs. Use the data packet solution only for Spectrum-2 and Spectrum-3. Therefore, add a dedicated operation structure for Spectrum-4, as its PTP implementation will be the same as that of Spectrum-2, just without the control packet limitation. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
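The per-ASIC decision above can be sketched as a small header builder. The three data-packet requirements come from the commit text. The simplified header struct, its field names and the `ptp_as_data` flag (set for Spectrum-2 and Spectrum-3 only) are hypothetical and do not reflect the real Tx header layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified Tx header; not the real mlxsw layout. */
enum sketch_txhdr_type { SKETCH_TXHDR_CONTROL, SKETCH_TXHDR_DATA };

struct sketch_txhdr {
	enum sketch_txhdr_type type;
	bool fid_valid;
	unsigned int fid;
	bool rx_router_port;
};

/* By default, packets from the CPU go out as control packets. A PTP
 * packet that needs a time stamp, on an ASIC with the correction-field
 * limitation, is sent as data: FID valid, FID above the maximum FID,
 * and rx_router_port set. */
static struct sketch_txhdr sketch_txhdr_build(bool needs_ptp_ts,
					      bool ptp_as_data,
					      unsigned int max_fid)
{
	struct sketch_txhdr hdr = { .type = SKETCH_TXHDR_CONTROL };

	if (needs_ptp_ts && ptp_as_data) {
		hdr.type = SKETCH_TXHDR_DATA;
		hdr.fid_valid = true;
		hdr.fid = max_fid + 1;
		hdr.rx_router_port = true;
	}
	return hdr;
}
```

Keeping `ptp_as_data` in per-ASIC ops, rather than testing the ASIC generation inline, matches the commit's choice of a dedicated operation structure for Spectrum-4.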
2022-07-04 | mlxsw: Enable unified bridge model | Amit Cohen
After all the preparations for the unified bridge model, finally flip the mlxsw driver to use the new model. Change the config profile, set 'ubridge' to true and remove the configurations that are relevant only to the legacy model. Set 'flood_mode' to 'controlled', as the current mode is not supported with the unified bridge model. Remove all the code which is dedicated to the legacy model. Remove the 'struct mlxsw_sp.ubridge' variable, which was temporarily added to separate configurations between the models. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-04 | mlxsw: Add new FID families for unified bridge model | Amit Cohen
In the unified bridge model, mlxsw will no longer emulate 802.1Q FIDs using 802.1D FIDs. The new FID table will look as follows: +---------------+ | 802.1q FIDs | 4K entries | [1..4094] | +---------------+ | 802.1d FIDs | 1K entries | [4095..5118] | +---------------+ | Dummy FIDs | 1 entry | [5119..5119] | +---------------+ | rFIDs | 11K entries | [5120..16383] | +---------------+ In order to make the change easier to review, four new temporary FID families will be added (e.g., MLXSW_SP_FID_TYPE_8021D_UB) and will not be registered with the FID core until mlxsw is flipped to use the unified bridge model. Add .1d, rfid and dummy FID families for unified bridge, the next patch will add .1q family separately as it requires more changes. The following changes are required: 1. Add 'smpe_index_valid' field to 'struct mlxsw_sp_fid_family' and set SFMR.smpe accordingly. SMPE index is reserved for rFIDs, as their flooding is handled by firmware, and always reserved in Spectrum-1, as it is configured as part of PGT table. 2. Add 'ubridge' field to 'struct mlxsw_sp_fid_family'. This field will be removed later, use it in mlxsw_sp_fid_family_{register,unregister}() to skip the registration / unregistration of the new families when the legacy model is used. 3. Indexes - the start and end indexes of each FID family will need to be changed according to the above diagram. 4. Add flood tables for unified bridge model, use 'fid_offset' as table type, as in the new model the access to flood tables will be using 'fid_offset' calculation. 5. FID family operation changes: a. rFID supposed to be created using SFMR, as it is not created by firmware using unified bridge model. b. port_vid_map() should perform SVFA for rFID, as the mapping is not created by firmware using unified bridge model. c. flood_index() is not aligned to the new model, as this function will be removed later. 
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
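The FID table layout above can be expressed as a small range table. The sketch below is a minimal illustration, assuming a hypothetical 'fid_kind' enum and helper names that do not exist in mlxsw; it only encodes the index ranges from the diagram and the 'fid_offset' idea from item 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical family tags; not the driver's enum mlxsw_sp_fid_type. */
enum fid_kind {
	FID_KIND_8021Q,
	FID_KIND_8021D,
	FID_KIND_DUMMY,
	FID_KIND_RFID,
};

struct fid_range {
	enum fid_kind kind;
	uint16_t start_index;	/* first FID index, inclusive */
	uint16_t end_index;	/* last FID index, inclusive */
};

/* Index ranges copied from the unified bridge FID table diagram. */
static const struct fid_range fid_ranges[] = {
	{ FID_KIND_8021Q, 1, 4094 },
	{ FID_KIND_8021D, 4095, 5118 },
	{ FID_KIND_DUMMY, 5119, 5119 },
	{ FID_KIND_RFID, 5120, 16383 },
};

#define NUM_FID_RANGES (sizeof(fid_ranges) / sizeof(fid_ranges[0]))

/* Return the range that owns fid_index, or NULL (e.g. for FID 0). */
static const struct fid_range *fid_range_lookup(uint16_t fid_index)
{
	for (size_t i = 0; i < NUM_FID_RANGES; i++)
		if (fid_index >= fid_ranges[i].start_index &&
		    fid_index <= fid_ranges[i].end_index)
			return &fid_ranges[i];
	return NULL;
}

/* Number of FIDs in a range (both ends inclusive). */
static size_t fid_range_size(const struct fid_range *r)
{
	return (size_t)r->end_index - r->start_index + 1;
}

/* Position of a FID inside its family, e.g. to index a flood table. */
static uint16_t fid_offset(const struct fid_range *r, uint16_t fid_index)
{
	return (uint16_t)(fid_index - r->start_index);
}
```

For example, FID 5130 falls in the rFID range at offset 10. Keeping the start and end indexes in one table makes a change like item 3 a matter of editing four rows.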
2022-07-04 | mlxsw: Add support for VLAN RIFs | Amit Cohen
Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of 'VLAN' type, whereas RIFs constructed on top of VLAN-unaware bridges are of 'FID' type. Currently, 802.1Q FIDs are emulated using 802.1D FIDs, and therefore VLAN RIFs are emulated using FID RIFs. As part of converting the driver to use the unified bridge model, 802.1Q FIDs and VLAN RIFs will be used. The egress FID is required for VLAN RIFs in Spectrum-2 and above, but not in Spectrum-1: in Spectrum-1 the mapping for VLAN RIFs is VID->FID, while in the other ASICs it is FID->FID. The reason for the change is that reusing the FID->FID entry scales better than creating multiple {Port, VID}->FID entries for the router port. Use the existing operation structure to separate the configuration between the different ASICs. Add support for VLAN RIFs; most of the configuration is the same as for FID RIFs. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
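The per-ASIC split described above can be sketched with an operations structure. This is a hypothetical illustration, not the driver's code: 'struct vlan_rif_ops', 'struct vlan_rif_cfg' and the configure functions are invented names, and the assumption that the egress FID equals the RIF's own FID is labeled in the code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a VLAN RIF programs into the device. */
struct vlan_rif_cfg {
	uint16_t vid;
	uint16_t fid;
	bool egress_fid_valid;	/* only set on Spectrum-2 and later */
	uint16_t egress_fid;
};

/* Per-ASIC operations, in the spirit of the driver's existing ops structs. */
struct vlan_rif_ops {
	void (*configure)(struct vlan_rif_cfg *cfg, uint16_t vid, uint16_t fid);
};

/* Spectrum-1: VID->FID mapping, no egress FID is programmed. */
static void sp1_vlan_rif_configure(struct vlan_rif_cfg *cfg, uint16_t vid,
				   uint16_t fid)
{
	cfg->vid = vid;
	cfg->fid = fid;
	cfg->egress_fid_valid = false;
	cfg->egress_fid = 0;
}

/* Spectrum-2 and later: FID->FID mapping, so an egress FID is required.
 * Assumption for this sketch: the egress FID is the RIF's own FID. */
static void sp2_vlan_rif_configure(struct vlan_rif_cfg *cfg, uint16_t vid,
				   uint16_t fid)
{
	cfg->vid = vid;
	cfg->fid = fid;
	cfg->egress_fid_valid = true;
	cfg->egress_fid = fid;
}

static const struct vlan_rif_ops sp1_vlan_rif_ops = {
	.configure = sp1_vlan_rif_configure,
};

static const struct vlan_rif_ops sp2_vlan_rif_ops = {
	.configure = sp2_vlan_rif_configure,
};

/* Common code stays ASIC-agnostic and dispatches through the ops. */
static void vlan_rif_setup(const struct vlan_rif_ops *ops,
			   struct vlan_rif_cfg *cfg, uint16_t vid, uint16_t fid)
{
	ops->configure(cfg, vid, fid);
}
```

The shared setup path never checks the ASIC generation itself; the ops pointer chosen at init time carries that difference.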
2022-07-04 | mlxsw: Configure ingress RIF classification | Amit Cohen
Before layer 2 forwarding, the device classifies an incoming packet to a FID. The classification is done based on one of the following keys:

1. FID
2. VNI (after decapsulation)
3. VID / {Port, VID}

After classification, the FID is known, and so are all of its attributes, such as the router interface (RIF) via which a packet that needs to be routed will ingress the router block. In the legacy model, when a RIF was created / destroyed, it was firmware's responsibility to update it in the previously mentioned FID classification records. In the unified bridge model, this responsibility moved to software. The third classification requires iterating over the FID's {Port, VID} list and issuing an SVFA write with the correct mapping table according to the port's mode (virtual or not). We never map multiple VLANs to the same FID using VID->FID mapping, so such a mapping needs to be performed only once. When a new FID classification entry is configured and the FID already has a RIF, set the RIF as part of the SVFA configuration. The reverse needs to be done when clearing a RIF from a FID. Currently, clearing is done by calling mlxsw_sp_fid_rif_set() with a NULL RIF pointer. Instead, introduce mlxsw_sp_fid_rif_unset(). Note that mlxsw_sp_fid_rif_set() is called after the RIF is fully operational, so it conforms to the internal requirement regarding SVFA.irif_v: "Must not be set for a non-enabled RIF". Do not set the ingress RIF for rFIDs, as the {Port, VID}->rFID entry is configured by firmware when the legacy model is used; a later patch will handle this configuration for rFIDs in the unified bridge model. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
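The set/unset flow above can be modeled in a few lines. This is a hypothetical, in-memory sketch: 'svfa_write()' only updates a struct instead of writing the SVFA register, and 'fid_rif_set()' / 'fid_rif_unset()' are simplified stand-ins for mlxsw_sp_fid_rif_set() / mlxsw_sp_fid_rif_unset(), not their real signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORT_VIDS 8

/* Hypothetical model of one {Port, VID}->FID classification record. */
struct svfa_entry {
	uint16_t local_port;
	uint16_t vid;
	bool virtual_port;	/* which mapping table a real write would use */
	bool irif_v;		/* ingress RIF valid */
	uint16_t irif;
};

struct fid_model {
	uint16_t fid_index;
	bool has_rif;
	uint16_t rif_index;
	size_t num_entries;
	struct svfa_entry entries[MAX_PORT_VIDS];
};

/* Stand-in for an SVFA register write; here it only updates the record. */
static void svfa_write(struct svfa_entry *e, bool irif_v, uint16_t irif)
{
	e->irif_v = irif_v;
	e->irif = irif_v ? irif : 0;
}

/* Call only once the RIF is enabled (SVFA.irif_v requirement). */
static void fid_rif_set(struct fid_model *fid, uint16_t rif_index)
{
	fid->has_rif = true;
	fid->rif_index = rif_index;
	for (size_t i = 0; i < fid->num_entries; i++)
		svfa_write(&fid->entries[i], true, rif_index);
}

/* Reverse operation: clear the ingress RIF from every record. */
static void fid_rif_unset(struct fid_model *fid)
{
	fid->has_rif = false;
	for (size_t i = 0; i < fid->num_entries; i++)
		svfa_write(&fid->entries[i], false, 0);
}

/* A new {Port, VID} record inherits the FID's RIF, if there is one. */
static bool fid_port_vid_map(struct fid_model *fid, uint16_t port,
			     uint16_t vid, bool virtual_port)
{
	struct svfa_entry *e;

	if (fid->num_entries == MAX_PORT_VIDS)
		return false;
	e = &fid->entries[fid->num_entries++];
	e->local_port = port;
	e->vid = vid;
	e->virtual_port = virtual_port;
	svfa_write(e, fid->has_rif, fid->rif_index);
	return true;
}
```

Both directions walk the same {Port, VID} list, which is why an explicit unset operation is cleaner than passing a NULL RIF to the set path.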
2022-06-29 | mlxsw: spectrum_switchdev: Rename MID structure | Amit Cohen
Currently, the structure that represents an MDB entry is called 'struct mlxsw_sp_mid'. This name is not accurate, as a MID entry stores a bitmap of ports to which a packet needs to be replicated, while an MDB entry stores the mapping from {MAC, FID} to a PGT index (MID). Rename the structure to 'struct mlxsw_sp_mdb_entry'. The structure 'mlxsw_sp_mid' is defined in spectrum.h, but the only file that uses it is spectrum_switchdev.c, so there is no reason to expose it to other files. Move the definition to spectrum_switchdev.c. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
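The distinction drawn above (an MDB entry maps {MAC, FID} to a MID; the MID entry itself holds the port bitmap) can be illustrated with a hypothetical, simplified structure. The names below are invented for this sketch, and the linear lookup is chosen for clarity; a real implementation would more likely use a hash table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key of an MDB entry: multicast MAC plus FID. */
struct mdb_entry_key {
	uint8_t addr[6];
	uint16_t fid;
};

/* Hypothetical MDB entry: maps the key to a PGT index (MID). The port
 * bitmap lives in the PGT entry at that index, not here. */
struct mdb_entry {
	struct mdb_entry_key key;
	uint16_t mid;
	bool in_use;
};

static bool mdb_key_eq(const struct mdb_entry_key *a,
		       const struct mdb_entry_key *b)
{
	return a->fid == b->fid &&
	       memcmp(a->addr, b->addr, sizeof(a->addr)) == 0;
}

/* Find the entry for {addr, fid}, or NULL if there is none. */
static const struct mdb_entry *mdb_lookup(const struct mdb_entry *tbl,
					  size_t n, const uint8_t addr[6],
					  uint16_t fid)
{
	struct mdb_entry_key key = { .fid = fid };

	memcpy(key.addr, addr, sizeof(key.addr));
	for (size_t i = 0; i < n; i++)
		if (tbl[i].in_use && mdb_key_eq(&tbl[i].key, &key))
			return &tbl[i];
	return NULL;
}
```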
2022-06-29 | mlxsw: Align PGT index to legacy bridge model | Amit Cohen
The FID code reserves about 15K entries in the PGT table for flooding. These entries are only allocated and are not used yet, because the code that uses them is currently skipped. The next patches will convert the MDB code to use the PGT APIs. Indexes for multicast are allocated after the FID code reserves the 15K entries. Currently, the legacy bridge model is used and firmware manages the PGT table, which means that the indexes allocated using the PGT API are too high for the legacy bridge model. To avoid exceeding the firmware limitation for MDB entries, add an API that returns the correct 'mid_index' based on the bridge model. For the legacy model, subtract the number of flood entries from the PGT index. Use it to write the correct MID to the SMID register. This API will also be used from the MDB code in the next patches. PGT should not be aware of the different usage by MDB and FID; this API is temporary and will be removed once the unified bridge model is used. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
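The index translation described above is a single subtraction. A minimal sketch, assuming a hypothetical helper name and passing the flood reservation size as a parameter rather than guessing the exact value behind "about 15K":

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: translate a PGT index into the MID written to SMID.
 * flood_entries is the number of PGT entries reserved by the FID code. */
static uint16_t pgt_index_to_mid(uint16_t pgt_index, uint16_t flood_entries,
				 bool ubridge)
{
	/* Unified bridge model: software owns the whole PGT, use as-is. */
	if (ubridge)
		return pgt_index;
	/* Legacy model: multicast indexes are allocated above the flood
	 * reservation, so shift them down into the firmware's MDB range. */
	return (uint16_t)(pgt_index - flood_entries);
}
```

With a reservation of 15000 entries, PGT index 15005 maps to MID 5 under the legacy model and stays 15005 under the unified model.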
2022-06-28 | mlxsw: Extend PGT APIs to support maintaining list of ports per entry | Amit Cohen
Add an API to associate a PGT entry with an SMPE index and to add or remove a port. This API will be used by the FID code and the MDB code to add/remove a port from a specific PGT entry. When the first port is added to a PGT entry, allocate the entry at the given MID index; when the last port is removed from a PGT entry, free it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
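The allocate-on-first-port, free-on-last-port behavior can be sketched as follows. All names are hypothetical, the fixed MAX_PORTS is a placeholder, and the SMID register writes a driver would issue are omitted; only the software bookkeeping is shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PORTS 64	/* placeholder; a driver queries the real maximum */

/* Hypothetical software shadow of one PGT entry. */
struct pgt_entry {
	uint16_t mid;
	uint16_t smpe;
	unsigned int num_ports;
	bool ports[MAX_PORTS];
};

struct pgt {
	struct pgt_entry **entries;	/* indexed by MID, NULL if unused */
	uint16_t size;
};

/* Add a port; the entry is allocated when its first port arrives. */
static int pgt_entry_port_add(struct pgt *pgt, uint16_t mid, uint16_t smpe,
			      uint16_t port)
{
	struct pgt_entry *e;

	if (mid >= pgt->size || port >= MAX_PORTS)
		return -1;
	e = pgt->entries[mid];
	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return -1;
		e->mid = mid;
		e->smpe = smpe;
		pgt->entries[mid] = e;
	}
	if (!e->ports[port]) {
		e->ports[port] = true;
		e->num_ports++;
	}
	return 0;
}

/* Remove a port; the entry is freed when its last port leaves. */
static void pgt_entry_port_del(struct pgt *pgt, uint16_t mid, uint16_t port)
{
	struct pgt_entry *e;

	if (mid >= pgt->size || port >= MAX_PORTS)
		return;
	e = pgt->entries[mid];
	if (!e || !e->ports[port])
		return;
	e->ports[port] = false;
	if (--e->num_ports == 0) {
		free(e);
		pgt->entries[mid] = NULL;
	}
}
```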
2022-06-28 | mlxsw: Add a dedicated structure for bitmap of ports | Amit Cohen
Currently, when a bitmap of ports is needed, the 'unsigned long *' type is used. The functions that use the bitmap infer its length from what it represents, i.e., each function that gets a bitmap of ports queries the maximum number of ports and uses it as the size. As a preparation for the next patch, which will use a bitmap of ports, add a dedicated structure for it. Refactor the existing code to use the new structure. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
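A bitmap that carries its own length removes the need for every user to re-query the port count. The sketch below is a userspace illustration with hypothetical names; it assumes the new structure pairs the bitmap pointer with a bit count, and uses hand-written bit operations instead of the kernel's bitmap helpers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap of ports that knows its own size. */
struct ports_bitmap {
	unsigned long *bitmap;
	unsigned int nbits;
};

static int ports_bitmap_init(struct ports_bitmap *pb, unsigned int max_ports)
{
	size_t longs = (max_ports + BITS_PER_ULONG - 1) / BITS_PER_ULONG;

	pb->bitmap = calloc(longs ? longs : 1, sizeof(unsigned long));
	if (!pb->bitmap)
		return -1;
	pb->nbits = max_ports;
	return 0;
}

static void ports_bitmap_fini(struct ports_bitmap *pb)
{
	free(pb->bitmap);
	pb->bitmap = NULL;
	pb->nbits = 0;
}

/* Bounds are checked against the stored size, not a re-queried maximum. */
static bool ports_bitmap_set(struct ports_bitmap *pb, unsigned int port)
{
	if (port >= pb->nbits)
		return false;
	pb->bitmap[port / BITS_PER_ULONG] |= 1UL << (port % BITS_PER_ULONG);
	return true;
}

static bool ports_bitmap_test(const struct ports_bitmap *pb, unsigned int port)
{
	if (port >= pb->nbits)
		return false;
	return pb->bitmap[port / BITS_PER_ULONG] &
	       (1UL << (port % BITS_PER_ULONG));
}
```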
2022-06-28 | mlxsw: Add an indication of SMPE index validity for PGT table | Amit Cohen
In Spectrum-1, the index into the MPE table - called switch multicast to port egress VID (SMPE) - is derived from the PGT entry, whereas in Spectrum-2 and later ASICs it is derived from the FID. Therefore, in Spectrum-1, the SMPE index needs to be programmed as part of the PGT entry via SMID register, while it is reserved for Spectrum-2 and later ASICs. Add 'pgt_smpe_index_valid' boolean as part of 'struct mlxsw_sp' and set it to true for Spectrum-1 and to false for the later ASICs. Add 'smpe_index_valid' as part of 'struct mlxsw_sp_pgt' and set it according to the value in 'struct mlxsw_sp' as part of PGT initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
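Where the SMPE index comes from determines which MPE cell supplies the egress VID. The sketch below is a hypothetical model with invented names and placeholder table sizes: the MPE table is a plain 2D array indexed by local port and SMPE index, and the 'pgt_smpe_index_valid' flag selects between the PGT entry (Spectrum-1) and the FID (later ASICs).

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 4	/* placeholder sizes for the sketch */
#define NUM_SMPE 8

/* Hypothetical MPE table: egress VID per {local port, SMPE index}. */
static uint16_t mpe[NUM_PORTS][NUM_SMPE];

struct pgt_entry_model {
	uint16_t smpe;	/* programmed via SMID on Spectrum-1 */
};

struct fid_model {
	uint16_t smpe;	/* programmed via SFMR on Spectrum-2 and later */
};

/* pgt_smpe_index_valid: true for Spectrum-1, false for later ASICs. */
static uint16_t egress_vid(bool pgt_smpe_index_valid, uint16_t local_port,
			   const struct pgt_entry_model *pgt_entry,
			   const struct fid_model *fid)
{
	uint16_t smpe = pgt_smpe_index_valid ? pgt_entry->smpe : fid->smpe;

	return mpe[local_port][smpe];
}
```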
2022-06-28 | mlxsw: Add an initial PGT table support | Amit Cohen
The PGT (Port Group Table) table maps an index to a bitmap of local ports to which a packet needs to be replicated. This table is used for layer 2 multicast and flooding. In the legacy model, software did not interact with this table directly. Instead, it was accessed by firmware in response to registers such as SFTR and SMID. In the new model, the SFTR register is deprecated and software has full control over the PGT table using the SMID register. The entire state of the PGT table needs to be maintained in software, because member ports in a PGT entry need to be reference counted to avoid releasing entries that are still in use. Add the following APIs:

1. mlxsw_sp_pgt_{init, fini}() - allocate/free the PGT table.
2. mlxsw_sp_pgt_mid_alloc_range() - allocate a range of MID indexes in PGT. To be used by the FID code during initialization to reserve specific PGT indexes for flooding entries.
3. mlxsw_sp_pgt_mid_free_range() - free indexes in a given range.
4. mlxsw_sp_pgt_mid_alloc() - allocate one MID index in the PGT without a specific range, just searching for a free index. To be used by the MDB code.
5. mlxsw_sp_pgt_mid_free() - free the given index.

Note that the alloc() functions do not allocate the entries in software; they just allocate IDs using 'idr'. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
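The ID allocation semantics of items 2 to 5 can be sketched in userspace. This is a hypothetical illustration: a plain bool array stands in for the kernel's 'idr', PGT_SIZE is a placeholder, and the function names drop the mlxsw_sp_ prefix to make clear they are not the driver's functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGT_SIZE 64	/* placeholder; the real size comes from the device */

/* A bool array stands in for the kernel's idr in this sketch. */
static bool pgt_used[PGT_SIZE];

/* Reserve [first, first + count) entirely or not at all (flood entries). */
static int pgt_mid_alloc_range(uint16_t first, uint16_t count)
{
	unsigned int end = (unsigned int)first + count;

	if (end > PGT_SIZE)
		return -1;
	for (unsigned int i = first; i < end; i++)
		if (pgt_used[i])
			return -1;
	for (unsigned int i = first; i < end; i++)
		pgt_used[i] = true;
	return 0;
}

static void pgt_mid_free_range(uint16_t first, uint16_t count)
{
	unsigned int end = (unsigned int)first + count;

	for (unsigned int i = first; i < end && i < PGT_SIZE; i++)
		pgt_used[i] = false;
}

/* Allocate any free MID (MDB entries); returns -1 when the table is full. */
static int pgt_mid_alloc(void)
{
	for (unsigned int i = 0; i < PGT_SIZE; i++) {
		if (!pgt_used[i]) {
			pgt_used[i] = true;
			return (int)i;
		}
	}
	return -1;
}

static void pgt_mid_free(uint16_t mid)
{
	if (mid < PGT_SIZE)
		pgt_used[mid] = false;
}
```

As the note above says, these calls only hand out indexes; the per-entry port state is tracked separately.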
2022-06-28 | mlxsw: spectrum: Add a temporary variable to indicate bridge model | Amit Cohen
As part of the transition to the unified bridge model, many different firmware configurations are done. Some of the configuration that needs to be done for the unified bridge model is not valid under the legacy model, and would be rejected by the firmware. At the same time, the driver cannot switch to the unified bridge model until all of the code has been converted. To allow breaking the change into patches, and to not break driver behavior during the transition, add a boolean variable to indicate the bridge model. Then, forbidden configurations will be skipped using the check "if (!mlxsw_sp->ubridge)". The new variable is temporary and will span several patch sets; it will be removed once firmware is configured to work with the unified bridge model. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-24 | mlxsw: spectrum: Rename MLXSW_SP_RIF_TYPE_VLAN | Amit Cohen
Currently, the driver emulates 802.1Q FIDs using 802.1D FIDs. As such, the RIFs configured on top of these FIDs are FID RIFs and not VLAN RIFs. As part of converting the driver to the unified bridge model, 802.1Q FIDs and VLAN RIFs will be used. As a preparation for this change, rename the emulated VLAN RIFs from 'MLXSW_SP_RIF_TYPE_VLAN' to 'MLXSW_SP_RIF_TYPE_VLAN_EMU'. After the conversion the emulated VLAN RIFs will be removed. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-24 | mlxsw: spectrum: Use different arrays of FID families per-ASIC type | Amit Cohen
Egress VID for layer 2 multicast is determined from two tables, the MPE and PGT tables. The MPE table is a two-dimensional table indexed by local port and SMPE index, which should be thought of as a FID index. In Spectrum-1 the SMPE index is derived from the PGT entry, whereas in Spectrum-2 and newer ASICs the SMPE index is a FID attribute configured via the SFMR register. The validity of the SMPE index in SFMR is influenced by two factors:

1. FID family. The SMPE index is reserved for rFIDs, as their flooding is handled by firmware.
2. ASIC generation. The SMPE index is always reserved for Spectrum-1.

As such, the validity of the SMPE index should be an attribute of the FID family, which requires different arrays of FID families per ASIC type. As a preparation for SMPE index configuration, create separate arrays of FID families for different ASICs. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
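Making SMPE validity a per-family attribute, with one family array per ASIC, could look roughly like this. The descriptor, the family list and the accessor are hypothetical and trimmed down; only the two rules from the list above (reserved for rFIDs, always reserved on Spectrum-1) are encoded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down FID family descriptor. */
struct fid_family_desc {
	const char *name;
	bool smpe_index_valid;	/* may SFMR carry an SMPE index? */
};

/* Spectrum-1: the SMPE index is always reserved in SFMR. */
static const struct fid_family_desc sp1_fid_families[] = {
	{ "802.1q", false },
	{ "802.1d", false },
	{ "rfid", false },
};

/* Spectrum-2 and later: reserved only for rFIDs (flooded by firmware). */
static const struct fid_family_desc sp2_fid_families[] = {
	{ "802.1q", true },
	{ "802.1d", true },
	{ "rfid", false },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Pick the family array for the ASIC generation at init time. */
static const struct fid_family_desc *fid_families_get(bool is_spectrum1,
						      size_t *count)
{
	if (is_spectrum1) {
		*count = ARRAY_LEN(sp1_fid_families);
		return sp1_fid_families;
	}
	*count = ARRAY_LEN(sp2_fid_families);
	return sp2_fid_families;
}
```

Encoding the flag in static per-ASIC arrays keeps the two factors out of the runtime code path: SFMR setup only reads the family's field.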
2022-06-22 | mlxsw: Remove lag_vid_valid indication | Amit Cohen
Currently, 'struct mlxsw_sp_fid_family' has a field that indicates whether 'lag_vid' is valid for use in the SFD register. This is a leftover from using .1Q FIDs instead of emulating them using .1D FIDs. Now that .1Q FIDs are emulated using .1D FIDs, this field is true for both families, so there is no reason to maintain it. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-17 | mlxsw: Add a resource describing number of RIFs | Petr Machata
The Spectrum ASIC has a limit on how many L3 devices (called RIFs) can be created. The limit depends on the ASIC and FW revision, and mlxsw reads it from the FW. In order to communicate both the number of RIFs that there can be, and how many are taken now (i.e. occupancy), introduce a corresponding devlink resource. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 | mlxsw: spectrum: Move handling of tunnel events to router code | Petr Machata
The events related to IPIP tunnels are handled by the router code. Move the handling from the central dispatcher in spectrum.c to the new notifier handler in the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 | mlxsw: spectrum: Move handling of router events to router code | Petr Machata
The events NETDEV_PRE_CHANGEADDR, NETDEV_CHANGEADDR and NETDEV_CHANGEMTU have implications for in-ASIC router interface objects, and as such are handled in the router module. Move the handling from the central dispatcher in spectrum.c to the new notifier handler in the router module. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-08 | mlxsw: spectrum: Move handling of VRF events to router code | Petr Machata
Events involving VRF, being an L3 concern, are handled in the router code by the helper mlxsw_sp_netdevice_vrf_event(). The handler is currently invoked from the centralized dispatcher in spectrum.c. Instead, move the call to the newly introduced router-specific notifier handler. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18 | mlxsw: spectrum: Add port to linecard mapping | Jiri Pirko
For each port, get the slot_index using the PMLP register. For ports residing on a linecard, associate the port with the linecard by setting the mapping using the devlink_port_linecard_set() helper. Use the linecard slot index for PMTDB register queries. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>