summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)Author
2023-10-03ice: make use of DEFINE_FLEX() in ice_ddp.cPrzemek Kitszel
Use DEFINE_FLEX() macro for constant-num-of-elems (4) flex array members of ice_ddp.c Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230912115937.1645707-5-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03ice: drop two params of ice_aq_move_sched_elems()Przemek Kitszel
Remove two arguments of ice_aq_move_sched_elems(). Last of them was always NULL, and @grps_req was always 1. Assuming @grps_req to be one, allows us to use DEFINE_FLEX() macro, what removes some need for heap allocations. Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230912115937.1645707-4-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03ice: ice_sched_remove_elems: replace 1 elem array param by u32Przemek Kitszel
Replace array+size params of ice_sched_remove_elems:() by just single u32, as all callers are using it with "1". This enables moving from heap-based, to stack-based allocation, what is also more elegant thanks to DEFINE_FLEX() macro. Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230912115937.1645707-3-przemyslaw.kitszel@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03net: Tree wide: Replace xdp_do_flush_map() with xdp_do_flush().Sebastian Andrzej Siewior
xdp_do_flush_map() is deprecated and new code should use xdp_do_flush() instead. Replace xdp_do_flush_map() with xdp_do_flush(). Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Clark Wang <xiaoning.wang@nxp.com> Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: David Arinzon <darinzon@amazon.com> Cc: Edward Cree <ecree.xilinx@gmail.com> Cc: Felix Fietkau <nbd@nbd.name> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Jassi Brar <jaswinder.singh@linaro.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: John Crispin <john@phrozen.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Bianconi <lorenzo@kernel.org> Cc: Louis Peens <louis.peens@corigine.com> Cc: Marcin Wojtas <mw@semihalf.com> Cc: Mark Lee <Mark-MC.Lee@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: Noam Dagan <ndagan@amazon.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Saeed Bishara <saeedb@amazon.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Shay Agroskin <shayagr@amazon.com> Cc: Shenwei Wang <shenwei.wang@nxp.com> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: Vladimir Oltean <vladimir.oltean@nxp.com> Cc: Wei Fang <wei.fang@nxp.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Arthur Kiyanovski <akiyano@amazon.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230908143215.869913-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-30Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: add PTP auxiliary bus support Michal Michalik says: Auxiliary bus allows exchanging information between PFs, which allows both fixing problems and simplifying new features implementation. The auxiliary bus is enabled for all devices supported by ice driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-28ice: always add legacy 32byte RXDID in supported_rxdidsMichal Schmidt
When the PF and VF drivers both support flexible rx descriptors and have negotiated the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC capability, the VF driver queries the PF for the list of supported descriptor formats (VIRTCHNL_OP_GET_SUPPORTED_RXDIDS). The PF driver is supposed to set the supported_rxdids bits that correspond to the descriptor formats the firmware implements. The legacy 32-byte rx desc format is always supported, even though it is not expressed in GLFLXP_RXDID_FLAGS. The ice driver does not advertise the legacy 32-byte rx desc support, which leads to this failure to bring up the VF using the Intel out-of-tree iavf driver: iavf 0000:41:01.0: PF does not list support for default Rx descriptor format ... iavf 0000:41:01.0: PF returned error -5 (VIRTCHNL_STATUS_ERR_PARAM) to our request 6 The in-tree iavf driver does not expose this bug, because it does not yet implement VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. The ice driver must always set the ICE_RXDID_LEGACY_1 bit in supported_rxdids. The Intel out-of-tree ice driver and the ice driver in DPDK both do this. I copied this piece of the code and the comment text from the Intel out-of-tree driver. Fixes: e753df8fbca5 ("ice: Add support Flex RXD") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20230920115439.61172-1-mschmidt@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-28ethernet/intel: Use list_for_each_entry() helperJinjie Ruan
Convert list_for_each() to list_for_each_entry() where applicable. No functional changed. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230919170409.1581074-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21igc: Expose tx-usecs coalesce setting to userMuhammad Husaini Zulkifli
When users attempt to obtain the coalesce setting using the ethtool command, current code always returns 0 for tx-usecs. This is because I225/6 always uses a queue pair setting, hence tx_coalesce_usecs does not return a value during the igc_ethtool_get_coalesce() callback process. The pair queue condition checking in igc_ethtool_get_coalesce() is removed by this patch so that the user gets information of the value of tx-usecs. Even if i225/6 is using queue pair setting, there is no harm in notifying the user of the tx-usecs. The implementation of the current code may have previously been a copy of the legacy code i210. Since I225 has the queue pair setting enabled, tx-usecs will always adhere to the user-set rx-usecs value. An error message will appear when the user attempts to set the tx-usecs value for the input parameters because, by default, they should only set the rx-usecs value. This patch also adds the helper function to get the previous rx coalesce value similar to tx coalesce. How to test: User can get the coalesce value using ethtool command. Example command: Get: ethtool -c <interface> Previous output: rx-usecs: 3 rx-frames: n/a rx-usecs-irq: n/a rx-frames-irq: n/a tx-usecs: 0 tx-frames: n/a tx-usecs-irq: n/a tx-frames-irq: n/a New output: rx-usecs: 3 rx-frames: n/a rx-usecs-irq: n/a rx-frames-irq: n/a tx-usecs: 3 tx-frames: n/a tx-usecs-irq: n/a tx-frames-irq: n/a Fixes: 8c5ad0dae93c ("igc: Add ethtool support") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230919170331.1581031-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-20ice: Remove the FW shared parametersMichal Michalik
The only feature using the Firmware (FW) shared parameters was the PTP clock ID. Since this ID is now shared using auxiliary buss - remove the FW shared parameters from the code. Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-20ice: PTP: add clock domain number to auxiliary interfaceMichal Michalik
The PHC clock id used to be moved between PFs using FW admin queue shared parameters - move the implementation to auxiliary bus. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-20ice: Use PTP auxbus for all PHYs restart in E822Michal Michalik
The E822 (and other devices based on the same PHY) is having issue while setting the PHC timer - the PHY timers are drifting from the PHC. After such a set all PHYs need to be restarted and resynchronised - do it using auxiliary bus. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-20ice: Auxbus devices & driver for E822 TSMichal Michalik
There is a problem in HW in E822-based devices leading to race condition. It might happen that, in order: - PF0 (which owns the PHC) requests few timestamps, - PF1 requests a timestamp, - interrupt is being triggered and both PF0 and PF1 threads are woken up, - PF0 got one timestamp, still waiting for others so not going to sleep, - PF1 gets it's timestamp, process it and go to sleep, - PF1 requests a timestamp again, - just before PF0 goes to sleep timestamp of PF1 appear, - PF0 finishes all it's timestamps and go to sleep (PF1 also sleeping). That leaves PF1 timestamp memory not read, which lead to blocking the next interrupt from arriving. Fix it by adding auxiliary devices and only one driver to handle all the timestamps for all PF's by PHC owner. In the past each PF requested it's own timestamps and process it from the start till the end which causes problem described above. Currently each PF requests the timestamps as before, but the actual reading of the completed timestamps is being done by the PTP auxiliary driver, which is registered by the PF which owns PHC. Additionally, the newly introduced auxiliary driver/devices for PTP clock owner will be used for other features in all products (including E810). Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: check netlist before enabling ICE_F_GNSSJacob Keller
Similar to the change made for ICE_F_SMA_CTRL, check the netlist before enabling support for ICE_F_GNSS. This ensures that the driver only enables the GNSS feature on devices which actually have the feature enabled in the firmware device configuration. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: introduce ice_pf_src_tmr_ownedJacob Keller
Add ice_pf_src_tmr_owned() macro to check the function capability bit indicating if the current function owns the PTP hardware clock. This is slightly shorter than the more verbose access via hw.func_caps.ts_func_info.src_tmr_owned. Use this where possible rather than open coding its equivalent. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: fix pin assignment for E810-T without SMA controlJacob Keller
Since commit 43c4958a3ddb ("ice: Merge pin initialization of E810 and E810T adapters"), the ice_ptp_setup_pins_e810() function has been used for both E810 and E810-T devices. The new implementation only distinguishes between whether the device has SMA control or not. It was assumed this is always true for E810-T devices. In addition, it does not set the n_per_out value appropriately when SMA control is enabled. In some cases, the E810-T device may not have access to SMA control. In that case, the E810-T device actually has access to fewer pins than a standard E810 device. Fix the implementation to correctly assign the appropriate pin counts for E810-T devices both with and without SMA control. The mentioned commit already includes the appropriate macro values for these pin counts but they were unused. Instead of assigning the default E810 values and then overwriting them, handle the cases separately in order of E810-T with SMA, E810-T without SMA, and then standard E810. This flow makes following the logic easier. Fixes: 43c4958a3ddb ("ice: Merge pin initialization of E810 and E810T adapters") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: remove ICE_F_PTP_EXTTS feature flagJacob Keller
The ICE_F_PTP_EXTTS feature flag is ostensibly intended to support checking whether the device supports external timestamp pins. It is only checked in E810-specific code flows, and is enabled for all E810-based devices. E822 and E823 flows unconditionally enable external timestamp support. This makes the feature flag meaningless, as it is always enabled. Just unconditionally enable support for external timestamp pins and remove this unnecessary flag. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: PTP: move quad value check inside ice_fill_phy_msg_e822Karol Kolacinski
The callers of ice_fill_phy_msg_e822 check for whether the quad number is within the expected range. Move this check inside the ice_fill_phy_msg_e822 function instead of duplicating it twice. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: PTP: Rename macros used for PHY/QUAD port definitionsKarol Kolacinski
The ice_fill_phy_msg_e822 function uses several macros to specify the correct address when sending a sideband message to the PHY block in hardware. The names of these macros are fairly generic and confusing. Future development is going to extend the driver to support new hardware families which have different relationships between PHY and QUAD. Rename the macros for clarity and to indicate that they are E822 specific. This also matches closer to the hardware specification in the data sheet. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: PTP: Clean up timestamp registers correctlyKarol Kolacinski
E822 PHY TS registers should not be written and the only way to clean up them is to reset QUAD memory. To ensure that the status bit for the timestamp index is cleared, ensure that ice_clear_phy_tstamp implementations first read the timestamp out. Implementations which can write the register continue to do so. Add a note to indicate this function should only be called on timestamps which have their valid bit set. Update the dynamic debug messages to reflect the actual action taken. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: introduce hw->phy_model for handling PTP PHY differencesJacob Keller
The ice driver has PTP support which works across a couple of different device families. The device families each have different PHY hardware which have unique requirements for programming. Today, there is E810-based hardware, and E822-based hardware. To handle this, the driver checks the ice_is_e810() function to separate between the two existing families of hardware. Future development is going to add new hardware designs which have further unique requirements. To make this easier, introduce a phy_model field to the HW structure. This field represents what PHY model the current device has, and is used to allow distinguishing which logic a particular device needs. This will make supporting future upcoming hardware easier, by providing an obvious place to initialize the PHY model, and by already using switch/case statements instead of the previous if statements. Astute reviewers may notice that there are a handful of remaining checks for ice_is_e810() left in ice_ptp.c These conflict with some other cleanup patches in development, and will be fixed in the near future. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: Support cross-timestamping for E823 devicesJacob Keller
The E822 hardware has cross timestamping support using a device feature termed "Hammock Harbor" by the data sheet. This device feature is similar to PCIe PTM, and captures the Always Running Timer (ART) simultaneously with the PTP hardware clock time. This functionality also exists on E823 devices, but is not currently enabled. Rename the cross-timestamp functions to use the _e82x postfix, indicating that the support works across the E82x family of devices and not just the E822 hardware. The flow for capturing a cross-timestamp requires an additional step on E823 devices. The GLTSYN_CMD register must be programmed with the READ_TIME command. Otherwise, the cross timestamp will always report a value of zero for the PTP hardware clock time. To fix this, call ice_ptp_src_cmd() prior to initiating the cross timestamp logic. Once the cross timestamp has completed, call ice_ptp_src_cmd() with ICE_PTP_OP to ensure that the timer command registers are cleared. Co-developed-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: retry acquiring hardware semaphore during cross-timestamp requestKarol Kolacinski
The hardware for performing a cross-timestamp on E822 uses a hardware semaphore which we must acquire before initiating the cross-timestamp operation. The current implementation only attempts to acquire the semaphore once, and assumes that it will succeed. If the semaphore is busy for any reason, the cross-timestamp operation fails with -EFAULT. Instead of immediately failing, try the acquire the lock a few times with a small sleep between attempts. This ensures that most requests will go through without issue. Additionally, return -EBUSY instead of -EFAULT if the operation can't continue due to the semaphore being busy. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-18ice: prefix clock timer command enumeration values with ICE_PTPSergey Temerkhanov
The ice driver has an enumeration for the various commands that can be programmed to the MAC and PHY for setting up hardware clock operations. Prefix these with ICE_PTP so that they are clearly namespaced to the ice driver. Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-17Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-David S. Miller
queue Tony Nguyen says: ==================== This series contains updates to iavf and i40e drivers. Radoslaw prevents admin queue operations being added when the driver is being removed for iavf. Petr Oros immediately starts reconfiguration on changes to VLANs on iavf. Ivan Vecera moves reset of VF to occur after port VLAN values are set on i40e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17ice: implement dpll interface to control cguArkadiusz Kubalewski
Control over clock generation unit is required for further development of Synchronous Ethernet feature. Interface provides ability to obtain current state of a dpll, its sources and outputs which are pins, and allows their configuration. Co-developed-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Co-developed-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17ice: add admin commands to access cgu configurationArkadiusz Kubalewski
Add firmware admin command to access clock generation unit configuration, it is required to enable Extended PTP and SyncE features in the driver. Add definitions of possible hardware variations of input and output pins related to clock generation unit and functions to access the data. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nextDavid S. Miller
-queue Tony Nguyen says: ==================== Support rx-fcs on/off for VFs Ahmed Zaki says: Allow the user to turn on/off the CRC/FCS stripping through ethtool. We first add the CRC offload capability in the virtchannel, then the feature is enabled in ice and iavf drivers. We make sure that the netdev features are fixed such that CRC stripping cannot be disabled if VLAN rx offload (VLAN strip) is enabled. Also, VLAN stripping cannot be enabled unless CRC stripping is ON. Testing was done using tcpdump to make sure that the CRC is included in the frame after: # ethtool -K <interface> rx-fcs on and is not included when it is back "off". Also, ethtool should return an error for the above command if "rx-vlan-offload" is already on and at least one VLAN interface/filter exists on the VF. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16igc: Fix infinite initialization loop with early XDP redirectVinicius Costa Gomes
When an XDP redirect happens before the link is ready, that transmission will not finish and will timeout, causing an adapter reset. If the redirects do not stop, the adapter will not stop resetting. Wait for the driver to signal that there's a carrier before allowing transmissions to proceed. Previous code was relying that when __IGC_DOWN is cleared, the NIC is ready to transmit as all the queues are ready, what happens is that the carrier presence will only be signaled later, after the watchdog workqueue has a chance to run. And during this interval (between clearing __IGC_DOWN and the watchdog running) if any transmission happens the timeout is emitted (detected by igc_tx_timeout()) which causes the reset, with the potential for the infinite loop. Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action") Reported-by: Ferenc Fejes <ferenc.fejes@ericsson.com> Closes: https://lore.kernel.org/netdev/0caf33cf6adb3a5bf137eeaa20e89b167c9986d5.camel@ericsson.com/ Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Ferenc Fejes <ferenc.fejes@ericsson.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-16Merge branch '200GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Introduce Intel IDPF driver Pavan Kumar Linga says: This patch series introduces the Intel Infrastructure Data Path Function (IDPF) driver. It is used for both physical and virtual functions. Except for some of the device operations the rest of the functionality is the same for both PF and VF. IDPF uses virtchnl version2 opcodes and structures defined in the virtchnl2 header file which helps the driver to learn the capabilities and register offsets from the device Control Plane (CP) instead of assuming the default values. The format of the series follows the driver init flow to interface open. To start with, probe gets called and kicks off the driver initialization by spawning the 'vc_event_task' work queue which in turn calls the 'hard reset' function. As part of that, the mailbox is initialized which is used to send/receive the virtchnl messages to/from the CP. Once that is done, 'core init' kicks in which requests all the required global resources from the CP and spawns the 'init_task' work queue to create the vports. Based on the capability information received, the driver creates the said number of vports (one or many) where each vport is associated to a netdev. Also, each vport has its own resources such as queues, vectors etc. From there, rest of the netdev_ops and data path are added. IDPF implements both single queue which is traditional queueing model as well as split queue model. In split queue model, it uses separate queue for both completion descriptors and buffers which helps to implement out-of-order completions. It also helps to implement asymmetric queues, for example multiple RX completion queues can be processed by a single RX buffer queue and multiple TX buffer queues can be processed by a single TX completion queue. In single queue model, same queue is used for both descriptor completions as well as buffer completions. It also supports features such as generic checksum offload, generic receive offload (hardware GRO) etc. --- v7: Patch 2: * removed pci_[disable|enable]_pcie_error_reporting as they are dropped from the core Patch 4, 9: * used 'kasprintf' instead of 'snprintf' to avoid providing explicit character string size which also fixes "-Wformat-truncation" warnings Patch 14: * used 'ethtool_sprintf' instead of 'snprintf' to avoid providing explicit character string size which also fixes "-Wformat-truncation" warning * add string format argument to the 'ethtool_sprintf' to avoid warning on "-Wformat-security" v6: https://lore.kernel.org/netdev/20230825235954.894050-1-pavan.kumar.linga@intel.com/ Note: 'Acked-by' was only added to patches 1, 2, 12 and not to the other patches because of the changes in v6 Patch 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15: * renamed 'reset_lock' to 'vport_ctrl_lock' to reflect the lock usage * to avoid defensive programming, used 'vport_ctrl_lock' for the user callbacks that access the 'vport' to prevent the hardware reset thread from releasing the 'vport', when the user callback is in progress * added some variables to netdev private structure to avoid vport access if possible from ethtool and ndo callbacks * moved 'mac_filter_list_lock' and MAC related flags to vport_config structure and refactored mac filter flow to handle asynchronous ndo mac filter callbacks * stop the queues before starting the reset flow to avoid TX hangs * removed 'sw_mutex' and 'stop_mutex' as they are not needed anymore * added missing clear bit in 'init_task' error path * renamed labels appropriately Patch 8: * replaced page_pool_put_page with page_pool_put_full_page * for the page pool max_len, used PAGE_SIZE Patch 10, 11, 13: * made use of the 'netif_txq_maybe_stop', '__netif_txq_completed_wake' helper macros Patch 13: * removed IDPF_HR_RESET_IN_PROG flag check in idpf_tx_singleq_start as it is defensive Patch 14: * removed max descriptor check as the core does that * removed unnecessary error messages * removed the stats that are common between the ones reported by ethtool and ip link * replaced snprintf with ethtool_sprintf * added a comment to explain the reason for the max queue check * as the netdev queues are set on alloc, there is no need to set them again on reset unless there is a queue change, so move the 'idpf_set_real_num_queues' to 'idpf_initiate_soft_reset' Patch 15: * reworded the 'configure SRIOV' in the commit message v5: https://lore.kernel.org/netdev/20230816004305.216136-1-anthony.l.nguyen@intel.com/ Most Patches: * wrapped line limit to 80 chars to those which don't effect readability Patch 12: * in skb_add_rx_frag, offset 'headlen' w.r.t page_offset when adding a frag to avoid adding the header again Patch 14: * added NULL check for 'rxq' when dereferencing it in page_pool_get_stats v4: https://lore.kernel.org/netdev/20230808003416.3805142-1-anthony.l.nguyen@intel.com/ Patch 1: * s/virtcnl/virtchnl * removed the kernel doc for the error code definitions that don't exist * reworded the summary part in the virtchnl2 header Patch 3: * don't set local variable to NULL on error * renamed sq_send_command_out label with err_unlock * don't use __GFP_ZERO in dma_alloc_coherent Patch 4: * introduced mailbox workqueue to process mailbox interrupts Patch 3, 4, 5, 6, 7, 8, 9, 11, 15: * removed unnecessary variable 0-init Patch 3, 5, 7, 8, 9, 15: * removed defensive programming checks wherever applicable * removed IDPF_CAP_FIELD_LAST as it can be treated as defensive programming Patch 3, 4, 5, 6, 7: * replaced IDPF_DFLT_MBX_BUF_SIZE with IDPF_CTLQ_MAX_BUF_LEN Patch 2 to 15: * add kernel-doc for idpf.h and idpf_txrx.h enums and structures Patch 4, 5, 15: * adjusted the destroy sequence of the workqueues as per the alloc sequence Patch 4, 5, 9, 15: * scrub unnecessary flags in 'idpf_flags' - IDPF_REMOVE_IN_PROG flag can take care of the cases where IDPF_REL_RES_IN_PROG is used, removed the later one - IDPF_REQ_[TX|RX]_SPLITQ are replaced with struct variables - IDPF_CANCEL_[SERVICE|STATS]_TASK are redundant as the work queue doesn't get rescheduled again after 'cancel_delayed_work_sync' - IDPF_HR_CORE_RESET is removed as there is no set_bit for this flag - IDPF_MB_INTR_TRIGGER is removed as it is not needed anymore with the mailbox workqueue implementation Patch 7 to 15: * replaced the custom buffer recycling code with page pool API * switched the header split buffer allocations from using a bunch of pages to using one large chunk of DMA memory * reordered some of the flows in vport_open to support page pool Patch 8, 12: * don't suppress the alloc errors by using __GFP_NOWARN Patch 9: * removed dyn_ctl_clrpba_m as it is not being used Patch 14: * introduced enum idpf_vport_reset_cause instead of using vport flags * introduced page pool stats v3: https://lore.kernel.org/netdev/20230616231341.2885622-1-anthony.l.nguyen@intel.com/ Patch 5: * instead of void, used 'struct virtchnl2_create_vport' type for vport_params_recvd and vport_params_reqd and removed the typecasting * used u16/u32 as needed instead of int for variables which cannot be negative and updated in all the places whereever applicable Patch 6: * changed the commit message to "add ptypes and MAC filter support" * used the sender Signed-off-by as the last tag on all the patches * removed unnecessary variables 0-init * instead of fixing the code in this commit, fixed it in the commit where the change was introduced first * moved get_type_info struct on to the stack instead of memory alloc * moved mutex_lock and ptype_info memory alloc outside while loop and adjusted the return flow * used 'break' instead of 'continue' in ptype id switch case v2: https://lore.kernel.org/netdev/20230614171428.1504179-1-anthony.l.nguyen@intel.com/ Patch 2: * added "Intel(R)" to the DRV_SUMMARY and Makefile. Patch 4, 5, 6, 15: * replaced IDPF_VC_MSG_PENDING flag with mutex 'vc_buf_lock' for the adapter related virtchnl opcodes. * get the mutex lock in the virtchnl send thread itself instead of in receive thread. Patch 5, 6, 7, 8, 9, 11, 14, 15: * replaced IDPF_VPORT_VC_MSG_PENDING flag with mutex 'vc_buf_lock' for the vport related virtchnl opcodes. * get the mutex lock in the virtchnl send thread itself instead of in receive thread. Patch 6: * converted get_ptype_info logic from 1:N to 1:1 message exchange for better handling of mutex lock. Patch 15: * introduced 'stats_lock' spinlock to avoid concurrent stats update. v1: https://lore.kernel.org/netdev/20230530234501.2680230-1-anthony.l.nguyen@intel.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15i40e: Fix VF VLAN offloading when port VLAN is configuredIvan Vecera
If port VLAN is configured on a VF then any other VLANs on top of this VF are broken. During i40e_ndo_set_vf_port_vlan() call the i40e driver reset the VF and iavf driver asks PF (using VIRTCHNL_OP_GET_VF_RESOURCES) for VF capabilities but this reset occurs too early, prior setting of vf->info.pvid field and because this field can be zero during i40e_vc_get_vf_resources_msg() then VIRTCHNL_VF_OFFLOAD_VLAN capability is reported to iavf driver. This is wrong because iavf driver should not report VLAN offloading capability when port VLAN is configured as i40e does not support QinQ offloading. Fix the issue by moving VF reset after setting of vf->port_vlan_id field. Without this patch: $ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs $ ip link set enp2s0f0 vf 0 vlan 3 $ ip link set enp2s0f0v0 up $ ip link add link enp2s0f0v0 name vlan4 type vlan id 4 $ ip link set vlan4 up ... $ ethtool -k enp2s0f0v0 | grep vlan-offload rx-vlan-offload: on tx-vlan-offload: on $ dmesg -l err | grep iavf [1292500.742914] iavf 0000:02:02.0: Failed to add VLAN filter, error IAVF_ERR_INVALID_QP_ID With this patch: $ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs $ ip link set enp2s0f0 vf 0 vlan 3 $ ip link set enp2s0f0v0 up $ ip link add link enp2s0f0v0 name vlan4 type vlan id 4 $ ip link set vlan4 up ... $ ethtool -k enp2s0f0v0 | grep vlan-offload rx-vlan-offload: off [requested on] tx-vlan-offload: off [requested on] $ dmesg -l err | grep iavf Fixes: f9b4b6278d51 ("i40e: Reset the VF upon conflicting VLAN configuration") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-15iavf: schedule a request immediately after add/delete vlanPetr Oros
When the iavf driver wants to reconfigure the VLAN filters (iavf_add_vlan, iavf_del_vlan), it sets a flag in aq_required: adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER; or: adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER; This is later processed by the watchdog_task, but it runs periodically every 2 seconds, so it can be a long time before it processes the request. In the worst case, the interface is unable to receive traffic for more than 2 seconds for no objective reason. Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Petr Oros <poros@redhat.com> Co-developed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-15iavf: add iavf_schedule_aq_request() helperPetr Oros
Add helper for set iavf aq request AVF_FLAG_AQ_* and immediately schedule watchdog_task. Helper will be used in cases where it is necessary to run aq requests asap Signed-off-by: Petr Oros <poros@redhat.com> Co-developed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Co-developed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-15iavf: do not process adminq tasks when __IAVF_IN_REMOVE_TASK is setRadoslaw Tyl
Prevent schedule operations for adminq during device remove and when __IAVF_IN_REMOVE_TASK flag is set. Currently, the iavf_down function adds operations for adminq that shouldn't be processed when the device is in the __IAVF_REMOVE state. Reproduction: echo 4 > /sys/bus/pci/devices/0000:17:00.0/sriov_numvfs ip link set dev ens1f0 vf 0 trust on ip link set dev ens1f0 vf 1 trust on ip link set dev ens1f0 vf 2 trust on ip link set dev ens1f0 vf 3 trust on ip link set dev ens1f0 vf 0 mac 00:22:33:44:55:66 ip link set dev ens1f0 vf 1 mac 00:22:33:44:55:67 ip link set dev ens1f0 vf 2 mac 00:22:33:44:55:68 ip link set dev ens1f0 vf 3 mac 00:22:33:44:55:69 echo 0000:17:02.0 > /sys/bus/pci/devices/0000\:17\:02.0/driver/unbind echo 0000:17:02.1 > /sys/bus/pci/devices/0000\:17\:02.1/driver/unbind echo 0000:17:02.2 > /sys/bus/pci/devices/0000\:17\:02.2/driver/unbind echo 0000:17:02.3 > /sys/bus/pci/devices/0000\:17\:02.3/driver/unbind sleep 10 echo 0000:17:02.0 > /sys/bus/pci/drivers/iavf/bind echo 0000:17:02.1 > /sys/bus/pci/drivers/iavf/bind echo 0000:17:02.2 > /sys/bus/pci/drivers/iavf/bind echo 0000:17:02.3 > /sys/bus/pci/drivers/iavf/bind modprobe vfio-pci echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id qemu-system-x86_64 -accel kvm -m 4096 -cpu host \ -drive file=centos9.qcow2,if=none,id=virtio-disk0 \ -device virtio-blk-pci,drive=virtio-disk0,bootindex=0 -smp 4 \ -device vfio-pci,host=17:02.0 -net none \ -device vfio-pci,host=17:02.1 -net none \ -device vfio-pci,host=17:02.2 -net none \ -device vfio-pci,host=17:02.3 -net none \ -daemonize -vnc :5 Current result: There is a probability that the mac of VF in guest is inconsistent with it in host Expected result: When passthrough NIC VF to guest, the VF in guest should always get the same mac as it in host. Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-13idpf: add SRIOV support and other ndo_opsJoshua Hay
Add support for SRIOV: send the requested number of VFs to the device Control Plane, via the virtchnl message and then enable the VFs using 'pci_enable_sriov'. Add other ndo ops supported by the driver such as features_check, set_rx_mode, validate_addr, set_mac_address, change_mtu, get_stats64, set_features, and tx_timeout. Initialize the statistics task which requests the queue related statistics to the CP. Add loopback and promiscuous mode support and the respective virtchnl messages. Finally, add documentation and build support for the driver. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add ethtool callbacksAlan Brady
Initialize all the ethtool ops that are supported by the driver and add the necessary support for the ethtool callbacks. Also add asynchronous link notification virtchnl support where the device Control Plane sends the link status and link speed as an asynchronous event message. Driver report the link speed on ethtool .idpf_get_link_ksettings query. Introduce soft reset function which is used by some of the ethtool callbacks such as .set_channels, .set_ringparam etc. to change the existing queue configuration. It deletes the existing queues by sending delete queues virtchnl message to the CP and calls the 'vport_stop' flow which disables the queues, vport etc. New set of queues are requested to the CP and reconfigure the queue context by calling the 'vport_open' flow. Soft reset flow also adjusts the number of vectors associated to a vport if .set_channels is called. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add singleq start_xmit and napi pollJoshua Hay
Add the start_xmit, TX and RX napi poll support for the single queue model. Unlike split queue model, single queue uses same queue to post buffer descriptors and completed descriptors. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add RX splitq napi poll supportAlan Brady
Add support to handle interrupts for the RX completion queue and RX buffer queue. When the interrupt fires on RX completion queue, process the RX descriptors that are received. Allocate and prepare the SKB with the RX packet info, for both data and header buffer. IDPF uses software maintained refill queues to manage buffers between RX queue producer and the buffer queue consumer. They are required in order to maintain a lockless buffer management system and are strictly software only constructs. Instead of updating the RX buffer queue tail with available buffers right after the clean routine, it posts the buffer ids to the refill queues, only to post them to the HW later. If the generic receive offload (GRO) is enabled in the capabilities and turned on by default or via ethtool, then HW performs the packet coalescing if certain criteria are met by the incoming packets and updates the RX descriptor. Similar to GRO, if generic checksum is enabled, HW computes the checksum and updates the respective fields in the descriptor. Add support to update the SKB fields with the GRO and the generic checksum received. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add TX splitq napi poll supportJoshua Hay
Add support to handle the interrupts for the TX completion queue and process the various completion types. In the flow scheduling mode, the driver processes primarily buffer completions as well as descriptor completions occasionally. This mode supports out of order TX completions. To do so, HW generates one buffer completion per packet. Each of those completions contains the unique tag provided during the TX encoding which is used to locate the packet either on the TX buffer ring or in a hash table. The hash table is used to track TX buffer information so the descriptor(s) for a given packet can be reused while the driver is still waiting on the buffer completion(s). Packets end up in the hash table in one of 2 ways: 1) a packet was stashed during descriptor completion cleaning, or 2) because an out of order buffer completion was processed. A descriptor completion arrives only every so often and is primarily used to guarantee the TX descriptor ring can be reused without having to wait on the individual buffer completions. E.g. a descriptor completion for N+16 guarantees HW read all of the descriptors for packets N through N+15, therefore all of the buffers for packets N through N+15 are stashed into the hash table and the descriptors can be reused for more TX packets. Similarly, a packet can be stashed in the hash table because an out an order buffer completion was processed. E.g. processing a buffer completion for packet N+3 implies that HW read all of the descriptors for packets N through N+3 and they can be reused. However, the HW did not do the DMA yet. The buffers for packets N through N+2 cannot be freed, so they are stashed in the hash table. In either case, the buffer completions will eventually be processed for all of the stashed packets, and all of the buffers will be cleaned from the hash table. In queue based scheduling mode, the driver processes primarily descriptor completions and cleans the TX ring the conventional way. Finally, the driver triggers a TX queue drain after sending the disable queues virtchnl message. When the HW completes the queue draining, it sends the driver a queue marker packet completion. The driver determines when all TX queues have been drained and proceeds with the disable flow. With this, the driver can send TX packets and clean up the resources properly. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add splitq start_xmitJoshua Hay
Add start_xmit support for split queue model. To start with, add the necessary checks to linearize the skb if it uses more number of buffers than the hardware supported limit. Stop the transmit queue if there are no enough descriptors available for the skb to use or if there we're going to potentially overrun the completion queue. Finally prepare the descriptor with all the required information and update the tail. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: initialize interrupts and enable vportPavan Kumar Linga
To further continue 'vport open', initialize all the resources required for the interrupts. To start with, initialize the queue vector indices with the ones received from the device Control Plane. Now that all the TX and RX queues are initialized, map the RX descriptor and buffer queues as well as TX completion queues to the allocated vectors. Initialize and enable the napi handler for the napi polling. Finally, request the IRQs for the interrupt vectors from the stack and setup the interrupt handler. Once the interrupt init is done, send 'map queue vector', 'enable queues' and 'enable vport' virtchnl messages to the CP to complete the 'vport open' flow. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: configure resources for RX queuesAlan Brady
Similar to the TX, RX also supports both single and split queue models. In single queue model, the same descriptor queue is used by SW to post buffer descriptors to HW and by HW to post completed descriptors to SW. In split queue model, "RX buffer queues" are used to pass descriptor buffers from SW to HW whereas "RX queues" are used to post the descriptor completions i.e. descriptors that point to completed buffers, from HW to SW. "RX queue group" is a set of RX queues grouped together and will be serviced by a "RX buffer queue group". IDPF supports 2 buffer queues i.e. large buffer (4KB) queue and small buffer (2KB) queue per buffer queue group. HW uses large buffers for 'hardware gro' feature and also if the packet size is more than 2KB, if not 2KB buffers are used. Add all the resources required for the RX queues initialization. Allocate memory for the RX queue and RX buffer queue groups. Initialize the software maintained refill queues for buffer management algorithm. Same like the TX queues, initialize the queue parameters for the RX queues and send the config RX queue virtchnl message to the device Control Plane. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: configure resources for TX queuesAlan Brady
IDPF supports two queue models i.e. single queue which is a traditional queueing model as well as split queue model. In single queue model, the same descriptor queue is used by SW to post descriptors to the HW, HW to post completed descriptors to SW. In split queue model, "TX Queues" are used to pass buffers from SW to HW and "TX Completion Queues" are used to post descriptor completions from HW to SW. Device supports asymmetric ratio of TX queues to TX completion queues. Considering this, queue group mechanism is used i.e. some TX queues are grouped together which will be serviced by only one TX completion queue per TX queue group. Add all the resources required for the TX queues initialization. To start with, allocate memory for the TX queue groups, TX queues and TX completion queues. Then, allocate the descriptors for both TX and TX completion queues, and bookkeeping buffers for TX queues alone. Also, allocate queue vectors for the vport and initialize the TX queue related fields for each queue vector. Initialize the queue parameters such as q_id, q_type and tail register offset with the info received from the device control plane (CP). Once all the TX queues are configured, send config TX queue virtchnl message to the CP with all the TX queue context information. Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add ptypes and MAC filter supportPavan Kumar Linga
Add the virtchnl support to request the packet types. Parse the responses received from CP and based on the protocol headers, populate the packet type structure with necessary information. Initialize the MAC address and add the virtchnl support to add and del MAC address. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add create vport and netdev configurationPavan Kumar Linga
Add the required support to create a vport by spawning the init task. Once the vport is created, initialize and allocate the resources needed for it. Configure and register a netdev for each vport with all the features supported by the device based on the capabilities received from the device Control Plane. Spawn the init task till all the default vports are created. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add core init and interrupt requestPavan Kumar Linga
As the mailbox is setup, add the necessary send and receive mailbox message framework to support the virtchnl communication between the driver and device Control Plane (CP). Add the core initialization. To start with, driver confirms the virtchnl version with the CP. Once that is done, it requests and gets the required capabilities and resources needed such as max vectors, queues etc. Based on the vector information received in 'VIRTCHNL2_OP_GET_CAPS', request the stack to allocate the required vectors. Finally add the interrupt handling mechanism for the mailbox queue and enable the interrupt. Note: Checkpatch issues a warning about IDPF_FOREACH_VPORT_VC_STATE and IDPF_GEN_STRING being complex macros and should be enclosed in parentheses but it's not the case. They are never used as a statement and instead only used to define the enum and array. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add controlq init and reset checksJoshua Hay
At the end of the probe, initialize and schedule the event workqueue. It calls the hard reset function where reset checks are done to find if the device is out of the reset. Control queue initialization and the necessary control queue support is added. Introduce function pointers for the register operations which are different between PF and VF devices. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13idpf: add module register and probe functionalityPhani Burra
Add the required support to register IDPF PCI driver, as well as probe and remove call backs. Enable the PCI device and request the kernel to reserve the memory resources that will be used by the driver. Finally map the BAR0 address space. Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Signed-off-by: Shailendra Bhatnagar <shailendra.bhatnagar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-13virtchnl: add virtchnl version 2 opsPavan Kumar Linga
Virtchnl version 1 is an interface used by the current generation of foundational NICs to negotiate the capabilities and configure the HW resources such as queues, vectors, RSS LUT, etc between the PF and VF drivers. It is not extensible to enable new features supported in the next generation of NICs/IPUs and to negotiate descriptor types, packet types and register offsets. To overcome the limitations of the existing interface, introduce the virtchnl version 2 and add the necessary opcodes, structures, definitions, and descriptor formats. The driver also learns the data queue and other register offsets to use instead of hardcoding them. The advantage of this approach is that it gives the flexibility to modify the register offsets if needed, restrict the use of certain descriptor types and negotiate the supported packet types. Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Co-developed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>