summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)Author
2024-09-30ice: clear port vlan config during resetMichal Swiatkowski
Since commit 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()") VF VSI is only reconfigured instead of recreated. The context configuration from previous setting is still the same. If any of the config needs to be cleared it needs to be cleared explicitly. Previously there was assumption that port vlan will be cleared automatically. Now, when VSI is only reconfigured we have to do it in the code. Not clearing port vlan configuration leads to situation when the driver VSI config is different than the VSI config in HW. Traffic can't be passed after setting and clearing port vlan, because of invalid VSI config in HW. Example reproduction: > ip a a dev $(VF) $(VF_IP_ADDRESS) > ip l s dev $(VF) up > ping $(VF_IP_ADDRESS) ping is working fine here > ip link set eth5 vf 0 vlan 100 > ip link set eth5 vf 0 vlan 0 > ping $(VF_IP_ADDRESS) ping isn't working Fixes: 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Piotr Tyda <piotr.tyda@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-30ice: Fix improper handling of refcount in ice_sriov_set_msix_vec_count()Gui-Dong Han
This patch addresses an issue with improper reference count handling in the ice_sriov_set_msix_vec_count() function. First, the function calls ice_get_vf_by_id(), which increments the reference count of the vf pointer. If the subsequent call to ice_get_vf_vsi() fails, the function currently returns an error without decrementing the reference count of the vf pointer, leading to a reference count leak. The correct behavior, as implemented in this patch, is to decrement the reference count using ice_put_vf(vf) before returning an error when vsi is NULL. Second, the function calls ice_sriov_get_irqs(), which sets vf->first_vector_idx. If this call returns a negative value, indicating an error, the function returns an error without decrementing the reference count of the vf pointer, resulting in another reference count leak. The patch addresses this by adding a call to ice_put_vf(vf) before returning an error when vf->first_vector_idx < 0. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and identifying potential mismanagement of reference counts. In this case, the tool flagged the missing decrement operation as a potential issue, leading to this patch. Fixes: 4035c72dc1ba ("ice: reconfig host after changing MSI-X on VF") Fixes: 4d38cb44bd32 ("ice: manage VFs MSI-X using resource tracking") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@outlook.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-30ice: Fix improper handling of refcount in ice_dpll_init_rclk_pins()Gui-Dong Han
This patch addresses a reference count handling issue in the ice_dpll_init_rclk_pins() function. The function calls ice_dpll_get_pins(), which increments the reference count of the relevant resources. However, if the condition WARN_ON((!vsi || !vsi->netdev)) is met, the function currently returns an error without properly releasing the resources acquired by ice_dpll_get_pins(), leading to a reference count leak. To resolve this, the check has been moved to the top of the function. This ensures that the function verifies the state before any resources are acquired, avoiding the need for additional resource management in the error path. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@outlook.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-30ice: set correct dst VSI in only LAN filtersMichal Swiatkowski
The filters set that will reproduce the problem: $ tc filter add dev $VF0_PR ingress protocol arp prio 0 flower \ skip_sw dst_mac ff:ff:ff:ff:ff:ff action mirred egress \ redirect dev $PF0 $ tc filter add dev $VF0_PR ingress protocol arp prio 0 flower \ skip_sw dst_mac ff:ff:ff:ff:ff:ff src_mac 52:54:00:00:00:10 \ action mirred egress mirror dev $VF1_PR Expected behaviour is to set all broadcast from VF0 to the LAN. If the src_mac match the value from filters, send packet to LAN and to VF1. In this case both LAN_EN and LB_EN flags in switch is set in case of packet matching both filters. As dst VSI for the only LAN enable bit is PF VSI, the packet is being seen on PF. To fix this change dst VSI to the source VSI. It will block receiving any packet even when LB_EN is set by switch, because local loopback is clear on VF VSI during normal operation. Side note: if the second filters action is redirect instead of mirror LAN_EN is clear, because switch is AND-ing LAN_EN from each matched filters and OR-ing LB_EN. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Fixes: 73b483b79029 ("ice: Manage act flags for switchdev offloads") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-15ice: Fix a NULL vs IS_ERR() check in probe()Dan Carpenter
The ice_allocate_sf() function returns error pointers on error. It doesn't return NULL. Update the check to match. Fixes: 177ef7f1e2a0 ("ice: base subfunction aux driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/6951d217-ac06-4482-a35d-15d757fd90a3@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-15ice: Fix a couple NULL vs IS_ERR() bugsDan Carpenter
The ice_repr_create() function returns error pointers. It never returns NULL. Fix the callers to check for IS_ERR(). Fixes: 977514fb0fa8 ("ice: create port representor for SF") Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/7f7aeb91-8771-47b8-9275-9d9f64f947dd@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts (sort of) and no adjacent changes. This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock") from net, as it was superseded by commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.") in net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-11Merge branch '200GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== idpf: XDP chapter II: convert Tx completion to libeth Alexander Lobakin says: XDP for idpf is currently 5 chapters: * convert Rx to libeth; * convert Tx completion to libeth (this); * generic XDP and XSk code changes; * actual XDP for idpf via libeth_xdp; * XSk for idpf (^). Part II does the following: * adds generic libeth Tx completion routines; * converts idpf to use generic libeth Tx comp routines; * fixes Tx queue timeouts and robustifies Tx completion in general; * fixes Tx event/descriptor flushes (writebacks). Most idpf patches again remove more lines than adds. Generic Tx completion helpers and structs are needed as libeth_xdp (Ch. III) makes use of them. WB_ON_ITR is needed since XDPSQs don't want to work without it at all. Tx queue timeouts fixes are needed since without them, it's way easier to catch a Tx timeout event when WB_ON_ITR is enabled. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: enable WB_ON_ITR idpf: fix netdev Tx queue stop/wake idpf: refactor Tx completion routines netdevice: add netdev_tx_reset_subqueue() shorthand idpf: convert to libeth Tx buffer completion libeth: add Tx buffer completion helpers ==================== Link: https://patch.msgid.link/20240909205323.3110312-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09idpf: enable WB_ON_ITRJoshua Hay
Tell hardware to write back completed descriptors even when interrupts are disabled. Otherwise, descriptors might not be written back until the hardware can flush a full cacheline of descriptors. This can cause unnecessary delays when traffic is light (or even trigger Tx queue timeout). The example scenario to reproduce the Tx timeout if the fix is not applied: - configure at least 2 Tx queues to be assigned to the same q_vector, - generate a huge Tx traffic on the first Tx queue - try to send a few packets using the second Tx queue. In such a case Tx timeout will appear on the second Tx queue because no completion descriptors are written back for that queue while interrupts are disabled due to NAPI polling. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: fix netdev Tx queue stop/wakeMichal Kubiak
netif_txq_maybe_stop() returns -1, 0, or 1, while idpf_tx_maybe_stop_common() says it returns 0 or -EBUSY. As a result, there sometimes are Tx queue timeout warnings despite that the queue is empty or there is at least enough space to restart it. Make idpf_tx_maybe_stop_common() inline and returning true or false, handling the return of netif_txq_maybe_stop() properly. Use a correct goto in idpf_tx_maybe_stop_splitq() to avoid stopping the queue or incrementing the stops counter twice. Fixes: 6818c4d5b3c2 ("idpf: add splitq start_xmit") Fixes: a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: refactor Tx completion routinesJoshua Hay
Add a mechanism to guard against stashing partial packets into the hash table to make the driver more robust, with more efficient decision making when cleaning. Don't stash partial packets. This can happen when an RE (Report Event) completion is received in flow scheduling mode, or when an out of order RS (Report Status) completion is received. The first buffer with the skb is stashed, but some or all of its frags are not because the stack is out of reserve buffers. This leaves the ring in a weird state since the frags are still on the ring. Use the field libeth_sqe::nr_frags to track the number of fragments/tx_bufs representing the packet. The clean routines check to make sure there are enough reserve buffers on the stack before stashing any part of the packet. If there are not, next_to_clean is left pointing to the first buffer of the packet that failed to be stashed. This leaves the whole packet on the ring, and the next time around, cleaning will start from this packet. An RS completion is still expected for this packet in either case. So instead of being cleaned from the hash table, it will be cleaned from the ring directly. This should all still be fine since the DESC_UNUSED and BUFS_UNUSED will reflect the state of the ring. If we ever fall below the thresholds, the TxQ will still be stopped, giving the completion queue time to catch up. This may lead to stopping the queue more frequently, but it guarantees the Tx ring will always be in a good state. Also, always use the idpf_tx_splitq_clean function to clean descriptors, i.e. use it from clean_buf_ring as well. This way we avoid duplicating the logic and make sure we're using the same reserve buffers guard rail. This does require a switch from the s16 next_to_clean overflow descriptor ring wrap calculation to u16 and the normal ring size check. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09idpf: convert to libeth Tx buffer completionAlexander Lobakin
&idpf_tx_buffer is almost identical to the previous generations, as well as the way it's handled. Moreover, relying on dma_unmap_addr() and !!buf->skb instead of explicit defining of buffer's type was never good. Use the newly added libeth helpers to do it properly and reduce the copy-paste around the Tx code. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09igb: Always call igb_xdp_ring_update_tail() under Tx lockSriram Yagnaraman
Always call igb_xdp_ring_update_tail() under __netif_tx_lock, add a comment and lockdep assert to indicate that. This is needed to share the same TX ring between XDP, XSK and slow paths. Furthermore, the current XDP implementation is racy on tail updates. Fixes: 9cbc948b5a20 ("igb: add XDP support") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> [Kurt: Add lockdep assert and fixes tag] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: fix VSI lists confusion when adding VLANsMichal Schmidt
The description of function ice_find_vsi_list_entry says: Search VSI list map with VSI count 1 However, since the blamed commit (see Fixes below), the function no longer checks vsi_count. This causes a problem in ice_add_vlan_internal, where the decision to share VSI lists between filter rules relies on the vsi_count of the found existing VSI list being 1. The reproducing steps: 1. Have a PF and two VFs. There will be a filter rule for VLAN 0, referring to a VSI list containing VSIs: 0 (PF), 2 (VF#0), 3 (VF#1). 2. Add VLAN 1234 to VF#0. ice will make the wrong decision to share the VSI list with the new rule. The wrong behavior may not be immediately apparent, but it can be observed with debug prints. 3. Add VLAN 1234 to VF#1. ice will unshare the VSI list for the VLAN 1234 rule. Due to the earlier bad decision, the newly created VSI list will contain VSIs 0 (PF) and 3 (VF#1), instead of expected 2 (VF#0) and 3 (VF#1). 4. Try pinging a network peer over the VLAN interface on VF#0. This fails. Reproducer script at: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/test-vlan-vsi-list-confusion.sh Commented debug trace: https://gitlab.com/mschmidt2/repro/-/blob/master/RHEL-46814/ice-vlan-vsi-lists-debug.txt Patch adding the debug prints: https://gitlab.com/mschmidt2/linux/-/commit/f8a8814623944a45091a77c6094c40bfe726bfdb (Unsafe, by the way. Lacks rule_lock when dumping in ice_remove_vlan.) Michal Swiatkowski added to the explanation that the bug is caused by reusing a VSI list created for VLAN 0. All created VFs' VSIs are added to VLAN 0 filter. When a non-zero VLAN is created on a VF which is already in VLAN 0 (normal case), the VSI list from VLAN 0 is reused. It leads to a problem because all VFs (VSIs to be specific) that are subscribed to VLAN 0 will now receive a new VLAN tag traffic. This is one bug, another is the bug described above. Removing filters from one VF will remove VLAN filter from the previous VF. It happens a VF is reset. Example: - creation of 3 VFs - we have VSI list (used for VLAN 0) [0 (pf), 2 (vf1), 3 (vf2), 4 (vf3)] - we are adding VLAN 100 on VF1, we are reusing the previous list because 2 is there - VLAN traffic works fine, but VLAN 100 tagged traffic can be received on all VSIs from the list (for example broadcast or unicast) - trust is turning on VF2, VF2 is resetting, all filters from VF2 are removed; the VLAN 100 filter is also removed because 3 is on the list - VLAN traffic to VF1 isn't working anymore, there is a need to recreate VLAN interface to readd VLAN filter One thing I'm not certain about is the implications for the LAG feature, which is another caller of ice_find_vsi_list_entry. I don't have a LAG-capable card at hand to test. Fixes: 23ccae5ce15f ("ice: changes to the interface with the HW and FW for SRIOV_VF+LAG") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Dave Ertman <David.m.ertman@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: stop calling pci_disable_device() as we use pcimPrzemek Kitszel
Our driver uses devres to manage resources, in particular we call pcim_enable_device(), what also means we express the intent to get automatic pci_disable_device() call at driver removal. Manual calls to pci_disable_device() misuse the API. Recent commit (see "Fixes" tag) has changed the removal action from conditional (silent ignore of double call to pci_disable_device()) to unconditional, but able to catch unwanted redundant calls; see cited "Fixes" commit for details. Since that, unloading the driver yields following warn+splat: [70633.628490] ice 0000:af:00.7: disabling already-disabled device [70633.628512] WARNING: CPU: 52 PID: 33890 at drivers/pci/pci.c:2250 pci_disable_device+0xf4/0x100 ... [70633.628744] ? pci_disable_device+0xf4/0x100 [70633.628752] release_nodes+0x4a/0x70 [70633.628759] devres_release_all+0x8b/0xc0 [70633.628768] device_unbind_cleanup+0xe/0x70 [70633.628774] device_release_driver_internal+0x208/0x250 [70633.628781] driver_detach+0x47/0x90 [70633.628786] bus_remove_driver+0x80/0x100 [70633.628791] pci_unregister_driver+0x2a/0xb0 [70633.628799] ice_module_exit+0x11/0x3a [ice] Note that this is the only Intel ethernet driver that needs such fix. Fixes: f748a07a0b64 ("PCI: Remove legacy pcim_release()") Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: fix accounting for filters shared by multiple VSIsJacob Keller
When adding a switch filter (such as a MAC or VLAN filter), it is expected that the driver will detect the case where the filter already exists, and return -EEXIST. This is used by calling code such as ice_vc_add_mac_addr, and ice_vsi_add_vlan to avoid incrementing the accounting fields such as vsi->num_vlan or vf->num_mac. This logic works correctly for the case where only a single VSI has added a given switch filter. When a second VSI adds the same switch filter, the driver converts the existing filter from an ICE_FWD_TO_VSI filter into an ICE_FWD_TO_VSI_LIST filter. This saves switch resources, by ensuring that multiple VSIs can re-use the same filter. The ice_add_update_vsi_list() function is responsible for doing this conversion. When first converting a filter from the FWD_TO_VSI into FWD_TO_VSI_LIST, it checks if the VSI being added is the same as the existing rule's VSI. In such a case it returns -EEXIST. However, when the switch rule has already been converted to a FWD_TO_VSI_LIST, the logic is different. Adding a new VSI in this case just requires extending the VSI list entry. The logic for checking if the rule already exists in this case returns 0 instead of -EEXIST. This breaks the accounting logic mentioned above, so the counters for how many MAC and VLAN filters exist for a given VF or VSI no longer accurately reflect the actual count. This breaks other code which relies on these counts. In typical usage this primarily affects such filters generally shared by multiple VSIs such as VLAN 0, or broadcast and multicast MAC addresses. Fix this by correctly reporting -EEXIST in the case of adding the same VSI to a switch rule already converted to ICE_FWD_TO_VSI_LIST. Fixes: 9daf8208dd4d ("ice: Add support for switch filter programming") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-09ice: Fix lldp packets dropping after changing the number of channelsMartyna Szapar-Mudlaw
After vsi setup refactor commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") ice_cfg_sw_lldp function which removes rx rule directing LLDP packets to vsi is moved from ice_vsi_release to ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in vsi_release resulting in unnecessary removal of rx lldp packets handling switch rule. This leads to lldp packets being dropped after a change number of channels via ethtool. This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back to ice_vsi_release function. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Matěj Grégr <mgregr@netx.as> Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: subfunction activation and base devlink opsPiotr Raczynski
Use previously implemented SF aux driver. It is probe during SF activation and remove after deactivation. Implement set/get hw_address and set/get state as basic devlink ops for subfunction. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: basic support for VLAN in subfunctionsMichal Swiatkowski
Implement add / delete vlan for subfunction type VSI. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: support subfunction devlink Tx topologyMichal Swiatkowski
Flow for creating Tx topology is the same as for VF port representors, but the devlink port is stored in different place (sf->devlink_port). When creating VF devlink lock isn't taken, when creating subfunction it is. Setting Tx topology function needs to take this lock, check if it was taken before to not do it twice. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: implement netdevice ops for SF representorMichal Swiatkowski
Subfunction port representor needs the basic netdevice ops to work correctly. Create them. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: check if SF is ready in ethtool opsMichal Swiatkowski
Now there is another type of port representor. Correct checking if parent device is ready to reflect also new PR type. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: don't set target VSI for subfunctionMichal Swiatkowski
Add check for subfunction before setting target VSI. It is needed for PF in switchdev mode but not for subfunction (even in switchdev mode). Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: create port representor for SFMichal Swiatkowski
Implement attaching and detaching SF port representor. It is done in the same way as the VF port representor. SF port representor is always added or removed with devlink lock taken. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: make representor code genericMichal Swiatkowski
Keep the same flow of port representor creation, but instead of general attach function create helpers for specific representor type. Store function pointer for add and remove representor. Type of port representor can be also known based on VSI type, but it is more clean to have it directly saved in port representor structure. Add devlink lock for whole port representor creation and destruction. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: implement netdev for subfunctionPiotr Raczynski
Configure netdevice for subfunction usecase. Mostly it is reusing ops from the PF netdevice. SF netdev is linked to devlink port registered after SF activation. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: base subfunction aux driverPiotr Raczynski
Implement subfunction driver. It is probe when subfunction port is activated. VSI is already created. During the probe VSI is being configured. MAC unicast and broadcast filter is added to allow traffic to pass. Store subfunction pointer in VSI struct. The same is done for VF pointer. Make union of subfunction and VF pointer as only one of them can be set with one VSI. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: allocate devlink for subfunctionPiotr Raczynski
Allocate devlink for subfunction instance. Create header file for subfunction device. Define subfunction device structure there as it is needed for devlink allocation. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: treat subfunction VSI the same as PF VSIMichal Swiatkowski
When subfunction VSI is open the same code as for PF VSI should be executed. Also when up is complete. Reflect that in code by adding subfunction VSI to consideration. In case of stopping, PF doesn't have additional tasks, so the same is with subfunction VSI. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: add basic devlink subfunctions supportPiotr Raczynski
Implement devlink port handlers responsible for ethernet type devlink subfunctions. Create subfunction devlink port and setup all resources needed for a subfunction netdev to operate. Configure new VSI for each new subfunction, initialize and configure interrupts and Tx/Rx resources. Set correct MAC filters and create new netdev. For now, subfunction is limited to only one Tx/Rx queue pair. Only allocate new subfunction VSI with devlink port new command. Allocate and free subfunction MSIX interrupt vectors using new API calls with pci_msix_alloc_irq_at and pci_msix_free_irq. Support both automatic and manual subfunction numbers. If no subfunction number is provided, use xa_alloc to pick a number automatically. This will find the first free index and use that as the number. This reduces burden on users in the simple case where a specific number is not required. It may also be slightly faster to check that a number exists since xarray lookup should be faster than a linear scan of the dyn_ports xarray. Co-developed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: export ice ndo_ops functionsPiotr Raczynski
Make some of the netdevice_ops functions visible from outside for another VSI type created netdev. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: add new VSI type for subfunctionsPiotr Raczynski
Add required plumbing for new VSI type dedicated to devlink subfunctions. Make sure that the vsi is properly configured and destroyed. Also allow loading XDP and AF_XDP sockets. The first implementation of devlink subfunctions supports only one Tx/Rx queue pair per given subfunction. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ixgbe: Remove setting of RX software timestampGal Pressman
The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-06igc: Remove setting of RX software timestampGal Pressman
The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-06igb: Remove setting of RX software timestampGal Pressman
The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-06ice: Remove setting of RX software timestampGal Pressman
The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-06i40e: Remove setting of RX software timestampGal Pressman
The responsibility for reporting of RX software timestamp has moved to the core layer (see __ethtool_get_ts_info()), remove usage from the device drivers. Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/phy/phy_device.c 2560db6ede1a ("net: phy: Fix missing of_node_put() for leds") 1dce520abd46 ("net: phy: Use for_each_available_child_of_node_scoped()") https://lore.kernel.org/20240904115823.74333648@canb.auug.org.au Adjacent changes: drivers/net/ethernet/xilinx/xilinx_axienet.h drivers/net/ethernet/xilinx/xilinx_axienet_main.c 858430db28a5 ("net: xilinx: axienet: Fix race in axienet_stop") 76abb5d675c4 ("net: xilinx: axienet: Add statistics support") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03Merge branch '1GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-08-30 (igc, e1000e, i40e) This series contains updates to igc, e1000e, and i40 drivers. Kurt Kanzenbach adds support for MQPRIO offloads and stops unintended, excess interrupts on igc. Sasha adds reporting of EEE (Energy Efficient Ethernet) ability and moves a register define to a better suited file for igc. Vitaly stops reporting errors on shutdown and suspend as they are not fatal for e1000e. Alex adds reporting of EEE to i40e. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: Add Energy Efficient Ethernet ability for X710 Base-T/KR/KX cards e1000e: avoid failing the system during pm_suspend igc: Move the MULTI GBT AN Control Register to _regs file igc: Add Energy Efficient Ethernet ability igc: Get rid of spurious interrupts igc: Add MQPRIO offload support ==================== Link: https://patch.msgid.link/20240830210451.2375215-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03ice: do not bring the VSI up, if it was down before the XDP setupLarysa Zaremba
After XDP configuration is completed, we bring the interface up unconditionally, regardless of its state before the call to .ndo_bpf(). Preserve the information whether the interface had to be brought down and later bring it up only in such case. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: remove ICE_CFG_BUSY locking from AF_XDP codeLarysa Zaremba
Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing, because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF state, not VSI one. Therefore it does not protect the queue pair from e.g. reset. Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check ICE_VSI_DOWN under rtnl_lock when preparing for resetLarysa Zaremba
Consider the following scenario: .ndo_bpf() | ice_prepare_for_reset() | ________________________|_______________________________________| rtnl_lock() | | ice_down() | | | test_bit(ICE_VSI_DOWN) - true | | ice_dis_vsi() returns | ice_up() | | | proceeds to rebuild a running VSI | .ndo_bpf() is not the only rtnl-locked callback that toggles the interface to apply new configuration. Another example is .set_channels(). To avoid the race condition above, act only after reading ICE_VSI_DOWN under rtnl_lock. Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check for XDP rings instead of bpf program when unconfiguringLarysa Zaremba
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on VSI without applying new ring configuration. When unconfiguring the VSI, we can encounter the state in which there is an XDP program but no XDP rings to destroy or there will be XDP rings that need to be destroyed, but no XDP program to indicate their presence. When unconfiguring, rely on the presence of XDP rings rather then XDP program, as they better represent the current state that has to be destroyed. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: protect XDP configuration with a mutexLarysa Zaremba
The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler. XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked With an unfortunate timing, such accesses can result in a crash such as the one below: [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] <TASK> [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK> The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway. There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock(). Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion. Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_xdp() can just hot-swap the program. Also, as ice_vsi_rebuild() flow is touched in this patch, make it more consistent by deconfiguring VSI when coalesce allocation fails. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: move netif_queue_set_napi to rtnl-protected sectionsLarysa Zaremba
Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is not rtnl-locked when called from the reset. This creates the need to take the rtnl_lock just for a single function and complicates the synchronization with .ndo_bpf. At the same time, there no actual need to fill napi-to-queue information at this exact point. Fill napi-to-queue information when opening the VSI and clear it when the VSI is being closed. Those routines are already rtnl-locked. Also, rewrite napi-to-queue assignment in a way that prevents inclusion of XDP queues, as this leads to out-of-bounds writes, such as one below. [ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0 [ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047 [ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2 [ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000003] Call Trace: [ +0.000003] <TASK> [ +0.000002] dump_stack_lvl+0x60/0x80 [ +0.000007] print_report+0xce/0x630 [ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0 [ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000003] kasan_report+0xe9/0x120 [ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000004] netif_queue_set_napi+0x1c2/0x1e0 [ +0.000005] ice_vsi_close+0x161/0x670 [ice] [ +0.000114] ice_dis_vsi+0x22f/0x270 [ice] [ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice] [ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice] [ +0.000087] pci_dev_save_and_disable+0x82/0xd0 [ +0.000006] pci_reset_function+0x12d/0x230 [ +0.000004] reset_store+0xa0/0x100 [ +0.000006] ? __pfx_reset_store+0x10/0x10 [ +0.000002] ? __pfx_mutex_lock+0x10/0x10 [ +0.000004] ? __check_object_size+0x4c1/0x640 [ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0 [ +0.000006] vfs_write+0x5d6/0xdf0 [ +0.000005] ? fd_install+0x180/0x350 [ +0.000005] ? __pfx_vfs_write+0x10/0xA10 [ +0.000004] ? do_fcntl+0x52c/0xcd0 [ +0.000004] ? kasan_save_track+0x13/0x60 [ +0.000003] ? kasan_save_free_info+0x37/0x60 [ +0.000006] ksys_write+0xfa/0x1d0 [ +0.000003] ? __pfx_ksys_write+0x10/0x10 [ +0.000002] ? __x64_sys_fcntl+0x121/0x180 [ +0.000004] ? _raw_spin_lock+0x87/0xe0 [ +0.000005] do_syscall_64+0x80/0x170 [ +0.000007] ? _raw_spin_lock+0x87/0xe0 [ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10 [ +0.000003] ? file_close_fd_locked+0x167/0x230 [ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000005] ? do_syscall_64+0x8c/0x170 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? fput+0x1a/0x2c0 [ +0.000004] ? filp_close+0x19/0x30 [ +0.000004] ? do_dup2+0x25a/0x4c0 [ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0 [ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? __count_memcg_events+0x113/0x380 [ +0.000005] ? handle_mm_fault+0x136/0x820 [ +0.000005] ? do_user_addr_fault+0x444/0xa80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7f2033593154 Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios") Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03netdev_features: convert NETIF_F_FCOE_MTU to dev->fcoe_mtuAlexander Lobakin
Ability to handle maximum FCoE frames of 2158 bytes can never be changed and thus more of an attribute, not a toggleable feature. Move it from netdev_features_t to "cold" priv flags (bitfield bool) and free yet another feature bit. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-02igc: Unlock on error in igc_io_resume()Dan Carpenter
Call rtnl_unlock() on this error path, before returning. Fixes: bc23aa949aeb ("igc: Add pcie error handler support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-30i40e: Add Energy Efficient Ethernet ability for X710 Base-T/KR/KX cardsAleksandr Loktionov
Add "EEE: Enabled/Disabled" to dmesg for supported X710 Base-T/KR/KX cards. According to the IEEE standard report the EEE ability and the EEE Link Partner ability. Use the kernel's 'ethtool_keee' structure and report EEE link modes. Example: dmesg | grep 'NIC Link is' ethtool --show-eee <device> Before: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None Supported EEE link modes: Not reported Advertised EEE link modes: Not reported Link partner advertised EEE link modes: Not reported After: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None, EEE: Enabled Supported EEE link modes: 100baseT/Full 1000baseT/Full 10000baseT/Full Advertised EEE link modes: 100baseT/Full 1000baseT/Full 10000baseT/Full Link partner advertised EEE link modes: 100baseT/Full 1000baseT/Full 10000baseT/Full Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-30e1000e: avoid failing the system during pm_suspendVitaly Lifshits
Occasionally when the system goes into pm_suspend, the suspend might fail due to a PHY access error on the network adapter. Previously, this would have caused the whole system to fail to go to a low power state. An example of this was reported in the following Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205015 [ 1663.694828] e1000e 0000:00:19.0 eth0: Failed to disable ULP [ 1664.731040] asix 2-3:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xC1E1 [ 1665.093513] e1000e 0000:00:19.0 eth0: Hardware Error [ 1665.596760] e1000e 0000:00:19.0: pci_pm_resume+0x0/0x80 returned 0 after 2975399 usecs and then the system never recovers from it, and all the following suspend failed due to this [22909.393854] PM: pci_pm_suspend(): e1000e_pm_suspend+0x0/0x760 [e1000e] returns -2 [22909.393858] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -2 [22909.393861] PM: Device 0000:00:1f.6 failed to suspend async: error -2 This can be avoided by changing the return values of __e1000_shutdown and e1000e_pm_suspend functions so that they always return 0 (success). This is consistent with what other drivers do. If the e1000e driver encounters a hardware error during suspend, potential side effects include slightly higher power draw or non-working wake on LAN. This is preferred to a system-level suspend failure, and a warning message is written to the system log, so that the user can be aware that the LAN controller experienced a problem during suspend. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205015 Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-30igc: Move the MULTI GBT AN Control Register to _regs fileSasha Neftin
MULTI GBT AN Control Register is IEEE Standard Register 7.32 (not a mask). The right place should be in igc_reg.h file. In accordance with the registers naming convention added IGC_' prefix. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>