summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-13net: netconsole: Unify Function Return PathsBreno Leitao
The return flow in netconsole's dynamic functions is currently inconsistent. This patch aims to streamline and standardize the process by ensuring that the mutex is unlocked before returning the ret value. Additionally, this update includes a minor functional change where certain strnlen() operations are performed with the dynamic_netconsole_mutex locked. This adjustment is not anticipated to cause any issues, however, it is crucial to document this change for clarity. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: netconsole: Standardize variable namingBreno Leitao
Update variable names from err to ret in cases where the variable may return non-error values. This change facilitates a forthcoming patch that relies on ret being used consistently to handle return values, regardless of whether they indicate an error or not. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: netconsole: Correct mismatched return typesBreno Leitao
netconsole incorrectly mixes int and ssize_t types by using int for return variables in functions that should return ssize_t. This is fixed by updating the return variables to the appropriate ssize_t type, ensuring consistency across the function definitions. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: netpoll: extract core of netpoll_cleanupBreno Leitao
Extract the core part of netpoll_cleanup(), so, it could be called from a caller that has the rtnl lock already. Netconsole uses this in a weird way right now: __netpoll_cleanup(&nt->np); spin_lock_irqsave(&target_list_lock, flags); netdev_put(nt->np.dev, &nt->np.dev_tracker); nt->np.dev = NULL; nt->enabled = false; This will be replaced by do_netpoll_cleanup() as the locking situation is overhauled. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13Merge branch 'stmmac-add-loongson-platform-support'Paolo Abeni
Yanteng Si says: ==================== stmmac: Add Loongson platform support v17: * As Serge's comments: Add return 0 for _dt_config(). Get back the conditional MSI-clear method execution. v16: * As Serge's comments: Move the of_node_put(plat->mdio_node) call to the DT-config/clear methods. Drop 'else if'. * Modify the commit message of 7/14. (LS2K CPU -> LS2K SOC) V15: * Drop return that will not be executed. * Move pdev from patch 12 to patch 13 to pass W=1 builds. RFC v15: * As Serge's comments: Extend the commit message.(patch 7 and patch 11) Add fixes tag for patch 8. Add loongson_dwmac_dt_clear() patch. Modify loongson_dwmac_msi_config(). ... * Pick Huacai's Acked-by tag. * Pick Serge's Reviewed-by tag. * I have already contacted the author(ZhangQing) of the module, so I copied her valid email: diasyzhang@tencent.com. Note: I replied to the comments on v14 last Sunday, but all of Loongson's email servers failed to deliver. The network administrator told me today that he has fixed the problem and re-delivered all the failed emails, but I did not see them on the mailing list. I hope they will not suddenly appear in everyone's mailbox one day. I apologize for this. (The email content mainly agrees with Serge's suggestion.) v14: Because Loongson GMAC can be also found with the 8-channels AV feature enabled, we'll need to reconsider the patches logic and thus the commit logs too. As Serge's comments and Russell's comments: [PATCH net-next v14 01/15] net: stmmac: Move the atds flag to the stmmac_dma_cfg structure [PATCH net-next v14 02/15] net: stmmac: Add multi-channel support [PATCH net-next v14 03/15] net: stmmac: Export dwmac1000_dma_ops [PATCH net-next v14 04/15] net: stmmac: dwmac-loongson: Drop duplicated hash-based filter size init [PATCH net-next v14 05/15] net: stmmac: dwmac-loongson: Drop pci_enable/disable_msi calls [PATCH net-next v14 06/15] net: stmmac: dwmac-loongson: Use PCI_DEVICE_DATA() macro for device identification [PATCH net-next v14 07/15] net: stmmac: dwmac-loongson: Detach GMAC-specific platform data init +-> Init the plat_stmmacenet_data::{tx_queues_to_use,rx_queues_to_use} in the loongson_gmac_data() method. [PATCH net-next v14 08/15] net: stmmac: dwmac-loongson: Init ref and PTP clocks rate [PATCH net-next v14 09/15] net: stmmac: dwmac-loongson: Add phy_interface for Loongson GMAC [PATCH net-next v14 10/15] net: stmmac: dwmac-loongson: Introduce PCI device info data +-> Make sure the setup() method is called after the pci_enable_device() invocation. [PATCH net-next v14 11/15] net: stmmac: dwmac-loongson: Add DT-less GMAC PCI-device support +-> Introduce the loongson_dwmac_dt_config() method here instead of doing that in a separate patch. +-> Add loongson_dwmac_acpi_config() which would just get the IRQ from the pdev->irq field and make sure it is valid. [PATCH net-next v14 12/15] net: stmmac: Fixed failure to set network speed to 1000. +-> Drop the patch as Russell's comments, At the same time, he provided another better repair suggestion, and I decided to send it separately after the patch set was merged. See: <https://lore.kernel.org/netdev/ZoW1fNqV3PxEobFx@shell.armlinux.org.uk/> [PATCH net-next v14 13/15] net: stmmac: dwmac-loongson: Add Loongson Multi-channels GMAC support +-> This is former "net: stmmac: dwmac-loongson: Add Loongson GNET support" patch, but which adds the support of the Loongson GMAC with the 8-channels AV-feature available. +-> loongson_dwmac_intx_config() shall be dropped due to the loongson_dwmac_acpi_config() method added in the PATCH 11/15. +-> Make sure loongson_data::loongson_id is initialized before the stmmac_pci_info::setup() is called. +-> Move the rx_queues_to_use/tx_queues_to_use and coe_unsupported fields initialization to the loongson_gmac_data() method. +-> As before, call the loongson_dwmac_msi_config() method if the multi-channels Loongson MAC has been detected. +-> Move everything GNET-specific to the next patch. [PATCH net-next v14 14/15] net: stmmac: dwmac-loongson: Add Loongson GNET support +-> Everything Loonsgson GNET-specific is supposed to be added in the framework of this patch: + PCI_DEVICE_ID_LOONGSON_GNET macro + loongson_gnet_fix_speed() method + loongson_gnet_data() method + loongson_gnet_pci_info data + The GNET-specific part of the loongson_dwmac_setup() method. + ... [PATCH net-next v14 15/15] net: stmmac: dwmac-loongson: Add loongson module author Other's: Pick Serge's Reviewed-by tag. v13: * Sorry, we have clarified some things in the past 10 days. I did not give you a clear reply to the following questions in v12, so I need to reply again: 1. The current LS2K2000 also have a GMAC(and two GNET) that supports 8 channels, so we have to reconsider the initialization of tx/rx_queues_to_use into probe(); 2. In v12, we disagreed on the loongson_dwmac_msi_config method, but I changed it based on Serge's comments(If I understand correctly): if (dev_of_node(&pdev->dev)) { ret = loongson_dwmac_dt_config(pdev, plat, &res); } if (ld->loongson_id == DWMAC_CORE_LS2K2000) { ret = loongson_dwmac_msi_config(pdev, plat, &res); } else { ret = loongson_dwmac_intx_config(pdev, plat, &res); } 3. Our priv->dma_cap.pcs is false, so let's use PHY_INTERFACE_MODE_NA; 4. Our GMAC does not support Delay, so let's use PHY_INTERFACE_MODE_RGMII_ID, the current dts is wrong, a fix patch will be sent to the LoongArch list later. Others: * Re-split a part of the patch (it seems we do this with every version); * Copied Serge's comments into the commit message of patch; * Fixed the stmmac_dma_operation_mode() method; * Changed some code comments. v12: * The biggest change is the re-splitting of patches. * Add a "gmac_version" in loongson_data, then we only read it once in the _probe(). * Drop Serge's patch. * Rebase to the latest code state. * Fixed the gnet commit message. v11: * Break loongson_phylink_get_caps(), fix bad logic. * Remove a unnecessary ";". * Remove some unnecessary "{}". * add a blank. * Move the code of fix _force_1000 to patch 6/6. The main changes occur in these two functions: loongson_dwmac_probe(); loongson_dwmac_setup(); v10: As Andrew's comment: * Add a #define for the 0x37. * Add a #define for Port Select. others: * Pick Serge's patch, This patch resulted from the process of reviewing our patch set. * Based on Serge's patch, modify our loongson_phylink_get_caps(). * Drop patch 3/6, we need mac_interface. * Adjusted the code layout of gnet patch. * Corrected several errata in commit message. * Move DISABLE_FORCE flag to loongson_gnet_data(). v9: We have not provided a detailed list of equipment for a long time, and I apologize for this. During this period, I have collected some information and now present it to you, hoping to alleviate the pressure of review. 1. IP core We now have two types of IP cores, one is 0x37, similar to dwmac1000; The other is 0x10. Compared to 0x37, we split several DMA registers from one to two, and it is not worth adding a new entry for this. According to Serge's comment, we made these devices work by overwriting priv->synopsys_id = 0x37 and mac->dma = <LS_dma_ops>. 1.1. Some more detailed information The number of DMA channels for 0x37 is 1; The number of DMA channels for 0x10 is 8. Except for channel 0, otherchannels do not support sending hardware checksums. Supported AV features are Qav, Qat, and Qas, and the rest are consistent with 3.73. 2. DEVICE We have two types of devices, one is GMAC, which only has a MAC chip inside and needs an external PHY chip; the other is GNET, which integrates both MAC and PHY chips inside. 2.1. Some more detailed information GMAC device: LS7A1000, LS2K1000, these devices do not support any pause mode. gnet device: LS7A2000, LS2K2000, the chip connection between the mac and phy of these devices is not normal and requires two rounds of negotiation; LS7A2000 does not support half-duplex and multi-channel; to enable multi-channel on LS2K2000, you need to turn off hardware checksum. **Note**: Only the LS2K2000's IP core is 0x10, while the IP cores of other devices are 0x37. 3. TABLE device type pci_id ip_core ls7a1000 gmac 7a03 0x35/0x37 ls2k1000 gmac 7a03 0x35/0x37 ls7a2000 gnet 7a13 0x37 ls2k2000 gnet 7a13 0x10 ----------------------------------------------- Changes: * passed the CI <https://github.com/linux-netdev/nipa/blob/main/tests/patch/checkpatch /checkpatch.sh> * reverse xmas tree order. * Silence build warning. * Re-split the patch. * Add more detailed commit message. * Add more code comment. * Reduce modification of generic code. * using the GNET-specific prefix. * define a new macro for the GNET MAC. * Use an easier way to overwrite mac. * Removed some useless printk. v8: * The biggest change is according to Serge's comment in the previous edition: Seeing the patch in the current state would overcomplicate the generic code and the only functions you need to update are dwmac_dma_interrupt() dwmac1000_dma_init_channel() you can have these methods re-defined with all the Loongson GNET specifics in the low-level platform driver (dwmac-loongson.c). After that you can just override the mac_device_info.dma pointer with a fixed stmmac_dma_ops descriptor. Here is what should be done for that: 1. Keep the Patch 4/9 with my comments fixed. First it will be partly useful for your GNET device. Second in general it's a correct implementation of the normal DW GMAC v3.x multi-channels feature and will be useful for the DW GMACs with that feature enabled. 2. Create the Loongson GNET-specific stmmac_dma_ops.dma_interrupt() stmmac_dma_ops.init_chan() methods in the dwmac-loongson.c driver. Don't forget to move all the Loongson-specific macros from dwmac_dma.h to dwmac-loongson.c. 3. Create a Loongson GNET-specific platform setup method with the next semantics: + allocate stmmac_dma_ops instance and initialize it with dwmac1000_dma_ops. + override the stmmac_dma_ops.{dma_interrupt, init_chan} with the pointers to the methods defined in 2. + allocate mac_device_info instance and initialize the mac_device_info.dma field with a pointer to the new stmmac_dma_ops instance. + call dwmac1000_setup() or initialize mac_device_info in a way it's done in dwmac1000_setup() (the later might be better so you wouldn't need to export the dwmac1000_setup() function). + override stmmac_priv.synopsys_id with a correct value. 4. Initialize plat_stmmacenet_data.setup() with the pointer to the method created in 3. * Others: Re-split the patch. Passed checkpatch.pl test. v7: * Refer to andrew's suggestion: - Add DMA_INTR_ENA_NIE_RX and DMA_INTR_ENA_NIE_TX #define's, etc. * Others: - Using --subject-prefix="PATCH net-next vN" to indicate that the patches are for the networking tree. - Rebase to the latest networking tree: <git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git> v6: * Refer to Serge's suggestion: - Add new platform feature flag: include/linux/stmmac.h: +#define STMMAC_FLAG_HAS_LGMAC BIT(13) - Add the IRQs macros specific to the Loongson Multi-channels GMAC: drivers/net/ethernet/stmicro/stmmac/dwmac_dma.h: +#define DMA_INTR_ENA_NIE_LOONGSON 0x00060000 /* ...*/ #define DMA_INTR_ENA_NIE 0x00010000 /* Normal Summary */ ... - Drop all of redundant changes that don't require the prototypes being converted to accepting the stmmac_priv pointer. * Refer to andrew's suggestion: - Drop white space changes. - break patch up into lots of smaller parts. Some small patches have been put into another series as a preparation see <https://lore.kernel.org/loongarch/cover.1702289232.git.siyanteng@loongson.cn/T/#t> *note* : This series of patches relies on the three small patches above. * others - Drop irq_flags changes. - Changed patch order. v4 -> v5: * Remove an ugly and useless patch (fix channel number). * Remove the non-standard dma64 driver code, and also remove the HWIF entries, since the associated custom callbacks no longer exist. * Refer to Serge's suggestion: Update the dwmac1000_dma.c to support the multi-DMA-channels controller setup. See: v4: <https://lore.kernel.org/loongarch/cover.1692696115.git.chenfeiyang@loongson.cn/> v3: <https://lore.kernel.org/loongarch/cover.1691047285.git.chenfeiyang@loongson.cn/> v2: <https://lore.kernel.org/loongarch/cover.1690439335.git.chenfeiyang@loongson.cn/> v1: <https://lore.kernel.org/loongarch/cover.1689215889.git.chenfeiyang@loongson.cn/> ==================== Link: https://patch.msgid.link/cover.1723014611.git.siyanteng@loongson.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Add loongson module authorYanteng Si
Add Yanteng Si as MODULE_AUTHOR of Loongson DWMAC PCI driver. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Add Loongson GNET supportYanteng Si
The new generation Loongson LS2K2000 SoC and LS7A2000 chipset are equipped with the network controllers called Loongson GNET. It's the single and multi DMA-channels Loongson GMAC but with a PHY attached. Here is the summary of the DW GMAC features the controller has: DW GMAC IP-core: v3.73a Speeds: 10/100/1000Mbps Duplex: Full (both versions), Half (LS2K2000 GNET only) DMA-descriptors type: enhanced L3/L4 filters availability: Y VLAN hash table filter: Y PHY-interface: GMII (PHY is integrated into the chips) Remote Wake-up support: Y Mac Management Counters (MMC): Y Number of additional MAC addresses: 5 MAC Hash-based filter: Y Hash Table Size: 256 AV feature: Y (LS2K2000 GNET only) DMA channels: 8 (LS2K2000 GNET), 1 (LS7A2000 GNET) Let's update the Loongson DWMAC driver to supporting the new Loongson GNET controller. The change is mainly trivial: the driver shall be bound to the PCIe device with DID 0x7a13, and the device-specific setup() method shall be called for it. The only peculiarity concerns the integrated PHY speed change procedure. The PHY has a weird problem with switching from the low speeds to 1000Mbps mode. The speedup procedure requires the PHY-link re-negotiation. So the suggested change provide the device-specific fix_mac_speed() method to overcome the problem. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Add Loongson Multi-channels GMAC supportYanteng Si
The Loongson DWMAC driver currently supports the Loongson GMAC devices (based on the DW GMAC v3.50a/v3.73a IP-core) installed to the LS2K1000 SoC and LS7A1000 chipset. But recently a new generation LS2K2000 SoC was released with the new version of the Loongson GMAC synthesized in. The new controller is based on the DW GMAC v3.73a IP-core with the AV-feature enabled, which implies the multi DMA-channels support. The multi DMA-channels feature has the next vendor-specific peculiarities: 1. Split up Tx and Rx DMA IRQ status/mask bits: Name Tx Rx DMA_INTR_ENA_NIE = 0x00040000 | 0x00020000; DMA_INTR_ENA_AIE = 0x00010000 | 0x00008000; DMA_STATUS_NIS = 0x00040000 | 0x00020000; DMA_STATUS_AIS = 0x00010000 | 0x00008000; DMA_STATUS_FBI = 0x00002000 | 0x00001000; 2. Custom Synopsys ID hardwired into the GMAC_VERSION.SNPSVER register field. It's 0x10 while it should have been 0x37 in accordance with the actual DW GMAC IP-core version. 3. There are eight DMA-channels available meanwhile the Synopsys DW GMAC IP-core supports up to three DMA-channels. 4. It's possible to have each DMA-channel IRQ independently delivered. The MSI IRQs must be utilized for that. Thus in order to have the multi-channels Loongson GMAC controllers supported let's modify the Loongson DWMAC driver in accordance with all the peculiarities described above: 1. Create the multi-channels Loongson GMAC-specific stmmac_dma_ops::dma_interrupt() stmmac_dma_ops::init_chan() callbacks due to the non-standard DMA IRQ CSR flags layout. 2. Create the Loongson DWMAC-specific platform setup() method which gets to initialize the DMA-ops with the dwmac1000_dma_ops instance and overrides the callbacks described in 1. The method also overrides the custom Synopsys ID with the real one in order to have the rest of the HW-specific callbacks correctly detected by the driver core. 3. Make sure the platform setup() method enables the flow control and duplex modes supported by the controller. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Add DT-less GMAC PCI-device supportYanteng Si
The Loongson GMAC driver currently supports the network controllers installed on the LS2K1000 SoC and LS7A1000 chipset, for which the GMAC devices are required to be defined in the platform device tree source. But Loongson machines may have UEFI (implies ACPI) or PMON/UBOOT (implies FDT) as the system bootloaders. In order to have both system configurations support let's extend the driver functionality with the case of having the Loongson GMAC probed on the PCI bus with no device tree node defined for it. That requires to make the device DT-node optional, to rely on the IRQ line detected by the PCI core and to have the MDIO bus ID calculated using the PCIe Domain+BDF numbers. In order to have the device probe() and remove() methods less complicated let's move the DT- and ACPI-specific code to the respective sub-functions. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Introduce PCI device info dataYanteng Si
The Loongson GNET device support is about to be added in one of the next commits. As another preparation for that introduce the PCI device info data with a setup() callback performing the device-specific platform data initializations. Currently it is utilized for the already supported Loongson GMAC device only. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Add phy_interface for Loongson GMACYanteng Si
PHY-interface of the Loongson GMAC device is RGMII with no internal delays added to the data lines signal. So to comply with that let's pre-initialize the platform-data field with the respective enum constant. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Init ref and PTP clocks rateYanteng Si
Reference and PTP clocks rate of the Loongson GMAC devices is 125MHz. (So is in the GNET devices which support is about to be added.) Set the respective plat_stmmacenet_data field up in accordance with that so to have the coalesce command and timestamping work correctly. Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Detach GMAC-specific platform data initYanteng Si
Loongson delivers two types of the network devices: Loongson GMAC and Loongson GNET in the framework of four SOC/Chipsets revisions: Chip Network PCI Dev ID Synopys Version DMA-channel LS2K1000 SOC GMAC 0x7a03 v3.50a/v3.73a 1 LS7A1000 Chipset GMAC 0x7a03 v3.50a/v3.73a 1 LS2K2000 SOC GMAC 0x7a03 v3.73a 8 LS2K2000 SOC GNET 0x7a13 v3.73a 8 LS7A2000 Chipset GNET 0x7a13 v3.73a 1 The driver currently supports the chips with the Loongson GMAC network device synthesized with a single DMA-channel available. As a preparation before adding the Loongson GNET support detach the Loongson GMAC-specific platform data initializations to the loongson_gmac_data() method and preserve the common settings in the loongson_default_data(). While at it drop the return value statement from the loongson_default_data() method as redundant. Note there is no intermediate vendor-specific PCS in between the MAC and PHY on Loongson GMAC and GNET. So the plat->mac_interface field can be freely initialized with the PHY_INTERFACE_MODE_NA value. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Use PCI_DEVICE_DATA() macro for device ↵Yanteng Si
identification For the readability sake convert the hard-coded Loongson GMAC PCI ID to the respective macro and use the PCI_DEVICE_DATA() macro-function to create the pci_device_id array entry. The later change will be specifically useful in order to assign the device-specific data for the currently supported device and for about to be added Loongson GNET controller. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Drop pci_enable/disable_msi callsYanteng Si
The Loongson GMAC driver currently doesn't utilize the MSI IRQs, but retrieves the IRQs specified in the device DT-node. Let's drop the direct pci_enable_msi()/pci_disable_msi() calls then as redundant Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: dwmac-loongson: Drop duplicated hash-based filter size initYanteng Si
The plat_stmmacenet_data::multicast_filter_bins field is twice initialized in the loongson_default_data() method. Drop the redundant initialization, but for the readability sake keep the filters init statements defined in the same place of the method. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: Export dwmac1000_dma_opsYanteng Si
Export the DW GMAC DMA-ops descriptor so one could be available in the low-level platform drivers. It will be utilized to override some callbacks in order to handle the LS2K2000 GNET device specifics. The GNET controller support is being added in one of the following up commits. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: Add multi-channel supportYanteng Si
DW GMAC v3.73 can be equipped with the Audio Video (AV) feature which enables transmission of time-sensitive traffic over bridged local area networks (DWC Ethernet QoS Product). In that case there can be up to two additional DMA-channels available with no Tx COE support (unless there is vendor-specific IP-core alterations). Each channel is implemented as a separate Control and Status register (CSR) for managing the transmit and receive functions, descriptor handling, and interrupt handling. Add the multi-channels DW GMAC controllers support just by making sure the already implemented DMA-configs are performed on the per-channel basis. Note the only currently known instance of the multi-channel DW GMAC IP-core is the LS2K2000 GNET controller, which has been released with the vendor-specific feature extension of having eight DMA-channels. The device support will be added in one of the following up commits. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-13net: stmmac: Move the atds flag to the stmmac_dma_cfg structureYanteng Si
ATDS (Alternate Descriptor Size) is a part of the DMA Bus Mode configs (together with PBL, ALL, EME, etc) of the DW GMAC controllers. Seeing it's not changed at runtime but is activated as long as the IP-core has it supported (at least due to the Type 2 Full Checksum Offload Engine feature), move the respective parameter from the stmmac_dma_ops::init() callback argument to the stmmac_dma_cfg structure, which already have the rest of the DMA-related configs defined. Besides the being added in the next commit DW GMAC multi-channels support will require to add the stmmac_dma_ops::init_chan() callback and have the ATDS flag set/cleared for each channel in there. Having the atds-flag in the stmmac_dma_cfg structure will make the parameter accessible from stmmac_dma_ops::init_chan() callback too. Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Yinggang Gu <guyinggang@loongson.cn> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-12net/smc: Use static_assert() to check struct sizesGustavo A. R. Silva
Commit 9748dbc9f265 ("net/smc: Avoid -Wflex-array-member-not-at-end warnings") introduced tagged `struct smc_clc_v2_extension_fixed` and `struct smc_clc_smcd_v2_extension_fixed`. We want to ensure that when new members need to be added to the flexible structures, they are always included within these tagged structs. So, we use `static_assert()` to ensure that the memory layout for both the flexible structure and the tagged struct is the same after any changes. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Link: https://patch.msgid.link/ZrVBuiqFHAORpFxE@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12nfp: Use static_assert() to check struct sizesGustavo A. R. Silva
Commit d88cabfd9abc ("nfp: Avoid -Wflex-array-member-not-at-end warnings") introduced tagged `struct nfp_dump_tl_hdr`. We want to ensure that when new members need to be added to the flexible structure, they are always included within this tagged struct. So, we use `static_assert()` to ensure that the memory layout for both the flexible structure and the tagged struct is the same after any changes. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ZrVB43Hen0H5WQFP@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12net: macb: Use rcu_dereference() for idev->ifa_list in macb_suspend().Kuniyuki Iwashima
In macb_suspend(), idev->ifa_list is fetched with rcu_access_pointer() and later the pointer is dereferenced as ifa->ifa_local. So, idev->ifa_list must be fetched with rcu_dereference(). Fixes: 0cb8de39a776 ("net: macb: Add ARP support to WOL") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240808040021.6971-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12selftests/bpf: Add a test to verify previous stacksafe() fixYonghong Song
A selftest is added such that without the previous patch, a crash can happen. With the previous patch, the test can run successfully. The new test is written in a way which mimics original crash case: main_prog static_prog_1 static_prog_2 where static_prog_1 has different paths to static_prog_2 and some path has stack allocated and some other path does not. A stacksafe() checking in static_prog_2() triggered the crash. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240812214852.214037-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-12bpf: Fix a kernel verifier crash in stacksafe()Yonghong Song
Daniel Hodges reported a kernel verifier crash when playing with sched-ext. Further investigation shows that the crash is due to invalid memory access in stacksafe(). More specifically, it is the following code: if (exact != NOT_EXACT && old->stack[spi].slot_type[i % BPF_REG_SIZE] != cur->stack[spi].slot_type[i % BPF_REG_SIZE]) return false; The 'i' iterates old->allocated_stack. If cur->allocated_stack < old->allocated_stack the out-of-bound access will happen. To fix the issue add 'i >= cur->allocated_stack' check such that if the condition is true, stacksafe() should fail. Otherwise, cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal. Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks") Cc: Eduard Zingerman <eddyz87@gmail.com> Reported-by: Daniel Hodges <hodgesd@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-12sched: act_ct: avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Remove unnecessary flex-array member `pad[]` and refactor the related code a bit. Fix the following warning: net/sched/act_ct.c:57:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/ZrY0JMVsImbDbx6r@cute Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12Merge branch 'net-nexthop-increase-weight-to-u16'Jakub Kicinski
Petr Machata says: ==================== net: nexthop: Increase weight to u16 In CLOS networks, as link failures occur at various points in the network, ECMP weights of the involved nodes are adjusted to compensate. With high fan-out of the involved nodes, and overall high number of nodes, a (non-)ECMP weight ratio that we would like to configure does not fit into 8 bits. Instead of, say, 255:254, we might like to configure something like 1000:999. For these deployments, the 8-bit weight may not be enough. To that end, in this patchset increase the next hop weight from u8 to u16. Patch #1 adds a flag that indicates whether the reserved fields are zeroed. This is a follow-up to a new fix merged in commit 6d745cd0e972 ("net: nexthop: Initialize all fields in dumped nexthops"). The theory behind this patch is that there is a strict ordering between the fields actually being zeroed, the kernel declaring that they are, and the kernel repurposing the fields. Thus clients can use the flag to tell if it is safe to interpret the reserved fields in any way. Patch #2 contains the substantial code and the commit message covers the details of the changes. Patches #3 to #6 add selftests. ==================== Link: https://patch.msgid.link/cover.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12selftests: fib_nexthops: Test 16-bit next hop weightsPetr Machata
Add tests that attempt to create NH groups that use full 16 bits of NH weight. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/101cdd3f2bfd9511c9bec95f909d20ff56f70ba5.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12selftests: router_mpath_nh_res: Test 16-bit next hop weightsPetr Machata
Add tests that exercise full 16 bits of NH weight. Like in the previous patch, omit the 255:65535 test when KSFT_MACHINE_SLOW. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/a91d6ead9d1b1b4b7e276ca58a71ef814f42b7dd.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12selftests: router_mpath_nh: Test 16-bit next hop weightsPetr Machata
Add tests that exercise full 16 bits of NH weight. To test the 255:65535, it is necessary to run more packets than for the other tests. On a debug kernel, the test can take up to a minute, therefore avoid the test when KSFT_MACHINE_SLOW. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/c0c257c00ad30b07afc3fa5e2afd135925405544.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12selftests: router_mpath: Sleep after MZPetr Machata
In the context of an offloaded datapath, it may take a while for the ip link stats to be updated. This causes the test to fail when MZ_DELAY is too low. Sleep after the packets are sent for the link stats to get up to date. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/8b1971d948273afd7de2da3d6a2ba35200540e55.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12net: nexthop: Increase weight to u16Petr Machata
In CLOS networks, as link failures occur at various points in the network, ECMP weights of the involved nodes are adjusted to compensate. With high fan-out of the involved nodes, and overall high number of nodes, a (non-)ECMP weight ratio that we would like to configure does not fit into 8 bits. Instead of, say, 255:254, we might like to configure something like 1000:999. For these deployments, the 8-bit weight may not be enough. To that end, in this patch increase the next hop weight from u8 to u16. Increasing the width of an integral type can be tricky, because while the code still compiles, the types may not check out anymore, and numerical errors come up. To prevent this, the conversion was done in two steps. First the type was changed from u8 to a single-member structure, which invalidated all uses of the field. This allowed going through them one by one and audit for type correctness. Then the structure was replaced with a vanilla u16 again. This should ensure that no place was missed. The UAPI for configuring nexthop group members is that an attribute NHA_GROUP carries an array of struct nexthop_grp entries: struct nexthop_grp { __u32 id; /* nexthop id - must exist */ __u8 weight; /* weight of this nexthop */ __u8 resvd1; __u16 resvd2; }; The field resvd1 is currently validated and required to be zero. We can lift this requirement and carry high-order bits of the weight in the reserved field: struct nexthop_grp { __u32 id; /* nexthop id - must exist */ __u8 weight; /* weight of this nexthop */ __u8 weight_high; __u16 resvd2; }; Keeping the fields split this way was chosen in case an existing userspace makes assumptions about the width of the weight field, and to sidestep any endianness issues. The weight field is currently encoded as the weight value minus one, because weight of 0 is invalid. This same trick is impossible for the new weight_high field, because zero must mean actual zero. With this in place: - Old userspace is guaranteed to carry weight_high of 0, therefore configuring 8-bit weights as appropriate. When dumping nexthops with 16-bit weight, it would only show the lower 8 bits. But configuring such nexthops implies existence of userspace aware of the extension in the first place. - New userspace talking to an old kernel will work as long as it only attempts to configure 8-bit weights, where the high-order bits are zero. Old kernel will bounce attempts at configuring >8-bit weights. Renaming reserved fields as they are allocated for some purpose is commonly done in Linux. Whoever touches a reserved field is doing so at their own risk. nexthop_grp::resvd1 in particular is currently used by at least strace, however they carry an own copy of UAPI headers, and the conversion should be trivial. A helper is provided for decoding the weight out of the two fields. Forcing a conversion seems preferable to bending backwards and introducing anonymous unions or whatever. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/483e2fcf4beb0d9135d62e7d27b46fa2685479d4.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12net: nexthop: Add flag to assert that NHGRP reserved fields are zeroPetr Machata
There are many unpatched kernel versions out there that do not initialize the reserved fields of struct nexthop_grp. The issue with that is that if those fields were to be used for some end (i.e. stop being reserved), old kernels would still keep sending random data through the field, and a new userspace could not rely on the value. In this patch, use the existing NHA_OP_FLAGS, which is currently inbound only, to carry flags back to the userspace. Add a flag to indicate that the reserved fields in struct nexthop_grp are zeroed before dumping. This is reliant on the actual fix from commit 6d745cd0e972 ("net: nexthop: Initialize all fields in dumped nexthops"). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/21037748d4f9d8ff486151f4c09083bcf12d5df8.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12ipv6: eliminate ndisc_ops_is_useropt()Maciej Żenczykowski
as it doesn't seem to offer anything of value. There's only 1 trivial user: int lowpan_ndisc_is_useropt(u8 nd_opt_type) { return nd_opt_type == ND_OPT_6CO; } but there's no harm to always treating that as a useropt... Cc: David Ahern <dsahern@kernel.org> Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Link: https://patch.msgid.link/20240730003010.156977-1-maze@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12Merge branch 'eth-fbnic-add-basic-stats'Jakub Kicinski
Jakub Kicinski says: ==================== eth: fbnic: add basic stats Add basic interface stats to fbnic. v1: https://lore.kernel.org/20240807022631.1664327-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20240810054322.2766421-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12eth: fbnic: add support for basic qstatsStanislav Fomichev
Implement netdev_stat_ops and export the basic per-queue stats. This interface expect users to set the values that are used either to zero or to some other preserved value (they are 0xff by default). So here we export bytes/packets/drops from tx and rx_stats plus set some of the values that are exposed by queue stats to zero. $ cd tools/testing/selftests/drivers/net && ./stats.py [...] Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0 Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20240810054322.2766421-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12eth: fbnic: add basic rtnl statsJakub Kicinski
Count packets, bytes and drop on the datapath, and report to the user. Since queues are completely freed when the device is down - accumulate the stats in the main netdev struct. This means that per-queue stats will only report values since last reset (per qstat recommendation). Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240810054322.2766421-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12bpf: Fix updating attached freplace prog in prog_array mapLeon Hwang
The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed a NULL pointer dereference panic, but didn't fix the issue that fails to update attached freplace prog to prog_array map. Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog and its target prog are able to tail call each other. And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL after attaching freplace prog to its target prog. After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS. Then, after attaching freplace its prog->aux->dst_prog is NULL. Then, while updating freplace in prog_array the bpf_prog_map_compatible() incorrectly returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS and update to prog_array can succeed. Fixes: f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20240728114612.48486-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-12netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flagsDavid Howells
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used correctly. The problem is that we try to set them up in the request initialisation, but we the cache may be in the process of setting up still, and so the state may not be correct. Further, we secondarily sample the cache state and make contradictory decisions later. The issue arises because we set up the cache resources, which allows the cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which triggers cache writing even if we didn't set the flags when allocating. Fix this in the following way: (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in ->init_request() rather than trying to juggle that in netfs_alloc_request(). (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is to be done, then PG_private_2 is to be used rather than only setting it if we decide to cache and then having netfs_rreq_unlock_folios() set the non-PG_private_2 writeback-to-cache if it wasn't set. (3) Split netfs_rreq_unlock_folios() into two functions, one of which contains the deprecated code for using PG_private_2 to avoid accidentally doing the writeback path - and always use it if USE_PGPRIV2 is set. (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always wait for PG_private_2. This function is deprecated and only used by ceph anyway, and so label it so. (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use fscache_operation_valid() on the cache_resources instead. This has the advantage of picking up the result of netfs_begin_cache_read() and fscache_begin_write_operation() - which are called after the object is initialised and will wait for the cache to come to a usable state. Just reverting ae678317b95e[1] isn't a sufficient fix, so this need to be applied on top of that. Without this as well, things like: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { and: WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386 may happen, along with some UAFs due to PG_private_2 not getting used to wait on writeback completion. Fixes: 2ff1e97587f4 ("netfs: Replace PG_fscache by setting folio->private and marking dirty") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Hristo Venev <hristo@venev.name> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/1173209.1723152682@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a ↵David Howells
second writeback flag" This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a. Revert the patch that removes the deprecated use of PG_private_2 in netfslib for the moment as Ceph is actually still using this to track data copied to the cache. Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag") Reported-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12file: fix typo in take_fd() commentMathias Krause
The explanatory comment above take_fd() contains a typo, fix that to not confuse readers. Signed-off-by: Mathias Krause <minipli@grsecurity.net> Link: https://lore.kernel.org/r/20240809135035.748109-1-minipli@grsecurity.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12pidfd: prevent creation of pidfds for kthreadsChristian Brauner
It's currently possible to create pidfds for kthreads but it is unclear what that is supposed to mean. Until we have use-cases for it and we figured out what behavior we want block the creation of pidfds for kthreads. Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner Fixes: 32fcb426ec00 ("pid: add pidfd_open()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12netfs: clean up after renaming FSCACHE_DEBUG configLukas Bulwahn
Commit 6b8e61472529 ("netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG") renames the config, but introduces two issues: First, NETFS_DEBUG mistakenly depends on the non-existing config NETFS, whereas the actual intended config is called NETFS_SUPPORT. Second, the config renaming misses to adjust the documentation of the functionality of this config. Clean up those two points. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Link: https://lore.kernel.org/r/20240731073902.69262-1-lukas.bulwahn@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12libfs: fix infinite directory reads for offset diryangerkun
After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below). 1. create 5000 files(1 2 3...) under one dir 2. call readdir(man 3 readdir) once, and get one entry 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) 4. loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition) We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12nsfs: fix ioctl declarationChristian Brauner
The kernel is writing an object of type __u64, so the ioctl has to be defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5). Reported-by: Dmitry V. Levin <ldv@strace.io> Link: https://lore.kernel.org/r/20240730164554.GA18486@altlinux.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12fs/netfs/fscache_cookie: add missing "n_accesses" checkMax Kellermann
This fixes a NULL pointer dereference bug due to a data race which looks like this: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_rreq_write_to_cache_work RIP: 0010:cachefiles_prepare_write+0x30/0xa0 Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10 RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286 RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000 RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438 RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001 R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68 R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00 FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x15d/0x440 ? search_module_extables+0xe/0x40 ? fixup_exception+0x22/0x2f0 ? exc_page_fault+0x5f/0x100 ? asm_exc_page_fault+0x22/0x30 ? cachefiles_prepare_write+0x30/0xa0 netfs_rreq_write_to_cache_work+0x135/0x2e0 process_one_work+0x137/0x2c0 worker_thread+0x2e9/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: CR2: 0000000000000008 ---[ end trace 0000000000000000 ]--- This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(); this led to a fscache_cookie_lru_do_one() call, setting the FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by fscache_cookie_state_machine(), withdrawing the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv. At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line: struct cachefiles_object *object = cachefiles_cres_object(cres); The next line crashes, obviously: struct cachefiles_cache *cache = object->volume->cache; During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero. The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE"). This patch adds the missing check. With a non-zero access counter, the function returns and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD. Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12filelock: fix name of file_lease slab cacheOmar Sandoval
When struct file_lease was split out from struct file_lock, the name of the file_lock slab cache was copied to the new slab cache for file_lease. This name conflict causes confusion in /proc/slabinfo and /sys/kernel/slab. In particular, it caused failures in drgn's test case for slab cache merging. Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81 Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock") Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12netfs: Fault in smaller chunks for non-large folio mappingsMatthew Wilcox (Oracle)
As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large folio mappings"), we can see a performance loss for filesystems which have not yet been converted to large folios. Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240527201735.1898381-1-willy@infradead.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12Merge tag 'platform-drivers-x86-v6.11-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: "While the ideapad concurrency fix itself is relatively straightforward, it required moving code around and adding a bit of supporting infrastructure to have a clean inter-driver interface. This shows up in the diffstats. - ideapad-laptop / lenovo-ymc: Protect VPC calls with a mutex - amd/pmf: Query HPD data also when ALS is disabled" * tag 'platform-drivers-x86-v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: ideapad-laptop: add a mutex to synchronize VPC commands platform/x86: ideapad-laptop: move ymc_trigger_ec from lenovo-ymc platform/x86: ideapad-laptop: introduce a generic notification chain platform/x86/amd/pmf: Fix to Update HPD Data When ALS is Disabled
2024-08-12Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull fd bitmap fix from Al Viro: "Fix bitmap corruption on close_range() by cleaning up copy_fd_bitmaps()" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
2024-08-12Merge branch 'ethtool-rss-driver-tweaks'David S. Miller
Jakub Kicinski says: ==================== ethtool: rss: driver tweaks and netlink context dumps This series is a semi-related collection of RSS patches. Main point is supporting dumping RSS contexts via ethtool netlink. At present additional RSS contexts can be queried one by one, and assuming user know the right IDs. This series uses the XArray added by Ed to provide netlink dump support for ETHTOOL_GET_RSS. Patch 1 is a trivial selftest debug patch. Patch 2 coverts mvpp2 for no real reason other than that I had a grand plan of converting all drivers at some stage. Patch 3 removes a now moot check from mlx5 so that all tests can pass. Patch 4 and 5 make a bit used for context support optional, for easier grepping of drivers which need converting if nothing else. Patch 6 OTOH adds a new cap bit; some devices don't support using a different key per context and currently act in surprising ways. Patch 7 and 8 update the RSS netlink code to use XArray. Patch 9 and 10 add support for dumping contexts. Patch 11 and 12 are small adjustments to spec and a new test. I'm getting distracted with other work, so probably won't have the time soon to complete next steps, but things which are missing are (and some of these may be bad ideas): - better discovery Some sort of API to tell the user who many contexts the device can create. Upper bound, devices often share contexts between ports etc. so it's hard to tell exactly and upfront number of contexts for a netdev. But order of magnitude (4 vs 10s) may be enough for container management system to know whether to bother. - create/modify/delete via netlink The only question here is how to handle all the tricky IOCTL legacy. "No change" maps trivially to attribute not present. "reset" (indir_size = 0) probably needs to be a new NLA_FLAG? - better table size handling The current API assumes the LUT has fixed size, which isn't true for modern devices. We should have better APIs for the drivers to resize the tables, and in user facing API - the ability to specify pattern and min size rather than exact table expected (sort of like ethtool CLI already does). - recounted / socket-bound contexts Support for contexts which get "cleaned up" when their parent netlink socket gets closed. The major catch is that ntuple filters (which we don't currently track) depend on the context, so we need auto-removal for both. v5: - fix build v4: https://lore.kernel.org/20240809031827.2373341-1-kuba@kernel.org - adjust to the meaning of max context from net v3: https://lore.kernel.org/20240806193317.1491822-1-kuba@kernel.org - quite a few code comments and commit message changes - mvpp2: fix interpretation of max_context_id (I'll take care of the net -> net-next merge as needed) - filter by ifindex in the selftest v2: https://lore.kernel.org/20240803042624.970352-1-kuba@kernel.org - fix bugs and build in mvpp2 v1: https://lore.kernel.org/20240802001801.565176-1-kuba@kernel.org ==================== Signed-off-by: David S. Miller <davem@davemloft.net>