summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-05-20ublk: convert to refcount_tMing Lei
Convert to refcount_t and prepare for supporting to register bvec buffer automatically, which needs to initialize reference counter as 2, and kref doesn't provide this interface, so convert to refcount_t. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20wifi: ath9k_htc: Abort software beacon handling if disabledToke Høiland-Jørgensen
A malicious USB device can send a WMI_SWBA_EVENTID event from an ath9k_htc-managed device before beaconing has been enabled. This causes a device-by-zero error in the driver, leading to either a crash or an out of bounds read. Prevent this by aborting the handling in ath9k_htc_swba() if beacons are not enabled. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/r/88967.1743099372@localhost Fixes: 832f6a18fc2a ("ath9k_htc: Add beacon slots") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20250402112217.58533-1-toke@toke.dk Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-05-20loop: don't require ->write_iter for writable files in loop_configureChristoph Hellwig
Block devices can be opened read-write even if they can't be written to for historic reasons. Remove the check requiring file->f_op->write_iter when the block devices was opened in loop_configure. The call to loop_check_backing_file just below ensures the ->write_iter is present for backing files opened for writing, which is the only check that is actually needed. Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter") Reported-by: Christian Hesse <mail@eworm.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20wifi: ath12k: remove redundant regulatory rules intersection logic in hostAishwarya R
Whenever there is a change in the country code settings from the user, driver does an intersection of the regulatory rules for this new country with the original regulatory rules which were reported during initialization time. There is also similar logic running in firmware with a difference that the intersection in firmware is only done when the country code is configuration during boot up time (BDF/OTP). Firmware logic does not kick in when no country code is configured during device bring up time as the device is always expected to have the country code configured properly in the deployment. There is a debug/test use case that requires absolute regulatory rules to be used for a user configured country code when the device is not configured with a particular country code during boot up time. To support the above test use case, remove the redundant regulatory rules intersection logic in the host driver. Depend on the intersection logic in firmware when the device comes up with pre-configured country code. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00209-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Aishwarya R <quic_aisr@quicinc.com> Signed-off-by: Rajat Soni <quic_rajson@quicinc.com> Link: https://patch.msgid.link/20250505034351.1365914-1-quic_rajson@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-05-20wifi: ath12k: Send MCS15 support to firmware during peer assocMohan Kumar G
As per IEEE 802.11be-2024 - 9.4.2.321, EHT operation element contains MCS15 Disable subfield as the sixth bit, which is set when MCS15 support is not enabled. During association, firmware will use this MCS15 flag to enable or disable the reception of PPDU with EHT-MCS15 capability. Send MCS15 support to firmware through WMI command during peer assoc. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Co-developed-by: Dhanavandhana Kannan <quic_dhanavan1@quicinc.com> Signed-off-by: Dhanavandhana Kannan <quic_dhanavan1@quicinc.com> Signed-off-by: Mohan Kumar G <quic_mkumarg@quicinc.com> Link: https://patch.msgid.link/20250505153536.3275145-1-quic_mkumarg@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-05-20vfio/mlx5: Enable the DMA link APILeon Romanovsky
Remove intermediate scatter-gather table completely and enable new DMA link API. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/f71638d50c9c79a462f2e0423501b1de77617656.1747747694.git.leon@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20vfio/mlx5: Rewrite create mkey flow to allow better code reuseLeon Romanovsky
Change the creation of mkey to be performed in multiple steps: data allocation, DMA setup and actual call to HW to create that mkey. In this new flow, the whole input to MKEY command is saved to eliminate the need to keep array of pointers for DMA addresses for receive list and in the future patches for send list too. In addition to memory size reduce and elimination of unnecessary data movements to set MKEY input, the code is prepared for future reuse. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/d4ad0384fbd1e23a607cbbe9e5756748f3a761d9.1747747694.git.leon@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20vfio/mlx5: Explicitly use number of pages instead of allocated lengthLeon Romanovsky
allocated_length is a multiple of page size and number of pages, so let's change the functions to accept number of pages. This improves code readability, simplifies buffer handling, and enables combining DMA send/receive operations, as will be introduced in the next patches. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/76f39993d2ca0311b3bcfe56038a669d03926815.1747747694.git.leon@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20Merge branch 'dma-mapping-for-6.16-two-step-api' of ↵Alex Williamson
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux into v6.16/vfio/next Merge two step DMA mapping API as basis for mlx5-vfio-pci uses. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-05-20iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()Rob Clark
In situations where mapping/unmapping sequence can be controlled by userspace, attempting to map over a region that has not yet been unmapped is an error. But not something that should spam dmesg. Now that there is a quirk, we can also drop the selftest_running flag, and use the quirk instead for selftests. Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20250519175348.11924-6-robdclark@gmail.com [will: Rename quirk to IO_PGTABLE_QUIRK_NO_WARN per Robin's suggestion] Signed-off-by: Will Deacon <will@kernel.org>
2025-05-20octeontx2-pf: Add tracepoint for NIX_PARSE_SSubbaraya Sundeep
The NIX_PARSE_S structure populated by hardware in the NIX RX CQE has parsing information for the received packet. A tracepoint to dump the all words of NIX_PARSE_S is helpful in debugging packet parser. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/1747331048-15347-1-git-send-email-sbhatta@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-20net: phy: make mdio consumer / device layer a separate moduleHeiner Kallweit
After having factored out the provider part from mdio_bus.c, we can make the mdio consumer / device layer a separate module. This also allows to remove Kconfig symbol MDIO_DEVICE. The module init / exit functions from mdio_bus.c no longer have to be called from phy_device.c. The link order defined in drivers/net/phy/Makefile ensures that init / exit functions are called in the right order. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/dba6b156-5748-44ce-b5e2-e8dc2fcee5a7@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-20platform/x86: think-lmi: Fix attribute name usage for non-compliant itemsMark Pearson
A few, quite rare, WMI attributes have names that are not compatible with filenames, e.g. "Intel VT for Directed I/O (VT-d)". For these cases the '/' gets replaced with '\' for display, but doesn't get switched again when doing the WMI access. Fix this by keeping the original attribute name and using that for sending commands to the BIOS Fixes: a40cd7ef22fb ("platform/x86: think-lmi: Add WMI interface support on Lenovo platforms") Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250520005027.3840705-1-mpearson-lenovo@squebb.ca Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-20platform/x86: thinkpad_acpi: Ignore battery threshold change event notificationMark Pearson
If user modifies the battery charge threshold an ACPI event is generated. Confirmed with Lenovo FW team this is only generated on user event. As no action is needed, ignore the event and prevent spurious kernel logs. Reported-by: Derek Barbosa <debarbos@redhat.com> Closes: https://lore.kernel.org/platform-driver-x86/7e9a1c47-5d9c-4978-af20-3949d53fb5dc@app.fastmail.com/T/#m5f5b9ae31d3fbf30d7d9a9d76c15fb3502dfd903 Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20250517023348.2962591-1-mpearson-lenovo@squebb.ca Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-20spi: sh-msiof: Transfer size improvements and I2SMark Brown
Merge series from Geert Uytterhoeven <geert+renesas@glider.be>: This patch series (A) improves single transfer sizes in the MSIOF driver, using two methods: - By increasing the assumed FIFO sizes, impacting both PIO and DMA transfers, - By using two groups, impacting DMA transfers, and (B) lets the recently-introduced MSIOF I2S drive reuse the SPI driver's register definitions. All of this is covered with a thick sauce of fixes for (harmless) bugs, cleanups, and refactorings. Note that the driver uses the limitations as specified in the hardware documentation. For discovering the actual FIFO sizes, I wrote some crude test code that can be found at [2]. This is based on spi/for-next and sound-asoc/for-next, and has been tested on a variery of R-Car SoCs. [1] https://lore.kernel.org/cover.1746180072.git.geert+renesas@glider.be [2] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/log/?h=topic/msiof-fifo
2025-05-20Add sound card support for QCS9100 and QCS9075Mark Brown
Merge series from Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>: This patchset adds support for sound card on Qualcomm QCS9100 and QCS9075 boards.
2025-05-20regmap: Move selecting for REGMAP_MDIO and REGMAP_IRQAndrew Davis
If either REGMAP_IRQ or REGMAP_MDIO are set then REGMAP is also set. This then enables the selecting of IRQ_DOMAIN or MDIO_BUS from REGMAP based on the above two symbols respectively. This makes it very easy to end up with "circular dependencies". Instead select the IRQ_DOMAIN or MDIO_BUS from the symbols that make use of them. This is almost equivalent to before but makes it less likely to end up with false circular dependency detections. Signed-off-by: Andrew Davis <afd@ti.com> Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Closes: https://lore.kernel.org/r/bfe991fa-f54c-4d58-b2e0-34c4e4eb48f4@linaro.org/ Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20250516141722.13772-1-afd@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-20i2c: core: add useful info when defer probeXu Yang
Add an useful info when failed to get irq/wakeirq due to -EPROBE_DEFER. Before: [ 15.737361] i2c 2-0050: deferred probe pending: (reason unknown) After: [ 15.816295] i2c 2-0050: deferred probe pending: tcpci: can't get irq Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Reviewed-by: Carlos Song <carlos.song@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-05-20gpiolib: remove unneeded #ifdefBartosz Golaszewski
We are already within another `#ifdef CONFIG_GPIOLIB_IRQCHIP` in gpiochip_to_irq() so there's no need for another guard. Remove it. Acked-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250519-gpio-irq-kconfig-fixes-v1-3-fe6ba1c6116d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-20gpio: mpc8xxx: select GPIOLIB_IRQCHIPBartosz Golaszewski
This driver uses gpiochip_irq_reqres() and gpiochip_irq_relres() which are only built with GPIOLIB_IRQCHIP=y. Add the missing Kconfig select. Fixes: 7688a54d5b53 ("gpio: mpc8xxx: Make irq_chip immutable") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505180309.1nosQMkI-lkp@intel.com/ Acked-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250519-gpio-irq-kconfig-fixes-v1-2-fe6ba1c6116d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-20gpio: pxa: select GPIOLIB_IRQCHIPBartosz Golaszewski
This driver uses gpiochip_irq_reqres() and gpiochip_irq_relres() which are only built with GPIOLIB_IRQCHIP=y. Add the missing Kconfig select. Fixes: 20117cf426b6 ("gpio: pxa: Make irq_chip immutable") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505181429.mzyIatOU-lkp@intel.com/ Acked-by: Peng Fan <peng.fan@nxp.com> Link: https://lore.kernel.org/r/20250519-gpio-irq-kconfig-fixes-v1-1-fe6ba1c6116d@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-05-20cpufreq: scmi: Skip SCMI devices that aren't used by the CPUsMike Tipton
Currently, all SCMI devices with performance domains attempt to register a cpufreq driver, even if their performance domains aren't used to control the CPUs. The cpufreq framework only supports registering a single driver, so only the first device will succeed. And if that device isn't used for the CPUs, then cpufreq will scale the wrong domains. To avoid this, return early from scmi_cpufreq_probe() if the probing SCMI device isn't referenced by the CPU device phandles. This keeps the existing assumption that all CPUs are controlled by a single SCMI device. Signed-off-by: Mike Tipton <quic_mdtipton@quicinc.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20Merge branch 'rust/cpufreq-dt' into cpufreq/arm/linux-nextViresh Kumar
2025-05-20cpufreq: Add Rust-based cpufreq-dt driverViresh Kumar
Introduce a Rust-based implementation of the cpufreq-dt driver, covering most of the functionality provided by the existing C version. Some features, such as retrieving platform data from `cpufreq-dt-platdev.c`, are still pending. The driver has been tested with QEMU, and frequency scaling works as expected. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-05-20nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_diskNilay Shroff
In the NVMe context, the term "shutdown" has a specific technical meaning. To avoid confusion, this commit renames the nvme_mpath_ shutdown_disk function to nvme_mpath_remove_disk to better reflect its purpose (i.e. removing the disk from the system). However, nvme_mpath_remove_disk was already in use, and its functionality is related to releasing or putting the head node disk. To resolve this naming conflict and improve clarity, the existing nvme_mpath_ remove_disk function is also renamed to nvme_mpath_put_disk. This renaming improves code readability and better aligns function names with their actual roles. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme: introduce multipath_always_on module paramNilay Shroff
Currently, a multipath head disk node is not created for single- ported NVMe adapters or private namespaces with non-unique NSID. However, creating a head node in these cases can help transparently handle transient PCIe link failures. Without a head node, features like delayed removal cannot be leveraged, making it difficult to tolerate such link failures. To address this, this commit introduces nvme_core module parameter multipath_always_on. When multipath_always_on is set to true, it forces the creation of a multipath head node regardless NVMe disk or namespace type. So this option allows the use of delayed removal of head node functionality even for single-ported NVMe disks and private namespaces with a unique NSID and thus helps transparently handle transient PCIe link failures. By default multipath_always_on is set to false, thus preserving the existing behavior. Setting it to true enables improved fault tolerance in PCIe setups. Moreover, please note that enabling this option would also implicitly enable nvme_core.multipath. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-multipath: introduce delayed removal of the multipath head nodeNilay Shroff
Currently, the multipath head node of an NVMe disk is removed immediately as soon as all paths of the disk are removed. However, this can cause issues in scenarios where: - The disk hot-removal followed by re-addition. - Transient PCIe link failures that trigger re-enumeration, temporarily removing and then restoring the disk. In these cases, removing the head node prematurely may lead to a head disk node name change upon re-addition, requiring applications to reopen their handles if they were performing I/O during the failure. To address this, introduce a delayed removal mechanism of head disk node. During transient failure, instead of immediate removal of head disk node, the system waits for a configurable timeout, allowing the disk to recover. During transient disk failure, if application sends any IO then we queue it instead of failing such IO immediately. If the disk comes back online within the timeout, the queued IOs are resubmitted to the disk ensuring seamless operation. In case disk couldn't recover from the failure then queued IOs are failed to its completion and application receives the error. So this way, if disk comes back online within the configured period, the head node remains unchanged, ensuring uninterrupted workloads without requiring applications to reopen device handles. A new sysfs attribute, named "delayed_removal_secs" is added under head disk blkdev for user who wish to configure time for the delayed removal of head disk node. The default value of this attribute is set to zero second ensuring no behavior change unless explicitly configured. Link: https://lore.kernel.org/linux-nvme/Y9oGTKCFlOscbPc2@infradead.org/ Link: https://lore.kernel.org/linux-nvme/Y+1aKcQgbskA2tra@kbusch-mbp.dhcp.thefacebook.com/ Suggested-by: Keith Busch <kbusch@kernel.org> Suggested-by: Christoph Hellwig <hch@infradead.org> [nilay: reworked based on the original idea/POC from Christoph and Keith] Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-pci: derive and better document max segments limitsChristoph Hellwig
Redefine the max segments and max integrity limits based on the limiting factors. This keeps exactly the same values for 4k PAGE_SIZE systems, but increases the number of segments for larger page size as it properly derives the scatterlist allocation based limit for them instead of assuming a 4k PAGE_SIZE. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org>
2025-05-20nvme-pci: use struct_size for allocation struct nvme_devChristoph Hellwig
This avoids open coding the variable size array arithmetics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: add a symolic name for the small pool sizeLeon Romanovsky
Open coding magic numbers in multiple places is never a good idea. Signed-off-by: Leon Romanovsky <leon@kernel.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
2025-05-20nvme-pci: use a better encoding for small prp pool allocationsChristoph Hellwig
Add a separate flag to encode that the transfer is using the small page sized pool, and use a normal 0..n count for the number of descriptors. Contains improvements and suggestions from Kanchan Joshi <joshi.k@samsung.com> and Leon Romanovsky <leon@kernel.org>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: rename the descriptor poolsChristoph Hellwig
They are used for both PRPs and SGLs, and we use descriptor elsewhere when referring to their allocations, so use that name here as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: remove struct nvme_descriptorChristoph Hellwig
There is no real point in having a union of two pointer types here, just use a void pointer as we mix and match types between the arms of the union between the allocation and freeing side already. Also rename the nr_allocations field to nr_descriptors to better describe what it does. Signed-off-by: Christoph Hellwig <hch@lst.de> [leon: ported forward to include metadata SGL support] Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2025-05-20nvme-pci: store aborted state in flags variableLeon Romanovsky
Instead of keeping dedicated "bool aborted" variable, switch to a flags flags that can be used for other flags as well. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
2025-05-20nvme-pci: don't try to use SGLs for metadata on the admin queueChristoph Hellwig
No admin command defined in an NVMe specification supports metadata, but to protect against vendor specific commands using metadata ensure that we don't try to use SGLs for metadata on the admin queue, as NVMe does not support SGLs on the admin queue for the PCI transport. Do this by checking if the data transfer has been setup using SGLs as that is required for using SGLs for metadata. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2025-05-20nvme-pci: make PRP list DMA pools per-NUMA-nodeCaleb Sander Mateos
NVMe commands with over 8 KB of discontiguous data allocate PRP list pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool. Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool spinlock. These device-global spinlocks are a significant source of contention when many CPUs are submitting to the same NVMe devices. On a workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2 NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in _raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free. Ideally, the dma_pools would be per-hctx to minimize contention. But that could impose considerable resource costs in a system with many NVMe devices and CPUs. As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each nvme_queue to the set of DMA pools corresponding to its device and its hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by about half, to 1.2%. Preventing the sharing of PRP list pages across NUMA nodes also makes them cheaper to initialize. Link: https://lore.kernel.org/linux-nvme/CADUfDZqa=OOTtTTznXRDmBQo1WrFcDw1hBA7XwM7hzJ-hpckcA@mail.gmail.com/T/#u Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-pci: factor out a nvme_init_hctx_common() helperCaleb Sander Mateos
nvme_init_hctx() and nvme_admin_init_hctx() are very similar. In preparation for adding more logic, factor out a nvme_init_hctx-common() helper. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvme-fc: do not reference lsrsp after failureDaniel Wagner
The lsrsp object is maintained by the LLDD. The lifetime of the lsrsp object is implicit. Because there is no explicit cleanup/free call into the LLDD, it is not safe to assume after xml_rsp_fails, that the lsrsp is still valid. The LLDD could have freed the object already. With the recent changes how fcloop tracks the resources, this is the case. Thus don't access lsrsp after xml_rsp_fails. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: don't wait for lport cleanupDaniel Wagner
The lifetime of the fcloop_lsreq is not tight to the lifetime of the host or target port, thus there is no need anymore to synchronize the cleanup path anymore. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: add missing fcloop_callback_host_doneDaniel Wagner
Add the missing fcloop_call_host_done calls so that the caller frees resources when something goes wrong. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fc: take tgtport refs for portentryDaniel Wagner
Ensure that the tgtport is not going away as long portentry has a pointer on it. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fc: free pending reqs on tgtport unregisterDaniel Wagner
When nvmet_fc_unregister_targetport is called by the LLDD, it's not possible to communicate with the host, thus all pending request will not be process. Thus explicitly free them. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: drop response if targetport is goneDaniel Wagner
When the target port is gone, the lsrsp pointer is invalid. Thus don't call the done function anymore instead just drop the response. This happens when the target sends a disconnect association. After this the target starts tearing down all resources and doesn't expect any response. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: allocate/free fcloop_lsreq directlyDaniel Wagner
fcloop depends on the host or the target to allocate the fcloop_lsreq object. This means that the lifetime of the fcloop_lsreq is tied to either the host or the target. Consequently, the host or the target must cooperate during shutdown. Unfortunately, this approach does not work well when the target forces a shutdown, as there are dependencies that are difficult to resolve in a clean way. The simplest solution is to decouple the lifetime of the fcloop_lsreq object by managing them directly within fcloop. Since this is not a performance-critical path and only a small number of LS objects are used during setup and cleanup, it does not significantly impact performance to allocate them during normal operation. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: prevent double port deletionDaniel Wagner
The delete callback can be called either via the unregister function or from the transport directly. Thus it is necessary ensure resources are not freed multiple times. Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: access fcpreq only when holding reqlockDaniel Wagner
The abort handling logic expects that the state and the fcpreq are only accessed when holding the reqlock lock. While at it, only handle the aborts in the abort handler. Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: update refs on tfcp_reqDaniel Wagner
Track the lifetime of the in-flight tfcp_req to ensure the object is not freed too early. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: refactor fcloop_delete_local_portDaniel Wagner
Use the newly introduced fcloop_lport_lookup instead of the open coded version. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: refactor fcloop_nport_alloc and track lportDaniel Wagner
The checks for a valid input values are mixed with the logic to insert a newly allocated nport. Refactor the function so that first the checks are done. This allows to untangle the setup steps into a more linear form which reduces the complexity of the functions. Also start tracking lport when a lport is assigned to a nport. This ensures, that the lport is not going away as long it is still referenced by a nport. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-20nvmet-fcloop: remove nport from list on last userDaniel Wagner
The nport object has an association with the rport and lport object, that means we can only remove an nport object from the global nport_list after the last user of an rport or lport is gone. Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>