summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)Author
2024-07-10scsi: qla2xxx: Update version to 10.02.09.300-kNilesh Javali
Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-12-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Use QP lock to search for bsgQuinn Tran
On bsg timeout, hardware_lock is used as part of search for the srb. Instead, qpair lock should be used to iterate through different qpair. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-11-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Reduce fabric scan duplicate codeQuinn Tran
For fabric scan, current code uses switch scan opcode and flags as the method to iterate through different commands to carry out the process. This makes it hard to read. This patch convert those opcode and flags into steps. In addition, this help reduce some duplicate code. Consolidate routines that handle GPNFT & GNNFT. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Fix optrom version displayed in FDMIShreyas Deodhar
Bios version was popluated for FDMI response. Systems with EFI would show optrom version as 0. EFI version is populated here and BIOS version is already displayed under FDMI_HBA_BOOT_BIOS_NAME. Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-9-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: During vport delete send async logout explicitlyManish Rangankar
During vport delete, it is observed that during unload we hit a crash because of stale entries in outstanding command array. For all these stale I/O entries, eh_abort was issued and aborted (fast_fail_io = 2009h) but I/Os could not complete while vport delete is in process of deleting. BUG: kernel NULL pointer dereference, address: 000000000000001c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI Workqueue: qla2xxx_wq qla_do_work [qla2xxx] RIP: 0010:dma_direct_unmap_sg+0x51/0x1e0 RSP: 0018:ffffa1e1e150fc68 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000021 RCX: 0000000000000001 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff8ce208a7a0d0 RBP: ffff8ce208a7a0d0 R08: 0000000000000000 R09: ffff8ce378aac9c8 R10: ffff8ce378aac8a0 R11: ffffa1e1e150f9d8 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8ce378aac9c8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8d217f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001c CR3: 0000002089acc000 CR4: 0000000000350ee0 Call Trace: <TASK> qla2xxx_qpair_sp_free_dma+0x417/0x4e0 ? qla2xxx_qpair_sp_compl+0x10d/0x1a0 ? qla2x00_status_entry+0x768/0x2830 ? newidle_balance+0x2f0/0x430 ? dequeue_entity+0x100/0x3c0 ? qla24xx_process_response_queue+0x6a1/0x19e0 ? __schedule+0x2d5/0x1140 ? qla_do_work+0x47/0x60 ? process_one_work+0x267/0x440 ? process_one_work+0x440/0x440 ? worker_thread+0x2d/0x3d0 ? process_one_work+0x440/0x440 ? kthread+0x156/0x180 ? set_kthread_struct+0x50/0x50 ? ret_from_fork+0x22/0x30 </TASK> Send out async logout explicitly for all the ports during vport delete. Cc: stable@vger.kernel.org Signed-off-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-8-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Complete command early within lockShreyas Deodhar
A crash was observed while performing NPIV and FW reset, BUG: kernel NULL pointer dereference, address: 000000000000001c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 1 PREEMPT_RT SMP NOPTI RIP: 0010:dma_direct_unmap_sg+0x51/0x1e0 RSP: 0018:ffffc90026f47b88 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000021 RCX: 0000000000000002 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff8881041130d0 RBP: ffff8881041130d0 R08: 0000000000000000 R09: 0000000000000034 R10: ffffc90026f47c48 R11: 0000000000000031 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8881565e4a20 R15: 0000000000000000 FS: 00007f4c69ed3d00(0000) GS:ffff889faac80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000001c CR3: 0000000288a50002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x1a/0x60 ? page_fault_oops+0x16f/0x4a0 ? do_user_addr_fault+0x174/0x7f0 ? exc_page_fault+0x69/0x1a0 ? asm_exc_page_fault+0x22/0x30 ? dma_direct_unmap_sg+0x51/0x1e0 ? preempt_count_sub+0x96/0xe0 qla2xxx_qpair_sp_free_dma+0x29f/0x3b0 [qla2xxx] qla2xxx_qpair_sp_compl+0x60/0x80 [qla2xxx] __qla2x00_abort_all_cmds+0xa2/0x450 [qla2xxx] The command completion was done early while aborting the commands in driver unload path but outside lock to avoid the WARN_ON condition of performing dma_free_attr within the lock. However this caused race condition while command completion via multiple paths causing system crash. Hence complete the command early in unload path but within the lock to avoid race condition. Fixes: 0367076b0817 ("scsi: qla2xxx: Perform lockless command completion in abort path") Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-7-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Fix flash read failureQuinn Tran
Link up failure is observed as a result of flash read failure. Current code does not check flash read return code where it relies on FW checksum to detect the problem. Add check of flash read failure to detect the problem sooner. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/202406210815.rPDRDMBi-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Return ENOBUFS if sg_cnt is more than one for ELS cmdsSaurav Kashyap
Firmware only supports single DSDs in ELS Pass-through IOCB (0x53h), sg cnt is decided by the SCSI ML. User is not aware of the cause of an acutal error. Return the appropriate return code that will be decoded by API and application and proper error message will be displayed to user. Fixes: 6e98016ca077 ("[SCSI] qla2xxx: Re-organized BSG interface specific code.") Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-5-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Fix for possible memory corruptionShreyas Deodhar
Init Control Block is dereferenced incorrectly. Correctly dereference ICB Cc: stable@vger.kernel.org Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-4-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: validate nvme_local_port correctlyNilesh Javali
The driver load failed with error message, qla2xxx [0000:04:00.0]-ffff:0: register_localport failed: ret=ffffffef and with a kernel crash, BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 Workqueue: events_unbound qla_register_fcport_fn [qla2xxx] RIP: 0010:nvme_fc_register_remoteport+0x16/0x430 [nvme_fc] RSP: 0018:ffffaaa040eb3d98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9dfb46b78c00 RCX: 0000000000000000 RDX: ffff9dfb46b78da8 RSI: ffffaaa040eb3e08 RDI: 0000000000000000 RBP: ffff9dfb612a0a58 R08: ffffffffaf1d6270 R09: 3a34303a30303030 R10: 34303a303030305b R11: 2078787832616c71 R12: ffff9dfb46b78dd4 R13: ffff9dfb46b78c24 R14: ffff9dfb41525300 R15: ffff9dfb46b78da8 FS: 0000000000000000(0000) GS:ffff9dfc67c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000070 CR3: 000000018da10004 CR4: 00000000000206f0 Call Trace: qla_nvme_register_remote+0xeb/0x1f0 [qla2xxx] ? qla2x00_dfs_create_rport+0x231/0x270 [qla2xxx] qla2x00_update_fcport+0x2a1/0x3c0 [qla2xxx] qla_register_fcport_fn+0x54/0xc0 [qla2xxx] Exit the qla_nvme_register_remote() function when qla_nvme_register_hba() fails and correctly validate nvme_local_port. Cc: stable@vger.kernel.org Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-3-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10scsi: qla2xxx: Unable to act on RSCN for port onlineQuinn Tran
The device does not come online when the target port is online. There were multiple RSCNs indicating multiple devices were affected. Driver is in the process of finishing a fabric scan. A new RSCN (device up) arrived at the tail end of the last fabric scan. Driver mistakenly thinks the new RSCN is being taken care of by the previous fabric scan, where this notification is cleared and not acted on. The laser needs to be blinked again to get the device to show up. To prevent driver from accidentally clearing the RSCN notification, each RSCN is given a generation value. A fabric scan will scan for that generation(s). Any new RSCN arrive after the scan start will have a new generation value. This will trigger another scan to get latest data. The RSCN notification flag will be cleared when the scan is associate to that generation. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406210538.w875N70K-lkp@intel.com/ Fixes: bb2ca6b3f09a ("scsi: qla2xxx: Relogin during fabric disturbance") Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240710171057.35066-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10Merge branch '6.10/scsi-fixes' into 6.11/scsi-stagingMartin K. Petersen
Pull in my fixes branch to resolve an mpi3mr merge conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-10Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "One core change that moves a disk start message to a location where it will only be printed once instead of twice plus a couple of error handling race fixes in the ufs driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Do not repeat the starting disk message scsi: ufs: core: Fix ufshcd_abort_one racing issue scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
2024-07-08Merge tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme into ↵Jens Axboe
for-6.11/block Pull NVMe updates from Keith: "nvme updates for Linux 6.11 - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng)" * tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme: (21 commits) nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id nvme-multipath: implement "queue-depth" iopolicy nvme-multipath: prepare for "queue-depth" iopolicy nvme-pci: do not directly handle subsys reset fallout lpfc_nvmet: implement 'host_traddr' nvme-fcloop: implement 'host_traddr' nvmet-fc: implement host_traddr() nvmet-rdma: implement host_traddr() nvmet-tcp: implement host_traddr() nvmet: add 'host_traddr' callback for debugfs nvmet: add debugfs support mailmap: add entry for Weiwen Hu nvme: rename CDR/MORE/DNR to NVME_STATUS_* nvme: fix status magic numbers nvme: rename nvme_sc_to_pr_err to nvme_status_to_pr_err nvme: split device add from initialization nvme: fc: split controller bringup handling nvme: rdma: split controller bringup handling nvme: tcp: split controller bringup handling ...
2024-07-05block: Remove REQ_OP_ZONE_RESET_ALL emulationDamien Le Moal
Now that device mapper can handle resetting all zones of a mapped zoned device using REQ_OP_ZONE_RESET_ALL, all zoned block device drivers support this operation. With this, the request queue feature BLK_FEAT_ZONE_RESETALL is not necessary and the emulation code in blk-zone.c can be removed. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240704052816.623865-5-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-04Merge patch series "mpi3mr: Support PCI Error Recovery"Martin K. Petersen
Sumit Saxena <sumit.saxena@broadcom.com> says: This patch series contains the changes done in the driver to support PCI error recovery. It is rework of older patch series from Ranjan Kumar, see [1]. [1] https://lore.kernel.org/all/20231214205900.270488-1-ranjan.kumar@broadcom.com/ Link: https://lore.kernel.org/r/20240627101735.18286-1-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Driver version updateSumit Saxena
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240627101735.18286-4-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Prevent PCI writes from driver during PCI error recoverySumit Saxena
Prevent interaction with the hardware while the error recovery in progress. Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Co-developed-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240627101735.18286-3-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Support PCI Error Recovery callback handlersSumit Saxena
PCI Error recovery support is required to recover the controller upon detection of PCI errors. Add support for the PCI error recovery callback handlers in mpi3mr driver. Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Co-developed-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20240627101735.18286-2-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04Merge patch series "Update lpfc to revision 14.4.0.3"Martin K. Petersen
Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.4.0.3 This patch set contains bug fixes related to discovery, submission of mailbox commands, and proper endianness conversions. The patches were cut against Martin's 6.11/scsi-queue tree. Link: https://lore.kernel.org/r/20240628172011.25921-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Update lpfc version to 14.4.0.3Justin Tee
Update lpfc version to 14.4.0.3. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Revise lpfc_prep_embed_io routine with proper endian macro usagesJustin Tee
On big endian architectures, it is possible to run into a memory out of bounds pointer dereference when FCP targets are zoned. In lpfc_prep_embed_io, the memcpy(ptr, fcp_cmnd, sgl->sge_len) is referencing a little endian formatted sgl->sge_len value. So, the memcpy can cause big endian systems to crash. Redefine the *sgl ptr as a struct sli4_sge_le to make it clear that we are referring to a little endian formatted data structure. And, update the routine with proper le32_to_cpu macro usages. Fixes: af20bb73ac25 ("scsi: lpfc: Add support for 32 byte CDBs") Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Fix incorrect request len mbox field when setting trunking via sysfsJustin Tee
When setting trunk modes through sysfs, the SLI_CONFIG mailbox command's command payload length is incorrectly hardcoded to 12 bytes. SLI_CONFIG's payload length field should be specified large enough to encompass both the submailbox command header and the submailbox request itself. Thus, replace the hardcoded 12 bytes with a clearer calculation by way of sizeof(struct lpfc_mbx_set_trunk_mode) - sizeof(struct lpfc_sli4_cfg_mhdr). Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Handle mailbox timeouts in lpfc_get_sfp_infoJustin Tee
The MBX_TIMEOUT return code is not handled in lpfc_get_sfp_info and the routine unconditionally frees submitted mailbox commands regardless of return status. The issue is that for MBX_TIMEOUT cases, when firmware returns SFP information at a later time, that same mailbox memory region references previously freed memory in its cmpl routine. Fix by adding checks for the MBX_TIMEOUT return code. During mailbox resource cleanup, check the mbox flag to make sure that the wait did not timeout. If the MBOX_WAKE flag is not set, then do not free the resources because it will be freed when firmware completes the mailbox at a later time in its cmpl routine. Also, increase the timeout from 30 to 60 seconds to accommodate boot scripts requiring longer timeouts. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Fix handling of fully recovered fabric node in dev_loss callbkJustin Tee
In rare cases when a fabric node is recovered after a link bounce and before dev_loss_tmo callbk is reached, the driver may leave the fabric node in an inconsistent state with the NLP_IN_DEV_LOSS flag perpetually set. In lpfc_dev_loss_tmo_callbk, a check is added for a recovered fabric node. If the node is recovered, then don't queue the lpfc_dev_loss_tmo_handler work. In lpfc_dev_loss_tmo_handler, the path taken for the recovered fabric nodes is updated to clear the NLP_IN_DEV_LOSS flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Relax PRLI issue conditions after GID_FT responseJustin Tee
If previously in REG_LOGIN_ISSUE state, then remove the requirement that PLOGI must have been received from the remote port before issuing a PRLI. After GID_FT completes, it does not matter whether the driver itself sent a PLOGI or received one. The fact that we're in REG_LOGIN_ISSUE state simply means that the next state should be issuing the PRLI to continue discovery of the remote port. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Allow DEVICE_RECOVERY mode after RSCN receipt if in PRLI_ISSUE stateJustin Tee
Certain vendor specific targets initially register with the fabric as an initiator function first and then re-register as a target function afterwards. The timing of the target function re-registration can cause a race condition such that the driver is stuck assuming the remote port as an initiator function and never discovers the target's hosted LUNs. Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because the state means that PRLI was issued but we have not quite reached MAPPED_NODE state yet. If we received an RSCN in the PRLI_ISSUE state, then we should restart discovery again by going into DEVICE_RECOVERY. Fixes: dded1dc31aa4 ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN") Cc: <stable@vger.kernel.org> # v6.6+ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: lpfc: Cancel ELS WQE instead of issuing abort when SLI port is inactiveJustin Tee
During SLI port errata events, there should be no expectation that submitted outstanding WQEs will return back CQEs. In these situations, the driver should not rely on receiving CQEs from the SLI port to signal WQE resource clean up. Put an sli_flag LPFC_SLI_ACTIVE check in lpfc_els_flush_cmd() when walking the txcmplq. The sli_flag check helps determine whether to issue an abort or driver based cancel on outstanding WQEs. If !LPFC_SLI_ACTIVE, then there's no point to issue anything to the SLI port. Instead, let the driver based cancel logic clean up the submitted WQE resources. Also, enhance some abort log messages that help with future debugging. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: sd: Do not repeat the starting disk messageDamien Le Moal
The SCSI disk message "Starting disk" to signal resuming of a suspended disk is printed in both sd_resume() and sd_resume_common() which results in this message being printed twice when resuming from e.g. autosuspend: $ echo 5000 > /sys/block/sda/device/power/autosuspend_delay_ms $ echo auto > /sys/block/sda/device/power/control [ 4962.438293] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 4962.501121] sd 0:0:0:0: [sda] Stopping disk $ echo on > /sys/block/sda/device/power/control [ 4972.805851] sd 0:0:0:0: [sda] Starting disk [ 4980.558806] sd 0:0:0:0: [sda] Starting disk Fix this double print by removing the call to sd_printk() from sd_resume() and moving the call to sd_printk() in sd_resume_common() earlier in the function, before the check using sd_do_start_stop(). Doing so, the message is printed once regardless if sd_resume_common() actually executes sd_start_stop_device() (i.e. SCSI device case) or not (libsas and libata managed ATA devices case). Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240701215326.128067-1-dlemoal@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: pm8001: Update log level when reading config tableTerrence Adams
Reading the main config table occurs as a part of initialization in pm80xx_chip_init(). Because of this it makes more sense to have it be a part of the INIT logging. Signed-off-by: Terrence Adams <tadamsjr@google.com> Link: https://lore.kernel.org/r/20240627155924.2361370-3-tadamsjr@google.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: pm80xx: Set phy->enable_completion only when we wait for itIgor Pylypiv
pm8001_phy_control() populates the enable_completion pointer with a stack address, sends a PHY_LINK_RESET / PHY_HARD_RESET, waits 300 ms, and returns. The problem arises when a phy control response comes late. After 300 ms the pm8001_phy_control() function returns and the passed enable_completion stack address is no longer valid. Late phy control response invokes complete() on a dangling enable_completion pointer which leads to a kernel crash. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Terrence Adams <tadamsjr@google.com> Link: https://lore.kernel.org/r/20240627155924.2361370-2-tadamsjr@google.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04scsi: mpi3mr: Correct a test in mpi3mr_sas_port_add()Tomas Henzl
The test for a possible shift overflow is not correct. Fix it by replacing the '>' with a '>='. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://lore.kernel.org/r/20240627074827.13672-1-thenzl@redhat.com Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-04ata,scsi: Remove wrapper ata_sas_port_alloc()Niklas Cassel
The ata_sas_port_alloc() wrapper mainly exists in order to export the internal libata function which it wraps. The secondary reason is that it initializes some ata_port struct members. However, ata_sas_port_alloc() is only used in a single location, sas_ata_init(), which already performs some ata_port struct member initialization, so it does not make sense to spread this initialization out over two separate locations. Thus, remove the wrapper and instead export the libata function directly, and move the libsas specific ata_port initialization to sas_ata_init(), which already does some ata_port initialization. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240703184418.723066-19-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-04ata,scsi: Remove wrappers ata_sas_tport_{add,delete}()Niklas Cassel
The ata_sas_tport_add() and ata_sas_tport_delete() wrappers only exist in order to export the internal libata functions which they wrap. Remove the wrappers and instead export the libata functions directly. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240703184418.723066-12-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-07-03block: split integrity support out of bio.hChristoph Hellwig
Split struct bio_integrity_payload and the related prototypes out of bio.h into a separate bio-integrity.h header so that it is only pulled in by the few places that need it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240702151047.1746127-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-03Merge tag 'v6.10-rc6' into for-6.11/block-postJens Axboe
Pull in v6.10-rc6 to resolve a conflict for the integrity cleanups. * tag 'v6.10-rc6': (778 commits) Linux 6.10-rc6 ata: ahci: Clean up sysfs file on error ata: libata-core: Fix double free on error ata,scsi: libata-core: Do not leak memory for ata_port struct members ata: libata-core: Fix null pointer dereference on error x86-32: fix cmpxchg8b_emu build error with clang x86: stop playing stack games in profile_pc() i2c: testunit: discard write requests while old command is running i2c: testunit: don't erase registers after STOP tty: mxser: Remove __counted_by from mxser_board.ports[] randomize_kstack: Remove non-functional per-arch entropy filtering string: kunit: add missing MODULE_DESCRIPTION() macros ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models MAINTAINERS: Update IOMMU tree location tools/power turbostat: Add local build_bug.h header for snapshot target tools/power turbostat: Fix unc freq columns not showing with '-q' or '-l' tools/power turbostat: option '-n' is ambiguous drm/drm_file: Fix pid refcounting race kallsyms: rework symbol lookup return codes gpiolib: cdev: Ignore reconfiguration without direction ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-03parport: Remove parport_driver.devmodelDr. David Alan Gilbert
'devmodel' hasn't actually been used since: 'commit 3275158fa52a ("parport: remove use of devmodel")' and everyone now has it set to true and has been fixed up; remove the flag. (There are still comments all over about it) Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/r/20240502154823.67235-4-linux@treblig.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03driver core: have match() callback in struct bus_type take a const *Greg Kroah-Hartman
In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const *. This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant *. For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-01Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "A couple of error leg problems, one affecting scsi_debug and the other affecting pure SAS (i.e. not SATA) SCSI expanders" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed scsi: scsi_debug: Fix create target debugfs failure
2024-06-30Merge tag 'ata-6.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Niklas Cassel: - Add NOLPM quirk for for all Crucial BX SSD1 models. Considering that we now have had bug reports for 3 different BX SSD1 variants from Crucial with the same product name, make the quirk more inclusive, to catch more device models from the same generation. - Fix a trivial NULL pointer dereference in the error path for ata_host_release(). - Create a ata_port_free(), so that we don't miss freeing ata_port struct members when freeing a struct ata_port. - Fix a trivial double free in the error path for ata_host_alloc(). - Ensure that we remove the libata "remapped NVMe device count" sysfs entry on .probe() error. * tag 'ata-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: ahci: Clean up sysfs file on error ata: libata-core: Fix double free on error ata,scsi: libata-core: Do not leak memory for ata_port struct members ata: libata-core: Fix null pointer dereference on error ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models
2024-06-30ata,scsi: libata-core: Do not leak memory for ata_port struct membersNiklas Cassel
libsas is currently not freeing all the struct ata_port struct members, e.g. ncq_sense_buf for a driver supporting Command Duration Limits (CDL). Add a function, ata_port_free(), that is used to free a ata_port, including its struct members. It makes sense to keep the code related to freeing a ata_port in its own function, which will also free all the struct members of struct ata_port. Fixes: 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD") Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240629124210.181537-8-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-06-28kernel-wide: fix spelling mistakes like "assocative" -> "associative"Jesse Brandeburg
There were several instances of the string "assocat" in the kernel, which should have been spelled "associat", with the various endings of -ive, -ed, -ion, and sometimes beginnging with dis-. Add to the spelling dictionary the corrections so that future instances will be caught by checkpatch, and fix the instances found. Originally noticed by accident with a 'git grep socat'. Link: https://lkml.kernel.org/r/20240612001247.356867-1-jesse.brandeburg@intel.com Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-28mpt3sas_scsih: don't set QUEUE_FLAG_NOMERGESChristoph Hellwig
Setting QUEUE_FLAG_NOMERGES was added in commit d1b01d14b7ba ("scsi: mpt3sas: Set NVMe device queue depth as 128") without any explanation. Drivers should second guess the block layer merge decisions, so remove the flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-28megaraid_sas: don't set QUEUE_FLAG_NOMERGESChristoph Hellwig
Setting QUEUE_FLAG_NOMERGES was added in commit 15dd03811d99dcf ("scsi: megaraid_sas: NVME Interface detection and prop settings") without any explanation. Drivers should second guess the block layer merge decisions, so remove the flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240627124926.512662-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-26Merge patch series "mpi3mr: Host diag buffer support"Martin K. Petersen
Ranjan Kumar <ranjan.kumar@broadcom.com> says: The controllers managed by mpi3mr driver requires system memory to save hardware and firmware diagnostic information, this patch set enhances the drivers to provide host memory to the controller for diagnostic information. This patch set also provides driver changes to push kernel messages into the diagnostic buffers reserved for the driver, so that the information will be available as part of debug data fetched from the controller. In addition, support for configuring automatic diagnostic information is added in the driver. Link: https://lore.kernel.org/r/20240626102646.14298-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: Update driver version to 8.9.1.0.50Ranjan Kumar
Update driver version to 8.9.1.0.50 Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-5-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: Add ioctl support for HDBRanjan Kumar
Add interface for applications to manage the host diagnostic buffers and update the automatic diag buffer capture triggers. Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-4-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: Trigger supportRanjan Kumar
Add functions to process automatic diag triggers. If a condition defined in the triggers is met, the driver will call appropriate controller functions to save the diagnostic information. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405151955.BiAWI1SY-lkp@intel.com/ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: mpi3mr: HDB allocation and posting for hardware and firmware buffersRanjan Kumar
To be able to debug controller problems it is beneficial to allocate and configure system/host memory buffers which can be used to capture hardware and firmware diagnostic information. Add functions required to allocate and post firmware and hardware diagnostic buffers to the controller and to set up automatic diagnostic capture triggers. Captures will be triggered under the following circumstances: 1. Firmware is in FAULT state. 2. Admin commands time out. 3. Controller reset caused due to I/O timeout Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405151758.7xrJz6rp-lkp@intel.com/ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240626102646.14298-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-06-26scsi: lpfc: Fix a possible null pointer dereferenceHuai-Yuan Liu
In function lpfc_xcvr_data_show, the memory allocation with kmalloc might fail, thereby making rdp_context a null pointer. In the following context and functions that use this pointer, there are dereferencing operations, leading to null pointer dereference. To fix this issue, a null pointer check should be added. If it is null, use scnprintf to notify the user and return len. Fixes: 479b0917e447 ("scsi: lpfc: Create a sysfs entry called lpfc_xcvr_data for transceiver info") Signed-off-by: Huai-Yuan Liu <qq810974084@gmail.com> Link: https://lore.kernel.org/r/20240621082545.449170-1-qq810974084@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>