summaryrefslogtreecommitdiff
path: root/drivers/pci/pcie
AgeCommit message (Collapse)Author
2025-06-23PCI/PTM: Build debugfs code only if CONFIG_DEBUG_FS is enabledManivannan Sadhasivam
Otherwise, the following build error will happen for CONFIG_DEBUG_FS=n && CONFIG_PCIE_PTM=y: drivers/pci/pcie/ptm.c:498:25: error: redefinition of 'pcie_ptm_create_debugfs' 498 | struct pci_ptm_debugfs *pcie_ptm_create_debugfs(struct device *dev, void *pdata, | ^ ./include/linux/pci.h:1915:2: note: previous definition is here 1915 | *pcie_ptm_create_debugfs(struct device *dev, void *pdata, | ^ drivers/pci/pcie/ptm.c:546:6: error: redefinition of 'pcie_ptm_destroy_debugfs' 546 | void pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs) | ^ ./include/linux/pci.h:1918:1: note: previous definition is here 1918 | pcie_ptm_destroy_debugfs(struct pci_ptm_debugfs *ptm_debugfs) { } | Fixes: 132833405e61 ("PCI: Add debugfs support for exposing PTM context") Reported-by: Eric Biggers <ebiggers@kernel.org> Closes: https://lore.kernel.org/linux-pci/20250607025506.GA16607@sol Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250608033305.15214-1-manivannan.sadhasivam@linaro.org
2025-06-04Merge branch 'pci/ptm-debugfs'Bjorn Helgaas
- Add debugfs support for exposing DWC device-specific PTM context (Manivannan Sadhasivam) * pci/ptm-debugfs: PCI: qcom-ep: Mask PTM_UPDATING interrupt PCI: dwc: Add debugfs support for PTM context PCI: dwc: Pass DWC PCIe mode to dwc_pcie_debugfs_init() PCI: Add debugfs support for exposing PTM context
2025-06-04Merge branch 'pci/bwctrl'Bjorn Helgaas
- Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen) - Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen) * pci/bwctrl: PCI: Update Link Speed after retraining PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag
2025-05-30PCI/ERR: Remove misleading TODO regarding kernel panicManivannan Sadhasivam
A PCI device is just another peripheral in a system. So failure to recover it, must not result in a kernel panic. So remove the TODO which is quite misleading. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Link: https://patch.msgid.link/20250508-pcie-reset-slot-v4-1-7050093e2b50@linaro.org
2025-05-23PCI/AER: Add sysfs attributes for log ratelimitsJon Pan-Doh
Allow userspace to read/write log ratelimits per device (including enable/disable). Create aer/ sysfs directory to store them and any future AER configs. The new sysfs files are: /sys/bus/pci/devices/*/aer/correctable_ratelimit_burst /sys/bus/pci/devices/*/aer/correctable_ratelimit_interval_ms /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_burst /sys/bus/pci/devices/*/aer/nonfatal_ratelimit_interval_ms The default values are ratelimit_burst=10, ratelimit_interval_ms=5000, so if we try to emit more than 10 messages in a 5 second period, some are suppressed. Update AER sysfs ABI filename to reflect the broader scope of AER sysfs attributes (e.g. stats and ratelimits). Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats -> sysfs-bus-pci-devices-aer Tested using aer-inject[1]. Configured correctable log ratelimit to 5. Sent 6 AER errors. Observed 5 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) shows 6. Disabled ratelimiting and sent 6 more AER errors. Observed all 6 errors logged and accounted in AER stats (12 total errors). [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git [bhelgaas: note fatal errors are not ratelimited, "aer_report" -> "aer_info", replace ratelimit_log_enable toggle with *_ratelimit_interval_ms] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-21-helgaas@kernel.org
2025-05-23PCI/AER: Ratelimit correctable and non-fatal error loggingJon Pan-Doh
Spammy devices can flood kernel logs with AER errors and slow/stall execution. Add per-device ratelimits for AER correctable and non-fatal uncorrectable errors that use the kernel defaults (10 per 5s). Logging of fatal errors is not ratelimited. There are two AER logging entry points: - aer_print_error() is used by DPC and native AER - pci_print_aer() is used by GHES and CXL The native AER aer_print_error() case includes a loop that may log details from multiple devices, which are ratelimited individually. If we log details for any device, we also log the Error Source ID from the Root Port or RCEC. If no such device details are found, we still log the Error Source from the ERR_* Message, ratelimited by the Root Port or RCEC that received it. The DPC aer_print_error() case is not ratelimited, since this only happens for fatal errors. The CXL pci_print_aer() case is ratelimited by the Error Source device. The GHES pci_print_aer() case is via aer_recover_work_func(), which searches for the Error Source device. If the device is not found, there's no per-device ratelimit, so we use a system-wide ratelimit that covers all error types (correctable, non-fatal, and fatal). Sargun at Meta reported internally that a flood of AER errors causes RCU CPU stall warnings and CSD-lock warnings. Tested using aer-inject[1]. Sent 11 AER errors. Observed 10 errors logged while AER stats (cat /sys/bus/pci/devices/<dev>/aer_dev_correctable) show true count of 11. [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git [bhelgaas: commit log, factor out trace_aer_event() and aer_print_rp_info() changes to previous patches, enable Error Source logging if any downstream detail will be printed, don't ratelimit fatal errors, "aer_report" -> "aer_info", "cor_log_ratelimit" -> "correctable_ratelimit", "uncor_log_ratelimit" -> "nonfatal_ratelimit"] Reported-by: Sargun Dhillon <sargun@meta.com> Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-19-helgaas@kernel.org
2025-05-23PCI/AER: Simplify add_error_device()Bjorn Helgaas
Return -ENOSPC error early so the usual path through add_error_device() is the straightline code. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-18-helgaas@kernel.org
2025-05-23PCI/AER: Convert aer_get_device_error_info(), aer_print_error() to indexBjorn Helgaas
Previously aer_get_device_error_info() and aer_print_error() took a pointer to struct aer_err_info and a pointer to a pci_dev. Typically the pci_dev was one of the elements of the aer_err_info.dev[] array (DPC was an exception, where the dev[] array was unused). Convert aer_get_device_error_info() and aer_print_error() to take an index into the aer_err_info.dev[] array instead. A future patch will add per-device ratelimit information, so the index makes it convenient to find the ratelimit associated with the device. To accommodate DPC, set info->dev[0] to the DPC port before using these interfaces. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-17-helgaas@kernel.org
2025-05-23PCI/AER: Rename struct aer_stats to aer_infoKarolina Stolarek
Update name to reflect the broader definition of structs/variables that are stored (e.g. ratelimits). This is a preparatory patch for adding rate limit support. [bhelgaas: "aer_report" -> "aer_info"] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-16-helgaas@kernel.org
2025-05-23PCI/AER: Reduce pci_print_aer() correctable error level to KERN_WARNINGKarolina Stolarek
Some existing logs in pci_print_aer() log with error severity by default. Convert them to use KERN_WARNING for correctable errors and KERN_ERR for uncorrectable errors. [bhelgaas: commit log] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-15-helgaas@kernel.org
2025-05-23PCI/ERR: Add printk level to pcie_print_tlp_log()Bjorn Helgaas
aer_print_error() produces output at a printk level (KERN_ERR/KERN_WARNING/ etc) that depends on the kind of error, and it calls pcie_print_tlp_log(), which previously always produced output at KERN_ERR. Add a "level" parameter so aer_print_error() can control the level of the pcie_print_tlp_log() output to match. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-14-helgaas@kernel.org
2025-05-23PCI/AER: Check log level once and remember itKarolina Stolarek
When reporting an AER error, we check its type multiple times to determine the log level for each message. Do this check only in the top-level functions (aer_isr_one_error(), pci_print_aer()) and save the level in struct aer_err_info. [bhelgaas: save log level in struct aer_err_info instead of passing it as a parameter] Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-13-helgaas@kernel.org
2025-05-23PCI/AER: Trace error event before ratelimitingBjorn Helgaas
As with the AER statistics, we always want to emit trace events, even if the actual dmesg logging is rate limited. Call trace_aer_event() immediately after pci_dev_aer_stats_incr() so both happen before ratelimiting. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-12-helgaas@kernel.org
2025-05-23PCI/AER: Update statistics before ratelimitingBjorn Helgaas
There are two AER logging entry points: - aer_print_error() is used by DPC (dpc_process_error()) and native AER handling (aer_process_err_devices()). - pci_print_aer() is used by GHES (aer_recover_work_func()) and CXL (cxl_handle_rdport_errors()) Both use __aer_print_error() to print the AER error bits. Previously __aer_print_error() also incremented the AER statistics via pci_dev_aer_stats_incr(). Call pci_dev_aer_stats_incr() early in the entry points instead of in __aer_print_error() so we update the statistics even if the actual printing of error bits is rate limited by a future change. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-11-helgaas@kernel.org
2025-05-23PCI/AER: Simplify pci_print_aer()Bjorn Helgaas
Simplify pci_print_aer() by initializing the struct aer_err_info "info" with a designated initializer list (it was previously initialized with memset()) and using pci_name(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-10-helgaas@kernel.org
2025-05-23PCI/AER: Initialize aer_err_info before using itBjorn Helgaas
Previously the struct aer_err_info "e_info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "e_info" at declaration with a designated initializer list, which initializes the other members to zero. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-9-helgaas@kernel.org
2025-05-23PCI/AER: Move aer_print_source() earlier in fileBjorn Helgaas
Move aer_print_source() earlier in the file so a future change can use it from aer_print_error(), where it's easier to rate limit it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-8-helgaas@kernel.org
2025-05-23PCI/AER: Rename aer_print_port_info() to aer_print_source()Jon Pan-Doh
Rename aer_print_port_info() to aer_print_source() to be more descriptive. This logs the Error Source ID logged by a Root Port or Root Complex Event Collector when it receives an ERR_COR, ERR_NONFATAL, or ERR_FATAL Message. [bhelgaas: aer_print_rp_info() -> aer_print_source()] Signed-off-by: Jon Pan-Doh <pandoh@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-7-helgaas@kernel.org
2025-05-23PCI/AER: Extract bus/dev/fn in aer_print_port_info() with PCI_BUS_NUM(), etcBjorn Helgaas
Use PCI_BUS_NUM(), PCI_SLOT(), PCI_FUNC() to extract the bus number, device, and function number directly from the Error Source ID. There's no need to shift and mask it explicitly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-6-helgaas@kernel.org
2025-05-23PCI/AER: Consolidate Error Source ID logging in aer_isr_one_error_type()Bjorn Helgaas
Previously we decoded the AER Error Source ID in aer_isr_one_error_type(), then again in find_source_device() if we didn't find any devices with errors logged in their AER Capabilities. Consolidate this so we only decode and log the Error Source ID once in aer_isr_one_error_type(). Add a "found" parameter so we can add a note when we didn't find any downstream devices with errors logged in their AER Capability. This changes the dmesg logging when we found no devices with errors logged: - pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 - pci 0000:00:01.0: AER: found no error details for 0000:02:00.0 + pci 0000:00:01.0: AER: Correctable error message received from 0000:02:00.0 (no details found) Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-5-helgaas@kernel.org
2025-05-23PCI/AER: Factor COR/UNCOR error handling out from aer_isr_one_error()Bjorn Helgaas
aer_isr_one_error() duplicates the Error Source ID logging and AER error processing for Correctable Errors and Uncorrectable Errors. Factor out the duplicated code to aer_isr_one_error_type(). aer_isr_one_error() doesn't need the struct aer_rpc pointer, so pass it the Root Port or RCEC pci_dev pointer instead. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-4-helgaas@kernel.org
2025-05-23PCI/DPC: Log Error Source ID only when validBjorn Helgaas
DPC Error Source ID is only valid when the DPC Trigger Reason indicates that DPC was triggered due to reception of an ERR_NONFATAL or ERR_FATAL Message (PCIe r6.0, sec 7.9.14.5). When DPC was triggered by ERR_NONFATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_NFE) or ERR_FATAL (PCI_EXP_DPC_STATUS_TRIGGER_RSN_FE) from a downstream device, log the Error Source ID (decoded into domain/bus/device/function). Don't print the source otherwise, since it's not valid. For DPC trigger due to reception of ERR_NONFATAL or ERR_FATAL, the dmesg logging changes: - pci 0000:00:01.0: DPC: containment event, status:0x000d source:0x0200 - pci 0000:00:01.0: DPC: ERR_FATAL detected + pci 0000:00:01.0: DPC: containment event, status:0x000d, ERR_FATAL received from 0000:02:00.0 and when DPC triggered for other reasons, where DPC Error Source ID is undefined, e.g., unmasked uncorrectable error: - pci 0000:00:01.0: DPC: containment event, status:0x0009 source:0x0200 - pci 0000:00:01.0: DPC: unmasked uncorrectable error detected + pci 0000:00:01.0: DPC: containment event, status:0x0009: unmasked uncorrectable error detected Previously the "containment event" message was at KERN_INFO and the "%s detected" message was at KERN_WARNING. Now the single message is at KERN_WARNING. Fixes: 26e515713342 ("PCI: Add Downstream Port Containment driver") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20250522232339.1525671-3-helgaas@kernel.org
2025-05-23PCI/DPC: Initialize aer_err_info before using itBjorn Helgaas
Previously the struct aer_err_info "info" was allocated on the stack without being initialized, so it contained junk except for the fields we explicitly set later. Initialize "info" at declaration so it starts as all zeros. Fixes: 8aefa9b0d910 ("PCI/DPC: Print AER status in DPC event handling") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250522232339.1525671-2-helgaas@kernel.org
2025-05-15PCI: Update Link Speed after retrainingIlpo Järvinen
PCIe Link Retraining can alter Link Speed. pcie_retrain_link() that performs the Link Training is called from bwctrl and ASPM driver. While bwctrl listens for Link Bandwidth Management Status (LBMS) to pick up changes in Link Speed, there is a race between pcie_reset_lbms() clearing LBMS after the Link Training and pcie_bwnotif_irq() reading the Link Status register. If LBMS is already cleared when the irq handler reads the register, the interrupt handler will return early with IRQ_NONE and won't update the Link Speed. When Link Speed update originates from bwctrl, pcie_bwctrl_change_speed() ensures Link Speed is updated after the retraining. ASPM driver, however, calls pcie_retrain_link() but does not update the Link Speed after retraining which can result in stale Link Speed. Also, it is possible to have ASPM support with CONFIG_PCIEPORTBUS=n in which case bwctrl will not be built in (and thus won't update the Link Speed at all). To ensure Link Speed is not left stale after Link Training, move the call to pcie_update_link_speed() from pcie_bwctrl_change_speed() into pcie_retrain_link(). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/linux-pci/aBCjpfyYmlkJ12AZ@wunner.de Link: https://lore.kernel.org/r/20250514132821.15705-1-ilpo.jarvinen@linux.intel.com
2025-05-15PCI: Add debugfs support for exposing PTM contextManivannan Sadhasivam
Precision Time Management (PTM) mechanism defined in PCIe spec r6.0, sec 6.21 allows precise coordination of timing information across multiple components in a PCIe hierarchy with independent local time clocks. PCI core already supports enabling PTM in the root port and endpoint devices through PTM Extended Capability registers. But the PTM context supported by the PTM capable components such as Root Complex (RC) and Endpoint (EP) controllers were not exposed as of now. Part of the reason is that the spec doesn't define how the context information is exposed to the software and left it to the vendor implementation. So there is no standardized way to get access to the context information and each vendor have defined their own way. This commit adds debugfs support to expose the PTM context to userspace from both PCIe RC and EP controllers. Since the context information is exposed in a vendor specific way, the debugfs interface allows the controller drivers to implement callbacks for each attribute, to be called by the generic PTM driver. The Controller drivers are expected to call pcie_ptm_create_debugfs() to create the debugfs attributes for the PTM context and call pcie_ptm_destroy_debugfs() to destroy them. The drivers should also populate the relevant callbacks in the 'struct pcie_ptm_ops' structure based on the controller implementation. Below PTM context are exposed through debugfs: PCIe RC ======= 1. PTM Local clock 2. PTM T2 timestamp 3. PTM T3 timestamp 4. PTM Context valid PCIe EP ======= 1. PTM Local clock 2. PTM T1 timestamp 3. PTM T4 timestamp 4. PTM Master clock 5. PTM Context update Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: fix overflow issue reported by Dan Carpenter from https://lore.kernel.org/linux-pci/b41c1754-c6b7-4805-9f14-7c643d6c5304@suswa.mountain] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://patch.msgid.link/20250505-pcie-ptm-v4-1-02d26d51400b@linaro.org
2025-05-15PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flagIlpo Järvinen
PCIe BW controller counted LBMS assertions for the purposes of the Target Speed quirk (pcie_failed_link_retrain()). It was also a plan to expose the LBMS count through sysfs to allow better diagnosing link related issues. Lukas Wunner suggested, however, that adding a trace event would be better for diagnostics purposes, leaving only pcie_failed_link_retrain() as a user of the lbms_count. The logic in pcie_failed_link_retrain() does not require keeping count of LBMS assertions, so replace lbms_count with a simple flag in pci_dev's priv_flags. The reduced complexity allows removing pcie_bwctrl_lbms_rwsem. Since pcie_failed_link_retrain() runs before bwctrl is probed during boot, the LBMS in Link Status register still has to be checked by the quirk. The priv_flags numbering is not continuous because hotplug code added a few flags to fill numbers 4-5 (hotplug and bwctrl changes are routed through in different branches). Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: squashed a fix to resolve build failures from https://lore.kernel.org/all/20250508090036.1528-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20250422115548.1483-1-ilpo.jarvinen@linux.intel.com
2025-03-28Merge tag 'pci-v6.15-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Enable Configuration RRS SV, which makes device readiness visible, early instead of during child bus scanning (Bjorn Helgaas) - Log debug messages about reset methods being used (Bjorn Helgaas) - Avoid reset when it has been disabled via sysfs (Nishanth Aravamudan) - Add common pci-ep-bus.yaml schema for exporting several peripherals of a single PCI function via devicetree (Andrea della Porta) - Create DT nodes for PCI host bridges to enable loading device tree overlays to create platform devices for PCI devices that have several features that require multiple drivers (Herve Codina) Resource management: - Enlarge devres table[] to accommodate bridge windows, ROM, IOV BARs, etc., and validate BAR index in devres interfaces (Philipp Stanner) - Fix typo that repeatedly distributed resources to a bridge instead of iterating over subordinate bridges, which resulted in too little space to assign some BARs (Kai-Heng Feng) - Relax bridge window tail sizing for optional resources, e.g., IOV BARs, to avoid failures when removing and re-adding devices (Ilpo Järvinen) - Allow drivers to enable devices even if we haven't assigned optional IOV resources to them (Ilpo Järvinen) - Rework handling of optional resources (IOV BARs, ROMs) to reduce failures if we can't allocate them (Ilpo Järvinen) - Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory) - Fix s390 mmio_read/write syscalls, which didn't cause page faults in some cases, which broke vfio-pci lazy mapping on first access (Niklas Schnelle) - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was disabled only for s390 (Niklas Schnelle) - Support mmap of PCI resources on s390 except for ISM devices (Niklas Schnelle) ASPM: - Delay pcie_link_state deallocation to avoid dangling pointers that cause invalid references during hot-unplug (Daniel Stodden) Power management: - Allow PCI bridges to go to D3Hot when suspending on all non-x86 systems (Manivannan Sadhasivam) Power control: - Create pwrctrl devices in pci_scan_device() to make it more symmetric with pci_pwrctrl_unregister() and make pwrctrl devices for PCI bridges possible (Manivannan Sadhasivam) - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc. can still access devices after pci_stop_dev() (Manivannan Sadhasivam) - If there's a pwrctrl device for a PCI device, skip scanning it because the pwrctrl core will rescan the bus after the device is powered on (Manivannan Sadhasivam) - Add a pwrctrl driver for PCI slots based on voltage regulators described via devicetree (Manivannan Sadhasivam) Bandwidth control: - Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the set_pcie_cooling_state.sh test case (Yi Lai) - Avoid a NULL pointer dereference when we run out of bus numbers to assign for a bridge secondary bus (Lukas Wunner) Hotplug: - Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and NULL pointer checks (Lukas Wunner) - Drop shpchp module init/exit logging, replace shpchp dbg() with ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers (Ilpo Järvinen) - Drop 'shpchp_debug' module parameter in favor of standard dynamic debugging (Ilpo Järvinen) - Drop unused cpcihp .get_power(), .set_power() function pointers (Guilherme Giacomo Simoes) - Disable hotplug interrupts in portdrv only when pciehp is not enabled to avoid issuing two hotplug commands too close together (Feng Tang) - Skip pciehp 'device replaced' check if the device has been removed to address a deadlock when resuming after a device was removed during system sleep (Lukas Wunner) - Don't enable pciehp hotplug interupt when resuming in poll mode (Ilpo Järvinen) Virtualization: - Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar Dave) DOE: - Expose supported DOE features via sysfs (Alistair Francis) - Allow DOE support to be enabled even if CXL isn't enabled (Alistair Francis) Endpoint framework: - Convert PCI device data so pci-epf-test works correctly on big-endian endpoint systems (Niklas Cassel) - Add BAR_RESIZABLE type to endpoint framework and add DWC core support for EPF drivers to set BAR_RESIZABLE type and size (Niklas Cassel) - Fix pci-epf-test double free that causes an oops if the host reboots and PERST# deassertion restarts endpoint BAR allocation (Christian Bruel) - Fix endpoint BAR testing so tests can skip disabled BARs instead of reporting them as failures (Niklas Cassel) - Widen endpoint test BAR size variable to accommodate BARs larger than INT_MAX (Niklas Cassel) - Remove unused tools 'pci' build target left over after moving tests to tools/testing/selftests/pci_endpoint (Jianfeng Liu) Altera PCIe controller driver: - Add DT binding and driver support for Agilex family (P-Tile, F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar) AMD MDB PCIe controller driver: - Add DT binding and driver for AMD MDB (Multimedia DMA Bridge) (Thippeswamy Havalige) Broadcom STB PCIe controller driver: - Add BCM2712 MSI-X DT binding and interrupt controller drivers and add softdep on irq_bcm2712_mip driver to ensure that it is loaded first (Stanimir Varbanov) - Expand inbound window map to 64GB so it can accommodate BCM2712 (Stanimir Varbanov) - Add BCM2712 support and DT updates (Stanimir Varbanov) - Apply link speed restriction before bringing link up, not after (Jim Quinlan) - Update Max Link Speed in Link Capabilities via the internal writable register, not the read-only config register (Jim Quinlan) - Handle regulator_bulk_get() error to avoid panic when we call regulator_bulk_free() later (Jim Quinlan) - Disable regulators only when removing the bus immediately below a Root Port because we don't support regulators deeper in the hierarchy (Jim Quinlan) - Make const read-only arrays static (Colin Ian King) Cadence PCIe endpoint driver: - Correct MSG TLP generation so endpoints can generate INTx messages (Hans Zhang) Freescale i.MX6 PCIe controller driver: - Identify the second controller on i.MX8MQ based on devicetree 'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu) - Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the ATU input address (using parent_bus_offset) from devicetree (Frank Li) Freescale Layerscape PCIe controller driver: - Drop deprecated 'num-ib-windows' and 'num-ob-windows' and unnecessary 'status' from example (Krzysztof Kozlowski) - Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg") arg_count to fix probe failure on LS1043A (Ioana Ciornei) HiSilicon STB PCIe controller driver: - Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe JAILLET) Intel Gateway PCIe controller driver: - Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU input address (using parent_bus_offset) from devicetree (Frank Li) Intel VMD host bridge driver: - Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so pci_ops.read() will never sleep, even on PREEMPT_RT where spinlock_t becomes a sleepable lock, to avoid calling a sleeping function from invalid context (Ryo Takakura) MediaTek PCIe Gen3 controller driver: - Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo Bianconi) - Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and program host bridge memory aperture to this syscon node (Lorenzo Bianconi) Qualcomm PCIe controller driver: - Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan) - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander Stein) - Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry Baryshkov) - Make DT iommu property required for SA8775P and prohibited for SDX55 (Dmitry Baryshkov) - Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry Baryshkov) - Add endpoint DT properties for SAR2130P and enable endpoint mode in driver (Dmitry Baryshkov) - Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as RESERVED (Manivannan Sadhasivam) Rockchip DesignWare PCIe controller driver: - Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas Cassel) Synopsys DesignWare PCIe controller driver: - Add debugfs-based Silicon Debug, Error Injection, Statistical Counter support for DWC (Shradha Todi) - Add debugfs property to expose LTSSM status of DWC PCIe link (Hans Zhang) - Add Rockchip support for DWC debugfs features (Niklas Cassel) - Add dw_pcie_parent_bus_offset() to look up the parent bus address of a specified 'reg' property and return the offset from the CPU physical address (Frank Li) - Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset via 'reg[config]' for host controllers and 'reg[addr_space]' for endpoint controllers (Frank Li) - Apply struct dw_pcie.parent_bus_offset in ATU users to remove use of .cpu_addr_fixup() when programming ATU (Frank Li) TI J721E PCIe driver: - Correct the 'link down' interrupt bit for J784S4 (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce alignment requirement from 1MB to 64KB (Niklas Cassel) Xilinx Versal CPM PCIe controller driver: - Free IRQ domain in probe error path to avoid leaking it (Thippeswamy Havalige) - Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for Versal Net CPM5NC Root Port controller (Thippeswamy Havalige) - Add driver support for CPM5_HOST1 (Thippeswamy Havalige) Miscellaneous: - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer) - Use for_each_available_child_of_node_scoped() to simplify apple, kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)" * tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits) PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args() PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4 misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts PCI: endpoint: Add intx_capable to epc_features struct dt-bindings: PCI: Add common schema for devices accessible through PCI BARs PCI: intel-gw: Remove intel_pcie_cpu_addr() PCI: imx6: Remove imx_pcie_cpu_addr_fixup() PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup() PCI: dwc: ep: Ensure proper iteration over outbound map windows PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources() PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init() PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug PCI: dwc: Add dw_pcie_parent_bus_offset() PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant PCI: brcmstb: Make const read-only arrays static ...
2025-03-27Merge branch 'pci/misc'Bjorn Helgaas
- Remove unused tools 'pci' build target left over after moving tests to tools/testing/selftests/pci_endpoint (Jianfeng Liu) - Fix typos and whitespace errors (Bjorn Helgaas) * pci/misc: PCI: Fix typos tools/Makefile: Remove pci target # Conflicts: # drivers/pci/endpoint/functions/pci-epf-test.c
2025-03-27Merge branch 'pci/hotplug'Bjorn Helgaas
- Drop shpchp module init/exit logging (Ilpo Järvinen) - Replace shpchp dbg() with ctrl_dbg() and remove unused dbg(), err(), info(), warn() wrappers (Ilpo Järvinen) - Drop 'shpchp_debug' module parameter in favor of standard dynamic debugging (Ilpo Järvinen) - Drop unused .get_power(), .set_power() function pointers (Guilherme Giacomo Simoes) - Drop superfluous pci_hotplug_slot_list (Lukas Wunner) - Drop superfluous try_module_get() calls (Lukas Wunner) - Drop superfluous NULL pointer checks (Lukas Wunner) - Pass struct hotplug_slot pointers directly to avoid backpointer dereferencing in has_*_file() (Lukas Wunner) - Inline pci_hp_{create,remove}_module_link() to reduce exported symbols (Lukas Wunner) - Disable hotplug interrupts in portdrv only when pciehp is not enabled to prevent issuing two hotplug commands too close together (Feng Tang) - Skip pciehp 'device replaced' check if the device has been removed to address a common deadlock when resuming after a device was removed during system sleep (Lukas Wunner) - Don't enable pciehp hotplug interupt when resuming in poll mode (Ilpo Järvinen) * pci/hotplug: PCI: pciehp: Don't enable HPIE when resuming in poll mode PCI: pciehp: Avoid unnecessary device replacement check PCI/portdrv: Only disable pciehp interrupts early when needed PCI: hotplug: Inline pci_hp_{create,remove}_module_link() PCI: hotplug: Avoid backpointer dereferencing in has_*_file() PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file() PCI: hotplug: Drop superfluous try_module_get() calls PCI: hotplug: Drop superfluous pci_hotplug_slot_list PCI: cpcihp: Remove unused .get_power() and .set_power() PCI: shpchp: Remove 'shpchp_debug' module parameter PCI: shpchp: Remove unused logging wrappers PCI: shpchp: Change dbg() -> ctrl_dbg() PCI: shpchp: Remove logging from module init/exit functions
2025-03-27Merge branch 'pci/bwctrl'Bjorn Helgaas
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the set_pcie_cooling_state.sh test case (Yi Lai) - Fix the pcie_bwctrl_select_speed() return value in cases where a non-compliant device doesn't advertise valid supported speeds (Ilpo Järvinen) - Avoid a NULL pointer dereference when we run out of bus numbers to assign for a bridge secondary bus (Lukas Wunner) * pci/bwctrl: PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
2025-03-27Merge branch 'pci/aspm'Bjorn Helgaas
- Delay pcie_link_state deallocation to avoid dangling pointers that cause invalid references during hot-unplug (Daniel Stodden) * pci/aspm: PCI/ASPM: Fix link state exit during switch upstream function removal
2025-03-23PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustionLukas Wunner
When BIOS neglects to assign bus numbers to PCI bridges, the kernel attempts to correct that during PCI device enumeration. If it runs out of bus numbers, no pci_bus is allocated and the "subordinate" pointer in the bridge's pci_dev remains NULL. The PCIe bandwidth controller erroneously does not check for a NULL subordinate pointer and dereferences it on probe. Bandwidth control of unusable devices below the bridge is of questionable utility, so simply error out instead. This mirrors what PCIe hotplug does since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp probe"). The PCI core emits a message with KERN_INFO severity if it has run out of bus numbers. PCIe hotplug emits an additional message with KERN_ERR severity to inform the user that hotplug functionality is disabled at the bridge. A similar message for bandwidth control does not seem merited, given that its only purpose so far is to expose an up-to-date link speed in sysfs and throttle the link speed on certain laptops with limited Thermal Design Power. So error out silently. User-visible messages: pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring [...] pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74 pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them [...] pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring [...] BUG: kernel NULL pointer dereference RIP: pcie_update_link_speed pcie_bwnotif_enable pcie_bwnotif_probe pcie_port_probe_service really_probe Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Cc: stable@vger.kernel.org # v6.13+ Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de
2025-03-21PCI/bwctrl: Fix pcie_bwctrl_select_speed() return typeIlpo Järvinen
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply assign the fallback speed (2.5GT/s) into supported_speeds variable to share the normal return path that calls pcie_supported_speeds2target_speed() to calculate __fls(). This code path is not very likely to execute because pcie_get_supported_speeds() should provide valid ->supported_speeds but a spec violating device could fail to synthesize any speed in pcie_get_supported_speeds(). It could also happen in case the supported_speeds intersection is empty (also a violation of the current PCIe specs). Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com Fixes: de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-08PCI: Fix typosBjorn Helgaas
Fix typos and whitespace errors. Link: https://lore.kernel.org/r/20250307231715.438518-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-04PCI/portdrv: Only disable pciehp interrupts early when neededFeng Tang
Firmware developers reported that Linux issues two PCIe hotplug commands in very short intervals on an ARM server, which doesn't comply with the PCIe spec. According to PCIe r6.1, sec 6.7.3.2, if the Command Completed event is supported, software must wait for a command to complete or wait at least 1 second before sending a new command. In the failure case, the first PCIe hotplug command is from get_port_device_capability(), which sends a command to disable PCIe hotplug interrupts without waiting for its completion, and the second command comes from pcie_enable_notification() of pciehp driver, which enables hotplug interrupts again. Fix this by only disabling the hotplug interrupts when the pciehp driver is not enabled. Link: https://lore.kernel.org/r/20250303023630.78397-1-feng.tang@linux.alibaba.com Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization") Suggested-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2025-02-21PCI/ERR: Handle TLP Log in Flit modeIlpo Järvinen
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is presented through AER and DPC Capability registers. The TLP Prefix Log Register is not present with Flit mode, and the register becomes an extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13). Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the extended TLP Header Log when the Link is in Flit mode. As the Prefix Log and Extended TLP Header are not present at the same time, a C union can be used. Determining whether the error occurred while the Link was in Flit mode is a bit complicated. In case of AER, the Advanced Error Capabilities and Control Register directly tells whether the error was logged in Flit mode or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14), unfortunately, does not contain the same information. Unlike AER, the DPC Capability does not provide a way to discern whether the error was logged in Flit mode (this is confirmed by PCI WG to be an oversight in the spec). DPC will bring the Link down immediately following an error, which makes it impossible to acquire the Flit Mode Status directly from the Link Status 2 register because Flit Mode Status is only set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use the flit_mode value stored into the struct pci_bus. Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-21PCI/AER: Descope pci_printk() to aer_printk()Ilpo Järvinen
include/linux/pci.h provides low-level pci_printk() interface that is only used by AER because it needs to print the same message with different levels depending on the error severity. No other PCI code uses that functionality and calls pci_<level>() logging functions directly with the appropriate level. Descope pci_printk() into AER as aer_printk(). Link: https://lore.kernel.org/r/20241216161012.1774-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: retain pci_printk() for now since shpchp still uses it and I moved those patches to a different branch] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-20PCI/ASPM: Fix link state exit during switch upstream function removalDaniel Stodden
Before 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free"), we would free the ASPM link only after the last function on the bus pertaining to the given link was removed. That was too late. If function 0 is removed before sibling function, link->downstream would point to free'd memory after. After above change, we freed the ASPM parent link state upon any function removal on the bus pertaining to a given link. That is too early. If the link is to a PCIe switch with MFD on the upstream port, then removing functions other than 0 first would free a link which still remains parent_link to the remaining downstream ports. The resulting GPFs are especially frequent during hot-unplug, because pciehp removes devices on the link bus in reverse order. On that switch, function 0 is the virtual P2P bridge to the internal bus. Free exactly when function 0 is removed -- before the parent link is obsolete, but after all subordinate links are gone. Link: https://lore.kernel.org/r/e12898835f25234561c9d7de4435590d957b85d9.1734924854.git.dns@arista.com Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free") Signed-off-by: Daniel Stodden <dns@arista.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-02-06PCI/ASPM: Fix L1SS savingIlpo Järvinen
Commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both the Upstream Port and its upstream bridge when handling an Upstream Port, which matches what the L1SS restore side does. However, parent->state_saved can be set true at an earlier time when the upstream bridge saved other parts of its state. Then later when attempting to save the L1SS config while handling the Upstream Port, parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in early return and skipping saving bridge's L1SS config because it is assumed to be already saved. Later on restore, junk is written into L1SS config which causes issues with some devices. Remove parent->state_saved check and unconditionally save L1SS config also for the upstream bridge from an Upstream Port which ought to be harmless from correctness point of view. With the Upstream Port check now present, saving the L1SS config more than once for the bridge is no longer a problem (unlike when the parent->state_saved check got introduced into the fix during its development). Link: https://lore.kernel.org/r/20250131152913.2507-1-ilpo.jarvinen@linux.intel.com Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731 Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Reported by: Rafael J. Wysocki <rafael@kernel.org> Closes: https://lore.kernel.org/r/CAJZ5v0iKmynOQ5vKSQbg1J_FmavwZE-nRONovOZ0mpMVauheWg@mail.gmail.com Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Closes: https://lore.kernel.org/r/d7246feb-4f3f-4d0c-bb64-89566b170671@molgen.mpg.de Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13 9360
2025-01-25Merge tag 'pci-v6.14-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Batch sizing of multiple BARs while memory decoding is disabled instead of disabling/enabling decoding for each BAR individually; this optimizes virtualized environments where toggling decoding enable is expensive (Alex Williamson) - Add host bridge .enable_device() and .disable_device() hooks for bridges that need to configure things like Requester ID to StreamID mapping when enabling devices (Frank Li) - Extend struct pci_ecam_ops with .enable_device() and .disable_device() hooks so drivers that use pci_host_common_probe() instead of their own .probe() have a way to set the .enable_device() callbacks (Marc Zyngier) - Drop 'No bus range found' message so we don't complain when DTs don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas) - Rename the drivers/pci/of_property.c struct of_pci_range to of_pci_range_entry to avoid confusion with the global of_pci_range in include/linux/of_address.h (Bjorn Helgaas) Driver binding: - Update resource request API documentation to encourage callers to supply a driver name when requesting resources (Philipp Stanner) - Export pci_intx_unmanaged() and pcim_intx() (always managed) so callers of pci_intx() (which is sometimes managed) can explicitly choose the one they need (Philipp Stanner) - Convert drivers from pci_intx() to always-managed pcim_intx() or never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix, pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna, ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner) - Remove pci_intx_unmanaged() since pci_intx() is now always unmanaged and pcim_intx() is always managed (Philipp Stanner) Error handling: - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen) - Move TLP Log handling to its own file (Ilpo Järvinen) - Store number of supported End-End TLP Prefixes always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen) - Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen) - Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen) - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor BIOSes that don't configure it correctly (Takashi Iwai) ASPM: - Save parent L1 PM Substates config so when we restore it along with an endpoint's config, the parent info isn't junk (Jian-Hong Pan) Power management: - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because the system can't wake up from suspend (Werner Sembach) Endpoint framework: - Destroy the EPC device in devm_pci_epc_destroy(), which previously didn't call devres_release() (Zijun Hu) - Finish virtual EP removal in pci_epf_remove_vepf(), which previously caused a subsequent pci_epf_add_vepf() to fail with -EBUSY (Zijun Hu) - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we don't depend on the BAR_MASK reset value being larger than the requested BAR size (Niklas Cassel) - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent reads from bypassing the iATU if we reduced the BAR size (Niklas Cassel) - Verify address alignment when programming iATU so we don't attempt to write bits that are read-only because of the BAR size, which could lead to directing accesses to the wrong address (Niklas Cassel) - Implement artpec6 pci_epc_features so we can rely on all drivers supporting it so we can use it in EPC core code (Niklas Cassel) - Check for BARs of fixed size to prevent endpoint drivers from trying to change their size (Niklas Cassel) - Verify that requested BAR size is a power of two when endpoint driver sets the BAR (Niklas Cassel) Endpoint framework tests: - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing dma_chan_rx (Mohamed Khalfella) - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam) - Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas Cassel) - Add Endpoint test for consecutive BARs (Niklas Cassel) - Remove redundant comparison from Endpoint BAR test because a > 1MB BAR can always be exactly covered by iterating with a 1MB buffer (Hans Zhang) - Move and convert PCI Endpoint tests from tools/pci to Kselftests (Manivannan Sadhasivam) Apple PCIe controller driver: - Convert StreamID mapping configuration from a bus notifier to the .enable_device() and .disable_device() callbacks (Marc Zyngier) Freescale i.MX6 PCIe controller driver: - Add Requester ID to StreamID mapping configuration when enabling devices (Frank Li) - Use DWC core suspend/resume functions for imx6 (Frank Li) - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard Zhu) - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank Li) - Add DT binding for optional i.MX95 Refclk and driver support to enable it if the platform hasn't enabled it (Richard Zhu) - Configure PHY based on controller being in Root Complex or Endpoint mode (Frank Li) - Rely on dbi2 and iATU base addresses from DT via dw_pcie_get_resources() instead of hardcoding them (Richard Zhu) - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is asserted in imx_pcie_assert_core_reset() (Richard Zhu) - Add missing reference clock enable or disable logic for IMX6SX, IMX7D, IMX8MM (Richard Zhu) - Remove redundant imx7d_pcie_init_phy() since imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu) Freescale Layerscape PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_property_read_u32_array() (Krzysztof Kozlowski) Marvell MVEBU PCIe controller driver: - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen) MediaTek PCIe Gen3 controller driver: - Use clk_bulk_prepare_enable() instead of separate clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi) - Rearrange reset assert/deassert so they're both done in the *_power_up() callbacks (Lorenzo Bianconi) - Document that Airoha EN7581 requires PHY init and power-on before PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo Bianconi) - Move Airoha EN7581 post-reset delay from the en7581 clock .enable() method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi) - Sleep instead of delay during Airoha EN7581 power-up, since this is a non-atomic context (Lorenzo Bianconi) - Skip PERST# assertion on Airoha EN7581 during probe and suspend/resume to avoid a hardware defect (Lorenzo Bianconi) - Enable async probe to reduce system startup time (Douglas Anderson) Microchip PolarFlare PCIe controller driver: - Set up the inbound address translation based on whether the platform allows coherent or non-coherent DMA (Daire McNamara) - Update DT binding such that platforms are DMA-coherent by default and must specify 'dma-noncoherent' if needed (Conor Dooley) Mobiveil PCIe controller driver: - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names' and 'reg-names' (Frank Li) Qualcomm PCIe controller driver: - Add DT SM8550 and SM8650 optional 'global' interrupt for link events (Neil Armstrong) - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta Mylavarapu) - If 'global' IRQ is supported for detection of Link Up events, tell DWC core not to wait for link up (Krishna chaitanya chundru) Renesas R-Car PCIe controller driver: - Avoid passing stack buffer as resource name (King Dix) Rockchip PCIe controller driver: - Simplify clock and reset handling by using bulk interfaces (Anand Moon) - Pass typed rockchip_pcie (not void) pointer to rockchip_pcie_disable_clocks() (Anand Moon) - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails (Dan Carpenter) Rockchip DesignWare PCIe controller driver: - Use dll_link_up IRQ to detect Link Up and enumerate devices so users don't have to manually rescan (Niklas Cassel) - Tell DWC core not to wait for link up since the 'sys' interrupt is required and detects Link Up events (Niklas Cassel) Synopsys DesignWare PCIe controller driver: - Don't wait for link up in DWC core if driver can detect Link Up event (Krishna chaitanya chundru) - Update ICC and OPP votes after Link Up events (Krishna chaitanya chundru) - Always stop link in dw_pcie_suspend_noirq(), which is required at least for i.MX8QM to re-establish link on resume (Richard Zhu) - Drop racy and unnecessary LTSSM state check before sending PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu) - Add struct of_pci_range.parent_bus_addr for devices that need their immediate parent bus address, not the CPU address, e.g., to program an internal Address Translation Unit (iATU) (Frank Li) TI DRA7xx PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_parse_phandle_with_fixed_args() or of_property_read_u32_index() (Krzysztof Kozlowski) Xilinx Versal CPM PCIe controller driver: - Add DT binding and driver support for Xilinx Versal CPM5 (Thippeswamy Havalige) MicroSemi Switchtec management driver: - Add Microchip PCI100X device IDs (Rakesh Babu Saladi) Miscellaneous: - Move reset related sysfs code from pci.c to pci-sysfs.c where other similar code lives (Ilpo Järvinen) - Simplify reset_method_store() memory management by using __free() instead of explicit kfree() cleanup (Ilpo Järvinen) - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver (Thomas Weißschuh) - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang) - Correct documentation of the 'config_acs=' kernel parameter (Akihiko Odaki)" * tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits) PCI: Batch BAR sizing operations dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent PCI: microchip: Set inbound address translation for coherent or non-coherent mode Documentation: Fix pci=config_acs= example PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT PCI: Don't include 'pm_wakeup.h' directly selftests: pci_endpoint: Migrate to Kselftest framework selftests: Move PCI Endpoint tests from tools/pci to Kselftests misc: pci_endpoint_test: Fix IOCTL return value dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML PCI: switchtec: Add Microchip PCI100X device IDs misc: pci_endpoint_test: Remove redundant 'remainder' test misc: pci_endpoint_test: Add consecutive BAR test misc: pci_endpoint_test: Add support for capabilities PCI: endpoint: pci-epf-test: Add support for capabilities PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error PCI: dwc: Simplify config resource lookup ...
2025-01-23Merge branch 'pci/err'Bjorn Helgaas
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen) - Move TLP Log handling to its own file (Ilpo Järvinen) - Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen) - Store number of supported End-End TLP Prefixes always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen) - Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen) - Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen) * pci/err: PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log PCI: Add TLP Prefix reading to pcie_read_tlp_log() PCI: Store number of supported End-End TLP Prefixes PCI: Use unsigned int i in pcie_read_tlp_log() PCI: Use same names in pcie_read_tlp_log() prototype and definition PCI: Add defines for TLP Header/Prefix log sizes PCI: Move TLP Log handling to its own file PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
2025-01-16PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix LogIlpo Järvinen
Add pcie_print_tlp_log() to print TLP Header and Prefix Log. Print End-End Prefixes only if they are non-zero. Consolidate the few places which currently print TLP using custom formatting. Link: https://lore.kernel.org/r/20250114170840.1633-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-16PCI: Add TLP Prefix reading to pcie_read_tlp_log()Ilpo Järvinen
pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present. Generalize pcie_read_tlp_log() and struct pcie_tlp_log to also handle TLP Prefix Log. The relevant registers are formatted identically in AER and DPC Capability, but has these variations: a) The offsets of TLP Prefix Log registers vary. b) DPC RP PIO TLP Prefix Log register can be < 4 DWORDs. c) AER TLP Prefix Log Present (PCIe r6.1 sec 7.8.4.7) can indicate Prefix Log is not present. Therefore callers must pass the offset of the TLP Prefix Log register and the entire length to pcie_read_tlp_log() to be able to read the correct number of TLP Prefix DWORDs from the correct offset. Link: https://lore.kernel.org/r/20250114170840.1633-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash ternary fix from https://lore.kernel.org/r/20250116172019.88116-1-colin.i.king@gmail.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-14PCI: Use unsigned int i in pcie_read_tlp_log()Ilpo Järvinen
Loop variable i counting from 0 upwards does not need to be signed so make it unsigned int. Link: https://lore.kernel.org/r/20250114170840.1633-6-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-14PCI: Use same names in pcie_read_tlp_log() prototype and definitionIlpo Järvinen
pcie_read_tlp_log()'s prototype and function signature diverged due to changes made while applying. Make the parameters of pcie_read_tlp_log() named identically. Link: https://lore.kernel.org/r/20250114170840.1633-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14PCI: Add defines for TLP Header/Prefix log sizesIlpo Järvinen
Add defines for AER and DPC capabilities TLP Header Logging register sizes (PCIe r6.2, sec 7.8.4 / 7.9.14) and replace literals with them. Link: https://lore.kernel.org/r/20250114170840.1633-4-ilpo.jarvinen@linux.intel.com Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-01-14PCI: Move TLP Log handling to its own fileIlpo Järvinen
TLP Log is a PCIe feature and is processed only by AER and DPC. Configwise, DPC depends AER being enabled. In lack of better place, the TLP Log handling code was initially placed into pci.c but it can be easily placed in a separate file. Move TLP Log handling code to its own file under pcie/ subdirectory and include it only when AER is enabled. Link: https://lore.kernel.org/r/20250114170840.1633-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14Merge tag 'pci-v6.13-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fix from Bjorn Helgaas: - Prevent bwctrl NULL pointer dereference that caused hangs on shutdown on ASUS ROG Strix SCAR 17 G733PYV (Lukas Wunner) * tag 'pci-v6.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/bwctrl: Fix NULL pointer deref on unbind and bind
2025-01-07PCI/bwctrl: Fix NULL pointer deref on unbind and bindLukas Wunner
The interrupt handler for bandwidth notifications, pcie_bwnotif_irq(), dereferences a "data" pointer. On unbind, that pointer is set to NULL by pcie_bwnotif_remove(). However the interrupt handler may still be invoked afterwards and will dereference that NULL pointer. That's because the interrupt is requested using a devm_*() helper and the driver core releases devm_*() resources *after* calling ->remove(). pcie_bwnotif_remove() does clear the Link Bandwidth Management Interrupt Enable and Link Autonomous Bandwidth Interrupt Enable bits in the Link Control Register, but that won't prevent execution of pcie_bwnotif_irq(): The interrupt for bandwidth notifications may be shared with AER, DPC, PME, and hotplug. So pcie_bwnotif_irq() may be executed as long as the interrupt is requested. There's a similar race on bind: pcie_bwnotif_probe() requests the interrupt when the "data" pointer still points to NULL. A NULL pointer deref may thus likewise occur if AER, DPC, PME or hotplug raise an interrupt in-between the bandwidth controller's call to devm_request_irq() and assignment of the "data" pointer. Drop the devm_*() usage and reorder requesting of the interrupt to fix the issue. While at it, drop a stray but harmless no_free_ptr() invocation when assigning the "data" pointer in pcie_bwnotif_probe(). Ilpo points out that the locking on unbind and bind needs to be symmetric, so move the call to pcie_bwnotif_disable() inside the critical section protected by pcie_bwctrl_setspeed_rwsem and pcie_bwctrl_lbms_rwsem. Evert reports a hang on shutdown of an ASUS ROG Strix SCAR 17 G733PYV. The issue is no longer reproducible with the present commit. Evert found that attaching a USB-C monitor prevented the hang. The machine contains an ASMedia USB 3.2 controller below a hotplug-capable Root Port. So one possible explanation is that the controller gets hot-removed on shutdown unless something is connected. And the ensuing hotplug interrupt occurs exactly when the bandwidth controller is unregistering. The precise cause could not be determined because the screen had already turned black when the hang occurred. Link: https://lore.kernel.org/r/ae2b02c9cfbefff475b6e132b0aa962aaccbd7b2.1736162539.git.lukas@wunner.de Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Evert Vorster <evorster@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219629 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Evert Vorster <evorster@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-12-21Merge tag 'pci-v6.13-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Krzysztof Wilczyński: "Two small patches that are important for fixing boot time hang on Intel JHL7540 'Titan Ridge' platforms equipped with a Thunderbolt controller. The boot time issue manifests itself when a PCI Express bandwidth control is unnecessarily enabled on the Thunderbolt controller downstream ports, which only supports a link speed of 2.5 GT/s in accordance with USB4 v2 specification (p. 671, sec. 11.2.1, "PCIe Physical Layer Logical Sub-block"). As such, there is no need to enable bandwidth control on such downstream port links, which also works around the issue. Both patches were tested by the original reporter on the hardware on which the failure origin golly manifested itself. Both fixes were proven to resolve the reported boot hang issue, and both patches have been in linux-next this week with no reported problems" * tag 'pci-v6.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/bwctrl: Enable only if more than one speed is supported PCI: Honor Max Link Speed when determining supported speeds