summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-12Merge branch 'pci/enumeration'Bjorn Helgaas
- Collect interrupt-related code in irq.c (Ilpo Järvinen) - Mark 3ware-9650SE Root Port Extended Tags as broken (Jörg Wedekind) * pci/enumeration: PCI: Mark 3ware-9650SE Root Port Extended Tags as broken PCI: Place interrupt related code into irq.c # Conflicts: # drivers/pci/Makefile
2024-03-12Merge branch 'pci/dpc'Bjorn Helgaas
- After a DPC event, print all logged TLP Prefixes instead of printing the first prefix several times (Ilpo Järvinen) - Ignore the expected Surprise Down error that may cause a DPC event when hot-removing a device (Smita Koralahalli) - Add an RP PIO log size quirk for Intel Raptor Lake Root Ports, which still don't advertise the correct log size, which prevented logging of RP PIO Log registers when DPC is triggered (Paul Menzel) * pci/dpc: PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports PCI/DPC: Ignore Surprise Down error on hot removal PCI/DPC: Print all TLP Prefixes, not just the first
2024-03-12Merge branch 'pci/devres'Bjorn Helgaas
- Unmap MMIO mappings in pci_iounmap() to avoid a leak when ARCH_HAS_GENERIC_IOPORT_MAP is defined (Philipp Stanner) - Move pci_iomap.c to drivers/pci/ since it's all PCI-related (Philipp Stanner) - Move other PCI-related devres code from lib/devres.c to drivers/pci/ (Philipp Stanner) - Move other devres code from pci.c to devres.c (Philipp Stanner) * pci/devres: PCI: Move devres code from pci.c to devres.c PCI: Move PCI-specific devres code to drivers/pci/ PCI: Move pci_iomap.c to drivers/pci/ pci_iounmap(): Fix MMIO mapping leak
2024-03-12Merge branch 'pci/aspm'Bjorn Helgaas
- Collect ASPM-related code into aspm.c (David E. Box) - Save and restore ASPM L1 PM Substates configuration so these states continue working after suspend/resume (David E. Box) - Move the ASPM L1.2-related LTR save/restore next to the ASPM save/restore (David E. Box) - Move the required L1 disable before L1 Substate configuration into pci_restore_aspm_l1ss_state() (Bjorn Helgaas) - Update save_save when ASPM config is changed, so a .slot_reset() during error recovery restores the changed config, not the .probe()-time config (Vidya Sagar) * pci/aspm: PCI/ASPM: Update save_state when configuration changes PCI/ASPM: Disable L1 before configuring L1 Substates PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state() PCI/ASPM: Save L1 PM Substates Capability for suspend/resume PCI/ASPM: Move pci_save_ltr_state() to aspm.c PCI/ASPM: Always build aspm.c PCI/ASPM: Move pci_configure_ltr() to aspm.c
2024-03-12Merge branch 'pci/aer'Bjorn Helgaas
- Fix sysfs paths in aer_rootport_total_err_* documentation (Johan Hovold) - Block runtime suspend while handling AER errors (Stanislaw Gruszka) - Add a generic Header Log structure and reader shared by AER and DPC (Ilpo Järvinen) * pci/aer: PCI/AER: Generalize TLP Header Log reading PCI/AER: Use explicit register size for PCI_ERR_CAP PCI/AER: Block runtime suspend when handling errors PCI/AER: Clean up version indentation in ABI docs PCI/AER: Fix rootport attribute paths in ABI docs
2024-03-12PCI/ASPM: Update save_state when configuration changesVidya Sagar
Many PCIe device drivers save the configuration state of their device during probe and restore it when their .slot_reset() hook is called during PCIe error recovery. If the ASPM configuration is changed after the driver's probe is called and before an error event occurs, .slot_reset() restores the ASPM configuration to what it was at the time of probe, not to what it was just before the occurrence of the error event. This leads to a mismatch in ASPM configuration between the device and its upstream device. Update the saved configuration of the device when the ASPM configuration changes. Link: https://lore.kernel.org/r/20240222174436.3565146-1-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> [bhelgaas: commit log, rebase to pci/aspm, rename to pci_update_aspm_saved_state() since it updates only LNKCTL, update only ASPMC and CLKREQ_EN in LNKCTL] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: David E. Box <david.e.box@linux.intel.com>
2024-03-12PCI/ASPM: Disable L1 before configuring L1 SubstatesBjorn Helgaas
Per PCIe r6.1, sec 5.5.4, L1 must be disabled while setting ASPM L1 PM Substates enable bits. Previously this was enforced by clearing PCI_EXP_LNKCTL_ASPMC before calling pci_restore_aspm_l1ss_state(). Move the L1 (and L0s, although that doesn't seem required) disable into pci_restore_aspm_l1ss_state() itself so it's closer to the code that depends on it. Link: https://lore.kernel.org/r/20240223213733.GA115410@bhelgaas Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-12PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()David E. Box
ASPM state is saved and restored from pci_save/restore_pcie_state(). Since the LTR Capability is linked with ASPM, move the LTR save and restore calls there as well. No functional change intended. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20240128233212.1139663-6-david.e.box@linux.intel.com Link: https://lore.kernel.org/r/20240223205851.114931-6-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-12PCI/ASPM: Save L1 PM Substates Capability for suspend/resumeDavid E. Box
4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume") restored the L1 PM Substates Capability after resume, which reduced power consumption by making the ASPM L1.x states work after resume. a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"") reverted 4ff116d0d5fd because resume failed on some systems, so power consumption after resume increased again. a7152be79b62 mentioned that we restore L1 PM substate configuration even though ASPM L1 may already be enabled. This is due the fact that the pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state(). Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec 5.5.4 more closely by: 1) Do not restore ASPM configuration in pci_restore_pcie_state() but do that after PCIe capability is restored in pci_restore_aspm_state() following PCIe r6.1, sec 5.5.4. 2) If BIOS reenables L1SS, particularly L1.2, we need to clear the enables in the right order, downstream before upstream. Defer restoring the L1SS config until we are at the downstream component. Then update the config for both ends of the link in the prescribed order. 3) Program ASPM L1 PM substate configuration before L1 enables. 4) Program ASPM L1 PM substate enables last, after rest of the fields in the capability are programmed. [bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores in pci_restore_pcie_state()] Link: https://lore.kernel.org/r/20240128233212.1139663-3-david.e.box@linux.intel.com Link: https://lore.kernel.org/r/20240128233212.1139663-4-david.e.box@linux.intel.com Link: https://lore.kernel.org/r/20240223205851.114931-5-helgaas@kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877 Co-developed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Co-developed-by: David E. Box <david.e.box@linux.intel.com> Reported-by: Koba Ko <koba.ko@canonical.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Tasev Nikola <tasev.stefanoska@skynet.be> # Asus UX305FA Cc: Mark Enriquez <enriquezmark36@gmail.com> Cc: Thomas Witt <kernel@witt.link> Cc: Werner Sembach <wse@tuxedocomputers.com> Cc: Vidya Sagar <vidyas@nvidia.com>
2024-03-08PCI/AER: Generalize TLP Header Log readingIlpo Järvinen
Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs 7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after AER as the struct aer_header_log_regs. Also, not all places that handle TLP Header Log use the struct and the struct members are named individually. Generalize the struct name and members, and use it consistently where TLP Header Log is being handled so that a pcie_read_tlp_log() helper can be easily added. Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: drop ixgbe changes for now, tidy whitespace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-08PCI/AER: Use explicit register size for PCI_ERR_CAPIlpo Järvinen
Use u32 for PCIe AER Capability register variable and name it "aercc" (Advanced Error Capabilities and Control register, PCIe r6.1 sec 7.8.4.7) instead of "temp". Link: https://lore.kernel.org/r/20240206135717.8565-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: make subject more specific and match similar previous patches] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-07PCI/AER: Block runtime suspend when handling errorsStanislaw Gruszka
PM runtime can be done simultaneously with AER error handling. Avoid that by using pm_runtime_get_sync() before and pm_runtime_put() after reset in pcie_do_recovery() for all recovering devices. pm_runtime_get_sync() will increase dev->power.usage_count counter to prevent any possible future request to runtime suspend a device. It will also resume a device, if it was previously in D3hot state. I tested with igc device by doing simultaneous aer_inject and rpm suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control and can reproduce: igc 0000:02:00.0: not ready 65535ms after bus reset; giving up pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25) pcieport 0000:00:1c.2: AER: subordinate device reset failed pcieport 0000:00:1c.2: AER: device recovery failed igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible The problem disappears when this patch is applied. Link: https://lore.kernel.org/r/20240212120135.146068-1-stanislaw.gruszka@linux.intel.com Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Cc: <stable@vger.kernel.org>
2024-03-07PCI/ASPM: Move pci_save_ltr_state() to aspm.cDavid E. Box
Even when CONFIG_PCIEASPM is not set, we save and restore the LTR Capability so that if ASPM L1.2 and LTR were configured by the platform, ASPM L1.2 will still work after suspend/resume, when that platform configuration may be lost. See dbbfadf23190 ("PCI/ASPM: Save LTR Capability for suspend/resume"). Since ASPM L1.2 depends on the LTR Capability, move the save/restore code to the part of aspm.c that is always compiled regardless of CONFIG_PCIEASPM. No functional change intended. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20240128233212.1139663-5-david.e.box@linux.intel.com [bhelgaas: commit log, reorder to make this a pure move] Link: https://lore.kernel.org/r/20240223205851.114931-4-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-07PCI/ASPM: Always build aspm.cDavid E. Box
Some ASPM-related tasks, such as save and restore of LTR and L1SS capabilities, still need to be performed when CONFIG_PCIEASPM is not enabled. To prepare for these changes, wrap the current code in aspm.c with an #ifdef and always build the file. Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com [bhelgaas: split build change from function moves] Link: https://lore.kernel.org/r/20240223205851.114931-3-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-07PCI/ASPM: Move pci_configure_ltr() to aspm.cDavid E. Box
The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2 state and is only configured when CONFIG_PCIEASPM is set. Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since they only build when CONFIG_PCIEASPM is set. No functional change intended. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com [bhelgaas: commit log, split build change from function moves] Link: https://lore.kernel.org/r/20240223205851.114931-2-helgaas@kernel.org Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-03-05PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root PortsPaul Menzel
Commit 5459c0b70467 ("PCI/DPC: Quirk PIO log size for certain Intel Root Ports") and commit 3b8803494a06 ("PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports") add quirks for Ice, Tiger and Alder Lake Root Ports. System firmware for Raptor Lake still has the bug, so Linux logs the warning below on several Raptor Lake systems like Dell Precision 3581 with Intel Raptor Lake processor (0W18NX) system firmware/BIOS version 1.10.1. pci 0000:00:07.0: [8086:a76e] type 01 class 0x060400 pci 0000:00:07.0: DPC: RP PIO log size 0 is invalid pci 0000:00:07.1: [8086:a73f] type 01 class 0x060400 pci 0000:00:07.1: DPC: RP PIO log size 0 is invalid Apply the quirk for Raptor Lake Root Ports as well. This also enables the DPC driver to dump the RP PIO Log registers when DPC is triggered. Link: https://lore.kernel.org/r/20240305113057.56468-1-pmenzel@molgen.mpg.de Reported-by: Niels van Aert <nvaert1986@hotmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218560 Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Niels van Aert <nvaert1986@hotmail.com>
2024-02-28PCI/DPC: Ignore Surprise Down error on hot removalSmita Koralahalli
According to PCIe r6.0 sec 6.7.6 [1], async removal with DPC may result in surprise down error. This error is expected and is just a side-effect of async remove. Ignore surprise down error generated as a side-effect of async remove. Typically, this error is benign as the pciehp handler invoked by PDC or/and DLLSC alongside DPC, de-enumerates and brings down the device appropriately, but the error messages might confuse users. Get rid of these irritating log messages with a 1s delay while pciehp waits for DPC recovery. The implementation is as follows: On an async remove a DPC is triggered along with a Presence Detect State change and/or DLL State Change. Determine it's an async remove by checking for DPC Trigger Status in DPC Status Register and Surprise Down Error Status in AER Uncorrected Error Status to be non-zero. If true, treat the DPC event as a side-effect of async remove, clear the error status registers and continue with hot-plug tear down routines. If not, follow the existing routine to handle AER and DPC errors. Masking Surprise Down Errors was explored as an alternative approach, but discarded due to the odd behavior that masking only avoids the interrupt, but still records an error per PCIe r6.0, sec 6.2.3.2.2. That stale error would be reported the next time some error other than Surprise Down is handled. Dmesg before: pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000 pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID) pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000 pcieport 0000:00:01.4: [ 5] SDES (First) nvme nvme2: frozen state error detected, reset controller pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec pcieport 0000:00:01.4: AER: subordinate device reset failed pcieport 0000:00:01.4: AER: device recovery failed pcieport 0000:00:01.4: pciehp: Slot(16): Link Down nvme2n1: detected capacity change from 1953525168 to 0 pci 0000:04:00.0: Removing from iommu group 49 Dmesg after: pcieport 0000:00:01.4: pciehp: Slot(16): Link Down nvme1n1: detected capacity change from 1953525168 to 0 pci 0000:04:00.0: Removing from iommu group 37 [1] PCI Express Base Specification Revision 6.0, Dec 16 2021. https://members.pcisig.com/wg/PCI-SIG/document/16609 Link: https://lore.kernel.org/r/20240207181854.121335-1-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-02-20PCI: Mark 3ware-9650SE Root Port Extended Tags as brokenJörg Wedekind
Per PCIe r6.1, sec 2.2.6.2 and 7.5.3.4, a Requester may not use 8-bit Tags unless its Extended Tag Field Enable is set, but all Receivers/Completers must handle 8-bit Tags correctly regardless of their Extended Tag Field Enable. Some devices do not handle 8-bit Tags as Completers, so add a quirk for them. If we find such a device, we disable Extended Tags for the entire hierarchy to make peer-to-peer DMA possible. The 3ware 9650SE seems to have issues with handling 8-bit tags. Mark it as broken. This fixes PCI Parity Errors like : 3w-9xxx: scsi0: ERROR: (0x06:0x000C): PCI Parity Error: clearing. 3w-9xxx: scsi0: ERROR: (0x06:0x000D): PCI Abort: clearing. 3w-9xxx: scsi0: ERROR: (0x06:0x000E): Controller Queue Error: clearing. 3w-9xxx: scsi0: ERROR: (0x06:0x0010): Microcontroller Error: clearing. Link: https://lore.kernel.org/r/20240219132811.8351-1-joerg@wedekind.de Fixes: 60db3a4d8cc9 ("PCI: Enable PCIe Extended Tags if supported") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=202425 Signed-off-by: Jörg Wedekind <joerg@wedekind.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-02-12PCI: Move devres code from pci.c to devres.cPhilipp Stanner
The file pci.c is very large and contains a number of devres functions. These functions should now reside in devres.c. Move as much devres-specific code from pci.c to devres.c as possible. There are a few callers left in pci.c that do devres operations. These should be ported in the future. Add corresponding TODOs. The reason they are not moved right now in this commit is that PCI's devres currently implements a sort of "hybrid-mode": pci_request_region(), for instance, does not have a corresponding pcim_ equivalent, yet. Instead, the function can be made managed by previously calling pcim_enable_device() (instead of pci_enable_device()). This makes it unreasonable to move pci_request_region() to devres.c. Moving the functions would require changes to PCI's API and is, therefore, left for future work. In summary, this commit serves as a preparation step for a following patch series that will cleanly separate the PCI's managed and unmanaged API. Link: https://lore.kernel.org/r/20240131090023.12331-5-pstanner@redhat.com Suggested-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-02-12PCI: Move PCI-specific devres code to drivers/pci/Philipp Stanner
The pcim_*() functions in lib/devres.c are guarded by an #ifdef CONFIG_PCI and, thus, don't belong to this file. They are only ever used for PCI and are not generic infrastructure. Move all pcim_*() functions in lib/devres.c to drivers/pci/devres.c. Adjust the Makefile. Add drivers/pci/devres.c to Documentation. Link: https://lore.kernel.org/r/20240131090023.12331-4-pstanner@redhat.com Suggested-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-02-12PCI: Move pci_iomap.c to drivers/pci/Philipp Stanner
The entirety of pci_iomap.c is guarded by an #ifdef CONFIG_PCI. It, consequently, does not belong to lib/ because it is not generic infrastructure. Move pci_iomap.c to drivers/pci/ and implement the necessary changes to Makefiles and Kconfigs. Update MAINTAINERS file. Update Documentation. Link: https://lore.kernel.org/r/20240131090023.12331-3-pstanner@redhat.com [bhelgaas: squash in https://lore.kernel.org/r/20240212150934.24559-1-pstanner@redhat.com] Suggested-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2024-02-02PCI/AER: Clean up version indentation in ABI docsJohan Hovold
The 'KernelVersion' lines use a single space as separator instead of a tab so the values are not aligned with the other AER attribute fields. Link: https://lore.kernel.org/r/20240202131635.11405-3-johan+linaro@kernel.org Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-02-02PCI/AER: Fix rootport attribute paths in ABI docsJohan Hovold
The 'aer_stats' directory never made it into the sixth and final revision of the series adding the sysfs AER attributes. Link: https://lore.kernel.org/r/20240202131635.11405-2-johan+linaro@kernel.org Link: https://lore.kernel.org/lkml/20180621184822.GB14136@bhelgaas-glaptop.roam.corp.google.com/ Fixes: 12833017e581 ("PCI/AER: Add sysfs attributes for rootport cumulative stats") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-01-31pci_iounmap(): Fix MMIO mapping leakPhilipp Stanner
The #ifdef ARCH_HAS_GENERIC_IOPORT_MAP accidentally also guards iounmap(), which means MMIO mappings are leaked. Move the guard so we call iounmap() for MMIO mappings. Fixes: 316e8d79a095 ("pci_iounmap'2: Electric Boogaloo: try to make sense of it all") Link: https://lore.kernel.org/r/20240131090023.12331-2-pstanner@redhat.com Reported-by: Danilo Krummrich <dakr@redhat.com> Suggested-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> # v5.15+
2024-01-29PCI: Place interrupt related code into irq.cIlpo Järvinen
Interrupt related code is spread into irq.c, pci.c, and setup-irq.c. Group them into pre-existing irq.c. Link: https://lore.kernel.org/r/20240129113655.3368-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-01-22PCI/DPC: Print all TLP Prefixes, not just the firstIlpo Järvinen
The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec 7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading from the first DWORD, so we print only the first PIO TLP Prefix (duplicated several times), and we never print the second, third, etc., Prefixes. Add the iteration count based offset calculation into the config read. Fixes: f20c4ea49ec4 ("PCI/DPC: Add eDPC support") Link: https://lore.kernel.org/r/20240118110815.3867-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: add user-visible details to commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-01-21Linux 6.8-rc1v6.8-rc1Linus Torvalds
2024-01-21Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "Some fixes, Some refactoring, some minor features: - Assorted prep work for disk space accounting rewrite - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this makes our trigger context more explicit - A few fixes to avoid excessive transaction restarts on multithreaded workloads: fstests (in addition to ktest tests) are now checking slowpath counters, and that's shaking out a few bugs - Assorted tracepoint improvements - Starting to break up bcachefs_format.h and move on disk types so they're with the code they belong to; this will make room to start documenting the on disk format better. - A few minor fixes" * tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits) bcachefs: Improve inode_to_text() bcachefs: logged_ops_format.h bcachefs: reflink_format.h bcachefs; extents_format.h bcachefs: ec_format.h bcachefs: subvolume_format.h bcachefs: snapshot_format.h bcachefs: alloc_background_format.h bcachefs: xattr_format.h bcachefs: dirent_format.h bcachefs: inode_format.h bcachefs; quota_format.h bcachefs: sb-counters_format.h bcachefs: counters.c -> sb-counters.c bcachefs: comment bch_subvolume bcachefs: bch_snapshot::btime bcachefs: add missing __GFP_NOWARN bcachefs: opts->compression can now also be applied in the background bcachefs: Prep work for variable size btree node buffers bcachefs: grab s_umount only if snapshotting ...
2024-01-21Merge tag 'timers-core-2024-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for time and clocksources: - A fix for the idle and iowait time accounting vs CPU hotplug. The time is reset on CPU hotplug which makes the accumulated systemwide time jump backwards. - Assorted fixes and improvements for clocksource/event drivers" * tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug clocksource/drivers/ep93xx: Fix error handling during probe clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings clocksource/timer-riscv: Add riscv_clock_shutdown callback dt-bindings: timer: Add StarFive JH8100 clint dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
2024-01-21Merge tag 'powerpc-6.8-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Aneesh Kumar: - Increase default stack size to 32KB for Book3S Thanks to Michael Ellerman. * tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Increase default stack size to 32KB
2024-01-21bcachefs: Improve inode_to_text()Kent Overstreet
Add line breaks - inode_to_text() is now much easier to read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: logged_ops_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: reflink_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; extents_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: ec_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: subvolume_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: snapshot_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: alloc_background_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: xattr_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: dirent_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: inode_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs; quota_format.hKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: sb-counters_format.hKent Overstreet
bcachefs_format.h has gotten too big; let's do some organizing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: counters.c -> sb-counters.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: comment bch_subvolumeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch_snapshot::btimeKent Overstreet
Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: add missing __GFP_NOWARNKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet
The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Prep work for variable size btree node buffersKent Overstreet
bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: grab s_umount only if snapshottingSu Yue
When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>