summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)Author
2025-03-21PCI: vmd: Disable MSI remapping bypass under XenRoger Pau Monne
MSI remapping bypass (directly configuring MSI entries for devices on the VMD bus) won't work under Xen, as Xen is not aware of devices in such bus, and hence cannot configure the entries using the pIRQ interface in the PV case, and in the PVH case traps won't be setup for MSI entries for such devices. Until Xen is aware of devices in the VMD bus prevent the VMD_FEAT_CAN_BYPASS_MSI_REMAP capability from being used when running as any kind of Xen guest. The MSI remapping bypass is an optional feature of VMD bridges, and hence when running under Xen it will be masked and devices will be forced to redirect its interrupts from the VMD bridge. That mode of operation must always be supported by VMD bridges and works when Xen is not aware of devices behind the VMD bridge. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Message-ID: <20250219092059.90850-3-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-20PCI: Move cardbus IO size declarations into pci/pci.hIlpo Järvinen
For some reason, cardbus related io/mem size declarations are in linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h. Move all them into one place in pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Make pci_setup_bridge() staticIlpo Järvinen
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a static function. Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Move resource reassignment func declarations into pci/pci.hIlpo Järvinen
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used outside of the PCI subsystem. They seem to be naturally static functions but since resource fitting/assignment is split between setup-bus.c and setup-res.c, they fall into different sides of the divide and need to be declared. Move the declarations of pci_reassign_bridge_resources() and pci_reassign_resource() into pci/pci.h to keep them internal to PCI subsystem. Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.hIlpo Järvinen
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem. The comment also falsely advertises it to be for hotplug drivers, yet the only caller is from sysfs store function. Move the function declaration into pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Fix BAR resizing when VF BARs are assignedIlpo Järvinen
__resource_resize_store() attempts to release all resources of the device before attempting the resize. The loop, however, only covers standard BARs (< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned, pci_reassign_bridge_resources() finds the bridge window still has some assigned child resources and returns -NOENT which makes pci_resize_resource() to detect an error and abort the resize. Change the release loop to cover all resources up to VF BARs which allows the resize operation to release the bridge windows and attempt to assigned them again with the different size. If SR-IOV is enabled, disallow resize as it requires releasing also IOV resources. Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com Fixes: 91fa127794ac ("PCI: Expose PCIe Resizable BAR support via sysfs") Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2025-03-20PCI: Allow PCI bridges to go to D3Hot on all non-x86Manivannan Sadhasivam
Currently, pci_bridge_d3_possible() encodes a variety of decision factors when deciding whether a given bridge can be put into D3. A particular one of note is for "recent enough PCIe ports." Per Rafael [0]: "There were hardware issues related to PM on x86 platforms predating the introduction of Connected Standby in Windows. For instance, programming a port into D3hot by writing to its PMCSR might cause the PCIe link behind it to go down and the only way to revive it was to power cycle the Root Complex. And similar." Thus, this function contains a DMI-based check for post-2015 BIOS. The above factors (Windows, x86) don't really apply to non-x86 systems, and also, many such systems don't have BIOS or DMI. However, we'd like to be able to suspend bridges on non-x86 systems too. Restrict the "recent enough" check to x86. If we find further incompatibilities, it probably makes sense to expand on the deny-list approach (i.e., bridge_d3_blacklist or similar). Link: https://lore.kernel.org/r/20250320110604.v6.1.Id0a0e78ab0421b6bce51c4b0b87e6aebdfc69ec7@changeid Link: https://lore.kernel.org/linux-pci/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/ [0] Link: https://lore.kernel.org/linux-pci/20240227225442.GA249898@bhelgaas/ [1] Link: https://lore.kernel.org/linux-pci/20240828210705.GA37859@bhelgaas/ [2] [Brian: rewrite to !X86 based on Rafael's suggestions] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-18PCI: vmd: Make vmd_dev::cfg_lock a raw_spinlock_t typeRyo Takakura
The access to the PCI config space via pci_ops::read and pci_ops::write is a low-level hardware access. The functions can be accessed with disabled interrupts even on PREEMPT_RT. The pci_lock is a raw_spinlock_t for this purpose. A spinlock_t becomes a sleeping lock on PREEMPT_RT, so it cannot be acquired with disabled interrupts. The vmd_dev::cfg_lock is accessed in the same context as the pci_lock. Make vmd_dev::cfg_lock a raw_spinlock_t type so it can be used with interrupts disabled. This was reported as: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 Call Trace: rt_spin_lock+0x4e/0x130 vmd_pci_read+0x8d/0x100 [vmd] pci_user_read_config_byte+0x6f/0xe0 pci_read_config+0xfe/0x290 sysfs_kf_bin_read+0x68/0x90 Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Acked-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> [bigeasy: reword commit message] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/r/20250218080830.ufw3IgyX@linutronix.de [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: add back report info from https://lore.kernel.org/lkml/20241218115951.83062-1-ryotkkr98@gmail.com/] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-17mm: allow compound zone device pagesAlistair Popple
Zone device pages are used to represent various type of device memory managed by device drivers. Currently compound zone device pages are not supported. This is because MEMORY_DEVICE_FS_DAX pages are the only user of higher order zone device pages and have their own page reference counting. A future change will unify FS DAX reference counting with normal page reference counting rules and remove the special FS DAX reference counting. Supporting that requires compound zone device pages. Supporting compound zone device pages requires compound_head() to distinguish between head and tail pages whilst still preserving the special struct page fields that are specific to zone device pages. A tail page is distinguished by having bit zero being set in page->compound_head, with the remaining bits pointing to the head page. For zone device pages page->compound_head is shared with page->pgmap. The page->pgmap field must be common to all pages within a folio, even if the folio spans memory sections. Therefore pgmap is the same for both head and tail pages and can be moved into the folio and we can use the standard scheme to find compound_head from a tail page. Link: https://lkml.kernel.org/r/67055d772e6102accf85161d0b57b0b3944292bf.1740713401.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/mm_init: move p2pdma page refcount initialisation to p2pdmaAlistair Popple
Currently ZONE_DEVICE page reference counts are initialised by core memory management code in __init_zone_device_page() as part of the memremap() call which driver modules make to obtain ZONE_DEVICE pages. This initialises page refcounts to 1 before returning them to the driver. This was presumably done because it drivers had a reference of sorts on the page. It also ensured the page could always be mapped with vm_insert_page() for example and would never get freed (ie. have a zero refcount), freeing drivers of manipulating page reference counts. However it complicates figuring out whether or not a page is free from the mm perspective because it is no longer possible to just look at the refcount. Instead the page type must be known and if GUP is used a secondary pgmap reference is also sometimes needed. To simplify this it is desirable to remove the page reference count for the driver, so core mm can just use the refcount without having to account for page type or do other types of tracking. This is possible because drivers can always assume the page is valid as core kernel will never offline or remove the struct page. This means it is now up to drivers to initialise the page refcount as required. P2PDMA uses vm_insert_page() to map the page, and that requires a non-zero reference count when initialising the page so set that when the page is first mapped. Link: https://lkml.kernel.org/r/6aedb0ac2886dcc4503cb705273db5b3863a0b66.1740713401.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Asahi Lina <lina@asahilina.net> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chunyan Zhang <zhang.lyra@gmail.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: linmiaohe <linmiaohe@huawei.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17PCI: dwc: Consolidate devicetree handling in dw_pcie_host_get_resources()Bjorn Helgaas
Consolidate devicetree resource handling in dw_pcie_host_get_resources(). No functional change intended. Link: https://lore.kernel.org/r/20250315201548.858189-5-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-03-17PCI: dwc: Call devm_pci_alloc_host_bridge() early in dw_pcie_host_init()Frank Li
Move devm_pci_alloc_host_bridge() to the beginning of dw_pcie_host_init(). devm_pci_alloc_host_bridge() is generic code that doesn't depend on any DWC resource, so moving it earlier keeps all the subsequent devicetree-related code together. [bhelgaas: reorder earlier in series] Link: https://lore.kernel.org/r/20250315201548.858189-4-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-17PCI: dwc: Rename cpu_addr to parent_bus_addr for ATU configurationFrank Li
Rename 'cpu_addr' to 'parent_bus_addr' in the DesignWare ATU configuration. The ATU translates parent bus addresses to PCI addresses, which are often the same as CPU addresses but can differ in systems where the bus fabric translates addresses before passing them to the PCIe controller. This renaming clarifies the purpose and avoids confusion. Link: https://lore.kernel.org/r/20250315201548.858189-3-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-17PCI: dwc: Use resource start as ioremap() input in dw_pcie_pme_turn_off()Frank Li
The msg_res region translates writes into PCIe Message TLPs. Previously we mapped this region using atu.cpu_addr, the input address programmed into the ATU. "cpu_addr" is a misnomer because when a bus fabric translates addresses between the CPU and the ATU, the ATU input address is different from the CPU address. A future patch will rename "cpu_addr" and correct the value to be the ATU input address instead of the CPU physical address. Map the msg_res region before writing to it using the msg_res resource start, a CPU physical address. Link: https://lore.kernel.org/r/20250315201548.858189-2-helgaas@kernel.org Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-16PCI: histb: Fix an error handling path in histb_pcie_probe()Christophe JAILLET
If an error occurs after a successful phy_init() call, then phy_exit() should be called. Add the missing call, as already done in the remove function. Fixes: bbd11bddb398 ("PCI: hisi: Add HiSilicon STB SoC PCIe controller driver") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [kwilczynski: remove unnecessary hipcie->phy NULL check from histb_pcie_probe() and squash a patch that removes similar NULL check for hipcie-phy from histb_pcie_remove() from https://lore.kernel.org/linux-pci/c369b5d25e17a44984ae5a889ccc28a59a0737f7.1742058005.git.christophe.jaillet@wanadoo.fr] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/8301fc15cdea5d2dac21f57613e8e6922fb1ad95.1740854531.git.christophe.jaillet@wanadoo.fr
2025-03-15PCI: imx6: Use devm_clk_bulk_get_all() to fetch clocksRichard Zhu
Use devm_clk_bulk_get_all() helper to simplify clock handle code. No functional changes intended. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> [kwilczynski: commit log, refactor to use dev_err_probe()] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20250226025628.1681206-1-hongxing.zhu@nxp.com
2025-03-15PCI: imx6: Identify controller via 'linux,pci-domain', not addressRichard Zhu
Instead of testing the controller register address to distinguish controller 1 from controller 0 on i.MX8MQ platforms, use the PCI domain number, which comes from the devicetree 'linux,pci-domain' property. All relevant devicetrees should already supply 'linux,pci-domain', which was added by c0b70f05c87f ("arm64: dts: imx8mq: use_dt_domains for pci node"). Instead of being set directly in imx_pcie_probe(), pci->dbi_base will be set by the DWC core in dw_pcie_get_resources(). No functional changes intended. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250226024256.1678103-3-hongxing.zhu@nxp.com
2025-03-14PCI: dw-rockchip: Hide broken ATS capability for RK3588 running in EP modeNiklas Cassel
When running the RK3588 in Endpoint mode, with an Intel host with IOMMU enabled, the host side prints: DMAR: VT-d detected Invalidation Time-out Error: SID 0 When running the RK3588 in Endpoint mode, with an AMD host with IOMMU enabled, the host side prints: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01a0] Rockchip has confirmed that the ATS support for RK3588 only works when running the PCIe controller in Root Complex (RC) mode, see: https://lore.kernel.org/linux-pci/93cdce39-1ae6-4939-a3fc-db10be7564e5@rock-chips.com Usually, to handle these issues, we add a quirk for the PCI vendor and device ID in drivers/pci/quirks.c with quirk_no_ats(). That is because we cannot usually modify the capabilities on the EP side. In this case, we can modify the capabilities on the EP side. Thus, hide the broken ATS capability on RK3588 when running in EP mode. That way, we don't need any quirk on the host side, and we see no errors on the host side, and we can run pci_endpoint_test successfully, with the IOMMU enabled on the host side. Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> [kwilczynski: commit log, tidy up code comments and error message] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250310094826.842681-6-cassel@kernel.org
2025-03-14PCI: dwc: ep: Add dw_pcie_ep_hide_ext_capability()Niklas Cassel
Add dw_pcie_ep_hide_ext_capability() which can be used by an endpoint controller driver to hide a capability. This can be useful to hide a capability that is buggy, such that the host side does not try to enable the buggy capability. Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310094826.842681-5-cassel@kernel.org
2025-03-14PCI: dwc: ep: Return -ENOMEM for allocation failuresDan Carpenter
If the bitmap or memory allocations fail, then dw_pcie_ep_init_registers() will incorrectly return a success. Return -ENOMEM instead. Fixes: 869bc5253406 ("PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Krzysztof Wilczyński <kw@linux.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/36dcb6fc-f292-4dd5-bd45-a8c6f9dc3df7@stanley.mountain
2025-03-14PCI: Check BAR index for validityPhilipp Stanner
Many functions in PCI use accessor macros such as pci_resource_len(), which take a BAR index. That index, however, is never checked for validity, potentially resulting in undefined behavior by overflowing the array pci_dev.resource in the macro pci_resource_n(). Since many users of those macros directly assign the accessed value to an unsigned integer, the macros cannot be changed easily anymore to return -EINVAL for invalid indexes. Consequently, the problem has to be mitigated in higher layers. Add pci_bar_index_valid(). Use it where appropriate. Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/ Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: correct if-statement condition the pci_bar_index_is_valid() helper function uses, tidy up code comments] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-14PCI: pciehp: Avoid unnecessary device replacement checkLukas Wunner
Hot-removal of nested PCI hotplug ports suffers from a long-standing race condition which can lead to a deadlock: A parent hotplug port acquires pci_lock_rescan_remove(), then waits for pciehp to unbind from a child hotplug port. Meanwhile that child hotplug port tries to acquire pci_lock_rescan_remove() as well in order to remove its own children. The deadlock only occurs if the parent acquires pci_lock_rescan_remove() first, not if the child happens to acquire it first. Several workarounds to avoid the issue have been proposed and discarded over the years, e.g.: https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/ A proper fix is being worked on, but needs more time as it is nontrivial and necessarily intrusive. Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") provokes more frequent occurrence of the deadlock when removing more than one Thunderbolt device during system sleep. The commit sought to detect device replacement, but also triggered on device removal. Differentiating reliably between replacement and removal is impossible because pci_get_dsn() returns 0 both if the device was removed, as well as if it was replaced with one lacking a Device Serial Number. Avoid the more frequent occurrence of the deadlock by checking whether the hotplug port itself was hot-removed. If so, there's no sense in checking whether its child device was replaced. This works because the ->resume_noirq() callback is invoked in top-down order for the entire hierarchy: A parent hotplug port detecting device replacement (or removal) marks all children as removed using pci_dev_set_disconnected() and a child hotplug port can then reliably detect being removed. Link: https://lore.kernel.org/r/02f166e24c87d6cde4085865cce9adfdfd969688.1741674172.git.lukas@wunner.de Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") Reported-by: Kenneth Crudup <kenny@panix.com> Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/ Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/ Tested-by: Kenneth Crudup <kenny@panix.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v6.11+
2025-03-14PCI: Fix wrong length of devres arrayPhilipp Stanner
The array for the iomapping cookie addresses has a length of PCI_STD_NUM_BARS. This constant, however, only describes standard BARs; while PCI can allow for additional, special BARs. The total number of PCI resources is described by constant PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars(). Thus, the devres array has so far been too small. Change the length of the devres array to PCI_NUM_RESOURCES. Link: https://lore.kernel.org/r/20250312080634.13731-3-phasta@kernel.org Fixes: bbaff68bf4a4 ("PCI: Add managed partial-BAR request and map infrastructure") Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org # v6.11+
2025-03-13PCI/TPH: Replace the broken MSI-X control word updateThomas Gleixner
The driver walks the MSI descriptors to test whether a descriptor exists for a given index. That's just abuse of the MSI internals. The same test can be done with a single function call by looking up whether there is a Linux interrupt number assigned at the index. What's worse is that the function is completely unserialized against modifications of the MSI-X control by operations issued from the interrupt core. It also brings the PCI/MSI-X internal cached control word out of sync. Remove the trainwreck and invoke the function provided by the PCI/MSI core to update it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/all/20250313130321.898592817@linutronix.de
2025-03-13PCI/MSI: Provide a sane mechanism for TPHThomas Gleixner
The PCI/TPH driver fiddles with the MSI-X control word of an active interrupt completely unserialized against concurrent operations issued from the interrupt core. It also brings the PCI/MSI-X internal cached control word out of sync. Provide a function, which has the required serialization and keeps the control word cache in sync. Unfortunately this requires to look up and lock the interrupt descriptor, which should be only done in the interrupt core code. But confining this particular oddity in the PCI/MSI core is the lesser of all evil. A interrupt core implementation would require a larger pile of infrastructure and indirections for dubious value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/all/20250313130321.822790423@linutronix.de
2025-03-13PCI: hv: Switch MSI descriptor locking to guard()Thomas Gleixner
Convert the code to use the new guard(msi_descs_lock). No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/all/20250313130321.758905320@linutronix.de
2025-03-13PCI/MSI: Switch to MSI descriptor locking to guard()Thomas Gleixner
Convert the code to use the new guard(msi_descs_lock). No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/all/20250313130321.695027112@linutronix.de
2025-03-11PCI: xilinx-cpm: Add support for Versal Net CPM5NC Root Port controllerThippeswamy Havalige
The Versal Net ACAP (Adaptive Compute Acceleration Platform) devices incorporate the Coherency and PCIe Gen5 Module, specifically the Next-Generation Compact Module (CPM5NC). The integrated CPM5NC block, along with the built-in bridge, can function as a PCIe Root Port and supports the PCIe Gen5 protocol with data transfer rates of up to 32 GT/s, and is capable of supporting up to a x16 lane-width configuration. Bridge errors are managed using a specific interrupt line designed for CPM5N. The INTx interrupt support is not available. Currently in this commit platform specific bridge errors support is not added. Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com> [kwilczynski: commit log, squashed patch to fix an if-statement condition to ensure that xilinx_cpm_pcie_init_port() does not run on the CPM5NC_HOST variant from https://lore.kernel.org/linux-pci/20250311072402.1049990-1-thippeswamy.havalige@amd.com] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250224155025.782179-4-thippeswamy.havalige@amd.com
2025-03-11PCI: xilinx-cpm: Fix IRQ domain leak in error path of probeThippeswamy Havalige
The IRQ domain allocated for the PCIe controller is not freed if resource_list_first_type() returns NULL, leading to a resource leak. This fix ensures properly cleaning up the allocated IRQ domain in the error path. Fixes: 49e427e6bdd1 ("Merge branch 'pci/host-probe-refactor'") Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com> [kwilczynski: added missing Fixes: tag, refactored to use one of the goto labels] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250224155025.782179-2-thippeswamy.havalige@amd.com
2025-03-11iommu: Get DT/ACPI parsing into the proper probe pathRobin Murphy
In hindsight, there were some crucial subtleties overlooked when moving {of,acpi}_dma_configure() to driver probe time to allow waiting for IOMMU drivers with -EPROBE_DEFER, and these have become an ever-increasing source of problems. The IOMMU API has some fundamental assumptions that iommu_probe_device() is called for every device added to the system, in the order in which they are added. Calling it in a random order or not at all dependent on driver binding leads to malformed groups, a potential lack of isolation for devices with no driver, and all manner of unexpected concurrency and race conditions. We've attempted to mitigate the latter with point-fix bodges like iommu_probe_device_lock, but it's a losing battle and the time has come to bite the bullet and address the true source of the problem instead. The crux of the matter is that the firmware parsing actually serves two distinct purposes; one is identifying the IOMMU instance associated with a device so we can check its availability, the second is actually telling that instance about the relevant firmware-provided data for the device. However the latter also depends on the former, and at the time there was no good place to defer and retry that separately from the availability check we also wanted for client driver probe. Nowadays, though, we have a proper notion of multiple IOMMU instances in the core API itself, and each one gets a chance to probe its own devices upon registration, so we can finally make that work as intended for DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to be able to run the iommu_fwspec machinery currently buried deep in the wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be surprisingly straightforward to bootstrap this transformation by pretty much just calling the same path twice. At client driver probe time, dev->driver is obviously set; conversely at device_add(), or a subsequent bus_iommu_probe(), any device waiting for an IOMMU really should *not* have a driver already, so we can use that as a condition to disambiguate the two cases, and avoid recursing back into the IOMMU core at the wrong times. Obviously this isn't the nicest thing, but for now it gives us a functional baseline to then unpick the layers in between without many more awkward cross-subsystem patches. There are some minor side-effects like dma_range_map potentially being created earlier, and some debug prints being repeated, but these aren't significantly detrimental. Let's make things work first, then deal with making them nice. With the basic flow finally in the right order again, the next step is probably turning the bus->dma_configure paths inside-out, since all we really need from bus code is its notion of which device and input ID(s) to parse the common firmware properties with... Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10PCI: Remove stray put_device() in pci_register_host_bridge()Dan Carpenter
This put_device() was accidentally left over from when we changed the code from using device_register() to calling device_add(). Delete it. Link: https://lore.kernel.org/r/55b24870-89fb-4c91-b85d-744e35db53c2@stanley.mountain Fixes: 9885440b16b8 ("PCI: Fix pci_host_bridge struct device release/free handling") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-10PCI: Fix reference leak in pci_alloc_child_bus()Ma Ke
If device_register(&child->dev) fails, call put_device() to explicitly release child->dev, per the comment at device_register(). Found by code review. Link: https://lore.kernel.org/r/20250202062357.872971-1-make24@iscas.ac.cn Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: stable@vger.kernel.org
2025-03-10PCI: Fix reference leak in pci_register_host_bridge()Ma Ke
If device_register() fails, call put_device() to give up the reference to avoid a memory leak, per the comment at device_register(). Found by code review. Link: https://lore.kernel.org/r/20250225021440.3130264-1-make24@iscas.ac.cn Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface") Signed-off-by: Ma Ke <make24@iscas.ac.cn> [bhelgaas: squash Dan Carpenter's double free fix from https://lore.kernel.org/r/db806a6c-a91b-4e5a-a84b-6b7e01bdac85@stanley.mountain] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
2025-03-10PCI: Cache offset of Resizable BAR capabilityBjorn Helgaas
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(), pci_rebar_set_size(), etc) as well as pci_restore_state() searched config space for a Resizable BAR capability. Most devices don't have such a capability, so this is wasted effort, especially for pci_restore_state(). Search for a Resizable BAR capability once at enumeration-time and cache the offset so we don't have to search every time we need it. No functional change intended. Link: https://lore.kernel.org/r/20250215000301.175097-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-03-10PCI: Enable Configuration RRS SV earlyBjorn Helgaas
Following a reset, a Function may respond to Config Requests with Request Retry Status (RRS) Completion Status to indicate that it is temporarily unable to process the Request, but will be able to process the Request in the future (PCIe r6.0, sec 2.3.1). If the Configuration RRS Software Visibility feature is enabled and a Root Complex receives RRS for a config read of the Vendor ID, the Root Complex completes the Request to the host by returning PCI_VENDOR_ID_PCI_SIG, 0x0001 (sec 2.3.2). The Config RRS SV feature applies only to Root Ports and is not directly related to pci_scan_bridge_extend(). Move the RRS SV enable to set_pcie_port_type() where we handle other PCIe-specific configuration. Link: https://lore.kernel.org/r/20250303210217.199504-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-08PCI: Fix typosBjorn Helgaas
Fix typos and whitespace errors. Link: https://lore.kernel.org/r/20250307231715.438518-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-08PCI: dwc: ep: Remove superfluous function dw_pcie_ep_find_ext_capability()Niklas Cassel
Remove the superfluous function dw_pcie_ep_find_ext_capability(), as it is virtually identical to dw_pcie_find_ext_capability(). Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250221202646.395252-3-cassel@kernel.org Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oopsChristian Bruel
Fix a kernel oops found while testing the stm32_pcie Endpoint driver with handling of PERST# deassertion: During EP initialization, pci_epf_test_alloc_space() allocates all BARs, which are further freed if epc_set_bar() fails (for instance, due to no free inbound window). However, when pci_epc_set_bar() fails, the error path: pci_epc_set_bar() -> pci_epf_free_space() does not clear the previous assignment to epf_test->reg[bar]. Then, if the host reboots, the PERST# deassertion restarts the BAR allocation sequence with the same allocation failure (no free inbound window), creating a double free situation since epf_test->reg[bar] was deallocated and is still non-NULL. Thus, make sure that pci_epf_alloc_space() and pci_epf_free_space() invocations are symmetric, and as such, set epf_test->reg[bar] to NULL when memory is freed. Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Christian Bruel <christian.bruel@foss.st.com> Link: https://lore.kernel.org/r/20250124123043.96112-1-christian.bruel@foss.st.com [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: endpoint: Remove unused devm_pci_epc_destroy()Zijun Hu
The static function devm_pci_epc_match() is only invoked within the devm_pci_epc_destroy(). However, since it was initially introduced, this new API has had no callers. Thus, remove both the unused API and the static function. Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20250217-remove_api-v2-1-b169c9117045@quicinc.com Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: dw-rockchip: Describe Resizable BARs as Resizable BARsNiklas Cassel
Looking at section "11.4.4.29 USP_PCIE_RESBAR Registers Summary" in the Technical Reference Manual (TRM) for RK3588, we can see that none of the BARs are Fixed BARs, but actually Resizable BARs. I couldn't find any reference in the TRM for RK3568, but looking at the downstream PCIe endpoint driver, both RK3568 and RK3588 are treated as the same, so the BARs on RK3568 must also be Resizable BARs. Now when we actually have support for Resizable BARs, let's configure these BARs as such. Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250131182949.465530-16-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: keystone: Specify correct alignment requirementNiklas Cassel
The support for a specific iATU alignment was added in commit 2a9a801620ef ("PCI: endpoint: Add support to specify alignment for buffers allocated to BARs"). This commit specifically mentions both that the alignment by each DWC based EP driver should match CX_ATU_MIN_REGION_SIZE, and that AM65x specifically has a 64 KB alignment. This also matches the CX_ATU_MIN_REGION_SIZE value specified in the section "12.2.2.4.7 PCIe Subsystem Address Translation" of the Technical Reference Manual (TRM) for AM65x: https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf This higher value, 1 MB, was obviously an ugly hack used to be able to handle Resizable BARs which have a minimum size of 1 MB. Now when we actually have support for Resizable BARs, let's configure the iATU alignment requirement to the actual requirement. (BARs described as Resizable will still get aligned to 1 MB.) Cc: stable+noautosel@kernel.org # Depends on PCI endpoint Resizable BARs series Fixes: 23284ad677a9 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms") Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250131182949.465530-15-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: keystone: Describe Resizable BARs as Resizable BARsNiklas Cassel
Looking at section "12.2.2.4.15 PCIe Subsystem BAR Configuration" in the following Technical Reference Manual (TRM) for AM65x: https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf We can see that BAR2 and BAR5 are not Fixed BARs, but actually Resizable BARs. Now when we actually have support for Resizable BARs, let's configure these BARs as such. Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250131182949.465530-14-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: dwc: ep: Allow EPF drivers to configure the size of Resizable BARsNiklas Cassel
The DWC databook specifies three different BARn_SIZING_SCHEME_N as: - Fixed Mask (0) - Programmable Mask (1) - Resizable BAR (2) Each of these sizing schemes have different instructions for how to initialize the BAR. The DWC driver currently does not support resizable BARs. Instead, in order to somewhat support resizable BARs, the DWC EP driver currently has an ugly hack that force sets a resizable BAR to 1 MB, if such a BAR is detected. Additionally, this hack only works if the DWC glue driver also has lied in their EPC features, and claimed that the resizable BAR is a 1 MB fixed size BAR. This is unintuitive (as you somehow need to know that you need to lie in your EPC features), but other than that it is overly restrictive, since a resizable BAR is capable of supporting sizes different than 1 MB. Add proper support for resizable BARs in the DWC EP driver. Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells the EPC driver how it wants to configure the BAR. struct pci_epf_bar only has a single size struct member. This means that an EPC driver will only be able to set a single supported size. This is perfectly fine, as we do not need the complexity of allowing a host to change the size of the BAR. If someone ever wants to support resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the future. With these changes, we allow an EPF driver to configure the size of Resizable BARs, rather than forcing them to a 1 MB size. This means that an EPC driver does not need to lie in EPC features, and an EPF driver will be able to set an arbitrary size (not be forced to a 1 MB size), just like BAR_PROGRAMMABLE. Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250131182949.465530-13-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: dwc: ep: Move dw_pcie_ep_find_ext_capability()Niklas Cassel
Move dw_pcie_ep_find_ext_capability() so that it is located next to dw_pcie_ep_find_capability(). Additionally, a follow-up commit requires this to be defined earlier in order to avoid a forward declaration. Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250131182949.465530-12-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: endpoint: Add pci_epc_bar_size_to_rebar_cap()Niklas Cassel
Add a helper function to convert a size to the representation used by the Resizable BAR Capability Register. Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250131182949.465530-11-cassel@kernel.org [mani: squashed the change that added PCIe spec reference to comments from https://lore.kernel.org/linux-pci/20250219171454.2903059-2-cassel@kernel.org] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: endpoint: Allow EPF drivers to configure the size of Resizable BARsNiklas Cassel
A resizable BAR is different from a normal BAR in a few ways: - The minimum size of a resizable BAR is 1 MB. - Each BAR that is resizable has a Capability and Control register in the Resizable BAR Capability structure. These registers contain the supported sizes and the currently selected size of a resizable BAR. The supported sizes is a bitmap of the supported sizes. The selected size is a single value that is equal to one of the supported sizes. A resizable BAR thus has to be configured differently than a BAR_PROGRAMMABLE BAR, which usually sets the BAR size/mask in a vendor specific way. The PCI endpoint framework currently does not support resizable BARs. Add a BAR type BAR_RESIZABLE, so that an EPC driver can support resizable BARs properly. Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells the EPC driver how it wants to configure the BAR. struct pci_epf_bar only has a single size struct member. This means that an EPC driver will only be able to set a single supported size. This is perfectly fine, as we do not need the complexity of allowing a host to change the size of the BAR. If someone ever wants to support resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the future. With these changes, we allow an EPF driver to configure the size of Resizable BARs, rather than forcing them to a 1 MB size. Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250131182949.465530-10-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-08PCI: endpoint: pci-epf-test: Handle endianness properlyNiklas Cassel
The struct pci_epf_test_reg is the actual data in pci-epf-test's test_reg BAR (usually BAR0), which the host uses to send commands (etc.), and which pci-epf-test uses to send back status codes. pci-epf-test currently reads and writes this data without any endianness conversion functions, which means that pci-epf-test is completely broken on big-endian endpoint systems. PCI devices are inherently little-endian, and the data stored in the PCI BARs should be in little-endian. Use endianness conversion functions when reading and writing data to struct pci_epf_test_reg so that pci-epf-test will behave correctly on big-endian endpoint systems. Fixes: 349e7a85b25f ("PCI: endpoint: functions: Add an EP function to test PCI") Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250127161242.104651-2-cassel@kernel.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-07PCI: Do not claim to release resource falselyIlpo Järvinen
pci_release_resource() will print "... releasing" regardless of the resource being assigned or not. Move the print after the res->parent check to avoid claiming the kernel would be releasing an unassigned resource. Likely, none of the current callers pass a resource that is unassigned so this change is mostly to correct the non-sensical order than to remove errorneous printouts. Link: https://lore.kernel.org/r/20250307140922.5776-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-07PCI: Increase Resizable BAR support from 512 GB to 128 TBZhiyuan Dai
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to 128 TB in the Resizable BAR Capability register. Larger sizes can be advertised via the Capability register, but that requires an API change. Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the sizes we currently support from 512 GB to 128 TB. Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-06PCI/DOE: Rename Discovery Response Data Object Contents to typeAlistair Francis
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and "Next Index" as the fields in the DOE Discovery Response Data Object. The DOE driver currently uses both the terms 'type' and 'prot' for the second element. Rename all uses of the DOE Discovery Response Data Object to use 'type' as the second element of the object header, instead of type/prot as it currently is. Link: https://lore.kernel.org/r/20250306075211.1855177-2-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>