Age | Commit message (Collapse) | Author |
|
Clean up when virtfn setup fails to prevent NULL pointer dereference
during device removal. The kernel oops below occurred due to incorrect
error handling flow when pci_setup_device() fails.
Add pci_iov_scan_device(), which handles virtfn allocation and setup and
cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need
to call pci_stop_and_remove_bus_device(). This prevents accessing
partially initialized virtfn devices during removal.
BUG: kernel NULL pointer dereference, address: 00000000000000d0
RIP: 0010:device_del+0x3d/0x3d0
Call Trace:
pci_remove_bus_device+0x7c/0x100
pci_iov_add_virtfn+0xfa/0x200
sriov_enable+0x208/0x420
mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core]
sriov_numvfs_store+0xae/0x1a0
Link: https://lore.kernel.org/r/20250310084524.599225-1-shayd@nvidia.com
Fixes: e3f30d563a38 ("PCI: Make pci_destroy_dev() concurrent safe")
Signed-off-by: Shay Drory <shayd@nvidia.com>
[bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <kbusch@kernel.org>
|
|
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return
it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply
assign the fallback speed (2.5GT/s) into supported_speeds variable to share
the normal return path that calls pcie_supported_speeds2target_speed() to
calculate __fls().
This code path is not very likely to execute because
pcie_get_supported_speeds() should provide valid ->supported_speeds but a
spec violating device could fail to synthesize any speed in
pcie_get_supported_speeds(). It could also happen in case the
supported_speeds intersection is empty (also a violation of the current
PCIe specs).
Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com
Fixes: de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
PCIe hotplug can operate in poll mode without interrupt handlers using a
polling kthread only. eb34da60edee ("PCI: pciehp: Disable hotplug
interrupt during suspend") failed to consider that and enables HPIE
(Hot-Plug Interrupt Enable) unconditionally when resuming the Port.
Only set HPIE if non-poll mode is in use. This makes
pcie_enable_interrupt() match how pcie_enable_notification() already
handles HPIE.
Link: https://lore.kernel.org/r/20250321162114.3939-1-ilpo.jarvinen@linux.intel.com
Fixes: eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
|
|
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.
However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip). One example is devices behind a VMD PCI
bridge. In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table. The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s). Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.
Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis. Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.
Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.
This fixes using devices behind a VMD bridge on Xen PV hardware domains.
Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them. By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
bodge devices behind a VMD bridge do work fine when use from a Linux Xen
hardware domain. That's the whole point of the series.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-ID: <20250219092059.90850-4-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
MSI remapping bypass (directly configuring MSI entries for devices on the
VMD bus) won't work under Xen, as Xen is not aware of devices in such bus,
and hence cannot configure the entries using the pIRQ interface in the PV
case, and in the PVH case traps won't be setup for MSI entries for such
devices.
Until Xen is aware of devices in the VMD bus prevent the
VMD_FEAT_CAN_BYPASS_MSI_REMAP capability from being used when running as
any kind of Xen guest.
The MSI remapping bypass is an optional feature of VMD bridges, and hence
when running under Xen it will be masked and devices will be forced to
redirect its interrupts from the VMD bridge. That mode of operation must
always be supported by VMD bridges and works when Xen is not aware of
devices behind the VMD bridge.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-ID: <20250219092059.90850-3-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
For some reason, cardbus related io/mem size declarations are in
linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h.
Move all them into one place in pci/pci.h.
Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a
static function.
Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used
outside of the PCI subsystem. They seem to be naturally static functions
but since resource fitting/assignment is split between setup-bus.c and
setup-res.c, they fall into different sides of the divide and need to be
declared.
Move the declarations of pci_reassign_bridge_resources() and
pci_reassign_resource() into pci/pci.h to keep them internal to PCI
subsystem.
Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem.
The comment also falsely advertises it to be for hotplug drivers, yet the
only caller is from sysfs store function. Move the function declaration
into pci/pci.h.
Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
__resource_resize_store() attempts to release all resources of the device
before attempting the resize. The loop, however, only covers standard BARs
(< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned,
pci_reassign_bridge_resources() finds the bridge window still has some
assigned child resources and returns -NOENT which makes
pci_resize_resource() to detect an error and abort the resize.
Change the release loop to cover all resources up to VF BARs which allows
the resize operation to release the bridge windows and attempt to assigned
them again with the different size.
If SR-IOV is enabled, disallow resize as it requires releasing also IOV
resources.
Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com
Fixes: 91fa127794ac ("PCI: Expose PCIe Resizable BAR support via sysfs")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Currently, pci_bridge_d3_possible() encodes a variety of decision factors
when deciding whether a given bridge can be put into D3. A particular one
of note is for "recent enough PCIe ports." Per Rafael [0]:
"There were hardware issues related to PM on x86 platforms predating
the introduction of Connected Standby in Windows. For instance,
programming a port into D3hot by writing to its PMCSR might cause the
PCIe link behind it to go down and the only way to revive it was to
power cycle the Root Complex. And similar."
Thus, this function contains a DMI-based check for post-2015 BIOS.
The above factors (Windows, x86) don't really apply to non-x86 systems, and
also, many such systems don't have BIOS or DMI. However, we'd like to be
able to suspend bridges on non-x86 systems too.
Restrict the "recent enough" check to x86. If we find further
incompatibilities, it probably makes sense to expand on the deny-list
approach (i.e., bridge_d3_blacklist or similar).
Link: https://lore.kernel.org/r/20250320110604.v6.1.Id0a0e78ab0421b6bce51c4b0b87e6aebdfc69ec7@changeid
Link: https://lore.kernel.org/linux-pci/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/ [0]
Link: https://lore.kernel.org/linux-pci/20240227225442.GA249898@bhelgaas/ [1]
Link: https://lore.kernel.org/linux-pci/20240828210705.GA37859@bhelgaas/ [2]
[Brian: rewrite to !X86 based on Rafael's suggestions]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The access to the PCI config space via pci_ops::read and pci_ops::write is
a low-level hardware access. The functions can be accessed with disabled
interrupts even on PREEMPT_RT. The pci_lock is a raw_spinlock_t for this
purpose.
A spinlock_t becomes a sleeping lock on PREEMPT_RT, so it cannot be
acquired with disabled interrupts. The vmd_dev::cfg_lock is accessed in
the same context as the pci_lock.
Make vmd_dev::cfg_lock a raw_spinlock_t type so it can be used with
interrupts disabled.
This was reported as:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
Call Trace:
rt_spin_lock+0x4e/0x130
vmd_pci_read+0x8d/0x100 [vmd]
pci_user_read_config_byte+0x6f/0xe0
pci_read_config+0xfe/0x290
sysfs_kf_bin_read+0x68/0x90
Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Acked-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
[bigeasy: reword commit message]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/r/20250218080830.ufw3IgyX@linutronix.de
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: add back report info from
https://lore.kernel.org/lkml/20241218115951.83062-1-ryotkkr98@gmail.com/]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Zone device pages are used to represent various type of device memory
managed by device drivers. Currently compound zone device pages are not
supported. This is because MEMORY_DEVICE_FS_DAX pages are the only user
of higher order zone device pages and have their own page reference
counting.
A future change will unify FS DAX reference counting with normal page
reference counting rules and remove the special FS DAX reference counting.
Supporting that requires compound zone device pages.
Supporting compound zone device pages requires compound_head() to
distinguish between head and tail pages whilst still preserving the
special struct page fields that are specific to zone device pages.
A tail page is distinguished by having bit zero being set in
page->compound_head, with the remaining bits pointing to the head page.
For zone device pages page->compound_head is shared with page->pgmap.
The page->pgmap field must be common to all pages within a folio, even if
the folio spans memory sections. Therefore pgmap is the same for both
head and tail pages and can be moved into the folio and we can use the
standard scheme to find compound_head from a tail page.
Link: https://lkml.kernel.org/r/67055d772e6102accf85161d0b57b0b3944292bf.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently ZONE_DEVICE page reference counts are initialised by core memory
management code in __init_zone_device_page() as part of the memremap()
call which driver modules make to obtain ZONE_DEVICE pages. This
initialises page refcounts to 1 before returning them to the driver.
This was presumably done because it drivers had a reference of sorts on
the page. It also ensured the page could always be mapped with
vm_insert_page() for example and would never get freed (ie. have a zero
refcount), freeing drivers of manipulating page reference counts.
However it complicates figuring out whether or not a page is free from the
mm perspective because it is no longer possible to just look at the
refcount. Instead the page type must be known and if GUP is used a
secondary pgmap reference is also sometimes needed.
To simplify this it is desirable to remove the page reference count for
the driver, so core mm can just use the refcount without having to account
for page type or do other types of tracking. This is possible because
drivers can always assume the page is valid as core kernel will never
offline or remove the struct page.
This means it is now up to drivers to initialise the page refcount as
required. P2PDMA uses vm_insert_page() to map the page, and that requires
a non-zero reference count when initialising the page so set that when the
page is first mapped.
Link: https://lkml.kernel.org/r/6aedb0ac2886dcc4503cb705273db5b3863a0b66.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Consolidate devicetree resource handling in dw_pcie_host_get_resources().
No functional change intended.
Link: https://lore.kernel.org/r/20250315201548.858189-5-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
|
|
Move devm_pci_alloc_host_bridge() to the beginning of dw_pcie_host_init().
devm_pci_alloc_host_bridge() is generic code that doesn't depend on any DWC
resource, so moving it earlier keeps all the subsequent devicetree-related
code together.
[bhelgaas: reorder earlier in series]
Link: https://lore.kernel.org/r/20250315201548.858189-4-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Rename 'cpu_addr' to 'parent_bus_addr' in the DesignWare ATU configuration.
The ATU translates parent bus addresses to PCI addresses, which are often
the same as CPU addresses but can differ in systems where the bus fabric
translates addresses before passing them to the PCIe controller. This
renaming clarifies the purpose and avoids confusion.
Link: https://lore.kernel.org/r/20250315201548.858189-3-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The msg_res region translates writes into PCIe Message TLPs. Previously we
mapped this region using atu.cpu_addr, the input address programmed into
the ATU.
"cpu_addr" is a misnomer because when a bus fabric translates addresses
between the CPU and the ATU, the ATU input address is different from the
CPU address. A future patch will rename "cpu_addr" and correct the value
to be the ATU input address instead of the CPU physical address.
Map the msg_res region before writing to it using the msg_res resource
start, a CPU physical address.
Link: https://lore.kernel.org/r/20250315201548.858189-2-helgaas@kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
If an error occurs after a successful phy_init() call, then phy_exit()
should be called.
Add the missing call, as already done in the remove function.
Fixes: bbd11bddb398 ("PCI: hisi: Add HiSilicon STB SoC PCIe controller driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[kwilczynski: remove unnecessary hipcie->phy NULL check from
histb_pcie_probe() and squash a patch that removes similar NULL
check for hipcie-phy from histb_pcie_remove() from
https://lore.kernel.org/linux-pci/c369b5d25e17a44984ae5a889ccc28a59a0737f7.1742058005.git.christophe.jaillet@wanadoo.fr]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/8301fc15cdea5d2dac21f57613e8e6922fb1ad95.1740854531.git.christophe.jaillet@wanadoo.fr
|
|
Use devm_clk_bulk_get_all() helper to simplify clock handle code.
No functional changes intended.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[kwilczynski: commit log, refactor to use dev_err_probe()]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250226025628.1681206-1-hongxing.zhu@nxp.com
|
|
Instead of testing the controller register address to distinguish
controller 1 from controller 0 on i.MX8MQ platforms, use the PCI domain
number, which comes from the devicetree 'linux,pci-domain' property.
All relevant devicetrees should already supply 'linux,pci-domain', which
was added by c0b70f05c87f ("arm64: dts: imx8mq: use_dt_domains for pci
node").
Instead of being set directly in imx_pcie_probe(), pci->dbi_base will be
set by the DWC core in dw_pcie_get_resources().
No functional changes intended.
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250226024256.1678103-3-hongxing.zhu@nxp.com
|
|
When running the RK3588 in Endpoint mode, with an Intel host with IOMMU
enabled, the host side prints:
DMAR: VT-d detected Invalidation Time-out Error: SID 0
When running the RK3588 in Endpoint mode, with an AMD host with IOMMU
enabled, the host side prints:
iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01a0]
Rockchip has confirmed that the ATS support for RK3588 only works when
running the PCIe controller in Root Complex (RC) mode, see:
https://lore.kernel.org/linux-pci/93cdce39-1ae6-4939-a3fc-db10be7564e5@rock-chips.com
Usually, to handle these issues, we add a quirk for the PCI vendor and
device ID in drivers/pci/quirks.c with quirk_no_ats(). That is because
we cannot usually modify the capabilities on the EP side. In this case,
we can modify the capabilities on the EP side.
Thus, hide the broken ATS capability on RK3588 when running in EP mode.
That way, we don't need any quirk on the host side, and we see no errors
on the host side, and we can run pci_endpoint_test successfully, with
the IOMMU enabled on the host side.
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
[kwilczynski: commit log, tidy up code comments and error message]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250310094826.842681-6-cassel@kernel.org
|
|
Add dw_pcie_ep_hide_ext_capability() which can be used by an endpoint
controller driver to hide a capability.
This can be useful to hide a capability that is buggy, such that the
host side does not try to enable the buggy capability.
Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250310094826.842681-5-cassel@kernel.org
|
|
If the bitmap or memory allocations fail, then dw_pcie_ep_init_registers()
will incorrectly return a success.
Return -ENOMEM instead.
Fixes: 869bc5253406 ("PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/36dcb6fc-f292-4dd5-bd45-a8c6f9dc3df7@stanley.mountain
|
|
Many functions in PCI use accessor macros such as pci_resource_len(),
which take a BAR index. That index, however, is never checked for
validity, potentially resulting in undefined behavior by overflowing the
array pci_dev.resource in the macro pci_resource_n().
Since many users of those macros directly assign the accessed value to
an unsigned integer, the macros cannot be changed easily anymore to
return -EINVAL for invalid indexes. Consequently, the problem has to be
mitigated in higher layers.
Add pci_bar_index_valid(). Use it where appropriate.
Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org
Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/
Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: correct if-statement condition the pci_bar_index_is_valid()
helper function uses, tidy up code comments]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix typo]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Hot-removal of nested PCI hotplug ports suffers from a long-standing race
condition which can lead to a deadlock: A parent hotplug port acquires
pci_lock_rescan_remove(), then waits for pciehp to unbind from a child
hotplug port. Meanwhile that child hotplug port tries to acquire
pci_lock_rescan_remove() as well in order to remove its own children.
The deadlock only occurs if the parent acquires pci_lock_rescan_remove()
first, not if the child happens to acquire it first.
Several workarounds to avoid the issue have been proposed and discarded
over the years, e.g.:
https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/
A proper fix is being worked on, but needs more time as it is nontrivial
and necessarily intrusive.
Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement during
system sleep") provokes more frequent occurrence of the deadlock when
removing more than one Thunderbolt device during system sleep. The commit
sought to detect device replacement, but also triggered on device removal.
Differentiating reliably between replacement and removal is impossible
because pci_get_dsn() returns 0 both if the device was removed, as well as
if it was replaced with one lacking a Device Serial Number.
Avoid the more frequent occurrence of the deadlock by checking whether the
hotplug port itself was hot-removed. If so, there's no sense in checking
whether its child device was replaced.
This works because the ->resume_noirq() callback is invoked in top-down
order for the entire hierarchy: A parent hotplug port detecting device
replacement (or removal) marks all children as removed using
pci_dev_set_disconnected() and a child hotplug port can then reliably
detect being removed.
Link: https://lore.kernel.org/r/02f166e24c87d6cde4085865cce9adfdfd969688.1741674172.git.lukas@wunner.de
Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/
Tested-by: Kenneth Crudup <kenny@panix.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v6.11+
|
|
The array for the iomapping cookie addresses has a length of
PCI_STD_NUM_BARS. This constant, however, only describes standard BARs;
while PCI can allow for additional, special BARs.
The total number of PCI resources is described by constant
PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars().
Thus, the devres array has so far been too small.
Change the length of the devres array to PCI_NUM_RESOURCES.
Link: https://lore.kernel.org/r/20250312080634.13731-3-phasta@kernel.org
Fixes: bbaff68bf4a4 ("PCI: Add managed partial-BAR request and map infrastructure")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Cc: stable@vger.kernel.org # v6.11+
|
|
The driver walks the MSI descriptors to test whether a descriptor exists
for a given index. That's just abuse of the MSI internals.
The same test can be done with a single function call by looking up whether
there is a Linux interrupt number assigned at the index.
What's worse is that the function is completely unserialized against
modifications of the MSI-X control by operations issued from the interrupt
core. It also brings the PCI/MSI-X internal cached control word out of
sync.
Remove the trainwreck and invoke the function provided by the PCI/MSI core
to update it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.898592817@linutronix.de
|
|
The PCI/TPH driver fiddles with the MSI-X control word of an active
interrupt completely unserialized against concurrent operations issued
from the interrupt core. It also brings the PCI/MSI-X internal cached
control word out of sync.
Provide a function, which has the required serialization and keeps the
control word cache in sync.
Unfortunately this requires to look up and lock the interrupt descriptor,
which should be only done in the interrupt core code. But confining this
particular oddity in the PCI/MSI core is the lesser of all evil. A
interrupt core implementation would require a larger pile of infrastructure
and indirections for dubious value.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.822790423@linutronix.de
|
|
Convert the code to use the new guard(msi_descs_lock).
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20250313130321.758905320@linutronix.de
|
|
Convert the code to use the new guard(msi_descs_lock).
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/all/20250313130321.695027112@linutronix.de
|
|
The Versal Net ACAP (Adaptive Compute Acceleration Platform) devices
incorporate the Coherency and PCIe Gen5 Module, specifically the
Next-Generation Compact Module (CPM5NC).
The integrated CPM5NC block, along with the built-in bridge, can function
as a PCIe Root Port and supports the PCIe Gen5 protocol with data transfer
rates of up to 32 GT/s, and is capable of supporting up to a x16 lane-width
configuration.
Bridge errors are managed using a specific interrupt line designed for
CPM5N. The INTx interrupt support is not available.
Currently in this commit platform specific bridge errors support is not
added.
Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com>
[kwilczynski: commit log, squashed patch to fix an if-statement condition
to ensure that xilinx_cpm_pcie_init_port() does not run on the CPM5NC_HOST
variant from https://lore.kernel.org/linux-pci/20250311072402.1049990-1-thippeswamy.havalige@amd.com]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250224155025.782179-4-thippeswamy.havalige@amd.com
|
|
The IRQ domain allocated for the PCIe controller is not freed if
resource_list_first_type() returns NULL, leading to a resource leak.
This fix ensures properly cleaning up the allocated IRQ domain in
the error path.
Fixes: 49e427e6bdd1 ("Merge branch 'pci/host-probe-refactor'")
Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com>
[kwilczynski: added missing Fixes: tag, refactored to use one of the goto labels]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://lore.kernel.org/r/20250224155025.782179-2-thippeswamy.havalige@amd.com
|
|
In hindsight, there were some crucial subtleties overlooked when moving
{of,acpi}_dma_configure() to driver probe time to allow waiting for
IOMMU drivers with -EPROBE_DEFER, and these have become an
ever-increasing source of problems. The IOMMU API has some fundamental
assumptions that iommu_probe_device() is called for every device added
to the system, in the order in which they are added. Calling it in a
random order or not at all dependent on driver binding leads to
malformed groups, a potential lack of isolation for devices with no
driver, and all manner of unexpected concurrency and race conditions.
We've attempted to mitigate the latter with point-fix bodges like
iommu_probe_device_lock, but it's a losing battle and the time has come
to bite the bullet and address the true source of the problem instead.
The crux of the matter is that the firmware parsing actually serves two
distinct purposes; one is identifying the IOMMU instance associated with
a device so we can check its availability, the second is actually
telling that instance about the relevant firmware-provided data for the
device. However the latter also depends on the former, and at the time
there was no good place to defer and retry that separately from the
availability check we also wanted for client driver probe.
Nowadays, though, we have a proper notion of multiple IOMMU instances in
the core API itself, and each one gets a chance to probe its own devices
upon registration, so we can finally make that work as intended for
DT/IORT/VIOT platforms too. All we need is for iommu_probe_device() to
be able to run the iommu_fwspec machinery currently buried deep in the
wrong end of {of,acpi}_dma_configure(). Luckily it turns out to be
surprisingly straightforward to bootstrap this transformation by pretty
much just calling the same path twice. At client driver probe time,
dev->driver is obviously set; conversely at device_add(), or a
subsequent bus_iommu_probe(), any device waiting for an IOMMU really
should *not* have a driver already, so we can use that as a condition to
disambiguate the two cases, and avoid recursing back into the IOMMU core
at the wrong times.
Obviously this isn't the nicest thing, but for now it gives us a
functional baseline to then unpick the layers in between without many
more awkward cross-subsystem patches. There are some minor side-effects
like dma_range_map potentially being created earlier, and some debug
prints being repeated, but these aren't significantly detrimental. Let's
make things work first, then deal with making them nice.
With the basic flow finally in the right order again, the next step is
probably turning the bus->dma_configure paths inside-out, since all we
really need from bus code is its notion of which device and input ID(s)
to parse the common firmware properties with...
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci-driver.c
Acked-by: Rob Herring (Arm) <robh@kernel.org> # of/device.c
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/e3b191e6fd6ca9a1e84c5e5e40044faf97abb874.1740753261.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This put_device() was accidentally left over from when we changed the code
from using device_register() to calling device_add(). Delete it.
Link: https://lore.kernel.org/r/55b24870-89fb-4c91-b85d-744e35db53c2@stanley.mountain
Fixes: 9885440b16b8 ("PCI: Fix pci_host_bridge struct device release/free handling")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
If device_register(&child->dev) fails, call put_device() to explicitly
release child->dev, per the comment at device_register().
Found by code review.
Link: https://lore.kernel.org/r/20250202062357.872971-1-make24@iscas.ac.cn
Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
|
|
If device_register() fails, call put_device() to give up the reference to
avoid a memory leak, per the comment at device_register().
Found by code review.
Link: https://lore.kernel.org/r/20250225021440.3130264-1-make24@iscas.ac.cn
Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
[bhelgaas: squash Dan Carpenter's double free fix from
https://lore.kernel.org/r/db806a6c-a91b-4e5a-a84b-6b7e01bdac85@stanley.mountain]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
|
|
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(),
pci_rebar_set_size(), etc) as well as pci_restore_state() searched config
space for a Resizable BAR capability. Most devices don't have such a
capability, so this is wasted effort, especially for pci_restore_state().
Search for a Resizable BAR capability once at enumeration-time and cache
the offset so we don't have to search every time we need it. No functional
change intended.
Link: https://lore.kernel.org/r/20250215000301.175097-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Following a reset, a Function may respond to Config Requests with Request
Retry Status (RRS) Completion Status to indicate that it is temporarily
unable to process the Request, but will be able to process the Request in
the future (PCIe r6.0, sec 2.3.1).
If the Configuration RRS Software Visibility feature is enabled and a Root
Complex receives RRS for a config read of the Vendor ID, the Root Complex
completes the Request to the host by returning PCI_VENDOR_ID_PCI_SIG,
0x0001 (sec 2.3.2).
The Config RRS SV feature applies only to Root Ports and is not directly
related to pci_scan_bridge_extend(). Move the RRS SV enable to
set_pcie_port_type() where we handle other PCIe-specific configuration.
Link: https://lore.kernel.org/r/20250303210217.199504-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Fix typos and whitespace errors.
Link: https://lore.kernel.org/r/20250307231715.438518-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Remove the superfluous function dw_pcie_ep_find_ext_capability(),
as it is virtually identical to dw_pcie_find_ext_capability().
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250221202646.395252-3-cassel@kernel.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Fix a kernel oops found while testing the stm32_pcie Endpoint driver
with handling of PERST# deassertion:
During EP initialization, pci_epf_test_alloc_space() allocates all BARs,
which are further freed if epc_set_bar() fails (for instance, due to no
free inbound window).
However, when pci_epc_set_bar() fails, the error path:
pci_epc_set_bar() ->
pci_epf_free_space()
does not clear the previous assignment to epf_test->reg[bar].
Then, if the host reboots, the PERST# deassertion restarts the BAR
allocation sequence with the same allocation failure (no free inbound
window), creating a double free situation since epf_test->reg[bar] was
deallocated and is still non-NULL.
Thus, make sure that pci_epf_alloc_space() and pci_epf_free_space()
invocations are symmetric, and as such, set epf_test->reg[bar] to NULL
when memory is freed.
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
Link: https://lore.kernel.org/r/20250124123043.96112-1-christian.bruel@foss.st.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The static function devm_pci_epc_match() is only invoked within the
devm_pci_epc_destroy(). However, since it was initially introduced,
this new API has had no callers.
Thus, remove both the unused API and the static function.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250217-remove_api-v2-1-b169c9117045@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Looking at section "11.4.4.29 USP_PCIE_RESBAR Registers Summary" in the
Technical Reference Manual (TRM) for RK3588, we can see that none of the
BARs are Fixed BARs, but actually Resizable BARs.
I couldn't find any reference in the TRM for RK3568, but looking at the
downstream PCIe endpoint driver, both RK3568 and RK3588 are treated as
the same, so the BARs on RK3568 must also be Resizable BARs.
Now when we actually have support for Resizable BARs, let's configure
these BARs as such.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250131182949.465530-16-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The support for a specific iATU alignment was added in
commit 2a9a801620ef ("PCI: endpoint: Add support to specify alignment for
buffers allocated to BARs").
This commit specifically mentions both that the alignment by each DWC
based EP driver should match CX_ATU_MIN_REGION_SIZE, and that AM65x
specifically has a 64 KB alignment.
This also matches the CX_ATU_MIN_REGION_SIZE value specified in the
section "12.2.2.4.7 PCIe Subsystem Address Translation" of the Technical
Reference Manual (TRM) for AM65x:
https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf
This higher value, 1 MB, was obviously an ugly hack used to be able to
handle Resizable BARs which have a minimum size of 1 MB.
Now when we actually have support for Resizable BARs, let's configure the
iATU alignment requirement to the actual requirement.
(BARs described as Resizable will still get aligned to 1 MB.)
Cc: stable+noautosel@kernel.org # Depends on PCI endpoint Resizable BARs series
Fixes: 23284ad677a9 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-15-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Looking at section "12.2.2.4.15 PCIe Subsystem BAR Configuration" in the
following Technical Reference Manual (TRM) for AM65x:
https://www.ti.com/lit/ug/spruid7e/spruid7e.pdf
We can see that BAR2 and BAR5 are not Fixed BARs, but actually Resizable
BARs.
Now when we actually have support for Resizable BARs, let's configure
these BARs as such.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250131182949.465530-14-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The DWC databook specifies three different BARn_SIZING_SCHEME_N as:
- Fixed Mask (0)
- Programmable Mask (1)
- Resizable BAR (2)
Each of these sizing schemes have different instructions for how to
initialize the BAR.
The DWC driver currently does not support resizable BARs.
Instead, in order to somewhat support resizable BARs, the DWC EP driver
currently has an ugly hack that force sets a resizable BAR to 1 MB, if
such a BAR is detected.
Additionally, this hack only works if the DWC glue driver also has lied
in their EPC features, and claimed that the resizable BAR is a 1 MB fixed
size BAR.
This is unintuitive (as you somehow need to know that you need to lie in
your EPC features), but other than that it is overly restrictive, since a
resizable BAR is capable of supporting sizes different than 1 MB.
Add proper support for resizable BARs in the DWC EP driver.
Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells
the EPC driver how it wants to configure the BAR.
struct pci_epf_bar only has a single size struct member.
This means that an EPC driver will only be able to set a single supported
size. This is perfectly fine, as we do not need the complexity of allowing
a host to change the size of the BAR. If someone ever wants to support
resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the
future.
With these changes, we allow an EPF driver to configure the size of
Resizable BARs, rather than forcing them to a 1 MB size.
This means that an EPC driver does not need to lie in EPC features, and an
EPF driver will be able to set an arbitrary size (not be forced to a 1 MB
size), just like BAR_PROGRAMMABLE.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-13-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Move dw_pcie_ep_find_ext_capability() so that it is located next to
dw_pcie_ep_find_capability().
Additionally, a follow-up commit requires this to be defined earlier
in order to avoid a forward declaration.
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20250131182949.465530-12-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Add a helper function to convert a size to the representation used by the
Resizable BAR Capability Register.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-11-cassel@kernel.org
[mani: squashed the change that added PCIe spec reference to comments
from https://lore.kernel.org/linux-pci/20250219171454.2903059-2-cassel@kernel.org]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
A resizable BAR is different from a normal BAR in a few ways:
- The minimum size of a resizable BAR is 1 MB.
- Each BAR that is resizable has a Capability and Control register in
the Resizable BAR Capability structure.
These registers contain the supported sizes and the currently selected
size of a resizable BAR.
The supported sizes is a bitmap of the supported sizes. The selected size
is a single value that is equal to one of the supported sizes.
A resizable BAR thus has to be configured differently than a
BAR_PROGRAMMABLE BAR, which usually sets the BAR size/mask in a vendor
specific way.
The PCI endpoint framework currently does not support resizable BARs.
Add a BAR type BAR_RESIZABLE, so that an EPC driver can support resizable
BARs properly.
Note that the pci_epc_set_bar() API takes a struct pci_epf_bar which tells
the EPC driver how it wants to configure the BAR.
struct pci_epf_bar only has a single size struct member.
This means that an EPC driver will only be able to set a single supported
size. This is perfectly fine, as we do not need the complexity of allowing
a host to change the size of the BAR. If someone ever wants to support
resizing a resizable BAR, the pci_epc_set_bar() API can be extended in the
future.
With these changes, we allow an EPF driver to configure the size of
Resizable BARs, rather than forcing them to a 1 MB size.
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250131182949.465530-10-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|