Age | Commit message (Collapse) | Author |
|
When CONFIG_PCI_P2PDMA=y (which is basically enabled on all
large x86 distros), it maps the PFN's via a ZONE_DEVICE
mapping using devm_memremap_pages(). The mapped virtual
address range corresponds to the pci_resource_start()
of the BAR address and size corresponding to the BAR length.
When KASLR is enabled, the direct map range of the kernel is
reduced to the size of physical memory plus additional padding.
If the BAR address is beyond this limit, PCI peer to peer DMA
mappings fail.
Fix this by not shrinking the size of the direct map when
CONFIG_PCI_P2PDMA=y.
This reduces the total available entropy, but it's better than
the current work around of having to disable KASLR completely.
[ mingo: Clarified the changelog to point out the broad impact ... ]
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/Kconfig
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/lkml/20250206023201.1481957-1-balbirs@nvidia.com/
Link: https://lore.kernel.org/r/20250206234234.1912585-1-balbirs@nvidia.com
--
arch/x86/mm/kaslr.c | 10 ++++++++--
drivers/pci/Kconfig | 6 ++++++
2 files changed, 14 insertions(+), 2 deletions(-)
|
|
The .get_power() and .set_power() function pointers in struct
cpci_hp_controller_ops were declared but never implemented by any
driver.
Thus, to improve code readability and reduce resource usage,
remove these pointers and the code that has never been used.
Link: https://lore.kernel.org/r/20250217185638.398925-1-trintaeoitogc@gmail.com
Signed-off-by: Guilherme Giacomo Simoes <trintaeoitogc@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is
presented through AER and DPC Capability registers. The TLP Prefix Log
Register is not present with Flit mode, and the register becomes an
extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13).
Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the
extended TLP Header Log when the Link is in Flit mode. As the Prefix Log
and Extended TLP Header are not present at the same time, a C union can be
used.
Determining whether the error occurred while the Link was in Flit mode is a
bit complicated. In case of AER, the Advanced Error Capabilities and
Control Register directly tells whether the error was logged in Flit mode
or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14),
unfortunately, does not contain the same information.
Unlike AER, the DPC Capability does not provide a way to discern whether
the error was logged in Flit mode (this is confirmed by PCI WG to be an
oversight in the spec). DPC will bring the Link down immediately following
an error, which makes it impossible to acquire the Flit Mode Status
directly from the Link Status 2 register because Flit Mode Status is only
set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use
the flit_mode value stored into the struct pci_bus.
Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
PCIe r6.0 added Flit mode, which mainly alters HW behavior, but there are
some OS visible changes. The OS visible changes include differences in the
layout of some capabilities and interpretation of the TLP headers (in
diagnostics situations).
To be able to determine which mode the PCIe Link is using, store the Flit
Mode Status (PCIe r6.1 sec 7.5.3.20) information in addition to the Link
speed into struct pci_bus in pcie_update_link_speed().
Link: https://lore.kernel.org/r/20250207161836.2755-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: use unsigned int:1 instead of bool, update flit_mode setting]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
include/linux/pci.h provides low-level pci_printk() interface that is
only used by AER because it needs to print the same message with
different levels depending on the error severity. No other PCI code
uses that functionality and calls pci_<level>() logging functions
directly with the appropriate level.
Descope pci_printk() into AER as aer_printk().
Link: https://lore.kernel.org/r/20241216161012.1774-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: retain pci_printk() for now since shpchp still uses it and I
moved those patches to a different branch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Commit 47c8846a49ba ("PCI: Extend ACS configurability") introduced bugs
that fail to configure ACS ctrl to the value specified by the kernel
parameter. Essentially there are two bugs:
1) When ACS is configured for multiple PCI devices using 'config_acs'
kernel parameter, it results into error "PCI: Can't parse ACS command
line parameter". This is due to a bug that doesn't preserve the ACS
mask, but instead overwrites the mask with value 0.
For example, using 'config_acs' to configure ACS ctrl for multiple BDFs
fails:
Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
PCI: Can't parse ACS command line parameter
pci 0020:02:00.0: ACS mask = 0x007f
pci 0020:02:00.0: ACS flags = 0x007b
pci 0020:02:00.0: Configured ACS to 0x007b
After this fix:
Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
pci 0020:02:00.0: ACS mask = 0x007f
pci 0020:02:00.0: ACS flags = 0x007b
pci 0020:02:00.0: ACS control = 0x005f
pci 0020:02:00.0: ACS fw_ctrl = 0x0053
pci 0020:02:00.0: Configured ACS to 0x007b
pci 0039:00:00.0: ACS mask = 0x0070
pci 0039:00:00.0: ACS flags = 0x0050
pci 0039:00:00.0: ACS control = 0x001d
pci 0039:00:00.0: ACS fw_ctrl = 0x0000
pci 0039:00:00.0: Configured ACS to 0x0050
2) In the bit manipulation logic, we copy the bit from the firmware
settings when mask bit 0.
For example, 'disable_acs_redir' fails to clear all three ACS P2P redir
bits due to the wrong bit fiddling:
Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
pci 0020:02:00.0: ACS mask = 0x002c
pci 0020:02:00.0: ACS flags = 0xffd3
pci 0020:02:00.0: Configured ACS to 0xfffb
pci 0030:02:00.0: ACS mask = 0x002c
pci 0030:02:00.0: ACS flags = 0xffd3
pci 0030:02:00.0: Configured ACS to 0xffdf
pci 0039:00:00.0: ACS mask = 0x002c
pci 0039:00:00.0: ACS flags = 0xffd3
pci 0039:00:00.0: Configured ACS to 0xffd3
After this fix:
Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
pci 0020:02:00.0: ACS mask = 0x002c
pci 0020:02:00.0: ACS flags = 0xffd3
pci 0020:02:00.0: ACS control = 0x007f
pci 0020:02:00.0: ACS fw_ctrl = 0x007b
pci 0020:02:00.0: Configured ACS to 0x0053
pci 0030:02:00.0: ACS mask = 0x002c
pci 0030:02:00.0: ACS flags = 0xffd3
pci 0030:02:00.0: ACS control = 0x005f
pci 0030:02:00.0: ACS fw_ctrl = 0x005f
pci 0030:02:00.0: Configured ACS to 0x0053
pci 0039:00:00.0: ACS mask = 0x002c
pci 0039:00:00.0: ACS flags = 0xffd3
pci 0039:00:00.0: ACS control = 0x001d
pci 0039:00:00.0: ACS fw_ctrl = 0x0000
pci 0039:00:00.0: Configured ACS to 0x0000
Link: https://lore.kernel.org/r/20250207030338.456887-1-tdave@nvidia.com
Fixes: 47c8846a49ba ("PCI: Extend ACS configurability")
Signed-off-by: Tushar Dave <tdave@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
|
|
Update device ID for the Qcom SA8775P SoC.
Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com>
Link: https://lore.kernel.org/r/20241205065422.2515086-3-quic_msarkar@quicinc.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Remove a leftover assert for mac_reset line in mtk_pcie_en7581_power_up().
This is not harmful since EN7581 does not requires mac_reset and
mac_reset is not defined in EN7581 device tree.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20250201-pcie-en7581-remove-mac_reset-v2-1-a06786cdc683@kernel.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
This driver is used to control the power state of the devices attached to
the PCI slots. Currently, it controls the voltage rails of the PCI slots
defined in the devicetree node of the root port.
The voltage rails for PCI slots are documented in the DT-schema:
https://github.com/devicetree-org/dt-schema/blob/v2024.11/dtschema/schemas/pci/pci-bus-common.yaml#L153
Since this driver has to work with different kind of slots (PCIe
x1/x4/x8/x16, Mini PCIe, PCI, etc.), the driver is thus using the
of_regulator_bulk_get_all() API to obtain the voltage regulators defined
in the DT node, instead of hardcoding them.
As such, the DT node of the root port should define the relevant supply
properties corresponding to the voltage rails of the PCI slot.
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-5-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The VF driver controls an endpoint attached to the PCI HyperV
controller. An invalidation sent by the PF driver in the host
would be delivered *to* the endpoint driver by the controller
driver.
Thus, correct the wording within the comment.
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250207190716.89995-1-eahariha@linux.microsoft.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The pwrctrl core will rescan the bus once the device is powered on. So
there is no need to continue scanning for the device further.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-3-827473c8fbf4@linaro.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The PCI core will try to access the devices even after pci_stop_dev()
for things like Data Object Exchange (DOE), ASPM, etc.
So, move pci_pwrctrl_unregister() to the near end of pci_destroy_dev()
to make sure that the devices are powered down only after the PCI core
is done with them.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-2-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Current way of creating pwrctrl devices requires iterating through the
child devicetree nodes of the PCI bridge in pci_pwrctrl_create_devices().
Even though it works, it creates confusion as there is no symmetry between
this and pci_pwrctrl_unregister() function that removes the pwrctrl
devices.
So to make these two functions symmetric, move the creation of pwrctrl
devices to pci_scan_device(). During the scan of each device in a slot,
the devicetree node (if exists) for the PCI device will be checked. If it
has the supplies populated, then the pwrctrl device will be created.
Since the PCI device scan happens so early, there would be no "struct
pci_dev" available for the device. So the host bridge is used as the
parent of all pwrctrl devices.
One nice side effect of this move is that, it is now possible to have
pwrctrl devices for PCI bridges as well (to control the supplies of PCI
slots).
Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-1-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Before 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to
avoid use-after-free"), we would free the ASPM link only after the last
function on the bus pertaining to the given link was removed.
That was too late. If function 0 is removed before sibling function,
link->downstream would point to free'd memory after.
After above change, we freed the ASPM parent link state upon any function
removal on the bus pertaining to a given link.
That is too early. If the link is to a PCIe switch with MFD on the upstream
port, then removing functions other than 0 first would free a link which
still remains parent_link to the remaining downstream ports.
The resulting GPFs are especially frequent during hot-unplug, because
pciehp removes devices on the link bus in reverse order.
On that switch, function 0 is the virtual P2P bridge to the internal bus.
Free exactly when function 0 is removed -- before the parent link is
obsolete, but after all subordinate links are gone.
Link: https://lore.kernel.org/r/e12898835f25234561c9d7de4435590d957b85d9.1734924854.git.dns@arista.com
Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Signed-off-by: Daniel Stodden <dns@arista.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
The "shpchp_debug" module parameter is used to enable debug logging. The
generic ability to turn on/off debug prints dynamically covers this use
case already so there is no need for module specific debug handling. The
ctrl_dbg() wrapper also uses a low-level pci_printk() despite always using
KERN_DEBUG level.
Remove "shpchp_debug" parameter and convert ctrl_dbg() to use pci_dbg().
From now on, shpchp can be debugged using the normal dynamic debugger by
setting CONFIG_DYNAMIC_DEBUG=y and then either adding to kernel cmdline:
dyndbg="file drivers/pci/hotplug/shpchp* +p"
or using this command on a running kernel:
echo 'file drivers/pci/hotplug/shpchp* +p' > /sys/kernel/debug/dynamic_debug/control
Link: https://lore.kernel.org/r/20250217095550.2789-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The shpchp hotplug driver defines logging wrapper with generic names which
are just duplicates of existing generic printk() wrappers. They are also
unused so remove them.
Link: https://lore.kernel.org/r/20250217095550.2789-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Convert the last user of dbg() to use ctrl_dbg().
Link: https://lore.kernel.org/r/20241216161012.1774-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The logging in shpchp module init/exit functions is not very useful.
Remove it.
Link: https://lore.kernel.org/r/20241216161012.1774-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
A recent discussion has revealed that using DPM_FLAG_SMART_SUSPEND
unconditionally is generally problematic because it may lead to
situations in which the device's runtime PM information is internally
inconsistent or does not reflect its real state [1].
For this reason, change the handling of DPM_FLAG_SMART_SUSPEND so that
it is only taken into account if it is consistently set by the drivers
of all devices having any PM callbacks throughout dependency graphs in
accordance with the following rules:
- The "smart suspend" feature is only enabled for devices whose drivers
ask for it (that is, set DPM_FLAG_SMART_SUSPEND) and for devices
without PM callbacks unless they have never had runtime PM enabled.
- The "smart suspend" feature is not enabled for a device if it has not
been enabled for the device's parent unless the parent does not take
children into account or it has never had runtime PM enabled.
- The "smart suspend" feature is not enabled for a device if it has not
been enabled for one of the device's suppliers taking runtime PM into
account unless that supplier has never had runtime PM enabled.
Namely, introduce a new device PM flag called smart_suspend that is only
set if the above conditions are met and update all DPM_FLAG_SMART_SUSPEND
users to check power.smart_suspend instead of directly checking the
latter.
At the same time, drop the power.set_active flage introduced recently
in commit 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status
of parents and children") because it is now sufficient to check
power.smart_suspend along with the dev_pm_skip_resume() return value
to decide whether or not pm_runtime_set_active() needs to be called
for the device.
Link: https://lore.kernel.org/linux-pm/CAPDyKFroyU3YDSfw_Y6k3giVfajg3NQGwNWeteJWqpW29BojhQ@mail.gmail.com/ [1]
Fixes: 7585946243d6 ("PM: sleep: core: Restrict power.set_active propagation")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci
Link: https://patch.msgid.link/1914558.tdWV9SEqCh@rjwysocki.net
|
|
Remove and rescan cycle can result in failure to assign a bridge window if
it becomes larger than before the remove. The bridge window size will
include space for disabled Expansion ROM, which can causes the bridge
window to not fit anymore into the same address space slot on rescan if the
Expansion ROM resource was not assigned before the remove. In addition, the
optional resource handling is not internally consistent.
The resource fitting logic supports three main types of optional resources:
- IOV BARs
- Expansion ROMs
- Bridge window size variation due to optional resources
In addition to the above, resizable BARs beyond their current size will
require handling optional variation in resource sizes within the resource
fitting algorithm (not yet done by the resource fitting code).
There are multiple inconsistencies related to optional resource handling:
a) The allocation failure of disabled expansion ROM requires special case
inside assign_requested_resources_sorted().
b) The optionality of disabled expansion ROM is not considered during
bridge window sizing in pbus_size_mem().
c) Setting resource size to zero for optional resource in pbus_size_mem()
is problematic because it makes also the alignment invalid, which is
checked by pdev_sort_resources().
Optional IOV resources have their size set to zero by pbus_size_mem()
but the information about size is stored externally in struct pci_sriov
and complex call-chain trickery in pci_resource_alignment() ensures IOV
resources return a valid alignment despite having zero resource size. A
solution that is specific to IOV resources makes it hard to use the same
solution for other types of resources such as expansion ROM.
Simply changing pbus_size_mem() is not sufficient to fully address the main
issue because it would introduce disparity between bridge window sizing and
resource allocation. Due to size-based ordering of the resource list during
assignment loop, an Expansion ROM resource could steal space from some
other resource and make the other resource not fit if the Expansion ROM is
larger than the other resource. Thus, the resource assignment functions
need to be changed as well.
Make optional resource handling more straightforward. Use
pci_resource_is_optional() to determine if a resource is optional in both
bridge window sizing and assignment failure classification to ensure they
always align. Indicate with a parameter to
assign_requested_resources_sorted() whether it should attempt to allocate
optional resources or not.
Always try first to assign all resources (also when realloc_head is not
provided). This is required for calls from
pci_assign_unassigned_root_bus_resources() that provide realloc_head only
with some of its iterations.
Non-bridge-window optional resources in realloc_head now have add_size 0.
This condition has to be detected in reassign_resources_sorted() before
reassigning them (which would fail as there is no size change). Removing
add_size=0 optional resources entirely from realloc_head might eventually
be doable but further rework in __assign_resources_sorted() is needed first
to support such a change.
Link: https://lore.kernel.org/r/20241216175632.4175-26-ilpo.jarvinen@linux.intel.com
Reported-by: Jia Yao <jia.yao@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219547
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jia Yao <jia.yao@intel.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Resetting a resource is problematic as it prevents attempting to allocate
the resource later, unless something in between restores the resource.
Similarly, if fail_head does not contain all resources that were reset,
those resources cannot be restored later.
The entire reset/restore cycle adds complexity and leaving resources in the
reset state causes issues to other code such as for checks done in
pci_enable_resources(). Take a small step towards not resetting resources
by delaying reset until the end of resource assignment and build failure
list (fail_head) in sync with the reset to avoid leaving behind resources
that cannot be restored (for the case where the caller provides fail_head
in the first place to allow restore somewhere in the callchain, as is not
all callers pass non-NULL fail_head).
Leave the Expansion ROM check temporarily in place while building the
failure list until an upcoming change that reworks optional resource
handling.
Ideally, whole resource reset could be removed but doing that in one step
would be non-tractable due to complexity of all related code.
Link: https://lore.kernel.org/r/20241216175632.4175-25-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
reassign_resources_sorted() uses resource_size() to select between
pci_assign_resource() and pci_reassign_resource(). Due to twisted way
bridge window sizing in pbus_size_mem() sets resource sizes to 0, it works
to match into IOV resources but that is going to be changed by an upcoming
change.
Replace resource_size() check with res->parent check that is the true
dividing line in between whether assign or reassign function should be used
for the resource.
Link: https://lore.kernel.org/r/20241216175632.4175-24-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
PCI resource fitting is somewhat hard to track because it performs many
actions without logging them. In the case inside
__assign_resources_sorted(), the resources are released before resource
assignment is going to be retried in a different order. That is just one
level of retries the resource fitting performs overall so tracking it
through repeated assignments or failures of a resource gets messy rather
quickly.
Simply announce the release explicitly using pci_dbg() so it is clear what
is going on with each resource.
Link: https://lore.kernel.org/r/20241216175632.4175-23-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Add pci_dbg() to note that an assignment failure was for an optional
resource and reword existing message about resource resize to say the
change was optional.
Link: https://lore.kernel.org/r/20241216175632.4175-22-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Add a dummy list to always have a non-NULL realloc head in
__assign_resources_sorted() as it allows only checking list_empty().
In future, it would be good to ensure all callers provide a valid
realloc_head but that is relatively complex to do in practice and not
necessary for the subsequent optional resource handling fix.
Link: https://lore.kernel.org/r/20241216175632.4175-21-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
pci_enable_resources() checks if device's io and mem resources are all
assigned and disallows enable if any resource failed to assign (*) but
makes an exception for the case of disabled extension ROM. There are other
optional resources, however.
Add pci_resource_is_optional() and use it instead of
pci_resource_is_disabled_rom() to cover also IOV resources that are also
optional as per pbus_size_mem().
As there will be more users of pci_resource_is_optional() inside
setup-bus.c in changes coming up after this one, the function is placed
there.
(*) In practice, resource fitting code calls reset_resource() for any
resource it fails to assign which clears resource's ->flags causing
pci_enable_resources() to never detect failed resource assignments.
This seems undesirable internal logic inconsistency, effectively
reset_resource() prevents pci_enable_resources() from functioning as
intended. This is one step of many that will be needed towards removing
reset_resource().
Link: https://lore.kernel.org/r/20241216175632.4175-20-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Resource fitting needs to restore the saved dev resources in a few places.
Add a restore_dev_resource() helper for that.
Link: https://lore.kernel.org/r/20241216175632.4175-19-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Commit a4ac9fea016f ("PCI : Calculate right add_size") removed including
min_align into new_size in pci_reassign_resource() which is the correct
thing to do. However, it also added a snakeoil comment that the resource
would already be aligned with min_align which is incorrect.
A resource that is assigned earlier is aligned with the old alignment, NOT
with the new requested alignment (min_align) until later deep within the
reassignment callchain. Thus, remove the incorrect comment.
Link: https://lore.kernel.org/r/20241216175632.4175-18-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Cc: Yinghai Lu <yinghai@kernel.org>
|
|
pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() have a loop that may perform
several rounds to assign resources. The code to prepare for the next round
is identical.
Consolidate the code that prepares for the next assignment round into
pci_prepare_next_assign_round().
Link: https://lore.kernel.org/r/20241216175632.4175-17-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Rename 'retval' to 'ret' in pci_assign_unassigned_bridge_resources().
Link: https://lore.kernel.org/r/20241216175632.4175-16-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() contain ad hoc loops using
backwards goto and gotos out of the loop. Replace them with while loops
and break statements.
While reindenting the loop bodies, add braces & remove parenthesis.
Link: https://lore.kernel.org/r/20241216175632.4175-15-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Reduce level of call nesting by calling pdev_sort_resources() directly
and by moving the tests done inside __dev_sort_resources() into
pdev_resources_assignable() helper.
Link: https://lore.kernel.org/r/20241216175632.4175-14-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
All return paths want to free head list in __assign_resources_sorted(), so
add a label and use goto.
Link: https://lore.kernel.org/r/20241216175632.4175-13-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Many PCI resource allocation related functions process struct
pci_dev_resource items which hold the struct pci_dev and resource pointers.
Reduce the number of lines that need indirection by adding 'dev' and 'res'
local variable to hold the pointers.
Link: https://lore.kernel.org/r/20241216175632.4175-12-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
A few places in PCI code, mainly in setup-bus.c, need to reverse lookup the
index of a resource in pci_dev's resource array. Create pci_resource_num()
helper to avoid repeating the pointer arithmetic trick used to calculate
the index.
Link: https://lore.kernel.org/r/20241216175632.4175-11-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Instead of chaining logic inside if () condition so that multiple lines are
required, make !resource_size() a separate check and use continue.
Link: https://lore.kernel.org/r/20241216175632.4175-10-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
There are multiple places where special handling is required for IOV
resources.
Extract the identification of IOV resources to pci_resource_is_iov() and
drop a few ifdefs.
Link: https://lore.kernel.org/r/20241216175632.4175-9-ilpo.jarvinen@linux.intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
A few sites that could use resource_set_range/size() in setup-bus.c were
not picked up earlier due to them no matching the usual pattern. Convert
them now.
These are more cases similar to 783602c920e9 ("PCI: Use
resource_set_{range,size}() helpers").
Link: https://lore.kernel.org/r/20241216175632.4175-8-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: add 783602c920e9 history]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Convert literals in setup-bus.c to SZ_* defines that make the size more
human readable.
As the code is now self-explanatory, eliminate comments about the size.
Link: https://lore.kernel.org/r/20241216175632.4175-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Commit 903534fa7d30 ("PCI: Fix resource double counting on remove &
rescan") fixed double counting of mem resources because of old_size being
applied too early.
Fix a similar counting bug on the io resource side.
Link: https://lore.kernel.org/r/20241216175632.4175-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
relaxed the bridge window requirements for non-optional size (size0)
but pbus_size_mem() also handles optional sizes (IOV resources) using
size1. This can manifest, e.g., as a failure to resize a BAR back to
its original size after it was first shrunk when device has a VF BAR
resource because the bridge window (size1) is enlarged beyond what is
strictly required to fit the downstream resources.
Allow using relaxed bridge window tail sizing rules also with the optional
resources (size1) so that the remove/realloc cycle during BAR resize
(smaller and back to the original size) does not fail unexpectedly due to
increase in bridge window size demand.
Also move add_align calculation to more logical place next to size1
assignment as they are strongly related to each other.
Link: https://lore.kernel.org/r/20241216175632.4175-5-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
In pbus_size_io() and pbus_size_mem(), a complex ?: operation is performed
to set size1. Decompose this so it's easier to read.
In the case of pbus_size_mem(), simply initializing size1 to zero ensures
the size1 checks work as expected.
Link: https://lore.kernel.org/r/20241216175632.4175-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
relaxed bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The required alignment given for
pbus_upstream_space_available(), however, was add_align which relates
only to size1 alignment.
As pbus_upstream_space_available() only selects between normal and relaxed
tail alignment of the bridge window, the different alignment only makes
relaxed tail alignment to be used more often than what was intended, which
should be harmless because relaxed tail alignment itself should work in all
cases.
For consistency, change pbus_upstream_space_available() call to use
min_align which is the alignment that is going to be used for the bridge
window in the case where size0 sized allocation is attempted.
Link: https://lore.kernel.org/r/20241216175632.4175-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
relaxed bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The change, however, also overwrote
add_align, which is only related to case where optional size1 related
entry is added into realloc head.
Correct this by removing the add_align overwrite.
Link: https://lore.kernel.org/r/20241216175632.4175-2-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
|
|
7180c1d08639 ("PCI: Distribute available resources for root buses, too")
breaks BAR assignment on some devices:
pci 0006:03:00.0: BAR 0 [mem 0x6300c0000000-0x6300c1ffffff 64bit pref]: assigned
pci 0006:03:00.1: BAR 0 [mem 0x6300c2000000-0x6300c3ffffff 64bit pref]: assigned
pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: can't assign; no space
pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
The apertures of domain 0006 before 7180c1d08639:
6300c0000000-63ffffffffff : PCI Bus 0006:00
6300c0000000-6300c9ffffff : PCI Bus 0006:01
6300c0000000-6300c9ffffff : PCI Bus 0006:02 # 160MB
6300c0000000-6300c8ffffff : PCI Bus 0006:03 # 144MB
6300c0000000-6300c1ffffff : 0006:03:00.0 # 32MB
6300c2000000-6300c3ffffff : 0006:03:00.1 # 32MB
6300c4000000-6300c47fffff : 0006:03:00.2 # 8MB
6300c4800000-6300c67fffff : 0006:03:00.0 # 32MB
6300c6800000-6300c87fffff : 0006:03:00.1 # 32MB
6300c9000000-6300c9bfffff : PCI Bus 0006:04 # 12MB
6300c9000000-6300c9bfffff : PCI Bus 0006:05 # 12MB
6300c9000000-6300c91fffff : PCI Bus 0006:06 # 2MB
6300c9200000-6300c93fffff : PCI Bus 0006:07 # 2MB
6300c9400000-6300c95fffff : PCI Bus 0006:08 # 2MB
6300c9600000-6300c97fffff : PCI Bus 0006:09 # 2MB
After 7180c1d08639:
6300c0000000-63ffffffffff : PCI Bus 0006:00
6300c0000000-6300c9ffffff : PCI Bus 0006:01
6300c0000000-6300c9ffffff : PCI Bus 0006:02 # 160MB
6300c0000000-6300c43fffff : PCI Bus 0006:03 # 68MB
6300c0000000-6300c1ffffff : 0006:03:00.0 # 32MB
6300c2000000-6300c3ffffff : 0006:03:00.1 # 32MB
--- no space --- : 0006:03:00.2 # 8MB
--- no space --- : 0006:03:00.0 # 32MB
--- no space --- : 0006:03:00.1 # 32MB
6300c4400000-6300c4dfffff : PCI Bus 0006:04 # 10MB
6300c4400000-6300c4dfffff : PCI Bus 0006:05 # 10MB
6300c4400000-6300c45fffff : PCI Bus 0006:06 # 2MB
6300c4600000-6300c47fffff : PCI Bus 0006:07 # 2MB
6300c4800000-6300c49fffff : PCI Bus 0006:08 # 2MB
6300c4a00000-6300c4bfffff : PCI Bus 0006:09 # 2MB
We can see that the window to 0006:03 gets shrunken too much and 0006:04
eats away the window for 0006:03:00.2.
The offending commit distributes the upstream bridge's resources multiple
times to every downstream bridge, hence makes the aperture smaller than
desired because calculation of io_per_b, mmio_per_b and mmio_pref_per_b
becomes incorrect.
Instead, distribute downstream bridges' own resources to resolve the issue.
Link: https://lore.kernel.org/r/20241204022457.51322-1-kaihengf@nvidia.com
Fixes: 7180c1d08639 ("PCI: Distribute available resources for root buses, too")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219540
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Carol Soto <csoto@nvidia.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Chris Chiu <chris.chiu@canonical.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Update a BUILD_BUG_ON() usage that works on current compilers, but
breaks compilation on gcc 5.3.1 (Alex Williamson)
- Avoid use of FLR for Mediatek MT7922 WiFi; the device previously
worked after a long timeout and fallback to SBR, but after a recent
RRS change it doesn't work at all after FLR (Bjorn Helgaas)
* tag 'pci-v6.14-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Avoid FLR for Mediatek MT7922 WiFi
PCI: Fix BUILD_BUG_ON usage for old gcc
|
|
Replace pointer arithmetic in finding the correct resource entry with the
pci_resource_n() helper.
Link: https://lore.kernel.org/r/20250207162301.2842-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The Mediatek MT7922 WiFi device advertises FLR support, but it apparently
does not work, and all subsequent config reads return ~0:
pci 0000:01:00.0: [14c3:0616] type 00 class 0x028000 PCIe Endpoint
pciback 0000:01:00.0: not ready 65535ms after FLR; giving up
After an FLR, pci_dev_wait() waits for the device to become ready. Prior
to d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS"),
it polls PCI_COMMAND until it is something other that PCI_POSSIBLE_ERROR
(~0). If it times out, pci_dev_wait() returns -ENOTTY and
__pci_reset_function_locked() tries the next available reset method.
Typically this is Secondary Bus Reset, which does work, so the MT7922 is
eventually usable.
After d591f6804e7e, if Configuration Request Retry Status Software
Visibility (RRS SV) is enabled, pci_dev_wait() polls PCI_VENDOR_ID until it
is something other than the special 0x0001 Vendor ID that indicates a
completion with RRS status.
When RRS SV is enabled, reads of PCI_VENDOR_ID should return either 0x0001,
i.e., the config read was completed with RRS, or a valid Vendor ID. On the
MT7922, it seems that all config reads after FLR return ~0 indefinitely.
When pci_dev_wait() reads PCI_VENDOR_ID and gets 0xffff, it assumes that's
a valid Vendor ID and the device is now ready, so it returns with success.
After pci_dev_wait() returns success, we restore config space and continue.
Since the MT7922 is not actually ready after the FLR, the restore fails and
the device is unusable.
We considered changing pci_dev_wait() to continue polling if a
PCI_VENDOR_ID read returns either 0x0001 or 0xffff. This "works" as it did
before d591f6804e7e, although we have to wait for the timeout and then fall
back to SBR. But it doesn't work for SR-IOV VFs, which *always* return
0xffff as the Vendor ID.
Mark Mediatek MT7922 WiFi devices to avoid the use of FLR completely. This
will cause fallback to another reset method, such as SBR.
Link: https://lore.kernel.org/r/20250212193516.88741-1-helgaas@kernel.org
Fixes: d591f6804e7e ("PCI: Wait for device readiness with Configuration RRS")
Link: https://github.com/QubesOS/qubes-issues/issues/9689#issuecomment-2582927149
Link: https://lore.kernel.org/r/Z4pHll_6GX7OUBzQ@mail-itl
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: stable@vger.kernel.org
|
|
As reported in the below link, it seems older versions of gcc cannot
determine that the howmany variable is known for all callers. Include
a test so that newer compilers can enforce this sanity check and older
compilers can still work. Add __always_inline attribute to give the
compiler an even better chance to know the inputs.
Link: https://lore.kernel.org/r/20250212185337.293023-1-alex.williamson@redhat.com
Fixes: 4453f360862e ("PCI: Batch BAR sizing operations")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/all/20250209154512.GA18688@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com>
|
|
When we reenable TPH after changing a Steering Tag value, we need the
actual TPH Requester Enable value, not the ST Mode (which only happens to
work out by chance for non-extended TPH in interrupt vector mode).
Link: https://lore.kernel.org/r/13118098116d7bce07aa20b8c52e28c7d1847246.1738759933.git.robin.murphy@arm.com
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Wei Huang <wei.huang2@amd.com>
|