summaryrefslogtreecommitdiff
path: root/drivers/iommu/intel
AgeCommit message (Collapse)Author
2024-09-02iommu/vt-d: Factor out helpers from domain_context_mapping_one()Lu Baolu
Extract common code from domain_context_mapping_one() into new helpers, making it reusable by other functions such as the upcoming identity domain implementation. No intentional functional changes. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240809055431.36513-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02iommu/vt-d: Remove has_iotlb_device flagLu Baolu
The has_iotlb_device flag was used to indicate if a domain had attached devices with ATS enabled. Domains without this flag didn't require device TLB invalidation during unmap operations, optimizing performance by avoiding unnecessary device iteration. With the introduction of cache tags, this flag is no longer needed. The code to iterate over attached devices was removed by commit 06792d067989 ("iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()"). Remove has_iotlb_device to avoid unnecessary code. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240809055431.36513-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02iommu/vt-d: Always reserve a domain ID for identity setupLu Baolu
We will use a global static identity domain. Reserve a static domain ID for it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240809055431.36513-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02iommu/vt-d: Remove identity mappings from si_domainLu Baolu
As the driver has enforced DMA domains for devices managed by an IOMMU hardware that doesn't support passthrough translation mode, there is no need for static identity mappings in the si_domain. Remove the identity mapping code to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240809055431.36513-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-02iommu/vt-d: Require DMA domain if hardware not support passthroughLu Baolu
The iommu core defines the def_domain_type callback to query the iommu driver about hardware capability and quirks. The iommu driver should declare IOMMU_DOMAIN_DMA requirement for hardware lacking pass-through capability. Earlier VT-d hardware implementations did not support pass-through translation mode. The iommu driver relied on a paging domain with all physical system memory addresses identically mapped to the same IOVA to simulate pass-through translation before the def_domain_type was introduced and it has been kept until now. It's time to adjust it now to make the Intel iommu driver follow the def_domain_type semantics. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240809055431.36513-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-30iommu: Allow ATS to work on VFs when the PF uses IDENTITYJason Gunthorpe
PCI ATS has a global Smallest Translation Unit field that is located in the PF but shared by all of the VFs. The expectation is that the STU will be set to the root port's global STU capability which is driven by the IO page table configuration of the iommu HW. Today it becomes set when the iommu driver first enables ATS. Thus, to enable ATS on the VF, the PF must have already had the correct STU programmed, even if ATS is off on the PF. Unfortunately the PF only programs the STU when the PF enables ATS. The iommu drivers tend to leave ATS disabled when IDENTITY translation is being used. Thus we can get into a state where the PF is setup to use IDENTITY with the DMA API while the VF would like to use VFIO with a PAGING domain and have ATS turned on. This fails because the PF never loaded a PAGING domain and so it never setup the STU, and the VF can't do it. The simplest solution is to have the iommu driver set the ATS STU when it probes the device. This way the ATS STU is loaded immediately at boot time to all PFs and there is no issue when a VF comes to use it. Add a new call pci_prepare_ats() which should be called by iommu drivers in their probe_device() op for every PCI device if the iommu driver supports ATS. This will setup the STU based on whatever page size capability the iommu HW has. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/0-v1-0fb4d2ab6770+7e706-ats_vf_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-26iommu/vt-d: Fix incorrect domain ID in context flush helperLu Baolu
The helper intel_context_flush_present() is designed to flush all related caches when a context entry with the present bit set is modified. It currently retrieves the domain ID from the context entry and uses it to flush the IOTLB and context caches. This is incorrect when the context entry transitions from present to non-present, as the domain ID field is cleared before calling the helper. Fix it by passing the domain ID programmed in the context entry before the change to intel_context_flush_present(). This ensures that the correct domain ID is used for cache invalidation. Fixes: f90584f4beb8 ("iommu/vt-d: Add helper to flush caches for context change") Reported-by: Alex Williamson <alex.williamson@redhat.com> Closes: https://lore.kernel.org/linux-iommu/20240814162726.5efe1a6e.alex.williamson@redhat.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jacob Pan <jacob.pan@linux.microsoft.com> Link: https://lore.kernel.org/r/20240815124857.70038-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-08-22dma-mapping: direct calls for dma-iommuLeon Romanovsky
Directly call into dma-iommu just like we have been doing for dma-direct for a while. This avoids the indirect call overhead for IOMMU ops and removes the need to have DMA ops entirely for many common configurations. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-07iommu/vt-d: Cleanup apic_printk()Thomas Gleixner
Use the new apic_pr_verbose() helper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Tested-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/all/20240802155440.843266805@linutronix.de
2024-07-19Merge tag 'iommu-updates-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Will Deacon: "Core: - Support for the "ats-supported" device-tree property - Removal of the 'ops' field from 'struct iommu_fwspec' - Introduction of iommu_paging_domain_alloc() and partial conversion of existing users - Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem - Remove stale documentation - Add missing MODULE_DESCRIPTION() macro - Misc cleanups Allwinner Sun50i: - Ensure bypass mode is disabled on H616 SoCs - Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker - Add new device-tree compatible strings AMD Vi: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte Arm SMMUv2: - Print much more useful information on context faults - Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n - Add new Qualcomm device-tree bindings Arm SMMUv3: - Support for hardware update of access/dirty bits and reporting via IOMMUFD - More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support - Add missing MODULE_DESCRIPTION() macro - Minor fixes and cleanups NVIDIA Tegra: - Fix for benign fwspec initialisation issue exposed by rework on the core branch Intel VT-d: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte - Use READ_ONCE() to read volatile descriptor status - Remove support for handling Execute-Requested requests - Avoid calling iommu_domain_alloc() - Minor fixes and refactoring Qualcomm MSM: - Updates to the device-tree bindings" * tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits) iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init() iommu/vt-d: Fix identity map bounds in si_domain_init() iommu: Move IOMMU_DIRTY_NO_CLEAR define dt-bindings: iommu: Convert msm,iommu-v0 to yaml iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address() iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP iommu/of: Support ats-supported device-tree property dt-bindings: PCI: generic: Add ats-supported property iommu: Remove iommu_fwspec ops OF: Simplify of_iommu_configure() ACPI: Retire acpi_iommu_fwspec_ops() iommu: Resolve fwspec ops automatically iommu/mediatek-v1: Clean up redundant fwspec checks RDMA/usnic: Use iommu_paging_domain_alloc() wifi: ath11k: Use iommu_paging_domain_alloc() wifi: ath10k: Use iommu_paging_domain_alloc() drm/msm: Use iommu_paging_domain_alloc() vhost-vdpa: Use iommu_paging_domain_alloc() ...
2024-07-12iommu/vt-d: Fix identity map bounds in si_domain_init()Jon Pan-Doh
Intel IOMMU operates on inclusive bounds (both generally aas well as iommu_domain_identity_map()). Meanwhile, for_each_mem_pfn_range() uses exclusive bounds for end_pfn. This creates an off-by-one error when switching between the two. Fixes: c5395d5c4a82 ("intel-iommu: Clean up iommu_domain_identity_map()") Signed-off-by: Jon Pan-Doh <pandoh@google.com> Tested-by: Sudheer Dantuluri <dantuluris@google.com> Suggested-by: Gary Zibrat <gzibrat@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240709234913.2749386-1-pandoh@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-10iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address()Lu Baolu
The helper calculate_psi_aligned_address() is used to convert an arbitrary range into a size-aligned one. The aligned_pages variable is calculated from input start and end, but is not adjusted when the start pfn is not aligned and the mask is adjusted, which results in an incorrect number of pages returned. The number of pages is used by qi_flush_piotlb() to flush caches for the first-stage translation. With the wrong number of pages, the cache is not synchronized, leading to inconsistencies in some cases. Fixes: c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240709152643.28109-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-10iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTHLu Baolu
Address mask specifies the number of low order bits of the address field that must be masked for the invalidation operation. Since address bits masked start from bit 12, the max address mask should be MAX_AGAW_PFN_WIDTH, as defined in Table 19 ("Invalidate Descriptor Address Mask Encodings") of the spec. Limit the max address mask returned from calculate_psi_aligned_address() to MAX_AGAW_PFN_WIDTH to prevent potential integer overflow in the following code: qi_flush_dev_iotlb(): ... addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1; ... Fixes: c4d27ffaa8eb ("iommu/vt-d: Add cache tag invalidation helpers") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240709152643.28109-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Refactor PCI PRI enabling/disabling callbacksLu Baolu
Commit 0095bf83554f8 ("iommu: Improve iopf_queue_remove_device()") specified the flow for disabling the PRI on a device. Refactor the PRI callbacks in the intel iommu driver to better manage PRI enabling and disabling and align it with the device queue interfaces in the iommu core. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240701112317.94022-3-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-8-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Add helper to flush caches for context changeLu Baolu
This helper is used to flush the related caches following a change in a context table entry that was previously present. The VT-d specification provides guidance for such invalidations in section 6.5.3.3. This helper replaces the existing open code in the code paths where a present context entry is being torn down. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240701112317.94022-2-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-7-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Add helper to allocate paging domainLu Baolu
The domain_alloc_user operation is currently implemented by allocating a paging domain using iommu_domain_alloc(). This is because it needs to fully initialize the domain before return. Add a helper to do this to avoid using iommu_domain_alloc(). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240610085555.88197-16-baolu.lu@linux.intel.com Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20240702130839.108139-6-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Downgrade warning for pre-enabled IRLu Baolu
Emitting a warning is overkill in intel_setup_irq_remapping() since the interrupt remapping is pre-enabled. For example, there's no guarantee that kexec will explicitly disable interrupt remapping before booting a new kernel. As a result, users are seeing warning messages like below when they kexec boot a kernel, though there is nothing wrong: DMAR-IR: IRQ remapping was enabled on dmar18 but we are not in kdump mode DMAR-IR: IRQ remapping was enabled on dmar17 but we are not in kdump mode DMAR-IR: IRQ remapping was enabled on dmar16 but we are not in kdump mode ... ... Downgrade the severity of this message to avoid user confusion. CC: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/linux-iommu/5517f76a-94ad-452c-bae6-34ecc0ec4831@molgen.mpg.de/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240625043912.258036-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-5-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Remove control over Execute-Requested requestsLu Baolu
The VT-d specification has removed architectural support of the requests with pasid with a value of 1 for Execute-Requested (ER). And the NXE bit in the pasid table entry and XD bit in the first-stage paging Entries are deprecated accordingly. Remove the programming of these bits to make it consistent with the spec. Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240624032351.249858-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-4-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Remove comment for def_domain_typeLu Baolu
The comment for def_domain_type is outdated. Part of it is irrelevant. Furthermore, it could just be deleted since the iommu_ops::def_domain_type callback is properly documented in iommu.h, so individual implementations shouldn't need to repeat that. Remove it to avoid confusion. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240624024327.234979-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20240702130839.108139-3-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03iommu/vt-d: Handle volatile descriptor status readJacob Pan
Queued invalidation wait descriptor status is volatile in that IOMMU hardware writes the data upon completion. Use READ_ONCE() to prevent compiler optimizations which ensures memory reads every time. As a side effect, READ_ONCE() also enforces strict types and may add an extra instruction. But it should not have negative performance impact since we use cpu_relax anyway and the extra time(by adding an instruction) may allow IOMMU HW request cacheline ownership easier. e.g. gcc 12.3 BEFORE: 81 38 ad de 00 00 cmpl $0x2,(%rax) AFTER (with READ_ONCE()) 772f: 8b 00 mov (%rax),%eax 7731: 3d ad de 00 00 cmp $0x2,%eax //status data is 32 bit Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20240607173817.3914600-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240702130839.108139-2-baolu.lu@linux.intel.com Signed-off-by: Will Deacon <will@kernel.org>
2024-06-27iommu/vt-d: Fix missed device TLB cache tagLu Baolu
When a domain is attached to a device, the required cache tags are assigned to the domain so that the related caches can be flushed whenever it is needed. The device TLB cache tag is created based on whether the ats_enabled field of the device's iommu data is set. This creates an ordered dependency between cache tag assignment and ATS enabling. The device TLB cache tag would not be created if device's ATS is enabled after the cache tag assignment. This causes devices with PCI ATS support to malfunction. The ATS control is exclusively owned by the iommu driver. Hence, move cache_tag_assign_domain() after PCI ATS enabling to make sure that the device TLB cache tag is created for the domain. Fixes: 3b1d9e2b2d68 ("iommu/vt-d: Add cache tag assignment interface") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240620062940.201786-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-06-25iommu/vt-d: Use try_cmpxchg64() in intel_pasid_get_entry()Uros Bizjak
Use try_cmpxchg64() instead of cmpxchg64 (*ptr, old, new) != old in intel_pasid_get_entry(). cmpxchg returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240522082729.971123-2-ubizjak@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-21Merge tag 'pci-v6.10-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...
2024-05-18Merge tag 'iommu-updates-v6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core: - IOMMU memory usage observability - This will make the memory used for IO page tables explicitly visible. - Simplify arch_setup_dma_ops() Intel VT-d: - Consolidate domain cache invalidation - Remove private data from page fault message - Allocate DMAR fault interrupts locally - Cleanup and refactoring ARM-SMMUv2: - Support for fault debugging hardware on Qualcomm implementations - Re-land support for the ->domain_alloc_paging() callback ARM-SMMUv3: - Improve handling of MSI allocation failure - Drop support for the "disable_bypass" cmdline option - Major rework of the CD creation code, following on directly from the STE rework merged last time around. - Add unit tests for the new STE/CD manipulation logic AMD-Vi: - Final part of SVA changes with generic IO page fault handling Renesas IPMMU: - Add support for R8A779H0 hardware ... and a couple smaller fixes and updates across the sub-tree" * tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits) iommu/arm-smmu-v3: Make the kunit into a module arm64: Properly clean up iommu-dma remnants iommu/amd: Enable Guest Translation after reading IOMMU feature register iommu/vt-d: Decouple igfx_off from graphic identity mapping iommu/amd: Fix compilation error iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() iommu/arm-smmu-v3: Move the CD generation for SVA into a function iommu/arm-smmu-v3: Allocate the CD table entry in advance iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() iommu/arm-smmu-v3: Consolidate clearing a CD table entry iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() iommu/arm-smmu-v3: Add an ops indirection to the STE code iommu/arm-smmu-qcom: Don't build debug features as a kernel module iommu/amd: Add SVA domain support iommu: Add ops->domain_alloc_sva() iommu/amd: Initial SVA support for AMD IOMMU iommu/amd: Add support for enable/disable IOPF iommu/amd: Add IO page fault notifier handler ...
2024-05-15Revert "iommu/vt-d: Enable PCI/IMS"Bjorn Helgaas
This reverts commit 810531a1af5393f010d6508b1cb48e6650fc5e8f. IMS (Interrupt Message Store) support appeared in v6.2, but there are no users yet. Remove it for now. We can add it back when a user comes along. Link: https://lore.kernel.org/r/20240410221307.2162676-6-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2024-05-13Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'core' and 'x86/vt-d' ↵Joerg Roedel
into next
2024-05-06iommu/vt-d: Decouple igfx_off from graphic identity mappingLu Baolu
A kernel command called igfx_off was introduced in commit <ba39592764ed> ("Intel IOMMU: Intel IOMMU driver"). This command allows the user to disable the IOMMU dedicated to SOC-integrated graphic devices. Commit <9452618e7462> ("iommu/intel: disable DMAR for g4x integrated gfx") used this mechanism to disable the graphic-dedicated IOMMU for some problematic devices. Later, more problematic graphic devices were added to the list by commit <1f76249cc3beb> ("iommu/vt-d: Declare Broadwell igfx dmar support snafu"). On the other hand, commit <19943b0e30b05> ("intel-iommu: Unify hardware and software passthrough support") uses the identity domain for graphic devices if CONFIG_DMAR_BROKEN_GFX_WA is selected. + if (iommu_pass_through) + iommu_identity_mapping = 1; +#ifdef CONFIG_DMAR_BROKEN_GFX_WA + else + iommu_identity_mapping = 2; +#endif ... static int iommu_should_identity_map(struct pci_dev *pdev, int startup) { + if (iommu_identity_mapping == 2) + return IS_GFX_DEVICE(pdev); ... In the following driver evolution, CONFIG_DMAR_BROKEN_GFX_WA and quirk_iommu_igfx() are mixed together, causing confusion in the driver's device_def_domain_type callback. On one hand, dmar_map_gfx is used to turn off the graphic-dedicated IOMMU as a workaround for some buggy hardware; on the other hand, for those graphic devices, IDENTITY mapping is required for the IOMMU core. Commit <4b8d18c0c986> "iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA" has removed the CONFIG_DMAR_BROKEN_GFX_WA option, so the IDENTITY_DOMAIN requirement for graphic devices is no longer needed. Therefore, this requirement can be removed from device_def_domain_type() and igfx_off can be made independent. Fixes: 4b8d18c0c986 ("iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240428032020.214616-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-30iommu/vt-d: Enable posted mode for device MSIsJacob Pan
With posted MSI feature enabled on the CPU side, iommu interrupt remapping table entries (IRTEs) for device MSI/x can be allocated, activated, and programed in posted mode. This means that IRTEs are linked with their respective PIDs of the target CPU. Handlers for the posted MSI notification vector will de-multiplex device MSI handlers. CPU notifications are coalesced if interrupts arrive at a high frequency. Posted interrupts are only used for device MSI and not for legacy devices (IO/APIC, HPET). Introduce a new irq_chip for posted MSIs, which has a dummy irq_ack() callback as EOI is performed in the notification handler once. When posted MSI is enabled, MSI domain/chip hierarchy will look like this example: domain: IR-PCI-MSIX-0000:50:00.0-12 hwirq: 0x29 chip: IR-PCI-MSIX-0000:50:00.0 flags: 0x430 IRQCHIP_SKIP_SET_WAKE IRQCHIP_ONESHOT_SAFE parent: domain: INTEL-IR-10-13 hwirq: 0x2d0000 chip: INTEL-IR-POST flags: 0x0 parent: domain: VECTOR hwirq: 0x77 chip: APIC Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240423174114.526704-13-jacob.jun.pan@linux.intel.com
2024-04-26iommu/dma: Centralise iommu_setup_dma_ops()Robin Murphy
It's somewhat hard to see, but arm64's arch_setup_dma_ops() should only ever call iommu_setup_dma_ops() after a successful iommu_probe_device(), which means there should be no harm in achieving the same order of operations by running it off the back of iommu_probe_device() itself. This then puts it in line with the x86 and s390 .probe_finalize bodges, letting us pull it all into the main flow properly. As a bonus this lets us fold in and de-scope the PCI workaround setup as well. At this point we can also then pull the call up inside the group mutex, and avoid having to think about whether iommu_group_store_type() could theoretically race and free the domain if iommu_setup_dma_ops() ran just *before* iommu_device_use_default_domain() claims it... Furthermore we replace one .probe_finalize call completely, since the only remaining implementations are now one which only needs to run once for the initial boot-time probe, and two which themselves render that path unreachable. This leaves us a big step closer to realistically being able to unpick the variety of different things that iommu_setup_dma_ops() has been muddling together, and further streamline iommu-dma into core API flows in future. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> # For Intel IOMMU Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/bebea331c1d688b34d9862eefd5ede47503961b8.1713523152.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Remove struct intel_svmLu Baolu
The struct intel_svm was used for keeping attached devices info for sva domain. Since sva domain is a kind of iommu_domain, the struct dmar_domain should centralize all info of a sva domain, including the info of attached devices. Therefore, retire struct intel_svm and clean up the code. Besides, register mmu notifier callback in domain_alloc_sva() callback which allows the memory management notifier lifetime to follow the lifetime of the iommu_domain. Call mmu_notifier_put() in the domain free and defer the real free to the mmu free_notifier callback. Co-developed-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-13-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Remove intel_svm_devLu Baolu
The intel_svm_dev data structure used in the sva implementation for the Intel IOMMU driver stores information about a device attached to an SVA domain. It is a duplicate of dev_pasid_info that serves the same purpose. Replace intel_svm_dev with dev_pasid_info and clean up the use of intel_svm_dev. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-11-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Use cache helpers in arch_invalidate_secondary_tlbsLu Baolu
The arch_invalidate_secondary_tlbs callback is called in the SVA mm notification path. It invalidates all or a range of caches after the CPU page table is modified. Use the cache tag helps in this path. The mm_types defines vm_end as the first byte after the end address which is different from the iommu gather API, hence convert the end parameter from mm_types to iommu gather scheme before calling the cache_tag helper. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-10-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Use cache_tag_flush_range() in cache_invalidate_userLu Baolu
The cache_invalidate_user callback is called to invalidate a range of caches for the affected user domain. Use cache_tag_flush_range() in this callback. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-9-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Cleanup use of iommu_flush_iotlb_psi()Lu Baolu
Use cache_tag_flush_range() in switch_to_super_page() to invalidate the necessary caches when switching mappings from normal to super pages. The iommu_flush_iotlb_psi() call in intel_iommu_memory_notifier() is unnecessary since there should be no cache invalidation for the identity domain. Clean up iommu_flush_iotlb_psi() after the last call site is removed. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Use cache_tag_flush_range_np() in iotlb_sync_mapLu Baolu
The iotlb_sync_map callback is called by the iommu core after non-present to present mappings are created. The iommu driver uses this callback to invalidate caches if IOMMU is working in caching mode and second-only translation is used for the domain. Use cache_tag_flush_range_np() in this callback. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Use cache_tag_flush_range() in tlb_syncLu Baolu
The tlb_sync callback is called by the iommu core to flush a range of caches for the affected domain. Use cache_tag_flush_range() in this callback. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Use cache_tag_flush_all() in flush_iotlb_allLu Baolu
The flush_iotlb_all callback is called by the iommu core to flush all caches for the affected domain. Use cache_tag_flush_all() in this callback. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Add trace events for cache tag interfaceLu Baolu
Add trace events for cache tag assign/unassign/flush operations and trace the events in the interfaces. These trace events will improve debugging capabilities by providing detailed information about cache tag activity. A sample of the traced messages looks like below [messages have been stripped and wrapped to make the line short]. cache_tag_assign: dmar9/0000:00:01.0 type iotlb did 1 pasid 9 ref 1 cache_tag_assign: dmar9/0000:00:01.0 type devtlb did 1 pasid 9 ref 1 cache_tag_flush_all: dmar6/0000:8a:00.0 type iotlb did 7 pasid 0 ref 1 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9 [0xeab00000-0xeab1afff] addr 0xeab00000 pages 0x20 mask 0x5 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9 [0xeab20000-0xeab31fff] addr 0xeab20000 pages 0x20 mask 0x5 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9 [0xeaa40000-0xeaa51fff] addr 0xeaa40000 pages 0x20 mask 0x5 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9 [0x98de0000-0x98de4fff] addr 0x98de0000 pages 0x8 mask 0x3 cache_tag_flush_range: dmar1 0000:00:1b.0[0] type iotlb did 9 [0xe9828000-0xe9828fff] addr 0xe9828000 pages 0x1 mask 0x0 cache_tag_unassign: dmar9/0000:00:01.0 type iotlb did 1 pasid 9 ref 1 cache_tag_unassign: dmar9/0000:00:01.0 type devtlb did 1 pasid 9 ref 1 Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Add cache tag invalidation helpersLu Baolu
Add several helpers to invalidate the caches after mappings in the affected domain are changed. - cache_tag_flush_range() invalidates a range of caches after mappings within this range are changed. It uses the page-selective cache invalidation methods. - cache_tag_flush_all() invalidates all caches tagged by a domain ID. It uses the domain-selective cache invalidation methods. - cache_tag_flush_range_np() invalidates a range of caches when new mappings are created in the domain and the corresponding page table entries change from non-present to present. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Add cache tag assignment interfaceLu Baolu
Caching tag is a combination of tags used by the hardware to cache various translations. Whenever a mapping in a domain is changed, the IOMMU driver should invalidate the caches with the caching tags. The VT-d specification describes caching tags in section 6.2.1, Tagging of Cached Translations. Add interface to assign caching tags to an IOMMU domain when attached to a RID or PASID, and unassign caching tags when a domain is detached from a RID or PASID. All caching tags are listed in the per-domain tag list and are protected by a dedicated lock. In addition to the basic IOTLB and devTLB caching tag types, NESTING_IOTLB and NESTING_DEVTLB tag types are also introduced. These tags are used for caches that store translations for DMA accesses through a nested user domain. They are affected by changes to mappings in the parent domain. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240416080656.60968-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Remove caching mode check before device TLB flushLu Baolu
The Caching Mode (CM) of the Intel IOMMU indicates if the hardware implementation caches not-present or erroneous translation-structure entries except for the first-stage translation. The caching mode is irrelevant to the device TLB, therefore there is no need to check it before a device TLB invalidation operation. Remove two caching mode checks before device TLB invalidation in the driver. The removal of these checks doesn't change the driver's behavior in critical map/unmap paths. Hence, there is no functionality or performance impact, especially since commit <29b32839725f> ("iommu/vt-d: Do not use flush-queue when caching-mode is on") has already disabled flush-queue for caching mode. Therefore, caching mode will never call intel_flush_iotlb_all(). Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20240415013835.9527-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Remove private data use in fault messageJingqi Liu
According to Intel VT-d specification revision 4.0, "Private Data" field has been removed from Page Request/Response. Since the private data field is not used in fault message, remove the related definitions in page request descriptor and remove the related code in page request/response handler, as Intel hasn't shipped any products which support private data in the page request message. Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20240308103811.76744-3-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Remove debugfs use of private data fieldJingqi Liu
Since the page fault report and response have been tracked by ftrace, the users can easily calculate the time used for a page fault handling. There's no need to expose the similar functionality in debugfs. Hence, remove the corresponding operations in debugfs. Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20240308103811.76744-2-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Allocate DMAR fault interrupts locallyDimitri Sivanich
The Intel IOMMU code currently tries to allocate all DMAR fault interrupt vectors on the boot cpu. On large systems with high DMAR counts this results in vector exhaustion, and most of the vectors are not initially allocated socket local. Instead, have a cpu on each node do the vector allocation for the DMARs on that node. The boot cpu still does the allocation for its node during its boot sequence. Signed-off-by: Dimitri Sivanich <sivanich@hpe.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/Zfydpp2Hm+as16TY@hpe.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Use try_cmpxchg64{,_local}() in iommu.cUros Bizjak
Replace this pattern in iommu.c: cmpxchg64{,_local}(*ptr, 0, new) != 0 ... with the simpler and faster: !try_cmpxchg64{,_local}(*ptr, &tmp, new) The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after the CMPXCHG. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240414162454.49584-1-ubizjak@gmail.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-26iommu/vt-d: Remove redundant assignment to variable errColin Ian King
Variable err is being assigned a value that is never read. It is either being re-assigned later on error exit paths, or never referenced on the non-error path. Cleans up clang scan build warning: drivers/iommu/intel/dmar.c:1070:2: warning: Value stored to 'err' is never read [deadcode.DeadStores]` Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20240411090535.306326-1-colin.i.king@gmail.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-15iommu/vt-d: add wrapper functions for page allocationsPasha Tatashin
In order to improve observability and accountability of IOMMU layer, we must account the number of pages that are allocated by functions that are calling directly into buddy allocator. This is achieved by first wrapping the allocation related functions into a separate inline functions in new file: drivers/iommu/iommu-pages.h Convert all page allocation calls under iommu/intel to use these new functions. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20240413002522.1101315-2-pasha.tatashin@soleen.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-12iommu: Pass domain to remove_dev_pasid() opYi Liu
Existing remove_dev_pasid() callbacks of the underlying iommu drivers get the attached domain from the group->pasid_array. However, the domain stored in group->pasid_array is not always correct in all scenarios. A wrong domain may result in failure in remove_dev_pasid() callback. To avoid such problems, it is more reliable to pass the domain to the remove_dev_pasid() op. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20240328122958.83332-3-yi.l.liu@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-04-12iommu/vt-d: Fix WARN_ON in iommu probe pathLu Baolu
Commit 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed devices") adds all devices probed by the iommu driver in a rbtree indexed by the source ID of each device. It assumes that each device has a unique source ID. This assumption is incorrect and the VT-d spec doesn't state this requirement either. The reason for using a rbtree to track devices is to look up the device with PCI bus and devfunc in the paths of handling ATS invalidation time out error and the PRI I/O page faults. Both are PCI ATS feature related. Only track the devices that have PCI ATS capabilities in the rbtree to avoid unnecessary WARN_ON in the iommu probe path. Otherwise, on some platforms below kernel splat will be displayed and the iommu probe results in failure. WARNING: CPU: 3 PID: 166 at drivers/iommu/intel/iommu.c:158 intel_iommu_probe_device+0x319/0xd90 Call Trace: <TASK> ? __warn+0x7e/0x180 ? intel_iommu_probe_device+0x319/0xd90 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? intel_iommu_probe_device+0x319/0xd90 ? debug_mutex_init+0x37/0x50 __iommu_probe_device+0xf2/0x4f0 iommu_probe_device+0x22/0x70 iommu_bus_notifier+0x1e/0x40 notifier_call_chain+0x46/0x150 blocking_notifier_call_chain+0x42/0x60 bus_notify+0x2f/0x50 device_add+0x5ed/0x7e0 platform_device_add+0xf5/0x240 mfd_add_devices+0x3f9/0x500 ? preempt_count_add+0x4c/0xa0 ? up_write+0xa2/0x1b0 ? __debugfs_create_file+0xe3/0x150 intel_lpss_probe+0x49f/0x5b0 ? pci_conf1_write+0xa3/0xf0 intel_lpss_pci_probe+0xcf/0x110 [intel_lpss_pci] pci_device_probe+0x95/0x120 really_probe+0xd9/0x370 ? __pfx___driver_attach+0x10/0x10 __driver_probe_device+0x73/0x150 driver_probe_device+0x19/0xa0 __driver_attach+0xb6/0x180 ? __pfx___driver_attach+0x10/0x10 bus_for_each_dev+0x77/0xd0 bus_add_driver+0x114/0x210 driver_register+0x5b/0x110 ? __pfx_intel_lpss_pci_driver_init+0x10/0x10 [intel_lpss_pci] do_one_initcall+0x57/0x2b0 ? kmalloc_trace+0x21e/0x280 ? do_init_module+0x1e/0x210 do_init_module+0x5f/0x210 load_module+0x1d37/0x1fc0 ? init_module_from_file+0x86/0xd0 init_module_from_file+0x86/0xd0 idempotent_init_module+0x17c/0x230 __x64_sys_finit_module+0x56/0xb0 do_syscall_64+0x6e/0x140 entry_SYSCALL_64_after_hwframe+0x71/0x79 Fixes: 1a75cc710b95 ("iommu/vt-d: Use rbtree to track iommu probed devices") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10689 Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240407011429.136282-1-baolu.lu@linux.intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>