path: root/drivers/iommu/amd
Age | Commit message | Author
2023-10-24 | iommu/amd: Access/Dirty bit support in IOPTEs | Joao Martins
The IOMMU advertises Access/Dirty bits if the extended feature register reports them. Relevant AMD IOMMU SDM ref[0] "1.3.8 Enhanced Support for Access and Dirty Bits". To enable it, set the DTE flag in bits 7 and 8 to enable access, or access+dirty. With that, the IOMMU starts marking the D and A flags on every Memory Request or ATS translation request. It is up to the VMM to decide whether to enable dirty tracking, rather than wrongly doing so in the IOMMU driver. Relevant AMD IOMMU SDM ref [0], "Table 7. Device Table Entry (DTE) Field Definitions", particularly the entry "HAD". Toggling it on and off is relatively simple: set 2 bits in the DTE and flush the device's DTE cache. To find what has been dirtied, use the existing AMD io-pgtable support and walk the page tables over each IOVA with fetch_pte(). IOTLB flushing is left to the caller (much like unmap), and iommu_dirty_bitmap_record() is what adds page ranges to invalidate. This allows the caller to batch the flush over a big span of IOVA space, without the IOMMU driver having to decide when to flush. Worthwhile sections from the AMD IOMMU SDM: "2.2.3.1 Host Access Support" "2.2.3.2 Host Dirty Support". For details on how the IOMMU hardware updates the dirty bit, and what it expects when the CPU subsequently clears it, see: "2.2.7.4 Updating Accessed and Dirty Bits in the Guest Address Tables" "2.2.7.5 Clearing Accessed and Dirty Bits". Quoting the SDM: "The setting of accessed and dirty status bits in the page tables is visible to both the CPU and the peripheral when sharing guest page tables. The IOMMU interlocked operations to update A and D bits must be 64-bit operations and naturally aligned on a 64-bit boundary" .. and for the IOMMU Dirty bit update sequence, it essentially states:
1. Decodes the read and write intent from the memory access.
2. If P=0 in the page descriptor, fail the access.
3. Compare the A & D bits in the descriptor with the read and write intent in the request.
4. If the A or D bits need to be updated in the descriptor:
   * Start atomic operation.
   * Read the descriptor as a 64-bit access.
   * If the descriptor no longer appears to require an update, release the atomic lock with no further action and continue to step 5.
   * Calculate the new A & D bits.
   * Write the descriptor as a 64-bit access.
   * End atomic operation.
5. Continue to the next stage of translation or to the memory access.
Reading out the Access/Dirty bits also needs to consider non-default page sizes (a.k.a. replicated PTEs, as mentioned in the manual), as AMD supports all power-of-two page sizes (except 512G). Select IOMMUFD_DRIVER only if IOMMUFD is enabled, since IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-12-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
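As a rough illustration of the dirty readout described above, the sketch below models a flat array of PTEs in which one larger page is represented by several replicated entries, and harvests the Dirty bit with a 64-bit atomic test-and-clear on every replica (the hardware may set D in only one of them). The bit positions (`PTE_PRESENT`, `PTE_DIRTY`), the flat layout and the helper names are hypothetical simplifications; the driver itself walks the multi-level table with fetch_pte() and records ranges through iommu_dirty_bitmap_record(), leaving the IOTLB flush to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions; assumptions, not taken from the AMD spec. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_DIRTY   (UINT64_C(1) << 6)

/*
 * Test-and-clear the Dirty bit of one page backed by `nrep` replicated
 * PTEs. Every replica is checked and cleared with a 64-bit atomic op,
 * and the page counts as dirty if any replica had D set.
 */
static bool page_test_and_clear_dirty(_Atomic uint64_t *ptes, size_t nrep)
{
	bool dirty = false;

	for (size_t i = 0; i < nrep; i++) {
		uint64_t old = atomic_fetch_and(&ptes[i], ~PTE_DIRTY);

		if (old & PTE_DIRTY)
			dirty = true;
	}
	return dirty;
}

/*
 * Walk `npages` pages of `nrep` replicated entries each and set bit `p`
 * in `bitmap` for every dirty page. No IOTLB flush happens here: the
 * caller is expected to batch it over the whole range.
 */
static size_t harvest_dirty(_Atomic uint64_t *ptes, size_t npages,
			    size_t nrep, uint64_t *bitmap)
{
	size_t ndirty = 0;

	assert(nrep > 0);
	for (size_t p = 0; p < npages; p++) {
		_Atomic uint64_t *first = &ptes[p * nrep];

		if (!(atomic_load(first) & PTE_PRESENT))
			continue;
		if (page_test_and_clear_dirty(first, nrep)) {
			bitmap[p / 64] |= UINT64_C(1) << (p % 64);
			ndirty++;
		}
	}
	return ndirty;
}
```

Scanning all replicas rather than only the first entry is the point of the "replicated PTEs" remark: a write anywhere in the large page may be reflected in just one of them.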
2023-10-24 | iommu/amd: Add domain_alloc_user based domain allocation | Joao Martins
Add the domain_alloc_user op implementation. To that end, refactor amd_iommu_domain_alloc() to take a dev pointer and flags, and rename it, so that it becomes a common function shared with the domain_alloc_user() implementation. The sole difference in domain_alloc_user() is that it also initializes the other fields that iommu_domain_alloc() does, which lets it return a correctly initialized IOMMU domain from one function. This is in preparation for adding dirty tracking enforcement to the AMD implementation of domain_alloc_user. Link: https://lore.kernel.org/r/20231024135109.73787-11-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-16 | iommu/amd: Remove DMA_FQ type from domain allocation path | Vasant Hegde
.. as drivers won't see DMA_FQ any more. See commit a4fdd9762272 ("iommu: Use flush queue capability") for details. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20231016051305.13091-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-06 | iommu/amd: Remove unused EXPORT_SYMBOLS | Vasant Hegde
Drop EXPORT_SYMBOLS for the functions that are not used by any modules. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-06 | iommu/amd: Remove amd_iommu_device_info() | Vasant Hegde
No one is using this function, hence remove it. Also move the PCI device feature detection flags to amd_iommu_types.h, as they are only used inside the AMD IOMMU driver. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-06 | iommu/amd: Remove PPR support | Vasant Hegde
Remove the PPR handler and notifier related functions, as they are no longer used. Note that we are retaining PPR interrupt handler support, as it will be re-used when we introduce IOPF support. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-06 | iommu/amd: Remove iommu_v2 module | Vasant Hegde
The AMD GPU driver, which was the only in-kernel user of the iommu_v2 module, has removed its dependency on the iommu_v2 module. Also, we are working on adding SVA support to the AMD IOMMU driver. Device drivers are expected to use the common SVA framework to enable device PASID/PRI features. Removing the iommu_v2 module and then adding SVA simplifies development. Hence, remove the iommu_v2 module. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20231006095706.5694-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-02 | iommu: Allow .iotlb_sync_map to fail and handle s390's -ENOMEM return | Niklas Schnelle
On s390, when using a paging hypervisor, .iotlb_sync_map is used to sync mappings by letting the hypervisor inspect the synced IOVA range and update a shadow table. This however means that .iotlb_sync_map can fail, as the hypervisor may run out of resources while doing the sync. This can be due to the hypervisor being unable to pin guest pages, due to a limit on mapped addresses such as vfio_iommu_type1.dma_entry_limit, or due to lack of other resources. Either way, such a failure to sync a mapping should result in a DMA_MAPPING_ERROR. Especially when running with batched IOTLB flushes for unmap, some IOVAs may already have been invalidated but not yet synced via .iotlb_sync_map. Thus, if the hypervisor indicates running out of resources, first do a global flush, allowing the hypervisor to free resources associated with these mappings, and retry creating the new mappings; only if that also fails, report the error to callers. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> # sun50i Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-1-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
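The error-handling order described above (sync; on -ENOMEM do one global flush so the hypervisor can release resources still held by already-unmapped IOVAs; retry; only then report the error) can be sketched as follows. The `fake_hv` structure, its slot counters and all helper names are hypothetical stand-ins for the s390 paging hypervisor; they are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a paging hypervisor's shadow-table resources. */
struct fake_hv {
	int free_slots;  /* slots available for new shadow entries */
	int stale_slots; /* slots held by unmapped-but-not-yet-flushed IOVAs */
};

/* Hypothetical .iotlb_sync_map: needs `n` slots or fails with -ENOMEM. */
static int sync_map(struct fake_hv *hv, int n)
{
	if (hv->free_slots < n)
		return -ENOMEM;
	hv->free_slots -= n;
	return 0;
}

/* Hypothetical global flush: lets the hypervisor reclaim stale slots. */
static void flush_all(struct fake_hv *hv)
{
	hv->free_slots += hv->stale_slots;
	hv->stale_slots = 0;
}

/*
 * Sync a new mapping. On -ENOMEM, flush everything once and retry; only
 * if the retry also fails is the error propagated (the DMA layer would
 * then turn it into DMA_MAPPING_ERROR).
 */
static int sync_map_with_retry(struct fake_hv *hv, int n)
{
	int ret;

	assert(n > 0);
	ret = sync_map(hv, n);
	if (ret == -ENOMEM) {
		flush_all(hv);
		ret = sync_map(hv, n);
	}
	return ret;
}
```

Retrying exactly once keeps the common path cheap: the expensive global flush is only paid when the hypervisor has actually run out of resources.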
2023-09-25 | iommu/amd: Initialize iommu_device->max_pasids | Vasant Hegde
Commit 1adf3cc20d69 ("iommu: Add max_pasids field in struct iommu_device") introduced a variable struct iommu_device.max_pasids to track the max PASIDs supported by each IOMMU. Initialize this field for AMD IOMMU. The IOMMU core will use this value to set max PASIDs per device (see __iommu_probe_device()). Also remove the unused global 'amd_iommu_max_pasid' variable. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-15-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Enable device ATS/PASID/PRI capabilities independently | Vasant Hegde
Introduce helper functions to enable/disable device ATS/PASID/PRI capabilities independently, along with new pasid_enabled and pri_enabled variables in struct iommu_dev_data to track their state. This allows attach_device() and detach_device() to be simplified. Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-14-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Introduce iommu_dev_data.flags to track device capabilities | Vasant Hegde
Currently we use struct iommu_dev_data.iommu_v2 to keep track of the device ATS, PRI, and PASID capabilities. But these capabilities can be enabled independently (except that PRI requires ATS support). Hence, replace the iommu_v2 variable with a flags variable, which keeps track of the device capabilities. Since commit 9bf49e36d718 ("PCI/ATS: Handle sharing of PF PRI Capability with all VFs"), device PRI/PASID is shared between the PF and any associated VFs. Hence use pci_pri_supported() and pci_pasid_features() instead of pci_find_ext_capability() to check device PRI/PASID support. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-13-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
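A minimal sketch of tracking these capabilities in a single flags word follows. The flag names (`DEV_CAP_ATS`, `DEV_CAP_PRI`, `DEV_CAP_PASID`), `struct dev_data` and the helper names are hypothetical; the only rule taken from the commit messages is that each capability can be toggled independently, except that PRI depends on ATS.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits; the driver's real names may differ. */
#define DEV_CAP_ATS   (1u << 0)
#define DEV_CAP_PRI   (1u << 1)
#define DEV_CAP_PASID (1u << 2)

struct dev_data {
	uint32_t supported; /* capabilities the device advertises */
	uint32_t enabled;   /* capabilities currently enabled */
};

/* Enable one capability; PRI is refused unless ATS is already on. */
static int dev_enable_cap(struct dev_data *d, uint32_t cap)
{
	assert(cap != 0 && (cap & (cap - 1)) == 0); /* exactly one bit */
	if (!(d->supported & cap))
		return -EINVAL;
	if (cap == DEV_CAP_PRI && !(d->enabled & DEV_CAP_ATS))
		return -EINVAL;
	d->enabled |= cap;
	return 0;
}

/* Disable one capability; dropping ATS also drops PRI, which needs it. */
static void dev_disable_cap(struct dev_data *d, uint32_t cap)
{
	if (cap == DEV_CAP_ATS)
		d->enabled &= ~DEV_CAP_PRI;
	d->enabled &= ~cap;
}
```

Keeping the dependency check inside the helpers means attach/detach paths can simply call them per capability instead of reasoning about a single all-or-nothing "v2" flag.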
2023-09-25 | iommu/amd: Introduce iommu_dev_data.ppr | Suravee Suthikulpanit
For AMD IOMMU, the PPR feature is needed to support IO page fault (IOPF). PPR is enabled per PCI end-point device, and is configured by the PPR bit in the IOMMU device table entry (i.e. DTE[PPR]). Introduce struct iommu_dev_data.ppr to track the PPR setting for each device. Also, iommu_dev_data.ppr will be set only when the IOMMU supports PPR. Hence remove the redundant feature support check in set_dte_entry(). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-12-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Rename ats related variables | Vasant Hegde
Remove the nested structure and rename its fields to 'ats_{enable/qdep}'. Also convert 'dev_data.pri_tlp' to a bit field. No functional changes intended. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-11-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Modify logic for checking GT and PPR features | Suravee Suthikulpanit
In order to support the v2 page table, the IOMMU driver needs to check if the hardware can support the Guest Translation (GT) and Peripheral Page Request (PPR) features. Currently, the IOMMU driver uses global (amd_iommu_v2_present) and per-iommu (struct amd_iommu.is_iommu_v2) variables to track the features. These variables are redundant since we can simply check the global EFR mask. Therefore, replace them with an appropriately named helper function. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-10-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Consolidate feature detection and reporting logic | Suravee Suthikulpanit
Currently, the IOMMU driver assumes capabilities on all IOMMU instances to be homogeneous. During early_amd_iommu_init(), the driver probes all IVHD blocks and does a sanity check to make sure that only features common among all IOMMU instances are supported. This is tracked in the global amd_iommu_efr and amd_iommu_efr2, which should be used whenever the driver needs to check hardware capabilities. Therefore, introduce check_feature() and check_feature2(), and modify the driver to adopt the new helper functions. In addition, clean up print_iommu_info() to avoid reporting redundant EFR/EFR2 for each IOMMU instance. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-9-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
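One way to picture "only features common among all IOMMU instances" is to intersect every instance's EFR into one global mask once, then test bits against it. The sketch below does that; the feature-bit values, `init_global_efr()` and `gt_ppr_supported()` are illustrative assumptions, and the names check_feature() and amd_iommu_efr are borrowed from the commit messages without claiming to match the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bits; real positions are defined by the AMD spec. */
#define FEAT_PPR (UINT64_C(1) << 1)
#define FEAT_GT  (UINT64_C(1) << 4)
#define FEAT_HD  (UINT64_C(1) << 52)

/* Global mask of features common to all IOMMU instances. */
static uint64_t amd_iommu_efr;

/* Hypothetical init: AND together the EFR reported by every instance. */
static void init_global_efr(const uint64_t *efr, size_t n)
{
	assert(n > 0);
	amd_iommu_efr = efr[0];
	for (size_t i = 1; i < n; i++)
		amd_iommu_efr &= efr[i];
}

/* True only if every bit in `mask` is supported by all instances. */
static bool check_feature(uint64_t mask)
{
	return (amd_iommu_efr & mask) == mask;
}

/* Hypothetical helper replacing per-instance "is v2" flags. */
static bool gt_ppr_supported(void)
{
	return check_feature(FEAT_GT | FEAT_PPR);
}
```

With the mask computed once, callers no longer need per-IOMMU flags such as is_iommu_v2: a single helper answers the question for the whole system.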
2023-09-25 | iommu/amd: Miscellaneous clean up when free domain | Suravee Suthikulpanit
* Use the protection_domain_free() helper function to free domain. The function has been modified to also free memory used for the v1 and v2 page tables. Also clear gcr3 table in v2 page table free path. * Refactor code into cleanup_domain() for reusability. Change BUG_ON to WARN_ON in cleanup path. * Protection domain dev_cnt should be read when the domain is locked. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-8-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Do not set amd_iommu_pgtable in pass-through mode | Vasant Hegde
Since the AMD IOMMU page table is not used in passthrough mode, switching to the v1 page table is not required. Therefore, remove the redundant amd_iommu_pgtable update and the misleading warning message. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Introduce helper functions for managing GCR3 table | Suravee Suthikulpanit
Refactor domain_enable_v2() into helper functions for managing GCR3 table (i.e. setup_gcr3_table() and get_gcr3_levels()), which will be used in subsequent patches. Also re-arrange code and remove forward declaration. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-6-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Refactor protection domain allocation code | Vasant Hegde
Replace the if-else with a switch-case statement due to the increasing number of domain types. No functional changes intended. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-5-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Consolidate logic to allocate protection domain | Suravee Suthikulpanit
Move the logic into the common caller function to simplify the code. No functional changes intended. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Consolidate timeout pre-define to amd_iommu_type.h | Suravee Suthikulpanit
To allow inclusion in other files in subsequent patches. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25 | iommu/amd: Remove unused amd_io_pgtable.pt_root variable | Suravee Suthikulpanit
It has not been used since commit 6eedb59c18a3 ("iommu/amd: Remove amd_iommu_domain_get_pgtable"). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230921092147.5930-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-01 | Merge tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu | Linus Torvalds
Pull iommu updates from Joerg Roedel: "Core changes: - Consolidate probe_device path - Make the PCI-SAC IOVA allocation trick PCI-only AMD IOMMU: - Consolidate PPR log handling - Interrupt handling improvements - Refcount fixes for amd_iommu_v2 driver Intel VT-d driver: - Enable idxd device DMA with pasid through iommu dma ops - Lift RESV_DIRECT check from VT-d driver to core - Miscellaneous cleanups and fixes ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Allow ASIDs to be configured in the DT to work around Qualcomm's broken hypervisor - Fix clocks for Qualcomm's MSM8998 SoC - SMMUv2: - Support for Qualcomm's legacy firmware implementation featured on at least MSM8956 and MSM8976 - Match compatible strings for Qualcomm SM6350 and SM6375 SoC variants - SMMUv3: - Use 'ida' instead of a bitmap for VMID allocation - Rockchip IOMMU: - Lift page-table allocation restrictions on newer hardware - Mediatek IOMMU: - Add MT8188 IOMMU Support - Renesas IOMMU: - Allow PCIe devices .. and the usual set of cleanups and smaller fixes" * tag 'iommu-updates-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (64 commits) iommu: Explicitly include correct DT includes iommu/amd: Remove unused declarations iommu/arm-smmu-qcom: Add SM6375 SMMUv2 iommu/arm-smmu-qcom: Add SM6350 DPU compatible iommu/arm-smmu-qcom: Add SM6375 DPU compatible iommu/arm-smmu-qcom: Sort the compatible list alphabetically dt-bindings: arm-smmu: Fix MSM8998 clocks description iommu/vt-d: Remove unused extern declaration dmar_parse_dev_scope() iommu/vt-d: Fix to convert mm pfn to dma pfn iommu/vt-d: Fix to flush cache of PASID directory table iommu/vt-d: Remove rmrr check in domain attaching device path iommu: Prevent RESV_DIRECT devices from blocking domains dmaengine/idxd: Re-enable kernel workqueue under DMA API iommu/vt-d: Add set_dev_pasid callback for dma domain iommu/vt-d: Prepare for set_dev_pasid callback iommu/vt-d: Make prq draining code generic iommu/vt-d: Remove pasid_mutex iommu/vt-d: Add domain_flush_pasid_iotlb() iommu: Move global PASID allocation from SVA to core iommu: Generalize PASID 0 for normal DMA w/o PASID ...
2023-08-30 | Merge tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 apic updates from Dave Hansen: "This includes a very thorough rework of the 'struct apic' handlers. Quite a variety of them popped up over the years, especially in the 32-bit days when odd apics were much more in vogue. The end result speaks for itself, which is a removal of a ton of code and static calls to replace indirect calls. If there's any breakage here, it's likely to be around the 32-bit museum pieces that get light to no testing these days. Summary: - Rework apic callbacks, getting rid of unnecessary ones and coalescing lots of silly duplicates. - Use static_calls() instead of indirect calls for apic->foo() - Tons of cleanups and crap removal along the way" * tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/apic: Turn on static calls x86/apic: Provide static call infrastructure for APIC callbacks x86/apic: Wrap IPI calls into helper functions x86/apic: Mark all hotpath APIC callback wrappers __always_inline x86/xen/apic: Mark apic __ro_after_init x86/apic: Convert other overrides to apic_update_callback() x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb() x86/apic: Provide apic_update_callback() x86/xen/apic: Use standard apic driver mechanism for Xen PV x86/apic: Provide common init infrastructure x86/apic: Wrap apic->native_eoi() into a helper x86/apic: Nuke ack_APIC_irq() x86/apic: Remove pointless arguments from [native_]eoi_write() x86/apic/noop: Tidy up the code x86/apic: Remove pointless NULL initializations x86/apic: Sanitize APIC ID range validation x86/apic: Prepare x2APIC for using apic::max_apic_id x86/apic: Simplify X2APIC ID validation x86/apic: Add max_apic_id member x86/apic: Wrap APIC ID validation into an inline ...
2023-08-18 | mmu_notifiers: rename invalidate_range notifier | Alistair Popple
There are two main use cases for mmu notifiers. One is KVM, which uses mmu_notifier_invalidate_range_start()/end() to manage a software TLB. The other is managing hardware TLBs, which need to use the invalidate_range() callback because HW can establish new TLB entries at any time. Hence using start/end() can lead to memory corruption, as these callbacks happen too soon/late during page unmap. MMU notifier users should therefore use either the start()/end() callbacks or the invalidate_range() callbacks. To make this usage clearer, rename the invalidate_range() callback to arch_invalidate_secondary_tlbs() and update the documentation. Link: https://lkml.kernel.org/r/6f77248cd25545c8020a54b4e567e8b72be4dca1.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Andrew Donnellan <ajd@linux.ibm.com> Cc: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Cc: Frederic Barrat <fbarrat@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: SeongJae Park <sj@kernel.org> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-17 | iommu/amd: Remove unused declarations | Yue Haibing
Commit aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device") removed the implementations but left the declarations in place. Remove them. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230814135502.4808-1-yuehaibing@huawei.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-08 | iommu/amd: Rearrange DTE bit definations | Vasant Hegde
Rearrange them according to the 64-bit word they are in. Note that I have not rearranged the gcr3-related macros, even though they belong to different 64-bit words, as they are easy to read in the current format. No functional changes intended. Suggested-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230619131908.5887-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-06 | x86/vector: Rename send_cleanup_vector() to vector_schedule_cleanup() | Thomas Gleixner
Rename send_cleanup_vector() to vector_schedule_cleanup() to prepare for replacing the vector cleanup IPI with a timer callback. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20230621171248.6805-2-xin3.li@intel.com
2023-07-14 | iommu/amd: Enable PPR/GA interrupt after interrupt handler setup | Vasant Hegde
The current code enables PPR and GA interrupts before setting up the interrupt handler (in state_next()). Make sure the interrupt handler is in place before enabling these interrupts. amd_iommu_enable_interrupts() gets called in normal boot, kdump, as well as in the suspend/resume path. Hence, moving interrupt enablement to this function works fine. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 | iommu/amd: Consolidate PPR log enablement | Vasant Hegde
Move the PPR log interrupt bit setting to iommu_enable_ppr_log(). Also rearrange iommu_enable_ppr_log() such that the PPREn bit is enabled before the PPRLog and PPRInt bits. This way, when the PPRLog bit is set, the IOMMU clears the PPRLogOverflow bit and sets the PPRLogRun bit in the IOMMU Status Register [MMIO Offset 2020h]. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 | iommu/amd: Disable PPR log/interrupt in iommu_disable() | Vasant Hegde
Similar to other logs, disable PPR log/interrupt in iommu_disable() path. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628054554.6131-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 | iommu/amd: Enable separate interrupt for PPR and GA log | Vasant Hegde
The AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). These logs can be configured to generate different interrupts when an entry is inserted into a log buffer. However, the current implementation shares a single interrupt to handle all three logs. With increasing usage of the GA log (for IOMMU AVIC) and the PPR log (for IOMMUv2 APIs and SVA), interrupt sharing could potentially become a performance bottleneck. Hence, split the IOMMU interrupt into three separate vectors and IRQ threads with corresponding names, which will be displayed in /proc/interrupts as "AMD-Vi<x>-[Evt/PPR/GA]", where "x" is an IOMMU id. Note that this patch changes interrupt handling only in IOMMU x2apic mode (MMIO 0x18[IntCapXTEn]=1). In legacy mode it will continue to use a single MSI interrupt. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628053222.5962-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
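For illustration, the naming scheme quoted above ("AMD-Vi<x>-[Evt/PPR/GA]") can be produced with a small formatting helper like the one below. `amd_vi_irq_name()` and the suffix table are hypothetical; the sketch only reproduces the visible name format, not how the driver stores or registers these names, and it applies only to x2APIC mode, where each log gets its own vector.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The three per-IOMMU logs, each with its own vector in x2APIC mode. */
static const char *const log_suffix[] = { "Evt", "PPR", "GA" };

/*
 * Hypothetical helper: format the /proc/interrupts name for log `log`
 * of IOMMU `id`, e.g. "AMD-Vi0-PPR". Returns snprintf's result.
 */
static int amd_vi_irq_name(char *buf, size_t len, int id, unsigned int log)
{
	assert(log < sizeof(log_suffix) / sizeof(log_suffix[0]));
	return snprintf(buf, len, "AMD-Vi%d-%s", id, log_suffix[log]);
}
```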
2023-07-14 | iommu/amd: Refactor IOMMU interrupt handling logic for Event, PPR, and GA logs | Vasant Hegde
The AMD IOMMU has three log buffers (i.e. Event, PPR, and GA). The IOMMU driver processes these log entries when it receives an IOMMU interrupt. Then, it needs to clear the corresponding interrupt status bits. Also, when an overflow occurs, it needs to handle the log overflow by clearing the specific overflow status bit and restarting the log. Since the logic for handling these logs is the same, refactor the code into a helper function called amd_iommu_handle_irq(), which handles the steps described. Then, reuse it for all types of log. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230628053222.5962-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
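The shared flow described above (acknowledge the status bits, process the log, and restart it on overflow) could look roughly like the sketch below. The `fake_iommu` register model, the bit values and the callbacks are hypothetical, and the function is not the driver's amd_iommu_handle_irq(); only the sequence of steps comes from the commit message. The re-read of the status at the end of each iteration is a defensive choice in this sketch, so that bits set again while handling are not lost.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits for one log type (not the real MMIO layout). */
#define LOG_OVERFLOW_BIT (1u << 0)
#define LOG_INT_BIT      (1u << 1)

struct fake_iommu {
	uint32_t status;     /* models a write-1-to-clear status register */
	int entries_handled;
	int restarts;
};

/* Write-1-to-clear: every bit set in `bits` is cleared in the register. */
static void status_w1c(struct fake_iommu *iommu, uint32_t bits)
{
	iommu->status &= ~bits;
}

static void drain_log(struct fake_iommu *iommu) { iommu->entries_handled++; }
static void restart_log(struct fake_iommu *iommu) { iommu->restarts++; }

/*
 * Generic per-log handler: while the interrupt or overflow bit is set,
 * acknowledge it, process the log and, on overflow, restart the log.
 */
static void handle_log_irq(struct fake_iommu *iommu, uint32_t int_mask,
			   uint32_t overflow_mask,
			   void (*int_handler)(struct fake_iommu *),
			   void (*overflow_handler)(struct fake_iommu *))
{
	uint32_t mask = int_mask | overflow_mask;
	uint32_t status = iommu->status;

	assert(mask != 0);
	while (status & mask) {
		status_w1c(iommu, mask);
		if (int_handler)
			int_handler(iommu);
		if ((status & overflow_mask) && overflow_handler)
			overflow_handler(iommu);
		status = iommu->status; /* re-read the status register */
	}
}
```

Passing the masks and callbacks as parameters is what lets one function serve the Event, PPR and GA logs.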
2023-07-14 | iommu/amd: Handle PPR log overflow | Vasant Hegde
Some ATS-capable peripherals can issue requests to the processor to service peripheral page requests using PCIe PRI (the Page Request Interface). The IOMMU supports PRI using the PPR log buffer: it writes PRI requests to the PPR log buffer and sends a PPR interrupt to the host. When there is no space in the PPR log buffer (PPR log overflow), it sets the PprOverflow bit in the 'MMIO Offset 2020h IOMMU Status Register'. When this happens, the PPR log needs to be restarted as specified in the IOMMU spec [1], section 2.6.2. When handling the event, the driver just resumes the PPR log without resizing it (similar to the way event and GA log overflow is handled). Failing to handle PPR overflow means the device may not work properly, as the IOMMU stops processing new PPR events from the device. [1] https://www.amd.com/system/files/TechDocs/48882_3.07_PUB.pdf Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230628051624.5792-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14 | iommu/amd: Generalize log overflow handling | Vasant Hegde
Each IOMMU has three log buffers (Event, GA and PPR log). Once a buffer becomes full, the IOMMU generates an interrupt with the corresponding overflow status bit and stops processing the log. To handle an overflow, the IOMMU driver needs to disable the log, clear the overflow status bit, and re-enable the log. This procedure is the same for all types of log buffer, except that each uses a different overflow status bit and enable bit. Hence, to consolidate the log buffer restarting logic, introduce a helper function amd_iommu_restart_log(), to which the caller can pass the parameters specific to each type of log buffer. Also rename MMIO_STATUS_EVT_OVERFLOW_INT_MASK to MMIO_STATUS_EVT_OVERFLOW_MASK. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230628051624.5792-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
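The restart procedure above (disable the log, clear the overflow status bit, re-enable the log) is parameterized only by which bits to touch, so it fits a small descriptor per log type. The sketch below uses a hypothetical `fake_mmio` register model, a hypothetical `struct log_desc` and illustrative bit values; it is not the driver's amd_iommu_restart_log() and does not model the real MMIO offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model; bit positions are illustrative only. */
struct fake_mmio {
	uint64_t control; /* log-enable and interrupt-enable bits */
	uint32_t status;  /* write-1-to-clear status bits */
};

/* The parameters that differ between the Event, PPR and GA logs. */
struct log_desc {
	uint64_t ctrl_log;      /* control bit enabling the log */
	uint64_t ctrl_int;      /* control bit enabling its interrupt */
	uint32_t overflow_mask; /* status bit reporting overflow */
};

/*
 * Restart one log after an overflow: disable the log and its interrupt,
 * clear the overflow status bit, then re-enable both.
 */
static void restart_overflowed_log(struct fake_mmio *mmio,
				   const struct log_desc *d)
{
	assert(d->ctrl_log && d->ctrl_int && d->overflow_mask);
	mmio->control &= ~(d->ctrl_log | d->ctrl_int);
	mmio->status &= ~d->overflow_mask; /* models writing 1 to clear */
	mmio->control |= d->ctrl_int | d->ctrl_log;
}
```

A caller would keep one `struct log_desc` per log type and pass the matching one when that log's overflow bit fires.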
2023-07-14 | iommu/amd/iommu_v2: Clear pasid state in free path | Vasant Hegde
Clear the pasid state in the device amd_iommu_free_device() path. This makes sure no new PPR notifier is registered in the free path. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-07-14iommu/amd/iommu_v2: Fix pasid_state refcount dec hit 0 warning on pasid unbindDaniel Marcovitch
When unbinding a PASID, a race condition exists with outstanding page faults. To prevent this, the pasid_state object contains a refcount: * set to 1 on pasid bind * incremented on each ppr notification start * decremented on each ppr notification done * decremented on pasid unbind Since refcount_dec() assumes that the refcount will never reach 0, the current implementation causes the following to be invoked on pasid unbind: REFCOUNT_WARN("decrement hit 0; leaking memory") Fix this issue by changing refcount_dec to refcount_dec_and_test to explicitly handle refcount=1. Fixes: 8bc54824da4e ("iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count") Signed-off-by: Daniel Marcovitch <dmarcovitch@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
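The difference between the two helpers can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions: model_refcount_dec() and model_refcount_dec_and_test() only imitate the behaviour of the kernel's refcount_dec() and refcount_dec_and_test() (warn when a plain decrement reaches zero, versus report that zero was reached), and all other names are hypothetical. The wait for outstanding faults is omitted because the model is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a pasid_state-like object. */
struct pasid_state_model {
	atomic_int count;
	bool freed;
	int warnings;
};

/* Imitates refcount_dec(): the caller promises this is never the last ref. */
static void model_refcount_dec(struct pasid_state_model *ps)
{
	if (atomic_fetch_sub(&ps->count, 1) == 1) {
		ps->warnings++;
		fprintf(stderr, "decrement hit 0; leaking memory\n");
	}
}

/* Imitates refcount_dec_and_test(): true when the last reference went away. */
static bool model_refcount_dec_and_test(struct pasid_state_model *ps)
{
	return atomic_fetch_sub(&ps->count, 1) == 1;
}

static void model_bind(struct pasid_state_model *ps)
{
	atomic_init(&ps->count, 1);        /* set to 1 on PASID bind */
	ps->freed = false;
	ps->warnings = 0;
}

static void model_ppr_start(struct pasid_state_model *ps)
{
	atomic_fetch_add(&ps->count, 1);   /* +1 per PPR notification start */
}

static void model_ppr_done(struct pasid_state_model *ps)
{
	if (model_refcount_dec_and_test(ps))   /* -1 per notification done */
		ps->freed = true;
}

/* Old unbind: plain decrement, which warns when it drops the last reference. */
static void unbind_old(struct pasid_state_model *ps)
{
	model_refcount_dec(ps);
}

/* Fixed unbind: detect the final reference explicitly. */
static bool unbind_fixed(struct pasid_state_model *ps)
{
	if (!model_refcount_dec_and_test(ps))
		return false;   /* a real driver would wait for outstanding faults */
	ps->freed = true;
	return true;
}
```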
2023-06-29Merge tag 'iommu-updates-v6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - iova_magazine_alloc() optimization - Make flush-queue an IOMMU driver capability - Consolidate the error handling around device attachment AMD IOMMU changes: - AVIC Interrupt Remapping Improvements - Some minor fixes and cleanups Intel VT-d changes from Lu Baolu: - Small and misc cleanups ARM-SMMU changes from Will Deacon: - Device-tree binding updates: - Add missing clocks for SC8280XP and SA8775 Adreno SMMUs - Add two new Qualcomm SMMUs in SDX75 and SM6375 - Workarounds for Arm MMU-700 errata: - 1076982: Avoid use of SEV-based cmdq wakeup - 2812531: Terminate command batches with a CMD_SYNC - Enforce single-stage translation to avoid nesting-related errata - Set the correct level hint for range TLB invalidation on teardown .. and some other minor fixes and cleanups (including Freescale PAMU and virtio-iommu changes)" * tag 'iommu-updates-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (50 commits) iommu/vt-d: Remove commented-out code iommu/vt-d: Remove two WARN_ON in domain_context_mapping_one() iommu/vt-d: Handle the failure case of dmar_reenable_qi() iommu/vt-d: Remove unnecessary (void*) conversions iommu/amd: Remove extern from function prototypes iommu/amd: Use BIT/BIT_ULL macro to define bit fields iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macro iommu/amd: Fix compile error for unused function iommu/amd: Improving Interrupt Remapping Table Invalidation iommu/amd: Do not Invalidate IRT when IRTE caching is disabled iommu/amd: Introduce Disable IRTE Caching Support iommu/amd: Remove the unused struct amd_ir_data.ref iommu/amd: Switch amd_iommu_update_ga() to use modify_irte_ga() iommu/arm-smmu-v3: Set TTL invalidation hint better iommu/arm-smmu-v3: Document nesting-related errata iommu/arm-smmu-v3: Add explicit feature for nesting iommu/arm-smmu-v3: Document MMU-700 erratum 2812531 iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982 dt-bindings: arm-smmu: Add SDX75 SMMU compatible dt-bindings: arm-smmu: Add SM6375 GPU SMMU ...
2023-06-28Merge branch 'expand-stack'Linus Torvalds
This modifies our user mode stack expansion code to always take the mmap_lock for writing before modifying the VM layout. It's actually something we always technically should have done, but because we didn't strictly need it, we were being lazy ("opportunistic" sounds so much better, doesn't it?) about things, and had this hack in place where we would extend the stack vma in-place without doing the proper locking. And it worked fine. We just needed to change vm_start (or, in the case of grow-up stacks, vm_end) and together with some special ad-hoc locking using the anon_vma lock and the mm->page_table_lock, it all was fairly straightforward. That is, it was all fine until Ruihan Li pointed out that now that the vma layout uses the maple tree code, we *really* don't just change vm_start and vm_end any more, and the locking really is broken. Oops. It's not actually all _that_ horrible to fix this once and for all, and do proper locking, but it's a bit painful. We have basically three different cases of stack expansion, and they all work just a bit differently: - the common and obvious case is the page fault handling. It's actually fairly simple and straightforward, except for the fact that we have something like 24 different versions of it, and you end up in a maze of twisty little passages, all alike. - the simplest case is the execve() code that creates a new stack. There are no real locking concerns because it's all in a private new VM that hasn't been exposed to anybody, but lockdep still can end up unhappy if you get it wrong. - and finally, we have GUP and page pinning, which shouldn't really be expanding the stack in the first place, but in addition to execve() we also use it for ptrace(). And debuggers do want to possibly access memory under the stack pointer and thus need to be able to expand the stack as a special case. None of these cases are exactly complicated, but the page fault case in particular is just repeated slightly differently many many times. 
And ia64 in particular has a fairly complicated situation where you can have both a regular grow-down stack _and_ a special grow-up stack for the register backing store. So to make this slightly more manageable, the bulk of this series is to first create a helper function for the most common page fault case, and convert all the straightforward architectures to it. Thus the new 'lock_mm_and_find_vma()' helper function, which ends up being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon, loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more than half the architectures, we now have more shared code and avoid some of those twisty little passages. And largely due to this common helper function, the full diffstat of this series ends up deleting more lines than it adds. That still leaves eight architectures (ia64, m68k, microblaze, openrisc, parisc, s390, sparc64 and um) that end up doing 'expand_stack()' manually because they are doing something slightly different from the normal pattern. Along with the couple of special cases in execve() and GUP. So there are a couple of patches that first create 'locked' helper versions of the stack expansion functions, so that there's an obvious path forward in the conversion. The execve() case is then actually pretty simple, and is a nice cleanup from our old "grow-up stacks are special, because at execve time even they grow down". The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because it's just more straightforward to write out the stack expansion there manually, instead of having get_user_pages_remote() do it for us in some situations but not others and have to worry about locking rules for GUP.
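The locking pattern of the common page-fault helper can be sketched with a toy, single-threaded lock that only tracks state. This is a simplified model under stated assumptions, not the kernel's lock_mm_and_find_vma(): toy_lock, toy_mm, toy_vma and toy_lock_mm_and_find_vma() are hypothetical, the expansion ignores page alignment, rlimits and guard gaps, and a real mmap_lock downgrade is atomic whereas this one is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-threaded lock that only tracks its state (no real concurrency). */
struct toy_lock { int readers; bool writer; };

static void read_lock(struct toy_lock *l)    { assert(!l->writer); l->readers++; }
static void read_unlock(struct toy_lock *l)  { assert(l->readers > 0); l->readers--; }
static void write_lock(struct toy_lock *l)   { assert(!l->writer && l->readers == 0); l->writer = true; }
static void write_unlock(struct toy_lock *l) { assert(l->writer); l->writer = false; }

/* Non-atomic stand-in for downgrading a write lock to a read lock. */
static void toy_downgrade(struct toy_lock *l) { write_unlock(l); read_lock(l); }

struct toy_vma { unsigned long start, end; bool grows_down; };
struct toy_mm { struct toy_lock lock; struct toy_vma vmas[4]; size_t nr; };

/* Like find_vma(): first VMA ending above addr, which may not contain addr. */
static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr; i++)
		if (mm->vmas[i].end > addr)
			return &mm->vmas[i];
	return NULL;
}

/*
 * Returns a VMA containing addr with the lock held for reading,
 * or NULL with the lock dropped.
 */
static struct toy_vma *toy_lock_mm_and_find_vma(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *vma;

	read_lock(&mm->lock);
	vma = toy_find_vma(mm, addr);
	if (vma && vma->start <= addr)
		return vma;                     /* common case: no expansion */
	if (!vma || !vma->grows_down) {
		read_unlock(&mm->lock);
		return NULL;
	}

	/* Expansion needed: retake the lock for writing before changing layout. */
	read_unlock(&mm->lock);
	write_lock(&mm->lock);
	vma = toy_find_vma(mm, addr);           /* layout may have changed */
	if (vma && vma->start > addr && vma->grows_down)
		vma->start = addr;              /* simplified expansion */
	if (!vma || vma->start > addr) {
		write_unlock(&mm->lock);
		return NULL;
	}
	toy_downgrade(&mm->lock);
	return vma;
}
```

Because the lookup is redone after the write lock is taken, the sketch tolerates the VMA layout having changed while no lock was held.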
And the final step is then to just convert the remaining odd cases to a new world order where 'expand_stack()' is called with the mmap_lock held for reading, but where it might drop it and upgrade it to a write, only to return with it held for reading (in the success case) or with it completely dropped (in the failure case). In the process, we remove all the stack expansion from GUP (where dropping the lock wouldn't be ok without special rules anyway), and add it in manually to __access_remote_vm() for ptrace(). Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases. Everything else here felt pretty straightforward, but the ia64 rules for stack expansion are really quite odd and very different from everything else. Also thanks to Vegard Nossum who caught me getting one of those odd conditions entirely the wrong way around. Anyway, I think I want to actually move all the stack expansion code to a whole new file of its own, rather than have it split up between mm/mmap.c and mm/memory.c, but since this will have to be backported to the initial maple tree vma introduction anyway, I tried to keep the patches _fairly_ minimal. Also, while I don't think it's valid to expand the stack from GUP, the final patch in here is a "warn if some crazy GUP user wants to try to expand the stack" patch. That one will be reverted before the final release, but it's left to catch any odd cases during the merge window and release candidates. 
Reported-by: Ruihan Li <lrh2000@pku.edu.cn> * branch 'expand-stack': gup: add warning if some caller would seem to want stack expansion mm: always expand the stack with the mmap write lock held execve: expand new process stack manually ahead of time mm: make find_extend_vma() fail if write lock not held powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma() mm/fault: convert remaining simple cases to lock_mm_and_find_vma() arm/mm: Convert to using lock_mm_and_find_vma() riscv/mm: Convert to using lock_mm_and_find_vma() mips/mm: Convert to using lock_mm_and_find_vma() powerpc/mm: Convert to using lock_mm_and_find_vma() arm64/mm: Convert to using lock_mm_and_find_vma() mm: make the page fault mmap locking killable mm: introduce new 'lock_mm_and_find_vma()' page fault helper
2023-06-27Merge tag 'locking-core-2023-06-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce cmpxchg128() -- aka. the demise of cmpxchg_double() The cmpxchg128() family of functions is basically & functionally the same as cmpxchg_double(), but with a saner interface. Instead of a 6-parameter horror that forced u128 - u64/u64-halves layout details on the interface and exposed users to complexity, fragility & bugs, use a natural 3-parameter interface with u128 types. - Restructure the generated atomic headers, and add kerneldoc comments for all of the generic atomic{,64,_long}_t operations. The generated definitions are much cleaner now, and come with documentation. - Implement lock_set_cmp_fn() on lockdep, for defining an ordering when taking multiple locks of the same type. This gets rid of one use of lockdep_set_novalidate_class() in the bcache code. - Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable shadowing generating garbage code on Clang on certain ARM builds. 
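To illustrate the interface change only, here is a hedged sketch of a cmpxchg128()-shaped helper using the GCC/Clang `unsigned __int128` extension. cmpxchg128_sketch() and make_u128() are hypothetical names, and the body is deliberately non-atomic: the point is the three-parameter signature that returns the previous value, not an implementation of the per-architecture atomic instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

static u128 make_u128(uint64_t hi, uint64_t lo)
{
	return ((u128)hi << 64) | lo;
}

/*
 * Interface shape only: pointer, expected old value, new value; returns
 * the value previously in *ptr, like cmpxchg(). NOT atomic: the kernel's
 * cmpxchg128() relies on a per-architecture atomic instruction instead.
 * The older cmpxchg_double() needed two pointers plus an old and a new
 * value for each 64-bit half: six parameters for the same operation.
 */
static u128 cmpxchg128_sketch(u128 *ptr, u128 old, u128 new_val)
{
	u128 prev = *ptr;

	if (prev == old)
		*ptr = new_val;
	return prev;
}
```

A caller checks for success by comparing the returned value with the expected one, exactly as with the 64-bit cmpxchg().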
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits) locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg() locking/atomic: treewide: delete arch_atomic_*() kerneldoc locking/atomic: docs: Add atomic operations to the driver basic API documentation locking/atomic: scripts: generate kerneldoc comments docs: scripts: kernel-doc: accept bitwise negation like ~@var locking/atomic: scripts: simplify raw_atomic*() definitions locking/atomic: scripts: simplify raw_atomic_long*() definitions locking/atomic: scripts: split pfx/name/sfx/order locking/atomic: scripts: restructure fallback ifdeffery locking/atomic: scripts: build raw_atomic_long*() directly locking/atomic: treewide: use raw_atomic*_<op>() locking/atomic: scripts: add trivial raw_atomic*_<op>() locking/atomic: scripts: factor out order template generation locking/atomic: scripts: remove leftover "${mult}" locking/atomic: scripts: remove bogus order parameter locking/atomic: xtensa: add preprocessor symbols locking/atomic: x86: add preprocessor symbols locking/atomic: sparc: add preprocessor symbols locking/atomic: sh: add preprocessor symbols ...
2023-06-27mm: always expand the stack with the mmap write lock heldLinus Torvalds
This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
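The changed calling convention matters most on the caller's error path. The following hedged sketch uses a toy state-tracking lock so that unlocking twice trips an assert; toy_expand_stack(), toy_handle_fault() and the other names are hypothetical stand-ins, the expansion itself is greatly simplified, and the unlock/relock pair is a non-atomic stand-in for a write-lock downgrade.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-threaded lock that only tracks its state (no real concurrency). */
struct toy_lock { int readers; bool writer; };

static void read_lock(struct toy_lock *l)    { assert(!l->writer); l->readers++; }
static void read_unlock(struct toy_lock *l)  { assert(l->readers > 0); l->readers--; }
static void write_lock(struct toy_lock *l)   { assert(!l->writer && l->readers == 0); l->writer = true; }
static void write_unlock(struct toy_lock *l) { assert(l->writer); l->writer = false; }

struct toy_vma { unsigned long start, end; bool grows_down; };
struct toy_mm { struct toy_lock lock; struct toy_vma vmas[4]; size_t nr; };

static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr; i++)
		if (mm->vmas[i].end > addr)
			return &mm->vmas[i];
	return NULL;
}

/*
 * Convention described in the commit: called with the lock held for
 * reading; returns a VMA with the lock held for reading, or NULL with
 * the lock dropped.
 */
static struct toy_vma *toy_expand_stack(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *vma;

	read_unlock(&mm->lock);
	write_lock(&mm->lock);
	vma = toy_find_vma(mm, addr);        /* any earlier pointer may be stale */
	if (vma && vma->start > addr && vma->grows_down)
		vma->start = addr;           /* simplified downward expansion */
	if (!vma || vma->start > addr) {
		write_unlock(&mm->lock);
		return NULL;
	}
	write_unlock(&mm->lock);             /* toy downgrade: not atomic */
	read_lock(&mm->lock);
	return vma;
}

/* Caller side, as in a fault handler that still calls expand_stack(). */
static int toy_handle_fault(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *vma;

	read_lock(&mm->lock);
	vma = toy_find_vma(mm, addr);
	if (!vma)
		goto bad_area;
	if (vma->start > addr) {
		vma = toy_expand_stack(mm, addr);
		if (!vma)
			return -1;           /* lock already dropped: do not unlock */
	}
	/* ... service the fault against vma ... */
	read_unlock(&mm->lock);
	return 0;

bad_area:
	read_unlock(&mm->lock);
	return -1;
}
```

The caller never reuses the VMA pointer it held before the call, and on failure it returns without unlocking, since the helper has already dropped the lock.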
2023-06-19Merge branches 'iommu/fixes', 'arm/smmu', 'ppc/pamu', 'virtio', 'x86/vt-d', ↵Joerg Roedel
'core' and 'x86/amd' into next
2023-06-16iommu/amd: Fix possible memory leak of 'domain'Su Hui
Move the allocation code further down to avoid a memory leak. Fixes: 29f54745f245 ("iommu/amd: Add missing domain type checks") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230608021933.856045-1-suhui@nfschina.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
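The general shape of such a fix is "validate first, allocate last". The sketch below is hypothetical throughout (toy_domain_alloc(), toy_check_type(), the TOY_DOMAIN_* types and the leak counter are not the driver's code); it only shows why moving the allocation below the checks removes the leak on early returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the AMD IOMMU driver's code. */
enum toy_domain_type { TOY_DOMAIN_DMA, TOY_DOMAIN_IDENTITY, TOY_DOMAIN_NESTED };

struct toy_domain { enum toy_domain_type type; };

static int toy_live_domains;   /* outstanding allocations, for spotting leaks */

/* Stand-in for the domain type checks that can fail early. */
static int toy_check_type(enum toy_domain_type type)
{
	return type == TOY_DOMAIN_NESTED ? -EINVAL : 0;
}

/*
 * Validate first, allocate last: an early error return can no longer
 * skip a free(), because nothing has been allocated yet.
 */
static struct toy_domain *toy_domain_alloc(enum toy_domain_type type, int *err)
{
	struct toy_domain *dom;

	*err = toy_check_type(type);
	if (*err)
		return NULL;

	dom = malloc(sizeof(*dom));
	if (!dom) {
		*err = -ENOMEM;
		return NULL;
	}
	dom->type = type;
	toy_live_domains++;
	return dom;
}

static void toy_domain_free(struct toy_domain *dom)
{
	if (!dom)
		return;
	toy_live_domains--;
	free(dom);
}
```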
2023-06-16iommu/amd: Remove extern from function prototypesVasant Hegde
The kernel coding style does not require 'extern' in function prototypes. Hence, remove them from the header file. No functional change intended. Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230609090631.6052-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16iommu/amd: Use BIT/BIT_ULL macro to define bit fieldsVasant Hegde
Use the BIT macro when defining bit fields, which makes them easier to read. No functional change intended. Suggested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609090631.6052-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
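As a hedged illustration, the stand-in macros below follow the kernel's definitions in spirit (a single bit shifted by `nr`, as `unsigned long` or `unsigned long long`); the TOY_* flags and toy_has_feature() are hypothetical. BIT_ULL() is the one to use for bits 32 and above, since `1UL << 40` is undefined where `unsigned long` is 32 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins, in the spirit of the kernel's BIT()/BIT_ULL(). */
#define BIT(nr)     (1UL << (nr))
#define BIT_ULL(nr) (1ULL << (nr))

/* Hypothetical flags: the hand-written form versus the BIT_ULL() form. */
#define TOY_FEATURE_A_OLD 0x8ULL
#define TOY_FEATURE_A     BIT_ULL(3)
#define TOY_FEATURE_B     BIT_ULL(40)   /* needs BIT_ULL on 32-bit builds */
#define TOY_STATUS_RUN    BIT(4)

/* True when every bit in mask is set in the 64-bit feature register value. */
static inline bool toy_has_feature(uint64_t efr, uint64_t mask)
{
	return (efr & mask) == mask;
}
```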
2023-06-16iommu/amd: Fix DTE_IRQ_PHYS_ADDR_MASK macroVasant Hegde
The Interrupt Table Root Pointer is 52 bits wide and the table must be aligned to start on a 128-byte boundary. Hence, the low 6 bits are ignored. The current code uses a 45-bit address mask instead of a 46-bit one. Use the GENMASK_ULL macro instead of generating the address mask manually. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230609090327.5923-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
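The mask arithmetic can be checked directly. This sketch assumes, as the commit message describes, that the old mask was a 45-bit field starting at bit 6; TOY_GENMASK_ULL() is a userspace stand-in for GENMASK_ULL(h, l) (bits l through h set, inclusive), and the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for GENMASK_ULL(h, l): bits l..h set, inclusive. */
#define TOY_GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Assumed old form: a 45-bit field starting at bit 6 (per the commit text). */
#define OLD_IRQ_ADDR_MASK (((1ULL << 45) - 1) << 6)
/* Bits 6..51: 46 bits, i.e. a 52-bit pointer with the low 6 bits ignored. */
#define NEW_IRQ_ADDR_MASK TOY_GENMASK_ULL(51, 6)

/* Extract the table-pointer bits that a given mask would keep. */
static inline uint64_t toy_irq_table_bits(uint64_t phys, uint64_t mask)
{
	return phys & mask;
}
```

The two masks differ only in bit 51, so a table address with bit 51 set would have been silently truncated by the 45-bit mask.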
2023-06-09iommu/amd: Fix compile error for unused functionJoerg Roedel
Recent changes introduced a compile error: drivers/iommu/amd/iommu.c:1285:13: error: ‘iommu_flush_irt_and_complete’ defined but not used [-Werror=unused-function] 1285 | static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ This happens with defconfig-x86_64 because AMD IOMMU is enabled but CONFIG_IRQ_REMAP is disabled. Move the function under #ifdef CONFIG_IRQ_REMAP to fix the error. Signed-off-by: Joerg Roedel <jroedel@suse.de>
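The fix follows a common pattern: put a static helper under the same configuration guard as its only caller, so that no configuration leaves it defined but unused. In this hedged sketch, TOY_CONFIG_IRQ_REMAP is a hypothetical stand-in for the Kconfig symbol, and the function names are made up.

```c
#include <assert.h>

/*
 * TOY_CONFIG_IRQ_REMAP is a hypothetical stand-in for a Kconfig option.
 * The helper and its only caller sit under the same guard, so a build
 * without the option never sees an unused static function.
 */
#ifdef TOY_CONFIG_IRQ_REMAP
static int toy_flush_irt_and_complete(int devid)
{
	return devid;           /* placeholder for the real invalidation */
}

int toy_update_irte(int devid)
{
	return toy_flush_irt_and_complete(devid);
}
#else
int toy_update_irte(int devid)
{
	(void)devid;
	return -1;              /* interrupt remapping compiled out */
}
#endif
```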
2023-06-09iommu/amd: Improving Interrupt Remapping Table InvalidationSuravee Suthikulpanit
Invalidating the Interrupt Remapping Table (IRT) requires the AMD IOMMU driver to issue INVALIDATE_INTERRUPT_TABLE and COMPLETION_WAIT commands. Currently, the driver issues the two commands separately, which requires calling raw_spin_lock_irqsave() twice. In addition, the COMPLETION_WAIT could potentially be interleaved with other commands, delaying the COMPLETION_WAIT command. Therefore, issue the two commands under a single spin-lock acquisition, and change struct amd_iommu.cmd_sem_val to use atomic64 to minimize locking. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-6-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
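A hedged userspace model of the combined submission: the sequence number comes from an atomic counter (standing in for the atomic64 cmd_sem_val), and both commands are queued back to back under one acquisition of a simple spinlock, so no other command can land between them. struct toy_iommu, toy_flush_irt_and_complete() and the command encoding are hypothetical, and the wait for the hardware to write back the sequence number is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical command encoding and ring; not the AMD IOMMU command format. */
#define TOY_RING_SIZE 16

enum toy_cmd_type { TOY_CMD_INV_IRT, TOY_CMD_COMPLETION_WAIT };

struct toy_cmd {
	enum toy_cmd_type type;
	uint64_t arg;               /* devid for INV_IRT, sequence for COMPLETION_WAIT */
};

struct toy_iommu {
	atomic_flag lock;           /* simple spinlock */
	unsigned int lock_acquisitions;
	_Atomic uint64_t cmd_sem_val;
	struct toy_cmd ring[TOY_RING_SIZE];
	unsigned int tail;
};

static void toy_iommu_init(struct toy_iommu *iommu)
{
	atomic_flag_clear(&iommu->lock);
	iommu->lock_acquisitions = 0;
	atomic_init(&iommu->cmd_sem_val, 0);
	iommu->tail = 0;
}

static void toy_lock(struct toy_iommu *iommu)
{
	while (atomic_flag_test_and_set_explicit(&iommu->lock, memory_order_acquire))
		;                   /* spin */
	iommu->lock_acquisitions++;
}

static void toy_unlock(struct toy_iommu *iommu)
{
	atomic_flag_clear_explicit(&iommu->lock, memory_order_release);
}

/* Caller must hold the lock. */
static void toy_queue_locked(struct toy_iommu *iommu, struct toy_cmd cmd)
{
	assert(iommu->tail < TOY_RING_SIZE);
	iommu->ring[iommu->tail++] = cmd;
}

static uint64_t toy_flush_irt_and_complete(struct toy_iommu *iommu, uint16_t devid)
{
	/* Taken atomically, so no lock is needed just to get a sequence number. */
	uint64_t seq = atomic_fetch_add(&iommu->cmd_sem_val, 1) + 1;

	toy_lock(iommu);
	toy_queue_locked(iommu, (struct toy_cmd){ TOY_CMD_INV_IRT, devid });
	toy_queue_locked(iommu, (struct toy_cmd){ TOY_CMD_COMPLETION_WAIT, seq });
	/* A real driver would wait here until the hardware writes back seq. */
	toy_unlock(iommu);
	return seq;
}
```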
2023-06-09iommu/amd: Do not Invalidate IRT when IRTE caching is disabledSuravee Suthikulpanit
With the Interrupt Remapping Table cache disabled, there is no need to issue an IRT invalidation and wait for its completion. Therefore, add logic to bypass the operation. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-5-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-09iommu/amd: Introduce Disable IRTE Caching SupportSuravee Suthikulpanit
An Interrupt Remapping Table (IRT) stores the interrupt remapping configuration for each device. In normal operation, the AMD IOMMU caches the table to optimize subsequent data accesses. This requires the IOMMU driver to invalidate the IRT whenever it updates the table. The invalidation process includes issuing an INVALIDATE_INTERRUPT_TABLE command followed by a COMPLETION_WAIT command. However, there are cases in which the IRT is updated at a high rate. For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every vCPU scheduling event (i.e. amd_iommu_update_ga()). On systems with a large number of vCPUs and VFIO PCI pass-through devices, the invalidation process could potentially become a performance bottleneck. Introduce a new kernel boot option, amd_iommu=irtcachedis, which disables IRTE caching by setting the IRTCachedis bit in each IOMMU Control register and bypasses the IRT invalidation process. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
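The boot option and the invalidation bypass can be sketched together as follows. This is a hypothetical model: only the option name `irtcachedis` comes from the commit, while the parser, struct toy_iommu and toy_irte_updated() are illustrative and do not reproduce the driver's code or the Control register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model; only the option name comes from the commit. */
struct toy_iommu {
	bool irt_cache_disabled;    /* stands in for the IRTCachedis control bit */
	unsigned int invalidations; /* IRT invalidations actually issued */
};

static bool toy_opt_irtcachedis;

/* Parse a comma-separated amd_iommu= value such as "fullflush,irtcachedis". */
static void toy_parse_amd_iommu(const char *str)
{
	static const char opt[] = "irtcachedis";

	while (*str) {
		size_t len = strcspn(str, ",");

		if (len == sizeof(opt) - 1 && strncmp(str, opt, len) == 0)
			toy_opt_irtcachedis = true;
		str += len;
		if (*str == ',')
			str++;
	}
}

/* At init time, each IOMMU picks up the option. */
static void toy_iommu_init(struct toy_iommu *iommu)
{
	iommu->irt_cache_disabled = toy_opt_irtcachedis;
	iommu->invalidations = 0;
}

/* Called after an IRTE update: invalidate only if the IRT may be cached. */
static void toy_irte_updated(struct toy_iommu *iommu)
{
	if (iommu->irt_cache_disabled)
		return;             /* nothing cached, so nothing to invalidate */
	iommu->invalidations++;     /* INVALIDATE_INTERRUPT_TABLE + COMPLETION_WAIT */
}
```

With the option set, frequent IRTE updates such as IsRun toggles no longer pay for an invalidation round trip, at the cost of the IOMMU re-reading the table from memory.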