summaryrefslogtreecommitdiff
path: root/drivers/vfio/pci
AgeCommit message (Collapse)Author
2022-09-21vfio/mlx5: Use the new device life cycle helpersYi Liu
mlx5 has its own @init/@release for handling migration cap. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-4-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/pci: Use the new device life cycle helpersYi Liu
Also introduce two pci core helpers as @init/@release for pci drivers: - vfio_pci_core_init_dev() - vfio_pci_core_release_dev() Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-3-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Set the driver DMA logging callbacksYishai Hadas
Now that everything is ready set the driver DMA logging callbacks if supported by the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Manage error scenarios on trackerYishai Hadas
Handle async error events and health/recovery flow to safely stop the tracker upon error scenarios. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-10-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Report dirty pages from trackerYishai Hadas
Report dirty pages from tracker. It includes: Querying for dirty pages in a given IOVA range, this is done by modifying the tracker into the reporting state and supplying the required range. Using the CQ event completion mechanism to be notified once data is ready on the CQ/QP to be processed. Once data is available turn on the corresponding bits in the bit map. This functionality will be used as part of the 'log_read_and_clear' driver callback in the next patches. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Create and destroy page tracker objectYishai Hadas
Add support for creating and destroying page tracker object. This object is used to control/report the device dirty pages. As part of creating the tracker need to consider the device capabilities for max ranges and adapt/combine ranges accordingly. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Init QP based resources for dirty trackingYishai Hadas
Init QP based resources for dirty tracking to be used upon start logging. It includes: Creating the host and firmware RC QPs, move each of them to its expected state based on the device specification, etc. Creating the relevant resources which are needed by both QPs as of UAR, PD, etc. Creating the host receive side resources as of MKEY, CQ, receive WQEs, etc. The above resources are cleaned-up upon stop logging. The tracker object that will be introduced by next patches will use those resources. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Introduce the DMA logging feature supportYishai Hadas
Introduce the DMA logging feature support in the vfio core layer. It includes the processing of the device start/stop/report DMA logging UAPIs and calling the relevant driver 'op' to do the work. Specifically, Upon start, the core translates the given input ranges into an interval tree, checks for unexpected overlapping, non aligned ranges and then pass the translated input to the driver for start tracking the given ranges. Upon report, the core translates the given input user space bitmap and page size into an IOVA kernel bitmap iterator. Then it iterates it and call the driver to set the corresponding bits for the dirtied pages in a specific IOVA range. Upon stop, the driver is called to stop the previous started tracking. The next patches from the series will introduce the mlx5 driver implementation for the logging ops. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08Merge remote-tracking branch 'mlx5/mlx5-vfio' into v6.1/vfio/nextAlex Williamson
Merge net/mlx5 depedencies for device DMA logging and mlx5 variant driver suppport. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01hisi_acc_vfio_pci: Correct the function prefix for hssi_acc_drvdata()Shameer Kolothum
Commit 91be0bd6c6cf("vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdata") introduced a helper function to retrieve the drvdata but used "hssi" instead of "hisi" for the function prefix. Correct that and also while at it, moved the function a bit down so that it's close to other hisi_ prefixed functions. No functional changes. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220831085943.993-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUPAbhishek Sahu
This patch implements VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. In the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is any access for the VFIO device on the host side, then the device will be moved out of the low power state without the user's guest driver involvement. Once the device access has been finished, then the host can move the device again into low power state. With the low power entry happened through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP, the device will not be moved back into the low power state and a notification will be sent to the user by triggering wakeup eventfd. vfio_pci_core_pm_entry() will be called for both the variants of low power feature entry so add an extra argument for wakeup eventfd context and store locally in 'struct vfio_pci_core_device'. For the entry happened without wakeup eventfd, all the exit related handling will be done by the LOW_POWER_EXIT device feature only. When the LOW_POWER_EXIT will be called, then the vfio core layer vfio_device_pm_runtime_get() will increment the usage count and will resume the device. In the driver runtime_resume callback, the 'pm_wake_eventfd_ctx' will be NULL. Then vfio_pci_core_pm_exit() will call vfio_pci_runtime_pm_exit() and all the exit related handling will be done. For the entry happened with wakeup eventfd, in the driver resume callback, eventfd will be triggered and all the exit related handling will be done. When vfio_pci_runtime_pm_exit() will be called by vfio_pci_core_pm_exit(), then it will return early. But if the runtime suspend has not happened on the host side, then all the exit related handling will be done in vfio_pci_core_pm_exit() only. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-6-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY/EXITAbhishek Sahu
Currently, if the runtime power management is enabled for vfio-pci based devices in the guest OS, then the guest OS will do the register write for PCI_PM_CTRL register. This write request will be handled in vfio_pm_config_write() where it will do the actual register write of PCI_PM_CTRL register. With this, the maximum D3hot state can be achieved for low power. If we can use the runtime PM framework, then we can achieve the D3cold state (on the supported systems) which will help in saving maximum power. 1. D3cold state can't be achieved by writing PCI standard PM config registers. This patch implements the following newly added low power related device features: - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT The VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY feature will allow the device to make use of low power platform states on the host while the VFIO_DEVICE_FEATURE_LOW_POWER_EXIT will prevent further use of those power states. 2. The vfio-pci driver uses runtime PM framework for low power entry and exit. On the platforms where D3cold state is supported, the runtime PM framework will put the device into D3cold otherwise, D3hot or some other power state will be used. There are various cases where the device will not go into the runtime suspended state. For example, - The runtime power management is disabled on the host side for the device. - The user keeps the device busy after calling LOW_POWER_ENTRY. - There are dependent devices that are still in runtime active state. For these cases, the device will be in the same power state that has been configured by the user through PCI_PM_CTRL register. 3. The hypervisors can implement virtual ACPI methods. For example, in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power resources with _ON/_OFF method, then guest linux OS invokes the _OFF method during D3cold transition and then _ON during D0 transition. The hypervisor can tap these virtual ACPI calls and then call the low power device feature IOCTL. 4. The 'pm_runtime_engaged' flag tracks the entry and exit to runtime PM. This flag is protected with 'memory_lock' semaphore. 5. All the config and other region access are wrapped under pm_runtime_resume_and_get() and pm_runtime_put(). So, if any device access happens while the device is in the runtime suspended state, then the device will be resumed first before access. Once the access has been finished, then the device will again go into the runtime suspended state. 6. The memory region access through mmap will not be allowed in the low power state. Since __vfio_pci_memory_enabled() is a common function, so check for 'pm_runtime_engaged' has been added explicitly in vfio_pci_mmap_fault() to block only mmap'ed access. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Mask INTx during runtime suspendAbhishek Sahu
This patch adds INTx handling during runtime suspend/resume. All the suspend/resume related code for the user to put the device into the low power state will be added in subsequent patches. The INTx lines may be shared among devices. Whenever any INTx interrupt comes for the VFIO devices, then vfio_intx_handler() will be called for each device sharing the interrupt. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx() and checks if the interrupt has been generated for the current device. Now, if the device is already in the D3cold state, then the config space can not be read. Attempt to read config space in D3cold state can cause system unresponsiveness in a few systems. To prevent this, mask INTx in runtime suspend callback, and unmask the same in runtime resume callback. If INTx has been already masked, then no handling is needed in runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and vfio_pci_intx_mask() has been updated to return true if the INTx vfio_pci_irq_ctx.masked value is changed inside this function. For the runtime suspend which is triggered for the no user of VFIO device, the 'irq_type' will be VFIO_PCI_NUM_IRQS and these callbacks won't do anything. The MSI/MSI-X are not shared so similar handling should not be needed for MSI/MSI-X. vfio_msihandler() triggers eventfd_signal() without doing any device-specific config access. When the user performs any config access or IOCTL after receiving the eventfd notification, then the device will be moved to the D0 state first before servicing any request. Another option was to check this flag 'pm_intx_masked' inside vfio_intx_handler() instead of masking the interrupts. This flag is being set inside the runtime_suspend callback but the device can be in non-D3cold state (for example, if the user has disabled D3cold explicitly by sysfs, the D3cold is not supported in the platform, etc.). Also, in D3cold supported case, the device will be in D0 till the PCI core moves the device into D3cold. In this case, there is a possibility that the device can generate an interrupt. Adding check in the IRQ handler will not clear the IRQ status and the interrupt line will still be asserted. This can cause interrupt flooding. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio-pci: Replace 'void __user *' with proper types in the ioctl functionsJason Gunthorpe
This makes the code clearer and replaces a few places trying to access a flex array with an actual flex array. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio-pci: Re-indent what was vfio_pci_core_ioctl()Jason Gunthorpe
Done mechanically with: $ git clang-format-14 -i --lines 675:1210 drivers/vfio/pci/vfio_pci_core.c And manually reflow the multi-line comments clang-format doesn't fix. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio-pci: Break up vfio_pci_core_ioctl() into one function per ioctlJason Gunthorpe
500 lines is a bit long for a single function, move the bodies of each ioctl into separate functions and leave behind a switch statement to dispatch them. This patch just adds the function declarations and does not fix the indenting. The next patch will restore the indenting. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio-pci: Fix vfio_pci_ioeventfd() to return intJason Gunthorpe
This only returns 0 or -ERRNO, it should return int like all the other ioctl dispatch functions. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Simplify the is_intx/msi/msix/etc definesJason Gunthorpe
Only three of these are actually used, simplify to three inline functions, and open code the if statement in vfio_pci_config.c. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/3-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Rename vfio_pci_register_dev_region()Jason Gunthorpe
As this is part of the vfio_pci_core component it should be called vfio_pci_core_register_dev_region() like everything else exported from this module. Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/2-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Split linux/vfio_pci_core.hJason Gunthorpe
The header in include/linux should have only the exported interface for other vfio_pci modules to use. Internal definitions for vfio_pci.ko should be in a "priv" header along side the .c files. Move the internal declarations out of vfio_pci_core.h. They either move to vfio_pci_priv.h or to the C file that is the only user. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/1-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-08-29KVM: s390: pci: Hook to access KVM lowlevel from VFIOPierre Morel
We have a cross dependency between KVM and VFIO when using s390 vfio_pci_zdev extensions for PCI passthrough To be able to keep both subsystem modular we add a registering hook inside the S390 core code. This fixes a build problem when VFIO is built-in and KVM is built as a module. Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Fixes: 09340b2fca007 ("KVM: s390: pci: add routines to start/stop interpretive execution") Cc: <stable@vger.kernel.org> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20220819122945.9309-1-pmorel@linux.ibm.com Message-Id: <20220819122945.9309-1-pmorel@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2022-08-06Merge tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Cleanup use of extern in function prototypes (Alex Williamson) - Simplify bus_type usage and convert to device IOMMU interfaces (Robin Murphy) - Check missed return value and fix comment typos (Bo Liu) - Split migration ops from device ops and fix races in mlx5 migration support (Yishai Hadas) - Fix missed return value check in noiommu support (Liam Ni) - Hardening to clear buffer pointer to avoid use-after-free (Schspa Shi) - Remove requirement that only the same mm can unmap a previously mapped range (Li Zhe) - Adjust semaphore release vs device open counter (Yi Liu) - Remove unused arg from SPAPR support code (Deming Wang) - Rework vfio-ccw driver to better fit new mdev framework (Eric Farman, Michael Kawano) - Replace DMA unmap notifier with callbacks (Jason Gunthorpe) - Clarify SPAPR support comment relative to iommu_ops (Alexey Kardashevskiy) - Revise page pinning API towards compatibility with future iommufd support (Nicolin Chen) - Resolve issues in vfio-ccw, including use of DMA unmap callback (Eric Farman) * tag 'vfio-v6.0-rc1' of https://github.com/awilliam/linux-vfio: (40 commits) vfio/pci: fix the wrong word vfio/ccw: Check return code from subchannel quiesce vfio/ccw: Remove FSM Close from remove handlers vfio/ccw: Add length to DMA_UNMAP checks vfio: Replace phys_pfn with pages for vfio_pin_pages() vfio/ccw: Add kmap_local_page() for memcpy vfio: Rename user_iova of vfio_dma_rw() vfio/ccw: Change pa_pfn list to pa_iova list vfio/ap: Change saved_pfn to saved_iova vfio: Pass in starting IOVA to vfio_pin/unpin_pages API vfio/ccw: Only pass in contiguous pages vfio/ap: Pass in physical address of ind to ap_aqic() drm/i915/gvt: Replace roundup with DIV_ROUND_UP vfio: Make vfio_unpin_pages() return void vfio/spapr_tce: Fix the comment vfio: Replace the iommu notifier with a device list vfio: Replace the DMA unmapping notifier with a callback vfio/ccw: Move FSM open/close to MDEV open/close vfio/ccw: Refactor vfio_ccw_mdev_reset vfio/ccw: Create a CLOSE FSM event ...
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...
2022-08-01vfio/pci: fix the wrong wordBo Liu
This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220801013918.2520-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-11vfio-pci/zdev: different maxstbl for interpreted devicesMatthew Rosato
When doing load/store interpretation, the maximum store block length is determined by the underlying firmware, not the host kernel API. Reflect that in the associated Query PCI Function Group clp capability and let userspace decide which is appropriate to present to the guest. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-20-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: add function handle to clp base capabilityMatthew Rosato
The function handle is a system-wide unique identifier for a zPCI device. With zPCI instruction interpretation, the host will no longer be executing the zPCI instructions on behalf of the guest. As a result, the guest needs to use the real function handle in order for firmware to associate the instruction with the proper PCI function. Let's provide that handle to the guest. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-19-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio-pci/zdev: add open/close device hooksMatthew Rosato
During vfio-pci open_device, pass the KVM associated with the vfio group (if one exists). This is needed in order to pass a special indicator (GISA) to firmware to allow zPCI interpretation facilities to be used for only the specific KVM associated with the vfio-pci device. During vfio-pci close_device, unregister the notifier. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-18-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-11vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVMMatthew Rosato
The current contents of vfio-pci-zdev are today only useful in a KVM environment; let's tie everything currently under vfio-pci-zdev to this Kconfig statement and require KVM in this case, reducing complexity (e.g. symbol lookups). Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Link: https://lore.kernel.org/r/20220606203325.110625-11-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-07-06vfio/pci: fix the wrong wordBo Liu
This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220704023649.3913-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio: Split migration ops from main device opsYishai Hadas
vfio core checks whether the driver sets some migration op (e.g. set_state/get_state) and accordingly calls its op. However, currently mlx5 driver sets the above ops without regards to its migration caps. This might lead to unexpected usage/Oops if user space may call to the above ops even if the driver doesn't support migration. As for example, the migration state_mutex is not initialized in that case. The cleanest way to manage that seems to split the migration ops from the main device ops, this will let the driver setting them separately from the main ops when it's applicable. As part of that, validate ops construction on registration and include a check for VFIO_MIGRATION_STOP_COPY since the uAPI claims it must be set in migration_flags. HISI driver was changed as well to match this scheme. This scheme may enable down the road to come with some extra group of ops (e.g. DMA log) that can be set without regards to the other options based on driver caps. Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-30vfio/mlx5: Protect mlx5vf_disable_fds() upon close deviceYishai Hadas
Protect mlx5vf_disable_fds() upon close device to be called under the state mutex as done in all other places. This will prevent a race with any other flow which calls mlx5vf_disable_fds() as of health/recovery upon MLX5_PF_NOTIFY_DISABLE_VF event. Encapsulate this functionality in a separate function named mlx5vf_cmd_close_migratable() to consider migration caps and for further usage upon close device. Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220628155910.171454-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-06-27vfio/pci: Remove console driversAlex Williamson
Console drivers can create conflicts with PCI resources resulting in userspace getting mmap failures to memory BARs. This is especially evident when trying to re-use the system primary console for userspace drivers. Use the aperture helpers to remove these conflicts. v3: * call aperture_remove_conflicting_pci_devices() Reported-by: Laszlo Ersek <lersek@redhat.com> Suggested-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220622140134.12763-4-tzimmermann@suse.de
2022-05-23vfio/pci: Add driver_managed_dma to the new vfio_pci driversJason Gunthorpe
When the iommu series adding driver_managed_dma was rebased it missed that new VFIO drivers were added and did not update them too. Without this vfio will claim the groups are not viable. Add driver_managed_dma to mlx5 and hisi. Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices") Reported-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-f9dfa642fab0+2b3-vfio_managed_dma_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Move the unused device into low power state with runtime PMAbhishek Sahu
Currently, there is very limited power management support available in the upstream vfio_pci_core based drivers. If there are no users of the device, then the PCI device will be moved into D3hot state by writing directly into PCI PM registers. This D3hot state help in saving power but we can achieve zero power consumption if we go into the D3cold state. The D3cold state cannot be possible with native PCI PM. It requires interaction with platform firmware which is system-specific. To go into low power states (including D3cold), the runtime PM framework can be used which internally interacts with PCI and platform firmware and puts the device into the lowest possible D-States. This patch registers vfio_pci_core based drivers with the runtime PM framework. 1. The PCI core framework takes care of most of the runtime PM related things. For enabling the runtime PM, the PCI driver needs to decrement the usage count and needs to provide 'struct dev_pm_ops' at least. The runtime suspend/resume callbacks are optional and needed only if we need to do any extra handling. Now there are multiple vfio_pci_core based drivers. Instead of assigning the 'struct dev_pm_ops' in individual parent driver, the vfio_pci_core itself assigns the 'struct dev_pm_ops'. There are other drivers where the 'struct dev_pm_ops' is being assigned inside core layer (For example, wlcore_probe() and some sound based driver, etc.). 2. This patch provides the stub implementation of 'struct dev_pm_ops'. The subsequent patch will provide the runtime suspend/resume callbacks. All the config state saving, and PCI power management related things will be done by PCI core framework itself inside its runtime suspend/resume callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()). 3. Inside pci_reset_bus(), all the devices in dev_set needs to be runtime resumed. vfio_pci_dev_set_pm_runtime_get() will take care of the runtime resume and its error handling. 4. Inside vfio_pci_core_disable(), the device usage count always needs to be decremented which was incremented in vfio_pci_core_enable(). 5. Since the runtime PM framework will provide the same functionality, so directly writing into PCI PM config register can be replaced with the use of runtime PM routines. Also, the use of runtime PM can help us in more power saving. In the systems which do not support D3cold, With the existing implementation: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3hot So, with runtime PM, the upstream bridge or root port will also go into lower power state which is not possible with existing implementation. In the systems which support D3cold, // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3hot // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D0 With runtime PM: // PCI device # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state D3cold // upstream bridge # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state D3cold So, with runtime PM, both the PCI device and upstream bridge will go into D3cold state. 6. If 'disable_idle_d3' module parameter is set, then also the runtime PM will be enabled, but in this case, the usage count should not be decremented. 7. vfio_pci_dev_set_try_reset() return value is unused now, so this function return type can be changed to void. 8. Use the runtime PM API's in vfio_pci_core_sriov_configure(). The device can be in low power state either with runtime power management (when there is no user) or PCI_PM_CTRL register write by the user. In both the cases, the PF should be moved to D0 state. For preventing any runtime usage mismatch, pci_num_vf() has been called explicitly during disable. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Virtualize PME related registers bits and initialize to zeroAbhishek Sahu
If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Change the PF power state to D0 before enabling VFsAbhishek Sahu
According to [PCIe v5 9.6.2] for PF Device Power Management States "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state." From the vfio driver side, user can enable SR-IOV when the PF is in D3hot state. If VF does not implement the Power Management Capability, then the VF will be actually in D3hot state and then the VF BAR access will fail. If VF implements the Power Management Capability, then VF will assume that its current power state is D0 when the PF is D3hot and in this case, the behavior is undefined. To support PF power management, we need to create power management dependency between PF and its VF's. The runtime power management support may help with this where power management dependencies are supported through device links. But till we have such support in place, we can disallow the PF to go into low power state, if PF has VF enabled. There can be a case, where user first enables the VF's and then disables the VF's. If there is no user of PF, then the PF can put into D3hot state again. But with this patch, the PF will still be in D0 state after disabling VF's since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. But the subsequent patches related to runtime PM will handle this case since runtime PM maintains its own usage count. Also, vfio_pci_core_sriov_configure() can be called at any time (with and without vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Invalidate mmaps and block the access in D3hot power stateAbhishek Sahu
According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error will be returned for MMIO access. The IO access generally does not make the system unresponsive so the IO access can still happen in D3hot state. The default value should be returned in this case without bringing down the complete system. Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. This protection is mainly needed when user changes the PCI power state by writing into PCI_PM_CTRL register. vfio_lock_and_set_power_state() wrapper function will take the required locks and then it will invoke the vfio_pci_set_power_state(). Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio/pci: Use the struct file as the handle not the vfio_groupJason Gunthorpe
VFIO PCI does a security check as part of hot reset to prove that the user has permission to manipulate all the devices that will be impacted by the reset. Use a new API vfio_file_has_dev() to perform this security check against the struct file directly and remove the vfio_group from VFIO PCI. Since VFIO PCI was the last user of vfio_group_get_external_user() and vfio_group_put_external_user() remove it as well. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13Merge remote-tracking branch 'iommu/vfio-notifier-fix' into v5.19/vfio/nextAlex Williamson
Merge IOMMU dependencies for vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/pci: Remove vfio_device_get_from_dev()Jason Gunthorpe
The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching use the vfio_device available trivially through the drvdata. When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store). Further, since the drvdata holds a positive refcount on the vfio_device any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here, VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdataJason Gunthorpe
Having a consistent pointer in the drvdata will allow the next patch to make use of the drvdata from some of the core code helpers. Use a WARN_ON inside vfio_pci_core_register_device() to detect drivers that miss this. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11Merge tag 'mlx5-lm-parallel' of ↵Alex Williamson
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11vfio/mlx5: Run the SAVE state command in an async modeYishai Hadas
Use the PF asynchronous command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Link: https://lore.kernel.org/r/20220510090206.90374-5-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11vfio/mlx5: Refactor to enable VFs migration in parallelYishai Hadas
Refactor to enable different VFs to run their commands over the PF command interface in parallel and to not block one each other. This is done by not using the global PF lock that was used before but relying on the VF attach/detach mechanism to sync. Link: https://lore.kernel.org/r/20220510090206.90374-4-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11vfio/mlx5: Manage the VF attach/detach callback from the PFYishai Hadas
Manage the VF attach/detach callback from the PF. This lets the driver to enable parallel VFs migration as will be introduced in the next patch. As part of this, reorganize the VF is migratable code to be in a separate function and rename it to be set_migratable() to match its functionality. Link: https://lore.kernel.org/r/20220510090206.90374-3-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-28vfio: Set DMA ownership for VFIO devicesLu Baolu
Claim group dma ownership when an IOMMU group is set to a container, and release the dma ownership once the iommu group is unset from the container. This change disallows some unsafe bridge drivers to bind to non-ACS bridges while devices under them are assigned to user space. This is an intentional enhancement and possibly breaks some existing configurations. The recommendation to such an affected user would be that the previously allowed host bridge driver was unsafe for this use case and to continue to enable assignment of devices within that group, the driver should be unbound from the bridge device or replaced with the pci-stub driver. For any bridge driver, we consider it unsafe if it satisfies any of the following conditions: 1) The bridge driver uses DMA. Calling pci_set_master() or calling any kernel DMA API (dma_map_*() and etc.) is an indicate that the driver is doing DMA. 2) If the bridge driver uses MMIO, it should be tolerant to hostile userspace also touching the same MMIO registers via P2P DMA attacks. If the bridge driver turns out to be a safe one, it could be used as before by setting the driver's .driver_managed_dma field, just like what we have done in the pcieport driver. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20220418005000.897664-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-04-13vfio/pci: Fix vf_token mechanism when device-specific VF drivers are usedJason Gunthorpe
get_pf_vdev() tries to check if a PF is a VFIO PF by looking at the driver: if (pci_dev_driver(physfn) != pci_dev_driver(vdev->pdev)) { However now that we have multiple VF and PF drivers this is no longer reliable. This means that security tests realted to vf_token can be skipped by mixing and matching different VFIO PCI drivers. Instead of trying to use the driver core to find the PF devices maintain a linked list of all PF vfio_pci_core_device's that we have called pci_enable_sriov() on. When registering a VF just search the list to see if the PF is present and record the match permanently in the struct. PCI core locking prevents a PF from passing pci_disable_sriov() while VF drivers are attached so the VFIO owned PF becomes a static property of the VF. In common cases where vfio does not own the PF the global list remains empty and the VF's pointer is statically NULL. This also fixes a lockdep splat from recursive locking of the vfio_group::device_lock between vfio_device_get_from_name() and vfio_device_get_from_dev(). If the VF and PF share the same group this would deadlock. Fixes: ff53edf6d6ab ("vfio/pci: Split the pci_driver code out of vfio_pci_core.c") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v3-876570980634+f2e8-vfio_vf_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-03-15hisi_acc_vfio_pci: Use its own PCI reset_done error handlerShameer Kolothum
Register private handler for pci_error_handlers.reset_done and update state accordingly. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-10-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-03-15hisi_acc_vfio_pci: Add support for VFIO live migrationLongfang Liu
VMs assigned with HiSilicon ACC VF devices can now perform live migration if the VF devices are bind to the hisi_acc_vfio_pci driver. Just like ACC PF/VF drivers this VFIO driver also make use of the HiSilicon QM interface. QM stands for Queue Management which is a generic IP used by ACC devices. It provides a generic PCIe interface for the CPU and the ACC devices to share a group of queues. QM integrated into an accelerator provides queue management service. Queues can be assigned to PF and VFs, and queues can be controlled by unified mailboxes and doorbells. The QM driver (drivers/crypto/hisilicon/qm.c) provides generic interfaces to ACC drivers to manage the QM. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-9-shameerali.kolothum.thodi@huawei.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-03-15hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration regionShameer Kolothum
HiSilicon ACC VF device BAR2 region consists of both functional register space and migration control register space. Unnecessarily exposing the migration BAR region to the Guest has the potential to prevent/corrupt the Guest migration. Hence, introduce a separate struct vfio_device_ops for migration support which will override the ioctl/read/write/mmap methods to hide the migration region and limit the Guest access only to the functional register space. This will be used in subsequent patches when we add migration support to the driver. Please note that it is OK to export the entire VF BAR if migration is not supported or required as this cannot affect the PF configurations. Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220308184902.2242-6-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>