summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/mlx5/main.c
AgeCommit message (Collapse)Author
2019-12-12IB/mlx5: Fix device memory flowsYishai Hadas
Fix device memory flows so that only once there will be no live mmaped VA to a given allocation the matching object will be destroyed. This prevents a potential scenario that existing VA that was mmaped by one process might still be used post its deallocation despite that it's owned now by other process. The above is achieved by integrating with IB core APIs to manage mmap/munmap. Only once the refcount will become 0 the DM object and its underlay area will be freed. Fixes: 3b113a1ec3d4 ("IB/mlx5: Support device memory type attribute") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12IB/mlx5: Fix steering rule of drop and countMaor Gottlieb
There are two flow rule destinations: QP and packet. While users are setting DROP packet rule, the QP should not be set as a destination. Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-11-27Merge tag 'driver-core-5.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 5.5-rc1 There's a few minor cleanups and fixes in here, but the majority of the patches in here fall into two buckets: - debugfs api cleanups and fixes - driver core device link support for boot dependancy issues The debugfs api cleanups are working to slowly refactor the debugfs apis so that it is even harder to use incorrectly. That work has been happening for the past few kernel releases and will continue over time, it's a long-term project/goal The driver core device link support missed 5.4 by just a bit, so it's been sitting and baking for many months now. It's from Saravana Kannan to help resolve the problems that DT-based systems have at boot time with dependancy graphs and kernel modules. Turns out that no one has actually tried to build a generic arm64 kernel with loads of modules and have it "just work" for a variety of platforms (like a distro kernel). The big problem turned out to be a lack of dependency information between different areas of DT entries, and the work here resolves that problem and now allows devices to boot properly, and quicker than a monolith kernel. All of these patches have been in linux-next for a long time with no reported issues" * tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits) tracing: Remove unnecessary DEBUG_FS dependency of: property: Add device link support for interrupt-parent, dmas and -gpio(s) debugfs: Fix !DEBUG_FS debugfs_create_automount of: property: Add device link support for "iommu-map" of: property: Fix the semantics of of_is_ancestor_of() i2c: of: Populate fwnode in of_i2c_get_board_info() drivers: base: Fix Kconfig indentation firmware_loader: Fix labels with comma for builtin firmware driver core: Allow device link operations inside sync_state() driver core: platform: Declare ret variable only once cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h> crypto: hisilicon: no need to check return value of debugfs_create functions driver core: platform: use the correct callback type for bus_find_device firmware_class: make firmware caching configurable driver core: Clarify documentation for fwnode_operations.add_links() mailbox: tegra: Fix superfluous IRQ error message net: caif: Fix debugfs on 64-bit platforms mac80211: Use debugfs_create_xul() helper media: c8sectpfe: no need to check return value of debugfs_create functions of: property: Add device link support for iommus, mboxes and io-channels ...
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-22IB/mlx5: Implement callbacks for getting VFs GUID attributesDanit Goldberg
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW for VFs attributes and return node and port GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-19IB/mlx5: Support extended number of strides for Striding RQMark Zhang
Extends the minimum single WQE strides from 64 to 8, which is exposed by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps. Choose right number of strides based on FW capability. Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-11IB/mlx5: Load profile according to RoCE enablement stateMichael Guralnik
When RoCE is disabled load mlx5_ib in raw_eth profile. Clean pf_profile roce capability checks as it will not be used without roce capability. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11IB/mlx5: Rename profile and init methodsMichael Guralnik
Rename uplink_rep_profile and its unique init and cleanup stages to suit its upcoming use as the profile when RoCE is disabled. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-04IB: mlx5: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Leon Romanovsky <leon@kernel.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-01IB/mlx5: Introduce and use mlx5_core_is_vf()Parav Pandit
Instead of deciding a given device is virtual function or not based on a device is PF or not, use already defined MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf(). This enables to clearly identify PF, VF and non virtual functions. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-31IB/mlx5: Test write combining supportMichael Guralnik
Linux can run in all sorts of physical machines and VMs where write combining may or may not be supported. Currently there is no way to reliably tell if the system supports WC, or not. The driver uses WC to optimize posting work to the HCA, and getting this wrong in either direction can cause a significant performance loss. Add a test in mlx5_ib initialization process to test whether write-combining is supported on the machine. The test will run as part of the enable_driver callback to ensure that the test runs after the device is setup and can create and modify the QP needed, but runs before the device is exposed to the users. The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame is different from the WQE in memory, requesting CQE only on the BlueFlame WQE. By checking whether we received a completion on one of these WQEs we can know if BlueFlame succeeded and this write-combining must be supported. Change reporting of BlueFlame support to be dependent on write-combining support instead of the FW's guess as to what the machine can do. Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge branch 'odp_rework' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibily more that haven't been identified. ==================== Merge conflicts with the odp statistics patch resolved. * branch 'odp_rework': RDMA/odp: Remove broken debugging call to invalidate_range RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray RDMA/mlx5: Rework implicit ODP destroy RDMA/mlx5: Avoid double lookups on the pagefault path RDMA/mlx5: Reduce locking in implicit_mr_get_data() RDMA/mlx5: Use an xarray for the children of an implicit ODP RDMA/mlx5: Split implicit handling from pagefault_mr RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it RDMA/mlx5: Rework implicit_mr_get_data RDMA/mlx5: Delete struct mlx5_priv->mkey_table RDMA/mlx5: Use a dedicated mkey xarray for ODP RDMA/mlx5: Split sig_err MR data into its own xarray RDMA/mlx5: Use SRCU properly in ODP prefetch Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Rework implicit ODP destroyJason Gunthorpe
Use SRCU in a sensible way by removing all MRs in the implicit tree from the two xarrays (the update operation), then a synchronize, followed by a normal single threaded teardown. This is only a little unusual from the normal pattern as there can still be some work pending in the unbound wq that may also require a workqueue flush. This is tracked with a single atomic, consolidating the redundant existing atomics and wait queue. For understand-ability the entire ODP implicit create/destroy flow now largely exists in a single pair of functions within odp.c, with a few support functions for tearing down an unused child. Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Use a dedicated mkey xarray for ODPJason Gunthorpe
There is a per device xarray storing mkeys that is used to store every mkey in the system. However, this xarray is now only read by ODP for certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT). Create an xarray only for use by ODP, that only contains ODP related MKeys. This xarray is protected by SRCU and all erases are protected by a synchronize. This improves performance: - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp() can be deleted. The xarray will also consume fewer nodes. - normal MR's are never mixed with ODP MRs in a SRCU data structure so performance sucking synchronize_srcu() on every MR destruction is not needed. - No smp_load_acquire(live) and xa_load() double barrier on read Due to the SRCU locking scheme care must be taken with the placement of the xa_store(). Once it completes the MR is immediately visible to other threads and only through a xa_erase() & synchronize_srcu() cycle could it be destroyed. Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Split sig_err MR data into its own xarrayJason Gunthorpe
The locking model for signature is completely different than ODP, do not share the same xarray that relies on SRCU locking to support ODP. Simply store the active mlx5_core_sig_ctx's in an xarray when signature MRs are created and rely on trivial xarray locking to serialize everything. The overhead of storing only a handful of SIG related MRs is going to be much less than an xarray full of every mkey. Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22IB/mlx5: Remove dead codeRan Rozenstein
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the dead code. Fixes: a60109dc9a95 ("IB/mlx5: Add support for extended atomic operations") Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org Signed-off-by: Ran Rozenstein <ranro@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/nldev: Provide MR statisticsErez Alfasi
Add RDMA nldev netlink interface for dumping MR statistics information. Output example: $ ./ibv_rc_pingpong -o -P -s 500000000 local address: LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID :: $ rdma stat show mr dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0 Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/mlx5: Return ODP type per MRErez Alfasi
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show mr' dump. For example: $ rdma -dd resource show mr dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000 pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit For non-ODP MRs, we won't print "drv_odp ..." at all. Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22Merge branch 'mlx5-rd-sgl' into rdma.git for-nextJason Gunthorpe
From Yamin Friedman: ==================== This series from Yamin implements long standing "TODO" existed in rw.c. It allows the driver to specify a cut-over point where it is faster to build a lkey MR rather than do a large SGL for RDMA READ operations. mlx5 HW gets a notable performane boost by switching to MRs. ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'mlx5-rd-sgl': (3 commits) RDMA/mlx5: Add capability for max sge to get optimized performance RDMA/rw: Support threshold for registration vs scattering to local pages net/mlx5: Expose optimal performance scatter entries capability
2019-10-22RDMA/mlx5: Add capability for max sge to get optimized performanceYamin Friedman
Allows the IB device to provide a value of maximum scatter gather entries per RDMA READ. In certain cases it may be preferable for a device to perform UMR memory registration rather than have many scatter entries in a single RDMA READ. This provides a significant performance increase in devices capable of using different memory registration schemes based on the number of scatter gather entries. This general capability allows each device vendor to fine tune when it is better to use memory registration. Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04IB/mlx5: Remove unnecessary else statementErez Alfasi
'else' is not generally useful after a break or return. Remove this unnecessary statement. Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull RDMA subsystem updates from Jason Gunthorpe: "This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA: Fix double-free in srq creation error flow RDMA/efa: Fix incorrect error print IB/mlx5: Free mpi in mp_slave mode IB/mlx5: Use the original address for the page during free_pages RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" RDMA/hns: Package operations of rq inline buffer into separate functions RDMA/hns: Optimize cmd init and mode selection for hip08 IB/hfi1: Define variables as unsigned long to fix KASAN warning IB/{rdmavt, hfi1, qib}: Add a counter for credit waits IB/hfi1: Add traces for TID RDMA READ RDMA/siw: Relax from kmap_atomic() use in TX path IB/iser: Support up to 16MB data transfer in a single command RDMA/siw: Fix page address mapping in TX path RDMA: Fix goto target to release the allocated memory RDMA/usnic: Avoid overly large buffers on stack RDMA/odp: Add missing cast for 32 bit RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Documentation/infiniband: update name of some functions RDMA/cma: Fix false error message RDMA/hns: Fix wrong assignment of qp_access_flags ...
2019-09-21Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...
2019-09-16IB/mlx5: Free mpi in mp_slave modeDanit Goldberg
ib_add_slave_port() allocates a multiport struct but never frees it. Don't leak memory, free the allocated mpi struct during driver unload. Cc: <stable@vger.kernel.org> Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE") Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-03net/mlx5: Add flow steering actions to fs_cmd shim layerMaor Gottlieb
Add flow steering actions: modify header and packet reformat to the fs_cmd shim layer. This allows each namespace to define possibly different functionality for alloc/dealloc action commands. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-02Merge branch 'mlx5-next' of ↵Saeed Mahameed
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merge mlx5-next patches needed for upcoming mlx5 software steering. 1) Alex adds HW bits and definitions required for SW steering 2) Ariel moves device memory management to mlx5_core (From mlx5_ib) 3) Maor, Cleanups and fixups for eswitch mode and RoCE 4) Mark, Set only stag for match untagged packets Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-01net/mlx5: Move device memory management to mlx5_coreAriel Levkovich
Move the device memory allocation and deallocation commands SW ICM memory to mlx5_core to expose this API for all mlx5_core users. This comes as preparation for supporting SW steering in kernel where it will be required to allocate and register device memory for direct rule insertion. In addition, an API to register this device memory for future remote access operations is introduced using the create_mkey commands. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-26RDMA/mlx5: RDMA_RX flow type support for user applicationsMark Zhang
Currently user applications can only steer TCP/IP(NIC RX/RX) traffic. This patch adds RDMA_RX as a new flow type to allow the user to insert steering rules to control RDMA traffic. Two destinations are supported(but not set at the same time): devx flow table object and QP. Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-21RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'Jason Gunthorpe
This is a significant simplification, no extra list is kept per FD, and the interval tree is now shared between all the ucontexts, reducing overhead if there are multiple ucontexts active. Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21Merge branch 'odp_fixes' into rdma.git for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21RDMA/core: Make invalidate_range a device operationMoni Shoua
The callback function 'invalidate_range' is implemented in a driver so the place for it is in the ib_device_ops structure and not in ib_ucontext. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20IB/mlx5: Report and handle ODP support properlyMoni Shoua
ODP depends on the several device capabilities, among them is the ability to send UMR WQEs with that modify atomic and entity size of the MR. Therefore, only if all conditions to send such a UMR WQE are met then driver can report that ODP is supported. Use this check of conditions in all places where driver needs to know about ODP support. Also, implicit ODP support depends on ability of driver to send UMR WQEs for an indirect mkey. Therefore, verify that all conditions to do so are met when reporting support. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13RDMA/mlx5: Annotate lock dependency in bind/unbind slave portLeon Romanovsky
NULL-ing notifier_call is performed under protection of mlx5_ib_multiport_mutex lock. Such protection is not easily spotted and better to be guarded by lockdep annotation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190813102814.22350-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12RDMA: Introduce ib_port_phys_state enumKamal Heib
In order to improve readability, add ib_port_phys_state enum to replace the use of magic numbers. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Andrew Boyer <aboyer@tobark.org> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-01RDMA/mlx5: Release locks during notifier unregisterLeon Romanovsky
The below kernel panic was observed when created bond mode LACP with GRE tunnel on top. The reason to it was not released spinlock during mlx5 notify unregsiter sequence. [ 234.562007] BUG: scheduling while atomic: sh/10900/0x00000002 [ 234.563005] Preemption disabled at: [ 234.566864] ------------[ cut here ]------------ [ 234.567120] DEBUG_LOCKS_WARN_ON(val > preempt_count()) [ 234.567139] WARNING: CPU: 16 PID: 10900 at kernel/sched/core.c:3203 preempt_count_sub+0xca/0x170 [ 234.569550] CPU: 16 PID: 10900 Comm: sh Tainted: G W 5.2.0-rc1-for-linust-dbg-2019-05-25_04-57-33-60 #1 [ 234.569886] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.6.1 02/12/2018 [ 234.570183] RIP: 0010:preempt_count_sub+0xca/0x170 [ 234.570404] Code: 03 38 d0 7c 08 84 d2 0f 85 b0 00 00 00 8b 15 dd 02 03 04 85 d2 75 ba 48 c7 c6 00 e1 88 83 48 c7 c7 40 e1 88 83 e8 76 11 f7 ff <0f> 0b 5b c3 65 8b 05 d3 1f d8 7e 84 c0 75 82 e8 62 c3 c3 00 85 c0 [ 234.570911] RSP: 0018:ffff888b94477b08 EFLAGS: 00010286 [ 234.571133] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [ 234.571391] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000246 [ 234.571648] RBP: ffff888ba5560000 R08: fffffbfff08962d5 R09: fffffbfff08962d5 [ 234.571902] R10: 0000000000000001 R11: fffffbfff08962d4 R12: ffff888bac6e9548 [ 234.572157] R13: ffff888babfaf728 R14: ffff888bac6e9568 R15: ffff888babfaf750 [ 234.572412] FS: 00007fcafa59b740(0000) GS:ffff888bed200000(0000) knlGS:0000000000000000 [ 234.572686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 234.572914] CR2: 00007f984f16b140 CR3: 0000000b2bf0a001 CR4: 00000000001606e0 [ 234.573172] Call Trace: [ 234.573336] _raw_spin_unlock+0x2e/0x50 [ 234.573542] mlx5_ib_unbind_slave_port+0x1bc/0x690 [mlx5_ib] [ 234.573793] mlx5_ib_cleanup_multiport_master+0x1d3/0x660 [mlx5_ib] [ 234.574039] mlx5_ib_stage_init_cleanup+0x4c/0x360 [mlx5_ib] [ 234.574271] ? kfree+0xf5/0x2f0 [ 234.574465] __mlx5_ib_remove+0x61/0xd0 [mlx5_ib] [ 234.574688] ? __mlx5_ib_remove+0xd0/0xd0 [mlx5_ib] [ 234.574951] mlx5_remove_device+0x234/0x300 [mlx5_core] [ 234.575224] mlx5_unregister_device+0x4d/0x1e0 [mlx5_core] [ 234.575493] remove_one+0x4f/0x160 [mlx5_core] [ 234.575704] pci_device_remove+0xef/0x2a0 [ 234.581407] ? pcibios_free_irq+0x10/0x10 [ 234.587143] ? up_read+0xc1/0x260 [ 234.592785] device_release_driver_internal+0x1ab/0x430 [ 234.598442] unbind_store+0x152/0x200 [ 234.604064] ? sysfs_kf_write+0x3b/0x180 [ 234.609441] ? sysfs_file_ops+0x160/0x160 [ 234.615021] kernfs_fop_write+0x277/0x440 [ 234.620288] ? __sb_start_write+0x1ef/0x2c0 [ 234.625512] vfs_write+0x15e/0x460 [ 234.630786] ksys_write+0x156/0x1e0 [ 234.635988] ? __ia32_sys_read+0xb0/0xb0 [ 234.641120] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 234.646163] do_syscall_64+0x95/0x470 [ 234.651106] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 234.656004] RIP: 0033:0x7fcaf9c9cfd0 [ 234.660686] Code: 73 01 c3 48 8b 0d c0 6e 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d cd cf 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24 [ 234.670128] RSP: 002b:00007ffd3b01ddd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 234.674811] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fcaf9c9cfd0 [ 234.679387] RDX: 000000000000000d RSI: 00007fcafa5c1000 RDI: 0000000000000001 [ 234.683848] RBP: 00007fcafa5c1000 R08: 000000000000000a R09: 00007fcafa59b740 [ 234.688167] R10: 00007ffd3b01d8e0 R11: 0000000000000246 R12: 00007fcaf9f75400 [ 234.692386] R13: 000000000000000d R14: 0000000000000001 R15: 0000000000000000 [ 234.696495] irq event stamp: 153067 [ 234.700525] hardirqs last enabled at (153067): [<ffffffff83258c39>] _raw_spin_unlock_irqrestore+0x59/0x70 [ 234.704665] hardirqs last disabled at (153066): [<ffffffff83259382>] _raw_spin_lock_irqsave+0x22/0x90 [ 234.708722] softirqs last enabled at (153058): [<ffffffff836006c5>] __do_softirq+0x6c5/0xb4e [ 234.712673] softirqs last disabled at (153051): [<ffffffff81227c1d>] irq_exit+0x17d/0x1d0 [ 234.716601] ---[ end trace 5dbf096843ee9ce6 ]--- Fixes: df097a278c75 ("IB/mlx5: Use the new mlx5 core notifier API") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190731083852.584-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29IB/mlx5: Support per device q counters in switchdev modeParav Pandit
When parent mlx5_core_dev is in switchdev mode, q_counters are not applicable to multiple non uplink vports. Hence, have make them limited to device level. While at it, correct __mlx5_ib_qp_set_counter() and __mlx5_ib_modify_qp() to use u16 set_id as defined by the device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190723073117.7175-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-29IB/mlx5: Refactor code for counters allocationParav Pandit
To support per device counters in switchdev mode (instead of per port counter), refactor query routines to work on mlx5_ib_counter structure instead of mlx5_ib_port structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190723073117.7175-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-25IB/mlx5: Avoid unnecessary typecastParav Pandit
IB device pointer is already available while deallocating IB device, Hence do not typecast it. Link: https://lore.kernel.org/r/20190723065733.4899-9-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08RDMA/mlx5: Set RDMA DIM to be enabled by defaultLeon Romanovsky
Enable RDMA DIM by default for better user experience. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08IB/mlx5: Report correctly tag matching rendezvous capabilityDanit Goldberg
Userspace expects the IB_TM_CAP_RC bit to indicate that the device supports RC transport tag matching with rendezvous offload. However the firmware splits this into two capabilities for eager and rendezvous tag matching. Only if the FW supports both modes should userspace be told the tag matching capability is available. Cc: <stable@vger.kernel.org> # 4.13 Fixes: eb761894351d ("IB/mlx5: Fill XRQ capabilities") Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05IB/mlx5: Add counter_alloc_stats() and counter_update_stats() supportMark Zhang
Add support for ib callback counter_alloc_stats() and counter_update_stats(). Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05IB/mlx5: Support statistic q counter configurationMark Zhang
Add support for ib callbacks counter_bind_qp(), counter_unbind_qp() and counter_dealloc(). Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05IB/mlx5: Add counter set id as a parameter for mlx5_ib_query_q_counters()Mark Zhang
Add counter set id as a parameter so that this API can be used for querying any q counter. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-04RDMA/mlx5: Use proper allocation API to get zeroed memoryLeon Romanovsky
There is no need in custom memory zeroing, because it can be done by using kzalloc from the beginning. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/mlx5: Register DEVX with mlx5_core to get async eventsYishai Hadas
Register DEVX with with mlx5_core to get async events. This will enable to dispatch the applicable events to its consumers in down stream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03Merge mlx5-next into rdma for-nextJason Gunthorpe
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. Resolved the conflicts: - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports() version - esw_offloads_steering_init() drop the cap test - esw_offloads_init() drop the extra function arguments * branch 'mlx5-next': (39 commits) net/mlx5: Expose device definitions for object events net/mlx5: Report EQE data upon CQ completion net/mlx5: Report a CQ error event only when a handler was set net/mlx5: mlx5_core_create_cq() enhancements net/mlx5: Expose the API to register for ANY event net/mlx5: Use event mask based on device capabilities net/mlx5: Fix mlx5_core_destroy_cq() error flow net/mlx5: E-Switch, Handle UC address change in switchdev mode net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop net/mlx5: E-Switch, Use iterator for vlan and min-inline setups net/mlx5: E-Switch, Reg/unreg function changed event at correct stage net/mlx5: E-Switch, Consolidate eswitch function number of VFs net/mlx5: E-Switch, Refactor eswitch SR-IOV interface net/mlx5: Handle host PF vport mac/guid for ECPF net/mlx5: E-Switch, Use correct flags when configuring vlan net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs net/mlx5: Don't handle VF func change if host PF is disabled net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices net/mlx5: Move pci status reg access mutex to mlx5_pci_init net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03net/mlx5: Report EQE data upon CQ completionYishai Hadas
Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-01net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>