summaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)Author
2024-07-01RDMA/nldev: Add support to dump device type and parent device if existsMark Zhang
If a device has a specific type or a parent device, dump them as well. Example: $ rdma dev show smi1 3: smi1: node_type ca fw 20.38.1002 node_guid 9803:9b03:009f:d5ef sys_image_guid 9803:9b03:009f:d5ee type smi parent ibp8s0f1 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/4c022e3e34b5de1254a3b367d502a362cdd0c53a.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/nldev: Add support to add/delete a sub IB device through netlinkMark Zhang
Add new netlink commands and attributes to support adding and deleting a sub IB device with admin privilege. Examples: $ rdma dev add smi1 type SMI parent ibp8s0f1 $ rdma dev del smi1 Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/77cbf1b36359642be8a8d8c5c2f4e585b544282f.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/mlx5: Support plane device and driver APIs to add and delete itMark Zhang
This patch supports driver APIs "add_sub_dev" and "del_sub_dev", to add and delete a plane device respectively. A mlx5 plane device is a rdma SMI device; It provides the SMI capability through user MAD for it's parent, the logical multi-plane aggregated device. For a plane port: - It supports QP0 only; - When adding a plane device, all plane ports are added; - For some commands like mad_ifc, both plane_index and native portnum is needed; - When querying or modifying a plane port context, the native portnum must be used, as the query/modify_hca_vport_context command doesn't support plane port. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/e933cd0562aece181f8657af2ca0f5b387d0f14e.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/core: Create GSI QP only when CM is supportedMark Zhang
GSI QP is not needed if the port doesn't support connection management. In following patches mlx5 is going to support IB ports that doesn't support CM. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/c449ebd955923b0e54c58832fd322f9d461b37a0.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/core: Support IB sub device with type "SMI"Mark Zhang
This patch adds 2 APIs, as well as driver operations to support adding and deleting an IB sub device, which provides part of functionalities of it's parent. A sub device has a type; for a sub device with type "SMI", it provides the smi capability through umad for its parent, meaning uverb is not supported. A sub device cannot live without a parent. So when a parent is released, all it's sub devices are released as well. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/44253f7508b21eb2caefea3980c2bc072869116c.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/mlx5: Add support to multi-plane device and portMark Zhang
When multi-plane is supported, a logical port, which is aggregation of multiple physical plane ports, is exposed for data transmission. Compared with a normal mlx5 IB port, this logical port supports all functionalities except Subnet Management. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/7e37c06c9cb243be9ac79930cd17053903785b95.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/core: Create "issm*" device nodes only when SMI is supportedMark Zhang
For an IB port create it's issm device node only when it has SMI capability. In following patches mlx5 is going to support IB devices without this cap. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/359f73c9a388d5e3ae971e40d8507888b1ba6f93.1718553901.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-07-01RDMA/bnxt_re: Disable doorbell moderation if hardware register read failsSelvin Xavier
If the HW register read fails, the FIFO will be always shown as full. DB moderation doesn't work in that case and the traffic fails. So disable this feature and log a message. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1719456065-27394-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-07-01RDMA/bnxt_re: Enable DB moderation for genP7 adaptersSelvin Xavier
Enable DB moderation support for GenP7 adapters also. Query from FW and update the status. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1719456065-27394-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-07-01RDMA/bnxt_re: Update the correct DB FIFO depth and mask for GenP7Selvin Xavier
GenP5 and P7 devices have different DB FIFO depth. Use different values based on the chip context. Instead of hardcoding doorbell FIFO related values, get it from the HWRM interface. Maintain backward compatibility by having default values when FW is not providing the doorbell FIFO related values. Signed-off-by: Chandramohan Akula <chandramohan.akula@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1719456065-27394-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-07-01RDMA/device: Return error earlier if port in not validLeon Romanovsky
There is no need to allocate port data if port provided is not valid. Fixes: c2261dd76b54 ("RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev") Link: https://lore.kernel.org/r/022047a8b16988fc88d4426da50bf60a4833311b.1719235449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-06-27RDMA/mlx5: Send UAR page index as ioctl attributeAkiva Goldberger
Add UAR page index as a driver ioctl attribute to increase the number of supported indices, previously limited to 16 bits by mlx5_ib_create_cq struct. Link: https://lore.kernel.org/r/0e18b34d7ec3b1ae02d694b0d545aed7413c0ef7.1719512393.git.leon@kernel.org Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-27RDMA: Pass entire uverbs attr bundle to create cq functionAkiva Goldberger
Changes the create_cq verb signature by sending the entire uverbs attr bundle as a parameter. This allows drivers to send driver specific attrs through ioctl for the create_cq verb and access them in their driver specific code. Also adds a new enum value for driver specific ioctl attributes for methods already supporting UHW. Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling") d9c04209990b ("ionic: Mark error paths in the data path as unlikely") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-26RDMA/efa: Remove duplicate aenq enable macroYonatan Nachum
We have the same macro in main and verbs files and we don't use the macro in the verbs file, remove it. Link: https://lore.kernel.org/r/20240624160918.27060-3-mrgolin@amazon.com Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26RDMA/efa: Use offset_in_page() functionGal Pressman
Use offset_in_page() instead of open-coding it. Link: https://lore.kernel.org/r/20240624160918.27060-2-mrgolin@amazon.com Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Reviewed-by: Firas Jahjah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26RDMA/hfi1: Constify struct mmu_rb_opsChristophe JAILLET
'struct mmu_rb_ops' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig, as an example: Before: ====== text data bss dec hex filename 10879 164 0 11043 2b23 drivers/infiniband/hw/hfi1/pin_system.o After: ===== text data bss dec hex filename 10907 140 0 11047 2b27 drivers/infiniband/hw/hfi1/pin_system.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/b826dd05eefa5f4d6a7a1b4d191eaf37c714ed04.1719259997.git.christophe.jaillet@wanadoo.fr Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26RDMA/rxe: Don't set BTH_ACK_MASK for UC or UD QPsHonggang LI
BTH_ACK_MASK bit is used to indicate that an acknowledge (for this packet) should be scheduled by the responder. Both UC and UD QPs are unacknowledged, so don't set BTH_ACK_MASK for UC or UD QPs. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240624020348.494338-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26IB/isert: remove the handling of last WQE reached eventMax Gurtovoy
This event is raised for QPs that are associated with a Shared RQ (SRQ). The iSER target does not support SRQ. Remove this dead code. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20240619171153.34631-3-mgurtovoy@nvidia.com Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26IB/core: add support for draining Shared receive queuesMax Gurtovoy
To avoid leakage for QPs assocoated with SRQ, according to IB spec (section 10.3.1): "Note, for QPs that are associated with an SRQ, the Consumer should take the QP through the Error State before invoking a Destroy QP or a Modify QP to the Reset State. The Consumer may invoke the Destroy QP without first performing a Modify QP to the Error State and waiting for the Affiliated Asynchronous Last WQE Reached Event. However, if the Consumer does not wait for the Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment leakage may occur. Therefore, it is good programming practice to teardown a QP that is associated with an SRQ by using the following process: - Put the QP in the Error State; - wait for the Affiliated Asynchronous Last WQE Reached Event; - either: - drain the CQ by invoking the Poll CQ verb and either wait for CQ to be empty or the number of Poll CQ operations has exceeded CQ capacity size; or - post another WR that completes on the same CQ and wait for this WR to return as a WC; - and then invoke a Destroy QP or Reset QP." Catch the Last WQE Reached Event in the core layer during drain QP flow. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20240619171153.34631-2-mgurtovoy@nvidia.com Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26RDMA/mlx4: Fix truncated output warning in alias_GUID.cLeon Romanovsky
drivers/infiniband/hw/mlx4/alias_GUID.c: In function ‘mlx4_ib_init_alias_guid_service’: drivers/infiniband/hw/mlx4/alias_GUID.c:878:74: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 5 [-Werror=format-truncation=] 878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); | ^~ drivers/infiniband/hw/mlx4/alias_GUID.c:878:63: note: directive argument in the range [-2147483641, 2147483646] 878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); | ^~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/alias_GUID.c:878:17: note: ‘snprintf’ output between 12 and 22 bytes into a destination of size 15 878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Fixes: a0c64a17aba8 ("mlx4: Add alias_guid mechanism") Link: https://lore.kernel.org/r/1951c9500109ca7e36dcd523f8a5f2d0d2a608d1.1718554641.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-26RDMA/mlx4: Fix truncated output warning in mad.cLeon Romanovsky
Increase size of the name array to avoid truncated output warning. drivers/infiniband/hw/mlx4/mad.c: In function ‘mlx4_ib_alloc_demux_ctx’: drivers/infiniband/hw/mlx4/mad.c:2197:47: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 4 [-Werror=format-truncation=] 2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port); | ^~ drivers/infiniband/hw/mlx4/mad.c:2197:38: note: directive argument in the range [-2147483645, 2147483647] 2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port); | ^~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2197:9: note: ‘snprintf’ output between 10 and 20 bytes into a destination of size 12 2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2205:48: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Werror=format-truncation=] 2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port); | ^~ drivers/infiniband/hw/mlx4/mad.c:2205:38: note: directive argument in the range [-2147483645, 2147483647] 2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port); | ^~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2205:9: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 12 2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2213:48: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 3 [-Werror=format-truncation=] 2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port); | ^~ drivers/infiniband/hw/mlx4/mad.c:2213:38: note: directive argument in the range [-2147483645, 2147483647] 2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port); | ^~~~~~~~~~~~~ drivers/infiniband/hw/mlx4/mad.c:2213:9: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 12 2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/hw/mlx4/mad.o] Error 1 Fixes: fc06573dfaf8 ("IB/mlx4: Initialize SR-IOV IB support for slaves in master context") Link: https://lore.kernel.org/r/f3798b3ce9a410257d7e1ec7c9e285f1352e256a.1718554569.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-06-21RDMA/mana_ib: Ignore optional access flags for MRsKonstantin Taranov
Ignore optional ib_access_flags when an MR is created. Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717575368-14879-1-git-send-email-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-21RDMA/mlx5: Add check for srq max_sge attributePatrisious Haddad
max_sge attribute is passed by the user, and is inserted and used unchecked, so verify that the value doesn't exceed maximum allowed value before using it. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/277ccc29e8d57bfd53ddeb2ac633f2760cf8cdd0.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-21RDMA/mlx5: Fix unwind flow as part of mlx5_ib_stage_init_initYishai Hadas
Fix unwind flow as part of mlx5_ib_stage_init_init to use the correct goto upon an error. Fixes: 758ce14aee82 ("RDMA/mlx5: Implement MACsec gid addition and deletion") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/aa40615116eda14ec9eca21d52017d632ea89188.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-21RDMA/mlx5: Ensure created mkeys always have a populated rb_keyJason Gunthorpe
cachable and mmkey.rb_key together are used by mlx5_revoke_mr() to put the MR/mkey back into the cache. In all cases they should be set correctly. alloc_cacheable_mr() was setting cachable but not filling rb_key, resulting in cache_ent_find_and_store() bucketing them all into a 0 length entry. implicit_get_child_mr()/mlx5_ib_alloc_implicit_mr() failed to set cachable or rb_key at all, so the cache was not working at all for implicit ODP. Cc: stable@vger.kernel.org Fixes: 8c1185fef68c ("RDMA/mlx5: Change check for cacheable mkeys") Fixes: dd1b913fb0d0 ("RDMA/mlx5: Cache all user cacheable mkeys on dereg MR flow") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7778c02dfa0999a30d6746c79a23dd7140a9c729.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-21RDMA/mlx5: Follow rb_key.ats when creating new mkeysJason Gunthorpe
When a cache ent already exists but doesn't have any mkeys in it the cache will automatically create a new one based on the specification in the ent->rb_key. ent->ats was missed when creating the new key and so ma_translation_mode was not being set even though the ent requires it. Cc: stable@vger.kernel.org Fixes: 73d09b2fe833 ("RDMA/mlx5: Introduce mlx5r_cache_rb_key") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/7c5613458ecb89fbe5606b7aa4c8d990bdea5b9a.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-21RDMA/mlx5: Remove extra unlock on error pathJason Gunthorpe
The below commit lifted the locking out of this function but left this error path unlock behind resulting in unbalanced locking. Remove the missed unlock too. Cc: stable@vger.kernel.org Fixes: 627122280c87 ("RDMA/mlx5: Add work to remove temporary entries from the cache") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/78090c210c750f47219b95248f9f782f34548bb1.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-21RDMA/cache: Release GID table even if leak is detectedLeon Romanovsky
When the table is released, we nullify pointer to GID table, it means that in case GID entry leak is detected, we will leak table too. Delete code that prevents table destruction. Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts") Link: https://lore.kernel.org/r/a62560af06ba82c88ef9194982bfa63d14768ff9.1716900410.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16Merge branch 'mlx5-next' into wip/leon-for-nextLeon Romanovsky
The req_transport_retries_exceeded counter shows the number of times requester detected transport retries exceed error. The req_rnr_retries_exceeded counter show the number of times the requester detected RNR NAKs retries exceed error. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16RDMA/mlx5: Add Qcounters req_transport_retries_exceeded/req_rnr_retries_exceededPatrisious Haddad
The req_transport_retries_exceeded counter shows the number of times requester detected transport retries exceed error. The req_rnr_retries_exceeded counter show the number of times the requester detected RNR NAKs retries exceed error. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Link: https://lore.kernel.org/r/250466af94f4989d638fab168e246035530e912f.1718301543.git.leon@kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16RDMA/mlx5: Set mkeys for dmabuf at PAGE_SIZEChiara Meiohas
Set the mkey for dmabuf at PAGE_SIZE to support any SGL after a move operation. ib_umem_find_best_pgsz returns 0 on error, so it is incorrect to check the returned page_size against PAGE_SIZE Fixes: 90da7dc8206a ("RDMA/mlx5: Support dma-buf based userspace memory region") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/1e2289b9133e89f273a4e68d459057d032cbc2ce.1718301631.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16IB/mlx5: Allocate resources just before first QP/SRQ is createdJianbo Liu
Previously, all IB dev resources are initialized on driver load. As they are not always used, move the initialization to the time when they are needed. To be more specific, move PD (p0) and CQ (c0) initialization to the time when the first SRQ is created. and move SRQs(s0 and s1) initialization to the time first QP is created. To avoid concurrent creations, two new mutexes are also added. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Link: https://lore.kernel.org/r/98c3e53a8cc0bdfeb6dec6e5bb8b037d78ab00d8.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16IB/mlx5: Create UMR QP just before first reg_mr occursJianbo Liu
UMR QP is not used in some cases, so move QP and its CQ creations from driver load flow to the time first reg_mr occurs, that is when MR interfaces are first called. The initialization of dev->umrc.pd and dev->umrc.lock is still done in driver load because pd is needed for mlx5_mkey_cache_init and the lock is reused to protect against the concurrent creation. When testing 4G bytes memory registration latency with rtool [1] and 8 threads in parallel, there is minor performance degradation (<5% for the max latency) is seen for the first reg_mr with this change. Link: https://github.com/paravmellanox/rtool [1] Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Link: https://lore.kernel.org/r/55d3c4f8a542fd974d8a4c5816eccfb318a59b38.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16Delay mlx5_ib internal resources allocationsLeon Romanovsky
From: Leon Romanovsky <leonro@nvidia.com> Internal mlx5_ib resources are created during mlx5_ib module load. This behavior is not optimal because it consumes resources that are not needed when SFs are created. This patch series delays the creation of mlx5_ib internal resources to the stage when they actually used. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16net/mlx5: Reimplement write combining testJianbo Liu
The test of write combining was added before in mlx5_ib driver. It opens UD QP and posts NOP WQEs, and uses BlueFlame doorbell. When BlueFlame is used, WQEs get written directly to a PCI BAR of the device (in addition to memory) so that the device handles them without having to access memory. In this test, the WQEs written in memory are different from the ones written to the BlueFlame which request CQE update. By checking the completion reports posted on CQ, we can know if BlueFlame succeeds or not. The write combining must be supported if BlueFlame succeeds as its register is written using write combining. This patch reimplements the test in the same way, but using a pair of SQ and CQ only. It is moved to mlx5_core as a general feature used by both mlx5_core and mlx5_ib. Besides, save write combine test result of the PCI function, so that its thousands of child functions such as SF can query without paying the time and resource penalty by itself. The test function is called only after failing to get the cached result. With this enhancement, all thousands of SFs of the PF attached to same driver no longer need to perform WC check explicitly, which is already done in the system. This saves several commands per SF, thereby speeds up SF creation and also saves completion EQ creation. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/4ff5a8cc4c5b5b0d98397baa45a5019bcdbf096e.1717409369.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-16Merge branch 'mana-shared' of ↵Leon Romanovsky
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Leon Romanovsky says: ==================== net: mana: Allow variable size indirection table Like we talked, I created new shared branch for this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared * 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net: mana: Allow variable size indirection table ==================== Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-12net: mana: Allow variable size indirection tableShradha Gupta
Allow variable size indirection table allocation in MANA instead of using a constant value MANA_INDIRECT_TABLE_SIZE. The size is now derived from the MANA_QUERY_VPORT_CONFIG and the indirection table is allocated dynamically. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Link: https://lore.kernel.org/r/1718015319-9609-1-git-send-email-shradhagupta@linux.microsoft.com Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/mana_ib: Process QP error events in mana_ibKonstantin Taranov
Process QP fatal events from the error event queue. For that, find the QP, using QPN from the event, and then call its event_handler. To find the QPs, store created RC QPs in an xarray. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717754897-19858-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Wei Hu <weh@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/rxe: Fix responder length checking for UD request packetsHonggang LI
According to the IBA specification: If a UD request packet is detected with an invalid length, the request shall be an invalid request and it shall be silently dropped by the responder. The responder then waits for a new request packet. commit 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length checking") defers responder length check for UD QPs in function `copy_data`. But it introduces a regression issue for UD QPs. When the packet size is too large to fit in the receive buffer. `copy_data` will return error code -EINVAL. Then `send_data_in` will return RESPST_ERR_MALFORMED_WQE. UD QP will transfer into ERROR state. Fixes: 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length checking") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240523094617.141148-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/iwcm: Fix a use-after-free related to destroying CM IDsBart Van Assche
iw_conn_req_handler() associates a new struct rdma_id_private (conn_id) with an existing struct iw_cm_id (cm_id) as follows: conn_id->cm_id.iw = cm_id; cm_id->context = conn_id; cm_id->cm_handler = cma_iw_handler; rdma_destroy_id() frees both the cm_id and the struct rdma_id_private. Make sure that cm_work_handler() does not trigger a use-after-free by only freeing of the struct rdma_id_private after all pending work has finished. Cc: stable@vger.kernel.org Fixes: 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref") Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-6-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/iwcm: Simplify cm_work_handler()Bart Van Assche
Instead of complicating the code to avoid a spin_lock_irqsave() / spin_lock_irqrestore() pair before returning, simplify the code by removing the local variable 'empty'. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-5-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/iwcm: Simplify cm_event_handler()Bart Van Assche
queue_work() can test efficiently whether or not work is pending. Hence, simplify cm_event_handler() by always calling queue_work() instead of only if the list with pending work is empty. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-4-bvanassche@acm.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/iwcm: Change the return type of iwcm_deref_id()Bart Van Assche
Since iwcm_deref_id() returns either 0 or 1, change its return type from 'int' into 'bool'. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-3-bvanassche@acm.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-09RDMA/iwcm: Use list_first_entry() where appropriateBart Van Assche
Improve source code readability by using list_first_entry() where appropriate. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240605145117.397751-2-bvanassche@acm.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-02RDMA/mana_ib: extend query deviceKonstantin Taranov
Fill in properties of the ib device. Order the assignment in the order of fields in the struct ib_device_attr. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717070117-1234-3-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-06-02RDMA/mana_ib: set node_guidKonstantin Taranov
Use the mac address for the node_guid of the IB device. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1717070117-1234-2-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-05-30RDMA/rxe: Fix data copy for IB_SEND_INLINEHonggang LI
For RDMA Send and Write with IB_SEND_INLINE, the memory buffers specified in sge list will be placed inline in the Send Request. The data should be copied by CPU from the virtual addresses of corresponding sge list DMA addresses. Cc: stable@kernel.org Fixes: 8d7c7c0eeb74 ("RDMA: Add ib_virt_dma_to_page()") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240516095052.542767-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-05-30RDMA/efa: Properly handle unexpected AQ completionsMichael Margolin
Do not try to handle admin command completion if it has an unexpected command id and print a relevant error message. Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yehuda Yitschak <yehuday@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20240513064630.6247-1-mrgolin@amazon.com Reviewed-by: Gal Pressman <gal.pressman@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-05-30RDMA/mana_ib: Modify QP stateKonstantin Taranov
Implement modify QP state for RC QPs. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://lore.kernel.org/r/1716366242-558-4-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>