path: root/drivers/infiniband/sw
Age    Commit message    Author
2023-04-17RDMA/rxe: Move code to check if drained to subroutineBob Pearson
Move two blocks of code in rxe_comp.c and rxe_req.c to subroutines that check if draining is complete in the SQD state and, if so, generate a SQ_DRAINED event. Link: https://lore.kernel.org/r/20230405042611.6467-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
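A minimal sketch of what such a drain-check helper could look like (the helper name and the emptiness test are illustrative, not necessarily the code in the patch):

  static void rxe_check_sq_drained(struct rxe_qp *qp)    /* name hypothetical */
  {
          struct ib_event ev;

          if (qp->attr.qp_state != IB_QPS_SQD || !qp->attr.sq_draining)
                  return;
          if (!rxe_sq_empty(qp))          /* hypothetical emptiness test */
                  return;

          /* the SQ is drained: clear the flag and raise SQ_DRAINED once */
          qp->attr.sq_draining = 0;
          if (qp->ibqp.event_handler) {
                  ev.device = qp->ibqp.device;
                  ev.element.qp = &qp->ibqp;
                  ev.event = IB_EVENT_SQ_DRAINED;
                  qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
          }
  }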
2023-04-17RDMA/rxe: Remove qp->req.stateBob Pearson
The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->req.state by qp->attr.qp_state and enum rxe_qp_state. This is the third of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17RDMA/rxe: Remove qp->comp.stateBob Pearson
The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->comp.state by qp->attr.qp_state. This is the second of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17RDMA/rxe: Remove qp->resp.stateBob Pearson
The rxe driver has four different QP state variables, qp->attr.qp_state, qp->req.state, qp->comp.state, and qp->resp.state. All of these basically carry the same information. This patch replaces uses of qp->resp.state by qp->attr.qp_state. This is the first of three patches which will remove all but the qp->attr.qp_state variable. This will bring the driver closer to the IBA description. Link: https://lore.kernel.org/r/20230405042611.6467-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-16RDMA: Add ib_virt_dma_to_page()Jason Gunthorpe
Make it clearer what is going on by adding a function to go back from the "virtual" dma_addr to a kva and another to a struct page. This is used in the ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib). Call them instead of open-coded casts and virt_to_page() when working with dma_addr values encoded by the various ib_map functions. This also fixes the virt_to_page() casting problem Linus Walleij has been chasing. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
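The helpers amount to undoing the encoding used by ib_uses_virt_dma() drivers, where the u64 "dma address" is really a kernel virtual address. A sketch of the idea (close to, but not guaranteed to be, the exact helpers added):

  static inline void *ib_virt_dma_to_ptr(u64 dma_addr)
  {
          /* only valid when ib_uses_virt_dma(device) is true */
          return (void *)(uintptr_t)dma_addr;
  }

  static inline struct page *ib_virt_dma_to_page(u64 dma_addr)
  {
          return virt_to_page(ib_virt_dma_to_ptr(dma_addr));
  }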
2023-04-16RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"Zhu Yanjun
In the function rxe_create_qp(), rxe_qp_from_init() is called to initialize the qp; internally, things like rxe_init_task are not set up until rxe_qp_init_req(). If an error occurs before that point, the unwind path calls rxe_cleanup() and eventually rxe_qp_do_cleanup()/rxe_cleanup_task(), which will oops when trying to access the uninitialized spinlock. If rxe_init_task has not been executed, rxe_cleanup_task() must not be called. Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8 Fixes: 8700e3e7c485 ("Soft RoCE driver") Fixes: 2d4b21e0a291 ("IB/rxe: Prevent from completer to operate on non valid QP") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
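The shape of the fix is to make the cleanup path conditional on the task having been initialized; a hedged sketch (the "was initialized" test is illustrative):

  /* in the qp cleanup path: only tear down tasks that were set up */
  if (qp->req.task.func)          /* illustrative "was initialized" test */
          rxe_cleanup_task(&qp->req.task);
  if (qp->comp.task.func)
          rxe_cleanup_task(&qp->comp.task);
  if (qp->resp.task.func)
          rxe_cleanup_task(&qp->resp.task);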
2023-04-12RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.cBob Pearson
In a previous patch TASKLET_STATE_SCHED was used as a mask but it is a bit position instead. Add the missing shift. Link: https://lore.kernel.org/r/20230329193308.7489-1-rpearsonhpe@gmail.com Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/linux-rdma/8a054b78-6d50-4bc6-8d8a-83f85fbdb82f@kili.mountain/ Fixes: d94671632572 ("RDMA/rxe: Rewrite rxe_task.c") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
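TASKLET_STATE_SCHED is a bit number (bit 0), not a mask, so ANDing it directly against the state word can never fire; a sketch of the before/after (field path follows the tasklet embedded in struct rxe_task, exact context illustrative):

  /* broken: TASKLET_STATE_SCHED is a bit position, so this is always false */
  if (task->tasklet.state & TASKLET_STATE_SCHED)
          /* ... */;

  /* fixed: turn the bit position into a mask first */
  if (task->tasklet.state & BIT(TASKLET_STATE_SCHED))
          /* ... */;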
2023-04-03RDMA/siw: Remove namespace check from siw_netdev_event()Tetsuo Handa
syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy siw_device created after unshare(CLONE_NEWNET) due to net namespace check. It seems that this check was by error there and should be removed. Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Leon Romanovsky <leon@kernel.org> Fixes: bdcf26bf9b3a ("rdma/siw: network and RDMA core interface") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-30RDMA/rxe: Clean kzalloc failure pathsLeon Romanovsky
There is no need to print any debug messages after a failure to allocate memory, because the kernel will print OOM dumps anyway. Together with the removal of these messages, remove the useless goto jumps. Fixes: 5bf944f24129 ("RDMA/rxe: Add error messages") Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/all/ea43486f-43dd-4054-b1d5-3a0d202be621@kili.mountain Link: https://lore.kernel.org/r/d3cedf723b84e73e8062a67b7489d33802bafba2.1680113597.git.leon@kernel.org Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-03-29RDMA/rxe: Remove tasklet call from rxe_cq.cBob Pearson
Remove the tasklet call in rxe_cq.c and also the is_dying flag in the cq struct. There is no reason for the rxe driver to defer the call to the cq completion handler by scheduling a tasklet: rxe_cq_post() is not called in a hard irq context. The rxe driver is currently incorrect because the tasklet is scheduled without holding a reference on the cq, so the underlying memory can be freed before the deferred routine runs. Executing the comp_handler inline fixes this problem. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
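A hedged sketch of calling the completion handler directly from rxe_cq_post() under the cq lock instead of bouncing through a tasklet (flag handling is illustrative of the idea, not verbatim):

  /* after queuing the cqe, notify inline; no deferral, no extra cq ref needed */
  if ((cq->notify & IB_CQ_NEXT_COMP) ||
      (cq->notify & IB_CQ_SOLICITED && solicited)) {
          cq->notify = 0;
          cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
  }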
2023-03-24RDMA/rxe: Rewrite rxe_task.cBob Pearson
This patch is a major rewrite of the tasklet routines in rxe_task.c. The main motivation for this is the realization that the code does not correctly reference count the qp pointer and so violates its safety. When a tasklet is scheduled from a verbs API the calling thread has a valid reference to the qp and schedules the tasklet to run at a later time carrying a pointer to the qp. Once the calling code returns, however, the qp can be destroyed at any time. To correct this, a reference to the qp must be taken when the task is scheduled and held until it finishes running. This is complicated by the tasklet library not always running a task that is scheduled, depending on whether someone else has scheduled it. This patch moves the logic for deciding whether to run or schedule a task outside of do_task() and guarantees that there is only one copy of the task scheduled or running at a time. Secondly, the separate flags controlling teardown and draining of the task are folded into the task state machine, and all references to the state are protected by spinlocks to avoid consistency and memory barrier issues. Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
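The ownership rule the rewrite enforces can be sketched as follows (everything beyond rxe_get()/rxe_put() is illustrative):

  void rxe_sched_task(struct rxe_task *task)
  {
          /* whatever ends up scheduled or running owns a qp reference ... */
          rxe_get(task->qp);

          /* ... unless someone else already holds the "scheduled" slot,
           * in which case this attempt contributes nothing to run */
          if (!rxe_task_claim(task))      /* illustrative state-machine test */
                  rxe_put(task->qp);
  }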
2023-03-24RDMA/rxe: Make tasks schedule each otherBob Pearson
Replace rxe_run_task() by rxe_sched_task() when tasks call each other. These are not performance critical and mainly involve error paths but they run the risk of causing deadlocks. Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Remove __rxe_do_task()Bob Pearson
The subroutine __rxe_do_task is not thread safe and has no way to guarantee that the tasks, which are designed with the assumption that they are non-reentrant, are not reentered. All of its uses are non-performance critical. This patch replaces calls to __rxe_do_task with calls to rxe_sched_task. It also removes irrelevant or unneeded if tests. Instead of calling the task machinery, a single call to the tasklet function (rxe_requester, etc.) is sufficient to drain the queues if task execution has been disabled or stopped. Together these changes allow the removal of __rxe_do_task. Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Remove qp reference counting in tasksBob Pearson
Currently each of the three tasklets requester, completer and responder in the rxe driver take and release a reference to the qp argument at the beginning and end of the subroutines. The caller passing in the qp argument should be responsible for holding a reference to the qp, so these are not required. Furthermore, doing so breaks the qp cleanup code in rxe_qp_do_cleanup, which calls these routines after all the references have been dropped, so they cannot drain the packet and work request queues as intended. In fact, if these routines are deferred by calling tasklet_schedule there is no guarantee that the calling code still holds a qp reference. That is a bug in rxe_task.c which will be fixed later in this series. Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Cleanup error state handling in rxe_comp.cBob Pearson
Cleanup the handling of qp in the error state, reset state and during rxe_qp_do_cleanup. Make it the same as in rxe_resp.c. Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Cleanup reset state handling in rxe_resp.cBob Pearson
Cleanup the handling of qp in the error state, reset state and during rxe_qp_do_cleanup. The error state does about the same thing as the others but has code spread all over. This patch combines them in a cleaner way. Link: https://lore.kernel.org/r/20230304174533.11296-4-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Convert tasklet args to queue pairsBob Pearson
Originally it was thought that the tasklet machinery in rxe_task.c would be used in other applications, but that has not happened for years. This patch replaces the 'void *arg' by 'struct rxe_qp *qp' in the parameters to the tasklet calls. This change will have no effect on performance but may make the code a little clearer. Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
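The change is purely a signature cleanup, roughly:

  /* before: generic tasklet-style argument */
  int rxe_requester(void *arg);
  int rxe_completer(void *arg);
  int rxe_responder(void *arg);

  /* after: the only caller ever passes a qp, so say so in the type */
  int rxe_requester(struct rxe_qp *qp);
  int rxe_completer(struct rxe_qp *qp);
  int rxe_responder(struct rxe_qp *qp);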
2023-03-24RDMA/rxe: Add error messagesBob Pearson
This patch adds error and debug messages so that every interaction with rdma-core through a verbs API call or a completion error return will generate at least one error message backed up by debug messages with more detail. With dynamic debugging one can follow up after seeing an error message by turning on the appropriate debug messages. Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Extend dbg log messages to err and infoBob Pearson
Extend the dbg log messages (e.g. rxe_dbg_xxx) to include err and info types. rxe.c is modified to use these new log messages as examples. Link: https://lore.kernel.org/r/20230303221623.8053-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24RDMA/rxe: Change rxe_dbg to rxe_dbg_devBob Pearson
Replace the name rxe_dbg with rxe_dbg_dev which better matches the remaining rxe_dbg_xxx macros for debug messages with a rxe device parameter. Reuse the name rxe_dbg for debug messages which do not have a rxe device parameter. Link: https://lore.kernel.org/r/20230303221623.8053-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
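A sketch of the resulting naming split (macro bodies are illustrative, not necessarily verbatim):

  /* no device pointer available: plain pr_debug */
  #define rxe_dbg(fmt, ...)                                   \
          pr_debug("%s: " fmt, __func__, ##__VA_ARGS__)

  /* rxe device parameter available: route through ibdev_dbg */
  #define rxe_dbg_dev(rxe, fmt, ...)                          \
          ibdev_dbg(&(rxe)->ib_dev, "%s: " fmt, __func__, ##__VA_ARGS__)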
2023-03-24RDMA/rxe: Replace exists by rxe in rxe.cBob Pearson
'exists' looks like a boolean. This patch replaces it by the normal name used for the rxe device, 'rxe', which should be a little less confusing. The second rxe_dbg() message is incorrect since rxe is known to be NULL, and this would cause a seg fault if the message were ever sent. Replace it by pr_debug for the moment. Fixes: c6aba5ea0055 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c") Link: https://lore.kernel.org/r/20230303221623.8053-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-14RDMA/rdmavt: Delete unnecessary NULL checkNatalia Petrova
There is no need to check 'rdi->qp_dev' for NULL. The field 'qp_dev' is created in rvt_register_device(), which will fail if the 'qp_dev' allocation fails in rvt_driver_qp_init(). Otherwise this pointer does not change and is passed to rvt_qp_exit() in the next step. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table") Signed-off-by: Natalia Petrova <n.petrova@fintech.ru> Link: https://lore.kernel.org/r/20230303124408.16685-1-n.petrova@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-13IB/rdmavt: Fix target union member for rvt_post_one_wr()Kees Cook
The "cplen" result used by the memcpy() into struct rvt_swqe "wqe" may be sized to 80 for struct rvt_ud_wr (which is member "ud_wr", not "wr" which is only 40 bytes in size). Change the destination union member so the compiler can use the correct bounds check. struct rvt_swqe { union { struct ib_send_wr wr; /* don't use wr.sg_list */ struct rvt_ud_wr ud_wr; ... }; ... }; Silences false positive memcpy() run-time warning: memcpy: detected field-spanning write (size 80) of single field "&wqe->wr" at drivers/infiniband/sw/rdmavt/qp.c:2043 (size 40) Reported-by: Zhang Yi <yizhan@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216561 Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230218185701.never.779-kees@kernel.org Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-13RDMA/siw: Fix potential page_array out of range accessDaniil Dulov
When seg is equal to MAX_ARRAY, the loop should break; otherwise it will result in an out-of-range access. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b9be6f18cf9e ("rdma/siw: transmit path") Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru> Link: https://lore.kernel.org/r/20230227091751.589612-1-d.dulov@aladdin.ru Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-16RDMA/rxe: Fix missing memory barriers in rxe_queue.hBob Pearson
An earlier patch which introduced smp_load_acquire/smp_store_release into rxe_queue.h incorrectly assumed that the surrounding spin-locks in rxe_verbs.c around queue updates for kernel ulps were sufficient to protect the passing of data through the queues between the ulp and the rxe tasklets. But this was incorrect. The typical sequence was

  ulp                              rxe requester tasklet
  ------------------------         ---------------------
  spin_lock_irqsave()              wqe = queue_head(queue)
  if (!queue_full(q)) {            if (!wqe)
          spin_unlock_irqrestore           return;
          return -ENOMEM
  }                                <process wqe>
  wqe = queue_producer_addr(q)
  <fill in wqe>                    queue_advance_consumer(queue)
  queue_advance_producer(q)
  spin_unlock_irqrestore()

queue_head() calls queue_empty() which calls smp_load_acquire(). For user space apps queue_advance_producer() calls smp_store_release() so that there is a memory barrier between the producer and the consumer, but for kernel ulps queue_advance_producer() just incremented the producer index because the lock function is a release function. For this to work, however, the barrier has to come between filling in the wqe and updating the producer index. This patch adds the missing barriers. It also changes the enum names for the ulp queue types to QUEUE_TYPE_FROM/TO_ULP instead of QUEUE_TYPE_TO/FROM_DRIVER, which was very ambiguous. This bug is suspected as the cause of very rare lockups in a very high scale storage application. It is a bug in any case and should be corrected. Fixes: 0a67c46d2e99 ("RDMA/rxe: Protect user space index loads/stores") Link: https://lore.kernel.org/r/20230214071053.5395-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
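For the kernel-ulp queue types the producer update now needs an explicit release so the filled-in wqe is visible before the index moves; a hedged sketch against the rxe_queue layout (local variable names illustrative):

  /* producer side (ulp, under the verbs spinlock) */
  memcpy(prod_addr, wqe, wqe_size);                       /* fill in wqe */
  smp_store_release(&q->buf->producer_index,
                    (prod + 1) & q->index_mask);          /* then publish it */

  /* the consumer side (tasklet) pairs with this via the smp_load_acquire()
   * done when queue_empty()/queue_head() read the producer index */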
2023-02-16RDMA/rxe: Remove rxe_alloc()Bob Pearson
Currently all the object types in the rxe driver are allocated in rdma-core except for MRs. By moving the kzalloc() call outside of the pool code, the rxe_alloc() subroutine can be eliminated and the code checking for MR as a special case can be removed. This patch moves the kzalloc() and kfree_rcu() calls into the mr registration and destruction verbs. It removes that code from rxe_pool.c, including the rxe_alloc() subroutine which is no longer used. Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16Subject: RDMA/rxe: Handle zero length rdmaBob Pearson
Currently the rxe driver does not handle all cases of zero length rdma operations correctly. The client does not have to provide an rkey for zero length RDMA read or write operations, so the rkey provided may be invalid and should not be used to look up an mr. This patch corrects the driver to ignore the provided rkey if the reth length is zero for read or write operations and makes sure to set the mr to NULL. In read_reply(), if the length is zero, rxe_recheck_mr() is not called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs when the length is non-zero. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
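A hedged sketch of the responder-side rule (labels and return codes follow rxe_resp.c conventions but are not guaranteed verbatim):

  if (reth_len(pkt) == 0) {
          /* zero-length READ/WRITE: the rkey may be garbage, never look it up */
          qp->resp.mr = NULL;
  } else {
          mr = lookup_mr(qp->pd, access, rkey, RXE_LOOKUP_REMOTE);
          if (!mr)
                  return RESPST_ERR_RKEY_VIOLATION;
          /* ... normal range and permission checks ... */
  }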
2023-02-06RDMA/siw: Fix user page pinning accountingBernard Metzler
To avoid racing with other user memory reservations, immediately account full amount of pages to be pinned. Fixes: 2251334dcac9 ("rdma/siw: application buffer management") Reported-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20230202101000.402990-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e47 ("ice: move devlink port creation/deletion")
  643ef23bd9dd ("ice: Introduce local var for readability")
  https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
  https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
  https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
  https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
  https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarrayBob Pearson
Replace struct rxe_phys_buf and struct rxe_map by struct xarray in rxe_verbs.h. This allows using rcu locking on reads for the memory maps stored in each mr. This is based on a sketch of a patch from Jason Gunthorpe in the link below. Some changes were needed to make this work. It applies cleanly to the current for-next and passes the pyverbs, perftest and the same blktests test cases which run today. Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26RDMA/rxe: Cleanup page variables in rxe_mr.cBob Pearson
Cleanup usage of mr->page_shift and mr->page_mask and introduce an extractor for mr->ibmr.page_size. Normal usage in the kernel has page_mask masking out offset in page rather than masking out the page number. The rxe driver had reversed that which was confusing. Implicitly there can be a per mr page_size which was not uniformly supported. Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
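The extractor and the fixed mask convention can be sketched as follows (a sketch, not necessarily the exact code in the patch):

  /* per-MR page size comes from the ib core field rather than a cached shift */
  static inline unsigned int mr_page_size(struct rxe_mr *mr)
  {
          return mr ? mr->ibmr.page_size : PAGE_SIZE;
  }

  /* kernel convention: the mask strips the page number, keeping the in-page offset */
  page_offset = iova & (mr_page_size(mr) - 1);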
2023-01-26RDMA-rxe: Isolate mr code from atomic_write_reply()Bob Pearson
Isolate mr specific code from atomic_write_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_write() in rxe_mr.c. Check length for atomic write operation. Make iova_to_vaddr() static. Link: https://lore.kernel.org/r/20230119235936.19728-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26RDMA-rxe: Isolate mr code from atomic_reply()Bob Pearson
Isolate mr specific code from atomic_reply() in rxe_resp.c into a subroutine rxe_mr_do_atomic_op() in rxe_mr.c. Minor cleanups to rxe_check_range() and iova_to_vaddr(). Move enum resp_state to rxe.h Link: https://lore.kernel.org/r/20230119235936.19728-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.cBob Pearson
Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense. Link: https://lore.kernel.org/r/20230119235936.19728-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26RDMA/rxe: Cleanup mr_check_rangeBob Pearson
Remove blank lines and replace EFAULT by EINVAL when an invalid mr type is used. Link: https://lore.kernel.org/r/20230119235936.19728-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-23net/sock: Introduce trace_sk_data_ready()Peilin Ye
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example:

  <...>
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  <...>

Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
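Usage in a driver like siw is a one-line addition at the top of its ->sk_data_ready() implementation; a sketch (the callback name matches siw_cm.c, the body is elided):

  static void siw_cm_llp_data_ready(struct sock *sk)
  {
          trace_sk_data_ready(sk);        /* new tracepoint */

          /* ... existing body unchanged ... */
  }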
2023-01-09RDMA/rxe: Prevent faulty rkey generationDaisuke Matsuda
If you create MRs more than 0x10000 times after loading the module, the responder starts replying with NAKs for RDMA/Atomic operations because of the rkey violation detected in check_rkey(). The root cause is that rkeys are incremented each time a new MR is created and the value overflows into the range reserved for MWs. This commit also increases the value of RXE_MAX_MW, which has been limited unlike other parameters. Fixes: 0994a1bcd5f7 ("RDMA/rxe: Bump up default maximum values used via uverbs") Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
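The point is that MR and MW keys are carved out of one index space, so an overflowing MR counter walks into the MW range. An illustrative (not verbatim) layout of the reserved ranges; see rxe_param.h for the real values:

  #define RXE_MIN_MR_INDEX   0x00000001
  #define RXE_MAX_MR_INDEX   0x00010000   /* 0x10000 MRs, then ...        */
  #define RXE_MIN_MW_INDEX   0x00010001   /* ... the next key lands here  */
  #define RXE_MAX_MW_INDEX   0x00020000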
2023-01-09RDMA/rxe: Fix inaccurate constants in rxe_type_infoDaisuke Matsuda
ibv_query_device() has reported incorrect device attributes, which are actually not used by the device. Make the constants correspond with the attributes shown to users. Fixes: 3ccffe8abf2f ("RDMA/rxe: Move max_elem into rxe_type_info") Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays") Link: https://lore.kernel.org/r/20221220080848.253785-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Fix two build warnings on 32 bit platforms It seems the linux-next CI and 0-day bot are not testing enough 32 bit configurations, as soon as you merged the rdma pull request there were two instant reports of warnings on these sytems that I would have thought should have been covered by time in linux-next Anyhow, here are the fixes so people don't hit problems with -Werror" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Fix pointer cast warning RDMA/rxe: Fix compile warnings on 32-bit
2022-12-16RDMA/siw: Fix pointer cast warningArnd Bergmann
The previous build fix left a remaining issue in configurations with 64-bit dma_addr_t on 32-bit architectures:

  drivers/infiniband/sw/siw/siw_qp_tx.c: In function 'siw_get_pblpage':
  drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
     32 |         return virt_to_page((void *)paddr);
        |                             ^

Use the same double cast here that the driver uses elsewhere to convert between dma_addr_t and void*. Fixes: 0d1b756acf60 ("RDMA/siw: Pass a pointer to virt_to_page()") Link: https://lore.kernel.org/r/20221215170347.2612403-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
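The resulting line in siw_get_pblpage() is roughly as follows (the siw "dma address" is a kernel virtual address, so the trip through uintptr_t is lossless on these configurations):

  /* double cast silences -Wint-to-pointer-cast for 64-bit dma_addr_t on 32-bit */
  return virt_to_page((void *)(uintptr_t)paddr);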
2022-12-15RDMA/rxe: Fix compile warnings on 32-bitJason Gunthorpe
Move the conditional code into a function, with two variants, so it is harder to make these kinds of mistakes.

  drivers/infiniband/sw/rxe/rxe_resp.c: In function 'atomic_write_reply':
  drivers/infiniband/sw/rxe/rxe_resp.c:794:13: error: unused variable 'payload' [-Werror=unused-variable]
    794 |         int payload = payload_size(pkt);
        |             ^~~~~~~
  drivers/infiniband/sw/rxe/rxe_resp.c:793:24: error: unused variable 'mr' [-Werror=unused-variable]
    793 |         struct rxe_mr *mr = qp->resp.mr;
        |                        ^~
  drivers/infiniband/sw/rxe/rxe_resp.c:791:19: error: unused variable 'dst' [-Werror=unused-variable]
    791 |         u64 src, *dst;
        |                   ^~~
  drivers/infiniband/sw/rxe/rxe_resp.c:791:13: error: unused variable 'src' [-Werror=unused-variable]
    791 |         u64 src, *dst;

Fixes: 034e285f8b99 ("RDMA/rxe: Make responder support atomic write on RC service") Link: https://lore.kernel.org/linux-rdma/Y5s+EVE7eLWQqOwv@nvidia.com/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
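A hedged sketch of the pattern: give the helper a real body and a stub variant so that, on !CONFIG_64BIT, callers are left with no dangling locals (function name follows the responder, return codes are illustrative):

  #ifdef CONFIG_64BIT
  static enum resp_states atomic_write_reply(struct rxe_qp *qp,
                                             struct rxe_pkt_info *pkt)
  {
          /* real implementation: 8-byte payload written atomically */
          return RESPST_ACKNOWLEDGE;              /* illustrative */
  }
  #else
  static enum resp_states atomic_write_reply(struct rxe_qp *qp,
                                             struct rxe_pkt_info *pkt)
  {
          return RESPST_ERR_UNSUPPORTED_OPCODE;   /* illustrative */
  }
  #endif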
2022-12-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Usual size of updates, a new driver, and most of the bulk focusing on rxe: - Usual typos, style, and language updates - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp - Lots of RXE updates: * Improve reply error handling for bad MR operations * Code tidying * Debug printing uses common loggers * Remove half implemented RD related stuff * Support IBA's recently defined Atomic Write and Flush operations - erdma support for atomic operations - New driver 'mana' for Ethernet HW available in Azure VMs. This driver only supports DPDK" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits) IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces RDMA: Add missed netdev_put() for the netdevice_tracker RDMA/rxe: Enable RDMA FLUSH capability for rxe device RDMA/cm: Make QP FLUSHABLE for supported device RDMA/rxe: Implement flush completion RDMA/rxe: Implement flush execution in responder side RDMA/rxe: Implement RC RDMA FLUSH service in requester side RDMA/rxe: Extend rxe packet format to support flush RDMA/rxe: Allow registering persistent flag for pmem MR only RDMA/rxe: Extend rxe user ABI to support flush RDMA: Extend RDMA kernel verbs ABI to support flush RDMA: Extend RDMA user ABI to support flush RDMA/rxe: Fix incorrect responder length checking RDMA/rxe: Fix oops with zero length reads RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define RDMA/hns: Fix XRC caps on HIP08 RDMA/hns: Fix error code of CMD RDMA/hns: Fix page size cap from firmware RDMA/hns: Fix PBL page MTR find RDMA/hns: Fix AH attr queried by query_qp ...
2022-12-09RDMA/rxe: Enable RDMA FLUSH capability for rxe deviceLi Zhijian
Now we are ready to enable RDMA FLUSH capability for RXE. It can support Global Visibility and Persistence placement types. Link: https://lore.kernel.org/r/20221206130201.30986-11-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Implement flush completionLi Zhijian
Per the IBA spec, FLUSH is acknowledged with an RDMA read response of 0 length. Use the IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace it is a FLUSH completion. Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Implement flush execution in responder sideLi Zhijian
Only the requested placement types that are also registered in the destination memory region are acceptable. Otherwise, the responder will reply with a NAK "Remote Access Error" if it finds a placement type violation. We persist data via arch_wb_cache_pmem(), which could be architecture specific. This commit also adds 2 helpers to update qp.resp from the incoming packet. Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
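A hedged sketch of the persistence step in the responder (feth_plt() is the FETH placement-type accessor added earlier in the series; exact flag names and surrounding checks may differ):

  if (feth_plt(pkt) & IB_FLUSH_PERSISTENT)
          arch_wb_cache_pmem(vaddr, length);   /* write back CPU caches to pmem */
  /* IB_FLUSH_GLOBAL visibility needs no extra work for ordinary system memory */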
2022-12-09RDMA/rxe: Implement RC RDMA FLUSH service in requester sideLi Zhijian
Implement FLUSH request operation in the requester. Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Extend rxe packet format to support flushLi Zhijian
Extend rxe opcode tables, headers, helper and constants to support flush operations. Refer to the IBA A19.4.1 for more FETH definition details Link: https://lore.kernel.org/r/20221206130201.30986-6-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09RDMA/rxe: Allow registering persistent flag for pmem MR onlyLi Zhijian
A memory region can support at most 2 flush access flags: IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL. But we only allow users to register the persistent flush flag on a pmem MR, which has the ability to persist data across power cycles. So registering a persistent access flag on a non-pmem MR will be rejected. Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com CC: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
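The registration-time check reduces to something like the following (the pmem test itself is the interesting part and is elided; mr_is_pmem is a hypothetical result of scanning the MR's pages):

  /* reject persistent flush on MRs not backed by persistent memory */
  if ((access & IB_ACCESS_FLUSH_PERSISTENT) && !mr_is_pmem)
          return -EINVAL;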
2022-12-09RDMA/rxe: Fix incorrect responder length checkingBob Pearson
The code in rxe_resp.c at check_length() is incorrect, as it compares pkt->opcode, an 8 bit value, against various mask bits which are all higher than 256, so nothing is ever reported. This patch rewrites this to compare against pkt->mask, which is correct. However, this now exposes another error: for UD send packets the value of the pmtu cannot be determined from qp->mtu. All that is required here is to later check if the payload fits into the posted receive buffer in that case. Fixes: 837a55847ead ("RDMA/rxe: Implement packet length validation on responder") Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
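A hedged sketch of the corrected test in check_length() (the specific mask used is illustrative):

  /* pkt->opcode is the 8-bit wire opcode; the RXE_*_MASK bits live in pkt->mask */
  if (pkt->mask & RXE_PAYLOAD_MASK) {     /* was: pkt->opcode & RXE_..._MASK */
          /* validate the payload length against pmtu / the posted receive buffer */
  }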