diff options
author | Jacob Moroni <jmoroni@google.com> | 2025-02-20 17:56:12 +0000 |
---|---|---|
committer | Jason Gunthorpe <jgg@nvidia.com> | 2025-04-07 15:38:09 -0300 |
commit | 4dab26bed543584577b64b36aadb8b5b165bf44f (patch) | |
tree | c3ed6cd0ec864f6fb3a9a0b1d3c669da894caba4 /scripts/gdb/linux/tasks.py | |
parent | 3aadd652c2c816a908abe55079e2590e3259e8ba (diff) |
IB/cm: use rwlock for MAD agent lock
In workloads where there are many processes establishing connections using
RDMA CM in parallel (large scale MPI), there can be heavy contention for
mad_agent_lock in cm_alloc_msg.
This contention can occur while inside of a spin_lock_irq region, leading
to interrupts being disabled for extended durations on many
cores. Furthermore, it leads to the serialization of rdma_create_ah calls,
which has negative performance impacts for NICs which are capable of
processing multiple address handle creations in parallel.
The end result is the machine becoming unresponsive, hung task warnings,
netdev TX timeouts, etc.
Since the lock appears to be only for protection from cm_remove_one, it
can be changed to a rwlock to resolve these issues.
Reproducer:
Server:
for i in $(seq 1 512); do
ucmatose -c 32 -p $((i + 5000)) &
done
Client:
for i in $(seq 1 512); do
ucmatose -c 32 -p $((i + 5000)) -s 10.2.0.52 &
done
Fixes: 76039ac9095f ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock")
Link: https://patch.msgid.link/r/20250220175612.2763122-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Diffstat (limited to 'scripts/gdb/linux/tasks.py')
0 files changed, 0 insertions, 0 deletions