author     Caleb Sander Mateos <csander@purestorage.com>   2025-05-12 15:50:40 +0200
committer  Christoph Hellwig <hch@lst.de>                  2025-05-20 05:34:27 +0200
commit     d977506f8863807129d7a11f4057dfb1b38085ea (patch)
tree       bc960debf3a808f9ddf530ad8a292cd29675732f
parent     b9d1ec530cdb631a3298de97c7b2ccc6145e1798 (diff)
nvme-pci: make PRP list DMA pools per-NUMA-node
NVMe commands with over 8 KB of discontiguous data allocate PRP list
pages from one of the per-nvme_device dma_pools, prp_page_pool or
prp_small_pool.
Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool
spinlock. These device-global spinlocks are a significant source of
contention when many CPUs are submitting to the same NVMe devices. On a
workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2
NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in
_raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.
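For reference, a minimal sketch of the pre-patch layout (struct and
function names below are illustrative; the pool names and sizes mirror
the PAGE_SIZE "prp list page" and 256-byte "prp list 256" pools in
nvme-pci, but may differ in detail). Both pools are created once per
device, so every queue on every CPU funnels PRP list allocations through
the same two pool spinlocks:

    #include <linux/dmapool.h>
    #include <linux/device.h>

    /* Illustrative stand-in for the two per-device pools in struct nvme_dev. */
    struct prp_pools_sketch {
            struct dma_pool *prp_page_pool;   /* full-page PRP lists */
            struct dma_pool *prp_small_pool;  /* small (256-byte) PRP lists */
    };

    static int sketch_setup_prp_pools(struct prp_pools_sketch *p,
                                      struct device *dmadev)
    {
            /*
             * One pool pair for the whole device: every dma_pool_alloc()
             * and dma_pool_free() from any CPU takes that pool's spinlock.
             */
            p->prp_page_pool = dma_pool_create("prp list page", dmadev,
                                               PAGE_SIZE, PAGE_SIZE, 0);
            if (!p->prp_page_pool)
                    return -ENOMEM;

            p->prp_small_pool = dma_pool_create("prp list 256", dmadev,
                                                256, 256, 0);
            if (!p->prp_small_pool) {
                    dma_pool_destroy(p->prp_page_pool);
                    return -ENOMEM;
            }
            return 0;
    }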
Ideally, the dma_pools would be per-hctx to minimize contention. But
that could impose considerable resource costs in a system with many NVMe
devices and CPUs.
As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each
nvme_queue to the set of DMA pools corresponding to its device and its
hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by
about half, to 1.2%. Preventing the sharing of PRP list pages across
NUMA nodes also makes them cheaper to initialize.
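A rough sketch of the per-NUMA-node arrangement (names such as
prp_node_pools and sketch_queue_pools are illustrative, not necessarily
what the patch uses): keep one pool pair per possible node, and have
each nvme_queue look up the pair matching its hctx's node:

    #include <linux/dmapool.h>
    #include <linux/numa.h>
    #include <linux/topology.h>

    struct prp_node_pools {
            struct dma_pool *page_pool;
            struct dma_pool *small_pool;
    };

    /*
     * One entry per NUMA node, e.g. allocated with
     * kcalloc(nr_node_ids, sizeof(*prp_node_pools_array), GFP_KERNEL).
     */
    static struct prp_node_pools *prp_node_pools_array;

    /*
     * Map a queue to the pool pair for its hctx's NUMA node, so PRP list
     * pages are only shared between CPUs on the same node.
     */
    static struct prp_node_pools *sketch_queue_pools(int hctx_node)
    {
            /* Fallback for queues with no node affinity (illustrative choice). */
            int node = (hctx_node == NUMA_NO_NODE) ? numa_node_id() : hctx_node;

            return &prp_node_pools_array[node];
    }

In the I/O path each queue then calls dma_pool_alloc()/dma_pool_free()
on its node-local pair, so spinlock contention is limited to CPUs on the
same node as the queue.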
Link: https://lore.kernel.org/linux-nvme/CADUfDZqa=OOTtTTznXRDmBQo1WrFcDw1hBA7XwM7hzJ-hpckcA@mail.gmail.com/T/#u
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>