path: root/io_uring/rsrc.h
Age | Commit message | Author
2025-03-10 | io_uring: rely on io_prep_reg_vec for iovec placement | Pavel Begunkov
All vectored reg buffer users should use io_import_reg_vec() for iovec imports. Since iovec placement is the function's responsibility and callers shouldn't know much about it, drop the offset parameter from io_prep_reg_vec() and calculate it inside. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/08ed87ca4bbc06724373b6ce06f36b703fe60c4e.1741457480.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10 | io_uring: introduce io_prep_reg_iovec() | Pavel Begunkov
iovecs that are turned into registered buffers are imported in a special way with an offset, so that later we can do an in place translation. Add a helper function taking care of it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7de2ecb9ed5efc3c5cf320232236966da5ad4ccc.1741457480.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07 | io_uring: cap cached iovec/bvec size | Pavel Begunkov
Bvecs can be large, put an arbitrary limit on the max vector size it can cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/823055fa6628daa24bbc9cd77c2da87e9a1e1e32.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07 | io_uring: add infra for importing vectored reg buffers | Pavel Begunkov
Add io_import_reg_vec(), which will be responsible for importing vectored registered buffers. The function might reallocate the vector, but it'd try to do the conversion in place first, which is why it's required of the user to pad the iovec to the right border of the cache. Overlapping also depends on struct iovec being larger than bvec, which is not the case on e.g. 32 bit architectures. Don't try to complicate this case and make sure vectors never overlap, it'll be improved later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60bd246b1249476a6996407c1dbc38ef6febad14.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07 | io_uring: introduce struct iou_vec | Pavel Begunkov
I need a convenient way to pass around and work with an iovec+size pair, so put them into a structure and make use of it in rw.c. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d39fadafc9e9047b0a292e5be6db3cf2f48bb1f7.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07 | Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-reg-vec | Jens Axboe
* for-6.15/io_uring-rx-zc: (80 commits)
  io_uring/zcrx: add selftest case for recvzc with read limit
  io_uring/zcrx: add a read limit to recvzc requests
  io_uring: add missing IORING_MAP_OFF_ZCRX_REGION in io_uring_mmap
  io_uring: Rename KConfig to Kconfig
  io_uring/zcrx: fix leaks on failed registration
  io_uring/zcrx: recheck ifq on shutdown
  io_uring/zcrx: add selftest
  net: add documentation for io_uring zcrx
  io_uring/zcrx: add copy fallback
  io_uring/zcrx: throttle receive requests
  io_uring/zcrx: set pp memory provider for an rx queue
  io_uring/zcrx: add io_recvzc request
  io_uring/zcrx: dma-map area for the device
  io_uring/zcrx: implement zerocopy receive pp memory provider
  io_uring/zcrx: grab a net device
  io_uring/zcrx: add io_zcrx_area
  io_uring/zcrx: add interface queue and refill queue
  net: add helpers for setting a memory provider on an rx queue
  net: page_pool: add memory provider helpers
  net: prepare for non devmem TCP memory providers
  ...
2025-03-04 | io_uring/rsrc: include io_uring_types.h in rsrc.h | Caleb Sander Mateos
io_uring/rsrc.h uses several types from include/linux/io_uring_types.h. Include io_uring_types.h explicitly in rsrc.h to avoid depending on users of rsrc.h including io_uring_types.h first. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/20250301183612.937529-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 | io_uring/rsrc: declare io_find_buf_node() in header file | Caleb Sander Mateos
Declare io_find_buf_node() in io_uring/rsrc.h so it can be called from other files. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250301001610.678223-1-csander@purestorage.com [axboe: keep the inline for local hot path usage] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 | io_uring: cache nodes and mapped buffers | Keith Busch
Frequent alloc/free cycles on these are pretty costly. Use an io cache to more efficiently reuse these buffers. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-7-kbusch@meta.com [axboe: fix imu leak] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28 | io_uring: add support for kernel registered bvecs | Keith Busch
Provide an interface for the kernel to leverage the existing pre-registered buffers that io_uring provides. User space can reference these later to achieve zero-copy IO. User space must register an empty fixed buffer table with io_uring in order for the kernel to make use of it. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-5-kbusch@meta.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-27 | Merge branch 'io_uring-6.14' into for-6.15/io_uring | Jens Axboe
Merge mainline fixes into 6.15 branch, as upcoming patches depend on fixes that went into the 6.14 mainline branch.
* io_uring-6.14:
  io_uring/net: save msg_control for compat
  io_uring/rw: clean up mshot forced sync mode
  io_uring/rw: move ki_complete init into prep
  io_uring/rw: don't directly use ki_complete
  io_uring/rw: forbid multishot async reads
  io_uring/rsrc: remove unused constants
  io_uring: fix spelling error in uapi io_uring.h
  io_uring: prevent opcode speculation
  io-wq: backoff when retrying worker creation
2025-02-27 | io_uring: combine buffer lookup and import | Pavel Begunkov
Registered buffers are currently imported in two steps: first we look up a rsrc node and then use it to set up the iterator. The first part is usually done at the prep stage, and import happens whenever it's needed. As we want to defer binding to a node so that it works with linked requests, combine both steps into a single helper. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250224213116.3509093-6-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-19 | io_uring/rsrc: remove unused constants | Caleb Sander Mateos
IO_NODE_ALLOC_CACHE_MAX has been unused since commit fbbb8e991d86 ("io_uring/rsrc: get rid of io_rsrc_node allocation cache") removed the rsrc_node_cache. IO_RSRC_TAG_TABLE_SHIFT and IO_RSRC_TAG_TABLE_MASK have been unused since commit 7029acd8a950 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") removed the separate tag table for registered nodes. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/20250219033444.2020136-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17 | io_uring/zcrx: add io_zcrx_area | David Wei
Add io_zcrx_area that represents a region of userspace memory that is used for zero copy. During ifq registration, userspace passes in the uaddr and len of userspace memory, which is then pinned by the kernel. Each net_iov is mapped to one of these pages. The freelist is a spinlock protected list that keeps track of all the net_iovs/pages that aren't used. For now, there is only one area per ifq and area registration happens implicitly as part of ifq registration. There is no API for adding/removing areas yet. The struct for area registration is there for future extensibility once we support multiple areas and TCP devmem. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-3-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17 | io_uring/rsrc: avoid NULL check in io_put_rsrc_node() | Caleb Sander Mateos
Most callers of io_put_rsrc_node() already check that node is non-NULL:
- io_rsrc_data_free()
- io_sqe_buffer_register()
- io_reset_rsrc_node()
- io_req_put_rsrc_nodes() (REQ_F_BUF_NODE indicates non-NULL buf_node)
Only io_splice_cleanup() can call io_put_rsrc_node() with a NULL node. So move the NULL check there. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250216225900.1075446-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-21 | io_uring/rsrc: Move lockdep assert from io_free_rsrc_node() to caller | Jann Horn
Checking for lockdep_assert_held(&ctx->uring_lock) in io_free_rsrc_node() means that the assertion is only checked when the resource drops to zero references. Move the lockdep assertion up into the caller io_put_rsrc_node() so that it instead happens on every reference count decrement. Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20250120-uring-lockdep-assert-earlier-v1-1-68d8e071a4bb@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-21 | io_uring/rsrc: remove unused parameter ctx for io_rsrc_node_alloc() | Sidong Yang
io_uring_ctx parameter for io_rsrc_node_alloc() is unused for now. This patch removes the parameter and fixes the callers accordingly. Signed-off-by: Sidong Yang <sidong.yang@furiosa.ai> Link: https://lore.kernel.org/r/20250115142033.658599-1-sidong.yang@furiosa.ai Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23 | io_uring/rsrc: export io_check_coalesce_buffer | Pavel Begunkov
io_try_coalesce_buffer() is a useful helper collecting useful info about a set of pages, and I want to reuse it for analysing ring/etc. mappings. I don't need the entire thing and am only interested in whether it can be coalesced into a single page, but that's better than duplicating the parsing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/353b447953cd5d34c454a7d909bb6024c391d6e2.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07 | io_uring/rsrc: add & apply io_req_assign_buf_node() | Ming Lei
The following pattern is becoming more and more common:
	+ io_req_assign_rsrc_node(&req->buf_node, node);
	+ req->flags |= REQ_F_BUF_NODE;
so make it a helper, which is less fragile to use than the code above. For example, the BUF_NODE flag is even missed in the current io_uring_cmd_prep(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241107110149.890530-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07 | io_uring/rsrc: remove '->ctx_ptr' of 'struct io_rsrc_node' | Ming Lei
Remove '->ctx_ptr' of 'struct io_rsrc_node', and add 'type' field, meantime remove io_rsrc_node_type(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241107110149.890530-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07 | io_uring/rsrc: pass 'struct io_ring_ctx' reference to rsrc helpers | Ming Lei
`io_rsrc_node` instance won't be shared among different io_uring ctxs, and its allocation 'ctx' is always same with the user's 'ctx', so it is safe to pass user 'ctx' reference to rsrc helpers. Even in io_clone_buffers(), `io_rsrc_node` instance is allocated actually for destination io_uring_ctx. Then io_rsrc_node_ctx() can be removed, and the 8 bytes `ctx` pointer will be removed from `io_rsrc_node` in the following patch. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241107110149.890530-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06 | io_uring/rsrc: split io_kiocb node type assignments | Jens Axboe
Currently the io_rsrc_node assignment in io_kiocb is an array of two pointers, as two nodes may be assigned to a request - one file node, and one buffer node. However, the buffer node can share storage with the provided buffers, as using both provided and registered buffers at the same time is currently not supported. This crucially brings struct io_kiocb down to 4 cache lines again, as before it spilled into the 5th cacheline. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06 | io_uring/rsrc: encode node type and ctx together | Jens Axboe
Rather than keep the type field separate from ctx, use the fact that we can encode up to 4 types of nodes in the LSB of the ctx pointer. Doesn't reclaim any space right now on 64-bit archs, but it leaves a full int for future use. Signed-off-by: Jens Axboe <axboe@kernel.dk>
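The encoding described in this entry can be illustrated with a minimal userspace sketch. All names below (`ring_ctx`, `rsrc_node`, `node_set`, `node_ctx`, `node_get_type`) are hypothetical stand-ins rather than the kernel's definitions; the sketch only assumes that the context object is at least 4-byte aligned, so the two low bits of its address are free to carry a type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum node_type { NODE_FILE = 0, NODE_BUFFER = 1 };

/* Two low bits give room for up to 4 node types. */
#define NODE_TYPE_MASK ((uintptr_t)3)

struct ring_ctx {
	void *private_data;	/* pointer member keeps alignment >= 4 bytes */
};

struct rsrc_node {
	uintptr_t ctx_ptr;	/* ctx address with the type in the low bits */
};

/* Store the ctx pointer and the node type in a single word. */
static void node_set(struct rsrc_node *node, struct ring_ctx *ctx,
		     enum node_type type)
{
	/* The scheme only works if the low bits of the pointer are zero. */
	assert(((uintptr_t)ctx & NODE_TYPE_MASK) == 0);
	node->ctx_ptr = (uintptr_t)ctx | (uintptr_t)type;
}

/* Recover the ctx pointer by masking off the type bits. */
static struct ring_ctx *node_ctx(const struct rsrc_node *node)
{
	return (struct ring_ctx *)(node->ctx_ptr & ~NODE_TYPE_MASK);
}

/* Recover the node type from the low bits. */
static enum node_type node_get_type(const struct rsrc_node *node)
{
	return (enum node_type)(node->ctx_ptr & NODE_TYPE_MASK);
}
```

Calling `node_set(&n, &ctx, NODE_BUFFER)` and then reading back through `node_ctx()` and `node_get_type()` should return the original pointer and type.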
2024-11-02 | io_uring/rsrc: get rid of the empty node and dummy_ubuf | Jens Axboe
The empty node was used as a placeholder for a sparse entry, but it didn't really solve any issues. The caller still has to check for whether it's the empty node or not, it may as well just check for a NULL return instead. The dummy_ubuf was used for a sparse buffer entry, but NULL will serve the same purpose there of ensuring an -EFAULT on attempted import. Just use NULL for a sparse node, regardless of whether or not it's a file or buffer resource. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 | io_uring/rsrc: add io_reset_rsrc_node() helper | Jens Axboe
Puts and resets an existing node in a slot, if one exists. Returns true if a node was there, false if not. This helps clean up some of the code that does a lookup just to clear an existing node. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 | io_uring/rsrc: add io_rsrc_node_lookup() helper | Jens Axboe
There are lots of spots open-coding this functionality, add a generic helper that does the node lookup in a speculation safe way. Signed-off-by: Jens Axboe <axboe@kernel.dk>
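A rough userspace sketch of what a bounds-checked, index-clamping lookup can look like follows. The struct and function names are hypothetical, and `index_mask()` only mirrors the arithmetic of a generic branchless index mask; it assumes an arithmetic right shift on signed values (as on GCC and Clang) and, without the kernel's compiler barriers, does not by itself guarantee speculation safety.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct rsrc_node {
	int tag;
};

struct rsrc_data {
	unsigned int nr;		/* number of slots in the table */
	struct rsrc_node **nodes;	/* NULL marks a sparse slot */
};

/*
 * Returns all ones when index < size and zero otherwise, without a branch.
 * Assumes arithmetic right shift of negative signed values.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Look up a node; out-of-range indexes and sparse slots both yield NULL. */
static struct rsrc_node *node_lookup(struct rsrc_data *data, unsigned int index)
{
	if (index < data->nr)
		return data->nodes[index & index_mask(index, data->nr)];
	return NULL;
}
```

Callers then only need a NULL check on the result, whether the index was invalid or the slot was sparse.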
2024-11-02 | io_uring/rsrc: unify file and buffer resource tables | Jens Axboe
For files, there's nr_user_files/file_table/file_data, and buffers have nr_user_bufs/user_bufs/buf_data. There's no reason why file_table and file_data can't be the same thing, and ditto for the buffer side. That gets rid of more io_ring_ctx state that's in two spots rather than just being in one spot, as it should be. Put all the registered file data in one location, and ditto on the buffer front. This also avoids having both io_rsrc_data->nodes being an allocated array, and ->user_bufs[] or ->file_table.nodes. There's no reason to have this information duplicated. Keep it in one spot, io_rsrc_data, along with how many resources are available. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 | io_uring: only initialize io_kiocb rsrc_nodes when needed | Jens Axboe
Add the empty node initializing to the preinit part of the io_kiocb allocation, and reset them if they have been used. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 | io_uring/rsrc: add an empty io_rsrc_node for sparse buffer entries | Jens Axboe
Rather than allocate an io_rsrc_node for an empty/sparse buffer entry, add a const entry that can be used for that. This just needs checking for writing the tag, and the put check needs to check for that sparse node rather than NULL for validity. This avoids allocating rsrc nodes for sparse buffer entries. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 | io_uring/rsrc: get rid of per-ring io_rsrc_node list | Jens Axboe
Work in progress, but get rid of the per-ring serialization of resource nodes, like registered buffers and files. Main issue here is that one node can otherwise hold up a bunch of other nodes from getting freed, which is especially a problem for file resource nodes and networked workloads where some descriptors may not see activity in a long time. As an example, instantiate an io_uring ring fd and create a sparse registered file table. Even 2 will do. Then create a socket and register it as fixed file 0, F0. The number of open files in the app is now 5, with 0/1/2 being the usual stdin/out/err, 3 being the ring fd, and 4 being the socket. Register this socket (eg "the listener") in slot 0 of the registered file table. Now add an operation on the socket that uses slot 0. Finally, loop N times, where each loop creates a new socket, registers said socket as a file, then unregisters the socket, and finally closes the socket. This is roughly similar to what a basic accept loop would look like. At the end of this loop, it's not unreasonable to expect that there would still be 5 open files. Each socket created and registered in the loop is also unregistered and closed. But since the listener socket registered first still has references to its resource node due to still being active, each subsequent socket unregistration is stuck behind it for reclaim. Hence 5 + N files are still open at that point, where N is awaiting the final put held up by the listener socket. Rewrite the io_rsrc_node handling to NOT rely on serialization. Struct io_kiocb now gets explicit resource nodes assigned, with each holding a reference to the parent node. A parent node is either of type FILE or BUFFER, which are the two types of nodes that exist. A request can have two nodes assigned, if it's using both registered files and buffers. Since request issue and task_work completion is both under the ring private lock, no atomics are needed to handle these references. It's a simple unlocked inc/dec. 
As before, the registered buffer or file table each hold a reference as well to the registered nodes. Final put of the node will remove the node and free the underlying resource, eg unmap the buffer or put the file. Outside of removing the stall in resource reclaim described above, it has the following advantages:
1) It's a lot simpler than the previous scheme, and easier to follow. No need for specific quiesce handling anymore.
2) There are no resource node allocations in the fast path, all of that happens at resource registration time.
3) The structs related to resource handling can all get simplified quite a bit, like io_rsrc_node and io_rsrc_data. io_rsrc_put can go away completely.
4) Handling of resource tags is much simpler, and doesn't require persistent storage as it can simply get assigned up front at registration time. Just copy them in one-by-one at registration time and assign to the resource node.
The only real downside is that a request is now explicitly limited to pinning 2 resources, one file and one buffer, where before just assigning a resource node to a request would pin all of them. The upside is that it's easier to follow now, as an individual resource is explicitly referenced and assigned to the request. With this in place, the above mentioned example will be using exactly 5 files at the end of the loop, not 5 + N. Signed-off-by: Jens Axboe <axboe@kernel.dk>
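The reference model described in this entry can be sketched in a few lines of userspace C. This is a simplified illustration under stated assumptions, not the kernel code: every name is hypothetical, the `freed_counter` field exists only so the behaviour can be observed, and the plain `int` refcount mirrors the point that no atomics are needed when all updates happen under one lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct rsrc_node {
	int refs;		/* plain counter: all updates assumed under one lock */
	int *freed_counter;	/* test hook: bumped when the node is freed */
};

/* A request can pin at most one file node and one buffer node. */
struct request {
	struct rsrc_node *file_node;
	struct rsrc_node *buf_node;
};

/* Registration: the table holds the initial reference. */
static struct rsrc_node *node_alloc(int *freed_counter)
{
	struct rsrc_node *node = malloc(sizeof(*node));

	if (node) {
		node->refs = 1;
		node->freed_counter = freed_counter;
	}
	return node;
}

/* Drop one reference; the final put releases the resource. */
static void node_put(struct rsrc_node *node)
{
	if (--node->refs == 0) {
		(*node->freed_counter)++;
		free(node);
	}
}

/* Pin a node to a request slot for the lifetime of the request. */
static void req_assign(struct rsrc_node **slot, struct rsrc_node *node)
{
	node->refs++;
	*slot = node;
}

/* Completion drops whatever the request pinned. */
static void req_complete(struct request *req)
{
	if (req->file_node) {
		node_put(req->file_node);
		req->file_node = NULL;
	}
	if (req->buf_node) {
		node_put(req->buf_node);
		req->buf_node = NULL;
	}
}
```

In this model a long-lived request on the listener node pins only that node, so sockets registered and unregistered in the meantime are released immediately instead of queueing behind it.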
2024-10-29 | io_uring/rsrc: kill io_charge_rsrc_node() | Jens Axboe
It's only used from __io_req_set_rsrc_node(), and it takes both the ctx and node itself, while never using the ctx. Just open-code the basic refs++ in __io_req_set_rsrc_node() instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 | io_uring/rsrc: move struct io_fixed_file to rsrc.h header | Jens Axboe
There's no need for this internal structure to be visible, move it to the private rsrc.h header instead. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-29 | io_uring: remove 'issue_flags' argument for io_req_set_rsrc_node() | Jens Axboe
All callers already hold the ring lock and hence are passing '0', remove the argument and the conditional locking that it controlled. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-15 | io_uring/rsrc: change ubuf->ubuf_end to length tracking | Jens Axboe
If we change it to tracking ubuf->start + ubuf->len, then we can reduce the size of struct io_mapped_ubuf by another 4 bytes, effectively 8 bytes, as a hole is eliminated too. This shrinks io_mapped_ubuf to 32 bytes. Signed-off-by: Jens Axboe <axboe@kernel.dk>
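As a hedged illustration of start-plus-length tracking, the sketch below validates a requested range against a buffer described by a 64-bit start and a 32-bit length. The struct layout and names are assumptions made for this example, not the kernel's `struct io_mapped_ubuf`; the check is written with subtractions so that no intermediate sum can overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for illustration; not the kernel's struct layout. */
struct mapped_buf {
	uint64_t start;	/* first valid address */
	uint32_t len;	/* length in bytes, replacing a 64-bit end address */
};

/* True if [addr, addr + nbytes) lies fully inside the mapped buffer. */
static bool range_ok(const struct mapped_buf *buf, uint64_t addr, size_t nbytes)
{
	uint64_t off;

	if (addr < buf->start)
		return false;
	off = addr - buf->start;
	/* Compare against the remaining room rather than computing an end. */
	return off <= buf->len && nbytes <= buf->len - off;
}
```

For a buffer starting at 0x1000 with length 0x100, a one-byte access at 0x10ff is accepted while a two-byte access at the same address is rejected.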
2024-09-15 | io_uring/rsrc: get rid of io_mapped_ubuf->folio_mask | Jens Axboe
We don't really need to cache this, let's reclaim 8 bytes from struct io_mapped_ubuf and just calculate it when we need it. The only hot path here is io_import_fixed(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-14 | io_uring: rename "copy buffers" to "clone buffers" | Jens Axboe
A recent commit added support for copying registered buffers from one ring to another. But that term is a bit confusing, as no copying of buffer data is done here. What is being done is simply cloning the buffer registrations from one ring to another. Rename it while we still can, so that it's more descriptive. No functional changes in this patch. Fixes: 7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-12 | io_uring: add IORING_REGISTER_COPY_BUFFERS method | Jens Axboe
Buffers can get registered with io_uring, which allows skipping the repeated pin_pages, unpin/unref pages for each O_DIRECT operation. This reduces the overhead of O_DIRECT IO. However, registering buffers can take some time. Normally this isn't an issue as it's done at initialization time (and hence less critical), but for cases where rings can be created and destroyed as part of an IO thread pool, registering the same buffers for multiple rings becomes a more time sensitive proposition. As an example, let's say an application has an IO memory pool of 500G. Initial registration takes:
	Got 500 huge pages (each 1024MB)
	Registered 500 pages in 409 msec
or about 0.4 seconds. If we go higher to 900 1GB huge pages being registered:
	Registered 900 pages in 738 msec
which is, as expected, a fully linear scaling. Rather than have each ring pin/map/register the same buffer pool, provide an io_uring_register(2) opcode to simply duplicate the buffers that are registered with another ring. Adding the same 900GB of registered buffers to the target ring can then be accomplished in:
	Copied 900 pages in 17 usec
While timing differs a bit, this provides around a 25,000-40,000x speedup for this use case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11 | io_uring/rsrc: add reference count to struct io_mapped_ubuf | Jens Axboe
Currently there's a single ring owner of a mapped buffer, and hence the reference count will always be 1 when it's torn down and freed. However, in preparation for being able to link io_mapped_ubuf to different spots, add a reference count to manage the lifetime of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-02 | io_uring: remove unused rsrc_put_fn | Anuj Gupta
rsrc_put_fn is declared but never used, remove it. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/20240902062134.136387-3-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-25 | io_uring/rsrc: enable multi-hugepage buffer coalescing | Chenliang Li
Add support for checking and coalescing multi-hugepage-backed fixed buffers. The coalescing optimizes both time and space consumption caused by mapping and storing multi-hugepage fixed buffers. A coalescable multi-hugepage buffer should fully cover its folios (except potentially the first and last one), and these folios should have the same size. These requirements are for easier processing later, also we need same size'd chunks in io_import_fixed for fast iov_iter adjust. Signed-off-by: Chenliang Li <cliang01.li@samsung.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20240731090133.4106-3-cliang01.li@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-25 | io_uring/rsrc: store folio shift and mask into imu | Chenliang Li
Store the folio shift and folio mask into the imu struct and use them in the iov_iter adjust, as we will have non PAGE_SIZE'd chunks if a multi-hugepage buffer gets coalesced. Signed-off-by: Chenliang Li <cliang01.li@samsung.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20240731090133.4106-2-cliang01.li@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15 | io_uring: remove io_req_put_rsrc_locked() | Pavel Begunkov
io_req_put_rsrc_locked() is a weird shim function around io_req_put_rsrc(). All calls to io_req_put_rsrc() require holding ->uring_lock, so we can just use it directly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a195bc78ac3d2c6fbaea72976e982fe51e50ecdd.1712331455.git.asml.silence@gmail.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15 | io_uring/alloc_cache: switch to array based caching | Jens Axboe
Currently lists are being used to manage this, but best practice is usually to have these in an array instead as that is cheaper to manage. Outside of that detail, games are also played with KASAN as the list is inside the cached entry itself. Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle. Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size. This reduces the overhead of the recycled allocations, and it reduces the amount of code needed to support recycling to about half of what it currently is. Signed-off-by: Jens Axboe <axboe@kernel.dk>
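A minimal userspace sketch of an array-backed recycling cache is shown here to make the idea concrete. The names (`obj_cache`, `cache_put`, `cache_get`, and so on) are hypothetical and the code uses `malloc`/`free` in place of kernel allocators; the point is only that cached objects need no embedded list entry, because the array of pointers holds them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for illustration; not the kernel's io_alloc_cache. */
struct obj_cache {
	void **entries;		/* array of recycled objects */
	unsigned int nr_cached;	/* how many entries are currently held */
	unsigned int max_cached;/* capacity of the entries array */
	size_t elem_size;	/* size of each object */
};

/* Set up an empty cache; max_cached is assumed to be non-zero. */
static bool cache_init(struct obj_cache *cache, unsigned int max_cached,
		       size_t elem_size)
{
	cache->entries = calloc(max_cached, sizeof(void *));
	if (!cache->entries)
		return false;
	cache->nr_cached = 0;
	cache->max_cached = max_cached;
	cache->elem_size = elem_size;
	return true;
}

/* Try to recycle an object; on false the caller must free it itself. */
static bool cache_put(struct obj_cache *cache, void *obj)
{
	if (cache->nr_cached < cache->max_cached) {
		cache->entries[cache->nr_cached++] = obj;
		return true;
	}
	return false;
}

/* Pop the most recently cached object, or NULL if the cache is empty. */
static void *cache_get(struct obj_cache *cache)
{
	if (cache->nr_cached)
		return cache->entries[--cache->nr_cached];
	return NULL;
}

/* Prefer a recycled object, falling back to a fresh allocation. */
static void *cache_alloc(struct obj_cache *cache)
{
	void *obj = cache_get(cache);

	return obj ? obj : malloc(cache->elem_size);
}

/* Free everything still held by the cache. */
static void cache_free(struct obj_cache *cache)
{
	while (cache->nr_cached)
		free(cache->entries[--cache->nr_cached]);
	free(cache->entries);
	cache->entries = NULL;
}
```

Objects come back in last-in, first-out order, and a put into a full cache reports failure so the caller can free the object normally.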
2024-02-12 | io_uring: Don't include af_unix.h. | Kuniyuki Iwashima
Changes to AF_UNIX trigger rebuild of io_uring, but io_uring does not use AF_UNIX anymore. Let's not include af_unix.h and instead include necessary headers. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240212234236.63714-1-kuniyu@amazon.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-01-11 | io_uring/rsrc: improve code generation for fixed file assignment | Jens Axboe
For the normal read/write path, we have already locked the ring submission side when assigning the file. This causes branch mispredictions when we then check and try to lock again in io_req_set_rsrc_node(). As this is a very hot path, this matters. Add a basic helper that assumes we already have it locked, and use that in io_file_get_fixed(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-19 | io_uring: drop any code related to SCM_RIGHTS | Jens Axboe
This is dead code after we dropped support for passing io_uring fds over SCM_RIGHTS, get rid of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-07 | io_uring/af_unix: disable sending io_uring over sockets | Pavel Begunkov
File reference cycles have caused lots of problems for io_uring in the past, and it still doesn't work exactly right and races with unix_stream_read_generic(). The safest fix would be to completely disallow sending io_uring files via sockets via SCM_RIGHTS, so that there are no possible cycles involving registered files, thus rendering SCM accounting on the io_uring side unnecessary. Cc: <stable@vger.kernel.org> Fixes: 0091bfc81741b ("io_uring/af_unix: defer registered files gc to io_uring release") Reported-and-suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c716c88321939156909cfa1bd8b0faaf1c804103.1701868795.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-17 | io_uring/rsrc: Annotate struct io_mapped_ubuf with __counted_by | Kees Cook
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct io_mapped_ubuf. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: io-uring@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230817212146.never.853-kees@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09 | io_uring/rsrc: Remove unused declaration io_rsrc_put_tw() | Yue Haibing
Commit 36b9818a5a84 ("io_uring/rsrc: don't offload node free") removed the implementation but left the declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20230808151058.4572-1-yuehaibing@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-18 | io_uring/rsrc: disassociate nodes and rsrc_data | Pavel Begunkov
Make rsrc nodes independent from rsrc_data; for that we keep the ctx and rsrc type in nodes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4f259abe9cd4eea6a3b4ed83508635218acd3c3f.1681822823.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>