summaryrefslogtreecommitdiff
path: root/io_uring/rw.h
AgeCommit message (Collapse)Author
2025-03-07io_uring/rw: implement vectored registered rwPavel Begunkov
Implement registered buffer vectored reads with new opcodes IORING_OP_WRITEV_FIXED and IORING_OP_READV_FIXED. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7c89eb481e870f598edc91cc66ff4d1e4ae3788.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-07io_uring: introduce struct iou_vecPavel Begunkov
I need a convenient way to pass around and work with iovec+size pair, put them into a structure and makes use of it in rw.c Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d39fadafc9e9047b0a292e5be6db3cf2f48bb1f7.1741362889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-28io_uring/rw: move fixed buffer import to issue pathKeith Busch
Registered buffers may depend on a linked command, which makes the prep path too early to import. Move to the issue path when the node is actually needed like all the other users of fixed buffers. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250227223916.143006-3-kbusch@meta.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17io_uring: introduce type alias for io_tw_stateCaleb Sander Mateos
In preparation for changing how io_tw_state is passed, introduce a type alias io_tw_token_t for struct io_tw_state *. This allows for changing the representation in one place, without having to update the many functions that just forward their struct io_tw_state * argument. Also add a comment to struct io_tw_state to explain its purpose. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250217022511.1150145-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-23io_uring: get rid of alloc cache init_once handlingJens Axboe
init_once is called when an object doesn't come from the cache, and hence needs initial clearing of certain members. While the whole struct could get cleared by memset() in that case, a few of the cache members are large enough that this may cause unnecessary overhead if the caches used aren't large enough to satisfy the workload. For those cases, some churn of kmalloc+kfree is to be expected. Ensure that the 3 users that need clearing put the members they need cleared at the start of the struct, and wrap the rest of the struct in a struct group so the offset is known. While at it, improve the interaction with KASAN such that when/if KASAN writes to members inside the struct that should be retained over caching, it won't trip over itself. For rw and net, the retaining of the iovec over caching is disabled if KASAN is enabled. A helper will free and clear those members in that case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23io_uring: introduce attributes for read/write and PI supportAnuj Gupta
Add the ability to pass additional attributes along with read/write. Application can prepare attibute specific information and pass its address using the SQE field: __u64 attr_ptr; Along with setting a mask indicating attributes being passed: __u64 attr_type_mask; Overall 64 attributes are allowed and currently one attribute 'IORING_RW_ATTR_FLAG_PI' is supported. With PI attribute, userspace can pass following information: - flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG} - len: length of PI/metadata buffer - addr: address of metadata buffer - seed: seed value for reftag remapping - app_tag: application defined 16b value Process this information to prepare uio_meta_descriptor and pass it down using kiocb->private. PI attribute is supported only for direct IO. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20241128112240.8867-7-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/alloc_cache: switch to array based cachingJens Axboe
Currently lists are being used to manage this, but best practice is usually to have these in an array instead as that it cheaper to manage. Outside of that detail, games are also played with KASAN as the list is inside the cached entry itself. Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle. Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size. This reduces the overhead of the recycled allocations, and it reduces the amount of code code needed to support recycling to about half of what it currently is. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/rw: add iovec recyclingJens Axboe
Let the io_async_rw hold on to the iovec and reuse it, rather than always allocate and free them. Also enables KASAN for the iovec entries, so that reuse can be detected even while they are in the cache. While doing so, shrink io_async_rw by getting rid of the bigger embedded fast iovec. Since iovecs are being recycled now, shrink it from 8 to 1. This reduces the io_async_rw size from 264 to 160 bytes, a 40% reduction. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring: get rid of struct io_rw_stateJens Axboe
A separate state struct is not needed anymore, just fold it in with io_async_rw. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-15io_uring/rw: always setup io_async_rw for read/write requestsJens Axboe
read/write requests try to put everything on the stack, and then alloc and copy if a retry is needed. This necessitates a bunch of nasty code that deals with intermediate state. Get rid of this, and have the prep side setup everything that is needed upfront, which greatly simplifies the opcode handlers. This includes adding an alloc cache for io_async_rw, to make it cheap to handle. In terms of cost, this should be basically free and transparent. For the worst case of {READ,WRITE}_FIXED which didn't need it before, performance is unaffected in the normal peak workload that is being used to test that. Still runs at 122M IOPS. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-06io_uring/rw: add separate prep handler for fixed read/writeJens Axboe
Rather than sprinkle opcode checks in the generic read/write prep handler, have a separate prep handler for the vectored readv/writev operation. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-11-06io_uring/rw: add separate prep handler for readv/writevJens Axboe
Rather than sprinkle opcode checks in the generic read/write prep handler, have a separate prep handler for the vectored readv/writev operation. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-21io_uring/rw: add support for IORING_OP_READ_MULTISHOTJens Axboe
This behaves like IORING_OP_READ, except: 1) It only supports pollable files (eg pipes, sockets, etc). Note that for sockets, you probably want to use recv/recvmsg with multishot instead. 2) It supports multishot mode, meaning it will repeatedly trigger a read and fill a buffer when data is available. This allows similar use to recv/recvmsg but on non-sockets, where a single request will repeatedly post a CQE whenever data is read from it. 3) Because of #2, it must be used with provided buffers. This is uniformly true across any request type that supports multishot and transfers data, with the reason being that it's obviously not possible to pass in a single buffer for the data, as multiple reads may very well trigger before an application has a chance to process previous CQEs and the data passed from them. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-02io_uring: avoid indirect function calls for the hottest task_workJens Axboe
We use task_work for a variety of reasons, but doing completions or triggering rety after poll are by far the hottest two. Use the indirect funtion call wrappers to avoid the indirect function call if CONFIG_RETPOLINE is set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-21io_uring/rw: don't lose partial IO result on failPavel Begunkov
A partially done read/write may end up in io_req_complete_failed() and loose the result, make sure we return the number of bytes processed. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/05e0879c226bcd53b441bf92868eadd4bf04e2fc.1663668091.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: move read/write related opcodes to its own fileJens Axboe
Signed-off-by: Jens Axboe <axboe@kernel.dk>