path: root/block/blk-mq.c
Age  Commit message  Author
2025-04-03  block: don't grab elevator lock during queue initialization  (Ming Lei)
->elevator_lock depends on the queue freeze lock (see block/blk-sysfs.c), and the queue freeze lock depends on fs_reclaim. So don't grab the elevator lock during queue initialization, which needs to call kmalloc(GFP_KERNEL); this cuts the dependency between ->elevator_lock and fs_reclaim, and the lockdep warning can be killed. This is safe because the elevator setting isn't ready to run during queue initialization. There isn't such an issue in __blk_mq_update_nr_hw_queues() because memalloc_noio_save() is called before acquiring the elevator lock. Fixes the following lockdep warning: https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/ Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com Cc: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250403105402.1334206-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-01  block: remove unused nseg parameter  (Nitesh Shetty)
We are no longer using nr_segs, after blk_mq_attempt_bio_merge was moved out of blk_mq_get_new_request. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Link: https://lore.kernel.org/r/20250401044348.15588-1-nj.shetty@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-12  block: remove unused parameter  (Guixin Liu)
blk_mq_map_queue()'s request_queue parameter is not used anymore; remove it, and do the same for blk_get_flush_queue(). Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Link: https://lore.kernel.org/r/20250312084722.129680-1-kanie@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10  block: make sure ->nr_integrity_segments is cloned in blk_rq_prep_clone  (Ming Lei)
Make sure ->nr_integrity_segments is cloned in blk_rq_prep_clone(); otherwise requests cloned by device-mapper multipath will not have the proper nr_integrity_segments values set, and a BUG() is then hit from sg_alloc_table_chained(). Fixes: b0fd271d5fba ("block: add request clone interface (v2)") Cc: stable@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250310115453.2271109-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10  block: protect hctx attributes/params using q->elevator_lock  (Nilay Shroff)
Currently, hctx attributes (nr_tags, nr_reserved_tags, and cpu_list) are protected using q->sysfs_lock. However, these attributes can be updated in multiple scenarios:

- During the driver's probe method.
- When updating nr_hw_queues.
- When writing to the sysfs attribute nr_requests, which can modify nr_tags.

The nr_requests attribute is already protected using q->elevator_lock, but none of the update paths actually use q->sysfs_lock to protect hctx attributes. So to ensure proper synchronization, replace q->sysfs_lock with q->elevator_lock when reading hctx attributes through sysfs.

Additionally, blk_mq_update_nr_hw_queues allocates and updates hctx. The allocation of hctx is protected using q->elevator_lock, however, updating hctx params happens without any protection, so safeguard the hctx param update path by also using q->elevator_lock.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250306093956.2818808-1-nilay@linux.ibm.com [axboe: wrap comment at 80 chars] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-10  block: introduce a dedicated lock for protecting queue elevator updates  (Nilay Shroff)
A queue's elevator can be updated either when modifying nr_hw_queues or through the sysfs scheduler attribute. Currently, elevator switching/updating is protected using q->sysfs_lock, but this has led to lockdep splats[1] due to inconsistent lock ordering between q->sysfs_lock and the freeze-lock in multiple block layer call sites. As the scope of q->sysfs_lock is not well-defined, its (mis)use has resulted in numerous lockdep warnings. To address this, introduce a new q->elevator_lock, dedicated specifically to protecting elevator switches/updates, and use it instead of q->sysfs_lock for that purpose. While at it, make elv_iosched_load_module() a static function, as it is only called from elv_iosched_store(), and remove redundant parameters from its signature. [1] https://lore.kernel.org/all/67637e70.050a0220.3157ee.000c.GAE@google.com/ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250304102551.2533767-5-nilay@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-31  block: force noio scope in blk_mq_freeze_queue  (Christoph Hellwig)
When block drivers or the core block code perform allocations with a frozen queue, this could try to recurse into the block device to reclaim memory and deadlock. Thus all allocations done by a process that froze a queue need to be done without __GFP_IO and __GFP_FS. Instead of trying to track all of them down, force a noio scope as part of freezing the queue. Note that nvme is a bit of a mess here due to the non-owner freezes, and they will be addressed separately. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
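A minimal sketch of the noio-scope pattern this commit describes, from the caller's point of view (the exact placement inside the freeze helpers is simplified here):

    #include <linux/sched/mm.h>

    /*
     * Sketch: freezing enters a noio allocation scope, so any allocation
     * made while the queue is frozen implicitly loses __GFP_IO/__GFP_FS
     * and cannot recurse into the frozen block device for reclaim.
     */
    unsigned int memflags;

    memflags = memalloc_noio_save();   /* done as part of freezing */
    /* ... q->q_usage_counter frozen; allocations here are noio ... */
    memalloc_noio_restore(memflags);   /* done as part of unfreezing */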
2025-01-14  blk-mq: Move more error handling into blk_mq_submit_bio()  (Bart Van Assche)
The error handling code in blk_mq_get_new_requests() cannot be understood without knowing that this function is only called by blk_mq_submit_bio(). Hence move the code for handling blk_mq_get_new_requests() failures into blk_mq_submit_bio(). Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241218212246.1073149-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-14  block: Reorder the request allocation code in blk_mq_submit_bio()  (Bart Van Assche)
Help the CPU branch predictor in case of a cache hit by handling the cache hit scenario first. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241218212246.1073149-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10  block: don't update BLK_FEAT_POLL in __blk_mq_update_nr_hw_queues  (Christoph Hellwig)
When __blk_mq_update_nr_hw_queues changes the number of tag sets, it might have to disable poll queues. Currently it does so by adjusting the BLK_FEAT_POLL, which is a bit against the intent of features that describe hardware / driver capabilities, but more importantly causes nasty lock order problems with the broadly held freeze when updating the number of hardware queues and the limits lock. Fix this by leaving BLK_FEAT_POLL alone, and instead check for the number of poll queues in the bio submission and poll handlers. While this adds extra work to the fast path, the variables are in cache lines used by these operations anyway, so it should be cheap enough. Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250110054726.1499538-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10  block: check BLK_FEAT_POLL under q_usage_count  (Christoph Hellwig)
Otherwise feature reconfiguration can race with I/O submission. Also drop the bio_clear_polled in the error path, as the flag does not matter for instant error completions; it is a leftover from when we allowed polled I/O to proceed unpolled in this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20250110054726.1499538-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-06  block: simplify tag allocation policy selection  (Christoph Hellwig)
Use a plain BLK_MQ_F_* flag to select the round robin tag selection instead of overlaying an enum with just two possible values into the flags space. Doing so allows adding a BLK_MQ_F_MAX sentinel for simplified overflow checking in the messy debugfs helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250106083531.799976-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-04  block: remove blk_rq_bio_prep  (Christoph Hellwig)
There is no real point in a helper that just assigns three values to four fields, especially when the surrounding code is working on the neighboring fields directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20250103073417.459715-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23  block: track queue dying state automatically for modeling queue freeze lockdep  (Ming Lei)
Now we only verify the outermost freeze & unfreeze in the current context in case that !q->mq_freeze_depth, so it is reliable to save the queue dying state when we want to lock the freeze queue, since the state is a per-task variable now. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23  block: track disk DEAD state automatically for modeling queue freeze lockdep  (Ming Lei)
Now we only verify the outermost freeze & unfreeze in the current context in case that !q->mq_freeze_depth, so it is reliable to save the disk DEAD state when we want to lock the freeze queue, since the state is a per-task variable now. Doing it this way kills lots of false positives when the freeze queue is called before adding the disk[1]. [1] https://lore.kernel.org/linux-block/6741f6b2.050a0220.1cc393.0017.GAE@google.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23  block: remove unnecessary check in blk_unfreeze_check_owner()  (Ming Lei)
The following check of 'q->mq_freeze_owner != current' covers the previous one, so remove the unnecessary check. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-18  block: avoid to reuse `hctx` not removed from cpuhp callback list  (Ming Lei)
If the 'hctx' isn't removed from the cpuhp callback list, it can't be reused; otherwise a use-after-free may be triggered. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202412172217.b906db7c-lkp@intel.com Tested-by: kernel test robot <oliver.sang@intel.com> Fixes: 22465bbac53c ("blk-mq: move cpuhp callback registering out of q->sysfs_lock") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241218101617.3275704-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-18  block: Revert "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock"  (Ming Lei)
This reverts commit be26ba96421ab0a8fa2055ccf7db7832a13c44d2. Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and acquiring sysfs_lock") actually reverts commit 22465bbac53c ("blk-mq: move cpuhp callback registering out of q->sysfs_lock") and brings back the original resctrl lockdep warning. So revert it; the issue needs to be fixed in another way. Cc: Nilay Shroff <nilay@linux.ibm.com> Fixes: be26ba96421a ("block: Fix potential deadlock while freezing queue and acquiring sysfs_lock") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241218101617.3275704-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-13  block: Fix potential deadlock while freezing queue and acquiring sysfs_lock  (Nilay Shroff)
For storing a value to a queue attribute, the queue_attr_store function first freezes the queue (->q_usage_counter(io)) and then acquires ->sysfs_lock. This is not correct, as the usual ordering should be to acquire ->sysfs_lock before freezing the queue. This incorrect ordering causes the following lockdep splat, which we can reliably reproduce simply by listing /sys/kernel/debug with the ls command:

[   57.597146] WARNING: possible circular locking dependency detected
[   57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G        W
[   57.597162] ------------------------------------------------------
[   57.597168] ls/4605 is trying to acquire lock:
[   57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0
[   57.597200] but task is already holding lock:
[   57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
[   57.597226] which lock already depends on the new lock.
[   57.597233] the existing dependency chain (in reverse order) is:
[   57.597241] -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[   57.597255]        down_write+0x6c/0x18c
[   57.597264]        start_creating+0xb4/0x24c
[   57.597274]        debugfs_create_dir+0x2c/0x1e8
[   57.597283]        blk_register_queue+0xec/0x294
[   57.597292]        add_disk_fwnode+0x2e4/0x548
[   57.597302]        brd_alloc+0x2c8/0x338
[   57.597309]        brd_init+0x100/0x178
[   57.597317]        do_one_initcall+0x88/0x3e4
[   57.597326]        kernel_init_freeable+0x3cc/0x6e0
[   57.597334]        kernel_init+0x34/0x1cc
[   57.597342]        ret_from_kernel_user_thread+0x14/0x1c
[   57.597350] -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[   57.597362]        __mutex_lock+0xfc/0x12a0
[   57.597370]        blk_register_queue+0xd4/0x294
[   57.597379]        add_disk_fwnode+0x2e4/0x548
[   57.597388]        brd_alloc+0x2c8/0x338
[   57.597395]        brd_init+0x100/0x178
[   57.597402]        do_one_initcall+0x88/0x3e4
[   57.597410]        kernel_init_freeable+0x3cc/0x6e0
[   57.597418]        kernel_init+0x34/0x1cc
[   57.597426]        ret_from_kernel_user_thread+0x14/0x1c
[   57.597434] -> #3 (&q->sysfs_lock){+.+.}-{4:4}:
[   57.597446]        __mutex_lock+0xfc/0x12a0
[   57.597454]        queue_attr_store+0x9c/0x110
[   57.597462]        sysfs_kf_write+0x70/0xb0
[   57.597471]        kernfs_fop_write_iter+0x1b0/0x2ac
[   57.597480]        vfs_write+0x3dc/0x6e8
[   57.597488]        ksys_write+0x84/0x140
[   57.597495]        system_call_exception+0x130/0x360
[   57.597504]        system_call_common+0x160/0x2c4
[   57.597516] -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}:
[   57.597530]        __submit_bio+0x5ec/0x828
[   57.597538]        submit_bio_noacct_nocheck+0x1e4/0x4f0
[   57.597547]        iomap_readahead+0x2a0/0x448
[   57.597556]        xfs_vm_readahead+0x28/0x3c
[   57.597564]        read_pages+0x88/0x41c
[   57.597571]        page_cache_ra_unbounded+0x1ac/0x2d8
[   57.597580]        filemap_get_pages+0x188/0x984
[   57.597588]        filemap_read+0x13c/0x4bc
[   57.597596]        xfs_file_buffered_read+0x88/0x17c
[   57.597605]        xfs_file_read_iter+0xac/0x158
[   57.597614]        vfs_read+0x2d4/0x3b4
[   57.597622]        ksys_read+0x84/0x144
[   57.597629]        system_call_exception+0x130/0x360
[   57.597637]        system_call_common+0x160/0x2c4
[   57.597647] -> #1 (mapping.invalidate_lock#2){++++}-{4:4}:
[   57.597661]        down_read+0x6c/0x220
[   57.597669]        filemap_fault+0x870/0x100c
[   57.597677]        xfs_filemap_fault+0xc4/0x18c
[   57.597684]        __do_fault+0x64/0x164
[   57.597693]        __handle_mm_fault+0x1274/0x1dac
[   57.597702]        handle_mm_fault+0x248/0x484
[   57.597711]        ___do_page_fault+0x428/0xc0c
[   57.597719]        hash__do_page_fault+0x30/0x68
[   57.597727]        do_hash_fault+0x90/0x35c
[   57.597736]        data_access_common_virt+0x210/0x220
[   57.597745]        _copy_from_user+0xf8/0x19c
[   57.597754]        sel_write_load+0x178/0xd54
[   57.597762]        vfs_write+0x108/0x6e8
[   57.597769]        ksys_write+0x84/0x140
[   57.597777]        system_call_exception+0x130/0x360
[   57.597785]        system_call_common+0x160/0x2c4
[   57.597794] -> #0 (&mm->mmap_lock){++++}-{4:4}:
[   57.597806]        __lock_acquire+0x17cc/0x2330
[   57.597814]        lock_acquire+0x138/0x400
[   57.597822]        __might_fault+0x7c/0xc0
[   57.597830]        filldir64+0xe8/0x390
[   57.597839]        dcache_readdir+0x80/0x2d4
[   57.597846]        iterate_dir+0xd8/0x1d4
[   57.597855]        sys_getdents64+0x88/0x2d4
[   57.597864]        system_call_exception+0x130/0x360
[   57.597872]        system_call_common+0x160/0x2c4
[   57.597881] other info that might help us debug this:
[   57.597888] Chain exists of: &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[   57.597905] Possible unsafe locking scenario:
[   57.597911]        CPU0                    CPU1
[   57.597917]        ----                    ----
[   57.597922]   rlock(&sb->s_type->i_mutex_key#3);
[   57.597932]                            lock(&q->debugfs_mutex);
[   57.597940]                            lock(&sb->s_type->i_mutex_key#3);
[   57.597950]   rlock(&mm->mmap_lock);
[   57.597958] *** DEADLOCK ***
[   57.597965] 2 locks held by ls/4605:
[   57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154
[   57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4

Prevent the above lockdep warning by acquiring ->sysfs_lock before freezing the queue while storing a queue attribute in the queue_attr_store function. Later, we also found[1] another function, __blk_mq_update_nr_hw_queues, where we first freeze the queue and then acquire ->sysfs_lock. So we've also updated the lock ordering in __blk_mq_update_nr_hw_queues and ensured that in all code paths we follow the correct lock ordering, i.e. acquire ->sysfs_lock before freezing the queue. [1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/ Reported-by: kjain@linux.ibm.com Fixes: af2814149883 ("block: freeze the queue in queue_attr_store") Tested-by: kjain@linux.ibm.com Cc: hch@lst.de Cc: axboe@kernel.dk Cc: ritesh.list@gmail.com Cc: ming.lei@redhat.com Cc: gjoyce@linux.ibm.com Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
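A sketch of the corrected ordering described above, with error handling and the surrounding sysfs plumbing omitted for brevity:

    /* queue_attr_store(): take ->sysfs_lock first, then freeze */
    mutex_lock(&q->sysfs_lock);
    blk_mq_freeze_queue(q);
    res = entry->store(disk, page, length);
    blk_mq_unfreeze_queue(q);
    mutex_unlock(&q->sysfs_lock);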
2024-12-13  blk-mq: Clean up blk_mq_requeue_work()  (Bart Van Assche)
Move a statement that occurs in both branches of an if-statement in front of the if-statement. Fix a typo in a source code comment. No functionality has been changed. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241212212941.1268662-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-06  blk-mq: move cpuhp callback registering out of q->sysfs_lock  (Ming Lei)
Registering and unregistering a cpuhp callback requires the global CPU hotplug lock, which is used everywhere. Meanwhile, q->sysfs_lock is used almost everywhere in the block layer. It is easy to trigger a lockdep warning[1] by connecting the two locks. Fix the warning by moving blk-mq's cpuhp callback registering out of q->sysfs_lock. Add one dedicated global lock covering registering & unregistering the hctx's cpuhp; it is safe to do so because the hctx is guaranteed to be live if our request_queue is live. [1] https://lore.kernel.org/lkml/Z04pz3AlvI4o0Mr8@agluck-desk3/ Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Newman <peternewman@google.com> Cc: Babu Moger <babu.moger@amd.com> Reported-by: Luck Tony <tony.luck@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241206111611.978870-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
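A rough sketch of the approach; the lock and helper names here are illustrative, not the exact upstream identifiers:

    /* one global mutex dedicated to hctx cpuhp (un)registration */
    static DEFINE_MUTEX(blk_mq_cpuhp_lock);

    static void blk_mq_add_hctx_cpuhp(struct blk_mq_hw_ctx *hctx)
    {
            mutex_lock(&blk_mq_cpuhp_lock);
            /* safe without q->sysfs_lock: hctx is live while the queue is live */
            cpuhp_state_add_instance_nocalls(CPUHP_AP_BLK_MQ_ONLINE,
                                             &hctx->cpuhp_online);
            mutex_unlock(&blk_mq_cpuhp_lock);
    }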
2024-12-06  blk-mq: register cpuhp callback after hctx is added to xarray table  (Ming Lei)
We need to retrieve 'hctx' from xarray table in the cpuhp callback, so the callback should be registered after this 'hctx' is added to xarray table. Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Peter Newman <peternewman@google.com> Cc: Babu Moger <babu.moger@amd.com> Cc: Luck Tony <tony.luck@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241206111611.978870-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-25  block: Remove extra part pointer NULLify in blk_rq_init()  (John Garry)
The rq->part pointer is already NULLified in the memset() call, so - like for other pointers in rq - don't re-NULLify. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241125100258.4172774-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-19  block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor  (Suraj Sonawane)
Fix an issue detected by the `smatch` tool: block/blk-mq.c:3314 blk_rq_prep_clone() error: uninitialized symbol 'bio'. This patch refactors `blk_rq_prep_clone()` to improve code readability and ensure safety by addressing potential misuse of the `bio` variable:

- Move the bio_put(bio); call to the bio_ctr error handling block, which is the only place where it can be triggered.
- Move the bio variable into the __rq_for_each_bio loop scope. This change removes the need to set bio to NULL at the loop's end.

Discussion on why bio remains uninitialized: https://lore.kernel.org/lkml/20241004141037.43277-1-surajsonawane0215@gmail.com

Summary of the above discussion:

- I pointed out that `bio` can remain uninitialized if the allocation with `bio_alloc_clone` fails.
- Keith Busch explained that `bio` is initialized to `NULL` when `bio_alloc_clone()` fails, preventing uninitialized usage.
- John Garry questioned whether `rq_src->bio` being `NULL` could leave `bio` uninitialized. Keith clarified that in such cases, `bio` is not referenced, so it does not need initialization.
- Christoph Hellwig recommended code improvements:
  - move the bio_put to the bio_ctr error handling, which is the only case where it can happen
  - move the bio variable into the __rq_for_each_bio scope, which also removed the need to zero it at the end of the loop

These changes enhance code clarity, address static analysis tool warnings, and make the function more maintainable. Thread of the previous version's patch discussion: https://lore.kernel.org/lkml/20241004100842.9052-1-surajsonawane0215@gmail.com Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241119164412.37609-1-surajsonawane0215@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  block: don't reorder requests in blk_add_rq_to_plug  (Christoph Hellwig)
Add requests to the tail of the list instead of the front so that they are queued up in submission order. Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs and nvme_queue_rqs now that the list is ordered as expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13  block: add a rq_list type  (Christoph Hellwig)
Replace the semi-open coded request list helpers with a proper rq_list type that mirrors the bio_list and has head and tail pointers. Besides better type safety, this actually allows inserting at the tail of the list, which will be useful soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
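The shape of the new type, per the commit description (a sketch; helper details may differ slightly from the final code):

    struct rq_list {
            struct request *head;
            struct request *tail;
    };

    /* tail insertion, the operation the old open-coded helpers lacked */
    static inline void rq_list_add_tail(struct rq_list *rl, struct request *rq)
    {
            rq->rq_next = NULL;
            if (rl->tail)
                    rl->tail->rq_next = rq;
            else
                    rl->head = rq;
            rl->tail = rq;
    }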
2024-11-12  block: remove the ioprio field from struct request  (Christoph Hellwig)
The request ioprio is only initialized from the first attached bio, so requests without a bio already never set it. Directly use the bio field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241112170050.1612998-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
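After this change the request-level accessor reads the first attached bio directly, roughly:

    static inline u16 req_get_ioprio(struct request *req)
    {
            /* requests without a bio never carried a meaningful ioprio */
            if (req->bio)
                    return req->bio->bi_ioprio;
            return 0;
    }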
2024-11-12  block: remove the write_hint field from struct request  (Christoph Hellwig)
The write_hint is only used for read/write requests, which must have a bio attached to them. Just use the bio field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241112170050.1612998-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07  block: always verify unfreeze lock on the owner task  (Ming Lei)
commit f1be1788a32e ("block: model freeze & enter queue as lock for supporting lockdep") tries to apply lockdep for verifying freeze & unfreeze. However, the verification is only done for the outermost freeze and unfreeze. This is actually not correct because q->mq_freeze_depth may still drop to zero on another task instead of the freeze owner task. Fix this issue by always verifying the last unfreeze lock on the owner task context, and make sure both the outermost freeze & unfreeze are verified in the current task. Fixes: f1be1788a32e ("block: model freeze & enter queue as lock for supporting lockdep") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241031133723.303835-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-07  block: remove blk_freeze_queue()  (Ming Lei)
No one uses blk_freeze_queue(), so remove it and the obsolete comment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241031133723.303835-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-26  block: model freeze & enter queue as lock for supporting lockdep  (Ming Lei)
Recently we got several deadlock reports[1][2][3] caused by blk_mq_freeze_queue() and blk_enter_queue(). Turns out the two are just like acquiring a read/write lock, so model them as a read/write lock for supporting lockdep:

1) model q->q_usage_counter as two locks (io and queue lock)
   - queue lock covers sync with blk_enter_queue()
   - io lock covers sync with bio_enter_queue()

2) make the lockdep class/key per-queue:
   - different subsystems have very different lock use patterns, and a shared lock class causes false positives easily
   - freeze_queue degrades to no lock in case the disk state becomes DEAD, because bio_enter_queue() won't be blocked any more
   - freeze_queue degrades to no lock in case the request queue becomes dying, because blk_enter_queue() won't be blocked any more

3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock
   - it is an exclusive lock, so the dependency with blk_enter_queue() is covered
   - it is a trylock, because blk_mq_freeze_queue() is allowed to run concurrently

4) model blk_enter_queue() & bio_enter_queue() as acquire_read()
   - nested blk_enter_queue() calls are allowed
   - the dependency with blk_mq_freeze_queue() is covered
   - blk_queue_exit() is often called from other contexts (such as irq), and it can't be annotated as lock_release(), so simply do it in blk_enter_queue(); this way still covers as many cases as possible

With lockdep support, such reports may be raised asap and needn't wait until the real deadlock is triggered. For example, a lockdep report can be triggered in report[3] with this patch applied.

[1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler' https://bugzilla.kernel.org/show_bug.cgi?id=219166
[2] del_gendisk() vs blk_queue_enter() race condition https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/
[3] queue_freeze & queue_enter deadlock in scsi https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u

Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241025003722.3630252-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
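A hedged sketch of the modeling (the lockdep_map field name is an assumption): freeze is annotated as an exclusive trylock, enter_queue as a read acquire that is released immediately, since blk_queue_exit() can run from irq context and can't carry the release annotation itself:

    /* freeze side: exclusive + trylock, so concurrent freezes stay legal */
    rwsem_acquire(&q->q_lockdep_map, 0, /* trylock */ 1, _RET_IP_);

    /* enter side: read acquire, released right away (see commit text) */
    rwsem_acquire_read(&q->q_lockdep_map, 0, 0, _RET_IP_);
    rwsem_release(&q->q_lockdep_map, _RET_IP_);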
2024-10-26  blk-mq: add non_owner variant of start_freeze/unfreeze queue APIs  (Ming Lei)
Add a non_owner variant of the start_freeze/unfreeze queue APIs, so that the caller knows what they are doing, and we can skip lockdep support for the non_owner variant at the per-call level. Prepare for supporting lockdep for freezing/unfreezing queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241025003722.3630252-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-23  blk-mq: Unexport blk_mq_flush_busy_ctxs()  (Bart Van Assche)
Commit a6088845c2bf ("block: kyber: make kyber more friendly with merging") removed the only blk_mq_flush_busy_ctxs() call from outside the block layer core. Hence unexport blk_mq_flush_busy_ctxs(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241023202850.3469279-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  blk-mq: Make blk_mq_quiesce_tagset() hold the tag list mutex less long  (Bart Van Assche)
Make sure that the tag_list_lock mutex is not held any longer than necessary. This change reduces latency if e.g. blk_mq_quiesce_tagset() is called concurrently from more than one thread. This function is used by the NVMe core and also by the UFS driver. Reported-by: Peter Wang <peter.wang@mediatek.com> Cc: Chao Leng <lengchao@huawei.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: stable@vger.kernel.org Fixes: 414dd48e882c ("blk-mq: add tagset quiesce interface") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241022181617.2716173-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  block: fix ordering between checking BLK_MQ_S_STOPPED request adding  (Muchun Song)
Suppose the first scenario, with a virtio_blk driver.

CPU0                                      CPU1

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                          virtblk_done()
  blk_mq_request_bypass_insert()  1) store
                                            blk_mq_start_stopped_hw_queue()
                                              clear_bit(BLK_MQ_S_STOPPED)  3) store
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
                                                blk_mq_sched_dispatch_requests()
  blk_mq_run_hw_queue()
    if (!blk_mq_hctx_has_pending())
      return
    blk_mq_sched_dispatch_requests()
      if (blk_mq_hctx_stopped())  2) load
        return
      __blk_mq_sched_dispatch_requests()

Suppose another scenario.

CPU0                                      CPU1

blk_mq_requeue_work()
  blk_mq_insert_request()  1) store
                                          virtblk_done()
                                            blk_mq_start_stopped_hw_queue()
  blk_mq_run_hw_queues()                      clear_bit(BLK_MQ_S_STOPPED)  3) store
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
                                                blk_mq_sched_dispatch_requests()
    if (blk_mq_hctx_stopped())  2) load
      continue
    blk_mq_run_hw_queue()

Both scenarios are similar: a full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that BLK_MQ_S_STOPPED is cleared or CPU1 sees the dispatch list. Otherwise, either CPU will not rerun the hardware queue, causing starvation of the request.

The easy way to fix it is to add the essential full memory barrier into the blk_mq_hctx_stopped() helper. In order to not affect the fast path (the hardware queue is not stopped most of the time), we only insert the barrier into the slow path. Actually, only the slow path needs to care about a missed dispatch of the request to the low-level device driver.

Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-4-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
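The fix as described lands in the blk_mq_hctx_stopped() helper, roughly (the comments paraphrase the commit's reasoning):

    static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
    {
            /* fast path: the hw queue is not stopped most of the time */
            if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
                    return false;

            /*
             * Slow path only: order the dispatch-list store before
             * re-reading BLK_MQ_S_STOPPED; pairs with the barrier on
             * the clearing side.
             */
            smp_mb();
            return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
    }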
2024-10-22  block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding  (Muchun Song)
Suppose the following scenario.

CPU0                                      CPU1

blk_mq_insert_request()  1) store
                                          blk_mq_unquiesce_queue()
                                            blk_queue_flag_clear()  3) store
                                            blk_mq_run_hw_queues()
                                              blk_mq_run_hw_queue()
                                                if (!blk_mq_hctx_has_pending())  4) load
                                                  return
blk_mq_run_hw_queue()
  if (blk_queue_quiesced())  2) load
    return
blk_mq_sched_dispatch_requests()

A full memory barrier should be inserted between 1) and 2), as well as between 3) and 4), to make sure that either CPU0 sees that QUEUE_FLAG_QUIESCED is cleared or CPU1 sees the dispatch list or the setting of the software queue bitmap. Otherwise, either CPU will not rerun the hardware queue, causing starvation.

So the first solution is to 1) add a pair of memory barriers to fix the problem; another solution is to 2) use hctx->queue->queue_lock to synchronize QUEUE_FLAG_QUIESCED. Here, we chose 2) to fix it since memory barriers are not easy to maintain.

Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-3-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
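A sketch of option 2): when the lockless check says there is nothing to do, re-check under ->queue_lock, which blk_mq_unquiesce_queue() also holds while clearing the flag (a simplified shape, not the verbatim upstream hunk):

    if (!need_run) {
            unsigned long flags;

            /* synchronize with blk_mq_unquiesce_queue() */
            spin_lock_irqsave(&q->queue_lock, flags);
            need_run = !blk_queue_quiesced(q) &&
                       blk_mq_hctx_has_pending(hctx);
            spin_unlock_irqrestore(&q->queue_lock, flags);
            if (!need_run)
                    return;
    }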
2024-10-22  block: fix missing dispatching request when queue is started or unquiesced  (Muchun Song)
Suppose the following scenario with a virtio_blk driver.

CPU0                            CPU1                            CPU2

blk_mq_try_issue_directly()
  __blk_mq_issue_directly()
    q->mq_ops->queue_rq()
      virtio_queue_rq()
        blk_mq_stop_hw_queue()
                                                                virtblk_done()
                                blk_mq_try_issue_directly()
                                  if (blk_mq_hctx_stopped())
  blk_mq_request_bypass_insert()                                  blk_mq_run_hw_queue()
                                  blk_mq_run_hw_queue()
  blk_mq_run_hw_queue()
                                  blk_mq_insert_request()
                                  return

After CPU0 has marked the queue as stopped, CPU1 will see that the queue is stopped. But before CPU1 puts the request on the dispatch list, CPU2 receives the interrupt for completion of the request, so it will run the hardware queue and mark the queue as non-stopped. Meanwhile, CPU1 also runs the same hardware queue. After both CPU1 and CPU2 complete blk_mq_run_hw_queue(), CPU1 just puts the request on the same hardware queue and returns. It misses dispatching a request. Fix it by running the hardware queue explicitly. And blk_mq_request_issue_directly() should handle a similar situation; fix it as well.

Fixes: d964f04a8fde ("blk-mq: fix direct issue") Cc: stable@vger.kernel.org Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241014092934.53630-2-songmuchun@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  block: enable passthrough command statistics  (Keith Busch)
Applications using the passthrough interfaces for IO want to continue seeing the disk stats. These requests had been fenced off from this block layer feature. While the block layer doesn't necessarily know what a passthrough command does, we do know the data size and direction, which is enough to account for the command's stats. Since tracking these has the potential to produce unexpected results, the passthrough stats are locked behind a new queue flag that needs to be enabled with the /sys/block/<dev>/queue/iostats_passthrough attribute. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241007153236.2818562-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
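Sketchwise, the accounting gate becomes something like the following (the flag-test helper name is an assumption, not a confirmed upstream identifier):

    /*
     * Account passthrough requests only when the queue opted in via
     * /sys/block/<dev>/queue/iostats_passthrough.
     */
    if (blk_rq_is_passthrough(rq) && !blk_queue_passthrough_stat(rq->q))
            return false;   /* skip stats for this request */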
2024-10-22  block: move issue side time stamping to blk_account_io_start()  (Jens Axboe)
It's known to be needed at that point, and it's cleaner to just assign it there rather than rely on it being reliably set before hitting the IO accounting. Hence, move it out of blk_mq_rq_time_init(), which is now only doing the allocation side timing. While at it, get rid of the '0' time passing to blk_mq_rq_time_init(); just pass in blk_time_get_ns() for the two cases where 0 was being explicitly passed in. The rest pass in the previously cached allocation time. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  block: set issue time stamp based on queue state  (Jens Axboe)
A previous commit moved RQF_IO_STAT into blk_account_io_done(), where it's being set rather than at allocation time. Unfortunately we do check for that flag in blk_mq_rq_time_init(), and hence setting of start_time_ns wasn't being done. This led to unwieldy inflight IO counts and times, as IO completion accounting would use a 0 value rather than the issue time for its subtraction math. Fix this by switching the blk_mq_rq_time_init() check to use the queue state rather than the request state. Fixes: b8f762400ae8 ("block: move iostat check into blk_acount_io_start()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202410062110.512391df-oliver.sang@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  block: kill blk_do_io_stat() helper  (Jens Axboe)
It's now just checking whether or not RQF_IO_STAT is set, so let's get rid of it and just open-code the specific flag that is being checked. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  block: remove 'req->part' check for stats accounting  (Jens Axboe)
If RQF_IO_STAT is set, then accounting is enabled. There's no need to further gate this on req->part being set or not, RQF_IO_STAT should never be set if accounting is not being done for this request. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-22  block: move iostat check into blk_acount_io_start()  (Jens Axboe)
Rather than have blk_do_io_stat() check for both RQF_IO_STAT and whether the request is a passthrough request every time, move both of those checks into blk_account_io_start(). Then blk_do_io_stat() can be reduced to just checking for RQF_IO_STAT. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
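After the move, the hot-path check collapses to a single flag test, roughly:

    static inline bool blk_do_io_stat(struct request *rq)
    {
            /* passthrough and queue-iostat checks now happen once, at start */
            return rq->rq_flags & RQF_IO_STAT;
    }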
2024-10-14  blk-mq: setup queue ->tag_set before initializing hctx  (Ming Lei)
Commit 7b815817aa58 ("blk-mq: add helper for checking if one CPU is mapped to specified hctx") needs to check the queue mapping via the tag set in the hctx's cpuhp handler. However, q->tag_set may not be set up yet when the cpuhp handler is enabled, and a kernel oops is then triggered. Fix the issue by setting up the queue tag_set before initializing the hctx. Cc: stable@vger.kernel.org Reported-and-tested-by: Rick Koch <mr.rickkoch@gmail.com> Closes: https://lore.kernel.org/linux-block/CANa58eeNDozLaBHKPLxSAhEy__FPfJT_F71W=sEQw49UCrC9PQ@mail.gmail.com Fixes: 7b815817aa58 ("blk-mq: add helper for checking if one CPU is mapped to specified hctx") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241014005115.2699642-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-25  Merge tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux  (Linus Torvalds)
Pull more block updates from Jens Axboe:

- Improve blk-integrity segment counting and merging (Keith)
- NVMe pull request via Keith:
  - Multipath fixes (Hannes)
  - Sysfs attribute list NULL terminate fix (Shin'ichiro)
  - Remove problematic read-back (Keith)
- Fix for a regression with the IO scheduler switching freezing from 6.11 (Damien)
- Use a raw spinlock for sbitmap, as it may get called from preempt disabled context (Ming)
- Cleanup for bd_claiming waiting, using var_waitqueue() rather than the bit waitqueues, as that more accurately describes what it does (Neil)
- Various cleanups (Kanchan, Qiu-ji, David)

* tag 'for-6.12/block-20240925' of git://git.kernel.dk/linux:
  nvme: remove CC register read-back during enabling
  nvme: null terminate nvme_tls_attrs
  nvme-multipath: avoid hang on inaccessible namespaces
  nvme-multipath: system fails to create generic nvme device
  lib/sbitmap: define swap_lock as raw_spinlock_t
  block: Remove unused blk_limits_io_{min,opt}
  drbd: Fix atomicity violation in drbd_uuid_set_bm()
  block: Fix elv_iosched_local_module handling of "none" scheduler
  block: remove bogus union
  block: change wait on bd_claiming to use a var_waitqueue
  blk-integrity: improved sg segment mapping
  block: unexport blk_rq_count_integrity_sg
  nvme-rdma: use request to get integrity segments
  scsi: use request to get integrity segments
  block: provide a request helper for user integrity segments
  blk-integrity: consider entire bio list for merging
  blk-integrity: properly account for segments
  blk-mq: set the nr_integrity_segments from bio
  blk-mq: unconditional nr_integrity_segments
2024-09-17  Merge tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)
Pull irq updates from Thomas Gleixner:

"Core:

- Remove a global lock in the affinity setting code. The lock protects a cpumask for intermediate results, and the lock causes a bottleneck on simultaneous start of multiple virtual machines. Replace the lock and the static cpumask with a per-CPU cpumask which is nicely serialized by a raw spinlock held when executing this code.
- Provide support for giving a suffix to interrupt domain names. That's required to support devices with subfunctions so that the domain names are distinct even if they originate from the same device node.
- The usual set of cleanups and enhancements all over the place.

Drivers:

- Support for the LoongArch AVEC interrupt chip
- Refurbishment of the Armada driver so it can be extended for new variants
- The usual set of cleanups and enhancements all over the place"

* tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
  genirq: Use cpumask_intersects()
  genirq/cpuhotplug: Use cpumask_intersects()
  irqchip/apple-aic: Only access system registers on SoCs which provide them
  irqchip/apple-aic: Add a new "Global fast IPIs only" feature level
  irqchip/apple-aic: Skip unnecessary enabling of use_fast_ipi
  dt-bindings: apple,aic: Document A7-A11 compatibles
  irqdomain: Use IS_ERR_OR_NULL() in irq_domain_trim_hierarchy()
  genirq/msi: Use kmemdup_array() instead of kmemdup()
  genirq/proc: Change the return value for set affinity permission error
  genirq/proc: Use irq_move_pending() in show_irq_affinity()
  genirq/proc: Correctly set file permissions for affinity control files
  genirq: Get rid of global lock in irq_do_set_affinity()
  genirq: Fix typo in struct comment
  irqchip/loongarch-avec: Add AVEC irqchip support
  irqchip/loongson-pch-msi: Prepare get_pch_msi_handle() for AVECINTC
  irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING
  LoongArch: Architectural preparation for AVEC irqchip
  LoongArch: Move irqchip function prototypes to irq-loongson.h
  irqchip/loongson-pch-msi: Switch to MSI parent domains
  softirq: Remove unused 'action' parameter from action callback
  ...
2024-09-13  blk-mq: set the nr_integrity_segments from bio  (Keith Busch)
This value is used for merging considerations, so it needs to be accurate. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240913182854.2445457-3-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-13  blk-mq: unconditional nr_integrity_segments  (Keith Busch)
Always defining the field will make using it easier and less error prone in future patches. There shouldn't be any downside to this: the field fits in what would otherwise be a 2-byte hole, so we're not saving space by conditionally leaving it out. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240913182854.2445457-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-07  blk-mq: add missing unplug trace event  (Keith Busch)
The single-queue optimized list flush doesn't have an unplug trace event to pair with the plug event. Add one. In the unlikely event an error occurs and we fall back to the less optimized plug flush path, it's possible a 2nd unplug trace event will be logged, but it will show the remaining count that wasn't previously handled. Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20240906194540.3719642-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-29  block: rework bio splitting  (Christoph Hellwig)
The current setup with bio_may_exceed_limit and __bio_split_to_limits is a bit of a mess. Change it so that __bio_split_to_limits does all the work and is just a variant of bio_split_to_limits that returns nr_segs. This is done by inlining it and instead having the various bio_split_* helpers directly submit the potentially split bios. To support btrfs, the rw version has a lower-level helper split out that just returns the offset to split at. This turns out to nicely clean up the btrfs flow as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Link: https://lore.kernel.org/r/20240826173820.1690925-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>