summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-12blk-mq: update arg in comment of blk_mq_map_queueMinwoo Im
Update mis-named argument description of blk_mq_map_queue(). This patch also updates description that argument to software queue percpu context. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-12blk-mq: add helper allocating tagset->tagsMinwoo Im
tagset->set is allocated from blk_mq_alloc_tag_set() rather than being reallocated. This patch added a helper to make its meaning explicitly which is to allocate rather than to reallocate. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07Revert "block: Fix a lockdep complaint triggered by request queue flushing"Ming Lei
This reverts commit b3c6a59975415bde29cfd76ff1ab008edbf614a9. Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive locking' by nvme-loop's lock class, no need to apply dynamically allocated lock class key, so revert commit b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing"). This way fixes horrible SCSI probe delay issue on megaraid_sas, and it is reported the whole probe may take more than half an hour. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock classMing Lei
Set nvme-loop's lock class via blk_mq_hctx_set_fq_lock_class for avoiding lockdep possible recursive locking, then we can remove the dynamically allocated lock class for each flush queue, finally we can avoid horrible SCSI probe delay. This way may not address situation in which one nvme-loop is backed on another nvme-loop. However, in reality, people seldom uses this way for test. Even though someone played in this way, it is just one recursive locking false positive, no real deadlock issue. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-mq: add new API of blk_mq_hctx_set_fq_lock_classMing Lei
flush_end_io() may be called recursively from some driver, such as nvme-loop, so lockdep may complain 'possible recursive locking'. Commit b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing") tried to address this issue by assigning dynamically allocated per-flush-queue lock class. This solution adds synchronize_rcu() for each hctx's release handler, and causes horrible SCSI MQ probe delay(more than half an hour on megaraid sas). Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so we just need to use driver specific lock class for avoiding the lockdep warning of 'possible recursive locking'. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07block: disable iopoll for split bioJeffle Xu
iopoll is initially for small size, latency sensitive IO. It doesn't work well for big IO, especially when it needs to be split to multiple bios. In this case, the returned cookie of __submit_bio_noacct_mq() is indeed the cookie of the last split bio. The completion of *this* last split bio done by iopoll doesn't mean the whole original bio has completed. Callers of iopoll still need to wait for completion of other split bios. Besides bio splitting may cause more trouble for iopoll which isn't supposed to be used in case of big IO. iopoll for split bio may cause potential race if CPU migration happens during bio submission. Since the returned cookie is that of the last split bio, polling on the corresponding hardware queue doesn't help complete other split bios, if these split bios are enqueued into different hardware queues. Since interrupts are disabled for polling queues, the completion of these other split bios depends on timeout mechanism, thus causing a potential hang. iopoll for split bio may also cause hang for sync polling. Currently both the blkdev and iomap-based fs (ext4/xfs, etc) support sync polling in direct IO routine. These routines will submit bio without REQ_NOWAIT flag set, and then start sync polling in current process context. The process may hang in blk_mq_get_tag() if the submitted bio has to be split into multiple bios and can rapidly exhaust the queue depth. The process are waiting for the completion of the previously allocated requests, which should be reaped by the following polling, and thus causing a deadlock. To avoid these subtle trouble described above, just disable iopoll for split bio and return BLK_QC_T_NONE in this case. The side effect is that non-HIPRI IO also returns BLK_QC_T_NONE now. It should be acceptable since the returned cookie is never used for non-HIPRI IO. Suggested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07block: Improve blk_revalidate_disk_zones() checksDamien Le Moal
Improves the checks on the zones of a zoned block device done in blk_revalidate_disk_zones() by making sure that the device report_zones method did report at least one zone and that the zones reported exactly cover the entire disk capacity, that is, that there are no missing zones at the end of the disk sector range. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07sbitmap: simplify wrap checkPavel Begunkov
__sbitmap_get_word() doesn't warp if it's starting from the beginning (i.e. initial hint is 0). Instead of stashing the original hint just set @wrap accordingly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07sbitmap: replace CAS with atomic andPavel Begunkov
sbitmap_deferred_clear() does CAS loop to propagate cleared bits, replace it with equivalent atomic bitwise and. That's slightly faster and makes wait-free instead of lock-free as before. The atomic can be relaxed (i.e. barrier-less) because following sbitmap_get*() deal with synchronisation, see comments in sbitmap_queue_clear(). It's ok to cast to atomic_long_t, that's what bitops/lock.h does. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07sbitmap: remove swap_lockPavel Begunkov
map->swap_lock protects map->cleared from concurrent modification, however sbitmap_deferred_clear() is already atomically drains it, so it's guaranteed to not loose bits on concurrent sbitmap_deferred_clear(). A one threaded tag heavy test on top of nullbk showed ~1.5% t-put increase, and 3% -> 1% cycle reduction of sbitmap_get() according to perf. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07sbitmap: optimise sbitmap_deferred_clear()Pavel Begunkov
Because of spinlocks and atomics sbitmap_deferred_clear() have to reload &sb->map[index] on each access even though the map address won't change. Pass in sbitmap_word instead of {sb, index}, so it's cached in a variable. It also improves code generation of sbitmap_find_bit_in_index(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-mq: skip hybrid polling if iopoll doesn't spinPavel Begunkov
If blk_poll() is not going to spin (i.e. @spin=false), it also must not sleep in hybrid polling, otherwise it might be pretty suprising for users trying to do a quick check and expecting no-wait behaviour. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Factor out the base vrate change into a separate functionBaolin Wang
Factor out the base vrate change code into a separate function to fimplify the ioc_timer_fn(). No functional change. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Factor out the active iocgs' state check into a separate functionBaolin Wang
Factor out the iocgs' state check into a separate function to simplify the ioc_timer_fn(). No functional change. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Move the usage ratio calculation to the correct placeBaolin Wang
We only use the hweight based usage ratio to calculate the new hweight_inuse of the iocg to decide if this iocg can donate some surplus vtime. Thus move the usage ratio calculation to the correct place to avoid unnecessary calculation for some vtime shortage iocgs. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Remove unnecessary advance declarationBaolin Wang
Remove unnecessary advance declaration of struct ioc_gq. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blk-iocost: Fix some typos in commentsBaolin Wang
Fix some typos in comments. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07blktrace: fix up a kerneldoc commentChristoph Hellwig
Fixes: a54895fa057c ("block: remove the request_queue to argument request based tracepoints") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the request_queue to argument request based tracepointsChristoph Hellwig
The request_queue can trivially be derived from the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the request_queue argument to the block_bio_remap tracepointChristoph Hellwig
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the request_queue argument to the block_split tracepointChristoph Hellwig
The request_queue can trivially be derived from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: simplify and extend the block_bio_merge tracepoint classChristoph Hellwig
The block_bio_merge tracepoint class can be reused for most bio-based tracepoints. For that it just needs to lose the superfluous q and rq parameters. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04block: remove the unused block_sleeprq tracepointChristoph Hellwig
The block_sleeprq tracepoint was only used by the legacy request code. Remove it now that the legacy request code is gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02blk-throttle: don't check whether or not lower limit is valid if ↵Yu Kuai
CONFIG_BLK_DEV_THROTTLING_LOW is off blk_throtl_update_limit_valid() will search for descendants to see if 'LIMIT_LOW' of bps/iops and READ/WRITE is nonzero. However, they're always zero if CONFIG_BLK_DEV_THROTTLING_LOW is not set, furthermore, a lot of time will be wasted to iterate descendants. Thus do nothing in blk_throtl_update_limit_valid() in such situation. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02block: fix inflight statistics of part0Jeffle Xu
The inflight of partition 0 doesn't include inflight IOs to all sub-partitions, since currently mq calculates inflight of specific partition by simply camparing the value of the partition pointer. Thus the following case is possible: $ cat /sys/block/vda/inflight        0        0 $ cat /sys/block/vda/vda1/inflight        0      128 While single queue device (on a previous version, e.g. v3.10) has no this issue: $cat /sys/block/sda/sda3/inflight 0 33 $cat /sys/block/sda/inflight 0 33 Partition 0 should be specially handled since it represents the whole disk. This issue is introduced since commit bf0ddaba65dd ("blk-mq: fix sysfs inflight counter"). Besides, this patch can also fix the inflight statistics of part 0 in /proc/diskstats. Before this patch, the inflight statistics of part 0 doesn't include that of sub partitions. (I have marked the 'inflight' field with asterisk.) $cat /proc/diskstats 259 0 nvme0n1 45974469 0 367814768 6445794 1 0 1 0 *0* 111062 6445794 0 0 0 0 0 0 259 2 nvme0n1p1 45974058 0 367797952 6445727 0 0 0 0 *33* 111001 6445727 0 0 0 0 0 0 This is introduced since commit f299b7c7a9de ("blk-mq: provide internal in-flight variant"). Fixes: bf0ddaba65dd ("blk-mq: fix sysfs inflight counter") Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [axboe: adapt for 5.11 partition change] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02bio: optimise bvec iterationPavel Begunkov
__bio_for_each_bvec(), __bio_for_each_segment() and bio_copy_data_iter() fall under conditions of bvec_iter_advance_single(), which is a faster and slimmer version of bvec_iter_advance(). Add bio_advance_iter_single() and convert them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-02block: optimise for_each_bvec() advancePavel Begunkov
Because of how for_each_bvec() works it never advances across multiple entries at a time, so bvec_iter_advance() is an overkill. Add specialised bvec_iter_advance_single() that is faster. It also handles zero-len bvecs, so can kill bvec_iter_skip_zero_bvec(). text data bss dec hex filename before: 23977 805 0 24782 60ce lib/iov_iter.o before, bvec_iter_advance() w/o WARN_ONCE() 22886 600 0 23486 5bbe ./lib/iov_iter.o after: 21862 600 0 22462 57be lib/iov_iter.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: stop using bdget_disk for partition 0Christoph Hellwig
We can just dereference the point in struct gendisk instead. Also remove the now unused export. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: merge struct block_device and struct hd_structChristoph Hellwig
Instead of having two structures that represent each block device with different life time rules, merge them into a single one. This also greatly simplifies the reference counting rules, as we can use the inode reference count as the main reference count for the new struct block_device, with the device model reference front ending it for device model interaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01f2fs: remove a few bd_part checksChristoph Hellwig
bd_part is never NULL for a block device in use by a file system, so remove the checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: switch disk_part_iter_* to use a struct block_deviceChristoph Hellwig
Switch the partition iter infrastructure to iterate over block_device references instead of hd_struct ones mostly used to get at the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: pass a block_device to invalidate_partitionChristoph Hellwig
Pass the block_device actually needed instead of looking it up using bdget_disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: pass a block_device to blk_alloc_devtChristoph Hellwig
Pass the block_device actually needed instead of the hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove the partno field from struct hd_structChristoph Hellwig
Just use the bd_partno field in struct block_device everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: switch partition lookup to use struct block_deviceChristoph Hellwig
Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: allocate struct hd_struct as part of struct bdev_inodeChristoph Hellwig
Allocate hd_struct together with struct block_device to pre-load the lifetime rule changes in preparation of merging the two structures. Note that part0 was previously embedded into struct gendisk, but is a separate allocation now, and already points to the block_device instead of the hd_struct. The lifetime of struct gendisk is still controlled by the struct device embedded in the part0 hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: move the policy field to struct block_deviceChristoph Hellwig
Move the policy field to struct block_device and rename it to the more descriptive bd_read_only. Also turn the field into a bool as it is used as such. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: move make_it_fail to struct block_deviceChristoph Hellwig
Move the make_it_fail flag to struct block_device an turn it into a bool in preparation of killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: move holder_dir to struct block_deviceChristoph Hellwig
Move the holder_dir field to struct block_device in preparation for kill struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: move the partition_meta_info to struct block_deviceChristoph Hellwig
Move the partition_meta_info to struct block_device in preparation for killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: move the start_sect field to struct block_deviceChristoph Hellwig
Move the start_sect field to struct block_device in preparation of killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: move disk stat accounting to struct block_deviceChristoph Hellwig
Move the dkstats and stamp field to struct block_device in preparation of killing struct hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove the nr_sects field in struct hd_structChristoph Hellwig
Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync. Additionally the field in hd_struct did not use proper serialization, possibly allowing for torn writes. By only using the block_device field this problem also gets fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: initialize struct block_device in bdev_allocChristoph Hellwig
Don't play tricks with slab constructors as bdev structures tends to not get reused very much, and this makes the code a lot less error prone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: simplify part_to_diskChristoph Hellwig
Now that struct hd_struct has a block_device pointer use that to find the disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: simplify the block device claiming interfaceChristoph Hellwig
Stop passing the whole device as a separate argument given that it can be trivially deducted and cleanup the !holder debug check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove ->bd_containsChristoph Hellwig
Now that each hd_struct has a reference to the corresponding block_device, there is no need for the bd_contains pointer. Add a bdev_whole() helper to look up the whole device block_device struture instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: simplify bdev/disk lookup in blkdev_getChristoph Hellwig
To simplify block device lookup and a few other upcoming areas, make sure that we always have a struct block_device available for each disk and each partition, and only find existing block devices in bdget. The only downside of this is that each device and partition uses a little more memory. The upside will be that a lot of code can be simplified. With that all we need to look up the block device is to lookup the inode and do a few sanity checks on the gendisk, instead of the separate lookup for the gendisk. For blk-cgroup which wants to access a gendisk without opening it, a new blkdev_{get,put}_no_open low-level interface is added to replace the previous get_gendisk use. Note that the change to look up block device directly instead of the two step lookup using struct gendisk causes a subtile change in behavior: accessing a non-existing partition on an existing block device can now cause a call to request_module. That call is harmless, and in practice no recent system will access these nodes as they aren't created by udev and static /dev/ setups are unusual. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove i_bdevChristoph Hellwig
Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: opencode devcgroup_inode_permissionChristoph Hellwig
Just call devcgroup_check_permission to avoid various superflous checks and a double conversion of the access flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>