Age | Commit message | Author
2024-02-13 | block: add a max_user_discard_sectors queue limit | Christoph Hellwig
Add a new max_user_discard_sectors limit that mirrors max_user_sectors and stores the value that the user manually set. This now allows updates of the max_hw_discard_sectors to not worry about the user limit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
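As a rough illustration of why a separately stored user limit helps, the sketch below models the effective discard limit as the smaller of the hardware limit and the user-set limit. The struct and function names are hypothetical stand-ins for this example, not the kernel's definitions; the point is only that the hardware limit can change without losing what the user asked for.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, simplified stand-in for the discard-related queue limits. */
struct discard_limits {
	unsigned int max_hw_discard_sectors;   /* what the device supports */
	unsigned int max_user_discard_sectors; /* what the user asked for */
	unsigned int max_discard_sectors;      /* effective limit */
};

/* Start with no user restriction, so the hardware limit alone applies. */
static void discard_limits_init(struct discard_limits *lim, unsigned int hw)
{
	assert(lim);
	lim->max_hw_discard_sectors = hw;
	lim->max_user_discard_sectors = UINT_MAX;
	lim->max_discard_sectors = hw;
}

/* Recompute the effective limit as min(hardware limit, user limit). */
static void discard_limits_recalc(struct discard_limits *lim)
{
	assert(lim);
	lim->max_discard_sectors =
		lim->max_hw_discard_sectors < lim->max_user_discard_sectors ?
		lim->max_hw_discard_sectors : lim->max_user_discard_sectors;
}
```

Because the user value is kept in its own field, raising the hardware limit again later restores the user's choice rather than a value that was clamped earlier.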
2024-02-13 | block: use queue_limits_commit_update in queue_max_sectors_store | Christoph Hellwig
Convert queue_max_sectors_store to use queue_limits_commit_update to check and update the max_sectors limit and freeze the queue before doing so to ensure we don't have requests in flight while changing the limits. Note that this removes the previously held queue_lock that doesn't protect against any other reader or writer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 | block: add an API to atomically update queue limits | Christoph Hellwig
Add a new queue_limits_{start,commit}_update pair of functions that allows taking an atomic snapshot of queue limits, updating it, and committing it if it passes validity checking. Also use the low-level validation helper to implement blk_set_default_limits instead of duplicating the initialization. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
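The snapshot, update, validate, commit flow described above can be sketched in plain C. This is a user-space model under stated assumptions: the `limits` and `queue` structs, the `update_in_progress` flag (standing in for a lock held across the update), and the validity rule are all hypothetical and much simpler than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified limits; the real structure has many more fields. */
struct limits {
	unsigned int max_hw_sectors;
	unsigned int max_sectors;
};

struct queue {
	struct limits limits;
	bool update_in_progress; /* stands in for a lock held across the update */
};

/* Begin an update: take the "lock" and hand back a private snapshot. */
static struct limits limits_start_update(struct queue *q)
{
	assert(!q->update_in_progress);
	q->update_in_progress = true;
	return q->limits;
}

/* Validity rule for this sketch only: non-zero and within the hardware limit. */
static bool limits_validate(const struct limits *lim)
{
	return lim->max_sectors != 0 && lim->max_sectors <= lim->max_hw_sectors;
}

/* Commit: apply the snapshot only if it is valid; always release the "lock". */
static int limits_commit_update(struct queue *q, const struct limits *lim)
{
	int ret = -1;

	assert(q->update_in_progress);
	if (limits_validate(lim)) {
		q->limits = *lim;
		ret = 0;
	}
	q->update_in_progress = false;
	return ret;
}
```

A caller modifies only its private copy between the two calls, so a rejected update leaves the queue's limits exactly as they were.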
2024-02-13 | block: decouple blk_set_stacking_limits from blk_set_default_limits | Christoph Hellwig
blk_set_stacking_limits uses very little from blk_set_default_limits. Open code these initializations in preparation for rewriting blk_set_default_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 | block: refactor disk_update_readahead | Christoph Hellwig
Factor out a blk_apply_bdi_limits helper that can be used with an explicit queue_limits argument, which will be useful later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 | block: move max_{open,active}_zones to struct queue_limits | Christoph Hellwig
The maximum number of open and active zones is a limit on the queue and should be placed there so that we can include it in the upcoming queue limits batch update API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-13 | drbd: fix function cast warnings in state machine | Arnd Bergmann
There are four state machines in drbd that use a common infrastructure, with a cast to an incompatible function type in REMEMBER_STATE_CHANGE that clang-16 now warns about: drivers/block/drbd/drbd_state.c:1632:3: error: cast from 'int (*)(struct sk_buff *, unsigned int, struct drbd_resource_state_change *, enum drbd_notification_type)' to 'typeof (last_func)' (aka 'int (*)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1632 | REMEMBER_STATE_CHANGE(notify_resource_state_change, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1633 | resource_state_change, NOTIFY_CHANGE); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/block/drbd/drbd_state.c:1619:17: note: expanded from macro 'REMEMBER_STATE_CHANGE' 1619 | last_func = (typeof(last_func))func; \ | ^~~~~~~~~~~~~~~~~~~~~~~ drivers/block/drbd/drbd_state.c:1641:4: error: cast from 'int (*)(struct sk_buff *, unsigned int, struct drbd_connection_state_change *, enum drbd_notification_type)' to 'typeof (last_func)' (aka 'int (*)(struct sk_buff *, unsigned int, void *, enum drbd_notification_type)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1641 | REMEMBER_STATE_CHANGE(notify_connection_state_change, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1642 | connection_state_change, NOTIFY_CHANGE); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Change these all to actually expect a void pointer to be passed, which matches the caller. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240213100354.457128-1-arnd@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
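The approach of making every callback actually accept a void pointer can be sketched as follows. The types and function names here are hypothetical and far simpler than the drbd ones; the sketch only shows that once all callbacks share the exact prototype of the stored pointer type, no function-pointer cast is needed.

```c
#include <assert.h>

struct resource_change { int id; };
struct connection_change { int id; };

/* Common callback type: the state argument is an untyped pointer. */
typedef int (*notify_fn)(unsigned int seq, void *state);

/* Each callback has exactly the notify_fn prototype and converts inside. */
static int notify_resource(unsigned int seq, void *state)
{
	struct resource_change *rc = state;

	return (int)seq + rc->id;
}

static int notify_connection(unsigned int seq, void *state)
{
	struct connection_change *cc = state;

	return (int)seq + cc->id;
}

/* Storing or calling either callback requires no function-pointer cast. */
static int call_notifier(notify_fn fn, unsigned int seq, void *state)
{
	assert(fn);
	return fn(seq, state);
}
```

The conversion from `void *` to the concrete struct pointer happens in the callee, which is well defined in C, whereas calling through a pointer of an incompatible function type is not.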
2024-02-13 | floppy: fix function pointer cast warnings | Arnd Bergmann
clang-16 complains about a control flow integrity (kcfi) violation when casting between incompatible pointers: drivers/block/floppy.c:2001:11: error: cast from 'void (*)(void)' to 'done_f' (aka 'void (*)(int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 2001 | .done = (done_f)empty | ^~~~~~~~~~~~~ Just add another empty function with the correct prototype as a workaround. The warning is for code that was added before the start of the normal git history, but I tracked it down to an early change in the reconstructed linux-history.git. Fixes: 598a477afe06 ("Import 1.1.41") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240213095918.455478-1-arnd@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
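A minimal sketch of that workaround, assuming a `done_f` type as shown in the diagnostic: instead of casting a `void (*)(void)` no-op, a second no-op with the matching prototype is provided. The names `empty_done`, `struct cont`, and `poll_cont` are illustrative choices for this example and are not taken from the patch.

```c
#include <assert.h>

typedef void (*done_f)(int);

/* Existing no-op with a different prototype; casting it to done_f is the problem. */
static void empty(void)
{
}

/* Second no-op whose prototype matches done_f exactly, so no cast is needed. */
static void empty_done(int result)
{
	(void)result;
}

struct cont {
	done_f done;
};

/* The initializer now type-checks without any cast. */
static const struct cont poll_cont = { .done = empty_done };
```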
2024-02-12 | md: fix kmemleak of rdev->serial | Li Nan
If kobject_add() fails in bind_rdev_to_array(), the already allocated 'rdev->serial' is never freed, and kmemleak reports a leak: unreferenced object 0xffff88815a350000 (size 49152): comm "mdadm", pid 789, jiffies 4294716910 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc f773277a): [<0000000058b0a453>] kmemleak_alloc+0x61/0xe0 [<00000000366adf14>] __kmalloc_large_node+0x15e/0x270 [<000000002e82961b>] __kmalloc_node.cold+0x11/0x7f [<00000000f206d60a>] kvmalloc_node+0x74/0x150 [<0000000034bf3363>] rdev_init_serial+0x67/0x170 [<0000000010e08fe9>] mddev_create_serial_pool+0x62/0x220 [<00000000c3837bf0>] bind_rdev_to_array+0x2af/0x630 [<0000000073c28560>] md_add_new_disk+0x400/0x9f0 [<00000000770e30ff>] md_ioctl+0x15bf/0x1c10 [<000000006cfab718>] blkdev_ioctl+0x191/0x3f0 [<0000000085086a11>] vfs_ioctl+0x22/0x60 [<0000000018b656fe>] __x64_sys_ioctl+0xba/0xe0 [<00000000e54e675e>] do_syscall_64+0x71/0x150 [<000000008b0ad622>] entry_SYSCALL_64_after_hwframe+0x6c/0x74 Fixes: 963c555e75b0 ("md: introduce mddev_create/destroy_wb_pool for the change of member device") Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240208085556.2412922-1-linan666@huaweicloud.com
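The general shape of this kind of fix is to release, on the error path, whatever the function allocated before the failing step. The user-space sketch below uses hypothetical names (`bind_rdev`, `register_rdev`) and `calloc`/`free` in place of the kernel allocators; it is not the md code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct rdev {
	void *serial; /* per-device buffer allocated during bind */
};

/* Hypothetical stand-in for a registration step that may fail. */
static int register_rdev(bool should_fail)
{
	return should_fail ? -1 : 0;
}

/* Allocate, then register; undo the allocation if registration fails. */
static int bind_rdev(struct rdev *rdev, bool fail_register)
{
	assert(rdev);
	rdev->serial = calloc(1, 64);
	if (!rdev->serial)
		return -1;

	if (register_rdev(fail_register) != 0) {
		/* Error path: release what this function allocated. */
		free(rdev->serial);
		rdev->serial = NULL;
		return -1;
	}
	return 0;
}
```

Resetting the pointer to NULL after freeing also keeps a later cleanup path from freeing the buffer a second time.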
2024-02-12 | nvme: allow integrity when PI is not in first bytes | Kanchan Joshi
NVM command set 1.0 (or later) mandates PI to be in the last bytes of metadata. But this was not supported in the block layer, and the driver registered a nop profile. Since block-integrity can now handle flexible PI offset, change the driver to support this configuration. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240201130126.211402-4-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 | block: support PI at non-zero offset within metadata | Kanchan Joshi
Block layer integrity processing assumes that protection information (PI) is placed in the first bytes of each metadata block. Remove this limitation and include the metadata before the PI in the calculation of the guard tag. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Chinmay Gameti <c.gameti@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240201130126.211402-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
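To make the guard calculation concrete, here is a small sketch that computes a CRC-16/T10-DIF guard over the data block and then continues the same CRC over the metadata bytes that precede the PI. The helper names and the argument layout are assumptions for illustration; only the CRC parameters (polynomial 0x8BB7, zero initial value) are the standard T10-DIF ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/T10-DIF (polynomial 0x8BB7), continuing from an existing value. */
static uint16_t crc_t10dif_update(uint16_t crc, const uint8_t *buf, size_t len)
{
	assert(buf || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Guard over the data block plus the pi_offset metadata bytes before the PI. */
static uint16_t guard_with_offset(const uint8_t *data, size_t data_len,
				  const uint8_t *meta, size_t pi_offset)
{
	uint16_t guard = crc_t10dif_update(0, data, data_len);

	if (pi_offset)
		guard = crc_t10dif_update(guard, meta, pi_offset);
	return guard;
}
```

Seeding the second call with the first result gives the same value as one CRC over the concatenated bytes, which is why a helper that accepts an existing guard value is enough to support a non-zero PI offset.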
2024-02-12 | block: refactor guard helpers | Kanchan Joshi
Allow computation using the existing guard value. This is a prep patch. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240201130126.211402-2-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 | block: remove gfp_flags from blkdev_zone_mgmt | Johannes Thumshirn
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can drop the gfp_mask parameter from blkdev_zone_mgmt() as well as blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
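The idea of replacing a per-call allocation flag with a scope can be modelled in a few lines of user-space C. Everything below is a hypothetical model: `task_flags` stands in for per-task state, and `scope_nofs_save()`/`scope_nofs_restore()` only mimic the save/restore pattern. They are not the kernel's memalloc helpers.

```c
#include <assert.h>

#define SCOPE_NOFS 0x1u /* hypothetical flag: do not recurse into the filesystem */

static unsigned int task_flags; /* stands in for per-task allocation state */

/* Enter a NOFS scope; return the previous state so scopes can nest. */
static unsigned int scope_nofs_save(void)
{
	unsigned int old = task_flags & SCOPE_NOFS;

	task_flags |= SCOPE_NOFS;
	return old;
}

/* Leave the scope, restoring whatever state was in effect before it. */
static void scope_nofs_restore(unsigned int old)
{
	assert((old & ~SCOPE_NOFS) == 0);
	task_flags = (task_flags & ~SCOPE_NOFS) | old;
}

/* A callee no longer needs a flags argument: it consults the scope instead. */
static int zone_mgmt_may_enter_fs(void)
{
	return !(task_flags & SCOPE_NOFS);
}
```

With the restriction carried by the calling context, every function underneath the scope inherits it, so the callee's signature can lose its flags parameter.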
2024-02-12 | f2fs: guard blkdev_zone_mgmt with nofs scope | Johannes Thumshirn
Guard the calls to blkdev_zone_mgmt() with a memalloc_nofs scope. This helps us get rid of the GFP_NOFS argument to blkdev_zone_mgmt(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-4-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 | btrfs: zoned: call blkdev_zone_mgmt in nofs scope | Johannes Thumshirn
Add a memalloc_nofs scope around all calls to blkdev_zone_mgmt(). This allows us to further get rid of the GFP_NOFS argument for blkdev_zone_mgmt(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-3-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 | dm: dm-zoned: guard blkdev_zone_mgmt with noio scope | Johannes Thumshirn
Guard the calls to blkdev_zone_mgmt() with a memalloc_noio scope. This helps us get rid of the GFP_NOIO argument to blkdev_zone_mgmt(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-2-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12 | zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call | Johannes Thumshirn
Pass GFP_KERNEL instead of GFP_NOFS to the blkdev_zone_mgmt() call in zonefs_zone_mgmt(). As zonefs_zone_mgmt() and zonefs_inode_zone_mgmt() are never called from a place that can recurse back into the filesystem on memory reclaim, it is safe to call blkdev_zone_mgmt() with GFP_KERNEL. Link: https://lore.kernel.org/all/ZZcgXI46AinlcBDP@casper.infradead.org/ Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-1-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-09 | s390/dasd: fix double module refcount decrement | Miroslav Franc
Once the discipline is associated with the device, deleting the device takes care of decrementing the module's refcount. Doing it manually on this error path causes refcount to artificially decrease on each error while it should just stay the same. Fixes: c020d722b110 ("s390/dasd: fix panic during offline processing") Signed-off-by: Miroslav Franc <mfranc@suse.cz> Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240209124522.3697827-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-09 | s390/dasd: Improve ERP error messages | Jan Höppner
Some ERP errors still share the same message format and only add different reason codes to it. These reason codes don't have any meaning anymore. Make the individual error messages more explicit and remove the reason codes altogether. Comments around the error messages are also removed as they provide no additional value anymore with more explicit messages. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240209124522.3697827-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | null_blk: add configfs variable shared_tags | Shin'ichiro Kawasaki
Allow setting shared_tags through configfs, which could previously only be set as a module parameter. For that purpose, delay tag_set initialization from null_init() to null_add_dev(). Use tag_set.ops as the flag to check whether tag_set is initialized. The following parameters cannot be set through configfs yet: timeout, requeue, init_hctx. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240130042134.2463659-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | block: Simplify the allocation of slab caches | Kunwu Chan
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240131094323.146659-1-chentao@kylinos.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | block: optimise in irq bio put caching | Pavel Begunkov
When enlisting a bio into ->free_list_irq we protect the list by disabling irqs. It's likely they're already disabled and performance of local_irq_{save,restore}() is decent, but it's not zero cost. Let's only use the irq cache when we're serving a hard irq, which allows us to remove local_irq_{save,restore}(), and fall back to bio_free() in all remaining cases. Profiles indicate that the bio_put() cost is reduced by ~3.5 times (1.76% -> 0.49%), and total throughput of a CPU bound benchmark improves by around 1% (t/io_uring with high QD and several drives). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/36d207540b7046c653cc16e5ff08fe7234b19f81.1707314970.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | block: extend bio caching to task context | Pavel Begunkov
bio_put_percpu_cache() puts all non-iopoll bios into the irq-safe list, which entails disabling irqs. The overhead of that is not that bad when interrupts are already off but gets worse otherwise. We can optimise it when we're in the task context by using ->free_list directly just as the IOPOLL path does. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4774e1a0f905f96c63174b0f3e4f79f0d9b63246.1707314970.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Use dev_*() for device log messages | Jan Höppner
All log messages in dasd.c use the printk variants of pr_*(). They all add the name of the affected device manually to the log message. This can be simplified by using the dev_*() variants of printk, which include the device information and make a separate call to dev_name() unnecessary. The KMSG_COMPONENT and the pr_fmt() definition can be dropped. Note that this removes the "dasd: " prefix from the one pr_info() call in dasd_init(). However, the log message already provides all relevant information. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-10-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Remove PRINTK_HEADER and KMSG_COMPONENT definitions | Jan Höppner
PRINTK_HEADER was mainly used to prefix log messages with the module name. Most components don't use this definition anymore, either because no log messages are generated anymore, or because pr_*() calls were replaced by dev_*(), which already contain device and component information. PRINTK_HEADER is also dropped in the function dasd_3990_erp_handle_match_erp() in dasd_3990_erp.c from a panic() call as panic() already provides all relevant information. KMSG_COMPONENT was mainly used to identify a component in a long gone kernel message catalog feature. Remove both definitions since they're either not used or alternatives make the code slightly shorter and more readable. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-9-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Remove %p format specifier from error messages | Jan Höppner
Printing pointers in error messages doesn't add any value since the addresses are hashed. Remove the %p format specifier and adapt the error messages slightly. Replace %p with %px in ERP to get the actual addresses since ERP is used for debugging purposes only anyway. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-8-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Use dev_err() over printk() | Jan Höppner
To reduce the information required for the string generation in the sense dump functions, use the more concise dev_err() variant over printk(KERN_ERR, ...) to improve code readability. The dev_err() function provides the component and device name for free and the separate dev_name() calls as well as the PRINTK_HEADER can be dropped. Dropping PRINTK_HEADER removes the "dasd(eckd):" for all lines. Only the first line of a dev_err() call is prefixed with the component and device (e.g. "dasd-eckd 0.0.95d0:"). The format specifier for printed pointers is also changed to unhashed (%px) as this can help with debugging and servicing. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-7-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Remove unused message logging macros | Jan Höppner
The macros DEV_MESSAGE, MESSAGE, DEV_MESSAGE_LOG, and MESSAGE_LOG, are not used and there is no history anymore of any usage. Remove them. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-6-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Move allocation error message to DBF | Jan Höppner
All error messages for a failing dasd_smalloc_request() call are logged via DBF, except one. There is no value in logging this particular allocation failure via dev_err(). Move the message to DBF, too, to be in line with the rest. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-5-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Remove unnecessary errorstring generation | Jan Höppner
In quite a few cases an errorstring is generated using snprintf() before it's passed to dev_err(). This indirection is unnecessary and all information can simply be passed directly to dev_err() instead. The errorstring and ERRORLENGTH definitions are removed entirely. While at it, rephrase the error messages to provide more context where possible. Also, fix a few incorrectly used format specifiers (e.g. %x02 -> %02x) in those messages. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-4-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
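The difference between the two format specifiers mentioned above is easy to miss, so here is a small standalone illustration using standard snprintf(). The helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Format one byte as two zero-padded hex digits ("%02x"). */
static int format_hex_byte(char *buf, size_t len, unsigned int byte)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%02x", byte & 0xffu);
}

/* The transposed form "%x02" prints the hex value followed by a literal "02". */
static int format_hex_byte_wrong(char *buf, size_t len, unsigned int byte)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "%x02", byte & 0xffu);
}
```

For the value 0x5, the first helper produces "05" while the second produces "502", which reads like a different number rather than an obviously malformed one.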
2024-02-08 | s390/dasd: Use sysfs_emit() over sprintf() | Jan Höppner
sysfs_emit() should be used in show() functions. There are still a couple of functions that use sprintf(). Replace outstanding occurrences of sprintf() in all show() functions with sysfs_emit(). Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-3-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08 | s390/dasd: Simplify uid string generation | Jan Höppner
There are two variants of the device uid string. One contains the virtual device unit information table (vduit) identifying the device as a virtual device located on a real device in a z/VM environment. The other variant does not contain that additional information. Simplify the string generation with a shorter check of an existing vduit embedded in the snprintf() calls. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20240208164248.540985-2-sth@linux.ibm.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
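One way to fold both variants into a single snprintf() call is to make the separator and the vduit conditional arguments. The sketch below assumes a trimmed-down `struct uid` with illustrative field names and an illustrative format string; the actual dasd structure and format may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down uid; field names are illustrative only. */
struct uid {
	const char *vendor;
	const char *serial;
	unsigned int ssid;
	unsigned int unit_addr;
	const char *vduit; /* empty string when the device is not virtual */
};

/* One call covers both variants: ".<vduit>" appears only when vduit is non-empty. */
static int format_uid(char *buf, size_t len, const struct uid *uid)
{
	assert(buf && uid && uid->vduit);
	return snprintf(buf, len, "%s.%s.%04x.%02x%s%s",
			uid->vendor, uid->serial, uid->ssid, uid->unit_addr,
			uid->vduit[0] ? "." : "", uid->vduit);
}
```

Checking only the first character of the vduit is enough to tell an empty string from a populated one, which keeps the condition short enough to sit inside the argument list.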
2024-02-06 | block: rbd: make rbd_bus_type const | Ricardo B. Marliere
Now that the driver core can properly handle constant struct bus_type, move the rbd_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20240204-bus_cleanup-block-v1-1-fc77afd8d7cc@marliere.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | md/multipath: Remove md-multipath.h | Song Liu
md-multipath is already deprecated. Remove the header file. Signed-off-by: Song Liu <song@kernel.org>
2024-02-05 | md/linear: Get rid of md-linear.h | Marc Zyngier
Given that 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") killed the linear flavour of MD, it seems only logical to drop the leftover include file that used to come with it. I also feel that it should be my own privilege to remove my 30 year old attempt at writing kernel code ;-). RIP! Cc: Song Liu <song@kernel.org> Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240201224549.750644-1-maz@kernel.org
2024-02-05 | md: use RCU lock to protect traversal in md_spares_need_change() | Li Lingfeng
Since md_start_sync() will be called without the protection of mddev_lock, and it can run concurrently with array reconfiguration, traversal of rdev in it should be protected by RCU lock. Commit bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") added md_spares_need_change() to md_start_sync(), causing use of rdev without any protection. Fix this by adding RCU lock in md_spares_need_change(). Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") Cc: stable@vger.kernel.org # 6.7+ Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240104133629.1277517-1-lilingfeng@huaweicloud.com
2024-02-05 | md: get rdev->mddev with READ_ONCE() | Li Lingfeng
Users may get rdev->mddev by sysfs while rdev is being released. So use both READ_ONCE() and WRITE_ONCE() to prevent load/store tearing and to read/write mddev atomically. Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231229070500.3602712-1-lilingfeng@huaweicloud.com
2024-02-05 | md: remove redundant md_wakeup_thread() | Yu Kuai
On the one hand, mddev_unlock() will call md_wakeup_thread() unconditionally; on the other hand, md_check_recovery() can't make progress if 'reconfig_mutex' can't be grabbed. Hence, it really doesn't make sense to wake up the daemon thread while 'reconfig_mutex' is still grabbed. Remove all the md_wakeup_thread() calls for 'mddev->thread' while 'reconfig_mutex' is still grabbed. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231228125553.2697765-3-yukuai1@huaweicloud.com
2024-02-05 | md: remove redundant check of 'mddev->sync_thread' | Yu Kuai
The lifetime of sync_thread:
1) Set MD_RECOVERY_NEEDED and wake up daemon thread (by ioctl/sysfs or other events);
2) Daemon thread woke up, md_check_recovery() found that MD_RECOVERY_NEEDED is set:
   a) try to grab reconfig_mutex;
   b) set MD_RECOVERY_RUNNING;
   c) clear MD_RECOVERY_NEEDED, and then queue sync_work;
3) md_start_sync() choose sync_action, then register sync_thread;
4) md_do_sync() is done, set MD_RECOVERY_DONE and wake up daemon thread;
5) Daemon thread woke up, md_check_recovery() found that MD_RECOVERY_DONE is set:
   a) try to grab reconfig_mutex;
   b) unregister sync_thread;
   c) clear MD_RECOVERY_RUNNING and MD_RECOVERY_DONE;
Hence there is no such case that MD_RECOVERY_RUNNING is not set, while sync_thread is registered. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231228125553.2697765-2-yukuai1@huaweicloud.com
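The sync_thread lifetime described in this commit message can be modelled as a tiny state machine to check the claimed invariant. This is a simplified, single-threaded sketch with hypothetical names; locking, work queues and the real md data structures are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define RECOVERY_NEEDED  (1u << 0)
#define RECOVERY_RUNNING (1u << 1)
#define RECOVERY_DONE    (1u << 2)

struct array_state {
	unsigned int recovery;
	bool sync_thread; /* true while a sync thread is registered */
};

/* Step 1: something requests recovery. */
static void request_recovery(struct array_state *s)
{
	s->recovery |= RECOVERY_NEEDED;
}

/* Steps 2 and 5: the daemon either starts a run or reaps a finished one. */
static void check_recovery(struct array_state *s)
{
	if (s->recovery & RECOVERY_DONE) {
		s->sync_thread = false;
		s->recovery &= ~(RECOVERY_RUNNING | RECOVERY_DONE);
	} else if ((s->recovery & RECOVERY_NEEDED) &&
		   !(s->recovery & RECOVERY_RUNNING)) {
		s->recovery |= RECOVERY_RUNNING;
		s->recovery &= ~RECOVERY_NEEDED;
	}
}

/* Step 3: the queued work registers the sync thread; RUNNING is already set. */
static void start_sync(struct array_state *s)
{
	assert(s->recovery & RECOVERY_RUNNING);
	s->sync_thread = true;
}

/* Step 4: the sync thread finishes. */
static void sync_done(struct array_state *s)
{
	s->recovery |= RECOVERY_DONE;
}

/* The invariant the commit relies on: a registered thread implies RUNNING. */
static bool invariant_holds(const struct array_state *s)
{
	return !s->sync_thread || (s->recovery & RECOVERY_RUNNING);
}
```

In this model the thread is registered only after RUNNING is set and unregistered before RUNNING is cleared, so a separate check of the thread pointer adds no information beyond the flag.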
2024-02-05 | blk-throttle: Eliminate redundant checks for data direction | Tang Yizhou
After calling throtl_peek_queued(), the data direction can be determined so there is no need to call bio_data_dir() to check the direction again. Signed-off-by: Tang Yizhou <yizhou.tang@shopee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240123081248.3752878-1-yizhou.tang@shopee.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | block: update cached timestamp post schedule/preemption | Jens Axboe
Mark the task as having a cached timestamp when we assign it, so we can efficiently check if it needs updating post being scheduled back in. This covers both the actual schedule out case, which would've flushed the plug, and the preemption case which doesn't touch the plugged requests (for many reasons, one of them being then we'd need to have preemption disabled around plug state manipulation). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | block: cache current nsec time in struct blk_plug | Jens Axboe
Querying the current time is the most costly thing we do in the block layer per IO, and depending on kernel config settings, we may do it many times per IO. None of the callers actually need nsec granularity. Take advantage of that by caching the current time in the plug, with the assumption here being that any time checking will be temporally close enough that the slight loss of precision doesn't matter. If the block plug gets flushed, e.g. on preempt or schedule out, then we invalidate the cached clock. On a basic peak IOPS test case with iostats enabled, this changes the performance from:
IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31
IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32
IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32
IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32
IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31
IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32
IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31
to
IOPS=118.79M, BW=58.00GiB/s, IOS/call=31/32
IOPS=118.62M, BW=57.92GiB/s, IOS/call=31/31
IOPS=118.80M, BW=58.01GiB/s, IOS/call=32/31
IOPS=118.78M, BW=58.00GiB/s, IOS/call=32/32
IOPS=118.69M, BW=57.95GiB/s, IOS/call=32/31
IOPS=118.62M, BW=57.92GiB/s, IOS/call=32/31
IOPS=118.63M, BW=57.92GiB/s, IOS/call=31/32
which is more than a 9% improvement in performance. Looking at perf diff, we can see a huge reduction in time overhead:
10.55% -9.88% [kernel.vmlinux] [k] read_tsc
1.31% -1.22% [kernel.vmlinux] [k] ktime_get
Note that since this relies on blk_plug for the caching, it's only applicable to the issue side. But this is where most of the time calls happen anyway. On the completion side, cached time stamping is done with struct io_comp_batch, as long as the driver supports it. It's also worth noting that the above testing doesn't enable any of the higher cost CPU items on the block layer side, like wbt, cgroups, iocost, etc, which all would add additional time querying and hence overhead. IOW, results would likely look even better in comparison with those enabled, as distros would do. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
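The caching scheme in this commit message can be sketched in user space as follows. This is only a model: `struct plug`, `plug_time_get_ns()` and `plug_flush()` are hypothetical names, a cached value of 0 is assumed to mean "not cached", and the C11 wall clock from timespec_get() stands in for the kernel's monotonic clock.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

static unsigned int clock_reads; /* counts how often the real clock is queried */

/* The expensive path: actually read the clock. */
static uint64_t read_clock_ns(void)
{
	struct timespec ts;

	clock_reads++;
	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Hypothetical plug: a cached value of 0 means "no cached timestamp". */
struct plug {
	uint64_t cur_ns;
};

/* Return the cached time if present, otherwise read the clock once and cache it. */
static uint64_t plug_time_get_ns(struct plug *plug)
{
	if (!plug)
		return read_clock_ns();
	if (!plug->cur_ns)
		plug->cur_ns = read_clock_ns();
	return plug->cur_ns;
}

/* Flushing the plug drops the cached timestamp. */
static void plug_flush(struct plug *plug)
{
	assert(plug);
	plug->cur_ns = 0;
}
```

Every caller inside one plugged section sees the same timestamp, so the cost of reading the clock is paid once per section rather than once per query. That is the trade of precision for speed that the message describes.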
2024-02-05 | block: add blk_time_get_ns() and blk_time_get() helpers | Jens Axboe
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the current time in nanoseconds or as ktime. No functional changes intended, this patch just wraps ktime_get_ns() and ktime_get() with a block helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | block: move cgroup time handling code into blk.h | Jens Axboe
In preparation for moving time keeping into blk.h, move the cgroup related code for timestamps in here too. This will help avoid a circular dependency, and also moves it into a more appropriate header as this one is private to the block layer code. Leave struct bio_issue in blk_types.h as it's a proper time definition. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | blk-mq: special case cached requests less | Christoph Hellwig
Share the main merge / split / integrity preparation code between the cached request vs newly allocated request cases, and add comments explaining the cached request handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240124092658.2258309-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | blk-mq: introduce a blk_mq_peek_cached_request helper | Christoph Hellwig
Add a new helper to check if there is a suitable cached request in blk_mq_submit_bio. This removes open coded logic in blk_mq_submit_bio and moves some checks that so far are in blk_mq_use_cached_rq to be performed earlier. This avoids the case where we first check with the cached request but then later end up allocating a new one anyway and need to grab a queue reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240124092658.2258309-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-05 | blk-mq: move blk_mq_attempt_bio_merge out blk_mq_get_new_requests | Christoph Hellwig
blk_mq_attempt_bio_merge has nothing to do with allocating a new request, it avoids allocating a new request. Move the call out of blk_mq_get_new_requests and into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240124092658.2258309-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-04 | Linux 6.8-rc3 | Linus Torvalds
2024-02-04 | Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds
Pull ext4 fixes from Ted Ts'o: "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator and extent handling code"
* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
  ext4: make ext4_map_blocks() distinguish delalloc only extent
  ext4: add a hole extent entry in cache after punch
  ext4: correct the hole length returned by ext4_map_blocks()
  ext4: convert to exclusive lock while inserting delalloc extents
  ext4: refactor ext4_da_map_blocks()
  ext4: remove 'needed' in trace_ext4_discard_preallocations
  ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
  ext4: remove unused return value of ext4_mb_release_group_pa
  ext4: remove unused return value of ext4_mb_release_inode_pa
  ext4: remove unused return value of ext4_mb_release
  ext4: remove unused ext4_allocation_context::ac_groups_considered
  ext4: remove unneeded return value of ext4_mb_release_context
  ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
  ext4: remove unused return value of __mb_check_buddy
  ext4: mark the group block bitmap as corrupted before reporting an error
  ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
  ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
  ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
  ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
  ...
2024-02-04 | Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds
Pull smb client fixes from Steve French: "Five smb3 client fixes, mostly multichannel related:
- four multichannel fixes including fix for channel allocation when multiple inactive channels, fix for unneeded race in channel deallocation, correct redundant channel scaling, and redundant multichannel disabling scenarios
- add warning if max compound requests reached"
* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: increase number of PDUs allowed in a compound request
  cifs: failure to add channel on iface should bump up weight
  cifs: do not search for channel if server is terminating
  cifs: avoid redundant calls to disable multichannel
  cifs: make sure that channel scaling is done only once