summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2025-01-25zram: cond_resched() in writeback loopSergey Senozhatsky
zram writeback is a costly operation, because every target slot (unless ZRAM_HUGE) is decompressed before it gets written to a backing device. The writeback to a backing device uses submit_bio_wait() which may look like a rescheduling point. However, if the backing device has BD_HAS_SUBMIT_BIO bit set __submit_bio() calls directly disk->fops->submit_bio(bio) on the backing device and so when submit_bio_wait() calls blk_wait_io() the I/O is already done. On such systems we effective end up in a loop for_each (target slot) { decompress(slot) __submit_bio() disk->fops->submit_bio(bio) } Which on PREEMPT_NONE systems triggers watchdogs (since there are no explicit rescheduling points). Add cond_resched() to the zram writeback loop. Link: https://lkml.kernel.org/r/20241218063513.297475-8-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25zram: use zram_read_from_zspool() in writebackSergey Senozhatsky
We only can read pages from zspool in writeback, zram_read_page() is not really right in that context not only because it's a more generic function that handles ZRAM_WB pages, but also because it requires us to unlock slot between slot flag check and actual page read. Use zram_read_from_zspool() instead and do slot flags check and page read under the same slot lock. Link: https://lkml.kernel.org/r/20241218063513.297475-7-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25zram: factor out different page types readSergey Senozhatsky
Similarly to write, split the page read code into ZRAM_HUGE read, ZRAM_SAME read and compressed page read to simplify the code. Link: https://lkml.kernel.org/r/20241218063513.297475-6-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25zram: factor out ZRAM_HUGE writeSergey Senozhatsky
zram_write_page() handles: ZRAM_SAME pages (which was already factored out) stores, regular page stores and ZRAM_HUGE pages stores. ZRAM_HUGE handling adds a significant amount of complexity. Instead, we can handle ZRAM_HUGE in a separate function. This allows us to simplify zs_handle allocations slow-path, as it now does not handle ZRAM_HUGE case. ZRAM_HUGE zs_handle allocation, on the other hand, can now drop __GFP_KSWAPD_RECLAIM because we handle ZRAM_HUGE in preemptible context (outside of local-lock scope). Link: https://lkml.kernel.org/r/20241218063513.297475-5-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25zram: factor out ZRAM_SAME writeSergey Senozhatsky
Handling of ZRAM_SAME now uses a goto to the final stages of zram_write_page() plus it introduces a branch and flags variable, which is not making the code any simpler. In reality, we can handle ZRAM_SAME immediately when we detect such pages and remove a goto and a branch. Factor out ZRAM_SAME handling into a separate routine to simplify zram_write_page(). Link: https://lkml.kernel.org/r/20241218063513.297475-4-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25zram: remove entry element memberSergey Senozhatsky
Element is in the same anon union as handle and hence holds the same value, which makes code below sort of confusing handle = zram_get_handle() if (!handle) element = zram_get_element() Element doesn't really simplify the code, let's just remove it. We already re-purpose handle to store the block id a written back page. Link: https://lkml.kernel.org/r/20241218063513.297475-3-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25zram: free slot memory early during writeSergey Senozhatsky
Patch series "zram: split page type read/write handling", v2. This is a subset of [1] series which contains only fixes and improvements (no new features, as ZRAM_HUGE split is still under consideration). The motivation for factoring out is that zram_write_page() gets more and more complex all the time, because it tries to handle too many scenarios: ZRAM_SAME store, ZRAM_HUGE store, compress page store with zs_malloc allocation slowpath and conditional recompression, etc. Factor those out and make things easier to handle. Addition of cond_resched() is simply a fix, I can trigger watchdog from zram writeback(). And early slot free is just a reasonable thing to do. [1] https://lore.kernel.org/linux-kernel/20241119072057.3440039-1-senozhatsky@chromium.org This patch (of 7): In the current implementation entry's previously allocated memory is released in the very last moment, when we already have allocated a new memory for new data. This, basically, temporarily increases memory usage for no good reason. For example, consider the case when both old (stale) and new entry data are incompressible so such entry will temporarily use two physical pages - one for stale (old) data and one for new data. We can release old memory as soon as we get a write request for entry. Link: https://lkml.kernel.org/r/20241218063513.297475-1-senozhatsky@chromium.org Link: https://lkml.kernel.org/r/20241218063513.297475-2-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-20Merge tag 'for-6.14/block-20250118' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - NVMe pull requests via Keith: - Target support for PCI-Endpoint transport (Damien) - TCP IO queue spreading fixes (Sagi, Chaitanya) - Target handling for "limited retry" flags (Guixen) - Poll type fix (Yongsoo) - Xarray storage error handling (Keisuke) - Host memory buffer free size fix on error (Francis) - MD pull requests via Song: - Reintroduce md-linear (Yu Kuai) - md-bitmap refactor and fix (Yu Kuai) - Replace kmap_atomic with kmap_local_page (David Reaver) - Quite a few queue freeze and debugfs deadlock fixes Ming introduced lockdep support for this in the 6.13 kernel, and it has (unsurprisingly) uncovered quite a few issues - Use const attributes for IO schedulers - Remove bio ioprio wrappers - Fixes for stacked device atomic write support - Refactor queue affinity helpers, in preparation for better supporting isolated CPUs - Cleanups of loop O_DIRECT handling - Cleanup of BLK_MQ_F_* flags - Add rotational support for null_blk - Various fixes and cleanups * tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux: (106 commits) block: Don't trim an atomic write block: Add common atomic writes enable flag md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add() block: limit disk max sectors to (LLONG_MAX >> 9) block: Change blk_stack_atomic_writes_limits() unit_min check block: Ensure start sector is aligned for stacking atomic writes blk-mq: Move more error handling into blk_mq_submit_bio() block: Reorder the request allocation code in blk_mq_submit_bio() nvme: fix bogus kzalloc() return check in nvme_init_effects_log() md/md-bitmap: move bitmap_{start, end}write to md upper layer md/raid5: implement pers->bitmap_sector() md: add a new callback pers->bitmap_sector() md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() md/md-bitmap: factor behind write counters out from bitmap_{start/end}write() md: Replace deprecated kmap_atomic() with kmap_local_page() md: reintroduce md-linear partitions: ldm: remove the initial kernel-doc notation blk-cgroup: rwstat: fix kernel-doc warnings in header file blk-cgroup: fix kernel-doc warnings in header file nbd: fix partial sending ...
2025-01-13nbd: fix partial sendingMing Lei
nbd driver sends request header and payload with multiple call of sock_sendmsg, and partial sending can't be avoided. However, nbd driver returns BLK_STS_RESOURCE to block core in this situation. This way causes one issue: request->tag may change in the next run of nbd_queue_rq(), but the original old tag has been sent as part of header cookie, this way confuses nbd driver reply handling, since the real request can't be retrieved any more with the obsolete old tag. Fix it by retrying sending directly in per-socket work function, meantime return BLK_STS_OK to block layer core. Cc: vincent.chen@sifive.com Cc: Leon Schuermann <leon@is.currently.online> Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Link: https://lore.kernel.org/r/20241029011941.153037-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-13Merge 6.13-rc7 into driver-core-nextGreg Kroah-Hartman
We need the debugfs / driver-core fixes in here as well for testing and to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-12xen/blkback: convert timeouts to secs_to_jiffies()Easwar Hariharan
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies to avoid the multiplication. This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules: @@ constant C; @@ - msecs_to_jiffies(C * 1000) + secs_to_jiffies(C) @@ constant C; @@ - msecs_to_jiffies(C * MSEC_PER_SEC) + secs_to_jiffies(C) Link: https://lkml.kernel.org/r/20241210-converge-secs-to-jiffies-v3-12-ddfefd7e9f2a@linux.microsoft.com Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Daniel Mack <daniel@zonque.org> Cc: David Airlie <airlied@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Florian Fainelli <florian.fainelli@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jack Wang <jinpu.wang@cloud.ionos.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: James Smart <james.smart@broadcom.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jeff Johnson <jjohnson@kernel.org> Cc: Jeff Johnson <quic_jjohnson@quicinc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeroen de Borst <jeroendb@google.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kalle Valo <kvalo@kernel.org> Cc: Louis Peens <louis.peens@corigine.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nicolas Palix <nicolas.palix@imag.fr> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Ofir Bitton <obitton@habana.ai> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Praveen Kaligineedi <pkaligineedi@google.com> Cc: Ray Jui <rjui@broadcom.com> Cc: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Scott Branden <sbranden@broadcom.com> Cc: Shailend Chand <shailend@google.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Simon Horman <horms@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Takashi Iwai <tiwai@suse.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12zram: fix potential UAF of zram tableKairui Song
If zram_meta_alloc failed early, it frees allocated zram->table without setting it NULL. Which will potentially cause zram_meta_free to access the table if user reset an failed and uninitialized device. Link: https://lkml.kernel.org/r/20250107065446.86928-1-ryncsn@gmail.com Fixes: 74363ec674cb ("zram: fix uninitialized ZRAM not releasing backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-10loop: remove the use_dio field in struct loop_deviceChristoph Hellwig
This field duplicate the LO_FLAGS_DIRECT_IO flag in lo_flags. Remove it to have a single source of truth about using direct I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: don't freeze the queue in loop_update_dioChristoph Hellwig
All callers of loop_update_dio except for loop_configure already have the queue frozen, and loop_configure works on an unbound device. Remove the superfluous recursive freezing in loop_update_dio and add asserts for the locking and freezing state instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: allow loop_set_status to re-enable direct I/OChristoph Hellwig
Unlike all other calls of (__)loop_update_dio, loop_set_status never looks at the O_DIRECT flag of the backing file, and thus doesn't re-enable direct I/O on an O_DIRECT backing file if e.g. the new block size would allow it. Fix that and remove the need for the separate __loop_update_dio flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: open code the direct I/O flag update in loop_set_dioChristoph Hellwig
loop_set_dio is different from the other (__)loop_update_dio callers in that it doesn't take any implicit conditions into account and wants to update the direct I/O flag to the user passed in value and fail if that can't be done. Open code the logic here to prepare for simplifying the other direct I/O flag updates and to make the error handling less convoluted. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: only write back pagecache when starting to to use direct I/OChristoph Hellwig
There is no point in doing an fdatasync to write out pages when switching away from direct I/O, as there won't be any. The writeback is only needed when switching to direct I/O, which would have to invalidate the pagecache less efficiently from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: create a lo_can_use_dio helperChristoph Hellwig
Factor out a part of __loop_update_dio in preparation for further refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: update commands in loop_set_status still referring to transfersChristoph Hellwig
The concept of transfers is gone since commit 47e9624616c8 ("block: remove support for cryptoloop and the xor transfer"). Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: move updating lo_flags out of loop_set_status_from_infoChristoph Hellwig
While loop_configure simplify assigns the flags passed in by userspace, loop_set_status only looks at the two changeable flags, and currently has to do a complicate dance to implement that. Move assign lo->lo_flags out of loop_set_status_from_info into the callers and thus drastically simplify the lo_flags handling in loop_set_status. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250110073750.1582447-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: fix queue freeze vs limits lock orderChristoph Hellwig
Match the locking order used by the core block code by only freezing the queue after taking the limits lock using the queue_limits_commit_update_frozen helper and document the callers that do not freeze the queue at all. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250110054726.1499538-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10loop: refactor queue limits updatesChristoph Hellwig
Replace loop_reconfigure_limits with a slightly less encompassing loop_update_limits that expects the caller to acquire and commit the queue limits to prepare for sorting out the freeze vs limits lock ordering. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250110054726.1499538-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10nbd: fix queue freeze vs limits lock orderChristoph Hellwig
Match the locking order used by the core block code by only freezing the queue after taking the limits lock using the queue_limits_commit_update_frozen helper. This also allows removes the need for the separate __nbd_set_size helper, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250110054726.1499538-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10block: add a queue_limits_commit_update_frozen helperChristoph Hellwig
Add a helper that freezes the queue, updates the queue limits and unfreezes the queue and convert all open coded versions of that to the new helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250110054726.1499538-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-06nbd: don't allow reconnect after disconnectYu Kuai
Following process can cause nbd_config UAF: 1) grab nbd_config temporarily; 2) nbd_genl_disconnect() flush all recv_work() and release the initial reference: nbd_genl_disconnect nbd_disconnect_and_put nbd_disconnect flush_workqueue(nbd->recv_workq) if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, ...)) nbd_config_put -> due to step 1), reference is still not zero 3) nbd_genl_reconfigure() queue recv_work() again; nbd_genl_reconfigure config = nbd_get_config_unlocked(nbd) if (!config) -> succeed if (!test_bit(NBD_RT_BOUND, ...)) -> succeed nbd_reconnect_socket queue_work(nbd->recv_workq, &args->work) 4) step 1) release the reference; 5) Finially, recv_work() will trigger UAF: recv_work nbd_config_put(nbd) -> nbd_config is freed atomic_dec(&config->recv_threads) -> UAF Fix the problem by clearing NBD_RT_BOUND in nbd_genl_disconnect(), so that nbd_genl_reconfigure() will fail. Fixes: b7aa3d39385d ("nbd: add a reconfigure netlink command") Reported-by: syzbot+6b0df248918b92c33e6a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/675bfb65.050a0220.1a2d0d.0006.GAE@google.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250103092859.3574648-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-06block: remove BLK_MQ_F_NO_SCHEDChristoph Hellwig
The only queues that really can't support a scheduler are those that do not have a gendisk associated with them, and thus can't be used for non-passthrough commands. In addition to those null_blk can optionally set the flag, which is a bad odd. Replace the null_blk usage with BLK_MQ_F_NO_SCHED_BY_DEFAULT to keep the expected semantics and then remove BLK_MQ_F_NO_SCHED as the non-disk queues never call into elevator_init_mq or blk_register_queue which adds the sysfs attributes. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250106083531.799976-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-03ps3disk: Do not use dev->bounce_size before it is setGeert Uytterhoeven
dev->bounce_size is only initialized after it is used to set the queue limits. Fix this by using BOUNCE_SIZE instead. Fixes: a7f18b74dbe17162 ("ps3disk: pass queue_limits to blk_mq_alloc_disk") Reported-by: Philipp Hortmann <philipp.g.hortmann@gmail.com> Closes: https://lore.kernel.org/39256db9-3d73-4e86-a49b-300dfd670212@gmail.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/06988f959ea6885b8bd7fb3b9059dd54bc6bbad7.1735894216.git.geert+renesas@glider.be Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-03driver core: Constify API device_find_child() and adapt for various usagesZijun Hu
Constify the following API: struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); To : struct device *device_find_child(struct device *dev, const void *data, device_match_t match); typedef int (*device_match_t)(struct device *dev, const void *data); with the following reasons: - Protect caller's match data @*data which is for comparison and lookup and the API does not actually need to modify @*data. - Make the API's parameters (@match)() and @data have the same type as all of other device finding APIs (bus|class|driver)_find_device(). - All kinds of existing device match functions can be directly taken as the API's argument, they were exported by driver core. Constify the API and adapt for various existing usages. BTW, various subsystem changes are squashed into this commit to meet 'git bisect' requirement, and this commit has the minimal and simplest changes to complement squashing shortcoming, and that may bring extra code improvement. Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for drivers/pwm Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-4-6623037414d4@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-28Merge tag 'block-6.13-20241228' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fix from Jens Axboe: "Just a single fix for ublk setup error handling" * tag 'block-6.13-20241228' of git://git.kernel.dk/linux: ublk: detach gendisk from ublk device if add_disk() fails
2024-12-26ublk: detach gendisk from ublk device if add_disk() failsMing Lei
Inside ublk_abort_requests(), gendisk is grabbed for aborting all inflight requests. And ublk_abort_requests() is called when exiting the uring context or handling timeout. If add_disk() fails, the gendisk may have been freed when calling ublk_abort_requests(), so use-after-free can be caused when getting disk's reference in ublk_abort_requests(). Fixes the bug by detaching gendisk from ublk device if add_disk() fails. Fixes: bd23f6c2c2d0 ("ublk: quiesce request queue when aborting queue") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241225110640.351531-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: remove BLK_MQ_F_SHOULD_MERGEChristoph Hellwig
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely process passthrough commands (bsg-lib, ufs tmf, various nvme admin queues) and thus don't even check the flag. Remove it to simplify the driver interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23virtio: blk/scsi: replace blk_mq_virtio_map_queues with blk_mq_map_hw_queuesDaniel Wagner
Replace all users of blk_mq_virtio_map_queues with the more generic blk_mq_map_hw_queues. This in preparation to retire blk_mq_virtio_map_queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-7-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23null_blk: Remove accesses to page->indexMatthew Wilcox (Oracle)
Use page->private to store the index instead of page->index. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241216160849.31739-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: rnull: Initialize the module in placeBenoît du Garreau
Using `InPlaceModule` avoids an allocation and an indirection. Signed-off-by: Benoît du Garreau <benoit@dugarreau.fr> Acked-by: Andreas Hindborg <a.hindborg@kernel.org> Link: https://lore.kernel.org/r/20241204-rnull_in_place-v1-1-efe3eafac9fb@dugarreau.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: Delete bio_set_prio()John Garry
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_set_prio() does nothing but set bio->bi_ioprio. All other places just set bio->bi_ioprio directly, so replace bio_set_prio() remaining callsites with setting bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23null_blk: Add rotational feature supportDamien Le Moal
To facilitate testing of kernel functions related to the rotational feature (BLK_FEAT_ROTATIONAL) of a block device (e.g. NVMe rotational bit support), add the rotational boolean configfs attribute and module parameter to the null_blk driver. If set, a null block device will report being a rotational device through it queue limits features with the BLK_FEAT_ROTATIONAL flag. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20241126000956.95983-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-18zram: fix uninitialized ZRAM not releasing backing deviceKairui Song
Setting backing device is done before ZRAM initialization. If we set the backing device, then remove the ZRAM module without initializing the device, the backing device reference will be leaked and the device will be hold forever. Fix this by always reset the ZRAM fully on rmmod or reset store. Link: https://lkml.kernel.org/r/20241209165717.94215-3-ryncsn@gmail.com Fixes: 013bf95a83ec ("zram: add interface to specif backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Desheng Wu <deshengwu@tencent.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18zram: refuse to use zero sized block device as backing deviceKairui Song
Patch series "zram: fix backing device setup issue", v2. This series fixes two bugs of backing device setting: - ZRAM should reject using a zero sized (or the uninitialized ZRAM device itself) as the backing device. - Fix backing device leaking when removing a uninitialized ZRAM device. This patch (of 2): Setting a zero sized block device as backing device is pointless, and one can easily create a recursive loop by setting the uninitialized ZRAM device itself as its own backing device by (zram0 is uninitialized): echo /dev/zram0 > /sys/block/zram0/backing_dev It's definitely a wrong config, and the module will pin itself, kernel should refuse doing so in the first place. By refusing to use zero sized device we avoided misuse cases including this one above. Link: https://lkml.kernel.org/r/20241209165717.94215-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20241209165717.94215-2-ryncsn@gmail.com Fixes: 013bf95a83ec ("zram: add interface to specif backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Desheng Wu <deshengwu@tencent.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05virtio-blk: don't keep queue frozen during system suspendMing Lei
Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's PM callbacks. And the motivation is to drain inflight IOs before suspending. block layer's queue freeze looks very handy, but it is also easy to cause deadlock, such as, any attempt to call into bio_queue_enter() may run into deadlock if the queue is frozen in current context. There are all kinds of ->suspend() called in suspend context, so keeping queue frozen in the whole suspend context isn't one good idea. And Marek reported lockdep warning[1] caused by virtio-blk's freeze queue in virtblk_freeze(). [1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/ Given the motivation is to drain in-flight IOs, it can be done by calling freeze & unfreeze, meantime restore to previous behavior by keeping queue quiesced during suspend. Cc: Yi Sun <yi.sun@unisoc.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: virtualization@lists.linux.dev Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-03block: rnull: add missing MODULE_DESCRIPTIONFUJITA Tomonori
Add the missing description to fix the following warning: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/block/rnull_mod.o Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Acked-by: Andreas Hindborg <a.hindborg@kernel.org> Link: https://lore.kernel.org/r/20241130094521.193924-1-fujita.tomonori@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-01Get rid of 'remove_new' relic from platform driver structLinus Torvalds
The continual trickle of small conversion patches is grating on me, and is really not helping. Just get rid of the 'remove_new' member function, which is just an alias for the plain 'remove', and had a comment to that effect: /* * .remove_new() is a relic from a prototype conversion of .remove(). * New drivers are supposed to implement .remove(). Once all drivers are * converted to not use .remove_new any more, it will be dropped. */ This was just a tree-wide 'sed' script that replaced '.remove_new' with '.remove', with some care taken to turn a subsequent tab into two tabs to make things line up. I did do some minimal manual whitespace adjustment for places that used spaces to line things up. Then I just removed the old (sic) .remove_new member function, and this is the end result. No more unnecessary conversion noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-30Merge tag 'block-6.13-20242901' of git://git.kernel.dk/linuxLinus Torvalds
Pull more block updates from Jens Axboe: - NVMe pull request via Keith: - Use correct srcu list traversal (Breno) - Scatter-gather support for metadata (Keith) - Fabrics shutdown race condition fix (Nilay) - Persistent reservations updates (Guixin) - Add the required bits for MD atomic write support for raid0/1/10 - Correct return value for unknown opcode in ublk - Fix deadlock with zone revalidation - Fix for the io priority request vs bio cleanups - Use the correct unsigned int type for various limit helpers - Fix for a race in loop - Cleanup blk_rq_prep_clone() to prevent uninit-value warning and make it easier for actual humans to read - Fix potential UAF when iterating tags - A few fixes for bfq-iosched UAF issues - Fix for brd discard not decrementing the allocated page count - Various little fixes and cleanups * tag 'block-6.13-20242901' of git://git.kernel.dk/linux: (36 commits) brd: decrease the number of allocated pages which discarded block, bfq: fix bfqq uaf in bfq_limit_depth() block: Don't allow an atomic write be truncated in blkdev_write_iter() mq-deadline: don't call req_get_ioprio from the I/O completion handler block: Prevent potential deadlock in blk_revalidate_disk_zones() block: Remove extra part pointer NULLify in blk_rq_init() nvme: tuning pr code by using defined structs and macros nvme: introduce change ptpl and iekey definition block: return bool from get_disk_ro and bdev_read_only block: remove a duplicate definition for bdev_read_only block: return bool from blk_rq_aligned block: return unsigned int from blk_lim_dma_alignment_and_pad block: return unsigned int from queue_dma_alignment block: return unsigned int from bdev_io_opt block: req->bio is always set in the merge code block: don't bother checking the data direction for merges block: blk-mq: fix uninit-value in blk_rq_prep_clone and refactor Revert "block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()" md/raid10: Atomic write support md/raid1: Atomic write support ...
2024-11-29brd: decrease the number of allocated pages which discardedZhang Xianwei
The number of allocated pages which discarded will not decrease. Fix it. Fixes: 9ead7efc6f3f ("brd: implement discard support") Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241128170056565nPKSz2vsP8K8X2uk2iaDG@zte.com.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-26Merge tag 'rust-6.13' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Enable a series of lints, including safety-related ones, e.g. the compiler will now warn about missing safety comments, as well as unnecessary ones. How safety documentation is organized is a frequent source of review comments, thus having the compiler guide new developers on where they are expected (and where not) is very nice. - Start using '#[expect]': an interesting feature in Rust (stabilized in 1.81.0) that makes the compiler warn if an expected warning was _not_ emitted. This is useful to avoid forgetting cleaning up locally ignored diagnostics ('#[allow]'s). - Introduce '.clippy.toml' configuration file for Clippy, the Rust linter, which will allow us to tweak its behaviour. For instance, our first use cases are declaring a disallowed macro and, more importantly, enabling the checking of private items. - Lints-related fixes and cleanups related to the items above. - Migrate from 'receiver_trait' to 'arbitrary_self_types': to get the kernel into stable Rust, one of the major pieces of the puzzle is the support to write custom types that can be used as 'self', i.e. as receivers, since the kernel needs to write types such as 'Arc' that common userspace Rust would not. 'arbitrary_self_types' has been accepted to become stable, and this is one of the steps required to get there. - Remove usage of the 'new_uninit' unstable feature. - Use custom C FFI types. Includes a new 'ffi' crate to contain our custom mapping, instead of using the standard library 'core::ffi' one. The actual remapping will be introduced in a later cycle. - Map '__kernel_{size_t,ssize_t,ptrdiff_t}' to 'usize'/'isize' instead of 32/64-bit integers. - Fix 'size_t' in bindgen generated prototypes of C builtins. - Warn on bindgen < 0.69.5 and libclang >= 19.1 due to a double issue in the projects, which we managed to trigger with the upcoming tracepoint support. It includes a build test since some distributions backported the fix (e.g. Debian -- thanks!). All major distributions we list should be now OK except Ubuntu non-LTS. 'macros' crate: - Adapt the build system to be able run the doctests there too; and clean up and enable the corresponding doctests. 'kernel' crate: - Add 'alloc' module with generic kernel allocator support and remove the dependency on the Rust standard library 'alloc' and the extension traits we used to provide fallible methods with flags. Add the 'Allocator' trait and its implementations '{K,V,KV}malloc'. Add the 'Box' type (a heap allocation for a single value of type 'T' that is also generic over an allocator and considers the kernel's GFP flags) and its shorthand aliases '{K,V,KV}Box'. Add 'ArrayLayout' type. Add 'Vec' (a contiguous growable array type) and its shorthand aliases '{K,V,KV}Vec', including iterator support. For instance, now we may write code such as: let mut v = KVec::new(); v.push(1, GFP_KERNEL)?; assert_eq!(&v, &[1]); Treewide, move as well old users to these new types. - 'sync' module: add global lock support, including the 'GlobalLockBackend' trait; the 'Global{Lock,Guard,LockedBy}' types and the 'global_lock!' macro. Add the 'Lock::try_lock' method. - 'error' module: optimize 'Error' type to use 'NonZeroI32' and make conversion functions public. - 'page' module: add 'page_align' function. - Add 'transmute' module with the existing 'FromBytes' and 'AsBytes' traits. - 'block::mq::request' module: improve rendered documentation. - 'types' module: extend 'Opaque' type documentation and add simple examples for the 'Either' types. drm/panic: - Clean up a series of Clippy warnings. Documentation: - Add coding guidelines for lints and the '#[expect]' feature. - Add Ubuntu to the list of distributions in the Quick Start guide. MAINTAINERS: - Add Danilo Krummrich as maintainer of the new 'alloc' module. And a few other small cleanups and fixes" * tag 'rust-6.13' of https://github.com/Rust-for-Linux/linux: (82 commits) rust: alloc: Fix `ArrayLayout` allocations docs: rust: remove spurious item in `expect` list rust: allow `clippy::needless_lifetimes` rust: warn on bindgen < 0.69.5 and libclang >= 19.1 rust: use custom FFI integer types rust: map `__kernel_size_t` and friends also to usize/isize rust: fix size_t in bindgen prototypes of C builtins rust: sync: add global lock support rust: macros: enable the rest of the tests rust: macros: enable paste! use from macro_rules! rust: enable macros::module! tests rust: kbuild: expand rusttest target for macros rust: types: extend `Opaque` documentation rust: block: fix formatting of `kernel::block::mq::request` module rust: macros: fix documentation of the paste! macro rust: kernel: fix THIS_MODULE header path in ThisModule doc comment rust: page: add Rust version of PAGE_ALIGN rust: helpers: remove unnecessary header includes rust: exports: improve grammar in commentary drm/panic: allow verbose version check ...
2024-11-23Merge tag 'mm-stable-2024-11-18-19-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...
2024-11-21Merge tag 'reiserfs_delete' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs removal from Jan Kara: "The deprecation period of reiserfs is ending at the end of this year so it is time to remove it" * tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: The last commit
2024-11-19ublk: fix error code for unsupported commandMing Lei
ENOTSUPP is for kernel use only, and shouldn't be sent to userspace. Fix it by replacing it with EOPNOTSUPP. Cc: stable@vger.kernel.org Fixes: bfbcef036396 ("ublk_drv: move ublk_get_device_from_id into ublk_ctrl_uring_cmd") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241119030646.2319030-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-19loop: Fix ABBA locking raceOGAWA Hirofumi
Current loop calls vfs_statfs() while holding the q->limits_lock. If FS takes some locking in vfs_statfs callback, this may lead to ABBA locking bug (at least, FAT fs has this issue actually). So this patch calls vfs_statfs() outside q->limits_locks instead, because looks like no reason to hold q->limits_locks while getting discord configs. Chain exists of: &sbi->fat_lock --> &q->q_usage_counter(io)#17 --> &q->limits_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->limits_lock); lock(&q->q_usage_counter(io)#17); lock(&q->limits_lock); lock(&sbi->fat_lock); *** DEADLOCK *** Reported-by: syzbot+a5d8c609c02f508672cc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a5d8c609c02f508672cc Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-14zram: fix NULL pointer in comp_algorithm_show()Liu Shixin
LTP reported a NULL pointer dereference as followed: CPU: 7 UID: 0 PID: 5995 Comm: cat Kdump: loaded Not tainted 6.12.0-rc6+ #3 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __pi_strcmp+0x24/0x140 lr : zcomp_available_show+0x60/0x100 [zram] sp : ffff800088b93b90 x29: ffff800088b93b90 x28: 0000000000000001 x27: 0000000000400cc0 x26: 0000000000000ffe x25: ffff80007b3e2388 x24: 0000000000000000 x23: ffff80007b3e2390 x22: ffff0004041a9000 x21: ffff80007b3e2900 x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: ffff80007b3e2900 x9 : ffff80007b3cb280 x8 : 0101010101010101 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000040 x4 : 0000000000000000 x3 : 00656c722d6f7a6c x2 : 0000000000000000 x1 : ffff80007b3e2900 x0 : 0000000000000000 Call trace: __pi_strcmp+0x24/0x140 comp_algorithm_show+0x40/0x70 [zram] dev_attr_show+0x28/0x80 sysfs_kf_seq_show+0x90/0x140 kernfs_seq_show+0x34/0x48 seq_read_iter+0x1d4/0x4e8 kernfs_fop_read_iter+0x40/0x58 new_sync_read+0x9c/0x168 vfs_read+0x1a8/0x1f8 ksys_read+0x74/0x108 __arm64_sys_read+0x24/0x38 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x38/0x138 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x188/0x190 The zram->comp_algs[ZRAM_PRIMARY_COMP] can be NULL in zram_add() if comp_algorithm_set() has not been called. User can access the zram device by sysfs after device_add_disk(), so there is a time window to trigger the NULL pointer dereference. Move it ahead device_add_disk() to make sure when user can access the zram device, it is ready. comp_algorithm_set() is protected by zram->init_lock in other places and no such problem. Link: https://lkml.kernel.org/r/20241108100147.3776123-1-liushixin2@huawei.com Fixes: 7ac07a26dea7 ("zram: preparation for multi-zcomp support") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-13block: don't reorder requests in blk_add_rq_to_plugChristoph Hellwig
Add requests to the tail of the list instead of the front so that they are queued up in submission order. Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs and nvme_queue_rqs now that the list is ordered as expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>