summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-25mm: vmalloc: actually use the in-place vrealloc regionKees Cook
Patch series "mm: vmalloc: Actually use the in-place vrealloc region". This fixes a performance regression[1] with vrealloc()[1]. The refactoring to not build a new vmalloc region only actually worked when shrinking. Actually return the resized area when it grows. Ugh. Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1] Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25alloc_tag: allocate percpu counters for module tags dynamicallySuren Baghdasaryan
When a module gets unloaded it checks whether any of its tags are still in use and if so, we keep the memory containing module's allocation tags alive until all tags are unused. However percpu counters referenced by the tags are freed by free_module(). This will lead to UAF if the memory allocated by a module is accessed after module was unloaded. To fix this we allocate percpu counters for module allocation tags dynamically and we keep it alive for tags which are still in use after module unloading. This also removes the requirement of a larger PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because percpu memory for counters does not need to be reserved anymore. Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Tested-by: David Wang <00107082@163.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25module: release codetag section when module load failsDavid Wang
When module load fails after memory for codetag section is ready, codetag section memory will not be properly released. This causes memory leak, and if next module load happens to get the same module address, codetag may pick the uninitialized section when manipulating tags during module unload, and leads to "unable to handle page fault" BUG. Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/cma: make detection of highmem_start more robustMike Rapoport (Microsoft)
Pratyush Yadav reports the following crash: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/physaddr.c:23! ception 0x06 IP 10:ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000 CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:__phys_addr+0x58/0x60 Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 0000:ffffffff82803dd8 EFLAGS: 00010006 ORIG_RAX: 0000000000000000 RAX: 000000007fffffff RBX: 00000000ffffffff RCX: 0000000000000000 RDX: 000000007fffffff RSI: 0000000280000000 RDI: ffffffffffffffff RBP: ffffffff82803e68 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff83153180 R11: ffffffff82803e48 R12: ffffffff83c9aed0 R13: 0000000000000000 R14: 0000001040000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88903ffff000 CR3: 0000000002838000 CR4: 00000000000000b0 Call Trace: <TASK> ? __cma_declare_contiguous_nid+0x6e/0x340 ? cma_declare_contiguous_nid+0x33/0x70 ? dma_contiguous_reserve_area+0x2f/0x70 ? setup_arch+0x6f1/0x870 ? start_kernel+0x52/0x4b0 ? x86_64_start_reservations+0x29/0x30 ? x86_64_start_kernel+0x7c/0x80 ? common_startup_64+0x13e/0x141 The reason is that __cma_declare_contiguous_nid() does: highmem_start = __pa(high_memory - 1) + 1; If dma_contiguous_reserve_area() (or any other CMA declaration) is called before free_area_init(), high_memory is uninitialized. Without CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for highmem_start. The issue occurs because commit e120d1bc12da ("arch, mm: set high_memory in free_area_init()") moved initialization of high_memory after the call to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several architectures. In the case CONFIG_HIGHMEM is enabled, some architectures that actually support HIGHMEM (arm, powerpc and x86) have initialization of high_memory before a possible call to __cma_declare_contiguous_nid() and some initialized high_memory late anyway (arc, csky, microblase, mips, sparc, xtensa) even before the commit e120d1bc12da so they are fine with using uninitialized value of high_memory. And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes the first address after memory end, so instead of relying on high_memory to calculate highmem_start use memblock_end_of_DRAM() and eliminate the dependency of CMA area creation on high_memory in majority of configurations. Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25perf/headers: Clean up <linux/perf_event.h> a bitIngo Molnar
Do a bit of readability spring cleaning: - Fix misaligned structure member in perf_addr_filter: the new struct perf_addr_filter::action member was too long, but when it was added it was not aligned properly. Align all fields to the customary column 41 alignment of most of the rest of the header. - Adjust the vertical alignment of the definition of other structures and definitions as well, so that the 'most of' in the previous paragraph changes to 'all of'. ;-) - Prettify the assignments in perf_clear_branch_entry_bitfields() - Move comments from CPP definitions to outside the macro - Move perf_guest_info_callbacks and related defines from the front of the header closer to where it's used within the header. - And more #endif markers for larger CPP blocks and standardize #if/#else/#endif blocks to the following nomenclature: #ifdef CONFIG_FOO ... #else /* !CONFIG_FOO: */ ... #endif /* !CONFIG_FOO */ - Standardize on consistently using the 'extern' storage class where appropriate, we had cases where method prototypes sometimes omitted the storage class: extern void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu); int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running); extern u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running); Which is obviously a bit confusing and adds unnecessary noise. - s/__u64/u64 and similar cleanups: there's no point in using __u64 in non-UAPI headers, and doing so only adds unnecessary visual noise. - Harmonize all multi-parameter function prototypes along the following style: extern struct perf_event * perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, struct task_struct *task, perf_overflow_handler_t callback, void *context); - etc. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-25erofs: support DEFLATE decompression by using Intel QATBo Liu
This patch introduces the use of the Intel QAT to offload EROFS data decompression, aiming to improve the decompression performance. A 285MiB dataset is used with the following command to create EROFS images with different cluster sizes: $ mkfs.erofs -zdeflate,level=9 -C{4096,16384,65536,131072,262144} Fio is used to test the following read patterns: $ fio -filename=testfile -bs=4k -rw=read -name=job1 $ fio -filename=testfile -bs=4k -rw=randread -name=job1 $ fio -filename=testfile -bs=4k -rw=randread --io_size=14m -name=job1 Here are some performance numbers for reference: Processors: Intel(R) Xeon(R) 6766E (144 cores) Memory: 512 GiB |-----------------------------------------------------------------------------| | | Cluster size | sequential read | randread | small randread(5%) | |-----------|--------------|-----------------|-----------|--------------------| | Intel QAT | 4096 | 538 MiB/s | 112 MiB/s | 20.76 MiB/s | | Intel QAT | 16384 | 699 MiB/s | 158 MiB/s | 21.02 MiB/s | | Intel QAT | 65536 | 917 MiB/s | 278 MiB/s | 20.90 MiB/s | | Intel QAT | 131072 | 1056 MiB/s | 351 MiB/s | 23.36 MiB/s | | Intel QAT | 262144 | 1145 MiB/s | 431 MiB/s | 26.66 MiB/s | | deflate | 4096 | 499 MiB/s | 108 MiB/s | 21.50 MiB/s | | deflate | 16384 | 422 MiB/s | 125 MiB/s | 18.94 MiB/s | | deflate | 65536 | 452 MiB/s | 159 MiB/s | 13.02 MiB/s | | deflate | 131072 | 452 MiB/s | 177 MiB/s | 11.44 MiB/s | | deflate | 262144 | 466 MiB/s | 194 MiB/s | 10.60 MiB/s | Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250522094931.28956-1-liubo03@inspur.com [ Gao Xiang: refine the commit message. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-24Merge tag 'input-for-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - even more Xbox controllers added to xpad driver: Turtle Beach Recon Wired Controller, Turtle Beach Stealth Ultra, and PowerA Wired Controller - a fix to Synaptics RMI driver to not crash if controller reports unsupported version of F34 (firmware flash) function * tag 'input-for-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics-rmi - fix crash with unsupported versions of F34 Input: xpad - add more controllers
2025-05-24Merge tag 'spi-fix-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few final fixes for v6.15, some driver fixes for the Freescale DSPI driver pulled over from their vendor code and another instance of the fixes Greg has been sending throughout the kernel for constification of the bus_type in driver core match() functions" * tag 'spi-fix-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-fsl-dspi: Reset SR flags before sending a new message spi: spi-fsl-dspi: Halt the module after a new message transfer spi: spi-fsl-dspi: restrict register range for regmap access spi: use container_of_cont() for to_spi_device()
2025-05-24Merge tag 'iommu-fixes-v6.15-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fix from Joerg Roedel: - core: skip PASID validation for devices without PASID support * tag 'iommu-fixes-v6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: Skip PASID validation for devices without PASID capability
2025-05-23bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGEKent Overstreet
Large folios aren't supported without TRANSPARENT_HUGEPAGE Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix btree_iter_next_node() for new locking assertsKent Overstreet
We can't unlock a should_be_locked path unless we're in a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Ensure we don't use a blacklisted journal seqKent Overstreet
Different versions differ on the size of the blacklist range; it is theoretically possible that we could end up with blacklisted journal sequence numbers newer than the newest seq we find in the journal, and pick a new start seq that's blacklisted. Explicitly check for this in bch2_fs_journal_start(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Small check_fix_ptr fixesKent Overstreet
We don't want to change the bucket gen, on gen mismatch: it's possible to have multiple btree nodes with different gens in the same bucket that we want to keep, if we have to recover from btree node scan. It's also not necessary to set g->gen_valid; add a comment to that effect. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix opts.recovery_pass_lastKent Overstreet
This was lost in the giant recovery pass rework - but it's used heavily by bcachefs subcommand utilities. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix allocate -> self healing pathKent Overstreet
When we go to allocate and find taht a bucket in the freespace btree is actually allocated, we're supposed to return nonzero to tell the allocator to skip it. This fixes an emergency read only due to a bucket/ptr gen mismatch - we also don't return the correct bucket gen when this happens. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix endianness in casefold check/repairKent Overstreet
Fixes: 010c89468134 ("bcachefs: Check for casefolded dirents in non casefolded dirs") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23Merge tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes pull, on target to be quiet, just one amdgpu, one edid and a few minor xe fixes. edid: - fix HDR metadata reset amdgpu: - Hibernate fix xe: - Make sure to check all forcewakes when dumping mocs - Fix wrong use of read64 on 32b register - Synchronize Panther Lake PCI IDs" * tag 'drm-fixes-2025-05-24' of https://gitlab.freedesktop.org/drm/kernel: drm/xe/ptl: Update the PTL pci id table drm/xe: Use xe_mmio_read32() to read mtcfg register drm/xe/mocs: Check if all domains awake Revert "drm/amd: Keep display off while going into S4" drm/edid: fixed the bug that hdr metadata was not reset
2025-05-24Merge tag 'drm-xe-fixes-2025-05-23' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Make sure to check all forcewakes when dumping mocs - Fix wrong use of read64 on 32b register - Synchronize Panther Lake PCI IDs Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/uixp5cq7emz32lmwwvq4vbujppugfozhyj3cm2aqzx4lcg7ivn@m2khvf4kvz5p
2025-05-24Merge tag 'amd-drm-fixes-6.15-2025-05-22' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.15-2025-05-22: amdgpu: - Hibernate fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522183941.9606-1-alexander.deucher@amd.com
2025-05-24Merge tag 'drm-misc-fixes-2025-05-22' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: edid: - fix HDR metadata reset Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/r/20250522113902.GA7000@localhost.localdomain
2025-05-23Merge tag 'thermal-6.15-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "This fixes a coding mistake in the x86_pkg_temp_thermal Intel thermal driver that was introduced by an incorrect conflict resolution during a merge (Zhang Rui)" * tag 'thermal-6.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: x86_pkg_temp_thermal: Fix bogus trip temperature
2025-05-23tpm_crb: ffa_tpm: fix/update comments describing the CRB over FFA ABIStuart Yoder
-Fix the comment describing the 'start' function, which was a cut/paste mistake for a different function. -The comment for DIRECT_REQ and DIRECT_RESP only mentioned AArch32 and listed 32-bit function IDs. Update to include 64-bit. Signed-off-by: Stuart Yoder <stuart.yoder@arm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23tpm_crb_ffa: use dev_xx() macro to print logYeoreum Yun
Instead of pr_xxx() macro, use dev_xxx() to print log. This patch changes some error log level to warn log level when the tpm_crb_ffa secure partition doesn't support properly but system can run without it. (i.e) unsupport of direct message ABI or unsupported ABI version Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23tpm_ffa_crb: access tpm service over FF-A direct message request v2Yeoreum Yun
For secure partition with multi service, tpm_ffa_crb can access tpm service with direct message request v2 interface according to chapter 3.3, TPM Service Command Response Buffer Interface Over FF-A specificationi v1.0 BET. This patch reflects this spec to access tpm service over FF-A direct message request v2 ABI. Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23tpm: remove kmalloc failure error messageColin Ian King
The kmalloc failure message is just noise. Remove it and replace -EFAULT with -ENOMEM as standard for out of memory allocation error returns. Link: https://lore.kernel.org/linux-integrity/20250430083435.860146-1-colin.i.king@gmail.com/ Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23Merge tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix for rename regression due to the recent VFS lookup changes - Fix write failure - locking fix for oplock handling * tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use list_first_entry_or_null for opinfo_get_list() ksmbd: fix rename failure ksmbd: fix stream write failure
2025-05-23selftests: ublk: add test for UBLK_F_QUIESCEMing Lei
Add test generic_11 for covering new control command of UBLK_U_CMD_QUIESCE_DEV. Add 'quiesce -n dev_id' sub-command on ublk utility for transitioning device state to quiesce states, then verify the feature via generic_10 by doing quiesce and recovery. Cc: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23ublk: add feature UBLK_F_QUIESCEMing Lei
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV` for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED` or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server cooperation. This feature can help to support to upgrade ublk server application by shutting down ublk server gracefully, meantime keep ublk block device persistent during the upgrading period. The feature is only available for UBLK_F_USER_RECOVERY. Suggested-by: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZEMing Lei
Add test generic_10 for covering new control command of UBLK_U_CMD_UPDATE_SIZE. Add 'update_size -s|--size size_in_bytes' sub-command on ublk utility for supporting this feature, then verify the feature via generic_10. Cc: Omri Mann <omri@nvidia.com> Cc: Jared Holzman <jholzman@nvidia.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23traceevent/block: Add REQ_ATOMIC flag to block trace eventsRitesh Harjani (IBM)
Filesystems like XFS can implement atomic write I/O using either REQ_ATOMIC flag set in the bio or via CoW operation. It will be useful if we have a flag in trace events to distinguish between the two. This patch adds char 'U' (Untorn writes) to rwbs field of the trace events if REQ_ATOMIC flag is set in the bio. <W/ REQ_ATOMIC> ================= xfs_io-4238 [009] ..... 4148.126843: block_rq_issue: 259,0 WFSU 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [009] d.h1. 4148.129864: block_rq_complete: 259,0 WFSU () 768 + 32 none,0,0 [0] <W/O REQ_ATOMIC> =============== xfs_io-4237 [010] ..... 4143.325616: block_rq_issue: 259,0 WS 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [010] d.H1. 4143.329138: block_rq_complete: 259,0 WS () 768 + 32 none,0,0 [0] Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/44317cb2ec4588f6a2c1501a96684e6a1196e8ba.1747921498.git.ritesh.list@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23Merge tag 'soc-fixes-6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "A few last minute fixes: - two driver fixes for samsung/google platforms, both addressing mistakes in changes from the 6.15 merge window - a revert for an allwinner devicetree change that caused problems - a fix for an older regression with the LEDs on Marvell Armada 3720 - a defconfig change to enable chacha20 again after a crypto subsystem change in 6.15 inadventently turned it off" * tag 'soc-fixes-6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: defconfig: Ensure CRYPTO_CHACHA20_NEON is selected arm64: dts: marvell: uDPU: define pinctrl state for alarm LEDs Revert "arm64: dts: allwinner: h6: Use RSB for AXP805 PMIC connection" soc: samsung: usi: prevent wrong bits inversion during unconfiguring firmware: exynos-acpm: check saved RX before bailing out on empty RX queue
2025-05-23Merge tag 'platform-drivers-x86-v6.15-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: - dell-wmi-sysman: Avoid buffer overflow in current_password_store() - fujitsu-laptop: Support Lifebook S2110 hotkeys - intel/pmc: Fix Arrow Lake U/H NPU PCI ID - think-lmi: Fix attribute name usage for non-compliant items - thinkpad_acpi: Ignore battery threshold change event notification * tag 'platform-drivers-x86-v6.15-6' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID platform/x86: think-lmi: Fix attribute name usage for non-compliant items platform/x86: thinkpad_acpi: Ignore battery threshold change event notification platform/x86: dell-wmi-sysman: Avoid buffer overflow in current_password_store() platform/x86: fujitsu-laptop: Support Lifebook S2110 hotkeys
2025-05-23Merge tag 'vfs-6.15-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a small set of fixes for the blocking buffer lookup conversion done earlier this cycle. It adds a missing conversion in the getblk slowpath and a few minor optimizations and cleanups" * tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/buffer: optimize discard_buffer() fs/buffer: remove superfluous statements fs/buffer: avoid redundant lookup in getblk slowpath fs/buffer: use sleeping lookup in __getblk_slowpath()
2025-05-23io_uring/cmd: warn on reg buf imports by ineligible cmdsPavel Begunkov
For IORING_URING_CMD_FIXED-less commands io_uring doesn't pull buf_index from the sqe, so imports might succeed if the index coincide, e.g. when it's 0, but otherwise it's error prone. Warn if someone tries to import without the flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/a1c2c88e53c3fe96978f23d50c6bc66c2c79c337.1747991070.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23statmount: update STATMOUNT_SUPPORTED macroDmitry V. Levin
According to commit 8f6116b5b77b ("statmount: add a new supported_mask field"), STATMOUNT_SUPPORTED macro shall be updated whenever a new flag is added. Fixes: 7a54947e727b ("Merge patch series "fs: allow changing idmappings"") Signed-off-by: "Dmitry V. Levin" <ldv@strace.io> Link: https://lore.kernel.org/20250511224953.GA17849@strace.io Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23fs: convert mount flags to enumStephen Brennan
In prior kernel versions (5.8-6.8), commit 9f6c61f96f2d9 ("proc/mounts: add cursor") introduced MNT_CURSOR, a flag used by readers from /proc/mounts to keep their place while reading the file. Later, commit 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree") removed this flag and its value has since been repurposed. For debuggers iterating over the list of mounts, cursors should be skipped as they are irrelevant. Detecting whether an element is a cursor can be difficult. Since the MNT_CURSOR flag is a preprocessor constant, it's not present in debuginfo, and since its value is repurposed, we cannot hard-code it. For this specific issue, cursors are possible to detect in other ways, but ideally, we would be able to read the mount flag definitions out of the debuginfo. For that reason, convert the mount flags to an enum. Link: https://github.com/osandov/drgn/pull/496 Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Link: https://lore.kernel.org/20250507223402.2795029-1-stephen.s.brennan@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23->mnt_devname is never NULLAl Viro
Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()" back in 2018. Get rid of the dead checks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250421033509.GV2023217@ZenIV Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23mount: add a comment about concurrent changes with statmount()/listmount()Christian Brauner
Add some comments in there highlighting a few non-obvious assumptions. Link: https://lore.kernel.org/20250416-zerknirschen-aluminium-14a55639076f@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23io_uring/io-wq: only create a new worker if it can make progressJens Axboe
Hashed work is serialized by io-wq, intended to be used for cases like serializing buffered writes to a regular file, where the file system will serialize the workers anyway with a mutex or similar. Since they would be forcibly serialized and blocked, it's more efficient for io-wq to handle these individually rather than issue them in parallel. If a worker is currently handling a hashed work item and gets blocked, don't create a new worker if the next work item is also hashed and mapped to the same bucket. That new worker would not be able to make any progress anyway. Reported-by: Fengnan Chang <changfengnan@bytedance.com> Reported-by: Diangang Li <lidiangang@bytedance.com> Link: https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: ignore non-busy worker going to sleepJens Axboe
When an io-wq worker goes to sleep, it checks if there's work to do. If there is, it'll create a new worker. But if this worker is currently idle, it'll either get woken right back up immediately, or someone else has already created the necessary worker to handle this work. Only go through the worker creation logic if the current worker is currently handling a work item. That means it's being scheduled out as part of handling that work, not just going to sleep on its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: move hash helpers to the topJens Axboe
Just in preparation for using them higher up in the function, move these generic helpers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23bcachefs: Path must be locked if trans->locked && should_be_lockedKent Overstreet
If path->should_be_locked is true, that means user code (of the btree API) has seen, in this transaction, something guarded by the node this path has locked, and we have to keep it locked until the end of the transaction. Assert that we're not violating this; should_be_locked should also be cleared only in _very_ special situations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Simplify bch2_path_put()Kent Overstreet
Simplify the "do we need to keep this locked?" checks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Plumb btree_trans for more locking assertsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Clear trans->locked before unlockKent Overstreet
We're adding new should_be_locked assertions: it's going to be illegal to unlock a should_be_locked path when trans->locked is true. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Clear should_be_locked before unlock in key_cache_drop()Kent Overstreet
We're adding new should_be_locked assertions, also add a comment explaining why clearing should_be_locked is safe here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_lockedKent Overstreet
Small additional optimization over the previous patch, bringing us closer to the original behaviour, except when we need to clone to avoid a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Give out new path if upgrade failsKent Overstreet
Avoid transaction restarts due to failure to upgrade - we can traverse a new iterator without a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix btree_path_get_locks when not doing trans restartKent Overstreet
btree_path_get_locks, on failure, shouldn't unlock if we're not issuing a transaction restart: we might drop locks we're not supposed to (if path->should_be_locked is set). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: btree_node_locked_type_nowrite()Kent Overstreet
Small helper to improve locking assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>