summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-05-11ptrace: introduce PTRACE_SET_SYSCALL_INFO requestDmitry V. Levin
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of system calls the tracee is blocked in. This API allows ptracers to obtain and modify system call details in a straightforward and architecture-agnostic way, providing a consistent way of manipulating the system call number and arguments across architectures. As in case of PTRACE_GET_SYSCALL_INFO, PTRACE_SET_SYSCALL_INFO also does not aim to address numerous architecture-specific system call ABI peculiarities, like differences in the number of system call arguments for such system calls as pread64 and preadv. The current implementation supports changing only those bits of system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO, such as instruction pointer and stack pointer, could be added later if needed, by using struct ptrace_syscall_info.flags to specify the additional details that should be set. Currently, "flags" and "reserved" fields of struct ptrace_syscall_info must be initialized with zeroes; "arch", "instruction_pointer", and "stack_pointer" fields are currently ignored. PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY, PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations. Other operations could be added later if needed. Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure to provide an API of changing the first system call argument on riscv architecture. ptrace(2) man page: long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data); ... PTRACE_SET_SYSCALL_INFO Modify information about the system call that caused the stop. The "data" argument is a pointer to struct ptrace_syscall_info that specifies the system call information to be set. The "addr" argument should be set to sizeof(struct ptrace_syscall_info)). Link: https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/ Link: https://lkml.kernel.org/r/20250303112044.GF24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Reviewed-by: Alexey Gladkov <legion@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Eugene Syromiatnikov <esyr@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Maciej W. Rozycki <macro@orcam.me.uk> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11syscall.h: introduce syscall_set_nr()Dmitry V. Levin
Similar to syscall_set_arguments() that complements syscall_get_arguments(), introduce syscall_set_nr() that complements syscall_get_nr(). syscall_set_nr() is going to be needed along with syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. Link: https://lkml.kernel.org/r/20250303112020.GD24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> # mips Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11syscall.h: add syscall_set_arguments()Dmitry V. Levin
This function is going to be needed on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. This partially reverts commit 7962c2eddbfe ("arch: remove unused function syscall_set_arguments()") by reusing some of old syscall_set_arguments() implementations. [nathan@kernel.org: fix compile time fortify checks] Link: https://lkml.kernel.org/r/20250408213131.GA2872426@ax162 Link: https://lkml.kernel.org/r/20250303112009.GC24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> [mips] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: annotate data race in update_hiwater_rssIgnacio Encinas
mm_struct.hiwater_rss can be accessed concurrently without proper synchronization as reported by KCSAN. This data race is benign as it only affects accounting information. Annotate it with data_race() to make KCSAN happy. Link: https://lkml.kernel.org/r/20250331-mm-maxrss-data-race-v2-1-cf958e6205bf@iencinas.com Signed-off-by: Ignacio Encinas <ignacio@iencinas.com> Reported-by: syzbot+419c4b42acc36c420ad3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67e3390c.050a0220.1ec46.0001.GAE@google.com/ Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Pedro Falcato <pfalcato@suse.de> Cc: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm/compaction: use folio in hugetlb pathwayVishal Moola (Oracle)
Use a folio in the hugetlb pathway during the compaction migrate-able pageblock scan. This removes a call to compound_head(). Link: https://lkml.kernel.org/r/20250401021025.637333-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11memory: implement memory_block_advise/probe_max_sizeGregory Price
Patch series "memory,x86,acpi: hotplug memory alignment advisement", v8. When physical address regions are not aligned to memory block size, the misaligned portion is lost (stranded capacity). Block size (min/max/selected) is architecture defined. Most architectures tend to use the minimum block size or some simplistic heurist. On x86, memory block size increases up to 2GB, and is otherwise fitted to the alignment of non-hotplug (i.e. not special purpose memory). CXL exposes its memory for management through the ACPI CEDT (CXL Early Detection Table) in a field called the CXL Fixed Memory Window. Per the CXL specification, this memory must be aligned to at least 256MB. When a CFMW aligns on a size less than the block size, this causes a loss of up to 2GB per CFMW on x86. It is not uncommon for CFMW to be allocated per-device - though this behavior is BIOS defined. This patch set provides 3 things: 1) implement advise/query functions in driverse/base/memory.c to report/query architecture agnostic hotplug block alignment advice. 2) update x86 memblock size logic to consider the hotplug advice 3) add code in acpi/numa/srat.c to report CFMW alignment advice The advisement interfaces are design to be called during arch_init code prior to allocator and smp_init. start_kernel will call these through setup_arch() (via acpi and mm/init_64.c on x86), which occurs prior to mm_core_init and smp_init - so no need for atomics. There's an attempt to signal callers to advise() that query has already occurred, but this is predicated on the notion that query actually occurs (which presently only happens on the x86 arch). This is to assist debugging future users. Otherwise, the advise() call has been marked __init to help static discovery of bad call times. Once query is called the first time, it will always return the same value. Interfaces return -EBUSY and 0 respectively on systems without hotplug. This patch (of 3): Hotplug memory sources may have opinions on what the memblock size should be - usually for alignment purposes. For example, CXL memory extents can be 256MB with a matching alignment. If this size/alignment is smaller than the block size, it can result in stranded capacity. Implement memory_block_advise_max_size for use prior to allocator init, for software to advise the system on the max block size. Implement memory_block_probe_max_size for use by arch init code to calculate the best block size. Use of advice is architecture defined. The probe value can never change after first probe. Calls to advise after probe will return -EBUSY to aid debugging. On systems without hotplug, always return -ENODEV and 0 respectively. Link: https://lkml.kernel.org/r/20250127153405.3379117-1-gourry@gourry.net Link: https://lkml.kernel.org/r/20250127153405.3379117-2-gourry@gourry.net Signed-off-by: Gregory Price <gourry@gourry.net> Suggested-by: Ira Weiny <ira.weiny@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Bruno Faccini <bfaccini@nvidia.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haibo Xu <haibo1.xu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Len Brown <lenb@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Robert Richter <rrichter@amd.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11zsmalloc: prefer the the original page's node for compressed dataNhat Pham
Currently, zsmalloc, zswap's and zram's backend memory allocator, does not enforce any policy for the allocation of memory for the compressed data, instead just adopting the memory policy of the task entering reclaim, or the default policy (prefer local node) if no such policy is specified. This can lead to several pathological behaviors in multi-node NUMA systems: 1. Systems with CXL-based memory tiering can encounter the following inversion with zswap/zram: the coldest pages demoted to the CXL tier can return to the high tier when they are reclaimed to compressed swap, creating memory pressure on the high tier. 2. Consider a direct reclaimer scanning nodes in order of allocation preference. If it ventures into remote nodes, the memory it compresses there should stay there. Trying to shift those contents over to the reclaiming thread's preferred node further *increases* its local pressure, and provoking more spills. The remote node is also the most likely to refault this data again. This undesirable behavior was pointed out by Johannes Weiner in [1]. 3. For zswap writeback, the zswap entries are organized in node-specific LRUs, based on the node placement of the original pages, allowing for targeted zswap writeback for specific nodes. However, the compressed data of a zswap entry can be placed on a different node from the LRU it is placed on. This means that reclaim targeted at one node might not free up memory used for zswap entries in that node, but instead reclaiming memory in a different node. All of these issues will be resolved if the compressed data go to the same node as the original page. This patch encourages this behavior by having zswap and zram pass the node of the original page to zsmalloc, and have zsmalloc prefer the specified node if we need to allocate new (zs)pages for the compressed data. Note that we are not strictly binding the allocation to the preferred node. We still allow the allocation to fall back to other nodes when the preferred node is full, or if we have zspages with slots available on a different node. This is OK, and still a strict improvement over the status quo: 1. On a system with demotion enabled, we will generally prefer demotions over compressed swapping, and only swap when pages have already gone to the lowest tier. This patch should achieve the desired effect for the most part. 2. If the preferred node is out of memory, letting the compressed data going to other nodes can be better than the alternative (OOMs, keeping cold memory unreclaimed, disk swapping, etc.). 3. If the allocation go to a separate node because we have a zspage with slots available, at least we're not creating extra immediate memory pressure (since the space is already allocated). 3. While there can be mixings, we generally reclaim pages in same-node batches, which encourage zspage grouping that is more likely to go to the right node. 4. A strict binding would require partitioning zsmalloc by node, which is more complicated, and more prone to regression, since it reduces the storage density of zsmalloc. We need to evaluate the tradeoff and benchmark carefully before adopting such an involved solution. [1]: https://lore.kernel.org/linux-mm/20250331165306.GC2110528@cmpxchg.org/ [senozhatsky@chromium.org: coding-style fixes] Link: https://lkml.kernel.org/r/mnvexa7kseswglcqbhlot4zg3b3la2ypv2rimdl5mh5glbmhvz@wi6bgqn47hge Link: https://lkml.kernel.org/r/20250402204416.3435994-1-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: Gregory Price <gourry@gourry.net> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> [zram, zsmalloc] Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> [zswap/zsmalloc] Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: delete thp_nr_pages()Matthew Wilcox (Oracle)
All callers now use folio_nr_pages(). Delete this wrapper. Link: https://lkml.kernel.org/r/20250402210612.2444135-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11filemap: remove readahead_page_batch()Matthew Wilcox (Oracle)
This function has no more callers; delete it. Link: https://lkml.kernel.org/r/20250402210612.2444135-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11filemap: convert __readahead_batch() to use a folioMatthew Wilcox (Oracle)
Extract folios from i_mapping, not pages. Removes a hidden call to compound_head(), a use of thp_nr_pages() and an unnecessary assertion that we didn't find a tail page in the page cache. Link: https://lkml.kernel.org/r/20250402210612.2444135-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11filemap: remove find_subpage()Matthew Wilcox (Oracle)
All users of this function now call folio_file_page() instead. Delete it. Link: https://lkml.kernel.org/r/20250402210612.2444135-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: remove offset_in_thp()Matthew Wilcox (Oracle)
All callers have been converted to call offset_in_folio(). Link: https://lkml.kernel.org/r/20250402210612.2444135-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11filemap: remove readahead_page()Matthew Wilcox (Oracle)
Patch series "Misc folio patches for 6.16". Remove a few APIs that we've converted everybody from using. I also found a few places that extract a page pointer from i_pages, which will be an invalid thing to do when we separate pages from folios. This patch (of 8): All filesystems have now been converted to call readahead_folio() so we can delete this wrapper. Link: https://lkml.kernel.org/r/20250402210612.2444135-1-willy@infradead.org Link: https://lkml.kernel.org/r/20250402210612.2444135-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11arch: remove mk_pmd()Matthew Wilcox (Oracle)
There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them. Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: add folio_mk_pmd()Matthew Wilcox (Oracle)
Removes five conversions from folio to page. Also removes both callers of mk_pmd() that aren't part of mk_huge_pmd(), getting us a step closer to removing the confusion between mk_pmd(), mk_huge_pmd() and pmd_mkhuge(). Link: https://lkml.kernel.org/r/20250402181709.2386022-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: remove mk_huge_pte()Matthew Wilcox (Oracle)
The only remaining user of mk_huge_pte() is the debug code, so remove the API and replace its use with pfn_pte() which lets us remove the conversion to a page first. We should always call arch_make_huge_pte() to turn this PTE into a huge PTE before operating on it with huge_pte_mkdirty() etc. Link: https://lkml.kernel.org/r/20250402181709.2386022-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: add folio_mk_pte()Matthew Wilcox (Oracle)
Remove a cast from folio to page in four callers of mk_pte(). Link: https://lkml.kernel.org/r/20250402181709.2386022-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: make mk_pte() definition unconditionalMatthew Wilcox (Oracle)
All architectures now use the common mk_pte() definition, so we can remove the condition. Link: https://lkml.kernel.org/r/20250402181709.2386022-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm: introduce a common definition of mk_pte()Matthew Wilcox (Oracle)
Most architectures simply call pfn_pte(). Centralise that as the normal definition and remove the definition of mk_pte() from the architectures which have either that exact definition or something similar. Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Cc: Zi Yan <ziy@nvidia.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm/codetag: move tag retrieval back upfront in __free_pages()David Wang
Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks") introduces a possible use-after-free scenario, when page is non-compound, page[0] could be released by other thread right after put_page_testzero failed in current thread, pgalloc_tag_sub_pages afterwards would manipulate an invalid page for accounting remaining pages: [timeline] [thread1] [thread2] | alloc_page non-compound V | get_page, rf counter inc V | in ___free_pages | put_page_testzero fails V | put_page, page released V | in ___free_pages, | pgalloc_tag_sub_pages | manipulate an invalid page V Restore __free_pages() to its state before, retrieve alloc tag beforehand. Link: https://lkml.kernel.org/r/20250505193034.91682-1-00107082@163.com Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks") Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11Merge tag 'its-for-linus-20250509' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ITS mitigation from Dave Hansen: "Mitigate Indirect Target Selection (ITS) issue. I'd describe this one as a good old CPU bug where the behavior is _obviously_ wrong, but since it just results in bad predictions it wasn't wrong enough to notice. Well, the researchers noticed and also realized that thus bug undermined a bunch of existing indirect branch mitigations. Thus the unusually wide impact on this one. Details: ITS is a bug in some Intel CPUs that affects indirect branches including RETs in the first half of a cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the second half of a cacheline. Researchers at VUSec found this behavior and reported to Intel. Affected processors: - Cascade Lake, Cooper Lake, Whiskey Lake V, Coffee Lake R, Comet Lake, Ice Lake, Tiger Lake and Rocket Lake. Scope of impact: - Guest/host isolation: When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to direct branches in the guest. - Intra-mode using cBPF: cBPF can be used to poison the branch history to exploit ITS. Realigning the indirect branches and RETs mitigates this attack vector. - User/kernel: With eIBRS enabled user/kernel isolation is *not* impacted by ITS. - Indirect Branch Prediction Barrier (IBPB): Due to this bug indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This will be fixed in the microcode. Mitigation: As indirect branches in the first half of cacheline are affected, the mitigation is to replace those indirect branches with a call to thunk that is aligned to the second half of the cacheline. RETs that take prediction from RSB are not affected, but they may be affected by RSB-underflow condition. So, RETs in the first half of cacheline are also patched to a return thunk that executes the RET aligned to second half of cacheline" * tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftest/x86/bugs: Add selftests for ITS x86/its: FineIBT-paranoid vs ITS x86/its: Use dynamic thunks for indirect branches x86/ibt: Keep IBT disabled during alternative patching mm/execmem: Unify early execmem_cache behaviour x86/its: Align RETs in BHB clear sequence to avoid thunking x86/its: Add support for RSB stuffing mitigation x86/its: Add "vmexit" option to skip mitigation on some CPUs x86/its: Enable Indirect Target Selection mitigation x86/its: Add support for ITS-safe return thunk x86/its: Add support for ITS-safe indirect thunk x86/its: Enumerate Indirect Target Selection (ITS) bug Documentation: x86/bugs/its: Add ITS documentation
2025-05-11nfsd: add a tracepoint for nfsd_setattrJeff Layton
Turn Sargun's internal kprobe based implementation of this into a normal static tracepoint. Also, remove the dprintk's that got added recently with the fix for zero-length ACLs. Cc: Sargun Dillon <sargun@sargun.me> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11sunrpc: add info about xprt queue times to svc_xprt_dequeue tracepointJeff Layton
I've been looking at a problem where we see increased RPC timeouts in clients when the nfs_layout_flexfiles dataserver_timeo value is tuned very low (6s). This is necessary to ensure quick failover to a different mirror if a server goes down, but it causes a lot more major RPC timeouts. Ultimately, the problem is server-side however. It's sometimes doesn't respond to connection attempts. My theory is that the interrupt handler runs when a connection comes in, the xprt ends up being enqueued, but it takes a significant amount of time for the nfsd thread to pick it up. Currently, the svc_xprt_dequeue tracepoint displays "wakeup-us". This is the time between the wake_up() call, and the thread dequeueing the xprt. If no thread was woken, or the thread ended up picking up a different xprt than intended, then this value won't tell us how long the xprt was waiting. Add a new xpt_qtime field to struct svc_xprt and set it in svc_xprt_enqueue(). When the dequeue tracepoint fires, also store the time that the xprt sat on the queue in total. Display it as "qtime-us". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-05-11dt-bindings: arm: qcom,ids: add SoC ID for SM8750Mukesh Ojha
Add the unique ID for Qualcomm SM8750 SoC. This value is used to differentiate the SoC across qcom targets. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Reviewed-by: Melody Olvera <melody.olvera@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250508134635.1627031-2-mukesh.ojha@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-05-12Merge branch 'fixes' into for-nextIlpo Järvinen
Resolve conflicts in dell/alienware-wmi-wmax and asus-wmi, and enable applying a few amd/hsmp patches that depend on changes in the fixes branch.
2025-05-12Merge tag 'amd-drm-next-6.16-2025-05-09' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.16-2025-05-09: amdgpu: - IPS fixes - DSC cleanup - DC Scaling updates - DC FP fixes - Fused I2C-over-AUX updates - SubVP fixes - Freesync fix - DMUB AUX fixes - VCN fix - Hibernation fixes - HDP fixes - DCN 2.1 fixes - DPIA fixes - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - Misc code cleanups - SR-IOV updates - RAS updates - PSP 12 cleanups amdkfd: - Update error messages for SDMA - Userptr updates drm: - Add drm_file_err function dma-buf: - Add a helper to sort and deduplicate dma_fence arrays From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-11Merge tag 'timers-urgent-2025-05-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc timers fixes from Ingo Molnar: - Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks - Work around absolute relocations into vDSO code that GCC erroneously emits in certain arm64 build environments - Fix a false positive lockdep warning in the i8253 clocksource driver * tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable() arm64: vdso: Work around invalid absolute relocations from GCC timekeeping: Prevent coarse clocks going backwards
2025-05-11ALSA: ump: Fix a typo of snd_ump_stream_msg_device_infoTakashi Iwai
s/devince/device/ It's used only internally, so no any behavior changes. Fixes: 37e0e14128e0 ("ALSA: ump: Support UMP Endpoint and Function Block parsing") Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20250511141147.10246-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-05-11ALSA: wavefront: remove snd_wavefront_xxx()Kuninori Morimoto
There is no snd_wavefront_xxx() implementation, and no one is using it. Let's remove it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/87msbmpqls.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-05-11ALSA: isa/gus: remove snd_gf1_lfo_xxx()Kuninori Morimoto
There is no snd_gf1_lfo_xxx() implementation, and no one is using it. Let's remove it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/87o6w2pqm8.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-05-11ALSA/hda: intel-sdw-acpi: Correct sdw_intel_acpi_scan() function parameterPeter Ujfalusi
The acpi_handle should be just a handle and not a pointer in sdw_intel_acpi_scan() parameter list. It is called with 'acpi_handle handle' as parameter and it is passing it to acpi_walk_namespace, which also expects acpi_handle and not acpi_handle* Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://patch.msgid.link/20250508181207.22113-1-peter.ujfalusi@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-05-11futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in ↵Ingo Molnar
futex_mm_init() The following commit added an rcu_assign_pointer() assignment to futex_mm_init() in <linux/futex.h>: bd54df5ea7ca ("futex: Allow to resize the private local hash") Which breaks the build on older compilers (gcc-9, x86-64 defconfig): CC io_uring/futex.o In file included from ./arch/x86/include/generated/asm/rwonce.h:1, from ./include/linux/compiler.h:390, from ./include/linux/array_size.h:5, from ./include/linux/kernel.h:16, from io_uring/futex.c:2: ./include/linux/futex.h: In function 'futex_mm_init': ./include/linux/rcupdate.h:555:36: error: dereferencing pointer to incomplete type 'struct futex_private_hash' The problem is that this variant of rcu_assign_pointer() wants to know the full type of 'struct futex_private_hash', which type is local to futex.c: kernel/futex/core.c:struct futex_private_hash { There are a couple of mechanical solutions for this bug: - we can uninline futex_mm_init() and move it into futex/core.c - or we can share the structure definition with kernel/fork.c. But both of these solutions have disadvantages: the first one adds runtime overhead, while the second one dis-encapsulates private futex types. A third solution, implemented by this patch, is to just initialize mm->futex_phash with NULL like the patch below, it's not like this new MM's ->futex_phash can be observed externally until the task is inserted into the task list, which guarantees full store ordering. The relaxation of this initialization might also give a tiny speedup on certain platforms. Fixes: bd54df5ea7ca ("futex: Allow to resize the private local hash") Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/aB8SI00EHBri23lB@gmail.com
2025-05-10Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates" * tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: fix folio_pte_batch() on XEN PV nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs() mm/hugetlb: copy the CMA flag when demoting mm, swap: fix false warning for large allocation with !THP_SWAP selftests/mm: fix a build failure on powerpc selftests/mm: fix build break when compiling pkey_util.c mm: vmalloc: support more granular vrealloc() sizing tools/testing/selftests: fix guard region test tmpfs assumption ocfs2: stop quota recovery before disabling quotas ocfs2: implement handshaking with ocfs2 recovery thread ocfs2: switch osb->disable_recovery to enum mailmap: map Uwe's BayLibre addresses to a single one MAINTAINERS: add mm THP section mm/userfaultfd: fix uninitialized output field for -EAGAIN race selftests/mm: compaction_test: support platform with huge mount of memory MAINTAINERS: add core mm section ocfs2: fix panic in failed foilio allocation mm/huge_memory: fix dereferencing invalid pmd migration entry MAINTAINERS: add reverse mapping section x86: disable image size check for test builds ...
2025-05-10dt-bindings: clock: add SM6350 QCOM video clock bindingsKonrad Dybcio
Add device tree bindings for video clock controller for SM6350 SoCs. Signed-off-by: Konrad Dybcio <konradybcio@kernel.org> Co-developed-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250324-sm6350-videocc-v2-2-cc22386433f4@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-05-10Merge tag 'char-misc-6.15-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc/IIO driver fixes from Greg KH: "Here are a bunch of small driver fixes (mostly all IIO) for 6.15-rc6. Included in here are: - loads of tiny IIO driver fixes for reported issues - hyperv driver fix for a much-reported and worked on sysfs ring buffer creation bug All of these have been in linux-next for over a week (the IIO ones for many weeks now), with no reported issues" * tag 'char-misc-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits) Drivers: hv: Make the sysfs node size for the ring buffer dynamic uio_hv_generic: Fix sysfs creation path for ring buffer iio: adis16201: Correct inclinometer channel resolution iio: adc: ad7606: fix serial register access iio: pressure: mprls0025pa: use aligned_s64 for timestamp iio: imu: adis16550: align buffers for timestamp staging: iio: adc: ad7816: Correct conditional logic for store mode iio: adc: ad7266: Fix potential timestamp alignment issue. iio: adc: ad7768-1: Fix insufficient alignment of timestamp. iio: adc: dln2: Use aligned_s64 for timestamp iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64 iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer. iio: chemical: pms7003: use aligned_s64 for timestamp iio: chemical: sps30: use aligned_s64 for timestamp iio: imu: inv_mpu6050: align buffer for timestamp iio: imu: st_lsm6dsx: Fix wakeup source leaks on device unbind iio: adc: qcom-spmi-iadc: Fix wakeup source leaks on device unbind iio: accel: fxls8962af: Fix wakeup source leaks on device unbind iio: adc: ad7380: fix event threshold shift iio: hid-sensor-prox: Fix incorrect OFFSET calculation ...
2025-05-10md: clean up accounting for issued sync IOYu Kuai
It's no longer used and can be removed, also remove the field 'gendisk->sync_io'. Link: https://lore.kernel.org/linux-raid/20250506124903.2540268-10-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com>
2025-05-10block: export API to get the number of bdev inflight IOYu Kuai
- rename part_in_{flight, flight_rw} to bdev_count_{inflight, inflight_rw} - export bdev_count_inflight, to fix a problem in mdraid that foreground IO can be starved by background sync IO in later patches Link: https://lore.kernel.org/linux-raid/20250506124903.2540268-6-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de>
2025-05-10dt-bindings: clock: sun50i-h616-ccu: Add LVDS resetChris Morgan
Add the required LVDS reset binding for the LCD TCON. Signed-off-by: Chris Morgan <macromorgan@hotmail.com> Signed-off-by: Ryan Walklin <ryan@testtoast.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://patch.msgid.link/20250507201943.330111-2-macroalpha82@gmail.com Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2025-05-09net: dsa: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vladimir Oltean
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert DSA to the new API, so that the ndo_eth_ioctl() path can be removed completely. Move the ds->ops->port_hwtstamp_get() and ds->ops->port_hwtstamp_set() calls from dsa_user_ioctl() to dsa_user_hwtstamp_get() and dsa_user_hwtstamp_set(). Due to the fact that the underlying ifreq type changes to kernel_hwtstamp_config, the drivers and the Ocelot switchdev front-end, all hooked up directly or indirectly, must also be converted all at once. The conversion also updates the comment from dsa_port_supports_hwtstamp(), which is no longer true because kernel_hwtstamp_config is kernel memory and does not need copy_to_user(). I've deliberated whether it is necessary to also update "err != -EOPNOTSUPP" to a more general "!err", but all drivers now either return 0 or -EOPNOTSUPP. The existing logic from the ocelot_ioctl() function, to avoid configuring timestamping if the PHY supports the operation, is obsoleted by more advanced core logic in dev_set_hwtstamp_phylib(). This is only a partial preparation for proper PHY timestamping support. None of these switch driver currently sets up PTP traps for PHY timestamping, so setting dev->see_all_hwtstamp_requests is not yet necessary and the conversion is relatively trivial. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # felix, sja1105, mv88e6xxx Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250508095236.887789-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09ethtool: Block setting of symmetric RSS when non-symmetric rx-flow-hash is ↵Gal Pressman
requested Symmetric RSS hash requires that: * No other fields besides IP src/dst and/or L4 src/dst are set * If src is set, dst must also be set This restriction was only enforced when RXNFC was configured after symmetric hash was enabled. In the opposite order of operations (RXNFC then symmetric enablement) the check was not performed. Perform the sanity check on set_rxfh as well, by iterating over all flow types hash fields and making sure they are all symmetric. Introduce a function that returns whether a flow type is hashable (not spec only) and needs to be iterated over. To make sure that no one forgets to update the list of hashable flow types when adding new flow types, a static assert is added to draw the developer's attention. The conversion of uapi #defines to enum is not ideal, but as Jakub mentioned [1], we have precedent for that. [1] https://lore.kernel.org/netdev/20250324073509.6571ade3@kernel.org/ Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250508103034.885536-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09Merge tag 'memory-controller-drv-6.16' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers Memory controller drivers for v6.16 1. Mediatek: Add support for MT6893 MTK SMI. 2. STM32: Add new driver for STM32 Octo Memory Manager (OMM), which manages muxing between two OSPI busses. 3. Several cleanups and minor improvements (OMAP GPMC, Kconfig entries, BT1 L2). * tag 'memory-controller-drv-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl: MAINTAINERS: add entry for STM32 OCTO MEMORY MANAGER driver memory: Add STM32 Octo Memory Manager driver dt-bindings: memory-controllers: Add STM32 Octo Memory Manager controller bus: firewall: Fix missing static inline annotations for stubs memory: bt1-l2-ctl: replace scnprintf() with sysfs_emit() memory: mtk-smi: Add support for Dimensity 1200 MT6893 SMI dt-bindings: memory: mtk-smi: Add support for MT6893 memory: tegra: Do not enable by default during compile testing memory: Simplify 'default' choice in Kconfig memory: omap-gpmc: remove GPIO set() and direction_output() callbacks memory: omap-gpmc: use the dedicated define for GPIO direction Link: https://lore.kernel.org/r/20250508093451.55755-2-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-09Merge tag 'memory-controller-drv-renesas-6.16' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers Renesas memory controller drivers for v6.16 Improvements and new device support for the Renesas RPC IF memory controller driver: 1. Minor cleanup and improvements. 2. Refactor the driver to accommodate for newly added Renesas RZ/G3E support: - Acquire two resets instead of only one, - Add RZ/G3E xSPI support with different register layout and its own, new interface for Renesas SPI. * tag 'memory-controller-drv-renesas-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl: memory: renesas-rpc-if: Add missing static keyword memory: renesas-rpc-if: Add RZ/G3E xSPI support memory: renesas-rpc-if: Add wrapper functions memory: renesas-rpc-if: Add regmap to struct rpcif_info memory: renesas-rpc-if: Use devm_reset_control_array_get_exclusive() memory: renesas-rpc-if: Move rpc-if reg definitions dt-bindings: memory: Document RZ/G3E support memory: renesas-rpc-if: Move rpcif_info definitions near to the user memory: renesas-rpc-if: Fix RPCIF_DRENR_CDB macro error Link: https://lore.kernel.org/r/20250508090749.51379-2-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-09Merge tag 'scmi-updates-6.16' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm SCMI updates for v6.16 1. Quirk framework to handle buggy firmware With SCMI gaining broader adoption across arm64 platforms, it's increasingly important to address how we consistently manage out-of-spec SCMI firmware already deployed in the field. This change introduces a lightweight quirk framework built around static_keys, enabling developers to: - Define quirks and their match criteria, which can include: o A list of compatibles ({ comp, comp2, NULL }) o Vendor ID / Sub-Vendor ID o Firmware implementation version ranges ([Min_Vers, Max_Vers]) Matching proceeds from the most specific (longest match) to the least specific. NULL entries are treated as wildcards (i.e., match any value). This flexibility allows matching very specific combinations or just a general compatible string. The quirk code blocks/snippets implementing the workaround are placed near their intended usage and guarded by a static_key that's tied to the quirk. Once the SCMI core stack is initialized and retrieves platform info via the base protocol, any matching quirks will have their associated static_keys enabled. 2. Quirk for Qualcomm X1E platforms On some Qualcomm X1E platforms, such as the Lenovo ThinkPad T14s, the SCMI firmware fails to set the FastChannel support bit for PERF_LEVEL_GET, yet it crashes when the driver attempts to fall back to standard messaging which is clearly out-of-spec behavior. To work around this, the new SCMI quirk framework is used to unconditionally enable FC initialization for this firmware version. In the future, once the fixed firmware version is identified, an upper version bound can be added to the quirk match criteria. Alternatively, matching can be further restricted using a SoC-specific compatible string if always enabling FC proves problematic elsewhere. 3. Support for NXP i.MX LMM/CPU vendor protocol extensions The i.MX95 System Manager (SM) implements Logical Machine Management (LMM) and a CPU protocol to manage Logical Machines (LM) and CPUs (e.g., M7). These changes integrate the vendor-specific protocol extensions implementing the LMM and CPU protocols for the i.MX95, facilitating standardized communication between the operating system and the platform's firmware, which will be used by remoteproc drivers. The changes also include the necessary device tree bindings. 4. Miscellaneous cleanups/changes These mainly include polling support in SCMI raw mode. The cleanups centralize error logging for SCMI device creation into a single helper function, consolidate the device matching logic into a single function, and ensure that devices must have a name for registration—removing support for unnamed devices when matching drivers and devices for probing. Transport devices are now excluded from bus matching, and the correct assignment of the parent device for the arm-scmi platform device is ensured in the transport drivers. * tag 'scmi-updates-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_scmi: quirk: Force perf level get fastchannel firmware: arm_scmi: quirk: Fix CLOCK_DESCRIBE_RATES triplet firmware: arm_scmi: Add common framework to handle firmware quirks firmware: arm_scmi: Ensure that the message-id supports fastchannel MAINTAINERS: add entry for i.MX SCMI extensions firmware: imx: Add i.MX95 SCMI CPU driver firmware: imx: Add i.MX95 SCMI LMM driver firmware: arm_scmi: imx: Add i.MX95 CPU Protocol firmware: arm_scmi: imx: Add i.MX95 LMM protocol dt-bindings: firmware: Add i.MX95 SCMI LMM and CPU protocol firmware: arm_scmi: imx: Add LMM and CPU documentation firmware: arm_scmi: Add polling support to raw mode firmware: arm_scmi: Exclude transport devices from bus matching firmware: arm_scmi: Assign correct parent to arm-scmi platform device firmware: arm_scmi: Refactor error logging from SCMI device creation to single helper firmware: arm_scmi: Refactor device matching logic to eliminate duplication firmware: arm_scmi: Ensure scmi_devices are always matched by name as well Link: https://lore.kernel.org/r/20250507134713.49039-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-09Merge tag 'samsung-drivers-6.16' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers Samsung SoC drivers for v6.16 Several improvements to Exynos ACPM (Alive Clock and Power Manager) driver: 1. Handle communication timeous better. 2. Avoid sleeping, so users (PMIC) can still transfer during system shutdown. 3. Fix reading longer messages from them firmware. 4. Deferred probe improvements. 5. Model the user of ACPM - PMIC - a as child device and export devm_acpm_get_by_node() for such use case. * tag 'samsung-drivers-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: firmware: exynos-acpm: Correct kerneldoc and use typical np argument name firmware: exynos-acpm: introduce devm_acpm_get_by_node() firmware: exynos-acpm: populate devices from device tree data firmware: exynos-acpm: silence EPROBE_DEFER error on boot firmware: exynos-acpm: fix reading longer results dt-bindings: firmware: google,gs101-acpm-ipc: add PMIC child node firmware: exynos-acpm: allow use during system shutdown firmware: exynos-acpm: use ktime APIs for timeout detection Link: https://lore.kernel.org/r/20250501103541.13795-2-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-09Merge tag 'renesas-dts-for-v6.16-tag1' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into soc/dt Renesas DTS updates for v6.16 - Add SDHI, ICU, I2C, PMIC, and GPU support on the RZ/G3E SoC and the RZ/G3E SoM and SMARC Carrier-II EVK development board, - Add internal SDHI regulator support on the RZ/V2H(P) SoC, - Add UFS tuning parameters in E-FUSE on the R-Car S4-8 ES1.2 SoC, - Add support for Ethernet ports C and D, I2C, keys, and SDHI on the RZ/N1D SoC and the RZN1D-DB and RZN1D-EB development and expansion boards, - Add initial support for the RZ/V2N (R9A09G056) and the RZ/V2N EVK board, - Add support for the Retronix Sparrow Hawk board, which is based on R-Car V4H ES3.0, - Add ISP core support on R-Car V3U, V4H, and V4M, - Miscellaneous fixes and improvements. * tag 'renesas-dts-for-v6.16-tag1' of https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel: (29 commits) arm64: dts: renesas: r8a779h0: Add ISP core function block arm64: dts: renesas: r8a779g0: Add ISP core function block arm64: dts: renesas: r8a779a0: Add ISP core function block arm64: dts: renesas: r8a779g3: Add Retronix R-Car V4H Sparrow Hawk board support arm64: dts: renesas: rzg3e-smarc-som: Enable Mali-G52 arm64: dts: renesas: r9a09g047: Add Mali-G52 GPU node arm64: dts: renesas: rzg3e-smarc-som: Add RAA215300 pmic support arm64: dts: renesas: rzg3e-smarc-som: Add I2C2 device pincontrol ARM: dts: renesas: r9a06g032-rzn1d400-eb: describe SD card port ARM: dts: renesas: r9a06g032: Describe SDHCI controllers arm64: dts: renesas: Add initial device tree for RZ/V2N EVK arm64: dts: renesas: Add initial SoC DTSI for RZ/V2N dt-bindings: pinctrl: renesas: Document RZ/V2N SoC dt-bindings: clock: renesas: Document RZ/V2N SoC CPG dt-bindings: soc: renesas: Document SYS for RZ/V2N SoC dt-bindings: soc: renesas: Document Renesas RZ/V2N SoC variants and EVK ARM: dts: renesas: r9a06g032-rzn1d400-db: Describe keys ARM: dts: renesas: r9a06g032-rzn1d400-eb: Describe I2C bus ARM: dts: renesas: r9a06g032-rzn1d400-db: Describe I2C bus ARM: dts: renesas: r9a06g032: Describe I2C controllers ... Link: https://lore.kernel.org/r/cover.1745582596.git.geert+renesas@glider.be Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-09x86/its: Use dynamic thunks for indirect branchesPeter Zijlstra
ITS mitigation moves the unsafe indirect branches to a safe thunk. This could degrade the prediction accuracy as the source address of indirect branches becomes same for different execution paths. To improve the predictions, and hence the performance, assign a separate thunk for each indirect callsite. This is also a defense-in-depth measure to avoid indirect branches aliasing with each other. As an example, 5000 dynamic thunks would utilize around 16 bits of the address space, thereby gaining entropy. For a BTB that uses 32 bits for indexing, dynamic thunks could provide better prediction accuracy over fixed thunks. Have ITS thunks be variable sized and use EXECMEM_MODULE_TEXT such that they are both more flexible (got to extend them later) and live in 2M TLBs, just like kernel code, avoiding undue TLB pressure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09mm/execmem: Unify early execmem_cache behaviourPeter Zijlstra
Early kernel memory is RWX, only at the end of early boot (before SMP) do we mark things ROX. Have execmem_cache mirror this behaviour for early users. This avoids having to remember what code is execmem and what is not -- we can poke everything with impunity ;-) Also performance for not having to do endless text_poke_mm switches. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-09x86/its: Enable Indirect Target Selection mitigationPawan Gupta
Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with eIBRS. It affects prediction of indirect branch and RETs in the lower half of cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the upper half of the cacheline. Scope of impact =============== Guest/host isolation -------------------- When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to branches in the guest. Intra-mode ---------- cBPF or other native gadgets can be used for intra-mode training and disclosure using ITS. User/kernel isolation --------------------- When eIBRS is enabled user/kernel isolation is not impacted. Indirect Branch Prediction Barrier (IBPB) ----------------------------------------- After an IBPB, indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This is mitigated by a microcode update. Add cmdline parameter indirect_target_selection=off|on|force to control the mitigation to relocate the affected branches to an ITS-safe thunk i.e. located in the upper half of cacheline. Also add the sysfs reporting. When retpoline mitigation is deployed, ITS safe-thunks are not needed, because retpoline sequence is already ITS-safe. Similarly, when call depth tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return thunk is not used, as CDT prevents RSB-underflow. To not overcomplicate things, ITS mitigation is not supported with spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy lfence;jmp mitigation on ITS affected parts anyways. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
2025-05-10Merge tag 'drm-intel-next-2025-05-08' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next Non-display related: - Fix undefined reference to `intel_pxp_gsccs_is_ready_for_sessions' Display related: - More work towards display separation (Jani) - Stop writing VRR_CTL_IGN_MAX_SHIFT for MTL onwards (Jouni) - DSC checks for 3 engines (Ankit) - Add link rate and lane count to i915_display_info (Khaled) - PSR fixes and workaround for underrun on idle (Jouni) - LOBF enablement and ALMP fixes (Animesh) - Clean up VGA plane handling (Ville) - Use an intel_connector pointer everywhere (Imre) - Fix warning for coffeelake on SunrisePoint PCH (Jiajia) - Rework/Correction on minimum hblank calculation (Arun) - Dmesg clean up (Jani) - Add a couple of simple display workarounds (Ankit, Vinod) - Refactor HDCP GSC (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/aByyL3bEufPu79OM@intel.com
2025-05-09bpf: Add support to retrieve ref_ctr_offset for uprobe perf linkJiri Olsa
Adding support to retrieve ref_ctr_offset for uprobe perf link, which got somehow omitted from the initial uprobe link info changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/bpf/20250509153539.779599-2-jolsa@kernel.org