summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-11-24mm/memremap: add driver callback support for folio splittingBalbir Singh
When a zone device page is split (via huge pmd folio split). The driver callback for folio_split is invoked to let the device driver know that the folio size has been split into a smaller order. Provide a default implementation for drivers that do not provide this callback that copies the pgmap and mapping fields for the split folios. Update the HMM test driver to handle the split. Link: https://lkml.kernel.org/r/20251001065707.920170-11-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24lib/test_hmm: add zone device private THP test infrastructureBalbir Singh
Enhance the hmm test driver (lib/test_hmm) with support for THP pages. A new pool of free_folios() has now been added to the dmirror device, which can be allocated when a request for a THP zone device private page is made. Add compound page awareness to the allocation function during normal migration and fault based migration. These routines also copy folio_nr_pages() when moving data between system memory and device memory. args.src and args.dst used to hold migration entries are now dynamically allocated (as they need to hold HPAGE_PMD_NR entries or more). Split and migrate support will be added in future patches in this series. Link: https://lkml.kernel.org/r/20251001065707.920170-10-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/memory/fault: add THP fault handling for zone device private pagesBalbir Singh
Implement CPU fault handling for zone device THP entries through do_huge_pmd_device_private(), enabling transparent migration of device-private large pages back to system memory on CPU access. When the CPU accesses a zone device THP entry, the fault handler calls the device driver's migrate_to_ram() callback to migrate the entire large page back to system memory. Link: https://lkml.kernel.org/r/20251001065707.920170-9-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/migrate_device: implement THP migration of zone device pagesBalbir Singh
MIGRATE_VMA_SELECT_COMPOUND will be used to select THP pages during migrate_vma_setup() and MIGRATE_PFN_COMPOUND will make migrating device pages as compound pages during device pfn migration. migrate_device code paths go through the collect, setup and finalize phases of migration. The entries in src and dst arrays passed to these functions still remain at a PAGE_SIZE granularity. When a compound page is passed, the first entry has the PFN along with MIGRATE_PFN_COMPOUND and other flags set (MIGRATE_PFN_MIGRATE, MIGRATE_PFN_VALID), the remaining entries (HPAGE_PMD_NR - 1) are filled with 0's. This representation allows for the compound page to be split into smaller page sizes. migrate_vma_collect_hole(), migrate_vma_collect_pmd() are now THP page aware. Two new helper functions migrate_vma_collect_huge_pmd() and migrate_vma_insert_huge_pmd_page() have been added. migrate_vma_collect_huge_pmd() can collect THP pages, but if for some reason this fails, there is fallback support to split the folio and migrate it. migrate_vma_insert_huge_pmd_page() closely follows the logic of migrate_vma_insert_page() Support for splitting pages as needed for migration will follow in later patches in this series. Link: https://lkml.kernel.org/r/20251001065707.920170-8-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/huge_memory: add device-private THP support to PMD operationsBalbir Singh
Extend core huge page management functions to handle device-private THP entries. This enables proper handling of large device-private folios in fundamental MM operations. The following functions have been updated: - copy_huge_pmd(): Handle device-private entries during fork/clone - zap_huge_pmd(): Properly free device-private THP during munmap - change_huge_pmd(): Support protection changes on device-private THP - __pte_offset_map(): Add device-private entry awareness Link: https://lkml.kernel.org/r/20251001065707.920170-4-balbirs@nvidia.com Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/zone_device: rename page_free callback to folio_freeBalbir Singh
Change page_free to folio_free to make the folio support for zone device-private more consistent. The PCI P2PDMA callback has also been updated and changed to folio_free() as a result. For drivers that do not support folios (yet), the folio is converted back into page via &folio->page and the page is used as is, in the current callback implementation. Link: https://lkml.kernel.org/r/20251001065707.920170-3-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/zone_device: support large zone device private foliosBalbir Singh
Patch series "mm: support device-private THP", v7. This patch series introduces support for Transparent Huge Page (THP) migration in zone device-private memory. The implementation enables efficient migration of large folios between system memory and device-private memory Background Current zone device-private memory implementation only supports PAGE_SIZE granularity, leading to: - Increased TLB pressure - Inefficient migration between CPU and device memory This series extends the existing zone device-private infrastructure to support THP, leading to: - Reduced page table overhead - Improved memory bandwidth utilization - Seamless fallback to base pages when needed In my local testing (using lib/test_hmm) and a throughput test, the series shows a 350% improvement in data transfer throughput and a 80% improvement in latency These patches build on the earlier posts by Ralph Campbell [1] Two new flags are added in vma_migration to select and mark compound pages. migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize() support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND is passed in as arguments. The series also adds zone device awareness to (m)THP pages along with fault handling of large zone device private pages. page vma walk and the rmap code is also zone device aware. Support has also been added for folios that might need to be split in the middle of migration (when the src and dst do not agree on MIGRATE_PFN_COMPOUND), that occurs when src side of the migration can migrate large pages, but the destination has not been able to allocate large pages. The code supported and used folio_split() when migrating THP pages, this is used when MIGRATE_VMA_SELECT_COMPOUND is not passed as an argument to migrate_vma_setup(). The test infrastructure lib/test_hmm.c has been enhanced to support THP migration. A new ioctl to emulate failure of large page allocations has been added to test the folio split code path. hmm-tests.c has new test cases for huge page migration and to test the folio split path. A new throughput test has been added as well. The nouveau dmem code has been enhanced to use the new THP migration capability. mTHP support: The patches hard code, HPAGE_PMD_NR in a few places, but the code has been kept generic to support various order sizes. With additional refactoring of the code support of different order sizes should be possible. The future plan is to post enhancements to support mTHP with a rough design as follows: 1. Add the notion of allowable thp orders to the HMM based test driver 2. For non PMD based THP paths in migrate_device.c, check to see if a suitable order is found and supported by the driver 3. Iterate across orders to check the highest supported order for migration 4. Migrate and finalize The mTHP patches can be built on top of this series, the key design elements that need to be worked out are infrastructure and driver support for multiple ordered pages and their migration. HMM support for large folios was added in 10b9feee2d0d ("mm/hmm: populate PFNs from PMD swap entry"). This patch (of 16) Add routines to support allocation of large order zone device folios and helper functions for zone device folios, to check if a folio is device private and helpers for setting zone device data. When large folios are used, the existing page_free() callback in pgmap is called when the folio is freed, this is true for both PAGE_SIZE and higher order pages. Zone device private large folios do not support deferred split and scan like normal THP folios. Link: https://lkml.kernel.org/r/20251001065707.920170-1-balbirs@nvidia.com Link: https://lkml.kernel.org/r/20251001065707.920170-2-balbirs@nvidia.com Link: https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/ [1] Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24irqchip/gic: Expose CPU interface VA to KVMMarc Zyngier
Future changes will require KVM to be able to perform deactivations by writing to the physical CPU interface. Add the corresponding VA to the kvm_info structure, and let KVM stash it. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24irqchip/gic: Add missing GICH_HCR control bitsMarc Zyngier
The GICH_HCR description is missing a bunch of control bits that control the maintenance interrupt. Add them. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-2-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24cpumask: Don't use "proxy" headersAndy Shevchenko
Update header inclusions to follow IWYU (Include What You Use) principle. Note that kernel.h is discouraged to be included as it's written at the top of that file. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-11-24string: Add missing kernel-doc return descriptionsKriish Sharma
While running kernel-doc validation on linux-next, warnings were emitted for functions in include/linux/string.h due to missing return value documentation: Warning: include/linux/string.h:375 No description found for return value of 'kbasename' Warning: include/linux/string.h:560 No description found for return value of 'strstarts' This patch adds the missing return value descriptions for both functions and clears the related kernel-doc warnings. Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://patch.msgid.link/20251118184828.2621595-1-kriish.sharma2006@gmail.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24Add RSPI support for RZ/T2H and RZ/N2HMark Brown
Merge series from Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>: Add support for RZ/T2H and RZ/N2H.
2025-11-24bitfield: Add non-constant field_{prep,get}() helpersGeert Uytterhoeven
The existing FIELD_{GET,PREP}() macros are limited to compile-time constants. However, it is very common to prepare or extract bitfield elements where the bitfield mask is not a compile-time constant. To avoid this limitation, the AT91 clock driver and several other drivers already have their own non-const field_{prep,get}() macros. Make them available for general use by adding them to <linux/bitfield.h>, and improve them slightly: 1. Avoid evaluating macro parameters more than once, 2. Replace "ffs() - 1" by "__ffs()", 3. Support 64-bit use on 32-bit architectures, 4. Wire field_{get,prep}() to FIELD_{GET,PREP}() when mask is actually constant. This is deliberately not merged into the existing FIELD_{GET,PREP}() macros, as people expressed the desire to keep stricter variants for increased safety, or for performance critical paths. Yury: use __mask withing new macros. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Crt Mori <cmo@melexis.com> Acked-by: Nuno Sá <nuno.sa@analog.com> Acked-by: Richard Genoud <richard.genoud@bootlin.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-11-24bitfield: Add less-checking __FIELD_{GET,PREP}()Geert Uytterhoeven
The BUILD_BUG_ON_MSG() check against "~0ull" works only with "unsigned (long) long" _mask types. For constant masks, that condition is usually met, as GENMASK() yields an UL value. The few places where the constant mask is stored in an intermediate variable were fixed by changing the variable type to u64 (see e.g. [1] and [2]). However, for non-constant masks, smaller unsigned types should be valid, too, but currently lead to "result of comparison of constant 18446744073709551615 with expression of type ... is always false"-warnings with clang and W=1. Hence refactor the __BF_FIELD_CHECK() helper, and factor out __FIELD_{GET,PREP}(). The later lack the single problematic check, but are otherwise identical to FIELD_{GET,PREP}(), and are intended to be used in the fully non-const variants later. [1] commit 5c667d5a5a3ec166 ("clk: sp7021: Adjust width of _m in HWM_FIELD_PREP()") [2] commit cfd6fb45cfaf46fa ("crypto: ccree - avoid out-of-range warnings from clang") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://git.kernel.org/torvalds/c/5c667d5a5a3ec166 [1] Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-11-24firmware: cs_dsp: Store control length as 32-bitRichard Fitzgerald
The architectures supported by this driver have a maximum of 32-bits of address, so we don't need more than 32-bits to store the length of control data. Change the length in struct cs_dsp_coeff_ctl to an unsigned int instead of a size_t. Also make a corresponding trivial change to wm_adsp.c to prevent a compiler warning. Tested on x86_64 builds this saves at least 4 bytes per control (another 4 bytes might be saved if the compiler was inserting padding to align the size_t). Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20251124171536.78962-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-24bpf: specify the old and new poke_type for bpf_arch_text_pokeMenglong Dong
In the origin logic, the bpf_arch_text_poke() assume that the old and new instructions have the same opcode. However, they can have different opcode if we want to replace a "call" insn with a "jmp" insn. Therefore, add the new function parameter "old_t" along with the "new_t", which are used to indicate the old and new poke type. Meanwhile, adjust the implement of bpf_arch_text_poke() for all the archs. "BPF_MOD_NOP" is added to make the code more readable. In bpf_arch_text_poke(), we still check if the new and old address is NULL to determine if nop insn should be used, which I think is more safe. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251118123639.688444-6-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24bpf,x86: adjust the "jmp" mode for bpf trampolineMenglong Dong
In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means that the trampoline is not called, but "jmp". Introduce the function bpf_trampoline_use_jmp() to check if the trampoline is in "jmp" mode. Do some adjustment on the "jmp" mode for the x86_64. The main adjustment that we make is for the stack parameter passing case, as the stack alignment logic changes in the "jmp" mode without the "rip". What's more, the location of the parameters on the stack also changes. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251118123639.688444-5-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24ftrace: Introduce FTRACE_OPS_FL_JMPMenglong Dong
For now, the "nop" will be replaced with a "call" instruction when a function is hooked by the ftrace. However, sometimes the "call" can break the RSB and introduce extra overhead. Therefore, introduce the flag FTRACE_OPS_FL_JMP, which indicate that the ftrace_ops should be called with a "jmp" instead of "call". For now, it is only used by the direct call case. When a direct ftrace_ops is marked with FTRACE_OPS_FL_JMP, the last bit of the ops->direct_call will be set to 1. Therefore, we can tell if we should use "jmp" for the callback in ftrace_call_replace(). Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20251118123639.688444-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-24firmware: stratix10-svc: fix make htmldocs warningDinh Nguyen
Stephen Rothwell reports htmldocs warnings when merging char-misc tree: WARNING: include/linux/firmware/intel/stratix10-svc-client.h:22 This comment starts with '/**', but isn't a kernel-doc comment. WARNING: include/linux/firmware/intel/stratix10-svc-client.h:184 Enum value 'COMMAND_HWMON_READTEMP' not described in enum 'stratix10_svc_command_code' WARNING: include/linux/firmware/intel/stratix10-svc-client.h:184 Enum value 'COMMAND_HWMON_READVOLT' not described in enum 'stratix10_svc_command_code' WARNING: include/linux/firmware/intel/stratix10-svc-client.h:307 function parameter 'cb_arg' not described in 'async_callback_t' Fixes: 4f49088c1625 ("firmware: stratix10-svc: Add definition for voltage and temperature sensor") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20251114153920.1c5df700@canb.auug.org.au/ Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Link: https://patch.msgid.link/20251114185815.358423-3-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-24Merge tag 'icc-6.19-rc1' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next Georgi writes: interconnect changes for 6.19 This pull request contains the interconnect changes for the 6.19-rc1 merge window. The core and driver changes are listed below. Core changes: - kbps_to_icc() macro optimization Driver changes: - Switch all Qualcomm RPMh interconnect drivers to use the dynamic node IDs and drop support for non-dynamic ID allocation - Add new driver and BWMON support for the Kaanapali SoC - Add QoS support for the SM6350 SoC - Add QoS support for the SA8775p SoC - Fix missing link from SNOC_PNOC to the USB 2 on MSM8996 SoC that includes also a dts change that has been acked by the maintainer - Drop the QPIC interconnect and BCM nodes for the SDX75 SoC, as these should be handled by the rpmh-clk driver - Other misc fixes Signed-off-by: Georgi Djakov <djakov@kernel.org> * tag 'icc-6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc: (40 commits) interconnect: qcom: sm6350: enable QoS configuration interconnect: qcom: sm6350: Remove empty BCM arrays interconnect: qcom: icc-rpmh: Get parent's regmap for nested NoCs dt-bindings: interconnect: qcom,sm6350-rpmh: Add clocks for QoS dt-bindings: interconnect: qcom-bwmon: Document Kaanapali BWMONs interconnect: qcom: icc-rpmh: drop support for non-dynamic IDS interconnect: qcom: sm8750: convert to dynamic IDs interconnect: qcom: sm8650: convert to dynamic IDs interconnect: qcom: sm8550: convert to dynamic IDs interconnect: qcom: sm8450: convert to dynamic IDs interconnect: qcom: sm8350: convert to dynamic IDs interconnect: qcom: sm8150: convert to dynamic IDs interconnect: qcom: sm7150: convert to dynamic IDs interconnect: qcom: sm6350: convert to dynamic IDs interconnect: qcom: sdx75: convert to dynamic IDs interconnect: qcom: sdx65: convert to dynamic IDs interconnect: qcom: sdx55: convert to dynamic IDs interconnect: qcom: sdm670: convert to dynamic IDs interconnect: qcom: sc7180: convert to dynamic IDs interconnect: qcom: sar2130p: convert to dynamic IDs ...
2025-11-24Merge tag 'coresight-next-v6.19' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next Suzuki writes: coresight: Updates for Linux v6.19 The changes for Linux v6.19 include : - Support for static TPDM - Fixes to TMC-ETR with CATU where buffer wasn't available to CATU in perf mode - Clean ups to the component operations to accept coresight_path - Fixes to the ETM4x/ETM3x driver Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> * tag 'coresight-next-v6.19' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux: coresight: etm4x: Remove the state_needs_restore flag coresight: etm4x: Remove the redundant DSB coresight: etm4x: Properly control filter in CPU idle with FEAT_TRF coresight: etm4x: Add context synchronization before enabling trace coresight: etm4x: Correct polling IDLE bit coresight: etm3x: Always set tracer's device mode on target CPU coresight: etm4x: Always set tracer's device mode on target CPU coresight: Change device mode to atomic type coresight: change the sink_ops to accept coresight_path coresight: change helper_ops to accept coresight_path coresight: tmc: add the handle of the event to the path coresight: tpdm: remove redundant check for drvdata coresight: tpdm: add static tpdm support dt-bindings: arm: document the static TPDM compatible coresight: ETR: Fix ETR buffer use-after-free issue
2025-11-24wifi: mt76: Introduce the NPU generic layerLorenzo Bianconi
Add the NPU generic layer in mt76 module. NPU will be used to enable traffic forward offloading between the MT76 NIC and the Airoha ethernet one available on the Airoha EN7581 SoC using Netfilter Flowtable APIs. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20251017-mt76-npu-devel-v2-4-ddaa90901723@kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-11-24wifi: mt76: wed: use proper wed reference in mt76 wed driver callabacksLorenzo Bianconi
MT7996 driver can use both wed and wed_hif2 devices to offload traffic from/to the wireless NIC. In the current codebase we assume to always use the primary wed device in wed callbacks resulting in the following crash if the hw runs wed_hif2 (e.g. 6GHz link). [ 297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a [ 297.464928] Mem abort info: [ 297.467722] ESR = 0x0000000096000005 [ 297.471461] EC = 0x25: DABT (current EL), IL = 32 bits [ 297.476766] SET = 0, FnV = 0 [ 297.479809] EA = 0, S1PTW = 0 [ 297.482940] FSC = 0x05: level 1 translation fault [ 297.487809] Data abort info: [ 297.490679] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 297.496156] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 297.501196] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000 [ 297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000 [ 297.523532] Internal error: Oops: 0000000096000005 [#1] SMP [ 297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G O 6.12.50 #0 [ 297.723908] Tainted: [O]=OOT_MODULE [ 297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT) [ 297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table] [ 297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76] [ 297.752688] lr : mtk_wed_flow_remove+0x58/0x80 [ 297.757126] sp : ffffffc080fe3ae0 [ 297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7 [ 297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00 [ 297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018 [ 297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000 [ 297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000 [ 297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da [ 297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200 [ 297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002 [ 297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000 [ 297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8 [ 297.831686] Call trace: [ 297.834123] mt76_wed_offload_disable+0x64/0xa0 [mt76] [ 297.839254] mtk_wed_flow_remove+0x58/0x80 [ 297.843342] mtk_flow_offload_cmd+0x434/0x574 [ 297.847689] mtk_wed_setup_tc_block_cb+0x30/0x40 [ 297.852295] nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table] [ 297.858466] nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table] [ 297.864463] process_one_work+0x174/0x300 [ 297.868465] worker_thread+0x278/0x430 [ 297.872204] kthread+0xd8/0xdc [ 297.875251] ret_from_fork+0x10/0x20 [ 297.878820] Code: 928b5ae0 8b000273 91400a60 f943fa61 (79401421) [ 297.884901] ---[ end trace 0000000000000000 ]--- Fix the issue detecting the proper wed reference to use running wed callabacks. Fixes: 83eafc9251d6 ("wifi: mt76: mt7996: add wed tx support") Tested-by: Daniel Pawlik <pawlik.dan@gmail.com> Tested-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20251008-wed-fixes-v1-1-8f7678583385@kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name>
2025-11-24s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPUHeiko Carstens
Since the rework of the kernel virtual address space [1] the module area and the kernel image are within the same 4GB area. Therefore there is no need for the weak per cpu workaround for modules anymore. Remove it. [1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-23NFS: Request a directory delegation during RENAMEAnna Schumaker
If we notice that we're renaming a file within a directory then we take that as a sign that the user is working with the current directory and may want a delegation to avoid extra revalidations when possible. The nfs_request_directory_delegation() function exists within the NFS v4 module, so I add an extra flag to rename_setup() to indicate if a dentry is being renamed within the same parent directory. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Request a directory delegation on ACCESS, CREATE, and UNLINKAnna Schumaker
This patch adds a new flag: NFS_INO_REQ_DIR_DELEG to signal that a directory wants to request a directory delegation the next time it does a GETATTR. I have the client request a directory delegation when doing an access, create, or unlink call since these calls indicate that a user is working with a directory. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Add support for sending GDD_GETATTRAnna Schumaker
I add this to the existing GETATTR compound as an option extra step that we can send if the "dir_deleg" flag is set to 'true'. Actually enabling this value will happen in a later patch. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23SUNRPC: new helper function for stopping backchannel serverOlga Kornievskaia
Create a new backchannel function to stop the backchannel server and clear the bc_serv in transport protected under the bc_pa_lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23SUNRPC: cleanup common code in backchannel requestOlga Kornievskaia
Create a helper function for common code between rdma and tcp backchannel handling of the backchannel request. Make sure that access is protected by the bc_pa_lock lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23compiler_types: introduce at_least parameter decoration pseudo keywordJason A. Donenfeld
Clang and recent gcc support warning if they are able to prove that the user is passing to a function an array that is too short in size. For example: void blah(unsigned char herp[at_least 7]); static void schma(void) { unsigned char good[] = { 1, 2, 3, 4, 5, 6, 7 }; unsigned char bad[] = { 1, 2, 3, 4, 5, 6 }; blah(good); blah(bad); } The notation here, `static 7`, which this commit makes explicit by allowing us to write it as `at_least 7`, means that it's incorrect to pass anything less than 7 elements. This is section 6.7.5.3 of C99: If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression. Here is the output from gcc 15: zx2c4@thinkpad /tmp $ gcc -c a.c a.c: In function ‘schma’: a.c:9:9: warning: ‘blah’ accessing 7 bytes in a region of size 6 [-Wstringop-overflow=] 9 | blah(bad); | ^~~~~~~~~ a.c:9:9: note: referencing argument 1 of type ‘unsigned char[7]’ a.c:2:6: note: in a call to function ‘blah’ 2 | void blah(unsigned char herp[at_least 7]); | ^~~~ And from clang 21: zx2c4@thinkpad /tmp $ clang -c a.c a.c:9:2: warning: array argument is too small; contains 6 elements, callee requires at least 7 [-Warray-bounds] 9 | blah(bad); | ^ ~~~ a.c:2:25: note: callee declares array parameter as static here 2 | void blah(unsigned char herp[at_least 7]); | ^ ~~~~~~~~~~ 1 warning generated. So these are covered by, variously, -Wstringop-overflow and -Warray-bounds. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20251123054819.2371989-3-Jason@zx2c4.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-24PM / devfreq: Move governor.h to a public header locationDmitry Baryshkov
Some device drivers (and out-of-tree modules) might want to define device-specific device governors. Rather than restricting all of them to be a part of drivers/devfreq/ (which is not possible for out-of-tree drivers anyway) move governor.h to include/linux/devfreq-governor.h and update all drivers to use it. The devfreq_cpu_data is only used internally, by the passive governor, so it is moved to the driver source rather than being a part of the public interface. Reported-by: Robie Basak <robibasa@qti.qualcomm.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Link: https://patchwork.kernel.org/project/linux-pm/patch/20251030-governor-public-v2-1-432a11a9975a@oss.qualcomm.com/
2025-11-23mempool: de-typedefChristoph Hellwig
Switch all uses of the deprecated mempool_t typedef in the core mempool code to use struct mempool instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251113084022.1255121-11-hch@lst.de Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23mempool: remove mempool_{init,create}_kvmalloc_poolChristoph Hellwig
This was added for bcachefs and is unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251113084022.1255121-10-hch@lst.de Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-23mempool: add mempool_{alloc,free}_bulkChristoph Hellwig
Add a version of the mempool allocator that works for batch allocations of multiple objects. Calling mempool_alloc in a loop is not safe because it could deadlock if multiple threads are performing such an allocation at the same time. As an extra benefit the interface is build so that the same array can be used for alloc_pages_bulk / release_pages so that at least for page backed mempools the fast path can use a nice batch optimization. Note that mempool_alloc_bulk does not take a gfp_mask argument as it must always be able to sleep and doesn't support any non-trivial modifiers. NOFO or NOIO constrainst must be set through the scoped API. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251113084022.1255121-8-hch@lst.de Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-22Merge tag 'v6.18-rc3' into irq/msiThomas Gleixner
Pick up OF changes to resolve dependencies
2025-11-21bpf: support nested rcu critical sectionsPuranjay Mohan
Currently, nested rcu critical sections are rejected by the verifier and rcu_lock state is managed by a boolean variable. Add support for nested rcu critical sections by make active_rcu_locks a counter similar to active_preempt_locks. bpf_rcu_read_lock() increments this counter and bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition happens when active_rcu_locks drops to 0. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21bpf: correct stack liveness for tail callsEduard Zingerman
This updates bpf_insn_successors() reflecting that control flow might jump over the instructions between tail call and function exit, verifier might assume that some writes to parent stack always happen, which is not the case. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Link: https://lore.kernel.org/r/20251119160355.1160932-4-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-21x86,fs/resctrl: Implement "io_alloc" enable/disable handlersBabu Moger
"io_alloc" is the generic name of the new resctrl feature that enables system software to configure the portion of cache allocated for I/O traffic. On AMD systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). Introduce the architecture-specific functions that resctrl fs should call to enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state by setting (to enable) or clearing (to disable) bit 1 of MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com
2025-11-21x86,fs/resctrl: Detect io_alloc featureBabu Moger
AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software to control the portions of L3 cache used for direct insertion of data from I/O devices into the L3 cache. Introduce a generic resctrl cache resource property "io_alloc_capable" as the first part of the new "io_alloc" resctrl feature that will support AMD's SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if a portion of the cache can be allocated for I/O traffic. Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD) systems that support SDCIAE. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com
2025-11-21powercap: intel_rapl: Prepare read_raw() interface for atomic-context callersKuppuswamy Sathyanarayanan
The current read_raw() implementation of the TPMI, MMIO and MSR interfaces does not distinguish between atomic and non-atomic callers. rapl_msr_read_raw() uses rdmsrq_safe_on_cpu(), which can sleep and issue cross CPU calls. When MSR-based RAPL PMU support is enabled, PMU event handlers can invoke this function from atomic context where sleeping or rescheduling is not allowed. In atomic context, the caller is already executing on the target CPU, so a direct rdmsrq() is sufficient. To support such usage, introduce an atomic flag to the read_raw() interface to allow callers pass the context information. Modify the common RAPL code to propagate this flag, and set the flag to reflect the calling contexts. Utilize the atomic flag in rapl_msr_read_raw() to perform direct MSR read with rdmsrq() when running in atomic context, and a sanity check to ensure target CPU matches the current CPU for such use cases. The TPMI and MMIO implementations do not require special atomic handling, so the flag is ignored in those paths. This is a preparatory patch for adding MSR-based RAPL PMU support. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject tweak ] Link: https://patch.msgid.link/20251121000539.386069-2-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-21Merge tag 'ata-6.18-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Niklas Cassel: - Add a missing refcount decrement in ata_scsi_dev_rescan() when the device or its queue is not running. In the case where the device is running, the recount is already decremented properly (Yihang Li) - Generate the proper sense code for a Security locked device. There was a regression caused by a recent change of how sense data is generated for commands that did not provide any sense data. This broke system suspend for Security locked devices. Generate the sense data that the SCSI disk driver expects for a Security locked device so that system suspend works again (me) - Set capacity to zero for a Security locked device. All I/O commands will be aborted by a Security locked device. Thus, the block layer disk partition scanning will result in a bunch of, for the user, confusing I/O errors in dmesg during boot. Since a Security locked device is unusable anyway, set the capacity to zero, to avoid the disk partition scanning during boot. We still create the block device in /dev such that the user may unlock the device using e.g. hdparm (me) * tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: libata-core: Set capacity to zero for a security locked drive ata: libata-scsi: Fix system suspend for a security locked drive ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()
2025-11-21lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGIONYicong Yang
ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for invalidating certain memory regions in a cache-incoherent manner. Currently this is used by NVDIMM and CXL memory drivers in cases where it is necessary to flush all data from caches by physical address range. The operations in question are effectively memory hotplug, where stale data might otherwise remain in the caches. This is separate from the invalidates done to enable use of non-coherent DMA masters, primarily in terms of when it is needed (not related to DMA mappings) and how deep the flush must push data. The flushes done for non-coherent DMA only need to reach the Point of Coherence of a single host (which is often nearer CPUs and DMA masters than the physical storage). This operation must push the data out of non architectural caches (memory-side caches, write buffers etc) and typically all the way to the memory device. In some architectures these operations are supported by system components that may become available only later in boot as they are either present on a discoverable bus, or via a firmware description of an MMIO interface (e.g. ACPI DSDT). Provide a framework to handle this case. Architectures can opt in for this support via CONFIG_GENERIC_CPU_CACHE_MAINTENANCE Add a registration framework. Each driver provides an ops structure and the first op is Write Back and Invalidate by PA Range. The driver may over invalidate. For systems that can perform this operation asynchronously an optional completion check operation is also provided. If present that must be called to ensure that the action has finished. This provides a considerable performance advantage if multiple agents are involved in the maintenance operation. When multiple agents are present in the system each should register with this framework and the core code will issue the invalidate to all of them before checking for completion on each. This is done to avoid need for filtering in the core code which can become complex when interleave, potentially across different cache coherency hardware is going on, so it is easier to tell everyone and let those who don't care do nothing. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-21regulator: Add FP9931/JD9930Mark Brown
Merge series from Andreas Kemnade <andreas@kemnade.info>: Add a driver for the FP9931/JD9930 regulator which provides the comparatively high voltages needed for electronic paper displays. Datasheet for the FP9931 is at https://www.fitipower.com/dl/file/flXa6hIchVeu0W3K Although it is in English, it seems to be only downloadable from the Chinese part of that website. For the JD9930 there can be a datasheet found at https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/196/JD9930_2D00_0.7_2D00_JUN_2D00_2019.pdf To simplify things, include the hwmon part directly which is only one register read and there are not other functions besides regulators in this chip.
2025-11-21Merge tag 'iio-for-6.19a' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: IIO: New device support, features and cleanup for 6.19 The usual bunch of new device support, but also quite a bit of cleanup of the core and older drivers which is always good to see. New device support ------------------ adi,ad4080 - Add support for AD4081, AD4083, AD4084, AD4086 and AD4087 ADCs with slightly different features to existing supported parts (max CNV clock count, resolution etc) adi,adxl380 - Add support for ADXL318 and ADXL319 which have reduced functionality compared to other supported parts, particularly around event detection. aosong,adp810 - New driver for this differential pressure and temperature sensor. aspeed,adc - Add support for the AST2700 SoC ADCs which differ in small ways from already supported parts. bosch,sm330 - New driver for this IMU (accelerometer + gyroscop) with I2C and SPI bus support. invensense,icm45600 - New driver for this family of IMUs with sub drivers for accelerometer and gyroscope elements. I2C, I3C and SPI busses all supported. * Supports ICM45605, ICM45606, ICM45608, ICM45634, ICM45686, ICM45687, ICM45688P, ICM45689. * Support basic features and FIFO. maxim,max14001 - New driver for the MAX14001 and MAX14002 ADCs. renesas,rzt2h - New driver supporting the RZ/T2H and RZ/N2H ADCs found in various SoCs. renesas,rznl - New driver supporting the RZ/NL ADC found in various SoCs. Features -------- adi,ad5446 - Add a DT binding doc for the 29 variants currently covered by the driver. - Add adi,ad5542 which is compatiable with the adi,ad5542a which was already supported. bosch,bma220 - I2C support including an I2C bus watchdog. - Power supply control - Data ready trigger. - Low pass filter control. - Debugfs register access. - Add Petre Rodan as a maintainer of this driver (thanks!) bosch,bmi270 - Add support for motion events. fsl,mpl3115 - Add a dataready trigger and related sampling frequency control. - Add threshold events. infineon,dps310 - Add a specific device tree binding. maxim,max30100 - Allow control of LED pulse-width in dt-binding. Optimum value depends on physical characteristics of the device which contains this sensor. mediatek,mt2701 - Add dt compatible for the mt8189. rockchip,saradc - Add rk3506 compatible which is functionally the same as the already supported rk3528 (which is therefore the fallback) st,lsm6dsx - Make sampling more flexible when both fifo and events are of interest by decoupling the FIFO fill rate from actual sampling. Cleanup and minor fixes ----------------------- core - Document and add might_sleep() to iio_push_to_buffers_with_ts_unaligned() as it allocates a buffer, typically just on 1st call. - Add documentation for iio_push_to_buffers_with_ts() which is being used to replace iio_push_to_buffers_with_timestamp() in new code as it validates the buffer size. Make the deprecation of the old function clear. - Document that the store_to() callback in struct iio_buffer_access_funcs may be called from contexts that cannot sleep. - Document that the cb() provided to a callback buffer may be called from contexts that cannot sleep. - Cleanup up industrialio-backend.c comments. - Call mutex_destroy() in cleanup of buffers. - Call device_initialize() later to avoid having to call device_put() before configuration is otherwise complete. - Use mutex_init_with_key() to replace opencoded version. - Use dma_buf_unmap_attachment_unlocked() to replace opencoded version. - Reorder Makefile for pressure sensors. various - Uses sysfs_emit() to replace sprintf() in read_label() and other callbacks that typically are used to write data to sysfs buffers. - Switch to REGCACHE_MAPLE in various drivers. adi,docs - Fix up formatting of cross references and other kernel-doc issues. adi,ad4080 - Fix wrong masking of product IDs. adi,ad5446 - Use DMA safe buffers as needed for SPI. - Drop a duplicate device chip specific data structure where two parts are functionally identical. - Fail probe if reference is not available. - Split up the massive array of chip type specific structures into separate structures as this tends to be easier to read and maintain. - Add explicit of_device_id entries for all supported parts. - Split I2C and SPI parts away from core to avoid ifdef complexity. - Switch to devm_mutex_init(). - Make use of guard() to simplify code. - Applying IWYU principles and reorder headers. - Various other minor cleanup. adi,ad7124 - Add debugfs to support single cycle mode, typically only used for cases such as validate performance of the ADC. - Various other minor cleanup including removing some layers of indirection that weren't necessary. - Add extended attributes to the temperature channel which follows the same signal path as other channels. - Replace the setup register allocation strategy with a simpler more predictable one (a fix for OOB from this code follows later in this pull request). adi,adxl345 - Ensure dt-binding allows for both interrupt wired at the same time. arm,scmi - Replace const_ilog2() with the resulting value which ends up simpler to read. bosch,bma220 - Add correct SPI mode specification to the device tree binding. - Fix up interrupt type in dt binding example to match that the driver expects. - Relax hard constraint on matching chip ID with a message only so as to enable fallback DT compatibles to work. - Use local struct device *dev to replaces lots of indirect look ups. - Improve includes on approximate IWYU basis. - Explicit of_match_table. - Reset some registers during probe. - Move to regmap. - Ensure a timestamp is available when filling the buffer by using a locally acquired one rather than relying on trigger top half running. - Add a utility function to search value pair tables for a match. - Various other minor improvements. - Move code to avoid a false dependency of the core code on the I2C module. bosch,bma400 - Improve register and field naming + organization. Use with FIELD_GET() and FIELD_PREP() to allow dropping of shift defines. - Use macros to define event related fields. - Switch to an address lookup based on an index variable to replace lots of very similar register macros. - Rename activity_event_en() to generic_event_en() to better reflect what it does. - Improve comments around interrupt register handling. fsl,mpl3115 - Factor out code for triggered buffer data collection. - Use more consistent register field naming style. - Use get_unaligned_be24() to get the pressure. invensense,mpu6050 - Drop false requirement in DT binding for the interrupt. The driver will be able to do less if one is not provided, but some features are still available. invensense,icm45600_i3c - Fix missing return on failure to match part. linear,ltc2688 - Use devm_mutex_init() so mutex_destroy() is called in tear down path. - Use guard() to simplify lock handling in error return paths. qcom,vadc - Fix up some kernel-doc related warnings. rohm,bd79112 and bd79124 - Use regmap_reg_range() helper to set the ranges. st,lsm6dsx - Fix units of ODR in structure documentation. ti,am335x - Add range checks to avoid a compiler warning. ti,pac1934 - Switch to system_percpu_wq. Various other minor typo fixes etc. * tag 'iio-for-6.19a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (150 commits) staging: iio: adt7316: replace sprintf() with sysfs_emit() iio: pressure: Arrange Makefile alphabetically iio: ABI: document pressure event attributes iio: mpl3115: add threshold events support iio: mpl3115: use get_unaligned_be24() to retrieve pressure data iio: buffer: use dma_buf_unmap_attachment_unlocked() helper iio: core: Replace lockdep_set_class() + mutex_init() by combined call iio: core: Clean up device correctly on iio_device_alloc() failure iio: core: add missing mutex_destroy in iio_dev_release() iio: accel: adxl380: add support for ADXL318 and ADXL319 dt-bindings: iio: accel: adxl380: add new supported parts iio: imu: inv_icm45600: Initializes inv_icm45600_buffer_postdisable() sleep iio: adc: pac1934: replace use of system_wq with system_percpu_wq iio: dac: ad5446: Add AD5542 to the spi id table iio: dac: ad5446: Fix coding style issues iio: dac: ad5446: Refactor header inclusion iio: dac: ad5446: Make use of the cleanup helpers iio: dac: ad5446: Make use of devm_mutex_init() iio: dac: ad5446: Separate I2C/SPI into different drivers iio: dac: ad5456: Add missing DT compatibles ...
2025-11-21usb: ohci-da8xx: remove unused platform dataBartosz Golaszewski
We no longer support any board files for DaVinci in mainline and so struct da8xx_ohci_root_hub is no longer used. Remove it together with all the code it's used for. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://patch.msgid.link/20251114-davinci-usb-v1-1-737380353a74@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-21bug: Add report_bug_entry()Peter Zijlstra
Add a report_bug() variant where the bug_entry is already known. This is useful when the exception instruction is not instantiated per-site. But instead has a single instance. In such a case the bug_entry address might be passed along in a known register or something. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251110115757.575795595@infradead.org
2025-11-21efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specsMauro Carvalho Chehab
Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor was defined simply as: Type at byte offset 4: - Cache error - TLB Error - Bus Error - Micro-architectural Error All other values are reserved Yet, there was no information about how this would be encoded. Spec 2.9A errata corrected it by defining: - Bit 1 - Cache Error - Bit 2 - TLB Error - Bit 3 - Bus Error - Bit 4 - Micro-architectural Error All other values are reserved That actually aligns with the values already defined on older versions at N.2.4.1. Generic Processor Error Section. Spec 2.10 also preserve the same encoding as 2.9A. Adjust CPER and GHES handling code for both generic and ARM processors to properly handle UEFI 2.9A and 2.10 encoding. Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-21efi/cper: Add a new helper function to print bitmasksMauro Carvalho Chehab
Add a helper function to print a string with names associated to each bit field. A typical example is: const char * const bits[] = { "bit 3 name", "bit 4 name", "bit 5 name", }; char str[120]; unsigned int bitmask = BIT(3) | BIT(5); #define MASK GENMASK(5,3) cper_bits_to_str(str, sizeof(str), FIELD_GET(MASK, bitmask), bits, ARRAY_SIZE(bits)); The above code fills string "str" with "bit 3 name|bit 5 name". Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-21RAS: Report all ARM processor CPER information to userspaceJason Tian
The ARM processor CPER record was added in UEFI v2.6 and remained unchanged up to v2.10. Yet, the original arm_event trace code added by e9279e83ad1f ("trace, ras: add ARM processor error trace event") is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not exporting any information from tables N.17 to N.29 of the record. This is not enough for the user to be able to figure out what has exactly happened or to take appropriate action. According to the UEFI v2.9 specification chapter N2.4.4, the ARM processor error section includes: - several (ERR_INFO_NUM) ARM processor error information structures (Tables N.17 to N.20); - several (CONTEXT_INFO_NUM) ARM processor context information structures (Tables N.21 to N.29); - several vendor specific error information structures. The size is given by Section Length minus the size of the other fields. In addition, it also exports two fields that are parsed by the GHES driver when firmware reports it, e.g.: - error severity - CPU logical index Report all of these information to userspace via a the ARM tracepoint so that userspace can properly record the error and take decisions related to CPU core isolation according to error severity and other info. The updated ARM trace event now contains the following fields: ====================================== ============================= UEFI field on table N.16 ARM Processor trace fields ====================================== ============================= Validation handled when filling data for affinity MPIDR and running state. ERR_INFO_NUM pei_len CONTEXT_INFO_NUM ctx_len Section Length indirectly reported by pei_len, ctx_len and oem_len Error affinity level affinity MPIDR_EL1 mpidr MIDR_EL1 midr Running State running_state PSCI State psci_state Processor Error Information Structure pei_err - count at pei_len Processor Context ctx_err- count at ctx_len Vendor Specific Error Info oem - count at oem_len ====================================== ============================= It should be noted that decoding of tables N.17 to N.29, if needed, will be handled in userspace. That gives more flexibility, as there won't be any need to flood the kernel with micro-architecture specific error decoding. Also, decoding the other fields require a complex logic, and should be done for each of the several values inside the record field. So, let userspace daemons like rasdaemon decode them, parsing such tables and having vendor-specific micro-architecture-specific decoders. [mchehab: modified description, solved merge conflicts and fixed coding style] Signed-off-by: Jason Tian <jason@os.amperecomputing.com> Co-developed-by: Shengwei Luo <luoshengwei@huawei.com> Signed-off-by: Shengwei Luo <luoshengwei@huawei.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com> # rebased Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Shiju Jose <shiju.jose@huawei.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event") Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-20Merge tag 'vfio-v6.19-dma-buf-v9+' into v6.19/vfio/nextAlex Williamson
[v9] vfio/pci: Allow MMIO regions to be exported through dma-buf https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>