| Age | Commit message (Collapse) | Author |
|
When a zone device page is split (via huge pmd folio split). The driver
callback for folio_split is invoked to let the device driver know that the
folio size has been split into a smaller order.
Provide a default implementation for drivers that do not provide this
callback that copies the pgmap and mapping fields for the split folios.
Update the HMM test driver to handle the split.
Link: https://lkml.kernel.org/r/20251001065707.920170-11-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Enhance the hmm test driver (lib/test_hmm) with support for THP pages.
A new pool of free_folios() has now been added to the dmirror device,
which can be allocated when a request for a THP zone device private page
is made.
Add compound page awareness to the allocation function during normal
migration and fault based migration. These routines also copy
folio_nr_pages() when moving data between system memory and device memory.
args.src and args.dst used to hold migration entries are now dynamically
allocated (as they need to hold HPAGE_PMD_NR entries or more).
Split and migrate support will be added in future patches in this series.
Link: https://lkml.kernel.org/r/20251001065707.920170-10-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Implement CPU fault handling for zone device THP entries through
do_huge_pmd_device_private(), enabling transparent migration of
device-private large pages back to system memory on CPU access.
When the CPU accesses a zone device THP entry, the fault handler calls the
device driver's migrate_to_ram() callback to migrate the entire large page
back to system memory.
Link: https://lkml.kernel.org/r/20251001065707.920170-9-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
MIGRATE_VMA_SELECT_COMPOUND will be used to select THP pages during
migrate_vma_setup() and MIGRATE_PFN_COMPOUND will make migrating device
pages as compound pages during device pfn migration.
migrate_device code paths go through the collect, setup and finalize
phases of migration.
The entries in src and dst arrays passed to these functions still remain
at a PAGE_SIZE granularity. When a compound page is passed, the first
entry has the PFN along with MIGRATE_PFN_COMPOUND and other flags set
(MIGRATE_PFN_MIGRATE, MIGRATE_PFN_VALID), the remaining entries
(HPAGE_PMD_NR - 1) are filled with 0's. This representation allows for
the compound page to be split into smaller page sizes.
migrate_vma_collect_hole(), migrate_vma_collect_pmd() are now THP page
aware. Two new helper functions migrate_vma_collect_huge_pmd() and
migrate_vma_insert_huge_pmd_page() have been added.
migrate_vma_collect_huge_pmd() can collect THP pages, but if for some
reason this fails, there is fallback support to split the folio and
migrate it.
migrate_vma_insert_huge_pmd_page() closely follows the logic of
migrate_vma_insert_page()
Support for splitting pages as needed for migration will follow in later
patches in this series.
Link: https://lkml.kernel.org/r/20251001065707.920170-8-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extend core huge page management functions to handle device-private THP
entries. This enables proper handling of large device-private folios in
fundamental MM operations.
The following functions have been updated:
- copy_huge_pmd(): Handle device-private entries during fork/clone
- zap_huge_pmd(): Properly free device-private THP during munmap
- change_huge_pmd(): Support protection changes on device-private THP
- __pte_offset_map(): Add device-private entry awareness
Link: https://lkml.kernel.org/r/20251001065707.920170-4-balbirs@nvidia.com
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Change page_free to folio_free to make the folio support for
zone device-private more consistent. The PCI P2PDMA callback
has also been updated and changed to folio_free() as a result.
For drivers that do not support folios (yet), the folio is
converted back into page via &folio->page and the page is used
as is, in the current callback implementation.
Link: https://lkml.kernel.org/r/20251001065707.920170-3-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: support device-private THP", v7.
This patch series introduces support for Transparent Huge Page (THP)
migration in zone device-private memory. The implementation enables
efficient migration of large folios between system memory and
device-private memory
Background
Current zone device-private memory implementation only supports PAGE_SIZE
granularity, leading to:
- Increased TLB pressure
- Inefficient migration between CPU and device memory
This series extends the existing zone device-private infrastructure to
support THP, leading to:
- Reduced page table overhead
- Improved memory bandwidth utilization
- Seamless fallback to base pages when needed
In my local testing (using lib/test_hmm) and a throughput test, the series
shows a 350% improvement in data transfer throughput and a 80% improvement
in latency
These patches build on the earlier posts by Ralph Campbell [1]
Two new flags are added in vma_migration to select and mark compound
pages. migrate_vma_setup(), migrate_vma_pages() and
migrate_vma_finalize() support migration of these pages when
MIGRATE_VMA_SELECT_COMPOUND is passed in as arguments.
The series also adds zone device awareness to (m)THP pages along with
fault handling of large zone device private pages. page vma walk and the
rmap code is also zone device aware. Support has also been added for
folios that might need to be split in the middle of migration (when the
src and dst do not agree on MIGRATE_PFN_COMPOUND), that occurs when src
side of the migration can migrate large pages, but the destination has not
been able to allocate large pages. The code supported and used
folio_split() when migrating THP pages, this is used when
MIGRATE_VMA_SELECT_COMPOUND is not passed as an argument to
migrate_vma_setup().
The test infrastructure lib/test_hmm.c has been enhanced to support THP
migration. A new ioctl to emulate failure of large page allocations has
been added to test the folio split code path. hmm-tests.c has new test
cases for huge page migration and to test the folio split path. A new
throughput test has been added as well.
The nouveau dmem code has been enhanced to use the new THP migration
capability.
mTHP support:
The patches hard code, HPAGE_PMD_NR in a few places, but the code has been
kept generic to support various order sizes. With additional refactoring
of the code support of different order sizes should be possible.
The future plan is to post enhancements to support mTHP with a rough
design as follows:
1. Add the notion of allowable thp orders to the HMM based test driver
2. For non PMD based THP paths in migrate_device.c, check to see if
a suitable order is found and supported by the driver
3. Iterate across orders to check the highest supported order for migration
4. Migrate and finalize
The mTHP patches can be built on top of this series, the key design
elements that need to be worked out are infrastructure and driver support
for multiple ordered pages and their migration.
HMM support for large folios was added in 10b9feee2d0d ("mm/hmm:
populate PFNs from PMD swap entry").
This patch (of 16)
Add routines to support allocation of large order zone device folios and
helper functions for zone device folios, to check if a folio is device
private and helpers for setting zone device data.
When large folios are used, the existing page_free() callback in pgmap is
called when the folio is freed, this is true for both PAGE_SIZE and higher
order pages.
Zone device private large folios do not support deferred split and scan
like normal THP folios.
Link: https://lkml.kernel.org/r/20251001065707.920170-1-balbirs@nvidia.com
Link: https://lkml.kernel.org/r/20251001065707.920170-2-balbirs@nvidia.com
Link: https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/ [1]
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Future changes will require KVM to be able to perform deactivations
by writing to the physical CPU interface. Add the corresponding
VA to the kvm_info structure, and let KVM stash it.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-3-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
The GICH_HCR description is missing a bunch of control bits that
control the maintenance interrupt. Add them.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-2-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Update header inclusions to follow IWYU (Include What You Use)
principle.
Note that kernel.h is discouraged to be included as it's written
at the top of that file.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
|
|
While running kernel-doc validation on linux-next, warnings were emitted
for functions in include/linux/string.h due to missing return value
documentation:
Warning: include/linux/string.h:375 No description found for return value of 'kbasename'
Warning: include/linux/string.h:560 No description found for return value of 'strstarts'
This patch adds the missing return value descriptions for both functions
and clears the related kernel-doc warnings.
Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://patch.msgid.link/20251118184828.2621595-1-kriish.sharma2006@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Merge series from Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>:
Add support for RZ/T2H and RZ/N2H.
|
|
The existing FIELD_{GET,PREP}() macros are limited to compile-time
constants. However, it is very common to prepare or extract bitfield
elements where the bitfield mask is not a compile-time constant.
To avoid this limitation, the AT91 clock driver and several other
drivers already have their own non-const field_{prep,get}() macros.
Make them available for general use by adding them to
<linux/bitfield.h>, and improve them slightly:
1. Avoid evaluating macro parameters more than once,
2. Replace "ffs() - 1" by "__ffs()",
3. Support 64-bit use on 32-bit architectures,
4. Wire field_{get,prep}() to FIELD_{GET,PREP}() when mask is
actually constant.
This is deliberately not merged into the existing FIELD_{GET,PREP}()
macros, as people expressed the desire to keep stricter variants for
increased safety, or for performance critical paths.
Yury: use __mask withing new macros.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Crt Mori <cmo@melexis.com>
Acked-by: Nuno Sá <nuno.sa@analog.com>
Acked-by: Richard Genoud <richard.genoud@bootlin.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
|
|
The BUILD_BUG_ON_MSG() check against "~0ull" works only with "unsigned
(long) long" _mask types. For constant masks, that condition is usually
met, as GENMASK() yields an UL value. The few places where the
constant mask is stored in an intermediate variable were fixed by
changing the variable type to u64 (see e.g. [1] and [2]).
However, for non-constant masks, smaller unsigned types should be valid,
too, but currently lead to "result of comparison of constant
18446744073709551615 with expression of type ... is always
false"-warnings with clang and W=1.
Hence refactor the __BF_FIELD_CHECK() helper, and factor out
__FIELD_{GET,PREP}(). The later lack the single problematic check, but
are otherwise identical to FIELD_{GET,PREP}(), and are intended to be
used in the fully non-const variants later.
[1] commit 5c667d5a5a3ec166 ("clk: sp7021: Adjust width of _m in
HWM_FIELD_PREP()")
[2] commit cfd6fb45cfaf46fa ("crypto: ccree - avoid out-of-range
warnings from clang")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://git.kernel.org/torvalds/c/5c667d5a5a3ec166 [1]
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
|
|
The architectures supported by this driver have a maximum of 32-bits
of address, so we don't need more than 32-bits to store the length of
control data. Change the length in struct cs_dsp_coeff_ctl to an
unsigned int instead of a size_t. Also make a corresponding trivial
change to wm_adsp.c to prevent a compiler warning.
Tested on x86_64 builds this saves at least 4 bytes per control
(another 4 bytes might be saved if the compiler was inserting padding
to align the size_t).
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20251124171536.78962-1-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
In the origin logic, the bpf_arch_text_poke() assume that the old and new
instructions have the same opcode. However, they can have different opcode
if we want to replace a "call" insn with a "jmp" insn.
Therefore, add the new function parameter "old_t" along with the "new_t",
which are used to indicate the old and new poke type. Meanwhile, adjust
the implement of bpf_arch_text_poke() for all the archs.
"BPF_MOD_NOP" is added to make the code more readable. In
bpf_arch_text_poke(), we still check if the new and old address is NULL to
determine if nop insn should be used, which I think is more safe.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20251118123639.688444-6-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means
that the trampoline is not called, but "jmp".
Introduce the function bpf_trampoline_use_jmp() to check if the trampoline
is in "jmp" mode.
Do some adjustment on the "jmp" mode for the x86_64. The main adjustment
that we make is for the stack parameter passing case, as the stack
alignment logic changes in the "jmp" mode without the "rip". What's more,
the location of the parameters on the stack also changes.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20251118123639.688444-5-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For now, the "nop" will be replaced with a "call" instruction when a
function is hooked by the ftrace. However, sometimes the "call" can break
the RSB and introduce extra overhead. Therefore, introduce the flag
FTRACE_OPS_FL_JMP, which indicate that the ftrace_ops should be called
with a "jmp" instead of "call". For now, it is only used by the direct
call case.
When a direct ftrace_ops is marked with FTRACE_OPS_FL_JMP, the last bit of
the ops->direct_call will be set to 1. Therefore, we can tell if we should
use "jmp" for the callback in ftrace_call_replace().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20251118123639.688444-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Stephen Rothwell reports htmldocs warnings when merging char-misc tree:
WARNING: include/linux/firmware/intel/stratix10-svc-client.h:22 This comment
starts with '/**', but isn't a kernel-doc comment.
WARNING: include/linux/firmware/intel/stratix10-svc-client.h:184 Enum value
'COMMAND_HWMON_READTEMP' not described in enum 'stratix10_svc_command_code'
WARNING: include/linux/firmware/intel/stratix10-svc-client.h:184 Enum value
'COMMAND_HWMON_READVOLT' not described in enum 'stratix10_svc_command_code'
WARNING: include/linux/firmware/intel/stratix10-svc-client.h:307 function
parameter 'cb_arg' not described in 'async_callback_t'
Fixes: 4f49088c1625 ("firmware: stratix10-svc: Add definition for voltage and temperature sensor")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20251114153920.1c5df700@canb.auug.org.au/
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Link: https://patch.msgid.link/20251114185815.358423-3-dinguyen@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next
Georgi writes:
interconnect changes for 6.19
This pull request contains the interconnect changes for the 6.19-rc1
merge window. The core and driver changes are listed below.
Core changes:
- kbps_to_icc() macro optimization
Driver changes:
- Switch all Qualcomm RPMh interconnect drivers to use the dynamic
node IDs and drop support for non-dynamic ID allocation
- Add new driver and BWMON support for the Kaanapali SoC
- Add QoS support for the SM6350 SoC
- Add QoS support for the SA8775p SoC
- Fix missing link from SNOC_PNOC to the USB 2 on MSM8996 SoC that
includes also a dts change that has been acked by the maintainer
- Drop the QPIC interconnect and BCM nodes for the SDX75 SoC, as these
should be handled by the rpmh-clk driver
- Other misc fixes
Signed-off-by: Georgi Djakov <djakov@kernel.org>
* tag 'icc-6.19-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/djakov/icc: (40 commits)
interconnect: qcom: sm6350: enable QoS configuration
interconnect: qcom: sm6350: Remove empty BCM arrays
interconnect: qcom: icc-rpmh: Get parent's regmap for nested NoCs
dt-bindings: interconnect: qcom,sm6350-rpmh: Add clocks for QoS
dt-bindings: interconnect: qcom-bwmon: Document Kaanapali BWMONs
interconnect: qcom: icc-rpmh: drop support for non-dynamic IDS
interconnect: qcom: sm8750: convert to dynamic IDs
interconnect: qcom: sm8650: convert to dynamic IDs
interconnect: qcom: sm8550: convert to dynamic IDs
interconnect: qcom: sm8450: convert to dynamic IDs
interconnect: qcom: sm8350: convert to dynamic IDs
interconnect: qcom: sm8150: convert to dynamic IDs
interconnect: qcom: sm7150: convert to dynamic IDs
interconnect: qcom: sm6350: convert to dynamic IDs
interconnect: qcom: sdx75: convert to dynamic IDs
interconnect: qcom: sdx65: convert to dynamic IDs
interconnect: qcom: sdx55: convert to dynamic IDs
interconnect: qcom: sdm670: convert to dynamic IDs
interconnect: qcom: sc7180: convert to dynamic IDs
interconnect: qcom: sar2130p: convert to dynamic IDs
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Suzuki writes:
coresight: Updates for Linux v6.19
The changes for Linux v6.19 include :
- Support for static TPDM
- Fixes to TMC-ETR with CATU where buffer wasn't available to CATU in perf mode
- Clean ups to the component operations to accept coresight_path
- Fixes to the ETM4x/ETM3x driver
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v6.19' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux:
coresight: etm4x: Remove the state_needs_restore flag
coresight: etm4x: Remove the redundant DSB
coresight: etm4x: Properly control filter in CPU idle with FEAT_TRF
coresight: etm4x: Add context synchronization before enabling trace
coresight: etm4x: Correct polling IDLE bit
coresight: etm3x: Always set tracer's device mode on target CPU
coresight: etm4x: Always set tracer's device mode on target CPU
coresight: Change device mode to atomic type
coresight: change the sink_ops to accept coresight_path
coresight: change helper_ops to accept coresight_path
coresight: tmc: add the handle of the event to the path
coresight: tpdm: remove redundant check for drvdata
coresight: tpdm: add static tpdm support
dt-bindings: arm: document the static TPDM compatible
coresight: ETR: Fix ETR buffer use-after-free issue
|
|
Add the NPU generic layer in mt76 module. NPU will be used to enable
traffic forward offloading between the MT76 NIC and the Airoha ethernet one
available on the Airoha EN7581 SoC using Netfilter Flowtable APIs.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251017-mt76-npu-devel-v2-4-ddaa90901723@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
MT7996 driver can use both wed and wed_hif2 devices to offload traffic
from/to the wireless NIC. In the current codebase we assume to always
use the primary wed device in wed callbacks resulting in the following
crash if the hw runs wed_hif2 (e.g. 6GHz link).
[ 297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a
[ 297.464928] Mem abort info:
[ 297.467722] ESR = 0x0000000096000005
[ 297.471461] EC = 0x25: DABT (current EL), IL = 32 bits
[ 297.476766] SET = 0, FnV = 0
[ 297.479809] EA = 0, S1PTW = 0
[ 297.482940] FSC = 0x05: level 1 translation fault
[ 297.487809] Data abort info:
[ 297.490679] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 297.496156] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 297.501196] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000
[ 297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000
[ 297.523532] Internal error: Oops: 0000000096000005 [#1] SMP
[ 297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G O 6.12.50 #0
[ 297.723908] Tainted: [O]=OOT_MODULE
[ 297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT)
[ 297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table]
[ 297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76]
[ 297.752688] lr : mtk_wed_flow_remove+0x58/0x80
[ 297.757126] sp : ffffffc080fe3ae0
[ 297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7
[ 297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00
[ 297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018
[ 297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000
[ 297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000
[ 297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da
[ 297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200
[ 297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002
[ 297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000
[ 297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8
[ 297.831686] Call trace:
[ 297.834123] mt76_wed_offload_disable+0x64/0xa0 [mt76]
[ 297.839254] mtk_wed_flow_remove+0x58/0x80
[ 297.843342] mtk_flow_offload_cmd+0x434/0x574
[ 297.847689] mtk_wed_setup_tc_block_cb+0x30/0x40
[ 297.852295] nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table]
[ 297.858466] nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table]
[ 297.864463] process_one_work+0x174/0x300
[ 297.868465] worker_thread+0x278/0x430
[ 297.872204] kthread+0xd8/0xdc
[ 297.875251] ret_from_fork+0x10/0x20
[ 297.878820] Code: 928b5ae0 8b000273 91400a60 f943fa61 (79401421)
[ 297.884901] ---[ end trace 0000000000000000 ]---
Fix the issue detecting the proper wed reference to use running wed
callabacks.
Fixes: 83eafc9251d6 ("wifi: mt76: mt7996: add wed tx support")
Tested-by: Daniel Pawlik <pawlik.dan@gmail.com>
Tested-by: Matteo Croce <teknoraver@meta.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20251008-wed-fixes-v1-1-8f7678583385@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
|
|
Since the rework of the kernel virtual address space [1] the module area
and the kernel image are within the same 4GB area. Therefore there is no
need for the weak per cpu workaround for modules anymore. Remove it.
[1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
If we notice that we're renaming a file within a directory then we take
that as a sign that the user is working with the current directory and
may want a delegation to avoid extra revalidations when possible.
The nfs_request_directory_delegation() function exists within the NFS v4
module, so I add an extra flag to rename_setup() to indicate if a dentry
is being renamed within the same parent directory.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
This patch adds a new flag: NFS_INO_REQ_DIR_DELEG to signal that a
directory wants to request a directory delegation the next time it does
a GETATTR. I have the client request a directory delegation when doing
an access, create, or unlink call since these calls indicate that a user
is working with a directory.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
I add this to the existing GETATTR compound as an option extra step that
we can send if the "dir_deleg" flag is set to 'true'. Actually enabling
this value will happen in a later patch.
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Create a new backchannel function to stop the backchannel server
and clear the bc_serv in transport protected under the bc_pa_lock.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Create a helper function for common code between rdma
and tcp backchannel handling of the backchannel request.
Make sure that access is protected by the bc_pa_lock
lock.
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Clang and recent gcc support warning if they are able to prove that the
user is passing to a function an array that is too short in size. For
example:
void blah(unsigned char herp[at_least 7]);
static void schma(void)
{
unsigned char good[] = { 1, 2, 3, 4, 5, 6, 7 };
unsigned char bad[] = { 1, 2, 3, 4, 5, 6 };
blah(good);
blah(bad);
}
The notation here, `static 7`, which this commit makes explicit by
allowing us to write it as `at_least 7`, means that it's incorrect to
pass anything less than 7 elements. This is section 6.7.5.3 of C99:
If the keyword static also appears within the [ and ] of the array
type derivation, then for each call to the function, the value of
the corresponding actual argument shall provide access to the first
element of an array with at least as many elements as specified by
the size expression.
Here is the output from gcc 15:
zx2c4@thinkpad /tmp $ gcc -c a.c
a.c: In function ‘schma’:
a.c:9:9: warning: ‘blah’ accessing 7 bytes in a region of size 6 [-Wstringop-overflow=]
9 | blah(bad);
| ^~~~~~~~~
a.c:9:9: note: referencing argument 1 of type ‘unsigned char[7]’
a.c:2:6: note: in a call to function ‘blah’
2 | void blah(unsigned char herp[at_least 7]);
| ^~~~
And from clang 21:
zx2c4@thinkpad /tmp $ clang -c a.c
a.c:9:2: warning: array argument is too small; contains 6 elements, callee requires at least 7
[-Warray-bounds]
9 | blah(bad);
| ^ ~~~
a.c:2:25: note: callee declares array parameter as static here
2 | void blah(unsigned char herp[at_least 7]);
| ^ ~~~~~~~~~~
1 warning generated.
So these are covered by, variously, -Wstringop-overflow and
-Warray-bounds.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20251123054819.2371989-3-Jason@zx2c4.com
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Some device drivers (and out-of-tree modules) might want to define
device-specific device governors. Rather than restricting all of them to
be a part of drivers/devfreq/ (which is not possible for out-of-tree
drivers anyway) move governor.h to include/linux/devfreq-governor.h and
update all drivers to use it.
The devfreq_cpu_data is only used internally, by the passive governor,
so it is moved to the driver source rather than being a part of the
public interface.
Reported-by: Robie Basak <robibasa@qti.qualcomm.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://patchwork.kernel.org/project/linux-pm/patch/20251030-governor-public-v2-1-432a11a9975a@oss.qualcomm.com/
|
|
Switch all uses of the deprecated mempool_t typedef in the core mempool
code to use struct mempool instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-11-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
This was added for bcachefs and is unused now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-10-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Add a version of the mempool allocator that works for batch allocations
of multiple objects. Calling mempool_alloc in a loop is not safe because
it could deadlock if multiple threads are performing such an allocation
at the same time.
As an extra benefit the interface is build so that the same array can be
used for alloc_pages_bulk / release_pages so that at least for page
backed mempools the fast path can use a nice batch optimization.
Note that mempool_alloc_bulk does not take a gfp_mask argument as it
must always be able to sleep and doesn't support any non-trivial
modifiers. NOFO or NOIO constrainst must be set through the scoped API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251113084022.1255121-8-hch@lst.de
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Pick up OF changes to resolve dependencies
|
|
Currently, nested rcu critical sections are rejected by the verifier and
rcu_lock state is managed by a boolean variable. Add support for nested
rcu critical sections by make active_rcu_locks a counter similar to
active_preempt_locks. bpf_rcu_read_lock() increments this counter and
bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition
happens when active_rcu_locks drops to 0.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This updates bpf_insn_successors() reflecting that control flow might
jump over the instructions between tail call and function exit, verifier
might assume that some writes to parent stack always happen, which is
not the case.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu>
Link: https://lore.kernel.org/r/20251119160355.1160932-4-martin.teichmann@xfel.eu
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
"io_alloc" is the generic name of the new resctrl feature that enables system
software to configure the portion of cache allocated for I/O traffic. On AMD
systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache
Injection Allocation Enforcement (SDCIAE).
Introduce the architecture-specific functions that resctrl fs should call to
enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state
by setting (to enable) or clearing (to disable) bit 1 of
MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com
|
|
AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software
to control the portions of L3 cache used for direct insertion of data from I/O
devices into the L3 cache.
Introduce a generic resctrl cache resource property "io_alloc_capable" as the
first part of the new "io_alloc" resctrl feature that will support AMD's
SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if
a portion of the cache can be allocated for I/O traffic.
Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD)
systems that support SDCIAE.
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com
|
|
The current read_raw() implementation of the TPMI, MMIO and MSR
interfaces does not distinguish between atomic and non-atomic callers.
rapl_msr_read_raw() uses rdmsrq_safe_on_cpu(), which can sleep and
issue cross CPU calls. When MSR-based RAPL PMU support is enabled, PMU
event handlers can invoke this function from atomic context where
sleeping or rescheduling is not allowed. In atomic context, the caller
is already executing on the target CPU, so a direct rdmsrq() is
sufficient.
To support such usage, introduce an atomic flag to the read_raw()
interface to allow callers pass the context information. Modify the
common RAPL code to propagate this flag, and set the flag to reflect
the calling contexts.
Utilize the atomic flag in rapl_msr_read_raw() to perform direct MSR
read with rdmsrq() when running in atomic context, and a sanity check
to ensure target CPU matches the current CPU for such use cases.
The TPMI and MMIO implementations do not require special atomic
handling, so the flag is ignored in those paths.
This is a preparatory patch for adding MSR-based RAPL PMU support.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20251121000539.386069-2-sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Add a missing refcount decrement in ata_scsi_dev_rescan() when
the device or its queue is not running.
In the case where the device is running, the recount is already
decremented properly (Yihang Li)
- Generate the proper sense code for a Security locked device.
There was a regression caused by a recent change of how sense
data is generated for commands that did not provide any sense
data. This broke system suspend for Security locked devices.
Generate the sense data that the SCSI disk driver expects for a
Security locked device so that system suspend works again (me)
- Set capacity to zero for a Security locked device.
All I/O commands will be aborted by a Security locked device.
Thus, the block layer disk partition scanning will result in
a bunch of, for the user, confusing I/O errors in dmesg during
boot.
Since a Security locked device is unusable anyway, set the capacity
to zero, to avoid the disk partition scanning during boot. We still
create the block device in /dev such that the user may unlock the
device using e.g. hdparm (me)
* tag 'ata-6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Set capacity to zero for a security locked drive
ata: libata-scsi: Fix system suspend for a security locked drive
ata: libata-scsi: Add missing scsi_device_put() in ata_scsi_dev_rescan()
|
|
ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
invalidating certain memory regions in a cache-incoherent manner. Currently
this is used by NVDIMM and CXL memory drivers in cases where it is
necessary to flush all data from caches by physical address range.
The operations in question are effectively memory hotplug, where stale
data might otherwise remain in the caches.
This is separate from the invalidates done to enable use of non-coherent
DMA masters, primarily in terms of when it is needed (not related to DMA
mappings) and how deep the flush must push data. The flushes done for
non-coherent DMA only need to reach the Point of Coherence of a single host
(which is often nearer CPUs and DMA masters than the physical storage).
This operation must push the data out of non architectural caches
(memory-side caches, write buffers etc) and typically all the way to the
memory device.
In some architectures these operations are supported by system components
that may become available only later in boot as they are either present
on a discoverable bus, or via a firmware description of an MMIO interface
(e.g. ACPI DSDT). Provide a framework to handle this case.
Architectures can opt in for this support via
CONFIG_GENERIC_CPU_CACHE_MAINTENANCE
Add a registration framework. Each driver provides an ops structure and
the first op is Write Back and Invalidate by PA Range. The driver may
over invalidate.
For systems that can perform this operation asynchronously an optional
completion check operation is also provided. If present that must be called
to ensure that the action has finished. This provides a considerable
performance advantage if multiple agents are involved in the maintenance
operation.
When multiple agents are present in the system each should register with
this framework and the core code will issue the invalidate to all of them
before checking for completion on each. This is done to avoid need for
filtering in the core code which can become complex when interleave,
potentially across different cache coherency hardware is going on, so it
is easier to tell everyone and let those who don't care do nothing.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
|
|
Merge series from Andreas Kemnade <andreas@kemnade.info>:
Add a driver for the FP9931/JD9930 regulator which provides the
comparatively high voltages needed for electronic paper displays.
Datasheet for the FP9931 is at
https://www.fitipower.com/dl/file/flXa6hIchVeu0W3K
Although it is in English, it seems to be only downloadable
from the Chinese part of that website.
For the JD9930 there can be a datasheet found at
https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/196/JD9930_2D00_0.7_2D00_JUN_2D00_2019.pdf
To simplify things, include the hwmon part directly which is only
one register read and there are not other functions besides
regulators in this chip.
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
IIO: New device support, features and cleanup for 6.19
The usual bunch of new device support, but also quite a bit of cleanup
of the core and older drivers which is always good to see.
New device support
------------------
adi,ad4080
- Add support for AD4081, AD4083, AD4084, AD4086 and AD4087 ADCs with
slightly different features to existing supported parts (max CNV clock
count, resolution etc)
adi,adxl380
- Add support for ADXL318 and ADXL319 which have reduced functionality
compared to other supported parts, particularly around event detection.
aosong,adp810
- New driver for this differential pressure and temperature sensor.
aspeed,adc
- Add support for the AST2700 SoC ADCs which differ in small ways from
already supported parts.
bosch,sm330
- New driver for this IMU (accelerometer + gyroscop) with I2C and SPI bus
support.
invensense,icm45600
- New driver for this family of IMUs with sub drivers for accelerometer
and gyroscope elements. I2C, I3C and SPI busses all supported.
* Supports ICM45605, ICM45606, ICM45608, ICM45634, ICM45686, ICM45687,
ICM45688P, ICM45689.
* Support basic features and FIFO.
maxim,max14001
- New driver for the MAX14001 and MAX14002 ADCs.
renesas,rzt2h
- New driver supporting the RZ/T2H and RZ/N2H ADCs found in various SoCs.
renesas,rznl
- New driver supporting the RZ/NL ADC found in various SoCs.
Features
--------
adi,ad5446
- Add a DT binding doc for the 29 variants currently covered by the driver.
- Add adi,ad5542 which is compatiable with the adi,ad5542a which was
already supported.
bosch,bma220
- I2C support including an I2C bus watchdog.
- Power supply control
- Data ready trigger.
- Low pass filter control.
- Debugfs register access.
- Add Petre Rodan as a maintainer of this driver (thanks!)
bosch,bmi270
- Add support for motion events.
fsl,mpl3115
- Add a dataready trigger and related sampling frequency control.
- Add threshold events.
infineon,dps310
- Add a specific device tree binding.
maxim,max30100
- Allow control of LED pulse-width in dt-binding. Optimum value depends
on physical characteristics of the device which contains this sensor.
mediatek,mt2701
- Add dt compatible for the mt8189.
rockchip,saradc
- Add rk3506 compatible which is functionally the same as the already
supported rk3528 (which is therefore the fallback)
st,lsm6dsx
- Make sampling more flexible when both fifo and events are of interest
by decoupling the FIFO fill rate from actual sampling.
Cleanup and minor fixes
-----------------------
core
- Document and add might_sleep() to iio_push_to_buffers_with_ts_unaligned()
as it allocates a buffer, typically just on 1st call.
- Add documentation for iio_push_to_buffers_with_ts() which is being used
to replace iio_push_to_buffers_with_timestamp() in new code as it
validates the buffer size. Make the deprecation of the old function
clear.
- Document that the store_to() callback in struct iio_buffer_access_funcs
may be called from contexts that cannot sleep.
- Document that the cb() provided to a callback buffer may be called
from contexts that cannot sleep.
- Cleanup up industrialio-backend.c comments.
- Call mutex_destroy() in cleanup of buffers.
- Call device_initialize() later to avoid having to call device_put()
before configuration is otherwise complete.
- Use mutex_init_with_key() to replace opencoded version.
- Use dma_buf_unmap_attachment_unlocked() to replace opencoded version.
- Reorder Makefile for pressure sensors.
various
- Uses sysfs_emit() to replace sprintf() in read_label() and other
callbacks that typically are used to write data to sysfs buffers.
- Switch to REGCACHE_MAPLE in various drivers.
adi,docs
- Fix up formatting of cross references and other kernel-doc issues.
adi,ad4080
- Fix wrong masking of product IDs.
adi,ad5446
- Use DMA safe buffers as needed for SPI.
- Drop a duplicate device chip specific data structure where two parts
are functionally identical.
- Fail probe if reference is not available.
- Split up the massive array of chip type specific structures into
separate structures as this tends to be easier to read and maintain.
- Add explicit of_device_id entries for all supported parts.
- Split I2C and SPI parts away from core to avoid ifdef complexity.
- Switch to devm_mutex_init().
- Make use of guard() to simplify code.
- Applying IWYU principles and reorder headers.
- Various other minor cleanup.
adi,ad7124
- Add debugfs to support single cycle mode, typically only used for cases
such as validate performance of the ADC.
- Various other minor cleanup including removing some layers of indirection
that weren't necessary.
- Add extended attributes to the temperature channel which follows the same
signal path as other channels.
- Replace the setup register allocation strategy with a simpler more
predictable one (a fix for OOB from this code follows later in this pull
request).
adi,adxl345
- Ensure dt-binding allows for both interrupt wired at the same time.
arm,scmi
- Replace const_ilog2() with the resulting value which ends up simpler
to read.
bosch,bma220
- Add correct SPI mode specification to the device tree binding.
- Fix up interrupt type in dt binding example to match that the driver
expects.
- Relax hard constraint on matching chip ID with a message only so as to
enable fallback DT compatibles to work.
- Use local struct device *dev to replaces lots of indirect look ups.
- Improve includes on approximate IWYU basis.
- Explicit of_match_table.
- Reset some registers during probe.
- Move to regmap.
- Ensure a timestamp is available when filling the buffer by using a
locally acquired one rather than relying on trigger top half running.
- Add a utility function to search value pair tables for a match.
- Various other minor improvements.
- Move code to avoid a false dependency of the core code on the I2C module.
bosch,bma400
- Improve register and field naming + organization. Use with FIELD_GET()
and FIELD_PREP() to allow dropping of shift defines.
- Use macros to define event related fields.
- Switch to an address lookup based on an index variable to replace lots of
very similar register macros.
- Rename activity_event_en() to generic_event_en() to better reflect what
it does.
- Improve comments around interrupt register handling.
fsl,mpl3115
- Factor out code for triggered buffer data collection.
- Use more consistent register field naming style.
- Use get_unaligned_be24() to get the pressure.
invensense,mpu6050
- Drop false requirement in DT binding for the interrupt. The driver will
be able to do less if one is not provided, but some features are still
available.
invensense,icm45600_i3c
- Fix missing return on failure to match part.
linear,ltc2688
- Use devm_mutex_init() so mutex_destroy() is called in tear down path.
- Use guard() to simplify lock handling in error return paths.
qcom,vadc
- Fix up some kernel-doc related warnings.
rohm,bd79112 and bd79124
- Use regmap_reg_range() helper to set the ranges.
st,lsm6dsx
- Fix units of ODR in structure documentation.
ti,am335x
- Add range checks to avoid a compiler warning.
ti,pac1934
- Switch to system_percpu_wq.
Various other minor typo fixes etc.
* tag 'iio-for-6.19a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (150 commits)
staging: iio: adt7316: replace sprintf() with sysfs_emit()
iio: pressure: Arrange Makefile alphabetically
iio: ABI: document pressure event attributes
iio: mpl3115: add threshold events support
iio: mpl3115: use get_unaligned_be24() to retrieve pressure data
iio: buffer: use dma_buf_unmap_attachment_unlocked() helper
iio: core: Replace lockdep_set_class() + mutex_init() by combined call
iio: core: Clean up device correctly on iio_device_alloc() failure
iio: core: add missing mutex_destroy in iio_dev_release()
iio: accel: adxl380: add support for ADXL318 and ADXL319
dt-bindings: iio: accel: adxl380: add new supported parts
iio: imu: inv_icm45600: Initializes inv_icm45600_buffer_postdisable() sleep
iio: adc: pac1934: replace use of system_wq with system_percpu_wq
iio: dac: ad5446: Add AD5542 to the spi id table
iio: dac: ad5446: Fix coding style issues
iio: dac: ad5446: Refactor header inclusion
iio: dac: ad5446: Make use of the cleanup helpers
iio: dac: ad5446: Make use of devm_mutex_init()
iio: dac: ad5446: Separate I2C/SPI into different drivers
iio: dac: ad5456: Add missing DT compatibles
...
|
|
We no longer support any board files for DaVinci in mainline and so
struct da8xx_ohci_root_hub is no longer used. Remove it together with
all the code it's used for.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://patch.msgid.link/20251114-davinci-usb-v1-1-737380353a74@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a report_bug() variant where the bug_entry is already known. This
is useful when the exception instruction is not instantiated per-site.
But instead has a single instance. In such a case the bug_entry
address might be passed along in a known register or something.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251110115757.575795595@infradead.org
|
|
Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor
was defined simply as:
Type at byte offset 4:
- Cache error
- TLB Error
- Bus Error
- Micro-architectural Error
All other values are reserved
Yet, there was no information about how this would be encoded.
Spec 2.9A errata corrected it by defining:
- Bit 1 - Cache Error
- Bit 2 - TLB Error
- Bit 3 - Bus Error
- Bit 4 - Micro-architectural Error
All other values are reserved
That actually aligns with the values already defined on older
versions at N.2.4.1. Generic Processor Error Section.
Spec 2.10 also preserve the same encoding as 2.9A.
Adjust CPER and GHES handling code for both generic and ARM
processors to properly handle UEFI 2.9A and 2.10 encoding.
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Add a helper function to print a string with names associated
to each bit field.
A typical example is:
const char * const bits[] = {
"bit 3 name",
"bit 4 name",
"bit 5 name",
};
char str[120];
unsigned int bitmask = BIT(3) | BIT(5);
#define MASK GENMASK(5,3)
cper_bits_to_str(str, sizeof(str), FIELD_GET(MASK, bitmask),
bits, ARRAY_SIZE(bits));
The above code fills string "str" with "bit 3 name|bit 5 name".
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The ARM processor CPER record was added in UEFI v2.6 and remained
unchanged up to v2.10.
Yet, the original arm_event trace code added by
e9279e83ad1f ("trace, ras: add ARM processor error trace event")
is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
exporting any information from tables N.17 to N.29 of the record.
This is not enough for the user to be able to figure out what has
exactly happened or to take appropriate action.
According to the UEFI v2.9 specification chapter N2.4.4, the ARM
processor error section includes:
- several (ERR_INFO_NUM) ARM processor error information structures
(Tables N.17 to N.20);
- several (CONTEXT_INFO_NUM) ARM processor context information
structures (Tables N.21 to N.29);
- several vendor specific error information structures. The
size is given by Section Length minus the size of the other
fields.
In addition, it also exports two fields that are parsed by the GHES
driver when firmware reports it, e.g.:
- error severity
- CPU logical index
Report all of these information to userspace via a the ARM tracepoint so
that userspace can properly record the error and take decisions related
to CPU core isolation according to error severity and other info.
The updated ARM trace event now contains the following fields:
====================================== =============================
UEFI field on table N.16 ARM Processor trace fields
====================================== =============================
Validation handled when filling data for
affinity MPIDR and running
state.
ERR_INFO_NUM pei_len
CONTEXT_INFO_NUM ctx_len
Section Length indirectly reported by
pei_len, ctx_len and oem_len
Error affinity level affinity
MPIDR_EL1 mpidr
MIDR_EL1 midr
Running State running_state
PSCI State psci_state
Processor Error Information Structure pei_err - count at pei_len
Processor Context ctx_err- count at ctx_len
Vendor Specific Error Info oem - count at oem_len
====================================== =============================
It should be noted that decoding of tables N.17 to N.29, if needed, will
be handled in userspace. That gives more flexibility, as there won't be
any need to flood the kernel with micro-architecture specific error
decoding.
Also, decoding the other fields require a complex logic, and should be
done for each of the several values inside the record field. So, let
userspace daemons like rasdaemon decode them, parsing such tables and
having vendor-specific micro-architecture-specific decoders.
[mchehab: modified description, solved merge conflicts and fixed coding style]
Signed-off-by: Jason Tian <jason@os.amperecomputing.com>
Co-developed-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Shengwei Luo <luoshengwei@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Daniel Ferguson <danielf@os.amperecomputing.com> # rebased
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
[v9] vfio/pci: Allow MMIO regions to be exported through dma-buf
https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|