summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2024-08-30powerpc/8xx: Preallocate execmem page tablesChristophe Leroy
Preallocate execmem page tables before creating new PGDs so that all PGD entries related to execmem can be copied in pgd_alloc(). On 8xx there are 32 Mbytes for execmem by default so this will use 32 kbytes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/a7180cc1ba59dec4502af39b4e9f3ff91c57280d.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Reduce default size of module/execmem areaChristophe Leroy
8xx boards don't have much memory, the two I know have respectively 32Mbytes and 128Mbytes, so there is no point in having 256 Mbytes of memory for module text. Reduce it to 32Mbytes for 8xx, that's more than enough. Nevertheless, make it a configurable value so that it can be customised if needed. Also add a build verification for overlap of module execmem space with user PMD. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/8db23b61e33a0d1913d814f94bfe71ba7ac78b0f.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Allow setting DATA alignment even with STRICT_KERNEL_RWXChristophe Leroy
It is now possible to not pin kernel text with a 8Mbytes TLB, so the alignment for STRICT_KERNEL_RWX can be relaxed. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/d0d8b05012b392dd166cfd911f14ba2741ce7e1e.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30Revert "powerpc/8xx: Always pin kernel text TLB"Christophe Leroy
This reverts commit bccc58986a2f98e3af349c85c5f49aac7fb19ef2. When STRICT_KERNEL_RWX is selected, EXEC memory must stop where RW memory start. When pinning iTLBs it means an 8M alignment for RW data start. That may be acceptable on boards with a lot of memory but one of my supported boards only has 32 Mbytes and this forced alignment leads to a waste of almost 4 Mbytes with is more than 10% of the total memory. So revert commit bccc58986a2f ("powerpc/8xx: Always pin kernel text TLB") but don't restore previous behaviour in ITLB miss handler as now kernel PGD entries are copied into each process PGDIR. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/01b6780b860c8043b51a1ba9d83acfc6f2dde910.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Copy kernel PGD entries into all PGDIRsChristophe Leroy
In order to avoid having to select PGDIR at each TLB miss based on fault address, copy kernel PGD entries into all PGDIRs in pgd_alloc(). At first it will be used for ITLB misses for kernel TEXT, then for execmem then for kernel DATA. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/c6d2bf5af2ea909071a85bdca8b1f5dc2df134a8.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Fix kernel vs user address comparisonChristophe Leroy
Since commit 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text"), module exec space is below PAGE_OFFSET so not only space above PAGE_OFFSET, but space above TASK_SIZE need to be seen as kernel space. Until now the problem went undetected because by default TASK_SIZE is 0x8000000 which means address space is determined by just checking upper address bit. But when TASK_SIZE is over 0x80000000, PAGE_OFFSET is used for comparison, leading to thinking module addresses are part of user space. Fix it by using TASK_SIZE instead of PAGE_OFFSET for address comparison. Fixes: 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/3f574c9845ff0a023b46cb4f38d2c45aecd769bd.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Fix initial memory mappingChristophe Leroy
Commit cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages") introduced an initial mapping of kernel TEXT using PAGE_KERNEL_TEXT, but the pages that contain kernel TEXT may also contain kernel RODATA, and depending on selected debug options PAGE_KERNEL_TEXT may be either RWX or ROX. RODATA must be writable during init because it also contains ro_after_init data. So use PAGE_KERNEL_X instead to be sure it is RWX. Fixes: cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/dac7a828d8497c4548c91840575a706657baa4f1.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/powernv/pci: Remove obsoleted declaration for pnv_pci_init_ioda_hubGaosheng Cui
The pnv_pci_init_ioda_hub() have been removed since commit 5ac129cdb50b ("powerpc/powernv/pci: Remove ioda1 support"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822130043.783756-1-cuigaosheng1@huawei.com
2024-08-30powerpc: Remove obsoleted declarations for use_cop and drop_copGaosheng Cui
The use_cop() and drop_cop() have been removed since commit 6ff4d3e96652 ("powerpc: Remove old unused icswx based coprocessor support"), now they are useless, so remove them. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822130609.786431-5-cuigaosheng1@huawei.com
2024-08-30powerpc/pasemi: Remove obsoleted declaration for pas_pci_irq_fixup()Gaosheng Cui
The pas_pci_irq_fixup() have been removed since commit 771f7404a9de ("pasemi_mac: Move the IRQ mapping from the PCI layer to the driver"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822130609.786431-4-cuigaosheng1@huawei.com
2024-08-30powerpc/maple: Remove obsoleted declaration for maple_calibrate_decr()Gaosheng Cui
The maple_calibrate_decr() have been removed since commit 10f7e7c15e6c ("[PATCH] ppc64: consolidate calibrate_decr implementations"), and now it is useless, so remove it. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822130609.786431-3-cuigaosheng1@huawei.com
2024-08-30powerpc: Remove obsoleted declaration for _get_SPGaosheng Cui
The implementation of _get_SP() was removed in commit f4db196717c6 ("[POWERPC] Remove _get_SP"), remove the now obsolete declaration. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Update change log to refer to correct commit per Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822130609.786431-2-cuigaosheng1@huawei.com
2024-08-29powerpc/pseries/dlpar: Use helper function for_each_child_of_node()Zhang Zekun
for_each_child_of_node can help to iterate through the device_node, and we don't need to use while loop. No functional change with this conversion. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822085430.25753-3-zhangzekun11@huawei.com
2024-08-29powerpc/powermac/pfunc_base: Use helper function for_each_child_of_node()Zhang Zekun
for_each_child_of_node() can help to iterate through the device_node, and we don't need to do it manually. No functional change with this conversion. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822085430.25753-2-zhangzekun11@huawei.com
2024-08-29powerpc/64s/mm: Move __real_pte stubs into hash-4k.hMichael Ellerman
The stub versions of __real_pte() etc are only used with HPT & 4K pages, so move them into the hash-4k.h header. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240821080729.872034-1-mpe@ellerman.id.au
2024-08-29powerpc/configs/64s: Enable DEFERRED_STRUCT_PAGE_INITMichael Ellerman
It can speed up initialisation of page structs at boot on large machines. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240820065705.660812-1-mpe@ellerman.id.au
2024-08-29powerpc/qspinlock: Fix deadlock in MCS queueNysal Jan K.A.
If an interrupt occurs in queued_spin_lock_slowpath() after we increment qnodesp->count and before node->lock is initialized, another CPU might see stale lock values in get_tail_qnode(). If the stale lock value happens to match the lock on that CPU, then we write to the "next" pointer of the wrong qnode. This causes a deadlock as the former CPU, once it becomes the head of the MCS queue, will spin indefinitely until it's "next" pointer is set by its successor in the queue. Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results in occasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive \ --maximize --oomable --verify --syslog \ --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ec The following code flow illustrates how the deadlock occurs. For the sake of brevity, assume that both locks (A and B) are contended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before "nodes[0].lock = B") | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until "nodes[1].next != NULL") prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) | ▼ WRITE_ONCE(prev->next, node) | ▼ Spin indefinitely (until nodes[0].locked == 1) Thanks to Saket Kumar Bhaskar for help with recreating the issue Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters") Cc: stable@vger.kernel.org # v6.2+ Reported-by: Geetika Moolchandani <geetika@linux.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Reported-by: Jijo Varghese <vargjijo@in.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240829022830.1164355-1-nysal@linux.ibm.com
2024-08-27Merge v6.11-rc5 into drm-nextDaniel Vetter
amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might as well catch up with a backmerge and handle them all. Plus both misc and intel maintainers asked for a backmerge anyway. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-08-27powerpc/xmon: Fix tmpstr length check in scanhexMadhavan Srinivasan
If a function name is greater than 63 characters long, xmon command may not find them. For example, here is a test that executed an illegal instruction in a kernel function and one of call stack function has a name greater than 63 characters long: cpu 0x0: Vector: 700 (Program Check) at [c00000000a6577e0] pc: c0000000001aacb8: check__allowed__function__name__for__symbol__r4+0x8/0x10 lr: c00000000019c1e0: check__allowed__function__name__for__symbol__r1+0x20/0x40 sp: c00000000a657a80 msr: 800000000288b033 current = 0xc00000000a439900 paca = 0xc000000003e90000 irqmask: 0x03 irq_happened: 0x01 ..... [link register ] c00000000019c1e0 check__allowed__function__name__for__symbol__r1+0x20/0x40 [c00000000a657a80] c00000000a439900 (unreliable) [c00000000a657aa0] c0000000001021d8 check__allowed__function__name__for__symbol__r2_resolution_symbol+0x38/0x4c [c00000000a657ac0] c00000000019b424 power_pmu_event_init+0xa4/0xa50 and when executing a dump instruction (di) command for long function name, xmon fails to find the function symbol: 0:mon> di $check__allowed__function__name__for__symbol__r2_resolution_symbol unknown symbol 'check__allowed__function__name__for__symbol__r2_resolution_symb' 0000000000000000 ******** This is because in scanhex(), tmpstr loop index is checked only for a upper bound of 63. Fix it by replacing the upper bound value with (KSYM_NAME_LEN-1). With fix: 0:mon> di $check__allowed__function__name__for__symbol__r2_resolution_symbol c0000000001021a0 3c4c0249 addis r2,r12,585 c0000000001021a4 3842ae60 addi r2,r2,-20896 c0000000001021a8 7c0802a6 mflr r0 c0000000001021ac 60000000 nop ..... Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Closes: https://lore.kernel.org/linuxppc-dev/CANiq72=QeTgtZL4k9=4CJP6C_Hv=rh3fsn3B9S3KFoPXkyWk3w@mail.gmail.com/ Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826064217.46658-1-maddy@linux.ibm.com
2024-08-22powerpc/mm: Fix return type of pgd_val()Christophe Leroy
Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") switched PGD entries to 64 bits, but pgd_val() returns an unsigned long which is 32 bits on PPC32. This is not a problem for regular PMD entries because the upper part is always NULL, but when PMD entries are leaf they contain 64 bits values, so pgd_val() must return an unsigned long long instead of an unsigned long. Also change the condition to CONFIG_PPC_85xx instead of CONFIG_PPC_E500 as the change was meant for 32 bits only. Allthough this should be harmless on PPC64, it generates a warning with pgd_ERROR print. Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/45f8fdf298ec3df7573b66d21b03a5cda92e2cb1.1724313510.git.christophe.leroy@csgroup.eu
2024-08-22powerpc/vdso: Don't discard rela sectionsChristophe Leroy
After building the VDSO, there is a verification that it contains no dynamic relocation, see commit aff69273af61 ("vdso: Improve cmd_vdso_check to check all dynamic relocations"). This verification uses readelf -r and doesn't work if rela sections are discarded. Fixes: 8ad57add77d3 ("powerpc/build: vdso linker warning for orphan sections") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/45c3e6fc76cad05ad2cac0f5b5dfb4fae86dc9d6.1724153239.git.christophe.leroy@csgroup.eu
2024-08-22powerpc/64e: Define mmu_pte_psize staticChristophe Leroy
mmu_pte_psize is only used in the tlb_64e.c, define it static. Fixes: 25d21ad6e799 ("powerpc: Add TLB management code for 64-bit Book3E") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408011256.1O99IB0s-lkp@intel.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/beb30d280eaa5d857c38a0834b147dffd6b28aa9.1724157750.git.christophe.leroy@csgroup.eu
2024-08-22dma-mapping: replace zone_dma_bits by zone_dma_limitCatalin Marinas
The hardware DMA limit might not be power of 2. When RAM range starts above 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit can not encode this limit. Use a plain address for the DMA zone limit instead. Since the DMA zone can now potentially span beyond 4GB physical limit of DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-21powerpc/code-patching: Add boot selftest for data patchingBenjamin Gray
Extend the code patching selftests with some basic coverage of the new data patching variants too. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-6-bgray@linux.ibm.com
2024-08-21powerpc/32: Convert patch_instruction() to patch_uint()Benjamin Gray
These changes are for patch_instruction() uses on data. Unlike ppc64 these should not be incorrect as-is, but using the patch_uint() alias better reflects what kind of data being patched and allows for benchmarking the effect of different patch_* implementations (e.g., skipping instruction flushing when patching data). Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Tested-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-5-bgray@linux.ibm.com
2024-08-21powerpc/64: Convert patch_instruction() to patch_u32()Benjamin Gray
This use of patch_instruction() is working on 32 bit data, and can fail if the data looks like a prefixed instruction and the extra write crosses a page boundary. Use patch_u32() to fix the write size. Fixes: 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules") Link: https://lore.kernel.org/all/20230203004649.1f59dbd4@yea/ Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Tested-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-4-bgray@linux.ibm.com
2024-08-21powerpc/code-patching: Add data patch alignment checkBenjamin Gray
The new data patching still needs to be aligned within a cacheline too for the flushes to work correctly. To simplify this requirement, we just say data patches must be aligned. Detect when data patching is not aligned, returning an invalid argument error. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-3-bgray@linux.ibm.com
2024-08-21powerpc/code-patching: Add generic memory patchingBenjamin Gray
patch_instruction() is designed for patching instructions in otherwise readonly memory. Other consumers also sometimes need to patch readonly memory, so have abused patch_instruction() for arbitrary data patches. This is a problem on ppc64 as patch_instruction() decides on the patch width using the 'instruction' opcode to see if it's a prefixed instruction. Data that triggers this can lead to larger writes, possibly crossing a page boundary and failing the write altogether. Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and patch_u64() (on ppc64) designed for aligned data patches. The patch size is now determined by the called function, and is passed as an additional parameter to generic internals. While the instruction flushing is not required for data patches, it remains unconditional in this patch. A followup series is possible if benchmarking shows fewer flushes gives an improvement in some data-patching workload. ppc32 does not support prefixed instructions, so is unaffected by the original issue. Care is taken in not exposing the size parameter in the public (non-static) interface, so the compiler can const-propagate it away. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-2-bgray@linux.ibm.com
2024-08-21powerpc: Remove unused LHZX_BE macroChristophe Leroy
LHZX_BE has been unused since commit dbf44daf7c88 ("bpf, ppc64: remove ld_abs/ld_ind") Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/fd332b01c47bb9cb6c3af1696a2e109be655f5b5.1724222856.git.christophe.leroy@csgroup.eu
2024-08-16KVM: PPC: Book3S HV: remove unused varibleAlex Shi
During build testing, we found a error: arch/powerpc/kvm/book3s_hv.c:4052:17: error: variable 'loops' set but not used unsigned long loops = 0; 1 error generated. Fix it by removing the unused variable. Fixes: b4deba5c41e9 ("KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8") Signed-off-by: Alex Shi <alexs@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240816093313.327268-1-alexs@kernel.org
2024-08-12introduce fd_file(), convert all accessors to it.Al Viro
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-13powerpc/topology: Check if a core is onlineNysal Jan K.A
topology_is_core_online() checks if the core a CPU belongs to is online. The core is online if at least one of the sibling CPUs is online. The first CPU of an online core is also online in the common case, so this should be fairly quick. Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support") Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731030126.956210-3-nysal@linux.ibm.com
2024-08-12powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUALChristophe Leroy
Booting with CONFIG_DEBUG_VIRTUAL leads to following warning when passing hugepage reservation on command line: Kernel command line: hugepagesz=1g hugepages=1 hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1 noreboot HugeTLB: allocating 1 of page size 1.00 GiB failed. Only allocated 0 hugepages. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/io.h:948 __alloc_bootmem_huge_page+0xd4/0x284 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc6-00396-g6b0e82791bd0-dirty #936 Hardware name: MPC8544DS e500v2 0x80210030 MPC8544 DS NIP: c1020240 LR: c10201d0 CTR: 00000000 REGS: c13fdd30 TRAP: 0700 Not tainted (6.10.0-rc6-00396-g6b0e82791bd0-dirty) MSR: 00021000 <CE,ME> CR: 44084288 XER: 20000000 GPR00: c10201d0 c13fde20 c130b560 e8000000 e8001000 00000000 00000000 c1420000 GPR08: 00000000 00028001 00000000 00000004 44084282 01066ac0 c0eb7c9c efffe149 GPR16: c0fc4228 0000005f ffffffff c0eb7d0c c0eb7cc0 c0eb7ce0 ffffffff 00000000 GPR24: c1441cec efffe153 e8001000 c14240c0 00000000 c1441d64 00000000 e8000000 NIP [c1020240] __alloc_bootmem_huge_page+0xd4/0x284 LR [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 Call Trace: [c13fde20] [c10201d0] __alloc_bootmem_huge_page+0x64/0x284 (unreliable) [c13fde50] [c10207b8] hugetlb_hstate_alloc_pages+0x8c/0x3e8 [c13fdeb0] [c1021384] hugepages_setup+0x240/0x2cc [c13fdef0] [c1000574] unknown_bootoption+0xfc/0x280 [c13fdf30] [c0078904] parse_args+0x200/0x4c4 [c13fdfa0] [c1000d9c] start_kernel+0x238/0x7d0 [c13fdff0] [c0000434] set_ivor+0x12c/0x168 Code: 554aa33e 7c042840 3ce0c142 80a7427c 5109a016 50caa016 7c9a2378 7fdcf378 4180000c 7c052040 41810160 7c095040 <0fe00000> 38c00000 40800108 3c60c0eb ---[ end trace 0000000000000000 ]--- This is due to virt_addr_valid() using high_memory before it is set. high_memory is set in mem_init() using max_low_pfn, but max_low_pfn is available long before, it is set in mem_topology_setup(). So just like commit daa9ada2093e ("powerpc/mm: Fix boot crash with FLATMEM") moved the setting of max_mapnr immediately after the call to mem_topology_setup(), the same can be done for high_memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/62b69c4baad067093f39e7e60df0fe27a86b8d2a.1723100702.git.christophe.leroy@csgroup.eu
2024-08-12powerpc/mm: Fix size of allocated PGDIRChristophe Leroy
Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") increased the size of PGD entries but failed to increase the PGD directory. Use the size of pgd_t instead of the size of pointers to calculate the allocated size. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1cdaacb391cbd3e0240f0e0faf691202874e9422.1723109462.git.christophe.leroy@csgroup.eu
2024-08-08Merge tag 'drm-misc-next-2024-08-01' of ↵Daniel Vetter
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-08-07powerpc: Remove useless config comment in asm/percpu.hJinjie Ruan
commit 0db880fc865f ("powerpc: Avoid nmi_enter/nmi_exit in real mode interrupt.") has a config comment typo, and the #if/#else/#endif section is small and doesn't nest additional #ifdefs so the comment is useless and should be removed completely. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240807025604.2817577-1-ruanjinjie@huawei.com
2024-08-07powerpc/476: Drop explicit initialization of struct ↵Uwe Kleine-König
i2c_device_id::driver_data to 0 This driver doesn't use the driver_data member of struct i2c_device_id, so don't explicitly initialize this member. This prepares putting driver_data in an anonymous union which requires either no initialization or named designators. But it's also a nice cleanup on its own. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240804112032.3628645-2-u.kleine-koenig@baylibre.com
2024-08-07powerpc/traps: Use backlight power constantsThomas Zimmermann
Replace FB_BLANK_ constants with their counterparts from the backlight subsystem. The values are identical, so there's no change in functionality or semantics. traps.c already includes backlight.h where the BACKLIGHT constants are defined. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731130720.1148872-2-tzimmermann@suse.de
2024-08-07powerpc: Use of_property_present()Rob Herring (Arm)
Use of_property_present() to test for property presence rather than of_get_property(). This is part of a larger effort to remove callers of of_get_property() and similar functions. of_get_property() leaks the DT property data pointer which is a problem for dynamically allocated nodes which may be freed. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240731191312.1710417-9-robh@kernel.org
2024-08-05KVM: PPC: Book3S HV: Refactor HFSCR emulation for KVM guestsGautam Menghani
Refactor HFSCR emulation for KVM guests when they exit out with H_FAC_UNAVAIL to use a switch case instead of checking all "cause" values, since the "cause" values are mutually exclusive; and this is better expressed with a switch case. Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240716115206.70210-1-gautam@linux.ibm.com
2024-08-02crypto: ppc/curve25519 - add missing MODULE_DESCRIPTION() macroJeff Johnson
Since commit 1fffe7a34c89 ("script: modpost: emit a warning when the description is missing"), a module without a MODULE_DESCRIPTION() will result in a warning with make W=1. The following warning is being observed when building ppc64le with CRYPTO_CURVE25519_PPC64=m: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/crypto/curve25519-ppc64le.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-07-29Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-29treewide: context_tracking: Rename CONTEXT_* into CT_STATE_*Valentin Schneider
Context tracking state related symbols currently use a mix of the CONTEXT_ (e.g. CONTEXT_KERNEL) and CT_SATE_ (e.g. CT_STATE_MASK) prefixes. Clean up the naming and make the ctx_state enum use the CT_STATE_ prefix. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-07-27Merge tag 'devicetree-fixes-for-6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull more devicetree updates from Rob Herring: "Most of this is a treewide change to of_property_for_each_u32() which was small enough to do in one go before rc1 and avoids the need to create of_property_for_each_u32_some_new_name(). - Treewide conversion of of_property_for_each_u32() to drop internal arguments making struct property opaque - Add binding for Amlogic A4 SoC watchdog - Fix constraints for AD7192 'single-channel' property" * tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: iio: adc: ad7192: Fix 'single-channel' constraints of: remove internal arguments from of_property_for_each_u32() dt-bindings: watchdog: add support for Amlogic A4 SoCs
2024-07-26Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull struct file leak fixes from Al Viro: "a couple of leaks on failure exits missing fdput()" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: lirc: rc_dev_get_from_fd(): fix file leak powerpc: fix a file leak in kvm_vcpu_ioctl_enable_cap()
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-25of: remove internal arguments from of_property_for_each_u32()Luca Ceresoli
The of_property_for_each_u32() macro needs five parameters, two of which are primarily meant as internal variables for the macro itself (in the for() clause). Yet these two parameters are used by a few drivers, and this can be considered misuse or at least bad practice. Now that the kernel uses C11 to build, these two parameters can be avoided by declaring them internally, thus changing this pattern: struct property *prop; const __be32 *p; u32 val; of_property_for_each_u32(np, "xyz", prop, p, val) { ... } to this: u32 val; of_property_for_each_u32(np, "xyz", val) { ... } However two variables cannot be declared in the for clause even with C11, so declare one struct that contain the two variables we actually need. As the variables inside this struct are not meant to be used by users of this macro, give the struct instance the noticeable name "_it" so it is visible during code reviews, helping to avoid new code to use it directly. Most usages are trivially converted as they do not use those two parameters, as expected. The non-trivial cases are: - drivers/clk/clk.c, of_clk_get_parent_name(): easily doable anyway - drivers/clk/clk-si5351.c, si5351_dt_parse(): this is more complex as the checks had to be replicated in a different way, making code more verbose and somewhat uglier, but I refrained from a full rework to keep as much of the original code untouched having no hardware to test my changes All the changes have been build tested. The few for which I have the hardware have been runtime-tested too. Reviewed-by: Andre Przywara <andre.przywara@arm.com> # drivers/clk/sunxi/clk-simple-gates.c, drivers/clk/sunxi/clk-sun8i-bus-gates.c Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # drivers/gpio/gpio-brcmstb.c Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> # drivers/irqchip/irq-atmel-aic-common.c Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # drivers/iio/adc/ti_am335x_adc.c Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> # drivers/pwm/pwm-samsung.c Acked-by: Richard Leitner <richard.leitner@linux.dev> # drivers/usb/misc/usb251xb.c Acked-by: Mark Brown <broonie@kernel.org> # sound/soc/codecs/arizona.c Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com> # sound/soc/codecs/arizona.c Acked-by: Michael Ellerman <mpe@ellerman.id.au> # arch/powerpc/sysdev/xive/spapr.c Acked-by: Stephen Boyd <sboyd@kernel.org> # clk Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20240724-of_property_for_each_u32-v3-1-bea82ce429e2@bootlin.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-07-23Merge tag 'kbuild-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_KALLSYMS and CONFIG_DEBUG_INFO_BTF - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig * tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits) kbuild: doc: gcc to CC change kallsyms: change sym_entry::percpu_absolute to bool type kallsyms: unify seq and start_pos fields of struct sym_entry kallsyms: add more original symbol type/name in comment lines kallsyms: use \t instead of a tab in printf() kallsyms: avoid repeated calculation of array size for markers kbuild: add script and target to generate pacman package modpost: use generic macros for hash table implementation kbuild: move some helper headers from scripts/kconfig/ to scripts/include/ Makefile: add comment to discourage tools/* addition for kernel builds kbuild: clean up scripts/remove-stale-files kconfig: recursive checks drop file/lineno kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec kallsyms: get rid of code for absolute kallsyms kbuild: Create INSTALL_PATH directory if it does not exist kbuild: Abort make on install failures kconfig: remove 'e1' and 'e2' macros from expression deduplication kconfig: remove SYMBOL_CHOICEVAL flag kconfig: add const qualifiers to several function arguments kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups() ...
2024-07-21Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits) ia64: scrub ia64 from poison.h watchdog/perf: properly initialize the turbo mode timestamp and rearm counter tsacct: replace strncpy() with strscpy() lib/bch.c: use swap() to improve code test_bpf: convert comma to semicolon init/modpost: conditionally check section mismatch to __meminit* init: remove unused __MEMINIT* macros nilfs2: Constify struct kobj_type nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro math: rational: add missing MODULE_DESCRIPTION() macro lib/zlib: add missing MODULE_DESCRIPTION() macro fs: ufs: add MODULE_DESCRIPTION() lib/rbtree.c: fix the example typo ocfs2: add bounds checking to ocfs2_check_dir_entry() fs: add kernel-doc comments to ocfs2_prepare_orphan_dir() coredump: simplify zap_process() selftests/fpu: add missing MODULE_DESCRIPTION() macro compiler.h: simplify data_race() macro build-id: require program headers to be right after ELF header resource: add missing MODULE_DESCRIPTION() ...
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...