summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2024-09-13powerpc/vdso: Refactor CFLAGS for CVDSO buildChristophe Leroy
In order to avoid two much duplication when we add new VDSO functionnalities in C like getrandom, refactor common CFLAGS. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso32: Add crtsavresChristophe Leroy
Commit 08c18b63d965 ("powerpc/vdso32: Add missing _restgpr_31_x to fix build failure") added _restgpr_31_x to the vdso for gettimeofday, but the work on getrandom shows that we will need more of those functions. Remove _restgpr_31_x and link in crtsavres.o so that we get all save/restore functions when optimising the kernel for size. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Fix VDSO data access when running in a non-root time namespaceChristophe Leroy
When running in a non-root time namespace, the global VDSO data page is replaced by a dedicated namespace data page and the global data page is mapped next to it. Detailed explanations can be found at commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support"). When it happens, __kernel_get_syscall_map and __kernel_get_tbfreq and __kernel_sync_dicache don't work anymore because they read 0 instead of the data they need. To address that, clock_mode has to be read. When it is set to VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page and the global data is located on the following page. Add a macro called get_realdatapage which reads clock_mode and add PAGE_SIZE to the pointer provided by get_datapage macro when clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro instead of get_datapage macro except for time functions as they handle it internally. Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Closes: https://lore.kernel.org/all/ZtnYqZI-nrsNslwy@zx2c4.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-12Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
2024-09-11Merge v6.11-rc7 into drm-nextSimona Vetter
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary") in drm-misc, so start the backmerge cascade. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-10powerpc: Switch back to struct platform_driver::remove()Uwe Kleine-König
After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all pwm drivers to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240909130902.851274-2-u.kleine-koenig@baylibre.com
2024-09-10powerpc/pseries/eeh: Fix pseries_eeh_err_injectNarayana Murty N
VFIO_EEH_PE_INJECT_ERR ioctl is currently failing on pseries due to missing implementation of err_inject eeh_ops for pseries. This patch implements pseries_eeh_err_inject in eeh_ops/pseries eeh_ops. Implements support for injecting MMIO load/store error for testing from user space. The check on PCI error type (bus type) code is moved to platform code, since the eeh_pe_inject_err can be allowed to more error types depending on platform requirement. Removal of the check for 'type' in eeh_pe_inject_err() doesn't impact PowerNV as pnv_eeh_err_inject() already has an equivalent check in place. Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240909140220.529333-1-nnmlinux@linux.ibm.com
2024-09-09mm: pass vm_flags to generic_get_unmapped_area()Mark Brown
In preparation for using vm_flags to ensure guard pages for shadow stacks supply them as an argument to generic_get_unmapped_area(). The only user outside of the core code is the PowerPC book3s64 implementation which is trivially wrapping the generic implementation in the radix_enabled() case. No functional changes. Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-2-a46b8b6dc0ed@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09mm: make arch_get_unmapped_area() take vm_flags by defaultMark Brown
Patch series "mm: Care about shadow stack guard gap when getting an unmapped area", v2. As covered in the commit log for c44357c2e76b ("x86/mm: care about shadow stack guard gap during placement") our current mmap() implementation does not take care to ensure that a new mapping isn't placed with existing mappings inside it's own guard gaps. This is particularly important for shadow stacks since if two shadow stacks end up getting placed adjacent to each other then they can overflow into each other which weakens the protection offered by the feature. On x86 there is a custom arch_get_unmapped_area() which was updated by the above commit to cover this case by specifying a start_gap for allocations with VM_SHADOW_STACK. Both arm64 and RISC-V have equivalent features and use the generic implementation of arch_get_unmapped_area() so let's make the equivalent change there so they also don't get shadow stack pages placed without guard pages. The arm64 and RISC-V shadow stack implementations are currently on the list: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec94743 https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ Given the addition of the use of vm_flags in the generic implementation we also simplify the set of possibilities that have to be dealt with in the core code by making arch_get_unmapped_area() take vm_flags as standard. This is a bit invasive since the prototype change touches quite a few architectures but since the parameter is ignored the change is straightforward, the simplification for the generic code seems worth it. This patch (of 3): When we introduced arch_get_unmapped_area_vmflags() in 961148704acd ("mm: introduce arch_get_unmapped_area_vmflags()") we did so as part of properly supporting guard pages for shadow stacks on x86_64, which uses a custom arch_get_unmapped_area(). Equivalent features are also present on both arm64 and RISC-V, both of which use the generic implementation of arch_get_unmapped_area() and will require equivalent modification there. Rather than continue to deal with having two versions of the functions let's bite the bullet and have all implementations of arch_get_unmapped_area() take vm_flags as a parameter. The new parameter is currently ignored by all implementations other than x86. The only caller that doesn't have a vm_flags available is mm_get_unmapped_area(), as for the x86 implementation and the wrapper used on other architectures this is modified to supply no flags. No functional changes. Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-0-a46b8b6dc0ed@kernel.org Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-1-a46b8b6dc0ed@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Helge Deller <deller@gmx.de> [parisc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-08smp: Mark smp_prepare_boot_cpu() __initBibo Mao
smp_prepare_boot_cpu() is only called during boot, hence mark it as __init. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/all/20240907082720.452148-1-maobibo@loongson.cn
2024-09-05powerpc: Stop using no_llseekMichael Ellerman
Since commit 868941b14441 ("fs: remove no_llseek"), no_llseek() is simply defined to be NULL, and a NULL llseek means seeking is unsupported. So for statically defined file_operations, such as all these, there's no need or benefit to set llseek = no_llseek. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240903111951.141376-1-mpe@ellerman.id.au
2024-09-05powerpc/64s: Remove the "fast endian switch" syscallMichael Ellerman
The non-standard "fast endian switch" syscall was added in 2008[1], but was never widely used. It was disabled by default in 2017[2], and there's no evidence it's ever been used since. Remove it entirely. A normal endian switch syscall was added in 2015[3]. [1]: 745a14cc264b ("[POWERPC] Add fast little-endian switch system call") [2]: 529d235a0e19 ("powerpc: Add a proper syscall for switching endianness") [3]: 727f13616c45 ("powerpc: Disable the fast-endian switch syscall by default") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823070830.1269033-1-mpe@ellerman.id.au
2024-09-05powerpc/mm/64s: Restrict THP to Radix or HPT w/64K pagesMichael Ellerman
Transparent hugepages (THP) are not supported when using the Hash Page Table (HPT) MMU with 4K pages. Currently a HPT-only 4K kernel still allows THP to be enabled, which is misleading. Add restrictions to the PPC_THP symbol so that if the kernel is configured with 4K pages and only the HPT MMU (no Radix), then THP is disabled. Note that it's still possible to build a combined Radix/HPT kernel with 4K pages, which does allow THP to be enabled at build time. As such the HPT code still needs to provide some THP related symbols, to allow the build to succeed, but those code paths are never run. See the stubs in arch/powerpc/include/asm/book3s/64/hash-4k.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823032911.1238471-2-mpe@ellerman.id.au
2024-09-05powerpc/mm/64s: Move THP reqs into a separate symbolMichael Ellerman
Move the Kconfig symbols related to transparent hugepages (THP) under a separate config symbol, separate from CONFIG_PPC_BOOK3S_64. The new symbol is automatically enabled if CONFIG_PPC_BOOK3S_64 is enabled, so there is no behaviour change, except for the existence of the new PPC_THP symbol. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823032911.1238471-1-mpe@ellerman.id.au
2024-09-05powerpc/64s: Make mmu_hash_ops __ro_after_initMichael Ellerman
The mmu_hash_ops are only assigned to during boot, so mark them __ro_after_init to prevent any further modification. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240821080745.872151-1-mpe@ellerman.id.au
2024-09-05powerpc: Replace kretprobe code with rethook on powerpcAbhishek Dubey
This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") to powerpc. Rethook follows the existing kretprobe implementation, but separates it from kprobes so that it can be used by fprobe (ftrace-based function entry/exit probes). As such, this patch also enables fprobe to work on powerpc. The only other change compared to the existing kretprobe implementation is doing the return address fixup in arch_rethook_fixup_return(). Reference to other archs: commit b57c2f124098 ("riscv: add riscv rethook implementation") commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook") Note: ===== In future, rethook will be only for kretprobe, and kretprobe will be replaced by fprobe. https://lore.kernel.org/all/172000134410.63468.13742222887213469474.stgit@devnote2/ We will adapt the above implementation for powerpc once its upstream. Until then, we can have this implementation of rethook to serve current kretprobe usecases. Reviewed-by: Naveen Rao <naveen@kernel.org> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240830113131.7597-1-adubey@linux.ibm.com
2024-09-05powerpc: pseries: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 1885 368 16 2269 8dd arch/powerpc/platforms/pseries/vas-sysfs.o After: ====== text data bss dec hex filename 1981 272 16 2269 8dd arch/powerpc/platforms/pseries/vas-sysfs.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-3-huangxiaojia2@huawei.com
2024-09-05powerpc: powernv: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 3775 256 8 4039 fc7 arch/powerpc/platforms/powernv/opal-dump.o 2679 260 8 2947 b83 arch/powerpc/platforms/powernv/opal-elog.o After: ====== text data bss dec hex filename 3823 208 8 4039 fc7 arch/powerpc/platforms/powernv/opal-dump.o 2727 212 8 2947 b83 arch/powerpc/platforms/powernv/opal-elog.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-2-huangxiaojia2@huawei.com
2024-09-05powerpc: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init_and_add()/kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 7145 606 0 7751 1e47 arch/powerpc/kernel/cacheinfo.o 3663 384 16 4063 fdf arch/powerpc/kernel/secvar-sysfs.o After: ====== text data bss dec hex filename 7193 558 0 7751 1e47 arch/powerpc/kernel/cacheinfo.o 3663 384 16 4063 fdf arch/powerpc/kernel/secvar-sysfs.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-1-huangxiaojia2@huawei.com
2024-09-04powerpc/mm: add ARCH_PKEY_BITS to KconfigJoey Gouly
The new config option specifies how many bits are in each PKEY. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20240822151113.1479789-2-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-03mm: remove PageActiveMatthew Wilcox (Oracle)
Patch series "Simplify the page flags a little". In the course of our folio conversions, we have made many page flags only used on folios, so we can now remove the page-based accessors. This should cut down compile time a little, and prevent new users from cropping up. There is more that could be done in this area, but it would produce merge conflicts, so I'll sit on those patches until next merge window. We now have line of sight to removing PG_private_2 and PG_private. This patch (of 10): This flag is now only used on folios, so we can remove all the page accessors. [akpm@linux-foundation.org: fix arch/powerpc/mm/pgtable-frag.c] Link: https://lkml.kernel.org/r/20240821193445.2294269-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240821193445.2294269-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: pull out allocation of NODE_DATA to generic codeMike Rapoport (Microsoft)
Architectures that support NUMA duplicate the code that allocates NODE_DATA on the node-local memory with slight variations in reporting of the addresses where the memory was allocated. Use x86 version as the basis for the generic alloc_node_data() function and call this function in architecture specific numa initialization. Round up node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE like x86 used to do since the bootmem era when allocation granularity was PAGE_SIZE anyway. Link: https://lkml.kernel.org/r/20240807064110.1003856-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: move definition of node_data to generic codeMike Rapoport (Microsoft)
Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-04dma-mapping: clearly mark DMA ops as an architecture featureChristoph Hellwig
DMA ops are a helper for architectures and not for drivers to override the DMA implementation. Unfortunately driver authors keep ignoring this. Make the fact more clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers overriding their dma_ops depend on that. These drivers should probably be marked broken, but we can give them a bit of a grace period for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6 Acked-by: Robin Murphy <robin.murphy@arm.com>
2024-09-01xz: remove XZ_EXTERN and extern from functionsLasse Collin
XZ_EXTERN was used to make internal functions static in the preboot code. However, in other decompressors this hasn't been done. On x86-64, this makes no difference to the kernel image size. Omit XZ_EXTERN and let some of the internal functions be extern in the preboot code. Omitting XZ_EXTERN from include/linux/xz.h fixes warnings in "make htmldocs" and makes the intradocument links to xz_dec functions work in Documentation/staging/xz.rst. The alternative would have been to add "XZ_EXTERN" to c_id_attributes in Documentation/conf.py but omitting XZ_EXTERN seemed cleaner. Link: https://lore.kernel.org/lkml/20240723205437.3c0664b0@kaneli/ Link: https://lkml.kernel.org/r/20240724110544.16430-1-lasse.collin@tukaani.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sam James <sam@gentoo.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Jubin Zhong <zhongjubin@huawei.com> Cc: Jules Maselbas <jmaselbas@zdiv.net> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rui Li <me@lirui.org> Cc: Simon Glass <sjg@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/vdso: refactor error handlingMichael Ellerman
Linus noticed that the error handling in __arch_setup_additional_pages() fails to clear the mm VDSO pointer if _install_special_mapping() fails. In practice there should be no actual bug, because if there's an error the VDSO pointer is cleared later in arch_setup_additional_pages(). However it's no longer necessary to set the pointer before installing the mapping. Commit c1bab64360e6 ("powerpc/vdso: Move to _install_special_mapping() and remove arch_vma_name()") reworked the code so that the VMA name comes from the vm_special_mapping.name, rather than relying on arch_vma_name(). So rework the code to only set the VDSO pointer once the mappings have been installed correctly, and remove the stale comment. Link: https://lkml.kernel.org/r/20240812082605.743814-4-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: remove arch_unmap()Michael Ellerman
Now that powerpc no longer uses arch_unmap() to handle VDSO unmapping, there are no meaningful implementions left. Drop support for it entirely, and update comments which refer to it. Link: https://lkml.kernel.org/r/20240812082605.743814-3-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/mm: handle VDSO unmapping via close() rather than arch_unmap()Michael Ellerman
Add a close() callback to the VDSO special mapping to handle unmapping of the VDSO. That will make it possible to remove the arch_unmap() hook entirely in a subsequent patch. Link: https://lkml.kernel.org/r/20240812082605.743814-2-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm/powerpc: add missing pud helpersPeter Xu
Some new helpers will be needed for pud entry updates soon. Introduce these helpers by referencing the pmd ones. Namely: - pudp_invalidate(): this helper invalidates a huge pud before a split happens, so that the invalidated pud entry will make sure no race will happen (either with software, like a concurrent zap, or hardware, like a/d bit lost). - pud_modify(): this helper applies a new pgprot to an existing huge pud mapping. For more information on why we need these two helpers, please refer to the corresponding pmd helpers in the mprotect() code path. Link: https://lkml.kernel.org/r/20240812181225.1360970-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/8xx: document and enforce that split PT locks are not usedDavid Hildenbrand
Right now, we cannot have split PT locks because 8xx does not support SMP. But for the sake of documentation *why* 8xx is fine regarding what we documented in huge_pte_lockptr(), let's just add code to enforce it at the same time as documenting it. This should also make everybody who wants to copy from the 8xx approach of supporting such unusual ways of mapping hugetlb folios aware that it gets tricky once multiple page tables are involved. Link: https://lkml.kernel.org/r/20240726150728.3159964-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-30powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO addHaren Myneni
In the powerpc-pseries specific implementation, the IO hotplug event is handled in the user space (drmgr tool). For the DLPAR IO ADD, the corresponding device tree nodes and properties will be added to the device tree after the device enable. The user space (drmgr tool) uses configure_connector RTAS call with the DRC index to retrieve the device nodes and updates the device tree by writing to /proc/ppc64/ofdt. Under system lockdown, /dev/mem access to allocate buffers for configure_connector RTAS call is restricted which means the user space can not issue this RTAS call and also can not access to /proc/ppc64/ofdt. The pseries implementation need user interaction to power-on and add device to the slot during the ADD event handling. So adds complexity if the complete hotplug ADD event handling moved to the kernel. To overcome /dev/mem access restriction, this patch extends the /sys/kernel/dlpar interface and provides ‘dt add index <drc_index>’ to the user space. The drmgr tool uses this interface to update the device tree whenever the device is added. This interface retrieves device tree nodes for the corresponding DRC index using the configure_connector RTAS call and adds new device nodes / properties to the device tree. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822025028.938332-3-haren@linux.ibm.com
2024-08-30powerpc/pseries/dlpar: Remove device tree node for DLPAR IO removeHaren Myneni
In the powerpc-pseries specific implementation, the IO hotplug event is handled in the user space (drmgr tool). But update the device tree and /dev/mem access to allocate buffers for some RTAS calls are restricted when the kernel lockdown feature is enabled. For the DLPAR IO REMOVE, the corresponding device tree nodes and properties have to be removed from the device tree after the device disable. The user space removes the device tree nodes by updating /proc/ppc64/ofdt which is not allowed under system lockdown is enabled. This restriction can be resolved by moving the complete IO hotplug handling in the kernel. But the pseries implementation need user interaction to power off and to remove device from the slot during hotplug event handling. To overcome the /proc/ppc64/ofdt restriction, this patch extends the /sys/kernel/dlpar interface and provides ‘dt remove index <drc_index>’ to the user space so that drmgr tool can remove the corresponding device tree nodes based on DRC index from the device tree. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822025028.938332-2-haren@linux.ibm.com
2024-08-30powerpc/pseries: Use correct data types from pseries_hp_errorlog structHaren Myneni
_be32 type is defined for some elements in pseries_hp_errorlog struct but also used them u32 after be32_to_cpu() conversion. Example: In handle_dlpar_errorlog() hp_elog->_drc_u.drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index); And later assigned to u32 type dlpar_cpu() - u32 drc_index = hp_elog->_drc_u.drc_index; This incorrect usage is giving the following warnings and the patch resolve these warnings with the correct assignment. arch/powerpc/platforms/pseries/dlpar.c:398:53: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [usertype] drc_index @@ got restricted __be32 [usertype] drc_index @@ ... arch/powerpc/platforms/pseries/dlpar.c:418:43: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __be32 [usertype] drc_count @@ got unsigned int [usertype] @@ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408182142.wuIKqYae-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202408182302.o7QRO45S-lkp@intel.com/ Signed-off-by: Haren Myneni <haren@linux.ibm.com> v3: - Fix warnings from using incorrect data types in pseries_hp_errorlog struct v2: - Remove pr_info() and TODO comments - Update more information in the commit logs Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822025028.938332-1-haren@linux.ibm.com
2024-08-30powerpc/vdso: Inconditionally use CFUNC macroChristophe Leroy
During merge of commit 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation") a fallback version of CFUNC macro was added at the last minute, so it can be used inconditionally. Fixes: 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/0fa863f2f69b2ca4094ae066fcf1430fb31110c9.1724313540.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/32: Implement validation of emergency stackChristophe Leroy
VMAP stack added an emergency stack on powerpc/32 for when there is a stack overflow, but failed to add stack validation for that emergency stack. That validation is required for show stack. Implement it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2439d50b019f758db4a6d7b238b06441ab109799.1724156805.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Inconditionally use task PGDIR in DTLB missesChristophe Leroy
At the time being, DATA TLB miss handlers use task PGDIR for user addresses and swapper_pg_dir for kernel addresses. Now that kernel part of swapper_pg_dir is copied into task PGDIR at PGD allocation, it is possible to avoid the above logic and always use task PGDIR. But new kernel PGD entries can still be created after init, in which case those PGD entries may miss in task PGDIR. This can be handled in DATA TLB error handler. However, it needs to be done in real mode because the missing entry might be related to the stack. So implement copy of missing PGD entry in DATA TLB miss handler just after detection of invalid PGD entry. Also replace comparison by same calculation as in previous patch to know if an address belongs to a kernel or user segment. Note that as mentioned in platforms/Kconfig.cputype, SMP is not supported on 603 processors so there is no risk of the PGD entry be populated during the fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/a2ba8eeb1c845eeb9e46b6fe3a5e9f841df9a033.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Inconditionally use task PGDIR in ITLB missesChristophe Leroy
Now that modules exec page tables are preallocated, the instruction TLBmiss handler can use task PGDIR inconditionally. Also revise the identification of user vs kernel user space by doing a calculation instead of a comparison: Get the segment number and subtract the number of the first kernel segment. The result is positive for kernel addresses and negative for user addresses, which means that upper 2 bits are 0 for kernel and 3 for user. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/9a3242162ad2faab8019c698e501b326a126ee9e.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Switch r0 and r3 in TLB miss handlersChristophe Leroy
In preparation of next patch that will perform some additional calculations to replace comparison, switch the use of r0 and r3 as r0 has some limitations in some instructions like 'addi/subi'. Also remove outdated comments about the meaning of each register. The registers are used for many things and it would be difficult to accurately describe all things done with a given register. The function is now small enough to get a global view without much description. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/566af5e87685b1a85d3182549c0d520ce2d8877a.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Copy kernel PGD entries into all PGDIRs and preallocate execmem ↵Christophe Leroy
page tables For the same reason as 8xx, copy kernel PGD entries into all PGDIRs in pgd_alloc() and preallocate execmem page tables before creating new PGDs so that all PGD entries related to execmem are copied by pgd_alloc(). This will help reduce the fast-path in TLBmiss handlers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1a0d1feee07c4cf955f6a43a704c203e5c90fa53.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/32s: Reduce default size of module/execmem areaChristophe Leroy
book3s/32 platforms have usually more memory than 8xx, but it is still not worth reserving a full segment (256 Mbytes) for module text. 64Mbytes should be far enough. Also fix TASK_SIZE when EXECMEM is not selected, and add a build verification for overlap of module execmem space with user segments. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/c1f6a4e47f177d919561c6e97d31af5564923cf6.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Inconditionally use task PGDIR in DTLB missesChristophe Leroy
At the time being, DATA TLB miss handlers use task PGDIR for user addresses and swapper_pg_dir for kernel addresses. Now that kernel part of swapper_pg_dir is copied into task PGDIR at PGD allocation, it is possible to avoid the above logic and always use task PGDIR. But new kernel PGD entries can still be created after init, in which case those PGD entries may miss in task PGDIR. This can be handled in DATA TLB error handler. However, it needs to be done in real mode because the missing entry might be related to the stack. So implement copy of missing PGD entry in the prolog of DATA TLB ERROR handler just after the fixup of DAR. Note that this is feasible because 8xx doesn't implement vmap or ioremap with 8Mbytes pages but only 512kbytes pages which are at PTE level. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/7a76a923d2a111f1d843d8b20b4df0c65d2f4a7b.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Inconditionally use task PGDIR in ITLB missesChristophe Leroy
Now that modules exec page tables are preallocated, the instruction TLBmiss handler can use task PGDIR inconditionally. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/774fd766a8b9bcb9173b5e677d5dad0df2d3970f.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Preallocate execmem page tablesChristophe Leroy
Preallocate execmem page tables before creating new PGDs so that all PGD entries related to execmem can be copied in pgd_alloc(). On 8xx there are 32 Mbytes for execmem by default so this will use 32 kbytes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/a7180cc1ba59dec4502af39b4e9f3ff91c57280d.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Reduce default size of module/execmem areaChristophe Leroy
8xx boards don't have much memory, the two I know have respectively 32Mbytes and 128Mbytes, so there is no point in having 256 Mbytes of memory for module text. Reduce it to 32Mbytes for 8xx, that's more than enough. Nevertheless, make it a configurable value so that it can be customised if needed. Also add a build verification for overlap of module execmem space with user PMD. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/8db23b61e33a0d1913d814f94bfe71ba7ac78b0f.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Allow setting DATA alignment even with STRICT_KERNEL_RWXChristophe Leroy
It is now possible to not pin kernel text with a 8Mbytes TLB, so the alignment for STRICT_KERNEL_RWX can be relaxed. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/d0d8b05012b392dd166cfd911f14ba2741ce7e1e.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30Revert "powerpc/8xx: Always pin kernel text TLB"Christophe Leroy
This reverts commit bccc58986a2f98e3af349c85c5f49aac7fb19ef2. When STRICT_KERNEL_RWX is selected, EXEC memory must stop where RW memory start. When pinning iTLBs it means an 8M alignment for RW data start. That may be acceptable on boards with a lot of memory but one of my supported boards only has 32 Mbytes and this forced alignment leads to a waste of almost 4 Mbytes with is more than 10% of the total memory. So revert commit bccc58986a2f ("powerpc/8xx: Always pin kernel text TLB") but don't restore previous behaviour in ITLB miss handler as now kernel PGD entries are copied into each process PGDIR. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/01b6780b860c8043b51a1ba9d83acfc6f2dde910.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Copy kernel PGD entries into all PGDIRsChristophe Leroy
In order to avoid having to select PGDIR at each TLB miss based on fault address, copy kernel PGD entries into all PGDIRs in pgd_alloc(). At first it will be used for ITLB misses for kernel TEXT, then for execmem then for kernel DATA. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/c6d2bf5af2ea909071a85bdca8b1f5dc2df134a8.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Fix kernel vs user address comparisonChristophe Leroy
Since commit 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text"), module exec space is below PAGE_OFFSET so not only space above PAGE_OFFSET, but space above TASK_SIZE need to be seen as kernel space. Until now the problem went undetected because by default TASK_SIZE is 0x8000000 which means address space is determined by just checking upper address bit. But when TASK_SIZE is over 0x80000000, PAGE_OFFSET is used for comparison, leading to thinking module addresses are part of user space. Fix it by using TASK_SIZE instead of PAGE_OFFSET for address comparison. Fixes: 9132a2e82adc ("powerpc/8xx: Define a MODULE area below kernel text") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/3f574c9845ff0a023b46cb4f38d2c45aecd769bd.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Fix initial memory mappingChristophe Leroy
Commit cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages") introduced an initial mapping of kernel TEXT using PAGE_KERNEL_TEXT, but the pages that contain kernel TEXT may also contain kernel RODATA, and depending on selected debug options PAGE_KERNEL_TEXT may be either RWX or ROX. RODATA must be writable during init because it also contains ro_after_init data. So use PAGE_KERNEL_X instead to be sure it is RWX. Fixes: cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/dac7a828d8497c4548c91840575a706657baa4f1.1724173828.git.christophe.leroy@csgroup.eu