summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2024-09-17mm: always define pxx_pgprot()Peter Xu
There're: - 8 archs (arc, arm64, include, mips, powerpc, s390, sh, x86) that support pte_pgprot(). - 2 archs (x86, sparc) that support pmd_pgprot(). - 1 arch (x86) that support pud_pgprot(). Always define them to be used in generic code, and then we don't need to fiddle with "#ifdef"s when doing so. Link: https://lkml.kernel.org/r/20240826204353.2228736-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-17Merge tag 'smp-core-2024-09-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: - Prepare the core for supporting parallel hotplug on loongarch - A small set of cleanups and enhancements * tag 'smp-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Mark smp_prepare_boot_cpu() __init cpu: Fix W=1 build kernel-doc warning cpu/hotplug: Provide weak fallback for arch_cpuhp_init_parallel_bringup() cpu/hotplug: Make HOTPLUG_PARALLEL independent of HOTPLUG_SMT
2024-09-16Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights are support for Arm's "Permission Overlay Extension" using memory protection keys, support for running as a protected guest on Android as well as perf support for a bunch of new interconnect PMUs. Summary: ACPI: - Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms. - Ensure arm64-specific IORT header is covered by MAINTAINERS. CPU Errata: - Enable workaround for hardware access/dirty issue on Ampere-1A cores. Memory management: - Define PHYSMEM_END to fix a crash in the amdgpu driver. - Avoid tripping over invalid kernel mappings on the kexec() path. - Userspace support for the Permission Overlay Extension (POE) using protection keys. Perf and PMUs: - Add support for the "fixed instruction counter" extension in the CPU PMU architecture. - Extend and fix the event encodings for Apple's M1 CPU PMU. - Allow LSM hooks to decide on SPE permissions for physical profiling. - Add support for the CMN S3 and NI-700 PMUs. Confidential Computing: - Add support for booting an arm64 kernel as a protected guest under Android's "Protected KVM" (pKVM) hypervisor. Selftests: - Fix vector length issues in the SVE/SME sigreturn tests - Fix build warning in the ptrace tests. Timers: - Add support for PR_{G,S}ET_TSC so that 'rr' can deal with non-determinism arising from the architected counter. Miscellaneous: - Rework our IPI-based CPU stopping code to try NMIs if regular IPIs don't succeed. - Minor fixes and cleanups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) perf: arm-ni: Fix an NULL vs IS_ERR() bug arm64: hibernate: Fix warning for cast from restricted gfp_t arm64: esr: Define ESR_ELx_EC_* constants as UL arm64: pkeys: remove redundant WARN perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled MAINTAINERS: List Arm interconnect PMUs as supported perf: Add driver for Arm NI-700 interconnect PMU dt-bindings/perf: Add Arm NI-700 PMU perf/arm-cmn: Improve format attr printing perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check arm64/mm: use lm_alias() with addresses passed to memblock_free() mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags() arm64: Expose the end of the linear map in PHYSMEM_END arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec() arm64/mm: Delete __init region from memblock.reserved perf/arm-cmn: Support CMN S3 dt-bindings: perf: arm-cmn: Add CMN S3 perf/arm-cmn: Refactor DTC PMU register access perf/arm-cmn: Make cycle counts less surprising perf/arm-cmn: Improve build-time assertion ...
2024-09-16Merge tag 'v6.12-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu" "API: - Make self-test asynchronous Algorithms: - Remove MPI functions added for SM3 - Add allocation error checks to remaining MPI functions (introduced for SM3) - Set default Jitter RNG OSR to 3 Drivers: - Add hwrng driver for Rockchip RK3568 SoC - Allow disabling SR-IOV VFs through sysfs in qat - Fix device reset bugs in hisilicon - Fix authenc key parsing by using generic helper in octeontx* Others: - Fix xor benchmarking on parisc" * tag 'v6.12-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (96 commits) crypto: n2 - Set err to EINVAL if snprintf fails for hmac crypto: camm/qi - Use ERR_CAST() to return error-valued pointer crypto: mips/crc32 - Clean up useless assignment operations crypto: qcom-rng - rename *_of_data to *_match_data crypto: qcom-rng - fix support for ACPI-based systems dt-bindings: crypto: qcom,prng: document support for SA8255p crypto: aegis128 - Fix indentation issue in crypto_aegis128_process_crypt() crypto: octeontx* - Select CRYPTO_AUTHENC crypto: testmgr - Hide ENOENT errors crypto: qat - Remove trailing space after \n newline crypto: hisilicon/sec - Remove trailing space after \n newline crypto: algboss - Pass instance creation error up crypto: api - Fix generic algorithm self-test races crypto: hisilicon/qm - inject error before stopping queue crypto: hisilicon/hpre - mask cluster timeout error crypto: hisilicon/qm - reset device before enabling it crypto: hisilicon/trng - modifying the order of header files crypto: hisilicon - add a lock for the qp send operation crypto: hisilicon - fix missed error branch crypto: ccp - do not request interrupt on cmd completion when irqs disabled ...
2024-09-13MIPS: Remove the obsoleted code for include/linux/mv643xx.hGaosheng Cui
Most of the drivers which used this header have been deleted, most of these code is obsoleted, move the only defines that are actually used into arch/powerpc/platforms/chrp/pegasos_eth.c and delete the file completely. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://patch.msgid.link/20240912011949.2726928-1-cuigaosheng1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64Christophe Leroy
Extend getrandom() vDSO implementation to VDSO64. Tested on QEMU on both ppc64_defconfig and ppc64le_defconfig. Results from a Power9 (PowerNV): ~ # ./vdso_test_getrandom bench-single    vdso: 25000000 times in 0.787943615 seconds    libc: 25000000 times in 14.101887252 seconds    syscall: 25000000 times in 14.047475082 seconds Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32Christophe Leroy
To be consistent with other VDSO functions, the function is called __kernel_getrandom() __arch_chacha20_blocks_nostack() fonction is implemented basically with 32 bits operations. It performs 4 QUARTERROUND operations in parallele. There are enough registers to avoid using the stack: On input: r3: output bytes r4: 32-byte key input r5: 8-byte counter input/output r6: number of 64-byte blocks to write to output During operation: stack: pointer to counter (r5) and non-volatile registers (r14-131) r0: counter of blocks (initialised with r6) r4: Value '4' after key has been read, used for indexing r5-r12: key r14-r15: block counter r16-r31: chacha state At the end: r0, r6-r12: Zeroised r5, r14-r31: Restored Performance on powerpc 885 (using kernel selftest): ~# ./vdso_test_getrandom bench-single vdso: 25000000 times in 62.938002291 seconds libc: 25000000 times in 535.581916866 seconds syscall: 25000000 times in 531.525042806 seconds Performance on powerpc 8321 (using kernel selftest): ~# ./vdso_test_getrandom bench-single vdso: 25000000 times in 16.899318858 seconds libc: 25000000 times in 131.050596522 seconds syscall: 25000000 times in 129.794790389 seconds This first patch adds support for VDSO32. As selftests cannot easily be generated only for VDSO32, and because the following patch brings support for VDSO64 anyway, this patch opts out all code in __arch_chacha20_blocks_nostack() so that vdso_test_chacha will not fail to compile and will not crash on PPC64/PPC64LE, allthough the selftest itself will fail. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Refactor CFLAGS for CVDSO buildChristophe Leroy
In order to avoid two much duplication when we add new VDSO functionnalities in C like getrandom, refactor common CFLAGS. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso32: Add crtsavresChristophe Leroy
Commit 08c18b63d965 ("powerpc/vdso32: Add missing _restgpr_31_x to fix build failure") added _restgpr_31_x to the vdso for gettimeofday, but the work on getrandom shows that we will need more of those functions. Remove _restgpr_31_x and link in crtsavres.o so that we get all save/restore functions when optimising the kernel for size. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-13powerpc/vdso: Fix VDSO data access when running in a non-root time namespaceChristophe Leroy
When running in a non-root time namespace, the global VDSO data page is replaced by a dedicated namespace data page and the global data page is mapped next to it. Detailed explanations can be found at commit 660fd04f9317 ("lib/vdso: Prepare for time namespace support"). When it happens, __kernel_get_syscall_map and __kernel_get_tbfreq and __kernel_sync_dicache don't work anymore because they read 0 instead of the data they need. To address that, clock_mode has to be read. When it is set to VDSO_CLOCKMODE_TIMENS, it means it is a dedicated namespace data page and the global data is located on the following page. Add a macro called get_realdatapage which reads clock_mode and add PAGE_SIZE to the pointer provided by get_datapage macro when clock_mode is equal to VDSO_CLOCKMODE_TIMENS. Use this new macro instead of get_datapage macro except for time functions as they handle it internally. Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces") Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Closes: https://lore.kernel.org/all/ZtnYqZI-nrsNslwy@zx2c4.com/ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-09-12Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
2024-09-11Merge v6.11-rc7 into drm-nextSimona Vetter
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O if necessary") in drm-misc, so start the backmerge cascade. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2024-09-10powerpc: Switch back to struct platform_driver::remove()Uwe Kleine-König
After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all pwm drivers to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240909130902.851274-2-u.kleine-koenig@baylibre.com
2024-09-10powerpc/pseries/eeh: Fix pseries_eeh_err_injectNarayana Murty N
VFIO_EEH_PE_INJECT_ERR ioctl is currently failing on pseries due to missing implementation of err_inject eeh_ops for pseries. This patch implements pseries_eeh_err_inject in eeh_ops/pseries eeh_ops. Implements support for injecting MMIO load/store error for testing from user space. The check on PCI error type (bus type) code is moved to platform code, since the eeh_pe_inject_err can be allowed to more error types depending on platform requirement. Removal of the check for 'type' in eeh_pe_inject_err() doesn't impact PowerNV as pnv_eeh_err_inject() already has an equivalent check in place. Signed-off-by: Narayana Murty N <nnmlinux@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240909140220.529333-1-nnmlinux@linux.ibm.com
2024-09-09mm: pass vm_flags to generic_get_unmapped_area()Mark Brown
In preparation for using vm_flags to ensure guard pages for shadow stacks supply them as an argument to generic_get_unmapped_area(). The only user outside of the core code is the PowerPC book3s64 implementation which is trivially wrapping the generic implementation in the radix_enabled() case. No functional changes. Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-2-a46b8b6dc0ed@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-09mm: make arch_get_unmapped_area() take vm_flags by defaultMark Brown
Patch series "mm: Care about shadow stack guard gap when getting an unmapped area", v2. As covered in the commit log for c44357c2e76b ("x86/mm: care about shadow stack guard gap during placement") our current mmap() implementation does not take care to ensure that a new mapping isn't placed with existing mappings inside it's own guard gaps. This is particularly important for shadow stacks since if two shadow stacks end up getting placed adjacent to each other then they can overflow into each other which weakens the protection offered by the feature. On x86 there is a custom arch_get_unmapped_area() which was updated by the above commit to cover this case by specifying a start_gap for allocations with VM_SHADOW_STACK. Both arm64 and RISC-V have equivalent features and use the generic implementation of arch_get_unmapped_area() so let's make the equivalent change there so they also don't get shadow stack pages placed without guard pages. The arm64 and RISC-V shadow stack implementations are currently on the list: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec94743 https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ Given the addition of the use of vm_flags in the generic implementation we also simplify the set of possibilities that have to be dealt with in the core code by making arch_get_unmapped_area() take vm_flags as standard. This is a bit invasive since the prototype change touches quite a few architectures but since the parameter is ignored the change is straightforward, the simplification for the generic code seems worth it. This patch (of 3): When we introduced arch_get_unmapped_area_vmflags() in 961148704acd ("mm: introduce arch_get_unmapped_area_vmflags()") we did so as part of properly supporting guard pages for shadow stacks on x86_64, which uses a custom arch_get_unmapped_area(). Equivalent features are also present on both arm64 and RISC-V, both of which use the generic implementation of arch_get_unmapped_area() and will require equivalent modification there. Rather than continue to deal with having two versions of the functions let's bite the bullet and have all implementations of arch_get_unmapped_area() take vm_flags as a parameter. The new parameter is currently ignored by all implementations other than x86. The only caller that doesn't have a vm_flags available is mm_get_unmapped_area(), as for the x86 implementation and the wrapper used on other architectures this is modified to supply no flags. No functional changes. Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-0-a46b8b6dc0ed@kernel.org Link: https://lkml.kernel.org/r/20240904-mm-generic-shadow-stack-guard-v2-1-a46b8b6dc0ed@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Helge Deller <deller@gmx.de> [parisc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-08smp: Mark smp_prepare_boot_cpu() __initBibo Mao
smp_prepare_boot_cpu() is only called during boot, hence mark it as __init. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/all/20240907082720.452148-1-maobibo@loongson.cn
2024-09-05powerpc: Stop using no_llseekMichael Ellerman
Since commit 868941b14441 ("fs: remove no_llseek"), no_llseek() is simply defined to be NULL, and a NULL llseek means seeking is unsupported. So for statically defined file_operations, such as all these, there's no need or benefit to set llseek = no_llseek. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240903111951.141376-1-mpe@ellerman.id.au
2024-09-05powerpc/64s: Remove the "fast endian switch" syscallMichael Ellerman
The non-standard "fast endian switch" syscall was added in 2008[1], but was never widely used. It was disabled by default in 2017[2], and there's no evidence it's ever been used since. Remove it entirely. A normal endian switch syscall was added in 2015[3]. [1]: 745a14cc264b ("[POWERPC] Add fast little-endian switch system call") [2]: 529d235a0e19 ("powerpc: Add a proper syscall for switching endianness") [3]: 727f13616c45 ("powerpc: Disable the fast-endian switch syscall by default") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823070830.1269033-1-mpe@ellerman.id.au
2024-09-05powerpc/mm/64s: Restrict THP to Radix or HPT w/64K pagesMichael Ellerman
Transparent hugepages (THP) are not supported when using the Hash Page Table (HPT) MMU with 4K pages. Currently a HPT-only 4K kernel still allows THP to be enabled, which is misleading. Add restrictions to the PPC_THP symbol so that if the kernel is configured with 4K pages and only the HPT MMU (no Radix), then THP is disabled. Note that it's still possible to build a combined Radix/HPT kernel with 4K pages, which does allow THP to be enabled at build time. As such the HPT code still needs to provide some THP related symbols, to allow the build to succeed, but those code paths are never run. See the stubs in arch/powerpc/include/asm/book3s/64/hash-4k.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823032911.1238471-2-mpe@ellerman.id.au
2024-09-05powerpc/mm/64s: Move THP reqs into a separate symbolMichael Ellerman
Move the Kconfig symbols related to transparent hugepages (THP) under a separate config symbol, separate from CONFIG_PPC_BOOK3S_64. The new symbol is automatically enabled if CONFIG_PPC_BOOK3S_64 is enabled, so there is no behaviour change, except for the existence of the new PPC_THP symbol. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240823032911.1238471-1-mpe@ellerman.id.au
2024-09-05powerpc/64s: Make mmu_hash_ops __ro_after_initMichael Ellerman
The mmu_hash_ops are only assigned to during boot, so mark them __ro_after_init to prevent any further modification. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240821080745.872151-1-mpe@ellerman.id.au
2024-09-05powerpc: Replace kretprobe code with rethook on powerpcAbhishek Dubey
This is an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") to powerpc. Rethook follows the existing kretprobe implementation, but separates it from kprobes so that it can be used by fprobe (ftrace-based function entry/exit probes). As such, this patch also enables fprobe to work on powerpc. The only other change compared to the existing kretprobe implementation is doing the return address fixup in arch_rethook_fixup_return(). Reference to other archs: commit b57c2f124098 ("riscv: add riscv rethook implementation") commit 7b0a096436c2 ("LoongArch: Replace kretprobe with rethook") Note: ===== In future, rethook will be only for kretprobe, and kretprobe will be replaced by fprobe. https://lore.kernel.org/all/172000134410.63468.13742222887213469474.stgit@devnote2/ We will adapt the above implementation for powerpc once its upstream. Until then, we can have this implementation of rethook to serve current kretprobe usecases. Reviewed-by: Naveen Rao <naveen@kernel.org> Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240830113131.7597-1-adubey@linux.ibm.com
2024-09-05powerpc: pseries: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 1885 368 16 2269 8dd arch/powerpc/platforms/pseries/vas-sysfs.o After: ====== text data bss dec hex filename 1981 272 16 2269 8dd arch/powerpc/platforms/pseries/vas-sysfs.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-3-huangxiaojia2@huawei.com
2024-09-05powerpc: powernv: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 3775 256 8 4039 fc7 arch/powerpc/platforms/powernv/opal-dump.o 2679 260 8 2947 b83 arch/powerpc/platforms/powernv/opal-elog.o After: ====== text data bss dec hex filename 3823 208 8 4039 fc7 arch/powerpc/platforms/powernv/opal-dump.o 2727 212 8 2947 b83 arch/powerpc/platforms/powernv/opal-elog.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-2-huangxiaojia2@huawei.com
2024-09-05powerpc: Constify struct kobj_typeHuang Xiaojia
'struct kobj_type' is not modified. It is only used in kobject_init_and_add()/kobject_init() which takes a 'const struct kobj_type *ktype' parameter. Constifying this structure moves some data to a read-only section, so increase over all security. On a x86_64, compiled with ppc64 defconfig: Before: ====== text data bss dec hex filename 7145 606 0 7751 1e47 arch/powerpc/kernel/cacheinfo.o 3663 384 16 4063 fdf arch/powerpc/kernel/secvar-sysfs.o After: ====== text data bss dec hex filename 7193 558 0 7751 1e47 arch/powerpc/kernel/cacheinfo.o 3663 384 16 4063 fdf arch/powerpc/kernel/secvar-sysfs.o Signed-off-by: Huang Xiaojia <huangxiaojia2@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240826150957.3500237-1-huangxiaojia2@huawei.com
2024-09-04powerpc/mm: add ARCH_PKEY_BITS to KconfigJoey Gouly
The new config option specifies how many bits are in each PKEY. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20240822151113.1479789-2-joey.gouly@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-09-03mm: remove PageActiveMatthew Wilcox (Oracle)
Patch series "Simplify the page flags a little". In the course of our folio conversions, we have made many page flags only used on folios, so we can now remove the page-based accessors. This should cut down compile time a little, and prevent new users from cropping up. There is more that could be done in this area, but it would produce merge conflicts, so I'll sit on those patches until next merge window. We now have line of sight to removing PG_private_2 and PG_private. This patch (of 10): This flag is now only used on folios, so we can remove all the page accessors. [akpm@linux-foundation.org: fix arch/powerpc/mm/pgtable-frag.c] Link: https://lkml.kernel.org/r/20240821193445.2294269-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240821193445.2294269-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: pull out allocation of NODE_DATA to generic codeMike Rapoport (Microsoft)
Architectures that support NUMA duplicate the code that allocates NODE_DATA on the node-local memory with slight variations in reporting of the addresses where the memory was allocated. Use x86 version as the basis for the generic alloc_node_data() function and call this function in architecture specific numa initialization. Round up node data size to SMP_CACHE_BYTES rather than to PAGE_SIZE like x86 used to do since the bootmem era when allocation granularity was PAGE_SIZE anyway. Link: https://lkml.kernel.org/r/20240807064110.1003856-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03arch, mm: move definition of node_data to generic codeMike Rapoport (Microsoft)
Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-04dma-mapping: clearly mark DMA ops as an architecture featureChristoph Hellwig
DMA ops are a helper for architectures and not for drivers to override the DMA implementation. Unfortunately driver authors keep ignoring this. Make the fact more clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers overriding their dma_ops depend on that. These drivers should probably be marked broken, but we can give them a bit of a grace period for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6 Acked-by: Robin Murphy <robin.murphy@arm.com>
2024-09-01xz: remove XZ_EXTERN and extern from functionsLasse Collin
XZ_EXTERN was used to make internal functions static in the preboot code. However, in other decompressors this hasn't been done. On x86-64, this makes no difference to the kernel image size. Omit XZ_EXTERN and let some of the internal functions be extern in the preboot code. Omitting XZ_EXTERN from include/linux/xz.h fixes warnings in "make htmldocs" and makes the intradocument links to xz_dec functions work in Documentation/staging/xz.rst. The alternative would have been to add "XZ_EXTERN" to c_id_attributes in Documentation/conf.py but omitting XZ_EXTERN seemed cleaner. Link: https://lore.kernel.org/lkml/20240723205437.3c0664b0@kaneli/ Link: https://lkml.kernel.org/r/20240724110544.16430-1-lasse.collin@tukaani.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sam James <sam@gentoo.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Jubin Zhong <zhongjubin@huawei.com> Cc: Jules Maselbas <jmaselbas@zdiv.net> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rui Li <me@lirui.org> Cc: Simon Glass <sjg@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/vdso: refactor error handlingMichael Ellerman
Linus noticed that the error handling in __arch_setup_additional_pages() fails to clear the mm VDSO pointer if _install_special_mapping() fails. In practice there should be no actual bug, because if there's an error the VDSO pointer is cleared later in arch_setup_additional_pages(). However it's no longer necessary to set the pointer before installing the mapping. Commit c1bab64360e6 ("powerpc/vdso: Move to _install_special_mapping() and remove arch_vma_name()") reworked the code so that the VMA name comes from the vm_special_mapping.name, rather than relying on arch_vma_name(). So rework the code to only set the VDSO pointer once the mappings have been installed correctly, and remove the stale comment. Link: https://lkml.kernel.org/r/20240812082605.743814-4-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: remove arch_unmap()Michael Ellerman
Now that powerpc no longer uses arch_unmap() to handle VDSO unmapping, there are no meaningful implementions left. Drop support for it entirely, and update comments which refer to it. Link: https://lkml.kernel.org/r/20240812082605.743814-3-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/mm: handle VDSO unmapping via close() rather than arch_unmap()Michael Ellerman
Add a close() callback to the VDSO special mapping to handle unmapping of the VDSO. That will make it possible to remove the arch_unmap() hook entirely in a subsequent patch. Link: https://lkml.kernel.org/r/20240812082605.743814-2-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Jeff Xu <jeffxu@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm/powerpc: add missing pud helpersPeter Xu
Some new helpers will be needed for pud entry updates soon. Introduce these helpers by referencing the pmd ones. Namely: - pudp_invalidate(): this helper invalidates a huge pud before a split happens, so that the invalidated pud entry will make sure no race will happen (either with software, like a concurrent zap, or hardware, like a/d bit lost). - pud_modify(): this helper applies a new pgprot to an existing huge pud mapping. For more information on why we need these two helpers, please refer to the corresponding pmd helpers in the mprotect() code path. Link: https://lkml.kernel.org/r/20240812181225.1360970-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01powerpc/8xx: document and enforce that split PT locks are not usedDavid Hildenbrand
Right now, we cannot have split PT locks because 8xx does not support SMP. But for the sake of documentation *why* 8xx is fine regarding what we documented in huge_pte_lockptr(), let's just add code to enforce it at the same time as documenting it. This should also make everybody who wants to copy from the 8xx approach of supporting such unusual ways of mapping hugetlb folios aware that it gets tricky once multiple page tables are involved. Link: https://lkml.kernel.org/r/20240726150728.3159964-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: kvmalloc: align kvrealloc() with krealloc()Danilo Krummrich
Besides the obvious (and desired) difference between krealloc() and kvrealloc(), there is some inconsistency in their function signatures and behavior: - krealloc() frees the memory when the requested size is zero, whereas kvrealloc() simply returns a pointer to the existing allocation. - krealloc() behaves like kmalloc() if a NULL pointer is passed, whereas kvrealloc() does not accept a NULL pointer at all and, if passed, would fault instead. - krealloc() is self-contained, whereas kvrealloc() relies on the caller to provide the size of the previous allocation. Inconsistent behavior throughout allocation APIs is error prone, hence make kvrealloc() behave like krealloc(), which seems superior in all mentioned aspects. Besides that, implementing kvrealloc() by making use of krealloc() and vrealloc() provides oppertunities to grow (and shrink) allocations more efficiently. For instance, vrealloc() can be optimized to allocate and map additional pages to grow the allocation or unmap and free unused pages to shrink the allocation. [dakr@kernel.org: document concurrency restrictions] Link: https://lkml.kernel.org/r/20240725125442.4957-1-dakr@kernel.org [dakr@kernel.org: disable KASAN when switching to vmalloc] Link: https://lkml.kernel.org/r/20240730185049.6244-2-dakr@kernel.org [dakr@kernel.org: properly document __GFP_ZERO behavior] Link: https://lkml.kernel.org/r/20240730185049.6244-5-dakr@kernel.org Link: https://lkml.kernel.org/r/20240722163111.4766-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Christian König <christian.koenig@amd.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <kees@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-30powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO addHaren Myneni
In the powerpc-pseries specific implementation, the IO hotplug event is handled in the user space (drmgr tool). For the DLPAR IO ADD, the corresponding device tree nodes and properties will be added to the device tree after the device enable. The user space (drmgr tool) uses configure_connector RTAS call with the DRC index to retrieve the device nodes and updates the device tree by writing to /proc/ppc64/ofdt. Under system lockdown, /dev/mem access to allocate buffers for configure_connector RTAS call is restricted which means the user space can not issue this RTAS call and also can not access to /proc/ppc64/ofdt. The pseries implementation need user interaction to power-on and add device to the slot during the ADD event handling. So adds complexity if the complete hotplug ADD event handling moved to the kernel. To overcome /dev/mem access restriction, this patch extends the /sys/kernel/dlpar interface and provides ‘dt add index <drc_index>’ to the user space. The drmgr tool uses this interface to update the device tree whenever the device is added. This interface retrieves device tree nodes for the corresponding DRC index using the configure_connector RTAS call and adds new device nodes / properties to the device tree. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822025028.938332-3-haren@linux.ibm.com
2024-08-30powerpc/pseries/dlpar: Remove device tree node for DLPAR IO removeHaren Myneni
In the powerpc-pseries specific implementation, the IO hotplug event is handled in the user space (drmgr tool). But update the device tree and /dev/mem access to allocate buffers for some RTAS calls are restricted when the kernel lockdown feature is enabled. For the DLPAR IO REMOVE, the corresponding device tree nodes and properties have to be removed from the device tree after the device disable. The user space removes the device tree nodes by updating /proc/ppc64/ofdt which is not allowed under system lockdown is enabled. This restriction can be resolved by moving the complete IO hotplug handling in the kernel. But the pseries implementation need user interaction to power off and to remove device from the slot during hotplug event handling. To overcome the /proc/ppc64/ofdt restriction, this patch extends the /sys/kernel/dlpar interface and provides ‘dt remove index <drc_index>’ to the user space so that drmgr tool can remove the corresponding device tree nodes based on DRC index from the device tree. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822025028.938332-2-haren@linux.ibm.com
2024-08-30powerpc/pseries: Use correct data types from pseries_hp_errorlog structHaren Myneni
_be32 type is defined for some elements in pseries_hp_errorlog struct but also used them u32 after be32_to_cpu() conversion. Example: In handle_dlpar_errorlog() hp_elog->_drc_u.drc_index = be32_to_cpu(hp_elog->_drc_u.drc_index); And later assigned to u32 type dlpar_cpu() - u32 drc_index = hp_elog->_drc_u.drc_index; This incorrect usage is giving the following warnings and the patch resolve these warnings with the correct assignment. arch/powerpc/platforms/pseries/dlpar.c:398:53: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [usertype] drc_index @@ got restricted __be32 [usertype] drc_index @@ ... arch/powerpc/platforms/pseries/dlpar.c:418:43: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __be32 [usertype] drc_count @@ got unsigned int [usertype] @@ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408182142.wuIKqYae-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202408182302.o7QRO45S-lkp@intel.com/ Signed-off-by: Haren Myneni <haren@linux.ibm.com> v3: - Fix warnings from using incorrect data types in pseries_hp_errorlog struct v2: - Remove pr_info() and TODO comments - Update more information in the commit logs Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240822025028.938332-1-haren@linux.ibm.com
2024-08-30powerpc/vdso: Inconditionally use CFUNC macroChristophe Leroy
During merge of commit 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation") a fallback version of CFUNC macro was added at the last minute, so it can be used inconditionally. Fixes: 4e991e3c16a3 ("powerpc: add CFUNC assembly label annotation") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/0fa863f2f69b2ca4094ae066fcf1430fb31110c9.1724313540.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/32: Implement validation of emergency stackChristophe Leroy
VMAP stack added an emergency stack on powerpc/32 for when there is a stack overflow, but failed to add stack validation for that emergency stack. That validation is required for show stack. Implement it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/2439d50b019f758db4a6d7b238b06441ab109799.1724156805.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Inconditionally use task PGDIR in DTLB missesChristophe Leroy
At the time being, DATA TLB miss handlers use task PGDIR for user addresses and swapper_pg_dir for kernel addresses. Now that kernel part of swapper_pg_dir is copied into task PGDIR at PGD allocation, it is possible to avoid the above logic and always use task PGDIR. But new kernel PGD entries can still be created after init, in which case those PGD entries may miss in task PGDIR. This can be handled in DATA TLB error handler. However, it needs to be done in real mode because the missing entry might be related to the stack. So implement copy of missing PGD entry in DATA TLB miss handler just after detection of invalid PGD entry. Also replace comparison by same calculation as in previous patch to know if an address belongs to a kernel or user segment. Note that as mentioned in platforms/Kconfig.cputype, SMP is not supported on 603 processors so there is no risk of the PGD entry be populated during the fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/a2ba8eeb1c845eeb9e46b6fe3a5e9f841df9a033.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Inconditionally use task PGDIR in ITLB missesChristophe Leroy
Now that modules exec page tables are preallocated, the instruction TLBmiss handler can use task PGDIR inconditionally. Also revise the identification of user vs kernel user space by doing a calculation instead of a comparison: Get the segment number and subtract the number of the first kernel segment. The result is positive for kernel addresses and negative for user addresses, which means that upper 2 bits are 0 for kernel and 3 for user. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/9a3242162ad2faab8019c698e501b326a126ee9e.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Switch r0 and r3 in TLB miss handlersChristophe Leroy
In preparation of next patch that will perform some additional calculations to replace comparison, switch the use of r0 and r3 as r0 has some limitations in some instructions like 'addi/subi'. Also remove outdated comments about the meaning of each register. The registers are used for many things and it would be difficult to accurately describe all things done with a given register. The function is now small enough to get a global view without much description. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/566af5e87685b1a85d3182549c0d520ce2d8877a.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/603: Copy kernel PGD entries into all PGDIRs and preallocate execmem ↵Christophe Leroy
page tables For the same reason as 8xx, copy kernel PGD entries into all PGDIRs in pgd_alloc() and preallocate execmem page tables before creating new PGDs so that all PGD entries related to execmem are copied by pgd_alloc(). This will help reduce the fast-path in TLBmiss handlers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1a0d1feee07c4cf955f6a43a704c203e5c90fa53.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/32s: Reduce default size of module/execmem areaChristophe Leroy
book3s/32 platforms have usually more memory than 8xx, but it is still not worth reserving a full segment (256 Mbytes) for module text. 64Mbytes should be far enough. Also fix TASK_SIZE when EXECMEM is not selected, and add a build verification for overlap of module execmem space with user segments. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/c1f6a4e47f177d919561c6e97d31af5564923cf6.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Inconditionally use task PGDIR in DTLB missesChristophe Leroy
At the time being, DATA TLB miss handlers use task PGDIR for user addresses and swapper_pg_dir for kernel addresses. Now that kernel part of swapper_pg_dir is copied into task PGDIR at PGD allocation, it is possible to avoid the above logic and always use task PGDIR. But new kernel PGD entries can still be created after init, in which case those PGD entries may miss in task PGDIR. This can be handled in DATA TLB error handler. However, it needs to be done in real mode because the missing entry might be related to the stack. So implement copy of missing PGD entry in the prolog of DATA TLB ERROR handler just after the fixup of DAR. Note that this is feasible because 8xx doesn't implement vmap or ioremap with 8Mbytes pages but only 512kbytes pages which are at PTE level. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/7a76a923d2a111f1d843d8b20b4df0c65d2f4a7b.1724173828.git.christophe.leroy@csgroup.eu
2024-08-30powerpc/8xx: Inconditionally use task PGDIR in ITLB missesChristophe Leroy
Now that modules exec page tables are preallocated, the instruction TLBmiss handler can use task PGDIR inconditionally. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/774fd766a8b9bcb9173b5e677d5dad0df2d3970f.1724173828.git.christophe.leroy@csgroup.eu