path: root/arch/powerpc/mm
Age  Commit message  Author
2020-12-08  powerpc/book3s64/kuap: Improve error reporting with KUAP  (Aneesh Kumar K.V)
This partially reverts commit eb232b162446 ("powerpc/book3s64/kuap: Improve error reporting with KUAP") and updates the fault handler to print

[ 55.022514] Kernel attempted to access user page (7e6725b70000) - exploit attempt? (uid: 0)
[ 55.022528] BUG: Unable to handle kernel data access on read at 0x7e6725b70000
[ 55.022533] Faulting instruction address: 0xc000000000e8b9bc
[ 55.022540] Oops: Kernel access of bad area, sig: 11 [#1]
....

when the kernel accesses a userspace address without unlocking AMR. bad_kuap_fault() was added as part of commit 5e5be3aed230 ("powerpc/mm: Detect bad KUAP faults") to catch userspace accesses incorrectly blocked by AMR. Hence retain the full stack dump there even with hash translation. Also, add a comment explaining the difference between hash and radix. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201208031539.84878-1-aneesh.kumar@linux.ibm.com
2020-12-08  powerpc/mm: Fix KUAP warning by providing copy_from_kernel_nofault_allowed()  (Christophe Leroy)
Since commit c33165253492 ("powerpc: use non-set_fs based maccess routines"), userspace access is not granted anymore when using copy_from_kernel_nofault() However, kthread_probe_data() uses copy_from_kernel_nofault() to check validity of pointers. When the pointer is NULL, it points to userspace, leading to a KUAP fault and triggering the following big hammer warning many times when you request a sysrq "show task": [ 1117.202054] ------------[ cut here ]------------ [ 1117.202102] Bug: fault blocked by AP register ! [ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec [ 1117.202310] Modules linked in: [ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G W 5.10.0-rc5-01340-g83f53be2de31-dirty #4175 [ 1117.202499] NIP: c0012048 LR: c0012048 CTR: 00000000 [ 1117.202573] REGS: cacdbb88 TRAP: 0700 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty) [ 1117.202625] MSR: 00021032 <ME,IR,DR,RI> CR: 24082222 XER: 20000000 [ 1117.202899] [ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640 [ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000 [ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900 [ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80 [ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec [ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec [ 1117.204509] Call Trace: [ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable) [ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34 [ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0 [ 1117.204979] NIP: c010dbec LR: c010dbac CTR: 00000001 [ 1117.205053] REGS: cacdbc80 TRAP: 0301 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty) [ 1117.205104] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28082224 XER: 00000000 [ 1117.205416] DAR: 0000005c DSISR: c0000000 [ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f [ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224 [ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0 [ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0 [ 1117.206258] --- interrupt: 301 [ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable) [ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194 [ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168 [ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100 [ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24 [ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0 [ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74 [ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114 [ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478 [ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128 [ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34 [ 1117.207938] --- interrupt: c01 at 0xfd4e784 [ 1117.208008] NIP: 0fd4e784 LR: 0fe0f244 CTR: 10048d38 [ 1117.208083] REGS: cacdbf48 TRAP: 0c01 Tainted: G W (5.10.0-rc5-01340-g83f53be2de31-dirty) [ 1117.208134] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 44002222 XER: 00000000 [ 1117.208470] [ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff [ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000 [ 1117.209104] NIP [0fd4e784] 0xfd4e784 [ 1117.209180] LR 
[0fe0f244] 0xfe0f244
[ 1117.209236] --- interrupt: c01
[ 1117.209274] Instruction dump:
[ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
[ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
[ 1117.210102] ---[ end trace 1927c0323393af3e ]---

To avoid that, copy_from_kernel_nofault_allowed() is used to check whether the address is a valid kernel address. But the default version of it returns true for any address. Provide a powerpc version of copy_from_kernel_nofault_allowed() that returns false when the address is below TASK_SIZE, so that copy_from_kernel_nofault() will return -ERANGE. Fixes: c33165253492 ("powerpc: use non-set_fs based maccess routines") Reported-by: Qian Cai <qcai@redhat.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/18bcb456d32a3e74f5ae241fd6f1580c092d07f5.1607360230.git.christophe.leroy@csgroup.eu
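For illustration, a minimal sketch of what such an arch override could look like, assuming the TASK_SIZE comparison described above (the exact code in the commit may differ, e.g. it could equally use an is_kernel_addr() check):

    /* arch/powerpc/mm/maccess.c -- illustrative sketch only */
    #include <linux/uaccess.h>
    #include <linux/kernel.h>
    #include <linux/types.h>

    bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
    {
    	/*
    	 * Reject anything in the user address range so that
    	 * copy_from_kernel_nofault() fails with -ERANGE instead of
    	 * taking a KUAP-blocked fault on a user address.
    	 */
    	return (unsigned long)unsafe_src >= TASK_SIZE;
    }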
2020-12-05  Merge tag 'powerpc-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.10:
  - Three commits fixing possible missed TLB invalidations for multi-threaded processes when CPUs are hotplugged in and out.
  - A fix for a host crash triggerable by host userspace (qemu) in KVM on Power9.
  - A fix for a host crash in machine check handling when running HPT guests on a HPT host.
  - One commit fixing potential missed TLB invalidations when using the hash MMU on Power9 or later.
  - A regression fix for machines with CPUs on node 0 but no memory.
  Thanks to Aneesh Kumar K.V, Cédric Le Goater, Greg Kurz, Milan Mohanty, Milton Miller, Nicholas Piggin, Paul Mackerras, and Srikar Dronamraju"
* tag 'powerpc-5.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCE
  KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check
  powerpc/numa: Fix a regression on memoryless node 0
  powerpc/64s: Trim offlined CPUs from mm_cpumasks
  kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling
  powerpc/64s/pseries: Fix hash tlbiel_all_isa300 for guest kernels
  powerpc/64s: Fix hash ISA v3.0 TLBIEL instruction generation
2020-12-05  powerpc: Retire e200 core (mpc555x processor)  (Christophe Leroy)
There is no defconfig selecting CONFIG_E200, and no platform. e200 is an earlier version of booke, a predecessor of e500, with some particularities like a unified cache instead of separate instruction and data caches. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/34ebc3ba2c768d97f363bd5f2deea2356e9ae127.1605589460.git.christophe.leroy@csgroup.eu
2020-12-04  lkdtm/powerpc: Add SLB multihit test  (Ganesh Goudar)
To check machine check handling, add support to inject slb multihit errors. Co-developed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> [mpe: Use CONFIG_PPC_BOOK3S_64 to fix compile errors reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201130083057.135610-1-ganeshgr@linux.ibm.com
2020-12-04  powerpc/44x: Don't support 47x code and non 47x code at the same time  (Christophe Leroy)
440/460 variants and 470 variants are not compatible, so there is no need to build code that supports both and selects between them at runtime using MMU features. Just use CONFIG_PPC_47x to decide what to build. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c3e64da3d5d068c69a201e03bbae7da055761e5b.1603041883.git.christophe.leroy@csgroup.eu
2020-12-04  powerpc/mm: Remove useless #ifndef CPU_FTR_COHERENT_ICACHE in mem.c  (Christophe Leroy)
Since commit 10b35d9978ac ("[PATCH] powerpc: merged asm/cputable.h"), CPU_FTR_COHERENT_ICACHE has always been defined. Remove the #ifndef CPU_FTR_COHERENT_ICACHE block. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e26ddc1d6f6aca739dd8d2b7c67351ead559b084.1602489664.git.christophe.leroy@csgroup.eu
2020-12-04  powerpc/mm: Fix verification of MMU_FTR_TYPE_44x  (Christophe Leroy)
MMU_FTR_TYPE_44x cannot be checked by cpu_has_feature(). Use mmu_has_feature() instead. Fixes: 23eb7f560a2a ("powerpc: Convert flush_icache_range & friends to C") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ceede82fadf37f3b8275e61fcf8cf29a3e2ec7fe.1602351011.git.christophe.leroy@csgroup.eu
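The shape of the fix, sketched out of context (the surrounding function is omitted; the point is simply that an MMU feature bit must go through the MMU feature accessor, not the CPU one):

    	/* Before: MMU_FTR_TYPE_44x is an MMU feature, so this test is wrong */
    	if (cpu_has_feature(MMU_FTR_TYPE_44x))
    		return;

    	/* After: use the MMU feature accessor */
    	if (mmu_has_feature(MMU_FTR_TYPE_44x))
    		return;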
2020-12-04  powerpc/32s: Don't use SPRN_SPRG_PGDIR in hash_page  (Christophe Leroy)
SPRN_SPRG_PGDIR is there mainly to speed up SW TLB miss handlers for powerpc 603. We need to free SPRN_SPRG2 to reduce the mess with CONFIG_VMAP_STACK. In hash_page(), reading PGDIR from thread_struct will be in the noise performance wise. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4adca19b7120cdf619956768ed09e74fc6a558f3.1606285014.git.christophe.leroy@csgroup.eu
2020-12-04  powerpc/32s: Don't hash_preload() kernel text  (Christophe Leroy)
We now always map kernel text with BATs. There is therefore no need either to preload the hash with kernel text addresses or to ensure they are never evicted. This is more or less a revert of commit ee4f2ea48674 ("[POWERPC] Fix 32-bit mm operations when not using BATs"). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0a0bab7fadd89aa829e33420fbc10d60c59040a7.1606285014.git.christophe.leroy@csgroup.eu
2020-12-04  powerpc/32s: Always map kernel text and rodata with BATs  (Christophe Leroy)
Since commit 2b279c0348af ("powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC"), there is no real situation where mapping without BATs is required. In order to simplify memory handling, always map kernel text and rodata with BATs, even when the "nobats" kernel parameter is set. Also fix the 603 TLB miss exceptions, which no longer require a kernel page table when DEBUG_PAGEALLOC is enabled. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/da51f7ec632825a4ce43290a904aad61648408c0.1606285013.git.christophe.leroy@csgroup.eu
2020-12-04  powerpc/book3s64/kup: Check max key supported before enabling kup  (Aneesh Kumar K.V)
Don't enable KUEP/KUAP if we support less than or equal to 3 keys. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201202043854.76406-1-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/hash/kuep: Enable KUEP on hash  (Aneesh Kumar K.V)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-21-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/hash/kuap: Enable kuap on hash  (Aneesh Kumar K.V)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-20-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/kuap: Improve error reporting with KUAP  (Aneesh Kumar K.V)
With hash translation use DSISR_KEYFAULT to identify a wrong access. With Radix we look at the AMR value and type of fault. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-17-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.  (Aneesh Kumar K.V)
Now that the kernel correctly stores/restores userspace AMR/IAMR values, avoid manipulating AMR and IAMR from the kernel on behalf of userspace. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-15-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec  (Aneesh Kumar K.V)
On fork, we inherit from the parent and on exec, we should switch to default_amr values. Also, avoid changing the AMR register value within the kernel. The kernel now runs with different AMR values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-13-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation  (Aneesh Kumar K.V)
This patch updates kernel hash page table entries to use storage key 3 for their mapping. This implies all kernel accesses will now use key 3 to control READ/WRITE. The patch also prevents the allocation of key 3 from userspace, and the UAMOR value is updated such that userspace cannot modify key 3. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-9-aneesh.kumar@linux.ibm.com
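A rough sketch of how a kernel key can be carved out at pkey init time; variable names follow arch/powerpc/mm/book3s64/pkeys.c, but the pkeyshift() helper and the exact lines are assumptions, not the literal hunk from the commit:

    	/* Keep the kernel key (key 3 here) out of the pool handed to userspace */
    	reserved_allocation_mask |= (0x1 << 3);

    	/*
    	 * UAMOR has two bits per key; clearing key 3's bits means userspace
    	 * can never modify the AMR/IAMR state for the kernel key.
    	 */
    	default_uamor &= ~(0x3ul << pkeyshift(3));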
2020-12-04  powerpc/book3s64/kuap: Rename MMU_FTR_RADIX_KUAP and MMU_FTR_KUEP  (Aneesh Kumar K.V)
This is in preparation for adding support for KUAP with hash translation. Towards that, rename/move the KUAP-related functions to non-radix names. Also move the feature bit closer to MMU_FTR_KUEP. MMU_FTR_KUEP is renamed to MMU_FTR_BOOK3S_KUEP to indicate the feature is only relevant to BOOK3S_64. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-8-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/kuep: Move KUEP related function outside radix  (Aneesh Kumar K.V)
The next set of patches adds support for KUEP with hash translation. In preparation for that, rename/move the KUEP-related functions to non-radix names. Also set MMU_FTR_KUEP and add the missing isync(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-7-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/kuap: Move KUAP related function outside radix  (Aneesh Kumar K.V)
The next set of patches adds support for kuap with hash translation. In preparation for that rename/move kuap related functions to non radix names. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-6-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/kuap/kuep: Move uamor setup to pkey init  (Aneesh Kumar K.V)
This patch consolidates the UAMOR update across the pkey, kuap and kuep features. The boot CPU initializes UAMOR via pkey init, and both radix/hash do the secondary CPU UAMOR init in early_init_mmu_secondary. We don't check for the mmu_feature in the radix secondary init because UAMOR is a supported SPRN on all CPUs supporting radix translation. The old code was not updating UAMOR if we had smap disabled and smep enabled. This change handles that case. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-5-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/book3s64/kuap/kuep: Add PPC_PKEY config on book3s64  (Aneesh Kumar K.V)
The config CONFIG_PPC_PKEY is used to select the base support that is required for PPC_MEM_KEYS, KUAP, and KUEP. Adding this dependency reduces the code complexity (in terms of #ifdefs) and enables us to move some of the initialization code to pkeys.c. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-4-aneesh.kumar@linux.ibm.com
2020-12-04  powerpc/64s: Tidy machine check SLB logging  (Nicholas Piggin)
Since ISA v3.0, SLB no longer uses the slb_cache, and stab_rr is no longer correlated with SLB allocation. Move those to pre-3.0. While here, improve some alignments and reduce whitespace. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-9-npiggin@gmail.com
2020-11-27  powerpc/numa: Fix a regression on memoryless node 0  (Srikar Dronamraju)
Commit e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0") offlines node 0 and expects nodes to be subsequently onlined when CPUs or nodes are detected. Commit 6398eaa26816 ("powerpc/numa: Prefer node id queried from vphn") skips onlining node 0 when CPUs are associated with node 0. On systems with node 0 having CPUs but no memory, this causes node 0 to be marked offline. This causes issues at boot time when trying to set the memory node for online CPUs while building the zonelists.

0:mon> t
[link register ] c000000000400354 __build_all_zonelists+0x164/0x280
[c00000000161bda0] c0000000016533c8 node_states+0x20/0xa0 (unreliable)
[c00000000161bdc0] c000000000400384 __build_all_zonelists+0x194/0x280
[c00000000161be30] c000000001041800 build_all_zonelists_init+0x4c/0x118
[c00000000161be80] c0000000004020d0 build_all_zonelists+0x190/0x1b0
[c00000000161bef0] c000000001003cf8 start_kernel+0x18c/0x6a8
[c00000000161bf90] c00000000000adb4 start_here_common+0x1c/0x3e8
0:mon> r
R00 = c000000000400354   R16 = 000000000b57a0e8
R01 = c00000000161bda0   R17 = 000000000b57a6b0
R02 = c00000000161ce00   R18 = 000000000b5afee8
R03 = 0000000000000000   R19 = 000000000b6448a0
R04 = 0000000000000000   R20 = fffffffffffffffd
R05 = 0000000000000000   R21 = 0000000001400000
R06 = 0000000000000000   R22 = 000000001ec00000
R07 = 0000000000000001   R23 = c000000001175580
R08 = 0000000000000000   R24 = c000000001651ed8
R09 = c0000000017e84d8   R25 = c000000001652480
R10 = 0000000000000000   R26 = c000000001175584
R11 = c000000c7fac0d10   R27 = c0000000019568d0
R12 = c000000000400180   R28 = 0000000000000000
R13 = c000000002200000   R29 = c00000000164dd78
R14 = 000000000b579f78   R30 = 0000000000000000
R15 = 000000000b57a2b8   R31 = c000000001175584
pc  = c000000000400194 local_memory_node+0x24/0x80
cfar= c000000000074334 mcount+0xc/0x10
lr  = c000000000400354 __build_all_zonelists+0x164/0x280
msr = 8000000002001033   cr = 44002284
ctr = c000000000400180   xer = 0000000000000001   trap = 380
dar = 0000000000001388   dsisr = c00000000161bc90
0:mon>

Fix this by setting the node to be online while onlining CPUs that belong to node 0. Fixes: e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0") Fixes: 6398eaa26816 ("powerpc/numa: Prefer node id queried from vphn") Reported-by: Milan Mohanty <milmohan@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127053738.10085-1-srikar@linux.vnet.ibm.com
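A sketch of the shape of the fix: make sure a node that owns CPUs is marked online before the zonelists are built. The placement inside map_cpu_to_node() and the surrounding code are illustrative assumptions, not the literal hunk:

    static void map_cpu_to_node(int cpu, int node)
    {
    	update_numa_cpu_lookup_table(cpu, node);

    	/* A node that has CPUs must be online, even if it has no memory */
    	if (!node_online(node))
    		node_set_online(node);

    	if (!cpumask_test_cpu(cpu, node_to_cpumask_map[node]))
    		cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
    }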
2020-11-27  powerpc/64s: Trim offlined CPUs from mm_cpumasks  (Nicholas Piggin)
When offlining a CPU, powerpc/64s does not flush TLBs, rather it just leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs to manage its TLBs. However the exit_flush_lazy_tlbs() function expects that after returning, all CPUs (except self) have flushed TLBs for that mm, in which case TLBIEL can be used for this flush. This breaks for offline CPUs because they don't get the IPI to flush their TLB. This can lead to stale translations. Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs before going offline. These offlined CPU bits stuck in the cpumask also prevent the cpumask from being trimmed back to local mode, which means continual broadcast IPIs or TLBIEs are needed for TLB flushing. This patch prevents that situation too. A cast of many were involved in working this out, but in particular Milton, Aneesh and Paul made key discoveries. Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Milton Miller <miltonm@us.ibm.com> Debugged-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-5-npiggin@gmail.com
2020-11-27  powerpc/64s/pseries: Fix hash tlbiel_all_isa300 for guest kernels  (Nicholas Piggin)
tlbiel_all() is presently not usable in !HVMODE when running hash. Remove the HV-privileged flushes when running in a guest to make it usable. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-3-npiggin@gmail.com
2020-11-27  powerpc/64s: Fix hash ISA v3.0 TLBIEL instruction generation  (Nicholas Piggin)
A typo has the R field of the instruction assigned by lucky dip a la register allocator. Fixes: d4748276ae14c ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-2-npiggin@gmail.com
2020-11-22  mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports  (Dan Williams)
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
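The arch-override design pattern being adopted, sketched for powerpc; the exact config guards in the real asm/sparsemem.h may differ, but the idea is that defining the symbol as a macro of itself makes the generic fallback compile out:

    /* arch/powerpc/include/asm/sparsemem.h -- sketch of the override pattern */
    #ifdef CONFIG_MEMORY_HOTPLUG
    extern int memory_add_physaddr_to_nid(u64 start);
    #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
    #endif

    #ifdef CONFIG_NUMA
    extern int phys_to_target_node(u64 start);
    #define phys_to_target_node phys_to_target_node
    #endif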
2020-11-19  powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()  (David Hildenbrand)
Let's revert what we did in case something goes wrong and we return an error - as already done on arm64 and s390x. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-8-david@redhat.com
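The error-handling shape this gives arch_add_memory(), sketched using the arch_create/remove_linear_mapping() helpers from the related factoring patch in this series; signatures are approximate:

    int __ref arch_add_memory(int nid, u64 start, u64 size,
    			  struct mhp_params *params)
    {
    	unsigned long start_pfn = start >> PAGE_SHIFT;
    	unsigned long nr_pages = size >> PAGE_SHIFT;
    	int rc;

    	rc = arch_create_linear_mapping(nid, start, size, params);
    	if (rc)
    		return rc;

    	rc = __add_pages(nid, start_pfn, nr_pages, params);
    	if (rc)
    		/* Undo the linear mapping if adding the pages failed */
    		arch_remove_linear_mapping(start, size);

    	return rc;
    }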
2020-11-19  powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()  (David Hildenbrand)
The single caller (arch_remove_linear_mapping()) prints a proper warning when this function fails. No need to eventually crash the kernel - let's drop this WARN_ON. Suggested-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-7-david@redhat.com
2020-11-19  powerpc/mm: print warning in arch_remove_linear_mapping()  (David Hildenbrand)
Let's print a warning similar to the one in arch_add_linear_mapping() instead of using WARN_ON_ONCE() and eventually crashing the kernel. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-6-david@redhat.com
2020-11-19  powerpc/mm: protect linear mapping modifications by a mutex  (David Hildenbrand)
This code currently relies on mem_hotplug_begin()/mem_hotplug_done() - the create_section_mapping()/remove_section_mapping() implementations cannot tolerate being called concurrently. Let's prepare for callers (memtrace) not holding any such locks (and don't force them to mess with memory hotplug locks). Other parts of these functions don't seem to rely on external locking. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-5-david@redhat.com
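A sketch of the locking being introduced, assuming the arch_create_linear_mapping() helper added by the factoring patch below; error handling is trimmed and the exact body is illustrative:

    static DEFINE_MUTEX(linear_mapping_mutex);

    int __ref arch_create_linear_mapping(int nid, u64 start, u64 size,
    				     struct mhp_params *params)
    {
    	int rc;

    	start = (u64)__va(start);

    	/* Serialize concurrent linear mapping changes (e.g. from memtrace) */
    	mutex_lock(&linear_mapping_mutex);
    	rc = create_section_mapping(start, start + size, nid,
    				    params->pgprot);
    	mutex_unlock(&linear_mapping_mutex);

    	return rc;
    }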
2020-11-19  powerpc/mm: factor out creating/removing linear mapping  (David Hildenbrand)
We want to stop abusing memory hotplug infrastructure in memtrace code to perform allocations and remove the linear mapping. Instead we will use alloc_contig_pages() and remove the linear mapping manually. Let's factor out creating/removing the linear mapping into arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the future, we might be able to have whole arch_add_memory() / arch_remove_memory() be implemented in common code. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-4-david@redhat.com
2020-11-19  powerpc/mm: Fix comparing pointer to 0 warning  (Kaixu Xia)
Fixes coccicheck warning: ./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0 Avoid pointer type value compared to 0. Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604976961-20441-1-git-send-email-kaixuxia@tencent.com
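The class of cleanup involved, sketched with a placeholder variable name (the actual pointer in pgtable_32.c differs; this just shows the coccicheck-friendly form):

    	/* Before: comparing a pointer against the integer literal 0 */
    	if (ptr == 0)
    		return -ENOMEM;

    	/* After: test the pointer directly */
    	if (!ptr)
    		return -ENOMEM;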
2020-11-19  powerpc/mm: Update tlbiel loop on POWER10  (Aneesh Kumar K.V)
With POWER10, a single tlbiel instruction invalidates all the congruence classes of the TLB, hence we need to issue only one tlbiel with SET=0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
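A sketch of what the loop bound change looks like; the structure follows the tlbiel_all_isa300()-style flush loops, but the helper names and exact arguments here are illustrative:

    	unsigned int set, num_sets;

    	/*
    	 * On ISA v3.1 (POWER10) one tlbiel with SET=0 invalidates every
    	 * congruence class, so a single iteration is enough.
    	 */
    	if (early_cpu_has_feature(CPU_FTR_ARCH_31))
    		num_sets = 1;
    	else
    		num_sets = POWER9_TLB_SETS_HASH;

    	for (set = 0; set < num_sets; set++)
    		tlbiel_hash_set_isa300(set, is, 0, 0, 0);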
2020-11-19  powerpc/mm: Move setting PTE specific flags to pfn_pmd()  (Aneesh Kumar K.V)
powerpc used to set the PTE specific flags in set_pte_at(). That is different from other architectures. To be consistent with other architectures powerpc updated pfn_pte() to set _PAGE_PTE in commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte") That commit didn't do the same for pfn_pmd() because we expect pmd_mkhuge() to do that. But as per Linus that is a bad rule: The rule that you must use "pmd_mkhuge()" seems _completely_ wrong. The only valid use to ever make a pmd out of a pfn is to make a huge-page. Hence update pfn_pmd() to set _PAGE_PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201022091115.39568-1-aneesh.kumar@linux.ibm.com
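Conceptually the change makes pfn_pmd() mirror what pfn_pte() already does, i.e. always OR in _PAGE_PTE. A simplified sketch (the real book3s64 code routes the protection bits through a helper; treat this as illustrative):

    static inline pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
    {
    	unsigned long pmdv;

    	pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;

    	/* A pmd built from a pfn is always a huge-page (leaf) entry */
    	return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
    }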
2020-11-06  powerpc/mm/highmem: Switch to generic kmap atomic  (Thomas Gleixner)
There is no reason to have the same code in every architecture. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20201103095858.087635810@linutronix.de
2020-10-16  Merge tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc updates from Michael Ellerman:
 - A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting it for powerpc, as well as a related fix for sparc.
 - Remove support for PowerPC 601.
 - Some fixes for watchpoints & addition of a new ptrace flag for detecting ISA v3.1 (Power10) watchpoint features.
 - A fix for kernels using 4K pages and the hash MMU on bare metal Power9 systems with > 16TB of RAM, or RAM on the 2nd node.
 - A basic idle driver for shallow stop states on Power10.
 - Tweaks to our sched domains code to better inform the scheduler about the hardware topology on Power9/10, where two SMT4 cores can be presented by firmware as an SMT8 core.
 - A series doing further reworks & cleanups of our EEH code.
 - Addition of a filter for RTAS (firmware) calls done via sys_rtas(), to prevent root from overwriting kernel memory.
 - Other smaller features, fixes & cleanups.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero, Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha, Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt, Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang Yingliang, zhengbin.
* tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
  Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
  selftests/powerpc: Fix eeh-basic.sh exit codes
  cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
  powerpc/time: Make get_tb() common to PPC32 and PPC64
  powerpc/time: Make get_tbl() common to PPC32 and PPC64
  powerpc/time: Remove get_tbu()
  powerpc/time: Avoid using get_tbl() and get_tbu() internally
  powerpc/time: Make mftb() common to PPC32 and PPC64
  powerpc/time: Rename mftbl() to mftb()
  powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
  powerpc/32s: Rename head_32.S to head_book3s_32.S
  powerpc/32s: Setup the early hash table at all time.
  powerpc/time: Remove ifdef in get_dec() and set_dec()
  powerpc: Remove get_tb_or_rtc()
  powerpc: Remove __USE_RTC()
  powerpc: Tidy up a bit after removal of PowerPC 601.
  powerpc: Remove support for PowerPC 601
  powerpc: Remove PowerPC 601
  powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
  powerpc: Remove CONFIG_PPC601_SYNC_FIX
  ...
2020-10-16  powerpc/mm: move setting pte specific flags to pfn_pte  (Aneesh Kumar K.V)
powerpc used to set the pte specific flags in set_pte_at(). This is different from other architectures. To be consistent with other architecture update pfn_pte to set _PAGE_PTE on ppc64. Also, drop now unused pte_mkpte. We add a VM_WARN_ON() to catch the usage of calling set_pte_at() without setting _PAGE_PTE bit. We will remove that after a few releases. With respect to huge pmd entries, pmd_mkhuge() takes care of adding the _PAGE_PTE bit. [akpm@linux-foundation.org: whitespace fix, per Christophe] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lkml.kernel.org/r/20200902114222.181353-3-aneesh.kumar@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-15  Merge tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping  (Linus Torvalds)
Pull dma-mapping updates from Christoph Hellwig: - rework the non-coherent DMA allocator - move private definitions out of <linux/dma-mapping.h> - lower CMA_ALIGNMENT (Paul Cercueil) - remove the omap1 dma address translation in favor of the common code - make dma-direct aware of multiple dma offset ranges (Jim Quinlan) - support per-node DMA CMA areas (Barry Song) - increase the default seg boundary limit (Nicolin Chen) - misc fixes (Robin Murphy, Thomas Tai, Xu Wang) - various cleanups * tag 'dma-mapping-5.10' of git://git.infradead.org/users/hch/dma-mapping: (63 commits) ARM/ixp4xx: add a missing include of dma-map-ops.h dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling dma-direct: factor out a dma_direct_alloc_from_pool helper dma-direct check for highmem pages in dma_direct_alloc_pages dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h> dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma dma-mapping: move dma-debug.h to kernel/dma/ dma-mapping: remove <asm/dma-contiguous.h> dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h> dma-contiguous: remove dma_contiguous_set_default dma-contiguous: remove dev_set_cma_area dma-contiguous: remove dma_declare_contiguous dma-mapping: split <linux/dma-mapping.h> cma: decrease CMA_ALIGNMENT lower limit to 2 firewire-ohci: use dma_alloc_pages dma-iommu: implement ->alloc_noncoherent dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods dma-mapping: add a new dma_alloc_pages API dma-mapping: remove dma_cache_sync 53c700: convert to dma_alloc_noncoherent ...
2020-10-13  arch, drivers: replace for_each_membock() with for_each_mem_range()  (Mike Rapoport)
There are several occurrences of the following pattern: for_each_memblock(memory, reg) { start = __pfn_to_phys(memblock_region_memory_base_pfn(reg); end = __pfn_to_phys(memblock_region_memory_end_pfn(reg)); /* do something with start and end */ } Using for_each_mem_range() iterator is more appropriate in such cases and allows simpler and cleaner code. [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build] [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring] Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
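For comparison with the old loop quoted above, the replacement pattern looks roughly like this (a sketch; after the memblock refactor for_each_mem_range() hands back physical addresses directly):

    	phys_addr_t start, end;
    	u64 i;

    	for_each_mem_range(i, &start, &end) {
    		/* do something with start and end */
    	}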
2020-10-13  arch, mm: replace for_each_memblock() with for_each_mem_pfn_range()  (Mike Rapoport)
There are several occurrences of the following pattern: for_each_memblock(memory, reg) { start_pfn = memblock_region_memory_base_pfn(reg); end_pfn = memblock_region_memory_end_pfn(reg); /* do something with start_pfn and end_pfn */ } Rather than iterate over all memblock.memory regions and each time query for their start and end PFNs, use for_each_mem_pfn_range() iterator to get simpler and clearer code. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format] Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
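And the PFN-based variant of the same conversion, sketched:

    	unsigned long start_pfn, end_pfn;
    	int i, nid;

    	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
    		/* do something with start_pfn and end_pfn */
    	}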
2020-10-08  powerpc/32s: Setup the early hash table at all time.  (Christophe Leroy)
At the time being, an early hash table is set up when CONFIG_KASAN is selected. There is nothing wrong with setting such an early hash table all the time, even if it is not used. This is a statically allocated 256 kB table which lies in the init data section. This makes the code simpler and may in the future allow to setup early IO mappings with fixmap instead of hard coding BATs. Put create_hpte() and flush_hash_pages() in the .ref.text section in order to avoid warning for the reference to early_hash[]. This reference is removed by MMU_init_hw_patch() before init memory is freed. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b8f8101c368b8a6451844a58d7bd7d83c14cf2aa.1601566529.git.christophe.leroy@csgroup.eu
2020-10-08  powerpc: Tidy up a bit after removal of PowerPC 601.  (Christophe Leroy)
The removal of the 601 left some standalone blocks from former if/else. Drop the { } and re-indent. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/31c4cd093963f22831bf388449056ee045533d3b.1601362098.git.christophe.leroy@csgroup.eu
2020-10-08  powerpc: Remove support for PowerPC 601  (Christophe Leroy)
PowerPC 601 has been retired. Remove all associated specific code. CPU_FTRS_PPC601 has CPU_FTR_COHERENT_ICACHE and CPU_FTR_COMMON. CPU_FTR_COMMON is already present via other CPU_FTRS. None of the remaining CPU selects CPU_FTR_COHERENT_ICACHE. So CPU_FTRS_PPC601 can be removed from the possible features, hence can be removed completely. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/60b725d55e21beec3335175c20b77903ff98284f.1601362098.git.christophe.leroy@csgroup.eu
2020-10-08  powerpc: Drop SYNC_601() ISYNC_601() and SYNC()  (Christophe Leroy)
Those macros are now empty at all time. Drop them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7990bb63fc53e460bfa94f8040184881d9e6fbc3.1601362098.git.christophe.leroy@csgroup.eu
2020-10-08  powerpc/lmb-size: Use addr #size-cells value when fetching lmb-size  (Aneesh Kumar K.V)
Make it consistent with other usages. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007114836.282468-5-aneesh.kumar@linux.ibm.com
2020-10-08  powerpc/book3s64/radix: Make radix_mem_block_size 64bit  (Aneesh Kumar K.V)
Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic") make sure different variables tracking lmb_size are updated to be 64 bit. Fixes: af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007114836.282468-4-aneesh.kumar@linux.ibm.com
2020-10-06  pseries/hotplug-memory: hot-add: skip redundant LMB lookup  (Scott Cheloha)
During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid() to determine which node id (nid) to use when later calling __add_memory(). This is wasteful. On pseries, memory_add_physaddr_to_nid() finds an appropriate nid for a given address by looking up the LMB containing the address and then passing that LMB to of_drconf_to_nid_single() to get the nid. In dlpar_add_lmb() we get this address from the LMB itself. In short, we have a pointer to an LMB and then we are searching for that LMB *again* in order to find its nid. If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we can skip the redundant lookup. The only error handling we need to duplicate from memory_add_physaddr_to_nid() is the fallback to the default nid when drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or an invalid nid. Skipping the extra lookup makes hot-add operations faster, especially on machines with many LMBs. Consider an LPAR with 126976 LMBs. In one test, hot-adding 126000 LMBs on an upatched kernel took ~3.5 hours while a patched kernel completed the same operation in ~2 hours: Unpatched (12450 seconds): Sep 9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000 Sep 9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s) [...] Sep 9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added Patched (7065 seconds): Sep 8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000 Sep 8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s) [...] Sep 8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added It should be noted that the speedup grows more substantial when hot-adding LMBs at the end of the drconf range. This is because we are skipping a linear LMB search. To see the distinction, consider smaller hot-add test on the same LPAR. 
A perf-stat run with 10 iterations showed that hot-adding 4096 LMBs completed less than 1 second faster on a patched kernel: Unpatched: Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs): 104,753.42 msec task-clock # 0.992 CPUs utilized ( +- 0.55% ) 4,708 context-switches # 0.045 K/sec ( +- 0.69% ) 2,444 cpu-migrations # 0.023 K/sec ( +- 1.25% ) 394 page-faults # 0.004 K/sec ( +- 0.22% ) 445,902,503,057 cycles # 4.257 GHz ( +- 0.55% ) (66.67%) 8,558,376,740 stalled-cycles-frontend # 1.92% frontend cycles idle ( +- 0.88% ) (49.99%) 300,346,181,651 stalled-cycles-backend # 67.36% backend cycles idle ( +- 0.76% ) (50.01%) 258,091,488,691 instructions # 0.58 insn per cycle # 1.16 stalled cycles per insn ( +- 0.22% ) (66.67%) 70,568,169,256 branches # 673.660 M/sec ( +- 0.17% ) (50.01%) 3,100,725,426 branch-misses # 4.39% of all branches ( +- 0.20% ) (49.99%) 105.583 +- 0.589 seconds time elapsed ( +- 0.56% ) Patched: Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs): 104,055.69 msec task-clock # 0.993 CPUs utilized ( +- 0.32% ) 4,606 context-switches # 0.044 K/sec ( +- 0.20% ) 2,463 cpu-migrations # 0.024 K/sec ( +- 0.93% ) 394 page-faults # 0.004 K/sec ( +- 0.25% ) 442,951,129,921 cycles # 4.257 GHz ( +- 0.32% ) (66.66%) 8,710,413,329 stalled-cycles-frontend # 1.97% frontend cycles idle ( +- 0.47% ) (50.06%) 299,656,905,836 stalled-cycles-backend # 67.65% backend cycles idle ( +- 0.39% ) (50.02%) 252,731,168,193 instructions # 0.57 insn per cycle # 1.19 stalled cycles per insn ( +- 0.20% ) (66.66%) 68,902,851,121 branches # 662.173 M/sec ( +- 0.13% ) (49.94%) 3,100,242,882 branch-misses # 4.50% of all branches ( +- 0.15% ) (49.98%) 104.829 +- 0.325 seconds time elapsed ( +- 0.31% ) This is consistent. An add-by-count hot-add operation adds LMBs greedily, so LMBs near the start of the drconf range are considered first. On an otherwise idle LPAR with so many LMBs we would expect to find the LMBs we need near the start of the drconf range, hence the smaller speedup. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200916145122.3408129-1-cheloha@linux.ibm.com