Pull arm64 documentation move from Jonathan Corbet:
"Move the arm64 architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm64-move' of git://git.lwn.net/linux:
perf arm-spe: Fix a dangling Documentation/arm64 reference
mm: Fix a dangling Documentation/arm64 reference
arm64: Fix dangling references to Documentation/arm64
dt-bindings: fix dangling Documentation/arm64 reference
docs: arm64: Move arm64 documentation under Documentation/arch/
|
|
Pull arm documentation move from Jonathan Corbet:
"Move the Arm architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm-move' of git://git.lwn.net/linux:
dt-bindings: Update Documentation/arm references
docs: update some straggling Documentation/arm references
crypto: update some Arm documentation references
mips: update a reference to a moved Arm Document
arm64: Update Documentation/arm references
arm: update in-source documentation references
arm: docs: Move Arm documentation to Documentation/arch/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Notable features are user-space support for the memcpy/memset
instructions and the permission indirection extension.
- Support for the Armv8.9 Permission Indirection Extensions. While
this feature doesn't add new functionality, it enables future
support for Guarded Control Stacks (GCS) and Permission Overlays
- User-space support for the Armv8.8 memcpy/memset instructions
- arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
cleanups
- Removal of superfluous ISBs on context switch (following
retrospective architecture tightening)
- Decode the ISS2 register during faults for additional information
to help with debugging
- KPTI clean-up/simplification of the trampoline exit code
- Addressing several -Wmissing-prototype warnings
- Kselftest improvements for signal handling and ptrace
- Fix TPIDR2_EL0 restoring on sigreturn
- Clean-up, robustness improvements of the module allocation code
- More sysreg conversions to the automatic register/bitfields
generation
- CPU capabilities handling cleanup
- Arm documentation updates: ACPI, ptdump"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
Documentation/arm64: Add ptdump documentation
arm64: hibernate: remove WARN_ON in save_processor_state
kselftest/arm64: Log signal code and address for unexpected signals
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
arm64/fpsimd: Exit streaming mode when flushing tasks
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
"A large update for SMP management:
- Parallel CPU bringup
The reason why people are interested in parallel bringup is to
shorten the (kexec) reboot time of cloud servers to reduce the
downtime of the VM tenants.
The current fully serialized bringup does the following per AP:
1) Prepare callbacks (allocate, initialize, create threads)
2) Kick the AP alive (e.g. INIT/SIPI on x86)
3) Wait for the AP to report alive state
4) Let the AP continue through the atomic bringup
5) Let the AP run the threaded bringup to full online state
There are two significant delays:
#3 The time for an AP to report alive state in start_secondary()
on x86 has been measured in the range between 350us and 3.5ms
depending on vendor and CPU type, BIOS microcode size etc.
#4 The atomic bringup does the microcode update. This has been
measured to take up to ~8ms on the primary threads depending
on the microcode patch size to apply.
On a two-socket SKL server with 56 cores (112 threads), the boot CPU
spends about 800ms on current mainline busy waiting for the APs to
come up and apply microcode. That's more than 80% of the actual
onlining procedure.
This can be reduced significantly by splitting the bringup
mechanism into two parts:
1) Run the prepare callbacks and kick the AP alive for each AP
which needs to be brought up.
The APs wake up, do their firmware initialization and run the
low level kernel startup code including microcode loading in
parallel up to the first synchronization point. (#1 and #2
above)
2) Run the rest of the bringup code strictly serialized per CPU
(#3 - #5 above) as it's done today.
Parallelizing that stage of the CPU bringup might be possible
in theory, but it's questionable whether the required surgery
would be justified for a pretty small gain.
If the system is large enough, the first AP is already waiting at
the first synchronization point by the time the boot CPU has finished
waking up the last AP. That reduces the AP bringup time on that
SKL from ~800ms to ~80ms, i.e. by a factor of ~10x.
The actual gain varies wildly depending on the system, CPU,
microcode patch size and other factors. There are some
opportunities to reduce the overhead further, but that needs some
deep surgery in the x86 CPU bringup code.
For now this is only enabled on x86, but the core functionality
obviously works for all SMP capable architectures.
- Enhancements for SMP function call tracing so it is possible to
locate the scheduling and the actual execution points. That makes it
possible to measure IPI delivery time precisely"
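As a rough illustration of the split bringup described in the pull message
above, the bringup loop conceptually becomes something like the following
sketch. This is not the actual kernel implementation: cpu_up_to() is a
hypothetical helper standing in for the real cpuhp state machine calls, and
the CPUHP_BP_KICK_AP state name is taken approximately from the commit list
below.

static void bringup_nonboot_cpus_parallel(void)
{
        unsigned int cpu;

        /*
         * Phase 1: kick every AP alive; the APs run firmware init,
         * low-level kernel startup and microcode loading in parallel,
         * up to the first synchronization point.
         */
        for_each_present_cpu(cpu) {
                if (cpu != smp_processor_id())
                        cpu_up_to(cpu, CPUHP_BP_KICK_AP);       /* hypothetical helper */
        }

        /*
         * Phase 2: finish each AP's bringup strictly serialized per
         * CPU, as is done today.
         */
        for_each_present_cpu(cpu) {
                if (cpu != smp_processor_id())
                        cpu_up_to(cpu, CPUHP_ONLINE);           /* hypothetical helper */
        }
}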
* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
trace,smp: Add tracepoints for scheduling remotelly called functions
trace,smp: Add tracepoints around remotelly called functions
MAINTAINERS: Add CPU HOTPLUG entry
x86/smpboot: Fix the parallel bringup decision
x86/realmode: Make stack lock work in trampoline_compat()
x86/smp: Initialize cpu_primary_thread_mask late
cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
x86/smpboot: Support parallel startup of secondary CPUs
x86/smpboot: Implement a bit spinlock to protect the realmode stack
x86/apic: Save the APIC virtual base address
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
x86/apic: Provide cpu_primary_thread mask
x86/smpboot: Enable split CPU startup
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
cpu/hotplug: Reset task stack state in _cpu_up()
cpu/hotplug: Remove unused state functions
riscv: Switch to hotplug core state synchronization
parisc: Switch to hotplug core state synchronization
...
|
|
With the advent of scope-based resource management it becomes really
tedious to abide by the constraints of -Wdeclaration-after-statement.
It will still be recommended to place declarations at the start of a
scope where possible, but it will no longer be enforced.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAHk-%3Dwi-RyoUhbChiVaJZoZXheAwnJ7OO%3DGxe85BkPAd93TwDA%40mail.gmail.com
|
|
* for-next/feat_s1pie:
: Support for the Armv8.9 Permission Indirection Extensions (stage 1 only)
KVM: selftests: get-reg-list: add Permission Indirection registers
KVM: selftests: get-reg-list: support ID register features
arm64: Document boot requirements for PIE
arm64: transfer permission indirection settings to EL2
arm64: enable Permission Indirection Extension (PIE)
arm64: add encodings of PIRx_ELx registers
arm64: disable EL2 traps for PIE
arm64: reorganise PAGE_/PROT_ macros
arm64: add PTE_WRITE to PROT_SECT_NORMAL
arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
KVM: arm64: expose ID_AA64MMFR3_EL1 to guests
KVM: arm64: Save/restore PIE registers
KVM: arm64: Save/restore TCR2_EL1
arm64: cpufeature: add Permission Indirection Extension cpucap
arm64: cpufeature: add TCR2 cpucap
arm64: cpufeature: add system register ID_AA64MMFR3
arm64/sysreg: add PIR*_ELx registers
arm64/sysreg: update HCRX_EL2 register
arm64/sysreg: add system registers TCR2_ELx
arm64/sysreg: Add ID register ID_AA64MMFR3
|
|
'for-next/iss2-decode', 'for-next/kselftest', 'for-next/misc', 'for-next/feat_mops', 'for-next/module-alloc', 'for-next/sysreg', 'for-next/cpucap', 'for-next/acpi', 'for-next/kdump', 'for-next/acpi-doc', 'for-next/doc' and 'for-next/tpidr2-fix', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible
drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
perf/arm_cspmu: Decouple APMT dependency
perf/arm_cspmu: Clean up ACPI dependency
ACPI/APMT: Don't register invalid resource
perf/arm_cspmu: Fix event attribute type
perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
perf/arm-cmn: Fix DTC reset
perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust
perf/arm-cci: Slightly optimize cci_pmu_sync_counters()
* for-next/kpti:
: Simplify KPTI trampoline exit code
arm64: entry: Simplify tramp_alias macro and tramp_exit routine
arm64: entry: Preserve/restore X29 even for compat tasks
* for-next/missing-proto-warn:
: Address -Wmissing-prototype warnings
arm64: add alt_cb_patch_nops prototype
arm64: move early_brk64 prototype to header
arm64: signal: include asm/exception.h
arm64: kaslr: add kaslr_early_init() declaration
arm64: flush: include linux/libnvdimm.h
arm64: module-plts: inline linux/moduleloader.h
arm64: hide unused is_valid_bugaddr()
arm64: efi: add efi_handle_corrupted_x18 prototype
arm64: cpuidle: fix #ifdef for acpi functions
arm64: kvm: add prototypes for functions called in asm
arm64: spectre: provide prototypes for internal functions
arm64: move cpu_suspend_set_dbg_restorer() prototype to header
arm64: avoid prototype warnings for syscalls
arm64: add scs_patch_vmlinux prototype
arm64: xor-neon: mark xor_arm64_neon_*() static
* for-next/iss2-decode:
: Add decode of ISS2 to data abort reports
arm64/esr: Add decode of ISS2 to data abort reporting
arm64/esr: Use GENMASK() for the ISS mask
* for-next/kselftest:
: Various arm64 kselftest improvements
kselftest/arm64: Log signal code and address for unexpected signals
kselftest/arm64: Add a smoke test for ptracing hardware break/watch points
* for-next/misc:
: Miscellaneous patches
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
arm64: hibernate: remove WARN_ON in save_processor_state
arm64/fpsimd: Exit streaming mode when flushing tasks
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
arm64: consolidate rox page protection logic
arm64: set __exception_irq_entry with __irq_entry as a default
arm64: syscall: unmask DAIF for tracing status
arm64: lockdep: enable checks for held locks when returning to userspace
arm64/cpucaps: increase string width to properly format cpucaps.h
arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature
* for-next/feat_mops:
: Support for ARMv8.8 memcpy instructions in userspace
kselftest/arm64: add MOPS to hwcap test
arm64: mops: allow disabling MOPS from the kernel command line
arm64: mops: detect and enable FEAT_MOPS
arm64: mops: handle single stepping after MOPS exception
arm64: mops: handle MOPS exceptions
KVM: arm64: hide MOPS from guests
arm64: mops: don't disable host MOPS instructions from EL2
arm64: mops: document boot requirements for MOPS
KVM: arm64: switch HCRX_EL2 between host and guest
arm64: cpufeature: detect FEAT_HCX
KVM: arm64: initialize HCRX_EL2
* for-next/module-alloc:
: Make the arm64 module allocation code more robust (clean-up, VA range expansion)
arm64: module: rework module VA range selection
arm64: module: mandate MODULE_PLTS
arm64: module: move module randomization to module.c
arm64: kaslr: split kaslr/module initialization
arm64: kasan: remove !KASAN_VMALLOC remnants
arm64: module: remove old !KASAN_VMALLOC logic
* for-next/sysreg: (21 commits)
: More sysreg conversions to automatic generation
arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation
arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation
arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation
arm64/sysreg: Convert TRBSR_EL1 register to automatic generation
arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation
arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation
arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation
arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format
arm64/sysreg: Convert OSECCR_EL1 to automatic generation
arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation
arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation
arm64/sysreg: Convert OSLAR_EL1 to automatic generation
arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1
arm64/sysreg: Convert MDSCR_EL1 to automatic register generation
...
* for-next/cpucap:
: arm64 cpucap clean-up
arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities()
arm64: cpufeature: use cpucap naming
arm64: alternatives: use cpucap naming
arm64: standardise cpucap bitmap names
* for-next/acpi:
: Various arm64-related ACPI patches
ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init()
* for-next/kdump:
: Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64
arm64: add kdump.rst into index.rst
Documentation: add kdump.rst to present crashkernel reservation on arm64
arm64: kdump: simplify the reservation behaviour of crashkernel=,high
* for-next/acpi-doc:
: Update ACPI documentation for Arm systems
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
* for-next/doc:
: arm64 documentation updates
Documentation/arm64: Add ptdump documentation
* for-next/tpidr2-fix:
: Fix the TPIDR2_EL0 register restoring on sigreturn
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
|
|
Currently when restoring the TPIDR2 signal context we set the new value
from the signal frame in the thread data structure but not the register,
following the pattern for the rest of the data we are restoring. This does
not work in the case of TPIDR2, as the register always has the value for the
current task. This means that either we return to userspace and ignore the
new value or we context switch and save the register value on top of the
newly restored value.
Load the value from the signal context into the register instead.
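A simplified sketch of the resulting restore path, based on the description
above (field names and error handling are approximations of the actual
patch):

static int restore_tpidr2_context(struct user_ctxs *user)
{
        u64 tpidr2_el0;
        int err = 0;

        __get_user_error(tpidr2_el0, &user->tpidr2->tpidr2, err);
        if (!err)
                /* Write the register itself, not current->thread.tpidr2_el0 */
                write_sysreg_s(tpidr2_el0, SYS_TPIDR2_EL0);

        return err;
}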
Fixes: 39e54499280f ("arm64/signal: Include TPIDR2 in the signal context")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v2-1-c8e8fcc10302@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When patching kernel alternatives, we need to be careful not to execute
kernel code which is itself subject to patching. In general, if code is
executed after the instructions in memory have been patched but prior to
the cache maintenance and barriers completing, it could lead to
UNPREDICTABLE results.
As our regular cache maintenance routines are patched with alternatives,
we have a clean_dcache_range_nopatch() function which is *intended* to
avoid patchable code and therefore supposed to be safe in the middle of
patching alternatives. Unfortunately, it's not marked as 'noinstr', and
so can be instrumented with patchable code.
Additionally, it calls read_sanitised_ftr_reg() (which may be
instrumented with patchable code) to find the sanitized value of
CTR_EL0.DminLine, and is therefore not safe to call during patching.
Luckily, since commit:
675b0563d6b26aa9 ("arm64: cpufeature: expose arm64_ftr_reg struct for CTR_EL0")
... we can read the sanitised CTR_EL0 value directly, and avoid the call
to read_sanitised_ftr_reg().
This patch marks clean_dcache_range_nopatch() as noinstr, and has it
read the sanitized CTR_EL0 value directly, avoiding the issues above.
As a bonus, this is also an optimization. As read_sanitised_ftr_reg()
performs a binary search to find the CTR_EL0 value, reading the value
directly avoids this binary search per applied alternative, avoiding
some unnecessary work.
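The resulting function looks roughly like this (a sketch reconstructed from
the description above, not a verbatim copy of the patch):

static void noinstr clean_dcache_range_nopatch(u64 start, u64 end)
{
        u64 cur, d_size, ctr_el0;

        /* Read the sanitised CTR_EL0 value directly; no patchable code */
        ctr_el0 = arm64_ftr_reg_ctrel0.sys_val;
        d_size = 4 << cpuid_feature_extract_unsigned_field(ctr_el0,
                                                CTR_EL0_DminLine_SHIFT);
        cur = start & ~(d_size - 1);
        do {
                /* Use "dc civac" directly, as it is never patched */
                asm volatile("dc civac, %0" : : "r" (cur) : "memory");
        } while (cur += d_size, cur < end);
}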
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230616103150.1238132-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The arm64 documentation has moved under Documentation/arch/; fix up
references in the arm64 subtree to match.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
During hibernation or restoration, freeze_secondary_cpus() checks
num_online_cpus via BUG_ON, and the subsequent save_processor_state()
checks it again with WARN_ON. In the case of CONFIG_PM_SLEEP_SMP=n,
freeze_secondary_cpus() is not defined, but the only way to disable
CONFIG_PM_SLEEP_SMP is !SMP, where num_online_cpus is always 1, so we
don't have to check it in save_processor_state() either.
Remove the unnecessary check from save_processor_state().
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230609075049.2651723-3-songshuaishuai@tinylab.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for
funcgraph-retval, and this modification makes it available on the
ARM64 platform.
We introduce a new structure called fgraph_ret_regs for the ARM64
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address to
the function ftrace_return_to_handler to record the return value.
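For reference, the new per-architecture structure looks roughly as follows
(a sketch; the exact layout in <asm/ftrace.h> may differ):

/* Return registers and frame pointer captured in return_to_handler */
struct fgraph_ret_regs {
        /* x0 - x7 */
        unsigned long regs[8];

        unsigned long fp;
        unsigned long __unused;
};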
Link: https://lkml.kernel.org/r/c78366416ce93f704ae7000c4ee60eb4258c38f7.1680954589.git.pengdonglin@sangfor.com.cn
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Ensure there is no path where we might attempt to save SME state after we
flush a task, by updating the SVCR register state as well as our in-memory
state. I haven't seen a specific case where this is happening, or a path
where it might happen, but for the cost of a single low-overhead
instruction it seems sensible to close the potential gap.
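A minimal sketch of the idea, assuming the existing sme_smstop() helper is
the single instruction referred to above; the exact placement inside
fpsimd_flush_thread() is an approximation:

/* In fpsimd_flush_thread(), when SME is supported */
if (system_supports_sme()) {
        current->thread.svcr = 0;       /* in-memory state */
        sme_smstop();                   /* hardware state: leave streaming mode */
}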
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230607-arm64-flush-svcr-v2-1-827306001841@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
* kvm-arm64/ampere1-hafdbs-mitigation:
: AmpereOne erratum AC03_CPU_38 mitigation
:
: AmpereOne does not advertise support for FEAT_HAFDBS due to an
: underlying erratum in the feature. The associated control bits do not
: have RES0 behavior as required by the architecture.
:
: Introduce mitigations to prevent KVM from enabling the feature at
: stage-2 as well as preventing KVM guests from enabling HAFDBS at
: stage-1.
KVM: arm64: Prevent guests from enabling HA/HD on Ampere1
KVM: arm64: Refactor HFGxTR configuration into separate helpers
arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
AmpereOne has an erratum in its implementation of FEAT_HAFDBS that
required disabling the feature on the design. This was done by reporting
the feature as not implemented in the ID register, although the
corresponding control bits were not actually RES0. This does not align
well with the requirements of the architecture, which mandates these
bits be RES0 if HAFDBS isn't implemented.
The kernel's use of stage-1 is unaffected, as the HA and HD bits are
only set if HAFDBS is detected in the ID register. KVM, on the other
hand, relies on the RES0 behavior at stage-2 to use the same value for
VTCR_EL2 on any CPU in the system. Mitigate the non-RES0 behavior by
leaving VTCR_EL2.HA clear on affected systems.
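A sketch of the stage-2 mitigation, wrapped in a hypothetical helper for
illustration and assuming the workaround is keyed off a cpucap named along
the lines of ARM64_WORKAROUND_AMPERE_AC03_CPU_38 (name taken from the
series, details approximate):

static u64 mitigate_ampere1_hafdbs(u64 vtcr)
{
        /* Only enable stage-2 HW Access flag management on unaffected parts */
        if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
                vtcr |= VTCR_EL2_HA;

        return vtcr;
}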
Cc: stable@vger.kernel.org
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Acked-by: D Scott Phillips <scott@os.amperecomputing.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609220104.1836988-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Avoid trapping CTR_EL0 on systems with FEAT_EVT, as the register is
: commonly read by userspace
:
: - Make use of FEAT_BTI at hyp stage-1, setting the Guard Page bit to 1
: for executable mappings
:
: - Use a separate set of pointer authentication keys for the hypervisor
: when running in protected mode (i.e. pKVM)
:
: - Plug a few holes in timer initialization where KVM fails to free the
: timer IRQ(s)
KVM: arm64: Use different pointer authentication keys for pKVM
KVM: arm64: timers: Fix resource leaks in kvm_timer_hyp_init()
KVM: arm64: Use BTI for nvhe
KVM: arm64: Relax trapping of CTR_EL0 when FEAT_EVT is available
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/configurable-id-regs:
: Configurable ID register infrastructure, courtesy of Jing Zhang
:
: Create generalized infrastructure for allowing userspace to select the
: supported feature set for a VM, so long as the feature set is a subset
: of what hardware + KVM allows. This does not add any new features that
: are user-configurable, and instead focuses on the necessary refactoring
: to enable future work.
:
: As a consequence of the series, feature asymmetry is now deliberately
: disallowed for KVM. It is unlikely that VMMs ever configured VMs with
: asymmetry, nor does it align with the kernel's overall stance that
: features must be uniform across all cores in the system.
:
: Furthermore, KVM incorrectly advertised an IMP_DEF PMU to guests for
: some time. Migrations from affected kernels were supported by explicitly
: allowing such an ID register value from userspace, and forwarding that
: along to the guest. KVM now allows an IMP_DEF PMU version to be restored
: through the ID register interface, but reinterprets the user value as
: not implemented (0).
KVM: arm64: Rip out the vestiges of the 'old' ID register scheme
KVM: arm64: Handle ID register reads using the VM-wide values
KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1
KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1
KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes
KVM: arm64: Save ID registers' sanitized value per guest
KVM: arm64: Reuse fields of sys_reg_desc for idreg
KVM: arm64: Rewrite IMPDEF PMU version as NI
KVM: arm64: Make vCPU feature flags consistent VM-wide
KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF
KVM: arm64: Separate out feature sanitisation and initialisation
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* for-next/module-alloc:
: Drag in module VA rework to handle conflicts w/ sw feature refactor
arm64: module: rework module VA range selection
arm64: module: mandate MODULE_PLTS
arm64: module: move module randomization to module.c
arm64: kaslr: split kaslr/module initialization
arm64: kasan: remove !KASAN_VMALLOC remnants
arm64: module: remove old !KASAN_VMALLOC logic
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Rather than reinventing the wheel in KVM to do ID register sanitisation
we can rely on the work already done in the core kernel. Implement a
generalized sanitisation of ID registers based on the combination of the
arm64_ftr_bits definitions from the core kernel and (optionally) a set
of KVM-specific overrides.
This all amounts to absolutely nothing for now, but will be used in
subsequent changes to realize user-configurable ID registers.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-8-oliver.upton@linux.dev
[Oliver: split off from monster patch, rewrote commit description,
reworked RAZ handling, return EINVAL to userspace]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
On CPUs where E2H is RES1, we very quickly set the scene for
running EL2 with a VHE configuration, as we do not have any other
choice.
However, CPUs that conform to the current writing of the architecture
start with E2H=0, and only later upgrade with E2H=1. This is all
good, but nothing there is actually reconfiguring EL2 to be able
to correctly run the kernel at EL1. Huhuh...
The "obvious" solution is not to just reinitialise the timer
controls like we do, but to really initialise *everything*
unconditionally.
This requires a bit of surgery, and is a good opportunity to
remove the macro that messes with SPSR_EL2 in init_el2_state.
With that, hVHE now works correctly on my trusted A55 machine!
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614155129.2697388-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add the arm64_sw.hvhe=1 option to force the use of the hVHE mode
in the hypervisor code only.
This enables the hVHE mode of operation when using KVM on VHE
hardware.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
If the OVERRIDE_HVHE SW override is set (as a precursor of
the KVM_HVHE capability), do not enable VHE for the kernel
and drop to EL1 as if VHE was either disabled or unavailable.
Further changes will enable VHE at EL2 only, with the kernel
still running at EL1.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Expose a capability keying the hVHE feature as well as a new
predicate testing it. Nothing is so far using it, and nothing
is enabling it yet.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Disabling KASLR from the command line is implemented as a feature
override. Repaint it slightly so that it can further be used as
more generic infrastructure for SW override purposes.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The Arm documentation has moved to Documentation/arch/arm; update
references under arch/arm64 to match.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
With the recent feature added to enable perf events to use pseudo NMIs as
interrupts on platforms which support GICv3 or later, it is now possible
to enable the hard lockup detector (or NMI watchdog) on arm64 platforms.
So enable the corresponding support.
One thing to note here is that normally the lockup detector is initialized
just after the early initcalls, but the PMU on arm64 comes up much later as
a device_initcall(). To cope with that, override
arch_perf_nmi_is_available() to let the watchdog framework know that the
PMU is not ready, and have the framework re-initialize lockup detection
once the PMU has been initialized.
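The override is conceptually a one-liner; a sketch, assuming an arm_pmu
helper is available to report whether the PMU interrupt is a (pseudo) NMI:

/* arch/arm64/kernel/watchdog_hld.c (sketch) */
bool __init arch_perf_nmi_is_available(void)
{
        /*
         * If the PMU interrupt is not an NMI, the hard lockup detector
         * cannot work reliably, so report the NMI as unavailable.
         */
        return arm_pmu_irq_is_nmi();
}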
[dianders@chromium.org: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled]
Link: https://lkml.kernel.org/r/20230523073952.1.I60217a63acc35621e13f10be16c0cd7c363caf8c@changeid
Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Set the safe maximum CPU frequency to 5 GHz in case a particular platform
doesn't implement a cpufreq driver. Although the architecture doesn't put
any restrictions on the maximum frequency, 5 GHz seems to be a safe maximum
given the available Arm CPUs on the market, which are clocked well below 5
GHz.
On the other hand, we can't make it much higher, as that would lead to a
large hard-lockup detection timeout on parts which run slower (e.g. 1 GHz
on Developerbox) and don't have a cpufreq driver.
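A sketch of how the 5 GHz fallback is used when computing the watchdog
sample period, based on the description above (helper names assumed):

#include <linux/cpufreq.h>

#define SAFE_MAX_CPU_FREQ       5000000000UL    /* 5 GHz */

u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
        unsigned int cpu = smp_processor_id();
        unsigned long max_cpu_freq;

        /* cpufreq reports kHz; fall back to 5 GHz without a driver */
        max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
        if (!max_cpu_freq)
                max_cpu_freq = SAFE_MAX_CPU_FREQ;

        return (u64)max_cpu_freq * watchdog_thresh;
}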
Link: https://lkml.kernel.org/r/20230519101840.v5.17.Ia9d02578e89c3f44d3cb12eec8b0176603c8ab2f@changeid
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The only invocations of get_user_pages_remote() which used the vmas
parameter were for a single page, in which case we can instead simply look
up the VMA directly. In particular:-
- __update_ref_ctr() looked up the VMA but did nothing with it, so we
simply remove it.
- __access_remote_vm() was already using vma_lookup() when the original
lookup failed, so doing the lookup directly also de-duplicates the code.
We are able to perform these VMA operations as we already hold the
mmap_lock in order to be able to call get_user_pages_remote().
As part of this work we add get_user_page_vma_remote() which abstracts the
VMA lookup, error handling and decrementing the page reference count should
the VMA lookup fail.
This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.
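A rough usage sketch of the new helper, assuming mm, addr and gup_flags are
in scope and that the helper returns an ERR_PTR (or NULL) when the lookup
or GUP fails:

struct vm_area_struct *vma;
struct page *page;

mmap_read_lock(mm);
page = get_user_page_vma_remote(mm, addr, gup_flags, &vma);
if (IS_ERR_OR_NULL(page)) {
        mmap_read_unlock(mm);
        return page ? PTR_ERR(page) : -EFAULT;
}
/* ... use page and vma while still holding mmap_lock ... */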
[akpm@linux-foundation.org: avoid passing NULL to PTR_ERR]
Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64)
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390)
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
gcc-13 warns about function definitions for builtin interfaces that have a
different prototype, e.g.:
In file included from kasan_test.c:31:
kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
574 | void __asan_register_globals(struct kasan_global *globals, size_t size);
kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
577 | void __asan_alloca_poison(unsigned long addr, size_t size);
kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
580 | void __asan_load1(unsigned long addr);
kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
581 | void __asan_store1(unsigned long addr);
kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch]
643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);
The two problems are:
- Addresses are passed as 'unsigned long' in the kernel, but gcc-13
expects a 'void *'.
- Sizes are meant to use a signed ssize_t rather than size_t.
Change all the prototypes to match these. Using 'void *' consistently for
addresses gets rid of a couple of type casts, so push that down to the
leaf functions where possible.
This now passes all randconfig builds on arm, arm64 and x86, but I have
not tested it on the other architectures that support kasan, since they
tend to fail randconfig builds in other ways. This might fail if any of
the 32-bit architectures expect a 'long' instead of 'int' for the size
argument.
The __asan_allocas_unpoison() function prototype is somewhat weird, since
it uses a pointer for 'stack_top' and a size_t for 'stack_bottom'. This
looks like it is meant to be 'addr' and 'size' like the others, but the
implementation clearly treats them as 'top' and 'bottom'.
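For illustration, the adjusted prototypes end up along these lines (a
sketch of a few of them; the full patch covers all the kasan/hwasan hooks):

/* addresses as 'void *', sizes as signed 'ssize_t' */
void __asan_register_globals(void *globals, ssize_t size);
void __asan_alloca_poison(void *addr, ssize_t size);
void __asan_load1(void *addr);
void __asan_store1(void *addr);
void __hwasan_tag_memory(void *addr, u8 tag, ssize_t size);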
Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The following code:
static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
                           const syscall_fn_t syscall_table[])
{
        ...
        if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
                local_daif_mask();
                flags = read_thread_flags(); -------------------------------- A
                if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
                        return;
                local_daif_restore(DAIF_PROCCTX);
        }
trace_exit:
        syscall_trace_exit(regs); ------- B
}
1. The flags tested in the if statement should be obtained in the same
way as the flags used by syscall_trace_exit(): DAIF is not masked in
syscall_trace_exit(), so there is no need to mask DAIF when reading the
flags at line A either. Any modification of the flags after line A does
not matter.
2. Masking DAIF caused syscall performance to deteriorate by about 10%.
The Unixbench single core syscall performance data is as follows:
Machine: Kunpeng 920
Mask DAIF:
System Call Overhead 1172314.1 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
System Call Overhead 15000.0 1172314.1 781.5
========
System Benchmarks Index Score (Partial Only) 781.5
Unmask DAIF:
System Call Overhead 1287944.6 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
System Call Overhead 15000.0 1287944.6 858.6
========
System Benchmarks Index Score (Partial Only) 858.6
Rationale from Mark Rutland as for why this is safe:
This masking is an artifact of the old "ret_fast_syscall" assembly that
was converted to C in commit:
f37099b6992a0b81 ("arm64: convert syscall trace logic to C")
The assembly would mask DAIF, check the thread flags, and fall through
to kernel_exit without unmasking if no tracing was needed. The
conversion copied this masking into the C version, though this wasn't
strictly necessary.
As (in general) thread flags can be manipulated by other threads, it's
not safe to manipulate the thread flags with plain reads and writes, and
since commit:
342b3808786518ce ("arm64: Snapshot thread flags")
... we use read_thread_flags() to read the flags atomically.
With this, there is no need to mask DAIF transiently around reading the
flags, as we only decide whether to trace while DAIF is masked, and the
actual tracing occurs with DAIF unmasked. When el0_svc_common() returns
its caller will unconditionally mask DAIF via exit_to_user_mode(), so
the masking is redundant.
Signed-off-by: Guo Hui <guohui@uniontech.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230526024715.8773-1-guohui@uniontech.com
[catalin.marinas@arm.com: updated comment with Mark's rationale]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We only use cpus_set_cap() in update_cpu_capabilities(), where we
open-code an analogous update to boot_cpucaps.
Due to the way the cpucap_ptrs[] array is initialized, we know that the
capability number cannot be greater than or equal to ARM64_NCAPS, so the
warning is superfluous.
Fold cpus_set_cap() into update_cpu_capabilities(), matching what we do
for the boot_cpucaps, and making the relationship between the two a bit
clearer.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To more clearly align the various users of the cpucap enumeration, this patch
changes the cpufeature code to use the term `cpucap` in favour of `cpu_hwcap`.
This more clearly aligns with other users of the cpucaps, and avoids confusion
with the ELF hwcaps.
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To more clearly align the various users of the cpucap enumeration, this patch
changes the alternative code to use the term `cpucap` in favour of `feature`.
The alternative_has_feature_{likely,unlikely}() functions are renamed to
alternative_has_cap_{likely,unlikely}() to more clearly align with the
cpus_have_{const_,}cap() helpers.
At the same time remove the stale comment referring to the "ARM64_CB
bit", which is evidently a typo for ARM64_CB_PATCH, which was removed in
commit:
4c0bd995d73ed889 ("arm64: alternatives: have callbacks take a cap")
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The 'cpu_hwcaps' and 'boot_capabilities' bitmaps cover the same
enumerated bits, but are named wildly differently for no good
reason. The terms 'hwcaps' and 'capabilities' have become ambiguous over
time (e.g. due to clashes with ELF hwcaps and the structures used to
manage feature detection), and it would be nicer to use 'cpucaps',
matching the <asm/cpucaps.h> header the enumerated bit indices are
defined in.
While this isn't a functional problem, it makes the code harder than
necessary to understand, and hard to extend with related functionality
(e.g. per-cpu cpucap bitmaps).
To that end, this patch renames `boot_capabilities` to `boot_cpucaps`
and `cpu_hwcaps` to `system_cpucaps`. This more clearly indicates the
relationship between the two and aligns with terminology used elsewhere
in our feature management code.
This change was scripted with:
| find . -type f -name '*.[chS]' -print0 | \
| xargs -0 sed -i 's/\<boot_capabilities\>/boot_cpucaps/'
| find . -type f -name '*.[chS]' -print0 | \
| xargs -0 sed -i 's/\<cpu_hwcaps\>/system_cpucaps/'
... and the instance of "cpu_hwcap" (without a trailing "s") in
<asm/mmu_context.h> corrected manually to "system_cpucaps".
Subsequent patches will adjust the naming of related functions to better
align with the `cpucap` naming.
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, the modules region is 128M in size, which is a problem for
some large modules. Shanker reports [1] that the NVIDIA GPU driver alone
can consume 110M of module space in some configurations. We'd like to
make the modules region a full 2G such that we can always make use of a
2G range.
It's possible to build kernel images which are larger than 128M in some
configurations, such as when many debug options are selected and many
drivers are built in. In these configurations, we can't legitimately
select a base for a 128M module region, though we currently select a
value for which allocation will fail. It would be nicer to have a
diagnostic message in this case.
Similarly, in theory it's possible to build a kernel image which is
larger than 2G and which cannot support modules. While this isn't likely
to be the case for any realistic kernel deployed in the field, it would
be nice if we could print a diagnostic in this case.
This patch reworks the module VA range selection to use a 2G range, and
improves handling of cases where we cannot select legitimate module
regions. We now attempt to select a 128M region and a 2G region:
* The 128M region is selected such that modules can use direct branches
(with JUMP26/CALL26 relocations) to branch to kernel code and other
modules, and so that modules can reference data and text (using PREL32
relocations) anywhere in the kernel image and other modules.
This region covers the entire kernel image (rather than just the text)
to ensure that all PREL32 relocations are in range even when the
kernel data section is absurdly large. Where we cannot allocate from
this region, we'll fall back to the full 2G region.
* The 2G region is selected such that modules can use direct branches
with PLTs to branch to kernel code and other modules, and so that
modules can reference data and text (with PREL32 relocations) in
the kernel image and other modules.
This region covers the entire kernel image, and the 128M region (if
one is selected).
The two module regions are randomized independently while ensuring the
constraints described above.
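The resulting allocation strategy can be sketched as below, simplified from
the description above; the module_direct_base/module_plt_base variable
names follow the series and should be treated as approximate:

void *module_alloc(unsigned long size)
{
        void *p = NULL;

        /* Prefer the 128M region so direct branches need no PLTs */
        if (module_direct_base)
                p = __vmalloc_node_range(size, MODULE_ALIGN,
                                         module_direct_base,
                                         module_direct_base + SZ_128M,
                                         GFP_KERNEL | __GFP_NOWARN,
                                         PAGE_KERNEL, 0, NUMA_NO_NODE,
                                         __builtin_return_address(0));

        /* Fall back to the full 2G region, reachable via PLTs */
        if (!p && module_plt_base)
                p = __vmalloc_node_range(size, MODULE_ALIGN,
                                         module_plt_base,
                                         module_plt_base + SZ_2G,
                                         GFP_KERNEL | __GFP_NOWARN,
                                         PAGE_KERNEL, 0, NUMA_NO_NODE,
                                         __builtin_return_address(0));

        return p;
}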
[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Contemporary kernels and modules can be relatively large, especially
when common debug options are enabled. Using GCC 12.1.0, a v6.3-rc7
defconfig kernel is ~38M, and with PROVE_LOCKING + KASAN_INLINE enabled
this expands to ~117M. Shanker reports [1] that the NVIDIA GPU driver
alone can consume 110M of module space in some configurations.
Both KASLR and ARM64_ERRATUM_843419 select MODULE_PLTS, so anyone
wanting a kernel to have KASLR or run on Cortex-A53 will have
MODULE_PLTS selected. This is the case in defconfig and distribution
kernels (e.g. Debian, Android, etc).
Practically speaking, this means we're very likely to need MODULE_PLTS,
and while it's almost guaranteed that MODULE_PLTS will be selected, it
is possible to disable support, and we have to maintain some awkward
special cases for such unusual configurations.
This patch removes the MODULE_PLTS config option, with the support code
always enabled if MODULES is selected. This results in a slight
simplification, and will allow for further improvement in subsequent
patches.
For any config which currently selects MODULE_PLTS, there will be no
functional change as a result of this patch.
[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When CONFIG_RANDOMIZE_BASE=y, module_alloc_base is a variable which is
configured by kaslr_module_init() in kaslr.c, and otherwise it is an
expression defined in module.h.
As kaslr_module_init() is no longer tightly coupled with the KASLR
initialization code, we can centralize this in module.c.
This patch moves kaslr_module_init() to module.c, making
module_alloc_base a static variable, and removing redundant includes from
kaslr.c. For the definition of struct arm64_ftr_override we must include
<asm/cpufeature.h>, which was previously included transitively via
another header.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently kaslr_init() handles a mixture of detecting/announcing whether
KASLR is enabled, and randomizing the module region depending on whether
KASLR is enabled.
To make it easier to rework the module region initialization, split the
KASLR initialization into two steps:
* kaslr_init() determines whether KASLR should be enabled, and announces
this choice, recording this to a new global boolean variable. This is
called from setup_arch() just before the existing call to
kaslr_requires_kpti() so that this will always provide the expected
result.
* kaslr_module_init() randomizes the module region when required. This
is called as a subsys_initcall, where we previously called
kaslr_init().
As a bonus, moving the KASLR reporting earlier makes it easier to spot
and permits it to be logged via earlycon, making it easier to debug any
issues that could be triggered by KASLR.
Booting a v6.4-rc1 kernel with this patch applied, the log looks like:
| EFI stub: Booting Linux Kernel...
| EFI stub: Generating empty DTB
| EFI stub: Exiting boot services...
| [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
| [ 0.000000] Linux version 6.4.0-rc1-00006-g4763a8f8aeb3 (mark@lakrids) (aarch64-linux-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #2 SMP PREEMPT Tue May 9 11:03:37 BST 2023
| [ 0.000000] KASLR enabled
| [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
| [ 0.000000] printk: bootconsole [pl11] enabled
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Historically, KASAN could be selected with or without KASAN_VMALLOC, and
we had to be very careful where to place modules when KASAN_VMALLOC was
not selected.
However, since commit:
f6f37d9320a11e90 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")
Selecting CONFIG_KASAN on arm64 will also select CONFIG_KASAN_VMALLOC,
and so the logic for handling CONFIG_KASAN without CONFIG_KASAN_VMALLOC
is redundant and can be removed.
Note: the "kasan.vmalloc={on,off}" option, which only exists for HW_TAGS,
changes whether the vmalloc region is given non-match-all tags, and does
not affect the page table manipulation code.
The VM_DEFER_KMEMLEAK flag was only necessary for !CONFIG_KASAN_VMALLOC
as described in its introduction in commit:
60115fa54ad7b913 ("mm: defer kmemleak object creation of module_alloc()")
... and therefore it can also be removed.
Remove the redundant logic for !CONFIG_KASAN_VMALLOC. At the same time,
add the missing braces around the multi-line conditional block in
arch/arm64/kernel/module.c.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently arm64 doesn't use CONFIG_GENERIC_ENTRY and doesn't call
lockdep_sys_exit() when returning to userspace.
This means that lockdep won't check for held locks when
returning to userspace, which would be useful to detect kernel bugs.
Call lockdep_sys_exit() when returning to userspace,
enabling checking for held locks.
At the same time, rename arm64's prepare_exit_to_user_mode() to
exit_to_user_mode_prepare() to more clearly align with the naming
in the generic entry code.
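A sketch of where the call lands, based on the description above; the
surrounding body of the function is abbreviated and approximate:

/* entry-common.c, on the return-to-userspace path (sketch) */
static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
{
        unsigned long flags;

        local_daif_mask();

        flags = read_thread_flags();
        if (unlikely(flags & _TIF_WORK_MASK))
                do_notify_resume(regs, flags);

        /* Let lockdep check for locks still held on exit to userspace */
        lockdep_sys_exit();
}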
Signed-off-by: Eric Chan <ericchancf@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531090909.357047-1-ericchancf@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Copy the EL1 registers: TCR2_EL1, PIR_EL1, PIRE0_EL1, such that PIE
is also enabled for EL2.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-18-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
With PIE enabled, the swapper PTEs would have a Permission Indirection Index
(PIIndex) of 0. A PIIndex of 0 is not currently used by any other PTEs.
To avoid using index 0 specifically for the swapper PTEs, mark them as
PTE_UXN and PTE_WRITE, so that they map to a PAGE_KERNEL_EXEC equivalent.
This also adds PTE_WRITE to KPTI_NG_PTE_FLAGS, which was tested by booting
with kpti=on.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-12-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This indicates if the system supports PIE. This is a CPUCAP_BOOT_CPU_FEATURE
as the boot CPU will enable PIE if it has it, so secondary CPUs must also
have this feature.
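The cpucap entry would look roughly like the following (a sketch using the
conventional cpufeature table form; field and constant names are
approximations):

{
        .desc = "Stage-1 Permission Indirection Extension (S1PIE)",
        .capability = ARM64_HAS_S1PIE,
        .type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
        .matches = has_cpuid_feature,
        .sys_reg = SYS_ID_AA64MMFR3_EL1,
        .sign = FTR_UNSIGNED,
        .field_pos = ID_AA64MMFR3_EL1_S1PIE_SHIFT,
        .field_width = 4,
        .min_field_value = ID_AA64MMFR3_EL1_S1PIE_IMP,
},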
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230606145859.697944-8-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This capability indicates if the system supports the TCR2_ELx system register.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230606145859.697944-7-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add new system register ID_AA64MMFR3 to the cpufeature infrastructure.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-6-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Make it possible to disable the MOPS extension at runtime using the
kernel command line. This can be useful for testing or working around
hardware issues. For example it could be used to test new memory copy
routines that do not use MOPS instructions (e.g. from Arm Optimized
Routines).
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-11-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The Arm v8.8/9.3 FEAT_MOPS feature provides new instructions that
perform a memory copy or set. Wire up the cpufeature code to detect the
presence of FEAT_MOPS and enable it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-10-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When a MOPS main or epilogue instruction is being executed, the task may
get scheduled on a different CPU and restart execution from the prologue
instruction. If the main or epilogue instruction is being single stepped
then it makes sense to finish the step and take the step exception
before starting to execute the next (prologue) instruction. So
fast-forward the single step state machine when taking a MOPS exception.
This means that if a main or epilogue instruction is single stepped with
ptrace, the debugger will sometimes observe the PC moving back to the
prologue instruction. (As already mentioned, this should be rare as it
only happens when the task is scheduled to another CPU during the step.)
This also ensures that perf breakpoints count prologue instructions
consistently (i.e. every time they are executed), rather than skipping
them when there also happens to be a breakpoint on a main or epilogue
instruction.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-9-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The memory copy/set instructions added as part of FEAT_MOPS can take an
exception (e.g. page fault) part-way through their execution and resume
execution afterwards.
If however the task is re-scheduled and execution resumes on a different
CPU, then the CPU may take a new type of exception to indicate this.
This is because the architecture allows two options (Option A and Option
B) to implement the instructions and a heterogeneous system can have
different implementations between CPUs.
In this case the OS has to reset the registers and restart execution
from the prologue instruction. The algorithm for doing this is provided
as part of the Arm ARM.
Add an exception handler for the new exception and wire it up for
userspace tasks.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-8-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Detect if the system has the new HCRX_EL2 register added in ARMv8.7/9.2,
so that subsequent patches can check for its presence.
KVM currently relies on the register being present on all CPUs (or
none), so the kernel will panic if that is not the case. Fortunately no
such systems currently exist, but this can be revisited if they appear.
Note that the kernel will not panic if CONFIG_KVM is disabled.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-3-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|