Pull arm64 documentation move from Jonathan Corbet:
"Move the arm64 architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm64-move' of git://git.lwn.net/linux:
perf arm-spe: Fix a dangling Documentation/arm64 reference
mm: Fix a dangling Documentation/arm64 reference
arm64: Fix dangling references to Documentation/arm64
dt-bindings: fix dangling Documentation/arm64 reference
docs: arm64: Move arm64 documentation under Documentation/arch/
|
|
Pull arm documentation move from Jonathan Corbet:
"Move the Arm architecture documentation under Documentation/arch/.
This brings some order to the documentation directory, declutters the
top-level directory, and makes the documentation organization more
closely match that of the source"
* tag 'docs-arm-move' of git://git.lwn.net/linux:
dt-bindings: Update Documentation/arm references
docs: update some straggling Documentation/arm references
crypto: update some Arm documentation references
mips: update a reference to a moved Arm Document
arm64: Update Documentation/arm references
arm: update in-source documentation references
arm: docs: Move Arm documentation to Documentation/arch/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Notable features are user-space support for the memcpy/memset
instructions and the permission indirection extension.
- Support for the Armv8.9 Permission Indirection Extensions. While
this feature doesn't add new functionality, it enables future
support for Guarded Control Stacks (GCS) and Permission Overlays
- User-space support for the Armv8.8 memcpy/memset instructions
- arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
cleanups
- Removal of superfluous ISBs on context switch (following
retrospective architecture tightening)
- Decode the ISS2 register during faults for additional information
to help with debugging
- KPTI clean-up/simplification of the trampoline exit code
- Addressing several -Wmissing-prototype warnings
- Kselftest improvements for signal handling and ptrace
- Fix TPIDR2_EL0 restoring on sigreturn
- Clean-up, robustness improvements of the module allocation code
- More sysreg conversions to the automatic register/bitfields
generation
- CPU capabilities handling cleanup
- Arm documentation updates: ACPI, ptdump"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
Documentation/arm64: Add ptdump documentation
arm64: hibernate: remove WARN_ON in save_processor_state
kselftest/arm64: Log signal code and address for unexpected signals
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
arm64/fpsimd: Exit streaming mode when flushing tasks
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
...
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP updates from Thomas Gleixner:
"A large update for SMP management:
- Parallel CPU bringup
The reason why people are interested in parallel bringup is to
shorten the (kexec) reboot time of cloud servers to reduce the
downtime of the VM tenants.
The current fully serialized bringup does the following per AP:
1) Prepare callbacks (allocate, initialize, create threads)
2) Kick the AP alive (e.g. INIT/SIPI on x86)
3) Wait for the AP to report alive state
4) Let the AP continue through the atomic bringup
5) Let the AP run the threaded bringup to full online state
There are two significant delays:
#3 The time for an AP to report alive state in start_secondary()
on x86 has been measured in the range between 350us and 3.5ms
depending on vendor and CPU type, BIOS microcode size etc.
#4 The atomic bringup does the microcode update. This has been
measured to take up to ~8ms on the primary threads depending
on the microcode patch size to apply.
On a two-socket SKL server with 56 cores (112 threads), the boot CPU
spends about 800ms on current mainline busy waiting for the APs to
come up and apply microcode. That's more than 80% of the actual
onlining procedure.
This can be reduced significantly by splitting the bringup
mechanism into two parts:
1) Run the prepare callbacks and kick the AP alive for each AP
which needs to be brought up.
The APs wake up, do their firmware initialization and run the
low level kernel startup code including microcode loading in
parallel up to the first synchronization point. (#1 and #2
above)
2) Run the rest of the bringup code strictly serialized per CPU
(#3 - #5 above) as it's done today.
Parallelizing that stage of the CPU bringup might be possible
in theory, but it's questionable whether the required surgery
would be justified for a pretty small gain.
If the system is large enough, the first AP is already waiting at
the first synchronization point by the time the boot CPU has finished
waking up the last AP. That reduces the AP bringup time on that
SKL from ~800ms to ~80ms, i.e. by a factor of ~10x.
The actual gain varies wildly depending on the system, CPU,
microcode patch size and other factors. There are some
opportunities to reduce the overhead further, but that needs some
deep surgery in the x86 CPU bringup code.
For now this is only enabled on x86, but the core functionality
obviously works for all SMP capable architectures.
- Enhancements for SMP function call tracing so it is possible to
locate the scheduling and the actual execution points. That makes it
possible to measure IPI delivery time precisely"
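As a rough illustration of the split bringup described in the pull message
above, the bringup loop conceptually becomes something like the following
sketch. This is not the actual kernel implementation: cpu_up_to() is a
hypothetical helper standing in for the real cpuhp state machine calls, and
the CPUHP_BP_KICK_AP state name is taken approximately from the commit list
below.

static void bringup_nonboot_cpus_parallel(void)
{
        unsigned int cpu;

        /*
         * Phase 1: kick every AP alive; the APs run firmware init,
         * low-level kernel startup and microcode loading in parallel,
         * up to the first synchronization point.
         */
        for_each_present_cpu(cpu) {
                if (cpu != smp_processor_id())
                        cpu_up_to(cpu, CPUHP_BP_KICK_AP);       /* hypothetical helper */
        }

        /*
         * Phase 2: finish each AP's bringup strictly serialized per
         * CPU, as is done today.
         */
        for_each_present_cpu(cpu) {
                if (cpu != smp_processor_id())
                        cpu_up_to(cpu, CPUHP_ONLINE);           /* hypothetical helper */
        }
}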
* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
trace,smp: Add tracepoints for scheduling remotelly called functions
trace,smp: Add tracepoints around remotelly called functions
MAINTAINERS: Add CPU HOTPLUG entry
x86/smpboot: Fix the parallel bringup decision
x86/realmode: Make stack lock work in trampoline_compat()
x86/smp: Initialize cpu_primary_thread_mask late
cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
x86/smpboot: Support parallel startup of secondary CPUs
x86/smpboot: Implement a bit spinlock to protect the realmode stack
x86/apic: Save the APIC virtual base address
cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
x86/apic: Provide cpu_primary_thread mask
x86/smpboot: Enable split CPU startup
cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
cpu/hotplug: Reset task stack state in _cpu_up()
cpu/hotplug: Remove unused state functions
riscv: Switch to hotplug core state synchronization
parisc: Switch to hotplug core state synchronization
...
|
|
With the advent of scope-based resource management it becomes really
tedious to abide by the constraints of -Wdeclaration-after-statement.
It will still be recommended to place declarations at the start of a
scope where possible, but it will no longer be enforced.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/CAHk-%3Dwi-RyoUhbChiVaJZoZXheAwnJ7OO%3DGxe85BkPAd93TwDA%40mail.gmail.com
|
|
* for-next/feat_s1pie:
: Support for the Armv8.9 Permission Indirection Extensions (stage 1 only)
KVM: selftests: get-reg-list: add Permission Indirection registers
KVM: selftests: get-reg-list: support ID register features
arm64: Document boot requirements for PIE
arm64: transfer permission indirection settings to EL2
arm64: enable Permission Indirection Extension (PIE)
arm64: add encodings of PIRx_ELx registers
arm64: disable EL2 traps for PIE
arm64: reorganise PAGE_/PROT_ macros
arm64: add PTE_WRITE to PROT_SECT_NORMAL
arm64: add PTE_UXN/PTE_WRITE to SWAPPER_*_FLAGS
KVM: arm64: expose ID_AA64MMFR3_EL1 to guests
KVM: arm64: Save/restore PIE registers
KVM: arm64: Save/restore TCR2_EL1
arm64: cpufeature: add Permission Indirection Extension cpucap
arm64: cpufeature: add TCR2 cpucap
arm64: cpufeature: add system register ID_AA64MMFR3
arm64/sysreg: add PIR*_ELx registers
arm64/sysreg: update HCRX_EL2 register
arm64/sysreg: add system registers TCR2_ELx
arm64/sysreg: Add ID register ID_AA64MMFR3
|
|
'for-next/iss2-decode', 'for-next/kselftest', 'for-next/misc', 'for-next/feat_mops', 'for-next/module-alloc', 'for-next/sysreg', 'for-next/cpucap', 'for-next/acpi', 'for-next/kdump', 'for-next/acpi-doc', 'for-next/doc' and 'for-next/tpidr2-fix', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
dt-bindings: perf: fsl-imx-ddr: Add i.MX93 compatible
drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver
perf/arm_cspmu: Decouple APMT dependency
perf/arm_cspmu: Clean up ACPI dependency
ACPI/APMT: Don't register invalid resource
perf/arm_cspmu: Fix event attribute type
perf: arm_cspmu: Set irq affinitiy only if overflow interrupt is used
drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
perf/arm-cmn: Fix DTC reset
perf: qcom_l2_pmu: Make l2_cache_pmu_probe_cluster() more robust
perf/arm-cci: Slightly optimize cci_pmu_sync_counters()
* for-next/kpti:
: Simplify KPTI trampoline exit code
arm64: entry: Simplify tramp_alias macro and tramp_exit routine
arm64: entry: Preserve/restore X29 even for compat tasks
* for-next/missing-proto-warn:
: Address -Wmissing-prototype warnings
arm64: add alt_cb_patch_nops prototype
arm64: move early_brk64 prototype to header
arm64: signal: include asm/exception.h
arm64: kaslr: add kaslr_early_init() declaration
arm64: flush: include linux/libnvdimm.h
arm64: module-plts: inline linux/moduleloader.h
arm64: hide unused is_valid_bugaddr()
arm64: efi: add efi_handle_corrupted_x18 prototype
arm64: cpuidle: fix #ifdef for acpi functions
arm64: kvm: add prototypes for functions called in asm
arm64: spectre: provide prototypes for internal functions
arm64: move cpu_suspend_set_dbg_restorer() prototype to header
arm64: avoid prototype warnings for syscalls
arm64: add scs_patch_vmlinux prototype
arm64: xor-neon: mark xor_arm64_neon_*() static
* for-next/iss2-decode:
: Add decode of ISS2 to data abort reports
arm64/esr: Add decode of ISS2 to data abort reporting
arm64/esr: Use GENMASK() for the ISS mask
* for-next/kselftest:
: Various arm64 kselftest improvements
kselftest/arm64: Log signal code and address for unexpected signals
kselftest/arm64: Add a smoke test for ptracing hardware break/watch points
* for-next/misc:
: Miscellaneous patches
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
arm64: hibernate: remove WARN_ON in save_processor_state
arm64/fpsimd: Exit streaming mode when flushing tasks
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
arm64: consolidate rox page protection logic
arm64: set __exception_irq_entry with __irq_entry as a default
arm64: syscall: unmask DAIF for tracing status
arm64: lockdep: enable checks for held locks when returning to userspace
arm64/cpucaps: increase string width to properly format cpucaps.h
arm64/cpufeature: Use helper for ECV CNTPOFF cpufeature
* for-next/feat_mops:
: Support for ARMv8.8 memcpy instructions in userspace
kselftest/arm64: add MOPS to hwcap test
arm64: mops: allow disabling MOPS from the kernel command line
arm64: mops: detect and enable FEAT_MOPS
arm64: mops: handle single stepping after MOPS exception
arm64: mops: handle MOPS exceptions
KVM: arm64: hide MOPS from guests
arm64: mops: don't disable host MOPS instructions from EL2
arm64: mops: document boot requirements for MOPS
KVM: arm64: switch HCRX_EL2 between host and guest
arm64: cpufeature: detect FEAT_HCX
KVM: arm64: initialize HCRX_EL2
* for-next/module-alloc:
: Make the arm64 module allocation code more robust (clean-up, VA range expansion)
arm64: module: rework module VA range selection
arm64: module: mandate MODULE_PLTS
arm64: module: move module randomization to module.c
arm64: kaslr: split kaslr/module initialization
arm64: kasan: remove !KASAN_VMALLOC remnants
arm64: module: remove old !KASAN_VMALLOC logic
* for-next/sysreg: (21 commits)
: More sysreg conversions to automatic generation
arm64/sysreg: Convert TRBIDR_EL1 register to automatic generation
arm64/sysreg: Convert TRBTRG_EL1 register to automatic generation
arm64/sysreg: Convert TRBMAR_EL1 register to automatic generation
arm64/sysreg: Convert TRBSR_EL1 register to automatic generation
arm64/sysreg: Convert TRBBASER_EL1 register to automatic generation
arm64/sysreg: Convert TRBPTR_EL1 register to automatic generation
arm64/sysreg: Convert TRBLIMITR_EL1 register to automatic generation
arm64/sysreg: Rename TRBIDR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBTRG_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBMAR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBSR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBBASER_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBPTR_EL1 fields per auto-gen tools format
arm64/sysreg: Rename TRBLIMITR_EL1 fields per auto-gen tools format
arm64/sysreg: Convert OSECCR_EL1 to automatic generation
arm64/sysreg: Convert OSDTRTX_EL1 to automatic generation
arm64/sysreg: Convert OSDTRRX_EL1 to automatic generation
arm64/sysreg: Convert OSLAR_EL1 to automatic generation
arm64/sysreg: Standardise naming of bitfield constants in OSL[AS]R_EL1
arm64/sysreg: Convert MDSCR_EL1 to automatic register generation
...
* for-next/cpucap:
: arm64 cpucap clean-up
arm64: cpufeature: fold cpus_set_cap() into update_cpu_capabilities()
arm64: cpufeature: use cpucap naming
arm64: alternatives: use cpucap naming
arm64: standardise cpucap bitmap names
* for-next/acpi:
: Various arm64-related ACPI patches
ACPI: bus: Consolidate all arm specific initialisation into acpi_arm_init()
* for-next/kdump:
: Simplify the crashkernel reservation behaviour of crashkernel=X,high on arm64
arm64: add kdump.rst into index.rst
Documentation: add kdump.rst to present crashkernel reservation on arm64
arm64: kdump: simplify the reservation behaviour of crashkernel=,high
* for-next/acpi-doc:
: Update ACPI documentation for Arm systems
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
* for-next/doc:
: arm64 documentation updates
Documentation/arm64: Add ptdump documentation
* for-next/tpidr2-fix:
: Fix the TPIDR2_EL0 register restoring on sigreturn
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
|
|
Currently when restoring the TPIDR2 signal context we set the new value
from the signal frame in the thread data structure but not the register,
following the pattern for the rest of the data we are restoring. This does
not work in the case of TPIDR2, as the register always has the value for the
current task. This means that either we return to userspace and ignore the
new value or we context switch and save the register value on top of the
newly restored value.
Load the value from the signal context into the register instead.
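A simplified sketch of the resulting restore path, based on the description
above (field names and error handling are approximations of the actual
patch):

static int restore_tpidr2_context(struct user_ctxs *user)
{
        u64 tpidr2_el0;
        int err = 0;

        __get_user_error(tpidr2_el0, &user->tpidr2->tpidr2, err);
        if (!err)
                /* Write the register itself, not current->thread.tpidr2_el0 */
                write_sysreg_s(tpidr2_el0, SYS_TPIDR2_EL0);

        return err;
}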
Fixes: 39e54499280f ("arm64/signal: Include TPIDR2 in the signal context")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org> # 6.3.x
Link: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v2-1-c8e8fcc10302@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When patching kernel alternatives, we need to be careful not to execute
kernel code which is itself subject to patching. In general, if code is
executed after the instructions in memory have been patched but prior to
the cache maintenance and barriers completing, it could lead to
UNPREDICTABLE results.
As our regular cache maintenance routines are patched with alternatives,
we have a clean_dcache_range_nopatch() function which is *intended* to
avoid patchable code and therefore supposed to be safe in the middle of
patching alternatives. Unfortunately, it's not marked as 'noinstr', and
so can be instrumented with patchable code.
Additionally, it calls read_sanitised_ftr_reg() (which may be
instrumented with patchable code) to find the sanitized value of
CTR_EL0.DminLine, and is therefore not safe to call during patching.
Luckily, since commit:
675b0563d6b26aa9 ("arm64: cpufeature: expose arm64_ftr_reg struct for CTR_EL0")
... we can read the sanitised CTR_EL0 value directly, and avoid the call
to read_sanitised_ftr_reg().
This patch marks clean_dcache_range_nopatch() as noinstr, and has it
read the sanitized CTR_EL0 value directly, avoiding the issues above.
As a bonus, this is also an optimization. As read_sanitised_ftr_reg()
performs a binary search to find the CTR_EL0 value, reading the value
directly avoids this binary search per applied alternative, avoiding
some unnecessary work.
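The resulting function looks roughly like this (a sketch reconstructed from
the description above, not a verbatim copy of the patch):

static void noinstr clean_dcache_range_nopatch(u64 start, u64 end)
{
        u64 cur, d_size, ctr_el0;

        /* Read the sanitised CTR_EL0 value directly; no patchable code */
        ctr_el0 = arm64_ftr_reg_ctrel0.sys_val;
        d_size = 4 << cpuid_feature_extract_unsigned_field(ctr_el0,
                                                CTR_EL0_DminLine_SHIFT);
        cur = start & ~(d_size - 1);
        do {
                /* Use "dc civac" directly, as it is never patched */
                asm volatile("dc civac, %0" : : "r" (cur) : "memory");
        } while (cur += d_size, cur < end);
}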
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230616103150.1238132-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The arm64 documentation has moved under Documentation/arch/; fix up
references in the arm64 subtree to match.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
During hibernation or restoration, freeze_secondary_cpus() checks
num_online_cpus via BUG_ON, and the subsequent save_processor_state()
checks it again with WARN_ON. In the case of CONFIG_PM_SLEEP_SMP=n,
freeze_secondary_cpus() is not defined, but the only way to disable
CONFIG_PM_SLEEP_SMP is !SMP, where num_online_cpus is always 1, so we
don't have to check it in save_processor_state() either.
Remove the unnecessary check from save_processor_state().
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230609075049.2651723-3-songshuaishuai@tinylab.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The previous patch ("function_graph: Support recording and printing
the return value of function") has laid the groundwork for
funcgraph-retval, and this modification makes it available on the
ARM64 platform.
We introduce a new structure called fgraph_ret_regs for the ARM64
platform to hold return registers and the frame pointer. We then
fill its content in the return_to_handler and pass its address to
the function ftrace_return_to_handler to record the return value.
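For reference, the new per-architecture structure looks roughly as follows
(a sketch; the exact layout in <asm/ftrace.h> may differ):

/* Return registers and frame pointer captured in return_to_handler */
struct fgraph_ret_regs {
        /* x0 - x7 */
        unsigned long regs[8];

        unsigned long fp;
        unsigned long __unused;
};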
Link: https://lkml.kernel.org/r/c78366416ce93f704ae7000c4ee60eb4258c38f7.1680954589.git.pengdonglin@sangfor.com.cn
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Donglin Peng <pengdonglin@sangfor.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Ensure there is no path where we might attempt to save SME state after we
flush a task, by updating the SVCR register state as well as our in-memory
state. I haven't seen a specific case where this is happening, or a path
where it might happen, but for the cost of a single low-overhead
instruction it seems sensible to close the potential gap.
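A minimal sketch of the idea, assuming the existing sme_smstop() helper is
the single instruction referred to above; the exact placement inside
fpsimd_flush_thread() is an approximation:

/* In fpsimd_flush_thread(), when SME is supported */
if (system_supports_sme()) {
        current->thread.svcr = 0;       /* in-memory state */
        sme_smstop();                   /* hardware state: leave streaming mode */
}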
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230607-arm64-flush-svcr-v2-1-827306001841@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
* kvm-arm64/ampere1-hafdbs-mitigation:
: AmpereOne erratum AC03_CPU_38 mitigation
:
: AmpereOne does not advertise support for FEAT_HAFDBS due to an
: underlying erratum in the feature. The associated control bits do not
: have RES0 behavior as required by the architecture.
:
: Introduce mitigations to prevent KVM from enabling the feature at
: stage-2 as well as preventing KVM guests from enabling HAFDBS at
: stage-1.
KVM: arm64: Prevent guests from enabling HA/HD on Ampere1
KVM: arm64: Refactor HFGxTR configuration into separate helpers
arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
AmpereOne has an erratum in its implementation of FEAT_HAFDBS that
required disabling the feature on the design. This was done by reporting
the feature as not implemented in the ID register, although the
corresponding control bits were not actually RES0. This does not align
well with the requirements of the architecture, which mandates these
bits be RES0 if HAFDBS isn't implemented.
The kernel's use of stage-1 is unaffected, as the HA and HD bits are
only set if HAFDBS is detected in the ID register. KVM, on the other
hand, relies on the RES0 behavior at stage-2 to use the same value for
VTCR_EL2 on any CPU in the system. Mitigate the non-RES0 behavior by
leaving VTCR_EL2.HA clear on affected systems.
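A sketch of the stage-2 mitigation, wrapped in a hypothetical helper for
illustration and assuming the workaround is keyed off a cpucap named along
the lines of ARM64_WORKAROUND_AMPERE_AC03_CPU_38 (name taken from the
series, details approximate):

static u64 mitigate_ampere1_hafdbs(u64 vtcr)
{
        /* Only enable stage-2 HW Access flag management on unaffected parts */
        if (!cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_38))
                vtcr |= VTCR_EL2_HA;

        return vtcr;
}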
Cc: stable@vger.kernel.org
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Acked-by: D Scott Phillips <scott@os.amperecomputing.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609220104.1836988-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/misc:
: Miscellaneous updates
:
: - Avoid trapping CTR_EL0 on systems with FEAT_EVT, as the register is
: commonly read by userspace
:
: - Make use of FEAT_BTI at hyp stage-1, setting the Guard Page bit to 1
: for executable mappings
:
: - Use a separate set of pointer authentication keys for the hypervisor
: when running in protected mode (i.e. pKVM)
:
: - Plug a few holes in timer initialization where KVM fails to free the
: timer IRQ(s)
KVM: arm64: Use different pointer authentication keys for pKVM
KVM: arm64: timers: Fix resource leaks in kvm_timer_hyp_init()
KVM: arm64: Use BTI for nvhe
KVM: arm64: Relax trapping of CTR_EL0 when FEAT_EVT is available
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* kvm-arm64/configurable-id-regs:
: Configurable ID register infrastructure, courtesy of Jing Zhang
:
: Create generalized infrastructure for allowing userspace to select the
: supported feature set for a VM, so long as the feature set is a subset
: of what hardware + KVM allows. This does not add any new features that
: are user-configurable, and instead focuses on the necessary refactoring
: to enable future work.
:
: As a consequence of the series, feature asymmetry is now deliberately
: disallowed for KVM. It is unlikely that VMMs ever configured VMs with
: asymmetry, nor does it align with the kernel's overall stance that
: features must be uniform across all cores in the system.
:
: Furthermore, KVM incorrectly advertised an IMP_DEF PMU to guests for
: some time. Migrations from affected kernels were supported by explicitly
: allowing such an ID register value from userspace, and forwarding that
: along to the guest. KVM now allows an IMP_DEF PMU version to be restored
: through the ID register interface, but reinterprets the user value as
: not implemented (0).
KVM: arm64: Rip out the vestiges of the 'old' ID register scheme
KVM: arm64: Handle ID register reads using the VM-wide values
KVM: arm64: Use generic sanitisation for ID_AA64PFR0_EL1
KVM: arm64: Use generic sanitisation for ID_(AA64)DFR0_EL1
KVM: arm64: Use arm64_ftr_bits to sanitise ID register writes
KVM: arm64: Save ID registers' sanitized value per guest
KVM: arm64: Reuse fields of sys_reg_desc for idreg
KVM: arm64: Rewrite IMPDEF PMU version as NI
KVM: arm64: Make vCPU feature flags consistent VM-wide
KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF
KVM: arm64: Separate out feature sanitisation and initialisation
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
* for-next/module-alloc:
: Drag in module VA rework to handle conflicts w/ sw feature refactor
arm64: module: rework module VA range selection
arm64: module: mandate MODULE_PLTS
arm64: module: move module randomization to module.c
arm64: kaslr: split kaslr/module initialization
arm64: kasan: remove !KASAN_VMALLOC remnants
arm64: module: remove old !KASAN_VMALLOC logic
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Rather than reinventing the wheel in KVM to do ID register sanitisation
we can rely on the work already done in the core kernel. Implement a
generalized sanitisation of ID registers based on the combination of the
arm64_ftr_bits definitions from the core kernel and (optionally) a set
of KVM-specific overrides.
This all amounts to absolutely nothing for now, but will be used in
subsequent changes to realize user-configurable ID registers.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20230609190054.1542113-8-oliver.upton@linux.dev
[Oliver: split off from monster patch, rewrote commit description,
reworked RAZ handling, return EINVAL to userspace]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
On CPUs where E2H is RES1, we very quickly set the scene for
running EL2 with a VHE configuration, as we do not have any other
choice.
However, CPUs that conform to the current writing of the architecture
start with E2H=0, and only later upgrade with E2H=1. This is all
good, but nothing there is actually reconfiguring EL2 to be able
to correctly run the kernel at EL1. Huhuh...
The "obvious" solution is not to just reinitialise the timer
controls like we do, but to really initialise *everything*
unconditionally.
This requires a bit of surgery, and is a good opportunity to
remove the macro that messes with SPSR_EL2 in init_el2_state.
With that, hVHE now works correctly on my trusted A55 machine!
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230614155129.2697388-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add the arm64_sw.hvhe=1 option to force the use of the hVHE mode
in the hypervisor code only.
This enables the hVHE mode of operation when using KVM on VHE
hardware.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-17-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
If the OVERRIDE_HVHE SW override is set (as a precursor of
the KVM_HVHE capability), do not enable VHE for the kernel
and drop to EL1 as if VHE was either disabled or unavailable.
Further changes will enable VHE at EL2 only, with the kernel
still running at EL1.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Expose a capability keying the hVHE feature as well as a new
predicate testing it. Nothing is so far using it, and nothing
is enabling it yet.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Disabling KASLR from the command line is implemented as a feature
override. Repaint it slightly so that it can further be used as
more generic infrastructure for SW override purposes.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230609162200.2024064-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The Arm documentation has moved to Documentation/arch/arm; update
references under arch/arm64 to match.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
With the recent feature added to enable perf events to use pseudo NMIs as
interrupts on platforms which support GICv3 or later, it is now possible
to enable the hard lockup detector (or NMI watchdog) on arm64 platforms.
So enable the corresponding support.
One thing to note here is that normally the lockup detector is initialized
just after the early initcalls, but the PMU on arm64 comes up much later as
a device_initcall(). To cope with that, override
arch_perf_nmi_is_available() to let the watchdog framework know that the
PMU is not ready, and have the framework re-initialize lockup detection
once the PMU has been initialized.
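The override is conceptually a one-liner; a sketch, assuming an arm_pmu
helper is available to report whether the PMU interrupt is a (pseudo) NMI:

/* arch/arm64/kernel/watchdog_hld.c (sketch) */
bool __init arch_perf_nmi_is_available(void)
{
        /*
         * If the PMU interrupt is not an NMI, the hard lockup detector
         * cannot work reliably, so report the NMI as unavailable.
         */
        return arm_pmu_irq_is_nmi();
}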
[dianders@chromium.org: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled]
Link: https://lkml.kernel.org/r/20230523073952.1.I60217a63acc35621e13f10be16c0cd7c363caf8c@changeid
Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Set the safe maximum CPU frequency to 5 GHz in case a particular platform
doesn't implement a cpufreq driver. Although the architecture doesn't put
any restrictions on the maximum frequency, 5 GHz seems to be a safe maximum
given the available Arm CPUs on the market, which are clocked well below 5
GHz.
On the other hand, we can't make it much higher, as that would lead to a
large hard-lockup detection timeout on parts which run slower (e.g. 1 GHz
on Developerbox) and don't have a cpufreq driver.
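A sketch of how the 5 GHz fallback is used when computing the watchdog
sample period, based on the description above (helper names assumed):

#include <linux/cpufreq.h>

#define SAFE_MAX_CPU_FREQ       5000000000UL    /* 5 GHz */

u64 hw_nmi_get_sample_period(int watchdog_thresh)
{
        unsigned int cpu = smp_processor_id();
        unsigned long max_cpu_freq;

        /* cpufreq reports kHz; fall back to 5 GHz without a driver */
        max_cpu_freq = cpufreq_get_hw_max_freq(cpu) * 1000UL;
        if (!max_cpu_freq)
                max_cpu_freq = SAFE_MAX_CPU_FREQ;

        return (u64)max_cpu_freq * watchdog_thresh;
}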
Link: https://lkml.kernel.org/r/20230519101840.v5.17.Ia9d02578e89c3f44d3cb12eec8b0176603c8ab2f@changeid
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The only invocations of get_user_pages_remote() which used the vmas
parameter were for a single page, in which case we can instead simply look
up the VMA directly. In particular:-
- __update_ref_ctr() looked up the VMA but did nothing with it, so we
simply remove it.
- __access_remote_vm() was already using vma_lookup() when the original
lookup failed, so doing the lookup directly also de-duplicates the code.
We are able to perform these VMA operations as we already hold the
mmap_lock in order to be able to call get_user_pages_remote().
As part of this work we add get_user_page_vma_remote() which abstracts the
VMA lookup, error handling and decrementing the page reference count should
the VMA lookup fail.
This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.
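A rough usage sketch of the new helper, assuming mm, addr and gup_flags are
in scope and that the helper returns an ERR_PTR (or NULL) when the lookup
or GUP fails:

struct vm_area_struct *vma;
struct page *page;

mmap_read_lock(mm);
page = get_user_page_vma_remote(mm, addr, gup_flags, &vma);
if (IS_ERR_OR_NULL(page)) {
        mmap_read_unlock(mm);
        return page ? PTR_ERR(page) : -EFAULT;
}
/* ... use page and vma while still holding mmap_lock ... */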
[akpm@linux-foundation.org: avoid passing NULL to PTR_ERR]
Link: https://lkml.kernel.org/r/d20128c849ecdbf4dd01cc828fcec32127ed939a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> (for arm64)
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com> (for s390)
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
gcc-13 warns about function definitions for builtin interfaces that have a
different prototype, e.g.:
In file included from kasan_test.c:31:
kasan.h:574:6: error: conflicting types for built-in function '__asan_register_globals'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
574 | void __asan_register_globals(struct kasan_global *globals, size_t size);
kasan.h:577:6: error: conflicting types for built-in function '__asan_alloca_poison'; expected 'void(void *, long int)' [-Werror=builtin-declaration-mismatch]
577 | void __asan_alloca_poison(unsigned long addr, size_t size);
kasan.h:580:6: error: conflicting types for built-in function '__asan_load1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
580 | void __asan_load1(unsigned long addr);
kasan.h:581:6: error: conflicting types for built-in function '__asan_store1'; expected 'void(void *)' [-Werror=builtin-declaration-mismatch]
581 | void __asan_store1(unsigned long addr);
kasan.h:643:6: error: conflicting types for built-in function '__hwasan_tag_memory'; expected 'void(void *, unsigned char, long int)' [-Werror=builtin-declaration-mismatch]
643 | void __hwasan_tag_memory(unsigned long addr, u8 tag, unsigned long size);
The two problems are:
- Addresses are passed as 'unsigned long' in the kernel, but gcc-13
expects a 'void *'.
- Sizes are meant to use a signed ssize_t rather than size_t.
Change all the prototypes to match these. Using 'void *' consistently for
addresses gets rid of a couple of type casts, so push that down to the
leaf functions where possible.
This now passes all randconfig builds on arm, arm64 and x86, but I have
not tested it on the other architectures that support kasan, since they
tend to fail randconfig builds in other ways. This might fail if any of
the 32-bit architectures expect a 'long' instead of 'int' for the size
argument.
The __asan_allocas_unpoison() function prototype is somewhat weird, since
it uses a pointer for 'stack_top' and a size_t for 'stack_bottom'. This
looks like it is meant to be 'addr' and 'size' like the others, but the
implementation clearly treats them as 'top' and 'bottom'.
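For illustration, the adjusted prototypes end up along these lines (a
sketch of a few of them; the full patch covers all the kasan/hwasan hooks):

/* addresses as 'void *', sizes as signed 'ssize_t' */
void __asan_register_globals(void *globals, ssize_t size);
void __asan_alloca_poison(void *addr, ssize_t size);
void __asan_load1(void *addr);
void __asan_store1(void *addr);
void __hwasan_tag_memory(void *addr, u8 tag, ssize_t size);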
Link: https://lkml.kernel.org/r/20230509145735.9263-2-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The following code:
static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
                           const syscall_fn_t syscall_table[])
{
        ...
        if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
                local_daif_mask();
                flags = read_thread_flags(); -------------------------------- A
                if (!has_syscall_work(flags) && !(flags & _TIF_SINGLESTEP))
                        return;
                local_daif_restore(DAIF_PROCCTX);
        }
trace_exit:
        syscall_trace_exit(regs); ------- B
}
1. The flags tested in the if statement should be obtained in the same
way as the flags used by syscall_trace_exit(): DAIF is not masked in
syscall_trace_exit(), so there is no need to mask DAIF when reading the
flags at line A either. Any modification of the flags after line A does
not matter.
2. Masking DAIF caused syscall performance to deteriorate by about 10%.
The Unixbench single core syscall performance data is as follows:
Machine: Kunpeng 920
Mask DAIF:
System Call Overhead 1172314.1 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
System Call Overhead 15000.0 1172314.1 781.5
========
System Benchmarks Index Score (Partial Only) 781.5
Unmask DAIF:
System Call Overhead 1287944.6 lps (10.0 s, 7 samples)
System Benchmarks Partial Index BASELINE RESULT INDEX
System Call Overhead 15000.0 1287944.6 858.6
========
System Benchmarks Index Score (Partial Only) 858.6
Rationale from Mark Rutland as for why this is safe:
This masking is an artifact of the old "ret_fast_syscall" assembly that
was converted to C in commit:
f37099b6992a0b81 ("arm64: convert syscall trace logic to C")
The assembly would mask DAIF, check the thread flags, and fall through
to kernel_exit without unmasking if no tracing was needed. The
conversion copied this masking into the C version, though this wasn't
strictly necessary.
As (in general) thread flags can be manipulated by other threads, it's
not safe to manipulate the thread flags with plain reads and writes, and
since commit:
342b3808786518ce ("arm64: Snapshot thread flags")
... we use read_thread_flags() to read the flags atomically.
With this, there is no need to mask DAIF transiently around reading the
flags, as we only decide whether to trace while DAIF is masked, and the
actual tracing occurs with DAIF unmasked. When el0_svc_common() returns
its caller will unconditionally mask DAIF via exit_to_user_mode(), so
the masking is redundant.
Signed-off-by: Guo Hui <guohui@uniontech.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230526024715.8773-1-guohui@uniontech.com
[catalin.marinas@arm.com: updated comment with Mark's rationale]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We only use cpus_set_cap() in update_cpu_capabilities(), where we
open-code an analogous update to boot_cpucaps.
Due to the way the cpucap_ptrs[] array is initialized, we know that the
capability number cannot be greater than or equal to ARM64_NCAPS, so the
warning is superfluous.
Fold cpus_set_cap() into update_cpu_capabilities(), matching what we do
for the boot_cpucaps, and making the relationship between the two a bit
clearer.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To more clearly align the various users of the cpucap enumeration, this patch
changes the cpufeature code to use the term `cpucap` in favour of `cpu_hwcap`.
This more clearly aligns with other users of the cpucaps, and avoids confusion
with the ELF hwcaps.
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
To more clearly align the various users of the cpucap enumeration, this patch
changes the alternative code to use the term `cpucap` in favour of `feature`.
The alternative_has_feature_{likely,unlikely}() functions are renamed to
alternative_has_cap_{likely,unlikely}() to more clearly align with the
cpus_have_{const_,}cap() helpers.
At the same time remove the stale comment referring to the "ARM64_CB
bit", which is evidently a typo for ARM64_CB_PATCH, which was removed in
commit:
4c0bd995d73ed889 ("arm64: alternatives: have callbacks take a cap")
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The 'cpu_hwcaps' and 'boot_capabilities' bitmaps cover the same
enumerated bits, but are named wildly differently for no good
reason. The terms 'hwcaps' and 'capabilities' have become ambiguous over
time (e.g. due to clashes with ELF hwcaps and the structures used to
manage feature detection), and it would be nicer to use 'cpucaps',
matching the <asm/cpucaps.h> header the enumerated bit indices are
defined in.
While this isn't a functional problem, it makes the code harder than
necessary to understand, and hard to extend with related functionality
(e.g. per-cpu cpucap bitmaps).
To that end, this patch renames `boot_capabilities` to `boot_cpucaps`
and `cpu_hwcaps` to `system_cpucaps`. This more clearly indicates the
relationship between the two and aligns with terminology used elsewhere
in our feature management code.
This change was scripted with:
| find . -type f -name '*.[chS]' -print0 | \
| xargs -0 sed -i 's/\<boot_capabilities\>/boot_cpucaps/'
| find . -type f -name '*.[chS]' -print0 | \
| xargs -0 sed -i 's/\<cpu_hwcaps\>/system_cpucaps/'
... and the instance of "cpu_hwcap" (without a trailing "s") in
<asm/mmu_context.h> corrected manually to "system_cpucaps".
Subsequent patches will adjust the naming of related functions to better
align with the `cpucap` naming.
There should be no functional change as a result of this patch; this is
purely a renaming exercise.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230607164846.3967305-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, the modules region is 128M in size, which is a problem for
some large modules. Shanker reports [1] that the NVIDIA GPU driver alone
can consume 110M of module space in some configurations. We'd like to
make the modules region a full 2G such that we can always make use of a
2G range.
It's possible to build kernel images which are larger than 128M in some
configurations, such as when many debug options are selected and many
drivers are built in. In these configurations, we can't legitimately
select a base for a 128M module region, though we currently select a
value for which allocation will fail. It would be nicer to have a
diagnostic message in this case.
Similarly, in theory it's possible to build a kernel image which is
larger than 2G and which cannot support modules. While this isn't likely
to be the case for any realistic kernel deployed in the field, it would
be nice if we could print a diagnostic in this case.
This patch reworks the module VA range selection to use a 2G range, and
improves handling of cases where we cannot select legitimate module
regions. We now attempt to select a 128M region and a 2G region:
* The 128M region is selected such that modules can use direct branches
(with JUMP26/CALL26 relocations) to branch to kernel code and other
modules, and so that modules can reference data and text (using PREL32
relocations) anywhere in the kernel image and other modules.
This region covers the entire kernel image (rather than just the text)
to ensure that all PREL32 relocations are in range even when the
kernel data section is absurdly large. Where we cannot allocate from
this region, we'll fall back to the full 2G region.
* The 2G region is selected such that modules can use direct branches
with PLTs to branch to kernel code and other modules, and so that
modules can reference data and text (with PREL32 relocations) in
the kernel image and other modules.
This region covers the entire kernel image, and the 128M region (if
one is selected).
The two module regions are randomized independently while ensuring the
constraints described above.
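The resulting allocation strategy can be sketched as below, simplified from
the description above; the module_direct_base/module_plt_base variable
names follow the series and should be treated as approximate:

void *module_alloc(unsigned long size)
{
        void *p = NULL;

        /* Prefer the 128M region so direct branches need no PLTs */
        if (module_direct_base)
                p = __vmalloc_node_range(size, MODULE_ALIGN,
                                         module_direct_base,
                                         module_direct_base + SZ_128M,
                                         GFP_KERNEL | __GFP_NOWARN,
                                         PAGE_KERNEL, 0, NUMA_NO_NODE,
                                         __builtin_return_address(0));

        /* Fall back to the full 2G region, reachable via PLTs */
        if (!p && module_plt_base)
                p = __vmalloc_node_range(size, MODULE_ALIGN,
                                         module_plt_base,
                                         module_plt_base + SZ_2G,
                                         GFP_KERNEL | __GFP_NOWARN,
                                         PAGE_KERNEL, 0, NUMA_NO_NODE,
                                         __builtin_return_address(0));

        return p;
}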
[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Contemporary kernels and modules can be relatively large, especially
when common debug options are enabled. Using GCC 12.1.0, a v6.3-rc7
defconfig kernel is ~38M, and with PROVE_LOCKING + KASAN_INLINE enabled
this expands to ~117M. Shanker reports [1] that the NVIDIA GPU driver
alone can consume 110M of module space in some configurations.
Both KASLR and ARM64_ERRATUM_843419 select MODULE_PLTS, so anyone
wanting a kernel to have KASLR or run on Cortex-A53 will have
MODULE_PLTS selected. This is the case in defconfig and distribution
kernels (e.g. Debian, Android, etc).
Practically speaking, this means we're very likely to need MODULE_PLTS,
and while it's almost guaranteed that MODULE_PLTS will be selected, it
is possible to disable support, and we have to maintain some awkward
special cases for such unusual configurations.
This patch removes the MODULE_PLTS config option, with the support code
always enabled if MODULES is selected. This results in a slight
simplification, and will allow for further improvement in subsequent
patches.
For any config which currently selects MODULE_PLTS, there will be no
functional change as a result of this patch.
[1] https://lore.kernel.org/linux-arm-kernel/159ceeab-09af-3174-5058-445bc8dcf85b@nvidia.com/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Shanker Donthineni <sdonthineni@nvidia.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When CONFIG_RANDOMIZE_BASE=y, module_alloc_base is a variable which is
configured by kaslr_module_init() in kaslr.c, and otherwise it is an
expression defined in module.h.
As kaslr_module_init() is no longer tightly coupled with the KASLR
initialization code, we can centralize this in module.c.
This patch moves kaslr_module_init() to module.c, making
module_alloc_base a static variable, and removing redundant includes from
kaslr.c. For the definition of struct arm64_ftr_override we must include
<asm/cpufeature.h>, which was previously included transitively via
another header.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently kaslr_init() handles a mixture of detecting/announcing whether
KASLR is enabled, and randomizing the module region depending on whether
KASLR is enabled.
To make it easier to rework the module region initialization, split the
KASLR initialization into two steps:
* kaslr_init() determines whether KASLR should be enabled, and announces
this choice, recording this to a new global boolean variable. This is
called from setup_arch() just before the existing call to
kaslr_requires_kpti() so that this will always provide the expected
result.
* kaslr_module_init() randomizes the module region when required. This
is called as a subsys_initcall, where we previously called
kaslr_init().
As a bonus, moving the KASLR reporting earlier makes it easier to spot
and permits it to be logged via earlycon, making it easier to debug any
issues that could be triggered by KASLR.
Booting a v6.4-rc1 kernel with this patch applied, the log looks like:
| EFI stub: Booting Linux Kernel...
| EFI stub: Generating empty DTB
| EFI stub: Exiting boot services...
| [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
| [ 0.000000] Linux version 6.4.0-rc1-00006-g4763a8f8aeb3 (mark@lakrids) (aarch64-linux-gcc (GCC) 12.1.0, GNU ld (GNU Binutils) 2.38) #2 SMP PREEMPT Tue May 9 11:03:37 BST 2023
| [ 0.000000] KASLR enabled
| [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
| [ 0.000000] printk: bootconsole [pl11] enabled
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Historically, KASAN could be selected with or without KASAN_VMALLOC, and
we had to be very careful where to place modules when KASAN_VMALLOC was
not selected.
However, since commit:
f6f37d9320a11e90 ("arm64: select KASAN_VMALLOC for SW/HW_TAGS modes")
Selecting CONFIG_KASAN on arm64 will also select CONFIG_KASAN_VMALLOC,
and so the logic for handling CONFIG_KASAN without CONFIG_KASAN_VMALLOC
is redundant and can be removed.
Note: the "kasan.vmalloc={on,off}" option, which only exists for HW_TAGS,
changes whether the vmalloc region is given non-match-all tags, and does
not affect the page table manipulation code.
The VM_DEFER_KMEMLEAK flag was only necessary for !CONFIG_KASAN_VMALLOC
as described in its introduction in commit:
60115fa54ad7b913 ("mm: defer kmemleak object creation of module_alloc()")
... and therefore it can also be removed.
Remove the redundant logic for !CONFIG_KASAN_VMALLOC. At the same time,
add the missing braces around the multi-line conditional block in
arch/arm64/kernel/module.c.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com>
Link: https://lore.kernel.org/r/20230530110328.2213762-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently arm64 doesn't use CONFIG_GENERIC_ENTRY and doesn't call
lockdep_sys_exit() when returning to userspace.
This means that lockdep won't check for held locks when
returning to userspace, which would be useful to detect kernel bugs.
Call lockdep_sys_exit() when returning to userspace,
enabling checking for held locks.
At the same time, rename arm64's prepare_exit_to_user_mode() to
exit_to_user_mode_prepare() to more clearly align with the naming
in the generic entry code.
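A sketch of where the call lands, based on the description above; the
surrounding body of the function is abbreviated and approximate:

/* entry-common.c, on the return-to-userspace path (sketch) */
static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
{
        unsigned long flags;

        local_daif_mask();

        flags = read_thread_flags();
        if (unlikely(flags & _TIF_WORK_MASK))
                do_notify_resume(regs, flags);

        /* Let lockdep check for locks still held on exit to userspace */
        lockdep_sys_exit();
}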
Signed-off-by: Eric Chan <ericchancf@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230531090909.357047-1-ericchancf@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Copy the EL1 registers: TCR2_EL1, PIR_EL1, PIRE0_EL1, such that PIE
is also enabled for EL2.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-18-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
With PIE enabled, the swapper PTEs would have a Permission Indirection Index
(PIIndex) of 0. A PIIndex of 0 is not currently used by any other PTEs.
To avoid using index 0 specifically for the swapper PTEs, mark them as
PTE_UXN and PTE_WRITE, so that they map to a PAGE_KERNEL_EXEC equivalent.
This also adds PTE_WRITE to KPTI_NG_PTE_FLAGS, which was tested by booting
with kpti=on.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-12-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This indicates if the system supports PIE. This is a CPUCAP_BOOT_CPU_FEATURE
as the boot CPU will enable PIE if it has it, so secondary CPUs must also
have this feature.
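The cpucap entry would look roughly like the following (a sketch using the
conventional cpufeature table form; field and constant names are
approximations):

{
        .desc = "Stage-1 Permission Indirection Extension (S1PIE)",
        .capability = ARM64_HAS_S1PIE,
        .type = ARM64_CPUCAP_BOOT_CPU_FEATURE,
        .matches = has_cpuid_feature,
        .sys_reg = SYS_ID_AA64MMFR3_EL1,
        .sign = FTR_UNSIGNED,
        .field_pos = ID_AA64MMFR3_EL1_S1PIE_SHIFT,
        .field_width = 4,
        .min_field_value = ID_AA64MMFR3_EL1_S1PIE_IMP,
},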
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230606145859.697944-8-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This capability indicates if the system supports the TCR2_ELx system register.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230606145859.697944-7-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add new system register ID_AA64MMFR3 to the cpufeature infrastructure.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20230606145859.697944-6-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Make it possible to disable the MOPS extension at runtime using the
kernel command line. This can be useful for testing or working around
hardware issues. For example it could be used to test new memory copy
routines that do not use MOPS instructions (e.g. from Arm Optimized
Routines).
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-11-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The Arm v8.8/9.3 FEAT_MOPS feature provides new instructions that
perform a memory copy or set. Wire up the cpufeature code to detect the
presence of FEAT_MOPS and enable it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-10-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When a MOPS main or epilogue instruction is being executed, the task may
get scheduled on a different CPU and restart execution from the prologue
instruction. If the main or epilogue instruction is being single stepped
then it makes sense to finish the step and take the step exception
before starting to execute the next (prologue) instruction. So
fast-forward the single step state machine when taking a MOPS exception.
This means that if a main or epilogue instruction is single stepped with
ptrace, the debugger will sometimes observe the PC moving back to the
prologue instruction. (As already mentioned, this should be rare as it
only happens when the task is scheduled to another CPU during the step.)
This also ensures that perf breakpoints count prologue instructions
consistently (i.e. every time they are executed), rather than skipping
them when there also happens to be a breakpoint on a main or epilogue
instruction.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-9-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The memory copy/set instructions added as part of FEAT_MOPS can take an
exception (e.g. page fault) part-way through their execution and resume
execution afterwards.
If however the task is re-scheduled and execution resumes on a different
CPU, then the CPU may take a new type of exception to indicate this.
This is because the architecture allows two options (Option A and Option
B) to implement the instructions and a heterogeneous system can have
different implementations between CPUs.
In this case the OS has to reset the registers and restart execution
from the prologue instruction. The algorithm for doing this is provided
as part of the Arm ARM.
Add an exception handler for the new exception and wire it up for
userspace tasks.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-8-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Detect if the system has the new HCRX_EL2 register added in ARMv8.7/9.2,
so that subsequent patches can check for its presence.
KVM currently relies on the register being present on all CPUs (or
none), so the kernel will panic if that is not the case. Fortunately no
such systems currently exist, but this can be revisited if they appear.
Note that the kernel will not panic if CONFIG_KVM is disabled.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-3-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|