summaryrefslogtreecommitdiff
path: root/scripts/lib/kdoc/kdoc_parser.py
diff options
context:
space:
mode:
authorMark Rutland <mark.rutland@arm.com>2025-05-08 14:26:33 +0100
committerWill Deacon <will@kernel.org>2025-05-08 15:29:10 +0100
commitcde5c32db55740659fca6d56c09b88800d88fd29 (patch)
treec6fa37c9715de0fc2a323695cf154c711650f4bc /scripts/lib/kdoc/kdoc_parser.py
parenta6d066f705747124fb2d662df0acbb45ffe6c406 (diff)
arm64/fpsimd: Make clone() compatible with ZA lazy saving
Linux is intended to be compatible with userspace written to Arm's AAPCS64 procedure call standard [1,2]. For the Scalable Matrix Extension (SME), AAPCS64 was extended with a "ZA lazy saving scheme", where SME's ZA tile is lazily callee-saved and caller-restored. In this scheme, TPIDR2_EL0 indicates whether the ZA tile is live or has been saved by pointing to a "TPIDR2 block" in memory, which has a "za_save_buffer" pointer. This scheme has been implemented in GCC and LLVM, with necessary runtime support implemented in glibc and bionic. AAPCS64 does not specify how the ZA lazy saving scheme is expected to interact with thread creation mechanisms such as fork() and pthread_create(), which would be implemented in terms of the Linux clone syscall. The behaviour implemented by Linux and glibc/bionic doesn't always compose safely, as explained below. Currently the clone syscall is implemented such that PSTATE.ZA and the ZA tile are always inherited by the new task, and TPIDR2_EL0 is inherited unless the 'flags' argument includes CLONE_SETTLS, in which case TPIDR2_EL0 is set to 0/NULL. This doesn't make much sense: (a) TPIDR2_EL0 is part of the calling convention, and changes as control is passed between functions. It is *NOT* used for thread local storage, despite superficial similarity to TPIDR_EL0, which is is used as the TLS register. (b) TPIDR2_EL0 and PSTATE.ZA are tightly coupled in the procedure call standard, and some combinations of states are illegal. In general, manipulating the two independently is not guaranteed to be safe. In practice, code which is compliant with the procedure call standard may issue a clone syscall while in the "ZA dormant" state, where PSTATE.ZA==1 and TPIDR2_EL0 is non-null and indicates that ZA needs to be saved. This can cause a variety of problems, including: * If the implementation of pthread_create() passes CLONE_SETTLS, the new thread will start with PSTATE.ZA==1 and TPIDR2==NULL. Per the procedure call standard this is not a legitimate state for most functions. This can cause data corruption (e.g. as code may rely on PSTATE.ZA being 0 to guarantee that an SMSTART ZA instruction will zero the ZA tile contents), and may result in other undefined behaviour. * If the implementation of pthread_create() does not pass CLONE_SETTLS, the new thread will start with PSTATE.ZA==1 and TPIDR2 pointing to a TPIDR2 block on the parent thread's stack. This can result in a variety of problems, e.g. - The child may write back to the parent's za_save_buffer, corrupting its contents. - The child may read from the TPIDR2 block after the parent has reused this memory for something else, and consequently the child may abort or clobber arbitrary memory. Ideally we'd require that userspace ensures that a task is in the "ZA off" state (with PSTATE.ZA==0 and TPIDR2_EL0==NULL) prior to issuing a clone syscall, and have the kernel force this state for new threads. Unfortunately, contemporary C libraries do not do this, and simply forcing this state within the implementation of clone would break fork(). Instead, we can bodge around this by considering the CLONE_VM flag, and manipulate PSTATE.ZA and TPIDR2_EL0 as a pair. CLONE_VM indicates that the new task will run in the same address space as its parent, and in that case it doesn't make sense to inherit a stale pointer to the parent's TPIDR2 block: * For fork(), CLONE_VM will not be set, and it is safe to inherit both PSTATE.ZA and TPIDR2_EL0 as the new task will have its own copy of the address space, and cannot clobber its parent's stack. * For pthread_create() and vfork(), CLONE_VM will be set, and discarding PSTATE.ZA and TPIDR2_EL0 for the new task doesn't break any existing assumptions in userspace. Implement this behaviour for clone(). We currently inherit PSTATE.ZA in arch_dup_task_struct(), but this does not have access to the clone flags, so move this logic under copy_thread(). Documentation is updated to describe the new behaviour. [1] https://github.com/ARM-software/abi-aa/releases/download/2025Q1/aapcs64.pdf [2] https://github.com/ARM-software/abi-aa/blob/c51addc3dc03e73a016a1e4edf25440bcac76431/aapcs64/aapcs64.rst Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Richard Sandiford <richard.sandiford@arm.com> Cc: Sander De Smalen <sander.desmalen@arm.com> Cc: Tamas Petz <tamas.petz@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yury Khrustalev <yury.khrustalev@arm.com> Acked-by: Yury Khrustalev <yury.khrustalev@arm.com> Link: https://lore.kernel.org/r/20250508132644.1395904-14-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Diffstat (limited to 'scripts/lib/kdoc/kdoc_parser.py')
0 files changed, 0 insertions, 0 deletions