summaryrefslogtreecommitdiff
path: root/arch/arm64/kernel
AgeCommit message (Collapse)Author
2023-10-23arm64: cpufeature: Display the set of cores with a featureJeremy Linton
The AMU feature can be enabled on a subset of the cores in a system. Because of that, it prints a message for each core as it is detected. This becomes tedious when there are hundreds of cores. Instead, for CPU features which can be enabled on a subset of the present cores, lets wait until update_cpu_capabilities() and print the subset of cores the feature was enabled on. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18mm/gup: adapt get_user_page_vma_remote() to never return NULLLorenzo Stoakes
get_user_pages_remote() will never return 0 except in the case of FOLL_NOWAIT being specified, which we explicitly disallow. This simplifies error handling for the caller and avoids the awkwardness of dealing with both errors and failing to pin. Failing to pin here is an error. Link: https://lkml.kernel.org/r/00319ce292d27b3aae76a0eb220ce3f528187508.1696288092.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-17efi: move screen_info into efi init codeArnd Bergmann
After the vga console no longer relies on global screen_info, there are only two remaining use cases: - on the x86 architecture, it is used for multiple boot methods (bzImage, EFI, Xen, kexec) to commucate the initial VGA or framebuffer settings to a number of device drivers. - on other architectures, it is only used as part of the EFI stub, and only for the three sysfb framebuffers (simpledrm, simplefb, efifb). Remove the duplicate data structure definitions by moving it into the efi-init.c file that sets it up initially for the EFI case, leaving x86 as an exception that retains its own definition for non-EFI boots. The added #ifdefs here are optional, I added them to further limit the reach of screen_info to configurations that have at least one of the users enabled. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231017093947.3627976-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-16arm64/mm: Hoist synchronization out of set_ptes() loopRyan Roberts
set_ptes() sets a physically contiguous block of memory (which all belongs to the same folio) to a contiguous block of ptes. The arm64 implementation of this previously just looped, operating on each individual pte. But the __sync_icache_dcache() and mte_sync_tags() operations can both be hoisted out of the loop so that they are performed once for the contiguous set of pages (which may be less than the whole folio). This should result in minor performance gains. __sync_icache_dcache() already acts on the whole folio, and sets a flag in the folio so that it skips duplicate calls. But by hoisting the call, all the pte testing is done only once. mte_sync_tags() operates on each individual page with its own loop. But by passing the number of pages explicitly, we can rely solely on its loop and do the checks only once. This approach also makes it robust for the future, rather than assuming if a head page of a compound page is being mapped, then the whole compound page is being mapped, instead we explicitly know how many pages are being mapped. The old assumption may not continue to hold once the "anonymous large folios" feature is merged. Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Remove cpus_have_const_cap()Mark Rutland
There are no longer any users of cpus_have_const_cap(), and therefore it can be removed. Remove cpus_have_const_cap(). At the same time, remove __cpus_have_const_cap(), as this is a trivial wrapper of alternative_has_cap_unlikely(), which can be used directly instead. The comment for __system_matches_cap() is updated to no longer refer to cpus_have_const_cap(). As we have a number of ways to check the cpucaps, the specific suggestions are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNPMark Rutland
In has_useable_cnp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. We use has_useable_cnp() to determine whether we have the system-wide ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we call has_useable_cnp() in two distinct cases: 1) When finalizing system capabilities, setup_system_capabilities() will call has_useable_cnp() with SCOPE_SYSTEM to determine whether all CPUs have the feature. This is called after we've detected any local cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to patching alternatives. If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not detect ARM64_HAS_CNP. 2) After finalizing system capabilties, verify_local_cpu_capabilities() will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs have CNP if we previously detected it. Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not have detected ARM64_HAS_CNP. For case 1 we must check the system_cpucaps bitmap as this occurs prior to patching the alternatives. For case 2 we'll only call has_useable_cnp() once per subsequent onlining of a CPU, and as this isn't a fast path it's not necessary to optimize for this case. This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098Mark Rutland
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1742098 cpucap is detected and patched before elf_hwcap_fixup() can run, and hence it is not necessary to use cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once after finalizing system cpucaps, and potentially once more after detecting mismatched CPUs which support AArch32 at EL0. Due to this, it's not necessary to optimize for many calls to elf_hwcap_fixup(), and it's fine to use cpus_have_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. For consistenct with other cpucaps, the ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that code can be elided when this is not possible. However, as we only define compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419Mark Rutland
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_1542419 but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1542419 cpucap is detected and patched before any userspace code can run, and the both __do_compat_cache_op() and ctr_read_handler() are only reachable from exceptions taken from userspace. Thus it is not necessary for either to use cpus_have_const_cap(), and cpus_have_final_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419Mark Rutland
In count_plts() and is_forbidden_offset_for_adrp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. It's not possible to load a module in the window between detecting the ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA range limits are initialized much later in module_init_limits() which is a subsys_initcall, and module loading cannot happen before this. Hence it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to use cpus_have_const_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap() which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow modules to be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0Mark Rutland
In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior to alternatives being patched. Otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is detected and patched before any translation tables are created for userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0 cpucap and patching alternatives, most users of arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has been detected: * As KVM is initialized after cpucaps are finalized, no usaef of arm64_kernel_unmapped_at_el0() in the KVM code is reachable during this window. * The arm64_mm_context_get() function in arch/arm64/mm/context.c is only called after the SMMU driver is brought up after alternatives have been patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly the asids_update_limit() function is called after alternatives have been patched as an arch_initcall, and this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly we do not expect an ASID rollover to occur between cpucaps being detected and patching alternatives. Thus set_reserved_asid_bits() can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The __tlbi_user() and __tlbi_user_level() macros are not used during this window, and only need to invalidate additional entries once userspace translation tables have been active on a CPU. Thus these can safely use alternative_has_cap_*(). * The xen_kernel_unmapped_at_usr() function is not used during this window as it is only used in a late_initcall. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The arm64_get_meltdown_state() function is not used during this window. It only used by arm64_get_meltdown_state() and KVM code, both of which are only used after cpucaps have been finalized. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an optimization to avoid zeroing tpidrro_el0 when KPTI is enabled and this will be trampled by the KPTI trampoline. It doesn't matter if this continues to zero the register during the window between detecting the cpucap and patching alternatives, so this can safely use alternative_has_cap_*(). * The sdei_arch_get_entry_point() and do_sdei_event() functions aren't reachable at this time as the SDEI driver is registered later by acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a subsys_initcall. Thus these can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The uses under drivers/ aren't reachable at this time as the drivers are registered later: - TRBE is registered via module_init() - SMMUv3 is registred via module_driver() - SPE is registred via module_init() * The arm64_get_bp_hardening_vector() and this_cpu_set_vectors() functions need to run on boot CPUs prior to patching alternatives. As these are only called during the onlining of a CPU, it's fine to perform a system_cpucaps bitmap test using cpus_have_cap(). This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and replaced all other use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}Mark Rutland
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to check for the relevant cpucaps, but this is only necessary so that sve_setup() and sme_setup() can run prior to alternatives being patched, and otherwise alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. All of system_supports_{sve,sme,sme2,fa64}() will return false prior to system cpucaps being detected. In the window between system cpucaps being detected and patching alternatives, we need system_supports_sve() and system_supports_sme() to run to initialize SVE and SME properties, but all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on the relevant cpucap becoming true until alternatives are patched: * No KVM code runs until after alternatives are patched, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is registered later from cpuinfo_regs_init() as a device_initcall, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The entry, signal, and ptrace code isn't reachable until userspace has run, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG pseudo-register before alternatives are patched, and before sve_setup() has run. If a sampling event is created early enough, this would allow perf_ext_reg_value() to sample (the as-yet uninitialized) thread_struct::vl[] prior to alternatives being patched. It would be preferable to defer this until alternatives are patched, and this can safely use alternative_has_cap_*(). * The context-switch code will run during this window as part of stop_machine() used during alternatives_patch_all(), and potentially for other work if other kernel threads are created early. No threads require the use of SVE/SME/SME2/FA64 prior to alternatives being patched, and it would be preferable for the related context-switch logic to take effect after alternatives are patched so that ths is guaranteed to see a consistent system-wide state (e.g. anything initialized by sve_setup() and sme_setup(). This can safely ues alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The sve_setup() and sme_setup() functions are modified to use cpus_have_cap() directly so that they can observe the cpucaps being set prior to alternatives being patched. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_SSBSMark Rutland
In ssbs_thread_switch() we use cpus_have_const_cap() to check for ARM64_SSBS, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in ssbs_thread_switch() is an optimization to avoid the overhead of spectre_v4_enable_task_mitigation() where all CPUs implement SSBS and naturally preserve the SSBS bit in SPSR_ELx. In the window between detecting the ARM64_SSBS system-wide and patching alternative branches it is benign to continue to call spectre_v4_enable_task_mitigation(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PANMark Rutland
In system_uses_hw_pan() we use cpus_have_const_cap() to check for ARM64_HAS_PAN, but this is only necessary so that the system_uses_ttbr0_pan() check in setup_cpu_features() can run prior to alternatives being patched, and otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_PAN cpucap is used by system_uses_hw_pan() and system_uses_ttbr0_pan() depending on whether CONFIG_ARM64_SW_TTBR0_PAN is selected, and: * We only use system_uses_hw_pan() directly in __sdei_handler(), which isn't reachable until after alternatives have been patched, and for this it is safe to use alternative_has_cap_*(). * We use system_uses_ttbr0_pan() in a few places: - In check_and_switch_context() and cpu_uninstall_idmap(), which will defer installing a translation table into TTBR0 when the ARM64_HAS_PAN cpucap is not detected. Prior to patching alternatives, all CPUs will be using init_mm with the reserved ttbr0 translation tables install in TTBR0, so these can safely use alternative_has_cap_*(). - In update_saved_ttbr0(), which will only save the active TTBR0 into a per-thread variable when the ARM64_HAS_PAN cpucap is not detected. Prior to patching alternatives, all CPUs will be using init_mm with the reserved ttbr0 translation tables install in TTBR0, so these can safely use alternative_has_cap_*(). - In efi_set_pgd(), which will handle check_and_switch_context() deferring the installation of TTBR0 when TTBR0 PAN is detected. The EFI runtime services are not initialized until after alternatives have been patched, and so this can safely use alternative_has_cap_*() or cpus_have_final_cap(). - In uaccess_ttbr0_disable() and uaccess_ttbr0_enable(), where we'll avoid installing/uninstalling a translation table in TTBR0 when ARM64_HAS_PAN is detected. Prior to patching alternatives we will not perform any uaccess and will not call uaccess_ttbr0_disable() or uaccess_ttbr0_enable(), and so these can safely use alternative_has_cap_*() or cpus_have_final_cap(). - In is_el1_permission_fault() where we will consider a translation fault on a TTBR0 address to be a permission fault when ARM64_HAS_PAN is not detected *and* we have set the PAN bit in the SPSR (which tells us that in the interrupted context, TTBR0 pointed at the reserved zero ttbr). In the window between detecting system cpucaps and patching alternatives we should not perform any accesses to TTBR0 addresses, and no userspace translation tables exist until after patching alternatives. Thus it is safe for this to use alternative_has_cap*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. So that the check for TTBR0 PAN in setup_cpu_features() can run prior to alternatives being patched, the call to system_uses_ttbr0_pan() is replaced with an explicit check of the ARM64_HAS_PAN bit in the system_cpucaps bitmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DITMark Rutland
In __cpu_suspend_exit() we use cpus_have_const_cap() to check for ARM64_HAS_DIT but this is not necessary and cpus_have_final_cap() of alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_HAS_DIT cpucap is detected and patched (along with all other cpucaps) before __cpu_suspend_exit() can run. We'll only use __cpu_suspend_exit() as part of PSCI cpuidle or hibernation, and both of these are intialized after system cpucaps are detected and patched: the PSCI cpuidle driver is registered with a device_initcall, hibernation restoration occurs in a late_initcall, and hibarnation saving is driven by usrspace. Therefore it is not necessary to use cpus_have_const_cap(), and using alternative_has_cap_*() or cpus_have_final_cap() is sufficient. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. To clearly document the ordering relationship between suspend/resume and alternatives patching, an explicit check for system_capabilities_finalized() is added to cpu_suspend() along with a comment block, which will make it easier to spot issues if code is changed in future to allow these functions to be reached earlier. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_CNPMark Rutland
In system_supports_cnp() we use cpus_have_const_cap() to check for ARM64_HAS_CNP, but this is only necessary so that the cpu_enable_cnp() callback can run prior to alternatives being patched, and otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpu_enable_cnp() callback is run immediately after the ARM64_HAS_CNP cpucap is detected system-wide under setup_system_capabilities(), prior to alternatives being patched. During this window cpu_enable_cnp() uses cpu_replace_ttbr1() to set the CNP bit for the swapper_pg_dir in TTBR1. No other users of the ARM64_HAS_CNP cpucap need the up-to-date value during this window: * As KVM isn't initialized yet, kvm_get_vttbr() isn't reachable. * As cpuidle isn't initialized yet, __cpu_suspend_exit() isn't reachable. * At this point all CPUs are using the swapper_pg_dir with a reserved ASID in TTBR1, and the idmap_pg_dir in TTBR0, so neither check_and_switch_context() nor cpu_do_switch_mm() need to do anything special. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. To allow cpu_enable_cnp() to function prior to alternatives being patched, cpu_replace_ttbr1() is split into cpu_replace_ttbr1() and cpu_enable_swapper_cnp(), with the former only used for early TTBR1 replacement, and the latter used by both cpu_enable_cnp() and __cpu_suspend_exit(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_BTIMark Rutland
In system_supports_bti() we use cpus_have_const_cap() to check for ARM64_HAS_BTI, but this is not necessary and alternative_has_cap_*() or cpus_have_final_*cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. When CONFIG_ARM64_BTI_KERNEL=y, the ARM64_HAS_BTI cpucap is a strict boot cpu feature which is detected and patched early on the boot cpu. All uses guarded by CONFIG_ARM64_BTI_KERNEL happen after the boot CPU has detected ARM64_HAS_BTI and patched boot alternatives, and hence can safely use alternative_has_cap_*() or cpus_have_final_boot_cap(). Regardless of CONFIG_ARM64_BTI_KERNEL, all other uses of ARM64_HAS_BTI happen after system capabilities have been finalized and alternatives have been patched. Hence these can safely use alternative_has_cap_*) or cpus_have_final_cap(). This patch splits system_supports_bti() into system_supports_bti() and system_supports_bti_kernel(), with the former handling where the cpucap affects userspace functionality, and ther latter handling where the cpucap affects kernel functionality. The use of cpus_have_const_cap() is replaced by cpus_have_final_cap() in cpus_have_const_cap, and cpus_have_final_boot_cap() in system_supports_bti_kernel(). This will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The use of cpus_have_final_cap() and cpus_have_final_boot_cap() will make it easier to spot if code is chaanged such that these run before the ARM64_HAS_BTI cpucap is guaranteed to have been finalized. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Use a positive cpucap for FP/SIMDMark Rutland
Currently we have a negative cpucap which describes the *absence* of FP/SIMD rather than *presence* of FP/SIMD. This largely works, but is somewhat awkward relative to other cpucaps that describe the presence of a feature, and it would be nicer to have a cpucap which describes the presence of FP/SIMD: * This will allow the cpucap to be treated as a standard ARM64_CPUCAP_SYSTEM_FEATURE, which can be detected with the standard has_cpuid_feature() function and ARM64_CPUID_FIELDS() description. * This ensures that the cpucap will only transition from not-present to present, reducing the risk of unintentional and/or unsafe usage of FP/SIMD before cpucaps are finalized. * This will allow using arm64_cpu_capabilities::cpu_enable() to enable the use of FP/SIMD later, with FP/SIMD being disabled at boot time otherwise. This will ensure that any unintentional and/or unsafe usage of FP/SIMD prior to this is trapped, and will ensure that FP/SIMD is never unintentionally enabled for userspace in mismatched big.LITTLE systems. This patch replaces the negative ARM64_HAS_NO_FPSIMD cpucap with a positive ARM64_HAS_FPSIMD cpucap, making changes as described above. Note that as FP/SIMD will now be trapped when not supported system-wide, do_fpsimd_acc() must handle these traps in the same way as for SVE and SME. The commentary in fpsimd_restore_current_state() is updated to describe the new scheme. No users of system_supports_fpsimd() need to know that FP/SIMD is available prior to alternatives being patched, so this is updated to use alternative_has_cap_likely() to check for the ARM64_HAS_FPSIMD cpucap, without generating code to test the system_cpucaps bitmap. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Rename SVE/SME cpu_enable functionsMark Rutland
The arm64_cpu_capabilities::cpu_enable() callbacks for SVE, SME, SME2, and FA64 are named with an unusual "${feature}_kernel_enable" pattern rather than the much more common "cpu_enable_${feature}". Now that we only use these as cpu_enable() callbacks, it would be nice to have them match the usual scheme. This patch renames the cpu_enable() callbacks to match this scheme. At the same time, the comment above cpu_enable_sve() is removed for consistency with the other cpu_enable() callbacks. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Use build-time assertions for cpucap orderingMark Rutland
Both sme2_kernel_enable() and fa64_kernel_enable() need to run after sme_kernel_enable(). This happens to be true today as ARM64_SME has a lower index than either ARM64_SME2 or ARM64_SME_FA64, and both functions have a comment to this effect. It would be nicer to have a build-time assertion like we for for can_use_gic_priorities() and has_gic_prio_relaxed_sync(), as that way it will be harder to miss any potential breakage. This patch replaces the comments with build-time assertions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Explicitly save/restore CPACR when probing SVE and SMEMark Rutland
When a CPUs onlined we first probe for supported features and propetites, and then we subsequently enable features that have been detected. This is a little problematic for SVE and SME, as some properties (e.g. vector lengths) cannot be probed while they are disabled. Due to this, the code probing for SVE properties has to enable SVE for EL1 prior to proving, and the code probing for SME properties has to enable SME for EL1 prior to probing. We never disable SVE or SME for EL1 after probing. It would be a little nicer to transiently enable SVE and SME during probing, leaving them both disabled unless explicitly enabled, as this would make it much easier to catch unintentional usage (e.g. when they are not present system-wide). This patch reworks the SVE and SME feature probing code to only transiently enable support at EL1, disabling after probing is complete. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Split kpti_install_ng_mappings()Mark Rutland
The arm64_cpu_capabilities::cpu_enable callbacks are intended for cpu-local feature enablement (e.g. poking system registers). These get called for each online CPU when boot/system cpucaps get finalized and enabled, and get called whenever a CPU is subsequently onlined. For KPTI with the ARM64_UNMAP_KERNEL_AT_EL0 cpucap, we use the kpti_install_ng_mappings() function as the cpu_enable callback. This does a mixture of cpu-local configuration (setting VBAR_EL1 to the appropriate trampoline vectors) and some global configuration (rewriting the swapper page tables to sue non-glboal mappings) that must happen at most once. This patch splits kpti_install_ng_mappings() into a cpu-local cpu_enable_kpti() initialization function and a system-wide kpti_install_ng_mappings() function. The cpu_enable_kpti() function is responsible for selecting the necessary cpu-local vectors each time a CPU is onlined, and the kpti_install_ng_mappings() function performs the one-time rewrite of the translation tables too use non-global mappings. Splitting the two makes the code a bit easier to follow and also allows the page table rewriting code to be marked as __init such that it can be freed after use. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Fixup user features at boot timeMark Rutland
For ARM64_WORKAROUND_2658417, we use a cpu_enable() callback to hide the ID_AA64ISAR1_EL1.BF16 ID register field. This is a little awkward as CPUs may attempt to apply the workaround concurrently, requiring that we protect the bulk of the callback with a raw_spinlock, and requiring some pointless work every time a CPU is subsequently hotplugged in. This patch makes this a little simpler by handling the masking once at boot time. A new user_feature_fixup() function is called at the start of setup_user_features() to mask the feature, matching the style of elf_hwcap_fixup(). The ARM64_WORKAROUND_2658417 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Note that the ARM64_WORKAROUND_2658417 capability is matched with ERRATA_MIDR_RANGE(), which implicitly gives the capability a ARM64_CPUCAP_LOCAL_CPU_ERRATUM type, which forbids the late onlining of a CPU with the erratum if the erratum was not present at boot time. Therefore this patch doesn't change the behaviour for late onlining. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Rework setup_cpu_features()Mark Rutland
Currently setup_cpu_features() handles a mixture of one-time kernel feature setup (e.g. cpucaps) and one-time user feature setup (e.g. ELF hwcaps). Subsequent patches will rework other one-time setup and expand the logic currently in setup_cpu_features(), and in preparation for this it would be helpful to split the kernel and user setup into separate functions. This patch splits setup_user_features() out of setup_cpu_features(), with a few additional cleanups of note: * setup_cpu_features() is renamed to setup_system_features() to make it clear that it handles system-wide feature setup rather than cpu-local feature setup. * setup_system_capabilities() is folded into setup_system_features(). * Presence of TTBR0 pan is logged immediately after update_cpu_capabilities(), so that this is guaranteed to appear alongside all the other detected system cpucaps. * The 'cwg' variable is removed as its value is only consumed once and it's simpler to use cache_type_cwg() directly without assigning its return value to a variable. * The call to setup_user_features() is moved after alternatives are patched, which will allow user feature setup code to depend on alternative branches and allow for simplifications in subsequent patches. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-13arm64: add FEAT_LSE128 HWCAPJoey Gouly
Add HWCAP for FEAT_LSE128 (128-bit Atomic instructions). Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20231003124544.858804-2-joey.gouly@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-13arm64: add FEAT_LRCPC3 HWCAPJoey Gouly
FEAT_LRCPC3 adds more instructions to support the Release Consistency model. Add a HWCAP so that userspace can make decisions about instructions it can use. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230919162757.2707023-2-joey.gouly@arm.com [catalin.marinas@arm.com: change the HWCAP number] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-10arm: Remove now superfluous sentinel elem from ctl_table arraysJoel Granados
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Removed the sentinel as well as the explicit size from ctl_isa_vars. The size is redundant as the initialization sets it. Changed insn_emulation->sysctl from a 2 element array of struct ctl_table to a simple struct. This has no consequence for the sysctl registration as it is forwarded as a pointer. Removed sentinel from sve_defatul_vl_table, sme_default_vl_table, tagged_addr_sysctl_table and armv8_pmu_sysctl_table. This removal is safe because register_sysctl_sz and register_sysctl use the array size in addition to checking for the sentinel. Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-10-09KVM: arm64: Add handler for MOPS exceptionsKristina Martsenko
An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception if executed on a CPU with a different MOPS implementation option (A or B) than the CPU where the preceding prologue instruction ran. In this case the OS exception handler is expected to reset the registers and restart execution from the prologue instruction. A KVM guest may use the instructions at EL1 at times when the guest is not able to handle the exception, expecting that the instructions will only run on one CPU (e.g. when running UEFI boot services in the guest). As KVM may reschedule the guest between different types of CPUs at any time (on an asymmetric system), it needs to also handle the resulting exception itself in case the guest is not able to. A similar situation will also occur in the future when live migrating a guest from one type of CPU to another. Add handling for the MOPS exception to KVM. The handling can be shared with the EL0 exception handler, as the logic and register layouts are the same. The exception can be handled right after exiting a guest, which avoids the cost of returning to the host exit handler. Similarly to the EL0 exception handler, in case the main or epilogue instruction is being single stepped, it makes sense to finish the step before executing the prologue instruction, so advance the single step state machine. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922112508.1774352-2-kristina.martsenko@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-06arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeupDouglas Anderson
In commit 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") we started using a scheduler IPI to avoid a dedicated reschedule. When we did this, we used arch_smp_send_reschedule() directly rather than calling smp_send_reschedule(). The only difference is that calling arch_smp_send_reschedule() directly avoids tracing. Presumably we _don't_ want to avoid tracing here, so switch to smp_send_reschedule(). Fixes: 2b2d0a7a96ab ("arm64: smp: Remove dedicated wakeup IPI") Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-06arm64: smp: avoid NMI IPIs with broken MediaTek FWMark Rutland
Some MediaTek devices have broken firmware which corrupts some GICR registers behind the back of the OS, and pseudo-NMIs cannot be used on these devices. For more details see commit: 44bd78dd2b8897f5 ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware issues") We did not take this problem into account in commit: 331a1b3a836c0f38 ("arm64: smp: Add arch support for backtrace using pseudo-NMI") Since that commit arm64's SMP code will try to setup some IPIs as pseudo-NMIs, even on systems with broken FW. The GICv3 code will (rightly) reject attempts to request interrupts as pseudo-NMIs, resulting in boot-time failures. Avoid the problem by taking the broken FW into account when deciding to request IPIs as pseudo-NMIs. The GICv3 driver maintains a static_key named "supports_pseudo_nmis" which is false on systems with broken FW, and we can consult this within ipi_should_be_nmi(). Fixes: 331a1b3a836c ("arm64: smp: Add arch support for backtrace using pseudo-NMI") Reported-by: Chen-Yu Tsai <wenst@chromium.org> Closes: https://issuetracker.google.com/issues/197061987#comment68 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Tested-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-04rcu: Standardize explicit CPU-hotplug callsFrederic Weisbecker
rcu_report_dead() and rcutree_migrate_callbacks() have their headers in rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug functions. Also rcu_cpu_starting() and rcu_report_dead() have different naming conventions while they mirror each other's effects. Fix the headers and propose a naming that relates both functions and aligns with the prefix of other rcutree CPU-hotplug functions. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-10-04rcu: Assume rcu_report_dead() is always called locallyFrederic Weisbecker
rcu_report_dead() has to be called locally by the CPU that is going to exit the RCU state machine. Passing a cpu argument here is error-prone and leaves the possibility for a racy remote call. Use local access instead. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2023-09-29arm64: errata: Add Cortex-A520 speculative unprivileged load workaroundRob Herring
Implement the workaround for ARM Cortex-A520 erratum 2966298. On an affected Cortex-A520 core, a speculatively executed unprivileged load might leak data from a privileged load via a cache side channel. The issue only exists for loads within a translation regime with the same translation (e.g. same ASID and VMID). Therefore, the issue only affects the return to EL0. The workaround is to execute a TLBI before returning to EL0 after all loads of privileged data. A non-shareable TLBI to any address is sufficient. The workaround isn't necessary if page table isolation (KPTI) is enabled, but for simplicity it will be. Page table isolation should normally be disabled for Cortex-A520 as it supports the CSV3 feature and the E0PD feature (used when KASLR is enabled). Cc: stable@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230921194156.1050055-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-09-29arm64/sve: Report FEAT_SVE_B16B16 to userspaceMark Brown
SVE 2.1 introduced a new feature FEAT_SVE_B16B16 which adds instructions supporting the BFloat16 floating point format. Report this to userspace through the ID registers and hwcap. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230915-arm64-zfr-b16b16-el0-v1-1-f9aba807bdb5@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64: smp: Mark IPI globals as __ro_after_initDouglas Anderson
Mark the three IPI-related globals in smp.c as "__ro_after_init" since they are only ever set in set_smp_ipi_range(), which is marked "__init". This is a better and more secure marking than the old "__read_mostly". Suggested-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.7.I625d393afd71e1766ef73d3bfaac0b347a4afd19@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundupDouglas Anderson
Up until now we've been using the generic (weak) implementation for kgdb_roundup_cpus() when using kgdb on arm64. Let's move to a custom one. The advantage here is that, when pseudo-NMI is enabled on a device, we'll be able to round up CPUs using pseudo-NMI. This allows us to debug CPUs that are stuck with interrupts disabled. If pseudo-NMIs are not enabled then we'll fallback to just using an IPI, which is still slightly better than the generic implementation since it avoids the potential situation described in the generic kgdb_call_nmi_hook(). Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.6.I2ef26d1b3bfbed2d10a281942b0da7d9854de05e@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMIDouglas Anderson
There's no reason why IPI_CPU_STOP and IPI_CPU_CRASH_STOP can't be handled as NMI. They are very simple and everything in them is NMI-safe. Mark them as things to use NMI for if NMI is available. Suggested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Misono Tomohiro <misono.tomohiro@fujitsu.com> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.5.Ifadbfd45b22c52edcb499034dd4783d096343260@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64: smp: Add arch support for backtrace using pseudo-NMIDouglas Anderson
Enable arch_trigger_cpumask_backtrace() support on arm64. This enables things much like they are enabled on arm32 (including some of the funky logic around NR_IPI, nr_ipi, and MAX_IPI) but with the difference that, unlike arm32, we'll try to enable the backtrace to use pseudo-NMI. NOTE: this patch is a squash of the little bit of code adding the ability to mark an IPI to try to use pseudo-NMI plus the little bit of code to hook things up for kgdb. This approach was decided upon in the discussion of v9 [1]. This patch depends on commit 8d539b84f1e3 ("nmi_backtrace: allow excluding an arbitrary CPU") since that commit changed the prototype of arch_trigger_cpumask_backtrace(), which this patch implements. [1] https://lore.kernel.org/r/ZORY51mF4alI41G1@FVFF77S0Q05N Co-developed-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Misono Tomohiro <misono.tomohiro@fujitsu.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.4.Ie6c132b96ebbbcddbf6954b9469ed40a6960343c@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64: smp: Remove dedicated wakeup IPIMark Rutland
To enable NMI backtrace and KGDB's NMI cpu roundup, we need to free up at least one dedicated IPI. On arm64 the IPI_WAKEUP IPI is only used for the ACPI parking protocol, which itself is only used on some very early ARMv8 systems which couldn't implement PSCI. Remove the IPI_WAKEUP IPI, and rely on the IPI_RESCHEDULE IPI to wake CPUs from the parked state. This will cause a tiny amonut of redundant work to check the thread flags, but this is miniscule in relation to the cost of taking and handling the IPI in the first place. We can safely handle redundant IPI_RESCHEDULE IPIs, so there should be no functional impact as a result of this change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230906090246.v13.3.I7209db47ef8ec151d3de61f59005bbc59fe8f113@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64: idle: Tag the arm64 idle functions as __cpuidleDouglas Anderson
As per the (somewhat recent) comment before the definition of `__cpuidle`, the tag is like `noinstr` but also marks a function so it can be identified by cpu_in_idle(). Let's add these markings to arm64 cpuidle functions With this change we get useful backtraces like: NMI backtrace for cpu N skipped: idling at cpu_do_idle+0x94/0x98 instead of useless backtraces when dumping all processors using nmi_cpu_backtrace(). NOTE: this patch won't make cpu_in_idle() work perfectly for arm64, but it doesn't hurt and does catch some cases. Specifically an example that wasn't caught in my testing looked like this: gic_cpu_sys_reg_init+0x1f8/0x314 gic_cpu_pm_notifier+0x40/0x78 raw_notifier_call_chain+0x5c/0x134 cpu_pm_notify+0x38/0x64 cpu_pm_exit+0x20/0x2c psci_enter_idle_state+0x48/0x70 cpuidle_enter_state+0xb8/0x260 cpuidle_enter+0x44/0x5c do_idle+0x188/0x30c Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20230906090246.v13.2.I4baba13e220bdd24d11400c67f137c35f07f82c7@changeid Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64/sve: Remove SMCR pseudo register from cpufeature codeMark Brown
For reasons that are not currently apparent during cpufeature enumeration we maintain a pseudo register for SMCR which records the maximum supported vector length using the value that would be written to SMCR_EL1.LEN to configure it. This is not exposed to userspace and is not sufficient for detecting unsupportable configurations, we need the more detailed checks in vec_update_vq_map() for that since we can't cope with missing vector lengths on late CPUs and KVM requires an exactly matching set of supported vector lengths as EL1 can enumerate VLs directly with the hardware. Remove the code, replacing the usage in sme_setup() with a query of the vq_map. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-2-cc69b0600a8a@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-25arm64/sve: Remove ZCR pseudo register from cpufeature codeMark Brown
For reasons that are not currently apparent during cpufeature enumeration we maintain a pseudo register for ZCR which records the maximum supported vector length using the value that would be written to ZCR_EL1.LEN to configure it. This is not exposed to userspace and is not sufficient for detecting unsupportable configurations, we need the more detailed checks in vec_update_vq_map() for that since we can't cope with missing vector lengths on late CPUs and KVM requires an exactly matching set of supported vector lengths as EL1 can enumerate VLs directly with the hardware. Remove the code, replacing the usage in sve_setup() with a query of the vq_map. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230913-arm64-vec-len-cpufeature-v1-1-cc69b0600a8a@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-09-18arm64: cpufeature: Fix CLRBHB and BC detectionKristina Martsenko
ClearBHB support is indicated by the CLRBHB field in ID_AA64ISAR2_EL1. Following some refactoring the kernel incorrectly checks the BC field instead. Fix the detection to use the right field. (Note: The original ClearBHB support had it as FTR_HIGHER_SAFE, but this patch uses FTR_LOWER_SAFE, which seems more correct.) Also fix the detection of BC (hinted conditional branches) to use FTR_LOWER_SAFE, so that it is not reported on mismatched systems. Fixes: 356137e68a9f ("arm64/sysreg: Make BHB clear feature defines match the architecture") Fixes: 8fcc8285c0e3 ("arm64/sysreg: Convert ID_AA64ISAR2_EL1 to automatic generation") Cc: stable@vger.kernel.org Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230912133429.2606875-1-kristina.martsenko@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-09-08Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The main one is a fix for a broken strscpy() conversion that landed in the merge window and broke early parsing of the kernel command line. - Fix an incorrect mask in the CXL PMU driver - Fix a regression in early parsing of the kernel command line - Fix an IP checksum OoB access reported by syzbot" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: csum: Fix OoB access in IP checksum code for negative lengths arm64/sysreg: Fix broken strncpy() -> strscpy() conversion perf: CXL: fix mismatched number of counters mask
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-09-06arm64/sysreg: Fix broken strncpy() -> strscpy() conversionWill Deacon
Mostafa reports that commit d232606773a0 ("arm64/sysreg: refactor deprecated strncpy") breaks our early command-line parsing because the original code is working on space-delimited substrings rather than NUL-terminated strings. Rather than simply reverting the broken conversion patch, replace the strscpy() with a simple memcpy() with an explicit NUL-termination of the result. Reported-by: Mostafa Saleh <smostafa@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Fixes: d232606773a0 ("arm64/sysreg: refactor deprecated strncpy") Signed-off-by: Justin Stitt <justinstitt@google.com> Link: https://lore.kernel.org/r/20230905-strncpy-arch-arm64-v4-1-bc4b14ddfaef@google.com Link: https://lore.kernel.org/r/20230831162227.2307863-1-smostafa@google.com Signed-off-by: Will Deacon <will@kernel.org>
2023-08-31Merge tag 'x86_shstk_for_6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ...
2023-08-31Merge tag 'kvmarm-6.6' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.6 - Add support for TLB range invalidation of Stage-2 page tables, avoiding unnecessary invalidations. Systems that do not implement range invalidation still rely on a full invalidation when dealing with large ranges. - Add infrastructure for forwarding traps taken from a L2 guest to the L1 guest, with L0 acting as the dispatcher, another baby step towards the full nested support. - Simplify the way we deal with the (long deprecated) 'CPU target', resulting in a much needed cleanup. - Fix another set of PMU bugs, both on the guest and host sides, as we seem to never have any shortage of those... - Relax the alignment requirements of EL2 VA allocations for non-stack allocations, as we were otherwise wasting a lot of that precious VA space. - The usual set of non-functional cleanups, although I note the lack of spelling fixes...
2023-08-30Merge tag 'drm-next-2023-08-30' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "The drm core grew a new generic gpu virtual address manager, and new execution locking helpers. These are used by nouveau now to provide uAPI support for the userspace Vulkan driver. AMD had a bunch of new IP core support, loads of refactoring around fbdev, but mostly just the usual amount of stuff across the board. core: - fix gfp flags in drmm_kmalloc gpuva: - add new generic GPU VA manager (for nouveau initially) syncobj: - add new DRM_IOCTL_SYNCOBJ_EVENTFD ioctl dma-buf: - acquire resv lock for mmap() in exporters - support dma-buf self import automatically - docs fixes backlight: - fix fbdev interactions atomic: - improve logging prime: - remove struct gem_prim_mmap plus driver updates gem: - drm_exec: add locking over multiple GEM objects - fix lockdep checking fbdev: - make fbdev userspace interfaces optional - use linux device instead of fbdev device - use deferred i/o helper macros in various drivers - Make FB core selectable without drivers - Remove obsolete flags FBINFO_DEFAULT and FBINFO_FLAG_DEFAULT - Add helper macros and Kconfig tokens for DMA-allocated framebuffer ttm: - support init_on_free - swapout fixes panel: - panel-edp: Support AUO B116XAB01.4 - Support Visionox R66451 plus DT bindings - ld9040: - Backlight support - magic improved - Kconfig fix - Convert to of_device_get_match_data() - Fix Kconfig dependencies - simple: - Set bpc value to fix warning - Set connector type for AUO T215HVN01 - Support Innolux G156HCE-L01 plus DT bindings - ili9881: Support TDO TL050HDV35 LCD panel plus DT bindings - startek: Support KD070FHFID015 MIPI-DSI panel plus DT bindings - sitronix-st7789v: - Support Inanbo T28CP45TN89 plus DT bindings - Support EDT ET028013DMA plus DT bindings - Various cleanups - edp: Add timings for N140HCA-EAC - Allow panels and touchscreens to power sequence together - Fix Innolux G156HCE-L01 LVDS clock bridge: - debugfs for chains support - dw-hdmi: - Improve support for YUV420 bus format - CEC suspend/resume - update EDID on HDMI detect - dw-mipi-dsi: Fix enable/disable of DSI controller - lt9611uxc: Use MODULE_FIRMWARE() - ps8640: Remove broken EDID code - samsung-dsim: Fix command transfer - tc358764: - Handle HS/VS polarity - Use BIT() macro - Various cleanups - adv7511: Fix low refresh rate - anx7625: - Switch to macros instead of hardcoded values - locking fixes - tc358767: fix hardware delays - sitronix-st7789v: - Support panel orientation - Support rotation property - Add support for Jasonic JT240MHQS-HWT-EK-E3 plus DT bindings amdgpu: - SDMA 6.1.0 support - HDP 6.1 support - SMUIO 14.0 support - PSP 14.0 support - IH 6.1 support - Lots of checkpatch cleanups - GFX 9.4.3 updates - Add USB PD and IFWI flashing documentation - GPUVM updates - RAS fixes - DRR fixes - FAMS fixes - Virtual display fixes - Soft IH fixes - SMU13 fixes - Rework PSP firmware loading for other IPs - Kernel doc fixes - DCN 3.0.1 fixes - LTTPR fixes - DP MST fixes - DCN 3.1.6 fixes - SMU 13.x fixes - PSP 13.x fixes - SubVP fixes - GC 9.4.3 fixes - Display bandwidth calculation fixes - VCN4 secure submission fixes - Allow building DC on RISC-V - Add visible FB info to bo_print_info - HBR3 fixes - GFX9 MCBP fix - GMC10 vmhub index fix - GMC11 vmhub index fix - Create a new doorbell manager - SR-IOV fixes - initial freesync panel replay support - revert zpos properly until igt regression is fixeed - use TTM to manage doorbell BAR - Expose both current and average power via hwmon if supported amdkfd: - Cleanup CRIU dma-buf handling - Use KIQ to unmap HIQ - GFX 9.4.3 debugger updates - GFX 9.4.2 debugger fixes - Enable cooperative groups fof gfx11 - SVM fixes - Convert older APUs to use dGPU path like newer APUs - Drop IOMMUv2 path as it is no longer used - TBA fix for aldebaran i915: - ICL+ DSI modeset sequence - HDCP improvements - MTL display fixes and cleanups - HSW/BDW PSR1 restored - Init DDI ports in VBT order - General display refactors - Start using plane scale factor for relative data rate - Use shmem for dpt objects - Expose RPS thresholds in sysfs - Apply GuC SLPC min frequency softlimit correctly - Extend Wa_14015795083 to TGL, RKL, DG1 and ADL - Fix a VMA UAF for multi-gt platform - Do not use stolen on MTL due to HW bug - Check HuC and GuC version compatibility on MTL - avoid infinite GPU waits due to premature release of request memory - Fixes and updates for GSC memory allocation - Display SDVO fixes - Take stolen handling out of FBC code - Make i915_coherent_map_type GT-centric - Simplify shmem_create_from_object map_type msm: - SM6125 MDSS support - DPU: SM6125 DPU support - DSI: runtime PM support, burst mode support - DSI PHY: SM6125 support in 14nm DSI PHY driver - GPU: prepare for a7xx - fix a690 firmware - disable relocs on a6xx and newer radeon: - Lots of checkpatch cleanups ast: - improve device-model detection - Represent BMV as virtual connector - Report DP connection status nouveau: - add new exec/bind interface to support Vulkan - document some getparam ioctls - improve VRAM detection - various fixes/cleanups - workraound DPCD issues ivpu: - MMU updates - debugfs support - Support vpu4 virtio: - add sync object support atmel-hlcdc: - Support inverted pixclock polarity etnaviv: - runtime PM cleanups - hang handling fixes exynos: - use fbdev DMA helpers - fix possible NULL ptr dereference komeda: - always attach encoder omapdrm: - use fbdev DMA helpers ingenic: - kconfig regmap fixes loongson: - support display controller mediatek: - Small mtk-dpi cleanups - DisplayPort: support eDP and aux-bus - Fix coverity issues - Fix potential memory leak if vmap() fail mgag200: - minor fixes mxsfb: - support disabling overlay planes panfrost: - fix sync in IRQ handling ssd130x: - Support per-controller default resolution plus DT bindings - Reduce memory-allocation overhead - Improve intermediate buffer size computation - Fix allocation of temporary buffers - Fix pitch computation - Fix shadow plane allocation tegra: - use fbdev DMA helpers - Convert to devm_platform_ioremap_resource() - support bridge/connector - enable PM tidss: - Support TI AM625 plus DT bindings - Implement new connector model plus driver updates vkms: - improve write back support - docs fixes - support gamma LUT zynqmp-dpsub: - misc fixes" * tag 'drm-next-2023-08-30' of git://anongit.freedesktop.org/drm/drm: (1327 commits) drm/gpuva_mgr: remove unused prev pointer in __drm_gpuva_sm_map() drm/tests/drm_kunit_helpers: Place correct function name in the comment header drm/nouveau: uapi: don't pass NO_PREFETCH flag implicitly drm/nouveau: uvmm: fix unset region pointer on remap drm/nouveau: sched: avoid job races between entities drm/i915: Fix HPD polling, reenabling the output poll work as needed drm: Add an HPD poll helper to reschedule the poll work drm/i915: Fix TLB-Invalidation seqno store drm/ttm/tests: Fix type conversion in ttm_pool_test drm/msm/a6xx: Bail out early if setting GPU OOB fails drm/msm/a6xx: Move LLC accessors to the common header drm/msm/a6xx: Introduce a6xx_llc_read drm/ttm/tests: Require MMU when testing drm/panel: simple: Fix Innolux G156HCE-L01 LVDS clock Revert "Revert "drm/amdgpu/display: change pipe policy for DCN 2.0"" drm/amdgpu: Add memory vendor information drm/amd: flush any delayed gfxoff on suspend entry drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix drm/amdgpu: Remove gfxoff check in GFX v9.4.3 drm/amd/pm: Update pci link speed for smu v13.0.6 ...
2023-08-29Merge tag 'sysctl-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "Long ago we set out to remove the kitchen sink on kernel/sysctl.c arrays and placings sysctls to their own sybsystem or file to help avoid merge conflicts. Matthew Wilcox pointed out though that if we're going to do that we might as well also *save* space while at it and try to remove the extra last sysctl entry added at the end of each array, a sentintel, instead of bloating the kernel by adding a new sentinel with each array moved. Doing that was not so trivial, and has required slowing down the moves of kernel/sysctl.c arrays and measuring the impact on size by each new move. The complex part of the effort to help reduce the size of each sysctl is being done by the patient work of el señor Don Joel Granados. A lot of this is truly painful code refactoring and testing and then trying to measure the savings of each move and removing the sentinels. Although Joel already has code which does most of this work, experience with sysctl moves in the past shows is we need to be careful due to the slew of odd build failures that are possible due to the amount of random Kconfig options sysctls use. To that end Joel's work is split by first addressing the major housekeeping needed to remove the sentinels, which is part of this merge request. The rest of the work to actually remove the sentinels will be done later in future kernel releases. The preliminary math is showing this will all help reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array where we are able to remove each sentinel in the future. That also means there is no more bloating the kernel with the extra ~64 bytes per array moved as no new sentinels are created" * tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: Use ctl_table_size as stopping criteria for list macro sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl vrf: Update to register_net_sysctl_sz networking: Update to register_net_sysctl_sz netfilter: Update to register_net_sysctl_sz ax.25: Update to register_net_sysctl_sz sysctl: Add size to register_net_sysctl function sysctl: Add size arg to __register_sysctl_init sysctl: Add size to register_sysctl sysctl: Add a size arg to __register_sysctl_table sysctl: Add size argument to init_header sysctl: Add ctl_table_size to ctl_table_header sysctl: Use ctl_table_header in list_for_each_table_entry sysctl: Prefer ctl_table_header in proc_sysctl
2023-08-29Merge tag 'modules-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules updates from Luis Chamberlain: "Summary of the changes worth highlighting from most interesting to boring below: - Christoph Hellwig's symbol_get() fix to Nvidia's efforts to circumvent the protection he put in place in year 2020 to prevent proprietary modules from using GPL only symbols, and also ensuring proprietary modules which export symbols grandfather their taint. That was done through year 2020 commit 262e6ae7081d ("modules: inherit TAINT_PROPRIETARY_MODULE"). Christoph's new fix is done by clarifing __symbol_get() was only ever intended to prevent module reference loops by Linux kernel modules and so making it only find symbols exported via EXPORT_SYMBOL_GPL(). The circumvention tactic used by Nvidia was to use symbol_get() to purposely swift through proprietary module symbols and completely bypass our traditional EXPORT_SYMBOL*() annotations and community agreed upon restrictions. A small set of preamble patches fix up a few symbols which just needed adjusting for this on two modules, the rtc ds1685 and the networking enetc module. Two other modules just needed some build fixing and removal of use of __symbol_get() as they can't ever be modular, as was done by Arnd on the ARM pxa module and Christoph did on the mmc au1xmmc driver. This is a good reminder to us that symbol_get() is just a hack to address things which should be fixed through Kconfig at build time as was done in the later patches, and so ultimately it should just go. - Extremely late minor fix for old module layout 055f23b74b20 ("module: check for exit sections in layout_sections() instead of module_init_section()") by James Morse for arm64. Note that this layout thing is old, it is *not* Song Liu's commit ac3b43283923 ("module: replace module_layout with module_memory"). The issue however is very odd to run into and so there was no hurry to get this in fast. - Although the fix did not go through the modules tree I'd like to highlight the fix by Peter Zijlstra in commit 54097309620e ("x86/static_call: Fix __static_call_fixup()") now merged in your tree which came out of what was originally suspected to be a fallout of the the newer module layout changes by Song Liu commit ac3b43283923 ("module: replace module_layout with module_memory") instead of module_init_section()"). Thanks to the report by Christian Bricart and the debugging by Song Liu & Peter that turned to be noted as a kernel regression in place since v5.19 through commit ee88d363d156 ("x86,static_call: Use alternative RET encoding"). I highlight this to reflect and clarify that we haven't seen more fallout from ac3b43283923 ("module: replace module_layout with module_memory"). - RISC-V toolchain got mapping symbol support which prefix symbols with "$" to help with alignment considerations for disassembly. This is used to differentiate between incompatible instruction encodings when disassembling. RISC-V just matches what ARM/AARCH64 did for alignment considerations and Palmer Dabbelt extended is_mapping_symbol() to accept these symbols for RISC-V. We already had support for this for all architectures but it also checked for the second character, the RISC-V check Dabbelt added was just for the "$". After a bit of testing and fallout on linux-next and based on feedback from Masahiro Yamada it was decided to simplify the check and treat the first char "$" as unique for all architectures, and so we no make is_mapping_symbol() for all archs if the symbol starts with "$". The most relevant commit for this for RISC-V on binutils was: https://sourceware.org/pipermail/binutils/2021-July/117350.html - A late fix by Andrea Righi (today) to make module zstd decompression use vmalloc() instead of kmalloc() to account for large compressed modules. I suspect we'll see similar things for other decompression algorithms soon. - samples/hw_breakpoint minor fixes by Rong Tao, Arnd Bergmann and Chen Jiahao" * tag 'modules-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module/decompress: use vmalloc() for zstd decompression workspace kallsyms: Add more debug output for selftest ARM: module: Use module_init_layout_section() to spot init sections arm64: module: Use module_init_layout_section() to spot init sections module: Expose module_init_layout_section() modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index mmc: au1xmmc: force non-modular build and remove symbol_get usage ARM: pxa: remove use of symbol_get() samples/hw_breakpoint: mark sample_hbp as static samples/hw_breakpoint: fix building without module unloading samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000' modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols kernel: params: Remove unnecessary ‘0’ values from err module: Ignore RISC-V mapping symbols too