Age | Commit message (Collapse) | Author |
|
* kvm-arm64/nv-vgic:
: NV VGICv3 support, courtesy of Marc Zyngier
:
: Support for emulating the GIC hypervisor controls and managing shadow
: VGICv3 state for the L1 hypervisor. As part of it, bring in support for
: taking IRQs to the L1 and UAPI to manage the VGIC maintenance interrupt.
KVM: arm64: nv: Fail KVM init if asking for NV without GICv3
KVM: arm64: nv: Allow userland to set VGIC maintenance IRQ
KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
KVM: arm64: nv: Propagate used_lrs between L1 and L0 contexts
KVM: arm64: nv: Request vPE doorbell upon nested ERET to L2
KVM: arm64: nv: Respect virtual HCR_EL2.TWx setting
KVM: arm64: nv: Add Maintenance Interrupt emulation
KVM: arm64: nv: Handle L2->L1 transition on interrupt injection
KVM: arm64: nv: Nested GICv3 emulation
KVM: arm64: nv: Sanitise ICH_HCR_EL2 accesses
KVM: arm64: nv: Plumb handling of GICv3 EL2 accesses
KVM: arm64: nv: Add ICH_*_EL2 registers to vpcu_sysreg
KVM: arm64: nv: Load timer before the GIC
arm64: sysreg: Add layout for ICH_MISR_EL2
arm64: sysreg: Add layout for ICH_VTR_EL2
arm64: sysreg: Add layout for ICH_HCR_EL2
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Currently, for rqspinlock usage, the implementation of
smp_cond_load_acquire (and thus, atomic_cond_read_acquire) are
susceptible to stalls on arm64, because they do not guarantee that the
conditional expression will be repeatedly invoked if the address being
loaded from is not written to by other CPUs. When support for
event-streams is absent (which unblocks stuck WFE-based loops every
~100us), we may end up being stuck forever.
This causes a problem for us, as we need to repeatedly invoke the
RES_CHECK_TIMEOUT in the spin loop to break out when the timeout
expires.
Let us import the smp_cond_load_acquire_timewait implementation Ankur is
proposing in [0], and then fallback to it once it is merged.
While we rely on the implementation to amortize the cost of sampling
check_timeout for us, it will not happen when event stream support is
unavailable. This is not the common case, and it would be difficult to
fit our logic in the time_expr_ns >= time_limit_ns comparison, hence
just let it be.
[0]: https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250316040541.108729-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Platforms subscribe into generic ptdump implementation via GENERIC_PTDUMP.
But generic ptdump gets enabled via PTDUMP_CORE. These configs
combination is confusing as they sound very similar and does not
differentiate between platform's feature subscription and feature
enablement for ptdump. Rename the configs as ARCH_HAS_PTDUMP and PTDUMP
making it more clear and improve readability.
Link: https://lkml.kernel.org/r/20250226122404.1927473-6-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ioremap_prot() currently accepts pgprot_val parameter as an unsigned long,
thus implicitly assuming that pgprot_val and pgprot_t could never be
bigger than unsigned long. But this assumption soon will not be true on
arm64 when using D128 pgtables. In 128 bit page table configuration,
unsigned long is 64 bit, but pgprot_t is 128 bit.
Passing platform abstracted pgprot_t argument is better as compared to
size based data types. Let's change the parameter to directly pass
pgprot_t like another similar helper generic_ioremap_prot().
Without this change in place, D128 configuration does not work on arm64 as
the top 64 bits gets silently stripped when passing the protection value
to this function.
Link: https://lkml.kernel.org/r/20250218101954.415331-1-anshuman.khandual@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This patch lays the groundwork for supporting batch PTE unmapping in
try_to_unmap_one(). It introduces range handling for TLB batch flushing,
with the range currently set to the size of PAGE_SIZE.
The function __flush_tlb_range_nosync() is architecture-specific and is
only used within arch/arm64. This function requires the mm structure
instead of the vma structure. To allow its reuse by
arch_tlbbatch_add_pending(), which operates with mm but not vma, this
patch modifies the argument of __flush_tlb_range_nosync() to take mm as
its parameter.
Link: https://lkml.kernel.org/r/20250214093015.51024-3-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chis Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add load-acquire ("load_acq", LDAR{,B,H}) and store-release
("store_rel", STLR{,B,H}) instructions. Breakdown of encoding:
size L (Rs) o0 (Rt2) Rn Rt
mask (0x3fdffc00): 00 111111 1 1 0 11111 1 11111 00000 00000
value, load_acq (0x08dffc00): 00 001000 1 1 0 11111 1 11111 00000 00000
value, store_rel (0x089ffc00): 00 001000 1 0 0 11111 1 11111 00000 00000
As suggested by Xu [1], include all Should-Be-One (SBO) bits ("Rs" and
"Rt2" fields) in the "mask" and "value" numbers.
It is worth noting that we are adding the "no offset" variant of STLR
instead of the "pre-index" variant, which has a different encoding.
Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
ID032224),
* C6.2.161 LDAR
* C6.2.353 STLR
[1] https://lore.kernel.org/bpf/4e6641ce-3f1e-4251-8daf-4dd4b77d08c4@huaweicloud.com/
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/ba92057b7502ce4c9c9b03b7d637abe5e178134e.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We are planning to add load-acquire (LDAR{,B,H}) and store-release
(STLR{,B,H}) instructions to insn.{c,h}; add BIT(23) to mask of load_ex
and store_ex to prevent aarch64_insn_is_{load,store}_ex() from returning
false-positives for load-acquire and store-release instructions.
Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
ID032224),
* C6.2.228 LDXR
* C6.2.165 LDAXR
* C6.2.161 LDAR
* C6.2.393 STXR
* C6.2.360 STLXR
* C6.2.353 STLR
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/5a4d2a52b2cc022bf86d0b572789f0b3bc3d5162.1741049567.git.yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Instead of creating and initializing _all_ hyp vcpus in pKVM when
the first host vcpu runs for the first time, initialize _each_
hyp vcpu in conjunction with its corresponding host vcpu.
Some of the host vcpu state (e.g., system registers and traps
values) is not initialized until the first time the host vcpu is
run. Therefore, initializing a hyp vcpu before its corresponding
host vcpu has run for the first time might not view the complete
host state of these vcpus.
Additionally, this behavior is inline with non-protected modes.
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250314111832.4137161-5-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Factor out the code for setting a vcpu's HCRX_EL2 traps in to a
separate inline function. This allows us to share the logic with
pKVM when setting the traps in protected mode.
No functional change intended.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250314111832.4137161-2-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main one is a horrible macro fix for our TLB flushing code which
resulted in over-invalidation on the MMU notifier path.
Summary:
- Fix population of the vmemmap for regions of memory that are
smaller than a section (128 MiB)
- Fix range-based TLB over-invalidation when invoked via a MMU
notifier"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Fix mmu notifiers for range-based invalidates
arm64: mm: Populate vmemmap at the page level if not section aligned
|
|
>From the TRM, MIDR_CORTEX_A76AE has a partnum of 0xDOE and an
implementor of 0x41 (ARM). Add the values.
Cc: stable@vger.kernel.org # dependency of the next fix in the series
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250107120555.v4.4.I151f3b7ee323bcc3082179b8c60c3cd03308aa94@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The code for detecting CPUs that are vulnerable to Spectre BHB was
based on a hardcoded list of CPU IDs that were known to be affected.
Unfortunately, the list mostly only contained the IDs of standard ARM
cores. The IDs for many cores that are minor variants of the standard
ARM cores (like many Qualcomm Kyro CPUs) weren't listed. This led the
code to assume that those variants were not affected.
Flip the code on its head and instead assume that a core is vulnerable
if it doesn't have CSV2_3 but is unrecognized as being safe. This
involves creating a "Spectre BHB safe" list.
As of right now, the only CPU IDs added to the "Spectre BHB safe" list
are ARM Cortex A35, A53, A55, A510, and A520. This list was created by
looking for cores that weren't listed in ARM's list [1] as per review
feedback on v2 of this patch [2]. Additionally Brahma A53 is added as
per mailing list feedback [3].
NOTE: this patch will not actually _mitigate_ anyone, it will simply
cause them to report themselves as vulnerable. If any cores in the
system are reported as vulnerable but not mitigated then the whole
system will be reported as vulnerable though the system will attempt
to mitigate with the information it has about the known cores.
[1] https://developer.arm.com/Arm%20Security%20Center/Spectre-BHB
[2] https://lore.kernel.org/r/20241219175128.GA25477@willie-the-truck
[3] https://lore.kernel.org/r/18dbd7d1-a46c-4112-a425-320c99f67a8d@broadcom.com
Fixes: 558c303c9734 ("arm64: Mitigate spectre style branch history side channels")
Cc: stable@vger.kernel.org
Reviewed-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20250107120555.v4.2.I2040fa004dafe196243f67ebcc647cbedbb516e6@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Address bytes shifted with a single 64 bit page table entry (any page table
level) has been always hard coded as 3 (aka 2^3 = 8). Although intuitive it
is not very readable or easy to reason about. Besides it is going to change
with D128, where each 128 bit page table entry will shift address bytes by
4 (aka 2^4 = 16) instead.
Let's just formalise this address bytes shift value into a new macro called
PTDESC_ORDER establishing a logical abstraction, thus improving readability
as well. While here re-organize EARLY_LEVEL macro along with its dependents
for better clarity. This does not cause any functional change. Also replace
all (PAGE_SHIFT - PTDESC_ORDER) instances with PTDESC_TABLE_SHIFT.
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250311045710.550625-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Count the pages used by pKVM for the guest stage-2 in memory stats under
secondary pagetable, similarly to what the VHE mode does.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250313114038.1502357-4-vdonnefort@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
In order to account for memory dedicated to the stage-2 page-tables, use
a separated memcache when tearing down the VM. Meanwhile rename
reclaim_guest_pages to reflect the fact it only reclaim page-table
pages.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250313114038.1502357-3-vdonnefort@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add flags to kvm_hyp_memcache and propagate the latter to the allocation
and free callbacks. This will later allow to account for memory, based
on the memcache configuration.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250313114038.1502357-2-vdonnefort@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Drop all PXD_TABLE_BIT macros as they are not used any more.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250221044227.1145393-9-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Check for pmd_table() in pmd_trans_huge() rather then just checking for the
PMD_TABLE_BIT. But ensure all present-invalid entries are handled correctly
by always setting PTE_VALID before checking with pmd_table().
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250221044227.1145393-8-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
pud_bad() is currently defined in terms of pud_table(). Although for some
configs, pud_table() is hard-coded to true i.e. when using 64K base pages
or when page table levels are less than 3.
pud_bad() is intended to check that the pud is configured correctly. Hence
let's open-code the same check that the full version of pud_table() uses
into pud_bad(). Then it always performs the check regardless of the config.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250221044227.1145393-7-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Check page table entries against PXD_TYPE_TABLE on PXD_TYPE_MASK mask bits
in [p4d|pgd]_bad() while determining a table entry instead of just checking
only for PXD_TABLE_BIT.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250221044227.1145393-6-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Clear PXX_TYPE_MASK in [pmd|pud]_mkhuge() while creating section mappings
instead of just the PXX_TABLE_BIT and also set PXD_TYPE_SECT. Also ensure
PTE_VALID does not get modified in these helpers, because present-invalid
entries should preserve their state across.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250221044227.1145393-5-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Clear PXX_TYPE_MASK bits in mk_[pmd|pud]_sect_prot() while creating section
mappings instead of just clearing the PXX_TABLE_BIT.
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250221044227.1145393-4-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The supported guest PMU version on a particular platform is ultimately a
KVM decision. Move PMUVer filtering into KVM code.
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
KVM is about to learn some new tricks to virtualize PMUv3 on IMPDEF
hardware. As part of that, we now need to differentiate host support
from guest support for PMUv3.
Add a cpucap to determine if an architectural PMUv3 is present to guard
host usage of PMUv3 controls.
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The PMU appears to have a separate register for filtering 'guest'
exception levels (i.e. EL1 and !ELIsInHost(EL0)) which has the same
layout as PMCR1_EL1. Conveniently, there exists a VHE register alias
(PMCR1_EL12) that can be used to configure it.
Support guest events by programming the EL12 register with the intended
guest kernel/userspace filters. Limit support for guest events to VHE
(i.e. kernel running at EL2), as it avoids involving KVM to context
switch PMU registers. VHE is the only supported mode on M* parts anyway,
so this isn't an actual feature limitation.
Tested-by: Janne Grunau <j@jannau.net>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250305202641.428114-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Commit fbc7e61195e2 ("KVM: arm64: Unconditionally save+flush host
FPSIMD/SVE/SME state") removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250309070723.1390958-1-yuehaibing@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
FEAT_PMUv3p9 registers such as PMICNTR_EL0, PMICFILTR_EL0, and PMUACR_EL1
access from EL1 requires appropriate EL2 fine grained trap configuration
via FEAT_FGT2 based trap control registers HDFGRTR2_EL2 and HDFGWTR2_EL2.
Otherwise such register accesses will result in traps into EL2.
Add a new helper __init_el2_fgt2() which initializes FEAT_FGT2 based fine
grained trap control registers HDFGRTR2_EL2 and HDFGWTR2_EL2 (setting the
bits nPMICNTR_EL0, nPMICFILTR_EL0 and nPMUACR_EL1) to enable access into
PMICNTR_EL0, PMICFILTR_EL0, and PMUACR_EL1 registers.
Also update booting.rst with SCR_EL3.FGTEn2 requirement for all FEAT_FGT2
based registers to be accessible in EL2.
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Fixes: 0bbff9ed8165 ("perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control")
Fixes: d8226d8cfbaf ("perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter")
Tested-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250227035119.2025171-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When a device performs DMA to a shared buffer using physical addresses,
(without Stage1 translation), the device must use the "{I}PA address" with the
top bit set in Realm. This is to make sure that a trusted device will be able
to write to shared buffers as well as the protected buffers. Thus, a Realm must
always program the full address including the "protection" bit, like AMD SME
encryption bits.
Enable this by providing arm64 specific dma_addr_{encrypted, canonical}
helpers for Realms. Please note that the VMM needs to similarly make sure that
the SMMU Stage2 in the Non-secure world is setup accordingly to map IPA at the
unprotected alias.
Cc: Will Deacon <will@kernel.org>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 42be24a4178f ("arm64: Enable memory encrypt for Realms")
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250227144150.1667735-4-suzuki.poulose@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Update the __flush_tlb_range_op macro not to modify its parameters as
these are unexepcted semantics. In practice, this fixes the call to
mmu_notifier_arch_invalidate_secondary_tlbs() in
__flush_tlb_range_nosync() to use the correct range instead of an empty
range with start=end. The empty range was (un)lucky as it results in
taking the invalidate-all path that doesn't cause correctness issues,
but can certainly result in suboptimal perf.
This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call
invalidate_range() when invalidating TLBs") when the call to the
notifiers was added to __flush_tlb_range(). It predates the addition of
the __flush_tlb_range_op() macro from commit 360839027a6e ("arm64: tlb:
Refactor the core flush algorithm of __flush_tlb_range") that made the
bug hard to spot.
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux.dev
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/r/20250304085127.2238030-1-pjaroszynski@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As tested on one example of a Qualcomm Kryo 5XX CPU [1] and one
example of a Qualcomm Kryo 6XX CPU [2], we don't need any extra MIDR
definitions for the cores in those processors. Add comments to make it
clear that these IDs weren't forgotten and just aren't needed.
[1] https://lore.kernel.org/r/l5rqbbxn6hktlcxooolkvi5n3arkht6zzhrvdjf6kis322nsup@5hsrak4cgteq/
[2] https://lore.kernel.org/r/tx7vtur7yea6ruefrkpkccqptahgmxnsrudwdz5uzcfxnng25b@afrr5bmdk2xa/
Suggested-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Trilok Soni <quic_tsoni@quicinc.com>
Link: https://lore.kernel.org/r/20241219131107.v3.2.I520dfa10ad9f598581c2591d631aa6e9e26f7603@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add a definition for the Qualcomm Kryo 300-series Gold cores.
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Trilok Soni <quic_tsoni@quicinc.com>
Link: https://lore.kernel.org/r/20241219131107.v3.1.I18e0288742871393228249a768e5d56ea65d93dc@changeid
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The value of POR_EL0_INIT is not architectural, it is a software
decision. Since we have a dedicated header for POR_ELx, we might as
well define POR_EL0_INIT there.
While at it also define POR_EL0_INIT using POR_ELx_PERM_PREP(),
making it clearer that we are setting permissions for POIndex/pkey
0.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250219164029.2309119-4-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
It is customary to list R, W, X permissions in that order. In fact
this is already the case for PIE constants (PIE_RWX). Rename POE_RXW
accordingly, as well as POE_XW (currently unused).
While at it also swap the W/X lines in
compute_s1_overlay_permissions() to follow the R, W, X order.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250219164029.2309119-3-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
We currently have one helper to set a PIRx_ELx's permission field to
a given value, PIRx_ELx_PERM(), and another helper to extract a
permission field from POR_ELx, POR_ELx_IDX(). The naming is pretty
confusing - it isn't clear at all that "_PERM" corresponds to a
setter and "_IDX" to a getter.
This patch aims at improving the situation by using the same
suffixes as FIELD_PREP()/FIELD_GET(), which we have already adopted
for SYS_FIELD_{PREP,GET}():
* PIRx_ELx_PERM_PREP(), POR_ELx_PERM_PREP() create a register value
where the permission field for a given index is set to a given value.
* POR_ELx_PERM_GET() extracts the permission field from a given
register value for a given index.
These helpers are not implemented using FIELD_PREP()/FIELD_GET()
because the mask may not be constant, and they need to be usable in
assembly. They are all defined in asm/sysreg.h, as one would expect
for basic sysreg-related helpers.
Finally the new POR_ELx_PERM_* macros are used for existing
calculations in signal.c and mmu.c.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20250219164029.2309119-2-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Pull KVM fixes from Paolo Bonzini:
"arm64:
- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode
x86:
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow
- Ensure DEBUGCTL is context switched on AMD to avoid running the
guest with the host's value, which can lead to unexpected bus lock
#DBs
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
properly emulate BTF. KVM's lack of context switching has meant BTF
has always been broken to some extent
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
the guest can enable DebugSwap without KVM's knowledge
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
to RO memory" phase without actually generating a write-protection
fault
- Fix a printf() goof in the SEV smoke test that causes build
failures with -Werror
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
PERFMON_V2 isn't supported by KVM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
KVM: selftests: Fix printf() format goof in SEV smoke test
KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
KVM: SVM: Save host DR masks on CPUs with DebugSwap
KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
KVM: arm64: Initialize HCR_EL2.E2H early
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: SVM: Suppress DEBUGCTL.BTF on AMD
KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
KVM: selftests: Assert that STI blocking isn't set after event injection
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
|
|
To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be an array of VDSO clocks.
Now that all preparatory changes are in place:
Split the clock related struct members into a separate struct
vdso_clock. Make sure all users are aware, that vdso_time_data is no longer
initialized as an array and vdso_clock is now the array inside
vdso_data. Remove the vdso_clock define, which mapped it to vdso_time_data
for the transition.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250303-vdso-clock-v1-19-c1b5c69a166f@linutronix.de
|
|
To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be array of VDSO clocks. At the moment,
vdso_clock is simply a define which maps vdso_clock to vdso_time_data.
To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with a struct vdso_clock pointer where applicable.
No functional change.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250303-vdso-clock-v1-16-c1b5c69a166f@linutronix.de
|
|
asm/cache.h can be used during the vDSO build through vdso/cache.h.
Not all definitions in it are compatible with the vDSO, especially the
compat vDSO.
Hide the more complex definitions from the vDSO build.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250303-vdso-clock-v1-2-c1b5c69a166f@linutronix.de
|
|
Similarly to what was done with the memcpy() routines, make
copy_to_user(), copy_from_user() and clear_user() also use the Armv8.8
FEAT_MOPS instructions.
Both MOPS implementation options (A and B) are supported, including
asymmetric systems. The exception fixup code fixes up the registers
according to the option used.
In case of a fault the routines return precisely how much was not copied
(as required by the comment in include/linux/uaccess.h), as unprivileged
versions of CPY/SET are guaranteed not to have written past the
addresses reported in the GPRs.
The MOPS instructions could possibly be inlined into callers (and
patched to branch to the generic implementation if not detected;
similarly to what x86 does), but as a first step this patch just uses
them in the out-of-line routines.
Signed-off-by: Kristina Martšenko <kristina.martsenko@arm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20250228170006.390100-4-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
A subsequent patch will use CPY* instructions to copy between user and
kernel memory. Add handling for PAN faults caused by an intended kernel
memory access erroneously accessing user memory, in order to make it
easier to debug kernel bugs and to keep the same behavior as with
regular loads/stores.
Signed-off-by: Kristina Martšenko <kristina.martsenko@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20250228170006.390100-3-kristina.martsenko@arm.com
[catalin.marinas@arm.com: Folded the extable search into insn_may_access_user()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
A subsequent patch will use CPY* instructions to copy between user and
kernel memory. Add a new exception fixup type to avoid fixing up faults
on kernel memory accesses, in order to make it easier to debug kernel
bugs and to keep the same behavior as with regular loads/stores.
Signed-off-by: Kristina Martšenko <kristina.martsenko@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20250228170006.390100-2-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When CONFIG_ARM64_PA_BITS_52 is enabled, page table helpers __pte_to_phys()
and __phys_to_pte_val() are functions which return phys_addr_t and pteval_t
respectively as expected. But otherwise without this config being enabled,
they are defined as macros and their return types are implicit.
Until now this has worked out correctly as both pte_t and phys_addr_t data
types have been 64 bits. But with the introduction of 128 bit page tables,
pte_t becomes 128 bits. Hence this ends up with incorrect widths after the
conversions, which leads to compiler warnings.
Fix these warnings by converting __pte_to_phys() and __phys_to_pte_val()
as functions instead where the return types are handled explicitly.
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20250227022412.2015835-1-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
The VGIC maintenance IRQ signals various conditions about the LRs, when
the GIC's virtualization extension is used.
So far we didn't need it, but nested virtualization needs to know about
this interrupt, so add a userland interface to setup the IRQ number.
The architecture mandates that it must be a PPI, on top of that this code
only exports a per-device option, so the PPI is the same on all VCPUs.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[added some bits of documentation]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-16-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Running an L2 guest with GICv4 enabled goes absolutely nowhere, and gets
into a vicious cycle of nested ERET followed by nested exception entry
into the L1.
When KVM does a put on a runnable vCPU, it marks the vPE as nonresident
but does not request a doorbell IRQ. Behind the scenes in the ITS
driver's view of the vCPU, its_vpe::pending_last gets set to true to
indicate that context is still runnable.
This comes to a head when doing the nested ERET into L2. The vPE doesn't
get scheduled on the redistributor as it is exclusively part of the L1's
VGIC context. kvm_vgic_vcpu_pending_irq() returns true because the vPE
appears runnable, and KVM does a nested exception entry into the L1
before L2 ever gets off the ground.
This issue can be papered over by requesting a doorbell IRQ when
descheduling a vPE as part of a nested ERET. KVM needs this anyway to
kick the vCPU out of the L2 when an IRQ becomes pending for the L1.
Link: https://lore.kernel.org/r/20240823212703.3576061-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-13-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWx is set.
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-12-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
An interrupt being delivered to L1 while running L2 must result
in the correct exception being delivered to L1.
This means that if, on entry to L2, we found ourselves with pending
interrupts in the L1 distributor, we need to take immediate action.
This is done by posting a request which will prevent the entry in
L2, and deliver an IRQ exception to L1, forcing the switch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has set. Especially, we investigate
each list register written by the guest hypervisor whether HW bit is
set. If so, we translate hw irq number from the guest's point of view
to the real hardware irq number if there is a mapping.
Co-developed-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
[Christoffer: Redesigned execution flow around vcpu load/put]
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: Rewritten to support GICv3 instead of GICv2, NV2 support]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
FEAT_NV2 comes with a bunch of register-to-memory redirection
involving the ICH_*_EL2 registers (LRs, APRs, VMCR, HCR).
Adds them to the vcpu_sysreg enumeration.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The ICH_MISR_EL2-related macros are missing a number of status
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The ICH_VTR_EL2-related macros are missing a number of config
bits that we are about to handle. Take this opportunity to fully
describe the layout of that register as part of the automatic
generation infrastructure.
This results in a bit of churn to repaint constants that are now
generated with a different format.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250225172930.1850838-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|