Age | Commit message (Collapse) | Author |
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Initial infrastructure for shadow stage-2 MMUs, as part of nested
virtualization enablement
- Support for userspace changes to the guest CTR_EL0 value, enabling
(in part) migration of VMs between heterogenous hardware
- Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
of the protocol
- FPSIMD/SVE support for nested, including merged trap configuration
and exception routing
- New command-line parameter to control the WFx trap behavior under
KVM
- Introduce kCFI hardening in the EL2 hypervisor
- Fixes + cleanups for handling presence/absence of FEAT_TCRX
- Miscellaneous fixes + documentation updates
LoongArch:
- Add paravirt steal time support
- Add support for KVM_DIRTY_LOG_INITIALLY_SET
- Add perf kvm-stat support for loongarch
RISC-V:
- Redirect AMO load/store access fault traps to guest
- perf kvm stat support
- Use guest files for IMSIC virtualization, when available
s390:
- Assortment of tiny fixes which are not time critical
x86:
- Fixes for Xen emulation
- Add a global struct to consolidate tracking of host values, e.g.
EFER
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
effective APIC bus frequency, because TDX
- Print the name of the APICv/AVIC inhibits in the relevant
tracepoint
- Clean up KVM's handling of vendor specific emulation to
consistently act on "compatible with Intel/AMD", versus checking
for a specific vendor
- Drop MTRR virtualization, and instead always honor guest PAT on
CPUs that support self-snoop
- Update to the newfangled Intel CPU FMS infrastructure
- Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
it reads '0' and writes from userspace are ignored
- Misc cleanups
x86 - MMU:
- Small cleanups, renames and refactoring extracted from the upcoming
Intel TDX support
- Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
that can't hold leafs SPTEs
- Unconditionally drop mmu_lock when allocating TDP MMU page tables
for eager page splitting, to avoid stalling vCPUs when splitting
huge pages
- Bug the VM instead of simply warning if KVM tries to split a SPTE
that is non-present or not-huge. KVM is guaranteed to end up in a
broken state because the callers fully expect a valid SPTE, it's
all but dangerous to let more MMU changes happen afterwards
x86 - AMD:
- Make per-CPU save_area allocations NUMA-aware
- Force sev_es_host_save_area() to be inlined to avoid calling into
an instrumentable function from noinstr code
- Base support for running SEV-SNP guests. API-wise, this includes a
new KVM_X86_SNP_VM type, encrypting/measure the initial image into
guest memory, and finalizing it before launching it. Internally,
there are some gmem/mmu hooks needed to prepare gmem-allocated
pages before mapping them into guest private memory ranges
This includes basic support for attestation guest requests, enough
to say that KVM supports the GHCB 2.0 specification
There is no support yet for loading into the firmware those signing
keys to be used for attestation requests, and therefore no need yet
for the host to provide certificate data for those keys.
To support fetching certificate data from userspace, a new KVM exit
type will be needed to handle fetching the certificate from
userspace.
An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
exit type to handle this was introduced in v1 of this patchset, but
is still being discussed by community, so for now this patchset
only implements a stub version of SNP Extended Guest Requests that
does not provide certificate data
x86 - Intel:
- Remove an unnecessary EPT TLB flush when enabling hardware
- Fix a series of bugs that cause KVM to fail to detect nested
pending posted interrupts as valid wake eents for a vCPU executing
HLT in L2 (with HLT-exiting disable by L1)
- KVM: x86: Suppress MMIO that is triggered during task switch
emulation
Explicitly suppress userspace emulated MMIO exits that are
triggered when emulating a task switch as KVM doesn't support
userspace MMIO during complex (multi-step) emulation
Silently ignoring the exit request can result in the
WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
for some other reason prior to purging mmio_needed
See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write
exits if emulator detects exception") for more details on KVM's
limitations with respect to emulated MMIO during complex emulator
flows
Generic:
- Rename the AS_UNMOVABLE flag that was introduced for KVM to
AS_INACCESSIBLE, because the special casing needed by these pages
is not due to just unmovability (and in fact they are only
unmovable because the CPU cannot access them)
- New ioctl to populate the KVM page tables in advance, which is
useful to mitigate KVM page faults during guest boot or after live
migration. The code will also be used by TDX, but (probably) not
through the ioctl
- Enable halt poll shrinking by default, as Intel found it to be a
clear win
- Setup empty IRQ routing when creating a VM to avoid having to
synchronize SRCU when creating a split IRQCHIP on x86
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
a flag that arch code can use for hooking both sched_in() and
sched_out()
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace
detect bugs
- Mark a vCPU as preempted if and only if it's scheduled out while in
the KVM_RUN loop, e.g. to avoid marking it preempted and thus
writing guest memory when retrieving guest state during live
migration blackout
Selftests:
- Remove dead code in the memslot modification stress test
- Treat "branch instructions retired" as supported on all AMD Family
17h+ CPUs
- Print the guest pseudo-RNG seed only when it changes, to avoid
spamming the log for tests that create lots of VMs
- Make the PMU counters test less flaky when counting LLC cache
misses by doing CLFLUSH{OPT} in every loop iteration"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
crypto: ccp: Add the SNP_VLEK_LOAD command
KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
KVM: x86: Replace static_call_cond() with static_call()
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
x86/sev: Move sev_guest.h into common SEV header
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
KVM: x86: Suppress MMIO that is triggered during task switch emulation
KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
KVM: Document KVM_PRE_FAULT_MEMORY ioctl
mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
perf kvm: Add kvm-stat for loongarch64
LoongArch: KVM: Add PV steal time support in guest side
...
|
|
When requesting an attestation report a guest is able to specify whether
it wants SNP firmware to sign the report using either a Versioned Chip
Endorsement Key (VCEK), which is derived from chip-unique secrets, or a
Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD
Key Derivation Service (KDS) and derived from seeds allocated to
enrolled cloud service providers (CSPs).
For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load
them into the system after obtaining them from the KDS. Add a
corresponding userspace interface so to allow the loading of VLEK keys
into the system.
See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240501085210.2213060-21-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, the sev-guest driver uses the vmpck-0 key by default. When an
SVSM is present, the kernel is running at a VMPL other than 0 and the
vmpck-0 key is no longer available. If a specific vmpck key has not be
requested by the user via the vmpck_id module parameter, choose the
vmpck key based on the active VMPL level.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/b88081c5d88263176849df8ea93e90a404619cab.1717600736.git.thomas.lendacky@amd.com
|
|
A missing newline after "::" resulted in the htmldocs build failing to
recognize the start of a pre-formatted block of text, resulting in
kernel/linux/Documentation/virt/coco/sev-guest.rst:75: WARNING: Inline emphasis start-string without end-string.
kernel/linux/Documentation/virt/coco/sev-guest.rst:78: WARNING: Inline emphasis start-string without end-string.
kernel/linux/Documentation/virt/coco/sev-guest.rst:81: WARNING: Inline emphasis start-string without end-string.
kernel/linux/Documentation/virt/coco/sev-guest.rst:83: WARNING: Definition list ends without a blank line; unexpected unindent.
Fix it.
Fixes: f5db8841ebe5 ("crypto: ccp: Add the SNP_PLATFORM_STATUS command")
Closes: https://lore.kernel.org/linux-next/20240202145932.31c62fd6@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240202160544.2297320-1-michael.roth@amd.com
|
|
The SEV-SNP firmware provides the SNP_CONFIG command used to set various
system-wide configuration values for SNP guests, such as the reported
TCB version used when signing guest attestation reports. Add an
interface to set this via userspace.
[ mdr: Squash in doc patch from Dionna, drop extended request/
certificate handling and simplify this to a simple wrapper around
SNP_CONFIG fw cmd. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Co-developed-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-26-michael.roth@amd.com
|
|
The SNP_COMMIT command is used to commit the currently installed version
of the SEV firmware. Once committed, the firmware cannot be replaced
with a previous firmware version (cannot be rolled back). This command
will also update the reported TCB to match that of the currently
installed firmware.
[ mdr: Note the reported TCB update in the documentation/commit. ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-25-michael.roth@amd.com
|
|
This command is used to query the SNP platform status. See the SEV-SNP
spec for more details.
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240126041126.1927228-24-michael.roth@amd.com
|
|
The GHCB specification declares that the firmware error value for
a guest request will be stored in the lower 32 bits of EXIT_INFO_2. The
upper 32 bits are for the VMM's own error code. The fw_err argument to
snp_guest_issue_request() is thus a misnomer, and callers will need
access to all 64 bits.
The type of unsigned long also causes problems, since sw_exit_info2 is
u64 (unsigned long long) vs the argument's unsigned long*. Change this
type for issuing the guest request. Pass the ioctl command struct's error
field directly instead of in a local variable, since an incomplete guest
request may not set the error code, and uninitialized stack memory would
be written back to user space.
The firmware might not even be called, so bookend the call with the no
firmware call error and clear the error.
Since the "fw_err" field is really exitinfo2 split into the upper bits'
vmm error code and lower bits' firmware error code, convert the 64 bit
value to a union.
[ bp:
- Massage commit message
- adjust code
- Fix a build issue as
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303070609.vX6wp2Af-lkp@intel.com
- print exitinfo2 in hex
Tom:
- Correct -EIO exit case. ]
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230214164638.1189804-5-dionnaglaze@google.com
Link: https://lore.kernel.org/r/20230307192449.24732-12-bp@alien8.de
|
|
The PSP can return a "firmware error" code of -1 in circumstances where
the PSP has not actually been called. To make this protocol unambiguous,
name the value SEV_RET_NO_FW_CALL.
[ bp: Massage a bit. ]
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20221207010210.2563293-2-dionnaglaze@google.com
|
|
TDX guest driver exposes IOCTL interfaces to service TDX guest
user-specific requests. Currently, it is only used to allow the user to
get the TDREPORT to support TDX attestation.
Details about the TDX attestation process are documented in
Documentation/x86/tdx.rst, and the IOCTL details are documented in
Documentation/virt/coco/tdx-guest.rst.
Operations like getting TDREPORT involves sending a blob of data as
input and getting another blob of data as output. It was considered
to use a sysfs interface for this, but it doesn't fit well into the
standard sysfs model for configuring values. It would be possible to
do read/write on files, but it would need multiple file descriptors,
which would be somewhat messy. IOCTLs seem to be the best fitting
and simplest model for this use case. The AMD sev-guest driver also
uses the IOCTL interface to support attestation.
[Bagas Sanjaya: Ack is for documentation portion]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/all/20221116223820.819090-3-sathyanarayanan.kuppuswamy%40linux.intel.com
|
|
Rename the drivers/virt/coco/sevguest directory and files to sev-guest
so as to match the driver name.
[ bp: Rename Documentation/virt/coco/sevguest.rst too, as reported by sfr:
https://lore.kernel.org/r/20220427101059.3bf55262@canb.auug.org.au ]
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/2f5c9cb16e3a67599c8e3170f6c72c8712c47d53.1650464054.git.thomas.lendacky@amd.com
|
|
Update the documentation with information regarding SEV-SNP CPUID
Enforcement details and what sort of assurances it provides to guests.
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-47-brijesh.singh@amd.com
|
|
Version 2 of GHCB specification defines Non-Automatic-Exit (NAE) to get
extended guest report which is similar to the SNP_GET_REPORT ioctl. The
main difference is related to the additional data that will be returned.
That additional data returned is a certificate blob that can be used by
the SNP guest user. The certificate blob layout is defined in the GHCB
specification. The driver simply treats the blob as a opaque data and
copies it to userspace.
[ bp: Massage commit message, cast 1st arg of access_ok() ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-46-brijesh.singh@amd.com
|
|
The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to
ask the firmware to provide a key derived from a root key. The derived
key may be used by the guest for any purposes it chooses, such as a
sealing key or communicating with the external entities.
See SEV-SNP firmware spec for more information.
[ bp: No need to memset "req" - it will get overwritten. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Link: https://lore.kernel.org/r/20220307213356.2797205-45-brijesh.singh@amd.com
|
|
The SEV-SNP specification provides the guest a mechanism to communicate
with the PSP without risk from a malicious hypervisor who wishes to
read, alter, drop or replay the messages sent. The driver uses
snp_issue_guest_request() to issue GHCB SNP_GUEST_REQUEST or
SNP_EXT_GUEST_REQUEST NAE events to submit the request to PSP.
The PSP requires that all communication should be encrypted using key
specified through a struct snp_guest_platform_data descriptor.
Userspace can use SNP_GET_REPORT ioctl() to query the guest attestation
report.
See SEV-SNP spec section Guest Messages for more details.
[ bp: Remove the "what" from the commit message, massage. ]
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220307213356.2797205-44-brijesh.singh@amd.com
|