path: root/arch/x86/platform/pvh
Age | Commit message | Author
2025-04-18 | x86/asm: Remove semicolon from "rep" prefixes | Uros Bizjak
Minimum version of binutils required to compile the kernel is 2.25. This version correctly handles the "rep" prefixes, so it is possible to remove the semicolon, which was used to support ancient versions of GNU as. Due to the semicolon, the compiler considers "rep; insn" (or its alternate "rep\n\tinsn" form) as two separate instructions. Removing the semicolon makes asm length calculations more accurate, consequently making scheduling and inlining decisions of the compiler more accurate. Removing the semicolon also enables assembler checks involving "rep" prefixes. Trying to assemble e.g. "rep addl %eax, %ebx" results in: Error: invalid instruction `add' after `rep' Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20250418071437.4144391-2-ubizjak@gmail.com
2025-02-18 | x86/percpu/64: Use relative percpu offsets | Brian Gerst
The percpu section is currently linked at absolute address 0, because older compilers hard-coded the stack protector canary value at a fixed offset from the start of the GS segment. Now that the canary is a normal percpu variable, the percpu section does not need to be linked at a specific address. x86-64 will now calculate the percpu offsets as the delta between the initial percpu address and the dynamically allocated memory, like other architectures. Note that GSBASE is limited to the canonical address width (48 or 57 bits, sign-extended). As long as the kernel text, modules, and the dynamically allocated percpu memory are all in the negative address space, the delta will not overflow this limit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-9-brgerst@gmail.com
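The delta calculation described above can be sketched as plain 64-bit arithmetic. This is only an illustration under stated assumptions, not the kernel's implementation: the function names are hypothetical, and the check covers the 48-bit canonical width only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of a CPU's dynamically allocated percpu area relative to the
 * initial (link-time) percpu image. Unsigned subtraction wraps modulo
 * 2^64, so a "negative" delta is represented without special casing. */
static inline uint64_t percpu_delta(uint64_t allocated_base, uint64_t initial_base)
{
	return allocated_base - initial_base;
}

/* Address of one CPU's copy of a variable linked at var_link_addr. */
static inline uint64_t percpu_addr(uint64_t var_link_addr, uint64_t delta)
{
	return var_link_addr + delta;
}

/* True if v is a sign-extended 48-bit value (bits 63..47 all equal). */
static inline bool is_canonical48(uint64_t v)
{
	uint64_t top = v >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Hypothetical guard: a value destined for GSBASE must be canonical. */
static inline uint64_t checked_gsbase(uint64_t delta)
{
	assert(is_canonical48(delta));
	return delta;
}
```

When both addresses lie in the negative half of the address space, their difference stays well inside the canonical range, which is the condition the commit message relies on.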
2025-02-18 | x86/pvh: Use fixed_percpu_data for early boot GSBASE | Brian Gerst
Instead of having a private area for the stack canary, use fixed_percpu_data for GSBASE like the native kernel. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250123190747.745588-5-brgerst@gmail.com
2024-10-29 | x86/pvh: Avoid absolute symbol references in .head.text | Ard Biesheuvel
The .head.text section contains code that may execute from a different address than it was linked at. This is fragile, given that the x86 ABI can refer to global symbols via absolute or relative references, and the toolchain assumes that these are interchangeable, which they are not in this particular case. For this reason, all absolute symbol references are being removed from code that is emitted into .head.text. Subsequently, build time validation may be added that ensures that no absolute ELF relocations exist at all in that ELF section. In the case of the PVH code, the absolute references are in 32-bit code, which gets emitted with R_X86_64_32 relocations, and these are even more problematic going forward, as it prevents running the linker in PIE mode. So update the 64-bit code to avoid _pa(), and to only rely on relative symbol references: these are always 32-bits wide, even in 64-bit code, and are resolved by the linker at build time. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Tested-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Message-ID: <20241009160438.3884381-12-ardb+git@google.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-29 | x86/xen: Avoid relocatable quantities in Xen ELF notes | Ard Biesheuvel
Xen puts virtual and physical addresses into ELF notes that are treated by the linker as relocatable by default. Doing so is not only pointless, given that the ELF notes are only intended for consumption by Xen before the kernel boots. It is also a KASLR leak, given that the kernel's ELF notes are exposed via the world readable /sys/kernel/notes. So emit these constants in a way that prevents the linker from marking them as relocatable. This involves place-relative relocations (which subtract their own virtual address from the symbol value) and linker provided absolute symbols that add the address of the place to the desired value. Tested-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Message-ID: <20241009160438.3884381-11-ardb+git@google.com> Signed-off-by: Juergen Gross <jgross@suse.com>
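The cancellation that makes this work can be sketched numerically. The helper names below are hypothetical and the model is deliberately simplified: a place-relative relocation stores S + A - P, and if the symbol S is defined as "constant + P", the place P drops out and the stored field is a plain constant.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored by a place-relative relocation: symbol + addend - place. */
static inline uint64_t place_relative(uint64_t sym, int64_t addend, uint64_t place)
{
	return sym + (uint64_t)addend - place;
}

/* Hypothetical absolute symbol, defined as the desired constant plus
 * the address of the field that will reference it. */
static inline uint64_t abs_symbol(uint64_t constant, uint64_t place)
{
	return constant + place;
}

/* What ends up in the note field: the place cancels out, so the result
 * does not depend on where the field itself lives. */
static inline uint64_t emit_note_field(uint64_t constant, uint64_t place)
{
	uint64_t stored = place_relative(abs_symbol(constant, place), 0, place);

	assert(stored == constant);
	return stored;
}
```

Because the stored value no longer depends on any address, the linker has no reason to record a relocation for it.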
2024-10-29 | x86/pvh: Omit needless clearing of phys_base | Ard Biesheuvel
Since commit d9ec1158056b ("x86/boot/64: Use RIP_REL_REF() to assign 'phys_base'") phys_base is assigned directly rather than added to, so it is no longer necessary to clear it after use. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Tested-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Message-ID: <20241009160438.3884381-10-ardb+git@google.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-29 | x86/pvh: Use correct size value in GDT descriptor | Ard Biesheuvel
The limit field in a GDT descriptor is an inclusive bound, and therefore one less than the size of the covered range. Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Tested-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Message-ID: <20241009160438.3884381-9-ardb+git@google.com> Signed-off-by: Juergen Gross <jgross@suse.com>
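As a minimal sketch of the inclusive-bound rule (the table and helper below are hypothetical, not taken from the patch): a table of four 8-byte entries occupies 32 bytes, so its limit is 31.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of four 8-byte descriptors, for illustration only. */
static const uint64_t example_gdt[4] = { 0 };

static_assert(sizeof(example_gdt) == 32, "four 8-byte entries");

/* The limit is the offset of the last valid byte, i.e. size - 1. */
static inline uint16_t gdt_limit(size_t table_size_bytes)
{
	return (uint16_t)(table_size_bytes - 1);
}
```

Using the size itself as the limit would describe a table one byte longer than it really is.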
2024-10-29 | x86/pvh: Call C code via the kernel virtual mapping | Ard Biesheuvel
Calling C code via a different mapping than it was linked at is problematic, because the compiler assumes that RIP-relative and absolute symbol references are interchangeable. GCC in particular may use RIP-relative per-CPU variable references even when not using -fpic. So call xen_prepare_pvh() via its kernel virtual mapping on x86_64, so that those RIP-relative references produce the correct values. This matches the pre-existing behavior for i386, which also invokes xen_prepare_pvh() via the kernel virtual mapping before invoking startup_32 with paging disabled again. Fixes: 7243b93345f7 ("xen/pvh: Bootstrap PVH guest") Tested-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Message-ID: <20241009160438.3884381-8-ardb+git@google.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 | x86/pvh: Add 64bit relocation page tables | Jason Andryuk
The PVH entry point is 32bit. For a 64bit kernel, the entry point must switch to 64bit mode, which requires a set of page tables. In the past, PVH used init_top_pgt. This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the page tables are prebuilt for this address. If the kernel is loaded at a different address, they need to be adjusted. __startup_64() adjusts the prebuilt page tables for the physical load address, but it is 64bit code. The 32bit PVH entry code can't call it to adjust the page tables, so it can't readily be re-used. 64bit PVH entry needs page tables set up for identity map, the kernel high map and the direct map. pvh_start_xen() enters identity mapped. Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer into the highmap. The direct map is used for __va() on the initramfs and other guest physical addresses. Add a dedicated set of prebuilt page tables for PVH entry. They are adjusted in assembly before loading. Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation along with the kernel's loading constraints. The maximum load address, KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt page. It could be larger with more pages. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240823193630.2583107-6-jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
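The remark that the limit "could be larger with more pages" follows from how much address space one level-2 page can map. The sketch below assumes 4 KiB page-table pages, 8-byte entries, and 2 MiB mappings per level-2 entry; the macro and function names are hypothetical, and the value of KERNEL_IMAGE_SIZE is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES   4096ULL
#define ENTRY_SIZE_BYTES  8ULL
#define ENTRIES_PER_TABLE (PAGE_SIZE_BYTES / ENTRY_SIZE_BYTES)
#define LEVEL2_MAP_BYTES  (2ULL << 20)	/* assumed 2 MiB per level-2 entry */

static_assert(ENTRIES_PER_TABLE == 512, "one 4 KiB page holds 512 entries");

/* Address space covered by n level-2 pages filled with 2 MiB mappings. */
static inline uint64_t level2_coverage(unsigned int n_pages)
{
	return (uint64_t)n_pages * ENTRIES_PER_TABLE * LEVEL2_MAP_BYTES;
}
```

Under these assumptions one page covers 1 GiB of identity mapping, and each additional page adds another 1 GiB.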
2024-09-25 | x86/pvh: Set phys_base when calling xen_prepare_pvh() | Jason Andryuk
phys_base needs to be set for __pa() to work in xen_pvh_init() when finding the hypercall page. Set it before calling into xen_prepare_pvh(), which calls xen_pvh_init(). Clear it afterward to avoid __startup_64() adding to it and creating an incorrect value. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240823193630.2583107-4-jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-25 | x86/pvh: Make PVH entrypoint PIC for x86-64 | Jason Andryuk
The PVH entrypoint is 32bit non-PIC code running the uncompressed vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000 (16MB). The kernel is loaded at that physical address inside the VM by the VMM software (Xen/QEMU). When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1 into the PVH container. There exist system firmwares (Coreboot/EDK2) with reserved memory at 16MB. This creates a conflict where the PVH kernel cannot be loaded at that address. Modify the PVH entrypoint to be position-independent to allow flexibility in load address. Only the 64bit entry path is converted. A 32bit kernel is not PIC, so calling into other parts of the kernel, like xen_prepare_pvh() and mk_pgtable_32(), doesn't work properly when relocated. This makes the code PIC, but the page tables need to be updated as well to handle running from the kernel high map. The UNWIND_HINT_END_OF_STACK is to silence: vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction after the lret into 64bit code. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240823193630.2583107-3-jason.andryuk@amd.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12 | xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh() | Alexey Dobriyan
If this memset() is not inlined, then the PVH early boot code can call into a KASAN-instrumented memset(), which results in unbootable VMs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-3-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12 | xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT) | Alexey Dobriyan
Uninstrument arch/x86/platform/pvh/enlighten.c: KASAN has not been set up _this_ early in the boot process. Steps to reproduce: (1) make allnoconfig; (2) make sure CONFIG_AMD_MEM_ENCRYPT is disabled (AMD_MEM_ENCRYPT independently uninstruments lib/string.o, so PVH boot code calls into uninstrumented memset() and memcmp(), which can make the bug disappear depending on the compiler); (3) enable CONFIG_PVH; (4) enable CONFIG_KASAN; (5) enable serial console (this is a fun exercise if you have never done it from nothing :^)); (6) make; (7) qemu-system-x86_64 -enable-kvm -cpu host -smp cpus=1 -m 4096 -serial stdio -kernel vmlinux -append 'console=ttyS0 ignore_loglevel'. Messages on the serial console will easily tell an OK kernel from an unbootable kernel. In the bad case, qemu hangs in an infinite loop stroboscoping the "SeaBIOS" message. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-1-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-25 | x86/xen: fix memblock_reserve() usage on PVH | Roger Pau Monne
The current usage of memblock_reserve() in init_pvh_bootparams() is done before the .bss is zeroed, and that used to be fine when memblock_reserved_init_regions implicitly ended up in the .meminit.data section. However, after commit 73db3abdca58c, memblock_reserved_init_regions ends up in the .bss section, thus breaking its use before the .bss is cleared. Move and rename the call to xen_reserve_extra_memory() so it's done in the x86_init.oem.arch_setup hook, which gets executed after the .bss has been zeroed, but before calling e820__memory_setup(). Fixes: 73db3abdca58c ("init/modpost: conditionally check section mismatch to __meminit*") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240725073116.14626-3-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-03-19 | Merge tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip | Linus Torvalds
Pull xen updates from Juergen Gross: - Xen event channel handling fix for a regression with a rare kernel config and some added hardening - better support of running Xen dom0 in PVH mode - a cleanup for the xen grant-dma-iommu driver * tag 'for-linus-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: increment refcnt only if event channel is refcounted xen/evtchn: avoid WARN() when unbinding an event channel x86/xen: attempt to inflate the memory balloon on PVH xen/grant-dma-iommu: Convert to platform remove callback returning void
2024-03-13 | x86/xen: attempt to inflate the memory balloon on PVH | Roger Pau Monne
When running as PVH or HVM, Linux will use holes in the memory map as scratch space to map grants, foreign domain pages and possibly miscellaneous other stuff. However, the usage of such memory map holes for Xen purposes can be problematic. The requests for holes by Xen happen quite early in the kernel boot process (grant table setup already uses scratch map space), and it's possible that by then not all devices have reclaimed their MMIO space. It's not unlikely for chunks of Xen scratch map space to end up using PCI bridge MMIO window memory, which (as expected) causes quite a lot of issues in the system. At least for PVH dom0 we have the possibility of using regions marked as UNUSABLE in the e820 memory map. Either if the region is UNUSABLE in the native memory map, or it has been converted into UNUSABLE in order to hide RAM regions from dom0, the second stage translation page-tables can populate those areas without issues. PV already has this kind of logic, where the balloon driver is inflated at boot. Re-use the current logic in order to also inflate it when running as PVH. Convert UNUSABLE regions up to the ratio specified in EXTRA_MEM_RATIO to RAM, while reserving them using xen_add_extra_mem() (which is also moved so it's no longer tied to CONFIG_PV). [jgross: fixed build for CONFIG_PVH without CONFIG_XEN_PVH] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20240220174341.56131-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2024-01-30 | x86: Do not include <asm/bootparam.h> in several files | Thomas Zimmermann
Remove the include statement for <asm/bootparam.h> from several files that don't require it and limit the exposure of those definitions within the Linux kernel code. [ bp: Massage commit message. ] Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240112095000.8952-5-tzimmermann@suse.de
2024-01-08 | Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 cleanups from Ingo Molnar: - Change global variables to local - Add missing kernel-doc function parameter descriptions - Remove unused parameter from a macro - Remove obsolete Kconfig entry - Fix comments - Fix typos, mostly scripted, manually reviewed and a micro-optimization got misplaced as a cleanup: - Micro-optimize the asm code in secondary_startup_64_no_verify() * tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86: Fix typos x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify() x86/docs: Remove reference to syscall trampoline in PTI x86/Kconfig: Remove obsolete config X86_32_SMP x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro x86/mtrr: Document missing function parameters in kernel-doc x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
2024-01-03 | arch/x86: Fix typos | Bjorn Helgaas
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2023-12-20 | x86/asm: Replace magic numbers in GDT descriptors, script-generated change | Vegard Nossum
Actually replace the numeric values by the new symbolic values. I used this to find all the existing users of the GDT_ENTRY*() macros: $ git grep -P 'GDT_ENTRY(_INIT)?\(' Some of the lines will exceed 80 characters, but some of them will be shorter again in the next couple of patches. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-4-vegard.nossum@oracle.com
2023-12-20 | x86/asm: Replace magic numbers in GDT descriptors, preparations | Vegard Nossum
We'd like to replace all the magic numbers in various GDT descriptors with new, semantically meaningful, symbolic values. In order to be able to verify that the change doesn't cause any actual changes to the compiled binary code, I've split the change into two patches: - Part 1 (this commit): everything _but_ actually replacing the numbers - Part 2 (the following commit): _only_ replacing the numbers The reason we need this split for verification is that including new headers causes some spurious changes to the object files, mostly line number changes in the debug info but occasionally other subtle codegen changes. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231219151200.2878271-3-vegard.nossum@oracle.com
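To illustrate what "magic number versus symbolic value" means for a GDT descriptor, here is a small sketch. The flag names are hypothetical stand-ins, not the names the patch series introduces, and the packing function is an independent illustration that assumes a 16-bit flags word whose low byte is the access byte and whose top nibble holds the granularity and size bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symbolic names for bits of the 16-bit flags word. */
enum {
	SEG_TYPE_CODE_RA = 0x000b,	/* code, readable, accessed */
	SEG_S            = 0x0010,	/* code/data (not a system segment) */
	SEG_PRESENT      = 0x0080,
	SEG_DB_32BIT     = 0x4000,	/* 32-bit default operand size */
	SEG_GRAN_4K      = 0x8000,	/* limit counted in 4 KiB units */
};

#define FLAT_CODE32_FLAGS \
	(SEG_GRAN_4K | SEG_DB_32BIT | SEG_PRESENT | SEG_S | SEG_TYPE_CODE_RA)

/* The symbolic spelling must produce the same bits as the magic number. */
static_assert(FLAT_CODE32_FLAGS == 0xc09b, "symbolic flags match 0xc09b");

/* Pack flags, base and limit into the 8-byte segment descriptor layout. */
static inline uint64_t pack_gdt_entry(uint16_t flags, uint32_t base, uint32_t limit)
{
	return ((uint64_t)(base  & 0xff000000u) << 32) |
	       ((uint64_t)(flags & 0xf0ffu)     << 40) |
	       ((uint64_t)(limit & 0x000f0000u) << 32) |
	       ((uint64_t)(base  & 0x00ffffffu) << 16) |
	        (uint64_t)(limit & 0x0000ffffu);
}
```

A compile-time equality check like the one above is one way to confirm that swapping numbers for names leaves the generated descriptor unchanged, which is the property the two-part split is meant to make verifiable.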
2023-04-28 | Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ...
2023-03-30 | docs: move x86 documentation into Documentation/arch/ | Jonathan Corbet
Move the x86 documentation under Documentation/arch/ as a way of cleaning up the top-level directory and making the structure of our docs more closely match the structure of the source directories it describes. All in-kernel references to the old paths have been updated. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/ Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-03-23 | x86,objtool: Split UNWIND_HINT_EMPTY in two | Josh Poimboeuf
Mark reported that the ORC unwinder incorrectly marks an unwind as reliable when the unwind terminates prematurely in the dark corners of return_to_handler() due to lack of information about the next frame. The problem is UNWIND_HINT_EMPTY is used in two different situations: 1) The end of the kernel stack unwind before hitting user entry, boot code, or fork entry 2) A blind spot in ORC coverage where the unwinder has to bail due to lack of information about the next frame The ORC unwinder has no way to tell the difference between the two. When it encounters an undefined stack state with 'end=1', it blindly marks the stack reliable, which can break the livepatch consistency model. Fix it by splitting UNWIND_HINT_EMPTY into UNWIND_HINT_UNDEFINED and UNWIND_HINT_END_OF_STACK. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/fd6212c8b450d3564b855e1cb48404d6277b4d9f.1677683419.git.jpoimboe@kernel.org
2022-04-19 | x86,xen,objtool: Add UNWIND hint | Peter Zijlstra
SYM_CODE_START*() doesn't get auto-validated and needs an UNWIND hint to get checked, add one. vmlinux.o: warning: objtool: pvh_start_xen()+0x0: unreachable Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220408094718.321246297@infradead.org
2021-10-05 | x86/PVH: adjust function/data placement | Jan Beulich
Two of the variables can live in .init.data, allowing the open-coded placing in .data to go away. Another "variable" is used to communicate a size value only to very early assembly code, which hence can be both const and live in .init.*. Additionally two functions were lacking __init annotations. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/3b0bb22e-43f4-e459-c5cb-169f996b5669@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-04-27 | Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ...
2021-03-21 | x86: Remove unusual Unicode characters from comments | Ingo Molnar
We've accumulated a few unusual Unicode characters in arch/x86/ over the years, substitute them with their proper ASCII equivalents. A few of them were a whitespace equivalent: ' ' - the use was harmless. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org
2021-03-08 | x86/stackprotector/32: Make the canary into a regular percpu variable | Andy Lutomirski
On 32-bit kernels, the stackprotector canary is quite nasty -- it is stored at %gs:(20), which is nasty because 32-bit kernels use %fs for percpu storage. It's even nastier because it means that whether %gs contains userspace state or kernel state while running kernel code depends on whether stackprotector is enabled (this is CONFIG_X86_32_LAZY_GS), and this setting radically changes the way that segment selectors work. Supporting both variants is a maintenance and testing mess. Merely rearranging so that percpu and the stack canary share the same segment would be messy as the 32-bit percpu address layout isn't currently compatible with putting a variable at a fixed offset. Fortunately, GCC 8.1 added options that allow the stack canary to be accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary percpu variable. This lets us get rid of all of the code to manage the stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess. (That name is special. We could use any symbol we want for the %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any name other than __stack_chk_guard.) Forcibly disable stackprotector on older compilers that don't support the new options and turn the stack canary into a percpu variable. The "lazy GS" approach is now used for all 32-bit configurations. Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels, it loads the GS selector and updates the user GSBASE accordingly. (This is unchanged.) On 32-bit kernels, it loads the GS selector and updates GSBASE, which is now always the user base. This means that the overall effect is the same on 32-bit and 64-bit, which avoids some ifdeffery. [ bp: Massage commit message. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
2021-01-26 | x86/xen/pvh: Annotate indirect branch as safe | Josh Poimboeuf
This indirect jump is harmless; annotate it to keep objtool's retpoline validation happy. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/4797c72a258b26e06741c58ccd4a75c42db39c1d.1611263462.git.jpoimboe@redhat.com
2020-10-25 | treewide: Convert macro and uses of __section(foo) to __section("foo") | Joe Perches
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-18 | x86/asm: Make some functions local | Jiri Slaby
There are a couple of assembly functions which are invoked only locally in the file they are defined. In C, they are marked "static". In assembly, annotate them using SYM_{FUNC,CODE}_START_LOCAL (and switch their ENDPROC to SYM_{FUNC,CODE}_END too). Whether FUNC or CODE is used, depends on whether ENDPROC or END was used for a particular function before. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andy@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: linux-arch@vger.kernel.org Cc: linux-efi <linux-efi@vger.kernel.org> Cc: linux-efi@vger.kernel.org Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: platform-driver-x86@vger.kernel.org Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191011115108.12392-21-jslaby@suse.cz
2019-10-18 | xen/pvh: Annotate data appropriately | Jiri Slaby
Use the new SYM_DATA_START_LOCAL, and SYM_DATA_END* macros to get: 0000 8 OBJECT LOCAL DEFAULT 6 gdt 0008 32 OBJECT LOCAL DEFAULT 6 gdt_start 0028 0 OBJECT LOCAL DEFAULT 6 gdt_end 0028 256 OBJECT LOCAL DEFAULT 6 early_stack 0128 0 OBJECT LOCAL DEFAULT 6 early_stack Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andy Shevchenko <andy@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: linux-arch@vger.kernel.org Cc: platform-driver-x86@vger.kernel.org Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20191011115108.12392-15-jslaby@suse.cz
2019-06-08 | docs: fix broken documentation links | Mauro Carvalho Chehab
Mostly due to x86 and acpi conversion, several documentation links are still pointing to the old file. Fix them. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Wolfram Sang <wsa@the-dreams.de> Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-04-25xen/pvh: correctly setup the PV EFI interface for dom0Roger Pau Monne
This involves initializing the boot params EFI related fields and the efi global variable. Without this fix a PVH dom0 doesn't detect when booted from EFI, and thus doesn't support accessing any of the EFI related data. Reported-by: PGNet Dev <pgnet.dev@gmail.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org # 4.19+
2018-12-13KVM: x86: Allow Qemu/KVM to use PVH entry pointMaran Wilson
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13xen/pvh: Move Xen code for getting mem map via hcall out of common fileMaran Wilson
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
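The split described above, between a memory map fetched by hypercall and one handed to the guest directly, can be illustrated with a minimal sketch in C of the second case: walking a table that is already in memory. All names here (`memmap_entry`, `MEMMAP_TYPE_RAM`, `usable_bytes`) are hypothetical and only loosely modeled on an E820-style layout; they are not the kernel's or the PVH ABI's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical E820-style entry; NOT the real PVH ABI structure. */
struct memmap_entry {
	uint64_t addr;	/* start of the region (guest-physical) */
	uint64_t size;	/* length of the region in bytes */
	uint32_t type;	/* region type, see the enum below */
};

/* Assumed type codes, following the common E820 convention. */
enum {
	MEMMAP_TYPE_RAM      = 1,
	MEMMAP_TYPE_RESERVED = 2,
};

/*
 * Sum the sizes of all RAM entries in a table that was passed to the
 * guest in memory. No hypercall is involved; the table is only read.
 */
static uint64_t usable_bytes(const struct memmap_entry *map, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].type == MEMMAP_TYPE_RAM)
			total += map[i].size;
	}
	return total;
}
```

Because such a consumer only reads a table, it has no dependency on hypervisor-specific calls, which is the property the commit relies on when it moves the hypercall-based path into the Xen specific file.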
2018-12-13xen/pvh: Move Xen specific PVH VM initialization out of common fileMaran Wilson
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13xen/pvh: Create a new file for Xen specific PVH codeMaran Wilson
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-12-13xen/pvh: Move PVH entry code out of Xen specific treeMaran Wilson
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>