summaryrefslogtreecommitdiff
path: root/arch/x86/mm
AgeCommit message (Collapse)Author
2018-10-31memblock: remove _virt from APIs returning virtual addressMike Rapoport
The conversion is done using sed -i 's@memblock_virt_alloc@memblock_alloc@g' \ $(git grep -l memblock_virt_alloc) Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*Mike Rapoport
Make it explicit that the caller gets a physical address rather than a virtual one. This will also allow using meblock_alloc prefix for memblock allocations returning virtual address, which is done in the following patches. The conversion is done using the following semantic patch: @@ expression e1, e2, e3; @@ ( - memblock_alloc(e1, e2) + memblock_phys_alloc(e1, e2) | - memblock_alloc_nid(e1, e2, e3) + memblock_phys_alloc_nid(e1, e2, e3) | - memblock_alloc_try_nid(e1, e2, e3) + memblock_phys_alloc_try_nid(e1, e2, e3) ) Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-30x86/numa_emulation: Fix uniform-split numa emulationDave Jiang
The numa_emulation() routine in the 'uniform' case walks through all the physical 'memblk' instances and divides them into N emulated nodes with split_nodes_size_interleave_uniform(). As each physical node is consumed it is removed from the physical memblk array in the numa_remove_memblk_from() helper. Since split_nodes_size_interleave_uniform() handles advancing the array as the 'memblk' is consumed it is expected that the base of the array is always specified as the argument. Otherwise, on multi-socket (> 2) configurations the uniform-split capability can generate an invalid numa configuration leading to boot failures with signatures like the following: rcu: INFO: rcu_sched detected stalls on CPUs/tasks: Sending NMI from CPU 0 to CPUs 2: NMI backtrace for cpu 2 CPU: 2 PID: 1332 Comm: pgdatinit0 Not tainted 4.19.0-rc8-next-20181019-baseline #59 RIP: 0010:__init_single_page.isra.74+0x81/0x90 [..] Call Trace: deferred_init_pages+0xaa/0xe3 deferred_init_memmap+0x18f/0x318 kthread+0xf8/0x130 ? deferred_free_pages.isra.105+0xc9/0xc9 ? kthread_stop+0x110/0x110 ret_from_fork+0x35/0x40 Fixes: 1f6a2c6d9f121 ("x86/numa_emulation: Introduce uniform split capability") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/154049911459.2685845.9210186007479774286.stgit@dwillia2-desk3.amr.corp.intel.com
2018-10-29x86/mm/pat: Disable preemption around __flush_tlb_all()Sebastian Andrzej Siewior
The WARN_ON_ONCE(__read_cr3() != build_cr3()) in switch_mm_irqs_off() triggers every once in a while during a snapshotted system upgrade. The warning triggers since commit decab0888e6e ("x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()"). The callchain is: get_page_from_freelist() -> post_alloc_hook() -> __kernel_map_pages() with CONFIG_DEBUG_PAGEALLOC enabled. Disable preemption during CR3 reset / __flush_tlb_all() and add a comment why preemption has to be disabled so it won't be removed accidentaly. Add another preemptible() check in __flush_tlb_all() to catch callers with enabled preemption when PGE is enabled, because PGE enabled does not trigger the warning in __native_flush_tlb(). Suggested by Andy Lutomirski. Fixes: decab0888e6e ("x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181017103432.zgv46nlu3hc7k4rq@linutronix.de
2018-10-24Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "I have been slowly sorting out siginfo and this is the culmination of that work. The primary result is in several ways the signal infrastructure has been made less error prone. The code has been updated so that manually specifying SEND_SIG_FORCED is never necessary. The conversion to the new siginfo sending functions is now complete, which makes it difficult to send a signal without filling in the proper siginfo fields. At the tail end of the patchset comes the optimization of decreasing the size of struct siginfo in the kernel from 128 bytes to about 48 bytes on 64bit. The fundamental observation that enables this is by definition none of the known ways to use struct siginfo uses the extra bytes. This comes at the cost of a small user space observable difference. For the rare case of siginfo being injected into the kernel only what can be copied into kernel_siginfo is delivered to the destination, the rest of the bytes are set to 0. For cases where the signal and the si_code are known this is safe, because we know those bytes are not used. For cases where the signal and si_code combination is unknown the bits that won't fit into struct kernel_siginfo are tested to verify they are zero, and the send fails if they are not. I made an extensive search through userspace code and I could not find anything that would break because of the above change. If it turns out I did break something it will take just the revert of a single change to restore kernel_siginfo to the same size as userspace siginfo. Testing did reveal dependencies on preferring the signo passed to sigqueueinfo over si->signo, so bit the bullet and added the complexity necessary to handle that case. Testing also revealed bad things can happen if a negative signal number is passed into the system calls. Something no sane application will do but something a malicious program or a fuzzer might do. So I have fixed the code that performs the bounds checks to ensure negative signal numbers are handled" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits) signal: Guard against negative signal numbers in copy_siginfo_from_user32 signal: Guard against negative signal numbers in copy_siginfo_from_user signal: In sigqueueinfo prefer sig not si_signo signal: Use a smaller struct siginfo in the kernel signal: Distinguish between kernel_siginfo and siginfo signal: Introduce copy_siginfo_from_user and use it's return value signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE signal: Fail sigqueueinfo if si_signo != sig signal/sparc: Move EMT_TAGOVF into the generic siginfo.h signal/unicore32: Use force_sig_fault where appropriate signal/unicore32: Generate siginfo in ucs32_notify_die signal/unicore32: Use send_sig_fault where appropriate signal/arc: Use force_sig_fault where appropriate signal/arc: Push siginfo generation into unhandled_exception signal/ia64: Use force_sig_fault where appropriate signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn signal/ia64: Use the generic force_sigsegv in setup_frame signal/arm/kvm: Use send_sig_mceerr signal/arm: Use send_sig_fault where appropriate signal/arm: Use force_sig_fault where appropriate ...
2018-10-23Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Ingo Molnar: "The main changes: - Make the IBPB barrier more strict and add STIBP support (Jiri Kosina) - Micro-optimize and clean up the entry code (Andy Lutomirski) - ... plus misc other fixes" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Propagate information about RSB filling mitigation to sysfs x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation x86/speculation: Apply IBPB more strictly to avoid cross-process data leak x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION x86/pti/64: Remove the SYSCALL64 entry trampoline x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space x86/entry/64: Document idtentry
2018-10-23Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Ingo Molnar: "Two main changes: - Remove no longer used parts of the paravirt infrastructure and put large quantities of paravirt ops under a new config option PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross) - Enable PV spinlocks on Hyperv (Yi Sun)" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Enable PV qspinlock for Hyper-V x86/hyperv: Add GUEST_IDLE_MSR support x86/paravirt: Clean up native_patch() x86/paravirt: Prevent redefinition of SAVE_FLAGS macro x86/xen: Make xen_reservation_lock static x86/paravirt: Remove unneeded mmu related paravirt ops bits x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella x86/paravirt: Introduce new config option PARAVIRT_XXL x86/paravirt: Remove unused paravirt bits x86/paravirt: Use a single ops structure x86/paravirt: Remove clobbers from struct paravirt_patch_site x86/paravirt: Remove clobbers parameter from paravirt patch functions x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static x86/xen: Add SPDX identifier in arch/x86/xen files x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
2018-10-23Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "Lots of changes in this cycle: - Lots of CPA (change page attribute) optimizations and related cleanups (Thomas Gleixner, Peter Zijstra) - Make lazy TLB mode even lazier (Rik van Riel) - Fault handler cleanups and improvements (Dave Hansen) - kdump, vmcore: Enable kdumping encrypted memory with AMD SME enabled (Lianbo Jiang) - Clean up VM layout documentation (Baoquan He, Ingo Molnar) - ... plus misc other fixes and enhancements" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry() x86/mm: Kill stray kernel fault handling comment x86/mm: Do not warn about PCI BIOS W+X mappings resource: Clean it up a bit resource: Fix find_next_iomem_res() iteration issue resource: Include resource end in walk_*() interfaces x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error x86/mm: Remove spurious fault pkey check x86/mm/vsyscall: Consider vsyscall page part of user address space x86/mm: Add vsyscall address helper x86/mm: Fix exception table comments x86/mm: Add clarifying comments for user addr space x86/mm: Break out user address space handling x86/mm: Break out kernel address space handling x86/mm: Clarify hardware vs. software "error_code" x86/mm/tlb: Make lazy TLB mode lazier x86/mm/tlb: Add freed_tables element to flush_tlb_info x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range smp,cpumask: introduce on_each_cpu_cond_mask smp: use __cpumask_set_cpu in on_each_cpu_cond ...
2018-10-23Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and misc x86 updates from Ingo Molnar: "Lots of changes in this cycle - in part because locking/core attracted a number of related x86 low level work which was easier to handle in a single tree: - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E. McKenney, Andrea Parri) - lockdep scalability improvements and micro-optimizations (Waiman Long) - rwsem improvements (Waiman Long) - spinlock micro-optimization (Matthew Wilcox) - qspinlocks: Provide a liveness guarantee (more fairness) on x86. (Peter Zijlstra) - Add support for relative references in jump tables on arm64, x86 and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens) - Be a lot less permissive on weird (kernel address) uaccess faults on x86: BUG() when uaccess helpers fault on kernel addresses (Jann Horn) - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav Amit) - ... and a handful of other smaller changes as well" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) locking/lockdep: Make global debug_locks* variables read-mostly locking/lockdep: Fix debug_locks off performance problem locking/pvqspinlock: Extend node size when pvqspinlock is configured locking/qspinlock_stat: Count instances of nested lock slowpaths locking/qspinlock, x86: Provide liveness guarantee x86/asm: 'Simplify' GEN_*_RMWcc() macros locking/qspinlock: Rework some comments locking/qspinlock: Re-order code locking/lockdep: Remove duplicated 'lock_class_ops' percpu array x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y futex: Replace spin_is_locked() with lockdep locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs x86/extable: Macrofy inline assembly code to work around GCC inlining bugs x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs x86/refcount: Work around GCC inlining bug x86/objtool: Use asm macros to work around GCC inlining bugs ...
2018-10-23Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes are: - Add support for enlisting the help of the EFI firmware to create memory reservations that persist across kexec. - Add page fault handling to the runtime services support code on x86 so we can more gracefully recover from buggy EFI firmware. - Fix command line handling on x86 for the boot path that omits the stub's PE/COFF entry point. - Other assorted fixes and updates" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: boot: Fix EFI stub alignment efi/x86: Call efi_parse_options() from efi_main() efi/x86: earlyprintk - Add 64bit efi fb address support efi/x86: drop task_lock() from efi_switch_mm() efi/x86: Handle page faults occurring while running EFI runtime services efi: Make efi_rts_work accessible to efi page fault handler efi/efi_test: add exporting ResetSystem runtime service efi/libstub: arm: support building with clang efi: add API to reserve memory persistently across kexec reboot efi/arm: libstub: add a root memreserve config table efi: honour memory reservations passed via a linux specific config table
2018-10-21x86/mm: Kill stray kernel fault handling commentDave Hansen
I originally had matching user and kernel comments, but the kernel one got improved. Some errant conflict resolution kicked the commment somewhere wrong. Kill it. Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: aa37c51b94 ("x86/mm: Break out user address space handling") Link: http://lkml.kernel.org/r/20181019140842.12F929FA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-10x86/mm: Do not warn about PCI BIOS W+X mappingsThomas Gleixner
PCI BIOS requires the BIOS area 0x0A0000-0x0FFFFFF to be mapped W+X for various legacy reasons. When CONFIG_DEBUG_WX is enabled, this triggers the WX warning, but this is misleading because the mapping is required and is not a result of an accidental oversight. Prevent the full warning when PCI BIOS is enabled and the detected WX mapping is in the BIOS area. Just emit a pr_warn() which denotes the fact. This is partially duplicating the info which the PCI BIOS code emits when it maps the area as executable, but that info is not in the context of the WX checking output. Remove the extra %p printout in the WARN_ONCE() while at it. %pS is enough. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kees Cook <keescook@chromium.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1810082151160.2455@nanos.tec.linutronix.de
2018-10-09x86/mm: Remove spurious fault pkey checkDave Hansen
Spurious faults only ever occur in the kernel's address space. They are also constrained specifically to faults with one of these error codes: X86_PF_WRITE | X86_PF_PROT X86_PF_INSTR | X86_PF_PROT So, it's never even possible to reach spurious_kernel_fault_check() with X86_PF_PK set. In addition, the kernel's address space never has pages with user-mode protections. Protection Keys are only enforced on pages with user-mode protection. This gives us lots of reasons to not check for protection keys in our sprurious kernel fault handling. But, let's also add some warnings to ensure that these assumptions about protection keys hold true. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160231.243A0D6A@viggo.jf.intel.com
2018-10-09x86/mm/vsyscall: Consider vsyscall page part of user address spaceDave Hansen
The vsyscall page is weird. It is in what is traditionally part of the kernel address space. But, it has user permissions and we handle faults on it like we would on a user page: interrupts on. Right now, we handle vsyscall emulation in the "bad_area" code, which is used for both user-address-space and kernel-address-space faults. Move the handling to the user-address-space code *only* and ensure we get there by "excluding" the vsyscall page from the kernel address space via a check in fault_in_kernel_space(). Since the fault_in_kernel_space() check is used on 32-bit, also add a 64-bit check to make it clear we only use this path on 64-bit. Also move the unlikely() to be in is_vsyscall_vaddr() itself. This helps clean up the kernel fault handling path by removing a case that can happen in normal[1] operation. (Yeah, yeah, we can argue about the vsyscall page being "normal" or not.) This also makes sanity checks easier, like the "we never take pkey faults in the kernel address space" check in the next patch. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@viggo.jf.intel.com
2018-10-09x86/mm: Add vsyscall address helperDave Hansen
We will shortly be using this check in two locations. Put it in a helper before we do so. Let's also insert PAGE_MASK instead of the open-coded ~0xfff. It is easier to read and also more obviously correct considering the implicit type conversion that has to happen when it is not an implicit 'unsigned long'. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160228.C593509B@viggo.jf.intel.com
2018-10-09x86/mm: Fix exception table commentsDave Hansen
The comments here are wrong. They are too absolute about where faults can occur when running in the kernel. The comments are also a bit hard to match up with the code. Trim down the comments, and make them more precise. Also add a comment explaining why we are doing the bad_area_nosemaphore() path here. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160227.077DDD7A@viggo.jf.intel.com
2018-10-09x86/mm: Add clarifying comments for user addr spaceDave Hansen
The SMAP and Reserved checking do not have nice comments. Add some to clarify and make it match everything else. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160225.FFD44B8D@viggo.jf.intel.com
2018-10-09x86/mm: Break out user address space handlingDave Hansen
The last patch broke out kernel address space handing into its own helper. Now, do the same for user address space handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
2018-10-09x86/mm: Break out kernel address space handlingDave Hansen
The page fault handler (__do_page_fault()) basically has two sections: one for handling faults in the kernel portion of the address space and another for faults in the user portion of the address space. But, these two parts don't stick out that well. Let's make that more clear from code separation and naming. Pull kernel fault handling into its own helper, and reflect that naming by renaming spurious_fault() -> spurious_kernel_fault(). Also, rewrite the vmalloc() handling comment a bit. It was a bit stale and also glossed over the reserved bit handling. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com
2018-10-09x86/mm: Clarify hardware vs. software "error_code"Dave Hansen
We pass around a variable called "error_code" all around the page fault code. Sounds simple enough, especially since "error_code" looks like it exactly matches the values that the hardware gives us on the stack to report the page fault error code (PFEC in SDM parlance). But, that's not how it works. For part of the page fault handler, "error_code" does exactly match PFEC. But, during later parts, it diverges and starts to mean something a bit different. Give it two names for its two jobs. The place it diverges is also really screwy. It's only in a spot where the hardware tells us we have kernel-mode access that occurred while we were in usermode accessing user-controlled address space. Add a warning in there. Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com
2018-10-09x86/mm/tlb: Make lazy TLB mode lazierRik van Riel
Lazy TLB mode can result in an idle CPU being woken up by a TLB flush, when all it really needs to do is reload %CR3 at the next context switch, assuming no page table pages got freed. Memory ordering is used to prevent race conditions between switch_mm_irqs_off, which checks whether .tlb_gen changed, and the TLB invalidation code, which increments .tlb_gen whenever page table entries get invalidated. The atomic increment in inc_mm_tlb_gen is its own barrier; the context switch code adds an explicit barrier between reading tlbstate.is_lazy and next->context.tlb_gen. CPUs in lazy TLB mode remain part of the mm_cpumask(mm), both because that allows TLB flush IPIs to be sent at page table freeing time, and because the cache line bouncing on the mm_cpumask(mm) was responsible for about half the CPU use in switch_mm_irqs_off(). We can change native_flush_tlb_others() without touching other (paravirt) implementations of flush_tlb_others() because we'll be flushing less. The existing implementations flush more and are therefore still correct. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-8-riel@surriel.com
2018-10-09x86/mm/tlb: Add freed_tables element to flush_tlb_infoRik van Riel
Pass the information on to native_flush_tlb_others. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-7-riel@surriel.com
2018-10-09x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_rangeRik van Riel
Add an argument to flush_tlb_mm_range to indicate whether page tables are about to be freed after this TLB flush. This allows for an optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-6-riel@surriel.com
2018-10-09x86/mm/tlb: Restructure switch_mm_irqs_off()Rik van Riel
Move some code that will be needed for the lazy -> !lazy state transition when a lazy TLB CPU has gotten out of date. No functional changes, since the if (real_prev == next) branch always returns. (cherry picked from commit 61d0beb5796ab11f7f3bf38cb2eccc6579aaa70b) Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-4-riel@surriel.com
2018-10-09x86/mm/tlb: Always use lazy TLB modeRik van Riel
On most workloads, the number of context switches far exceeds the number of TLB flushes sent. Optimizing the context switches, by always using lazy TLB mode, speeds up those workloads. This patch results in about a 1% reduction in CPU use on a two socket Broadwell system running a memcache like workload. Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> (cherry picked from commit 95b0e6357d3e4e05349668940d7ff8f3b7e7e11e) Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-7-riel@surriel.com
2018-10-09x86/mm: Page size aware flush_tlb_mm_range()Peter Zijlstra
Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop. Cc: Nick Piggin <npiggin@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2018-10-09x86/mm: Avoid VLA in pgd_alloc()Kees Cook
Arnd Bergmann reported that turning on -Wvla found a new (unintended) VLA usage: arch/x86/mm/pgtable.c: In function 'pgd_alloc': include/linux/build_bug.h:29:45: error: ISO C90 forbids variable length array 'u_pmds' [-Werror=vla] arch/x86/mm/pgtable.c:190:34: note: in expansion of macro 'static_cpu_has' #define PREALLOCATED_USER_PMDS (static_cpu_has(X86_FEATURE_PTI) ? \ ^~~~~~~~~~~~~~ arch/x86/mm/pgtable.c:431:16: note: in expansion of macro 'PREALLOCATED_USER_PMDS' pmd_t *u_pmds[PREALLOCATED_USER_PMDS]; ^~~~~~~~~~~~~~~~~~~~~~ Use the actual size of the array that is used for X86_FEATURE_PTI, which is known at build time, instead of the variable size. [ mingo: Squashed original fix with followup fix to avoid bisection breakage, wrote new changelog. ] Reported-by: Arnd Bergmann <arnd@arndb.de> Original-written-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Fixes: 1be3f247c288 ("x86/mm: Avoid VLA in pgd_alloc()") Link: http://lkml.kernel.org/r/20181008235434.GA35035@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-06x86/ioremap: Add an ioremap_encrypted() helperLianbo Jiang
When SME is enabled, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in the kdump kernel, and we have to remap the old memory with the memory encryption mask. The case of concern here is if SME is active in the first kernel, and it is active too in the kdump kernel. There are four cases to be considered: a. dump vmcore It is encrypted in the first kernel, and needs be read out in the kdump kernel. b. crash notes When dumping vmcore, the people usually need to read useful information from notes, and the notes is also encrypted. c. iommu device table It's encrypted in the first kernel, kdump kernel needs to access its content to analyze and get information it needs. d. mmio of AMD iommu not encrypted in both kernels Add a new bool parameter @encrypted to __ioremap_caller(). If set, memory will be remapped with the SME mask. Add a new function ioremap_encrypted() to explicitly pass in a true value for @encrypted. Use ioremap_encrypted() for the above a, b, c cases. [ bp: cleanup commit message, extern defs in io.h and drop forgotten include. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/20180927071954.29615-2-lijiang@redhat.com
2018-10-04Merge branch 'linus' into x86/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-03x86/mm: Fix typo in commentTakuya Yamamoto
Signed-off-by: Takuya Yamamoto <tkyymmt01@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180829072730.988-1-tkyymmt01@gmail.com
2018-09-27x86/mm/cpa: Optimize __cpa_flush_range()Peter Zijlstra
If we IPI for WBINDV, then we might as well kill the entire TLB too. But if we don't have to invalidate cache, there is no reason not to use a range TLB flush. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.195633798@infradead.org
2018-09-27x86/mm/cpa: Factor common code between cpa_flush_*()Peter Zijlstra
The start of cpa_flush_range() and cpa_flush_array() is the same, use a common function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.138859183@infradead.org
2018-09-27x86/mm/cpa: Move CLFLUSH test into cpa_flush_array()Peter Zijlstra
Rather than guarding cpa_flush_array() users with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.087848187@infradead.org
2018-09-27x86/mm/cpa: Move CLFLUSH test into cpa_flush_range()Peter Zijlstra
Rather than guarding all cpa_flush_range() uses with a CLFLUSH test, put it inside. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085948.036195503@infradead.org
2018-09-27x86/mm/cpa: Use flush_tlb_kernel_range()Peter Zijlstra
Both cpa_flush_range() and cpa_flush_array() have a well specified range, use that to do a range based TLB invalidate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.985193217@infradead.org
2018-09-27x86/mm/cpa: Unconditionally avoid WBINDV when we canPeter Zijlstra
CAT has happened, WBINDV is bad (even before CAT blowing away the entire cache on a multi-core platform wasn't nice), try not to use it ever. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.933674526@infradead.org
2018-09-27x86/mm/cpa: Move flush_tlb_all()Peter Zijlstra
There is an atom errata, where we do a local TLB invalidate right before we return and then do a global TLB invalidate. Move the global invalidate up a little bit and avoid the local invalidate entirely. This does put the global invalidate under pgd_lock, but that shouldn't matter. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.882287392@infradead.org
2018-09-27x86/mm/cpa: Use flush_tlb_all()Peter Zijlstra
Instead of open-coding it.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180919085947.831102058@infradead.org
2018-09-27x86/mm/cpa: Avoid the 4k pages check completelyThomas Gleixner
The extra loop which tries hard to preserve large pages in case of conflicts with static protection regions turns out to be not preserving anything, at least not in the experiments which have been conducted. There might be corner cases in which the code would be able to preserve a large page oaccsionally, but it's really not worth the extra code and the cycles wasted in the common case. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 541 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 514 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.589642503@linutronix.de
2018-09-27x86/mm/cpa: Do the range check earlyThomas Gleixner
To avoid excessive 4k wise checks in the common case do a quick check first whether the requested new page protections conflict with a static protection area in the large page. If there is no conflict then the decision whether to preserve or to split the page can be made immediately. If the requested range covers the full large page, preserve it. Otherwise split it up. No point in doing a slow crawl in 4k steps. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 560642 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 541 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 514 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.507259989@linutronix.de
2018-09-27x86/mm/cpa: Optimize same protection checkThomas Gleixner
When the existing mapping is correct and the new requested page protections are the same as the existing ones, then further checks can be omitted and the large page can be preserved. The slow path 4k wise check will not come up with a different result. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800709 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 538 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 560642 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.424477581@linutronix.de
2018-09-27x86/mm/cpa: Add sanity check for existing mappingsThomas Gleixner
With the range check it is possible to do a quick verification that the current mapping is correct vs. the static protection areas. In case a incorrect mapping is detected a warning is emitted and the large page is split up. If the large page is a 2M page, then the split code is forced to check the static protections for the PTE entries to fix up the incorrectness. For 1G pages this can't be done easily because that would require to either find the offending 2M areas before the split or afterwards. For now just warn about that case and revisit it when reported. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.331408643@linutronix.de
2018-09-27x86/mm/cpa: Avoid static protection checks on unmapThomas Gleixner
If the new pgprot has the PRESENT bit cleared, then conflicts vs. RW/NX are completely irrelevant. Before: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800770 4K pages set-checked: 7668 After: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800709 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.245849757@linutronix.de
2018-09-27x86/mm/cpa: Add large page preservation statisticsThomas Gleixner
The large page preservation mechanism is just magic and provides no information at all. Add optional statistic output in debugfs so the magic can be evaluated. Defaults is off. Output: 1G pages checked: 2 1G pages sameprot: 0 1G pages preserved: 0 2M pages checked: 540 2M pages sameprot: 466 2M pages preserved: 47 4K pages checked: 800770 4K pages set-checked: 7668 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.160867778@linutronix.de
2018-09-27x86/mm/cpa: Add debug mechanismThomas Gleixner
The whole static protection magic is silently fixing up anything which is handed in. That's just wrong. The offending call sites need to be fixed. Add a debug mechanism which emits a warning if a requested mapping needs to be fixed up. The DETECT debug mechanism is really not meant to be enabled except for developers, so limit the output hard to the protection fixups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143546.078998733@linutronix.de
2018-09-27x86/mm/cpa: Allow range check for static protectionsThomas Gleixner
Checking static protections only page by page is slow especially for huge pages. To allow quick checks over a complete range, add the ability to do that. Make the checks inclusive so the ranges can be directly used for debug output later. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.995734490@linutronix.de
2018-09-27x86/mm/cpa: Rework static_protections()Thomas Gleixner
static_protections() is pretty unreadable. Split it up into separate checks for each protection area. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.913005317@linutronix.de
2018-09-27x86/mm/cpa: Split, rename and clean up try_preserve_large_page()Thomas Gleixner
Avoid the extra variable and gotos by splitting the function into the actual algorithm and a callable function which contains the lock protection. Rename it to should_split_large_page() while at it so the return values make actually sense. Clean up the code flow, comments and general whitespace damage while at it. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.830507216@linutronix.de
2018-09-27x86/mm/init32: Mark text and rodata RO in one goThomas Gleixner
The sequence of marking text and rodata read-only in 32bit init is: set_ro(text); kernel_set_to_readonly = 1; set_ro(rodata); When kernel_set_to_readonly is 1 it enables the protection mechanism in CPA for the read only regions. With the upcoming checks for existing mappings this consequently triggers the warning about an existing mapping being incorrect vs. static protections because rodata has not been converted yet. There is no technical reason to split the two, so just combine the RO protection to convert text and rodata in one go. Convert the printks to pr_info while at it. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Bin Yang <bin.yang@intel.com> Cc: Mark Gross <mark.gross@intel.com> Link: https://lkml.kernel.org/r/20180917143545.731701535@linutronix.de
2018-09-27Merge tag 'efi-next' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/core Pull EFI updates for v4.20 from Ard Biesheuvel: - Add support for enlisting the help of the EFI firmware to create memory reservations that persist across kexec. - Add page fault handling to the runtime services support code on x86 so we can gracefully recover from buggy EFI firmware. - Fix command line handling on x86 for the boot path that omits the stub's PE/COFF entry point. - Other assorted fixes.