summaryrefslogtreecommitdiff
path: root/arch/um/include
AgeCommit message (Collapse)Author
7 hoursMerge tag 'uml-for-linux-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Johannes Berg: "The only really new thing is the long-standing seccomp work (originally from 2021!). Wven if it still isn't enabled by default due to security concerns it can still be used e.g. for tests. - remove obsolete network transports - remove PCI IO port support - start adding seccomp-based process handling instead of ptrace" * tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (29 commits) um: remove "extern" from implementation of sigchld_handler um: fix unused variable warning um: fix SECCOMP 32bit xstate register restore um: pass FD for memory operations when needed um: Add SECCOMP support detection and initialization um: Implement kernel side of SECCOMP based process handling um: Track userspace children dying in SECCOMP mode um: Add helper functions to get/set state for SECCOMP um: Add stub side of SECCOMP/futex based process handling um: Move faultinfo extraction into userspace routine um: vector: Use mac_pton() for MAC address parsing um: vector: Clean up and modernize log messages um: chan_kern: use raw spinlock for irqs_to_free_lock MAINTAINERS: remove obsolete file entry in TUN/TAP DRIVER um: Fix tgkill compile error on old host OSes um: stop using PCI port I/O um: Remove legacy network transport infrastructure um: vector: Eliminate the dependency on uml_net um: Remove obsolete legacy network transports um/asm: Replace "REP; NOP" with PAUSE mnemonic ...
3 daysum: pass FD for memory operations when neededBenjamin Berg
Instead of always sharing the FDs with the userspace process, only hand over the FDs needed for mmap when required. The idea is that userspace might be able to force the stub into executing an mmap syscall, however, it will not be able to manipulate the control flow sufficiently to have access to an FD that would allow mapping arbitrary memory. Security wise, we need to be sure that only the expected syscalls are executed after the kernel sends FDs through the socket. This is currently not the case, as userspace can trivially jump to the rt_sigreturn syscall instruction to execute any syscall that the stub is permitted to do. With this, it can trick the kernel to send the FD, which in turn allows userspace to freely map any physical memory. As such, this is currently *not* secure. However, in principle the approach should be fine with a more strict SECCOMP filter and a careful review of the stub control flow (as userspace can prepare a stack). With some care, it is likely possible to extend the security model to SMP if desired. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-8-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 daysum: Implement kernel side of SECCOMP based process handlingBenjamin Berg
This adds the kernel side of the seccomp based process handling. Co-authored-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-6-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 daysum: Track userspace children dying in SECCOMP modeBenjamin Berg
When in seccomp mode, we would hang forever on the futex if a child has died unexpectedly. In contrast, ptrace mode will notice it and kill the corresponding thread when it fails to run it. Fix this issue using a new IRQ that is fired after a SIGCHLD and keeping an (internal) list of all MMs. In the IRQ handler, find the affected MM and set its PID to -1 as well as the futex variable to FUTEX_IN_KERN. This, together with futex returning -EINTR after the signal is sufficient to implement a race-free detection of a child dying. Note that this also enables IRQ handling while starting a userspace process. This should be safe and SECCOMP requires the IRQ in case the process does not come up properly. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-5-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 daysum: Add stub side of SECCOMP/futex based process handlingBenjamin Berg
This adds the stub side for the new seccomp process management code. In this case we do register save/restore through the signal handler mcontext. Add special code for handling TLS, which for x86_64 means setting the FS_BASE/GS_BASE registers while for i386 it means calling the set_thread_area syscall. Co-authored-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250602130052.545733-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
5 daysMerge tag 'mm-stable-2025-05-31-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
10 daysMerge tag 'v6.16-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix memcpy_sglist to handle partially overlapping SG lists - Use memcpy_sglist to replace null skcipher - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS - Hide CRYPTO_MANAGER - Add delayed freeing of driver crypto_alg structures Compression: - Allocate large buffers on first use instead of initialisation in scomp - Drop destination linearisation buffer in scomp - Move scomp stream allocation into acomp - Add acomp scatter-gather walker - Remove request chaining - Add optional async request allocation Hashing: - Remove request chaining - Add optional async request allocation - Move partial block handling into API - Add ahash support to hmac - Fix shash documentation to disallow usage in hard IRQs Algorithms: - Remove unnecessary SIMD fallback code on x86 and arm/arm64 - Drop avx10_256 xts(aes)/ctr(aes) on x86 - Improve avx-512 optimisations for xts(aes) - Move chacha arch implementations into lib/crypto - Move poly1305 into lib/crypto and drop unused Crypto API algorithm - Disable powerpc/poly1305 as it has no SIMD fallback - Move sha256 arch implementations into lib/crypto - Convert deflate to acomp - Set block size correctly in cbcmac Drivers: - Do not use sg_dma_len before mapping in sun8i-ss - Fix warm-reboot failure by making shutdown do more work in qat - Add locking in zynqmp-sha - Remove cavium/zip - Add support for PCI device 0x17D8 to ccp - Add qat_6xxx support in qat - Add support for RK3576 in rockchip-rng - Add support for i.MX8QM in caam Others: - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up - Add new SEV/SNP platform shutdown API in ccp" * tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits) x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining crypto: qat - add missing header inclusion crypto: api - Redo lookup on EEXIST Revert "crypto: testmgr - Add hash export format testing" crypto: marvell/cesa - Do not chain submitted requests crypto: powerpc/poly1305 - add depends on BROKEN for now Revert "crypto: powerpc/poly1305 - Add SIMD fallback" crypto: ccp - Add missing tee info reg for teev2 crypto: ccp - Add missing bootloader info reg for pspv5 crypto: sun8i-ce - move fallback ahash_request to the end of the struct crypto: octeontx2 - Use dynamic allocated memory region for lmtst crypto: octeontx2 - Initialize cptlfs device info once crypto: xts - Only add ecb if it is not already there crypto: lrw - Only add ecb if it is not already there crypto: testmgr - Add hash export format testing crypto: testmgr - Use ahash for generic tfm crypto: hmac - Add ahash support crypto: testmgr - Ignore EEXIST on shash allocation crypto: algapi - Add driver template support to crypto_inst_setname crypto: shash - Set reqsize in shash_alg ...
2025-05-11syscall.h: introduce syscall_set_nr()Dmitry V. Levin
Similar to syscall_set_arguments() that complements syscall_get_arguments(), introduce syscall_set_nr() that complements syscall_get_nr(). syscall_set_nr() is going to be needed along with syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. Link: https://lkml.kernel.org/r/20250303112020.GD24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> # mips Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11syscall.h: add syscall_set_arguments()Dmitry V. Levin
This function is going to be needed on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. This partially reverts commit 7962c2eddbfe ("arch: remove unused function syscall_set_arguments()") by reusing some of old syscall_set_arguments() implementations. [nathan@kernel.org: fix compile time fortify checks] Link: https://lkml.kernel.org/r/20250408213131.GA2872426@ax162 Link: https://lkml.kernel.org/r/20250303112009.GC24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> [mips] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11um: remove custom definition of mk_pte()Matthew Wilcox (Oracle)
Move the pfn_pte() definitions from the 2level and 4level files to the generic pgtable.h and delete the custom definition of mk_pte() so that we use the central definition. Link: https://lkml.kernel.org/r/20250402181709.2386022-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-07um: Include linux/types.h in asm/fpu/api.hHerbert Xu
Include linux/types.h before using bool. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505070045.vWc04ygs-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-05um: Remove legacy network transport infrastructureTiwei Bie
All legacy network transports have been removed. Vector transports provide the same capabilities with significantly higher network throughput. There is no reason to keep the legacy network transport infrastructure anymore. Remove it to reduce the maintenance burden. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20250503051710.3286595-4-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-05um: Remove obsolete legacy network transportsTiwei Bie
These legacy network transports were marked as obsolete in commit 40814b98a570 ("um: Mark non-vector net transports as obsolete"). More than five years have passed since then. Remove these network transports to reduce the maintenance burden. Suggested-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20250503051710.3286595-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-05um: Add cmpxchg8b_emu and checksum functions to asm-prototypes.hSami Tolvanen
With CONFIG_GENDWARFKSYMS, um builds fail due to missing prototypes in asm/asm-prototypes.h. Add declarations for cmpxchg8b_emu and the exported checksum functions, including csum_partial_copy_generic as it's also exported. Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: linux-kbuild@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503251216.lE4t9Ikj-lkp@intel.com/ Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://patch.msgid.link/20250326190500.847236-2-samitolvanen@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-05um: fix _nofault accessesJohannes Berg
Nathan reported [1] that when built with clang, the um kernel crashes pretty much immediately. This turned out to be an issue with the inline assembly I had added, when clang used %rax/%eax for both operands. Reorder it so current->thread.segv_continue is written first, and then the lifetime of _faulted won't have overlap with the lifetime of segv_continue. In the email thread Benjamin also pointed out that current->mm is only NULL for true kernel tasks, but we could do this for a userspace task, so the current->thread.segv_continue logic must be lifted out of the mm==NULL check. Finally, while looking at this, put a barrier() so the NULL assignment to thread.segv_continue cannot be reorder before the possibly faulting operation. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/r/20250402221254.GA384@ax162 [1] Fixes: d1d7f01f7cd3 ("um: mark rodata read-only and implement _nofault accesses") Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-04-03Merge tag 'mm-stable-2025-04-02-22-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - The series "mm: fixes for fallouts from mem_init() cleanup" from Mike Rapoport fixes a couple of issues with the just-merged "arch, mm: reduce code duplication in mem_init()" series - The series "MAINTAINERS: add my isub-entries to MM part." from Mike Rapoport does some maintenance on MAINTAINERS - The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some cleanup work to the page mapping code - The series "mseal system mappings" from Jeff Xu permits sealing of "system mappings", such as vdso, vvar, vvar_vclock, vectors (arm compat-mode), sigpage (arm compat-mode) - Plus the usual shower of singleton patches * tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits) mseal sysmap: add arch-support txt mseal sysmap: enable s390 selftest: test system mappings are sealed mseal sysmap: update mseal.rst mseal sysmap: uprobe mapping mseal sysmap: enable arm64 mseal sysmap: enable x86-64 mseal sysmap: generic vdso vvar mapping selftests: x86: test_mremap_vdso: skip if vdso is msealed mseal sysmap: kernel config and header change mm: pgtable: remove tlb_remove_page_ptdesc() x86: pgtable: convert to use tlb_remove_ptdesc() riscv: pgtable: unconditionally use tlb_remove_ptdesc() mm: pgtable: convert some architectures to use tlb_remove_ptdesc() mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc* mm: pgtable: make generic tlb_remove_table() use struct ptdesc microblaze/mm: put mm_cmdline_setup() in .init.text section mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range MAINTAINERS: mm: add entry for secretmem MAINTAINERS: mm: add entry for numa memblocks and numa emulation ...
2025-04-02Merge tag 'uml-for-linux-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Johannes Berg: - proper nofault accesses and read-only rodata - hostfs fix for host inode number reuse - fixes for host errno handling - various cleanups/small fixes * tag 'uml-for-linux-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Rewrite the sigio workaround based on epoll and tgkill um: Prohibit the VM_CLONE flag in run_helper_thread() um: Switch to the pthread-based helper in sigio workaround um: ubd: Switch to the pthread-based helper um: Add pthread-based helper support um: x86: clean up elf specific definitions um: Store full CSGSFS and SS register from mcontext um: virt-pci: Refactor virtio_pcidev into its own module um: work around sched_yield not yielding in time-travel mode um/locking: Remove semicolon from "lock" prefix um: Update min_low_pfn to match changes in uml_reserved um: use str_yes_no() to remove hardcoded "yes" and "no" um: hostfs: avoid issues on inode number reuse by host um: Allocate vdso page pointer statically um: remove copy_from_kernel_nofault_allowed um: mark rodata read-only and implement _nofault accesses um: Pass the correct Rust target and options with gcc
2025-04-01mm: pgtable: convert some architectures to use tlb_remove_ptdesc()Qi Zheng
Now, the nine architectures of csky, hexagon, loongarch, m68k, mips, nios2, openrisc, sh and um do not select CONFIG_MMU_GATHER_RCU_TABLE_FREE, and just call pagetable_dtor() + tlb_remove_page_ptdesc() (the wrapper of tlb_remove_page()). This is the same as the implementation of tlb_remove_{ptdesc|table}() under !CONFIG_MMU_GATHER_TABLE_FREE, so convert these architectures to use tlb_remove_ptdesc(). The ultimate goal is to make the architecture only use tlb_remove_ptdesc() or tlb_remove_table() for page table pages. [zhengqi.arch@bytedance.com: v2] Link: https://lkml.kernel.org/r/20250303072603.45423-1-zhengqi.arch@bytedance.com [akpm@linux-foundation.org: remove trailing semi in arch/loongarch/include/asm/pgalloc.h] Link: https://lkml.kernel.org/r/19db3e8673b67bad2f1df1ab37f1c89d99eacfea.1740454179.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickens <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-20um: Rewrite the sigio workaround based on epoll and tgkillTiwei Bie
The existing sigio workaround implementation removes FDs from the poll when events are triggered, requiring users to re-add them via add_sigio_fd() after processing. This introduces a potential race condition between FD removal in write_sigio_thread() and next_poll update in __add_sigio_fd(), and is inefficient due to frequent FD removal and re-addition. Rewrite the implementation based on epoll and tgkill for improved efficiency and reliability. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250315161910.4082396-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-20um: Add pthread-based helper supportTiwei Bie
Introduce a new set of utility functions that can be used to create pthread-based helpers. Helper threads created in this way will ensure thread safety for errno while sharing the same memory space. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250319135523.97050-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: x86: clean up elf specific definitionsHajime Tazaki
The file arch/x86/um/asm/module.h is equivalent to the definition of asm-generic. Thus this commit cleans up to use it. Signed-off-by: Hajime Tazaki <thehajime@gmail.com> Link: https://patch.msgid.link/2d70a0ed79ee0a0bef80ad4790063f4833dd9bed.1737348399.git.thehajime@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: work around sched_yield not yielding in time-travel modeBenjamin Berg
sched_yield by a userspace may not actually cause scheduling in time-travel mode as no time has passed. In the case seen it appears to be a badly implemented userspace spinlock in ASAN. Unfortunately, with time-travel it causes an extreme slowdown or even deadlock depending on the kernel configuration (CONFIG_UML_MAX_USERSPACE_ITERATIONS). Work around it by accounting time to the process whenever it executes a sched_yield syscall. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250314130815.226872-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: remove copy_from_kernel_nofault_allowedBenjamin Berg
There is no need to override the default version of this function anymore as UML now has proper _nofault memory access functions. Doing this also fixes the fact that the implementation was incorrect as using mincore() will incorrectly flag pages as inaccessible if they were swapped out by the host. Fixes: f75b1b1bedfb ("um: Implement probe_kernel_read()") Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250210160926.420133-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-18um: mark rodata read-only and implement _nofault accessesJohannes Berg
Mark read-only data actually read-only (simple mprotect), and to be able to test it also implement _nofault accesses. This works by setting up a new "segv_continue" pointer in current, and then when we hit a segfault we change the signal return context so that we continue at that address. The code using this sets it up so that it jumps to a label and then aborts the access that way, returning -EFAULT. It's possible to optimize the ___backtrack_faulted() thing by using asm goto (compiler version dependent) and/or gcc's (not sure if clang has it) &&label extension, but at least in one attempt I made the && caused the compiler to not load -EFAULT into the register in case of jumping to the &&label from the fault handler. So leave it like this for now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Co-developed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250210160926.420133-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-03-17arch, mm: set max_mapnr when allocating memory map for FLATMEMMike Rapoport (Microsoft)
max_mapnr is essentially the size of the memory map for systems that use FLATMEM. There is no reason to calculate it in each and every architecture when it's anyway calculated in alloc_node_mem_map(). Drop setting of max_mapnr from architecture code and set it once in alloc_node_mem_map(). While on it, move definition of mem_map and max_mapnr to mm/mm_init.c so there won't be two copies for MMU and !MMU variants. Link: https://lkml.kernel.org/r/20250313135003.836600-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Tested-by: Mark Brown <broonie@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren (csky) <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Weinberger <richard@nod.at> Cc: Russel King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-30Merge tag 'uml-for-linus-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - hostfs: Convert to writepages - many cleanups: removal of dead macros, missing __init * tag 'uml-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: Remove unused asm/archparam.h header um: Include missing headers in asm/pgtable.h hostfs: Convert to writepages um: rtc: use RTC time when calculating the alarm um: Remove unused user_context function um: Remove unused THREAD_NAME_LEN macro um: Remove unused PGD_BOUND macro um: Mark setup_env_path as __init um: Mark install_fatal_handler as __init um: Mark set_stklim as __init um: Mark get_top_address as __init um: Mark parse_cache_line as __init um: Mark parse_host_cpu_flags as __init um: Count iomem_size only once in physmem calculation um: Remove obsolete fixmap support um: Remove unused MODULES_LEN macro
2025-01-25mm: pgtable: introduce pagetable_dtor()Qi Zheng
The pagetable_p*_dtor() are exactly the same except for the handling of ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor() to do this. Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that ptlock and page table pages can be freed together (regardless of whether RCU is used). This prevents the use-after-free problem where the ptlock is freed immediately but the page table pages is freed later via RCU. Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13um: Include missing headers in asm/pgtable.hTiwei Bie
Formerly, asm/pgtable.h relied on the implicit inclusion of asm/page.h and linux/mm_types.h via asm/fixmap.h. With the removal of asm/fixmap.h, these headers need to be included explicitly in asm/pgtable.h now. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501131814.E6CxjulL-lkp@intel.com/ Fixes: 5bfc4a3a0af3 ("um: Remove obsolete fixmap support") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20250113153218.3331321-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-10um: Remove obsolete fixmap supportTiwei Bie
It was added to support highmem. But since the highmem support has been removed by commit a98a6d864d3b ("um: Remove broken highmem support"), it is no longer needed. Remove it to simplify the code. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241128081939.2216246-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-10um: Remove unused MODULES_LEN macroTiwei Bie
It's not used anywhere. And its definition doesn't seem right. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241128081939.2216246-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-30Merge tag 'uml-for-linus-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Lots of cleanups, mostly from Benjamin Berg and Tiwei Bie - Removal of unused code - Fix for sparse warnings - Cleanup around stub_exe() * tag 'uml-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (68 commits) hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio() um: move thread info into task um: Always dump trace for specified task in show_stack um: vector: Do not use drvdata in release um: net: Do not use drvdata in release um: ubd: Do not use drvdata in release um: ubd: Initialize ubd's disk pointer in ubd_add um: virtio_uml: query the number of vqs if supported um: virtio_uml: fix call_fd IRQ allocation um: virtio_uml: send SET_MEM_TABLE message with the exact size um: remove broken double fault detection um: remove duplicate UM_NSEC_PER_SEC definition um: remove file sync for stub data um: always include kconfig.h and compiler-version.h um: set DONTDUMP and DONTFORK flags on KASAN shadow memory um: fix sparse warnings in signal code um: fix sparse warnings from regset refactor um: Remove double zero check um: fix stub exe build with CONFIG_GCOV um: Use os_set_pdeathsig helper in winch thread/process ...
2024-11-20Merge tag 'asm-generic-3.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are a number of unrelated cleanups, generally simplifying the architecture specific header files: - A series from Al Viro simplifies asm/vga.h, after it turns out that most of it can be generalized. - A series from Julian Vetter adds a common version of memcpy_{to,from}io() and memset_io() and changes most architectures to use that instead of their own implementation - A series from Niklas Schnelle concludes his work to make PC style inb()/outb() optional - Nicolas Pitre contributes improvements for the generic do_div() helper - Christoph Hellwig adds a generic version of page_to_phys() and phys_to_page(), replacing the slightly different architecture specific definitions. - Uwe Kleine-Koenig has a minor cleanup for ioctl definitions" * tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits) empty include/asm-generic/vga.h sparc: get rid of asm/vga.h asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need to vt_buffer.h: get rid of dead code in default scr_...() instances tty: serial: export serial_8250_warn_need_ioport lib/iomem_copy: fix kerneldoc format style hexagon: simplify asm/io.h for !HAS_IOPORT loongarch: Use new fallback IO memcpy/memset csky: Use new fallback IO memcpy/memset arm64: Use new fallback IO memcpy/memset New implementation for IO memcpy and IO memset watchdog: Add HAS_IOPORT dependency for SBC8360 and SBC7240 __arch_xprod64(): make __always_inline when optimizing for performance ARM: div64: improve __arch_xprod_64() asm-generic/div64: optimize/simplify __div64_const32() lib/math/test_div64: add some edge cases relevant to __div64_const32() asm-generic: add an optional pfn_valid check to page_to_phys asm-generic: provide generic page_to_phys and phys_to_page implementations asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=n tty: serial: handle HAS_IOPORT dependencies ...
2024-11-12um: move thread info into taskBenjamin Berg
This selects the THREAD_INFO_IN_TASK option for UM and changes the way that the current task is discovered. This is trivial though, as UML already tracks the current task in cpu_tasks[] and this can be used to retrieve it. Also remove the signal handler code that copies the thread information into the IRQ stack. It is obsolete now, which also means that the mentioned race condition cannot happen anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Hajime Tazaki <thehajime@gmail.com> Link: https://patch.msgid.link/20241111102910.46512-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: remove broken double fault detectionBenjamin Berg
The show_stack function had some code to detect double faults. However, the logic is wrong and it would e.g. trigger if a WARNING happened inside an IRQ. Remove it without trying to add a new logic. The current behaviour, which will just fault repeatedly until the IRQ stack is used up and the host kills UML, seems to be good enough. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241103150506.1367695-5-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: remove duplicate UM_NSEC_PER_SEC definitionBenjamin Berg
Just remove the first entry as there is a second later on. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241103150506.1367695-4-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: remove file sync for stub dataBenjamin Berg
There is no need to sync the stub code to "disk" for the other process to see the correct memory. Drop the fsync there and remove the helper function. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241103150506.1367695-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: always include kconfig.h and compiler-version.hBenjamin Berg
Since commit a95b37e20db9 ("kbuild: get <linux/compiler_types.h> out of <linux/kconfig.h>") we can safely include these files in userspace code. Doing so simplifies matters as options do not need to be exported via asm-offsets.h anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241103150506.1367695-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-28asm-generic: provide generic page_to_phys and phys_to_page implementationsChristoph Hellwig
page_to_phys is duplicated by all architectures, and from some strange reason placed in <asm/io.h> where it doesn't fit at all. phys_to_page is only provided by a few architectures despite having a lot of open coded users. Provide generic versions in <asm-generic/memory_model.h> to make these helpers more easily usable. Note with this patch powerpc loses the CONFIG_DEBUG_VIRTUAL pfn_valid check. It will be added back in a generic version later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-25um: Add os_set_pdeathsig helper functionTiwei Bie
This helper can be used to set the parent-death signal of the calling process to SIGKILL to ensure that the process will be killed if the UML kernel dies unexpectedly without proper cleanup. This helper will be used in the follow-up patches. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241024142828.2612828-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: switch to regset API and depend on XSTATEBenjamin Berg
The PTRACE_GETREGSET API has now existed since Linux 2.6.33. The XSAVE CPU feature should also be sufficiently common to be able to rely on it. With this, define our internal FP state to be the hosts XSAVE data. Add discovery for the hosts XSAVE size and place the FP registers at the end of task_struct so that we can adjust the size at runtime. Next we can implement the regset API on top and update the signal handling as well as ptrace APIs to use them. Also switch coredump creation to use the regset API and finally set HAVE_ARCH_TRACEHOOK. This considerably improves the signal frames. Previously they might not have contained all the registers (i386) and also did not have the sizes and magic values set to the correct values to permit userspace to decode the frame. As a side effect, this will permit UML to run on hosts with newer CPU extensions (such as AMX) that need even more register state. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241023094120.4083426-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: insert scheduler ticks when userspace does not yieldBenjamin Berg
In time-travel mode userspace can do a lot of work without any time passing. Unfortunately, this can result in OOM situations as the RCU core code will never be run. Work around this by keeping track of userspace processes that do not yield for a lot of operations. When this happens, insert a jiffie into the sched_clock clock to account time against the process and cause the bookkeeping to run. As sched_clock is used for tracing, it is useful to keep it in sync between the different VMs. As such, try to remove added ticks again when the actual clock ticks. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Rename _PAGE_NEWPAGE to _PAGE_NEEDSYNCTiwei Bie
The _PAGE_NEWPAGE bit does not really indicate that this is a new page, but rather whether this entry needs to be synced or not. Renaming it to _PAGE_NEEDSYNC will make it more clear how everything ties together. Suggested-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011102354.1682626-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Abandon the _PAGE_NEWPROT bitTiwei Bie
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will always be set. And the corresponding page will always be mapped or unmapped depending on whether the PTE is present or not. The check on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will allow us to simplify the code and remove the unreachable code. Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: remove fault_catcher infrastructureJohannes Berg
This was perhaps intended to do _nofault copies, but the real reason is lost to history. Remove this, it's not needed, and using longjmp() out of the middle of the signal handler with all the state it has modified is not going to be a good idea anyway. Link: https://patch.msgid.link/20241010224513.901c4d390b3e.Ia74742668b44603c1ca23dd36f90e964e6e7ee55@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-16vdso: Introduce vdso/page.hVincenzo Frascino
The VDSO implementation includes headers from outside of the vdso/ namespace. Introduce vdso/page.h to make sure that the generic library uses only the allowed namespace. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Link: https://lore.kernel.org/all/20241014151340.1639555-3-vincenzo.frascino@arm.com
2024-10-10um: Switch to 4 level page tables on 64 bitBenjamin Berg
The larger memory space is useful to support more applications inside UML. One example for this is ASAN instrumentation of userspace applications which requires addresses that would otherwise not be available. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-11-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Discover host_task_size from envpBenjamin Berg
When loading the UML binary, the host kernel will place the stack at the highest possible address. It will then map the program name and environment variables onto the start of the stack. As such, an easy way to figure out the host_task_size is to use the highest pointer to an environment variable as a reference. Ensure that this works by disabling address layout randomization and re-executing UML in case it was enabled. This increases the available TASK_SIZE for 64 bit UML considerably. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-9-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: use execveat to create userspace MMsBenjamin Berg
Using clone will not undo features that have been enabled by libc. An example of this already happening is rseq, which could cause the kernel to read/write memory of the userspace process. In the future the standard library might also use mseal by default to protect itself, which would also thwart our attempts at unmapping everything. Solve all this by taking a step back and doing an execve into a tiny static binary that sets up the minimal environment required for the stub without using any standard library. That way we have a clean execution environment that is fully under the control of UML. Note that this changes things a bit as the FDs are not anymore shared with the kernel. Instead, we explicitly share the FDs for the physical memory and all existing iomem regions. Doing this is fine, as iomem regions cannot be added at runtime. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net [use pipe() instead of pipe2(), remove unneeded close() calls] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: remove auxiliary FP registersBenjamin Berg
We do not need the extra save/restore of the FP registers when getting the fault information. This was originally added in commit 2f56debd77a8 ("uml: fix FP register corruption") but at that time the code was not saving/restoring the FP registers when switching to userspace. This was fixed in commit fbfe9c847edf ("um: Save FPU registers between task switches") and since then the auxiliary registers have not been useful. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241004233821.2130874-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Remove 3-level page table support on i386Tiwei Bie
The highmem support has been removed by commit a98a6d864d3b ("um: Remove broken highmem support"). The 2-level page table is sufficient on UML/i386 now. Remove the 3-level page table support on UML/i386 which is still marked as experimental. Suggested-by: Benjamin Berg <benjamin@sipsolutions.net> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240918061702.614837-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>