path: root/arch/loongarch
Age  Commit message  Author
2025-02-13  LoongArch: Remove the deprecated notifier hook mechanism  [Yuli Wang]
The notifier hook mechanism in proc and cpuinfo is actually unnecessary for LoongArch because it's not used anywhere. It was originally added to the MIPS code in commit d6d3c9afaab4 ("MIPS: MT: proc: Add support for printing VPE and TC ids"), and LoongArch then inherited it. But as the kernel code stands now, this notifier hook mechanism doesn't really make sense for either LoongArch or MIPS. In addition, the seq_file forward declaration needs to be moved to its proper place, as only the show_ipi_list() function in smp.c requires it. Co-developed-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-02-13  LoongArch: Use str_yes_no() helper function for /proc/cpuinfo  [Yuli Wang]
Remove hard-coded strings by using the str_yes_no() helper function. Similar to commit c4a0a4a45a45 ("MIPS: kernel: proc: Use str_yes_no() helper function"). Co-developed-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Wentao Guan <guanwentao@uniontech.com> Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
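As an illustration of the kind of change this implies (hypothetical before/after; the exact /proc/cpuinfo lines touched are not listed in the commit message, and cpu_has_watch is only a stand-in flag):

  /* before: hard-coded strings */
  seq_printf(m, "hardware watchpoint\t: %s\n",
             cpu_has_watch ? "yes" : "no");

  /* after: generic helper from <linux/string_choices.h> */
  seq_printf(m, "hardware watchpoint\t: %s\n",
             str_yes_no(cpu_has_watch));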
2025-02-13  LoongArch: Fix kernel_page_present() for KPRANGE/XKPRANGE  [Huacai Chen]
kernel_page_present() currently always returns true for KPRANGE/XKPRANGE addresses. This is incorrect, because hibernation (ACPI S4) uses it to distinguish whether a page is saveable. If all KPRANGE/XKPRANGE addresses are considered saveable, then reserved memory such as EFI_RUNTIME_SERVICES_CODE / EFI_RUNTIME_SERVICES_DATA will also be saved and restored. Fix this by returning true only if the KPRANGE/XKPRANGE address is in memblock.memory.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
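A conceptual sketch of the described check (the helper name is hypothetical, memblock_is_memory() stands in for the "is in memblock.memory" test, and the KPRANGE/XKPRANGE range check itself is omitted):

  /* Hypothetical helper, not the actual patch. */
  static bool kpr_xkpr_page_saveable(struct page *page)
  {
          unsigned long addr = (unsigned long)page_address(page);

          /* Only pages backed by real RAM (memblock.memory) are saveable;
           * reserved regions reached through KPRANGE/XKPRANGE are not. */
          return memblock_is_memory(__pa(addr));
  }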
2025-02-13  LoongArch: Fix idle VS timer enqueue  [Marco Crivellari]
LoongArch re-enables interrupts in its idle routine and performs a TIF_NEED_RESCHED check afterwards before putting the CPU to sleep. IRQs firing between the check and the idle instruction may set the TIF_NEED_RESCHED flag. In order to deal with such a race, IRQs interrupting __arch_cpu_idle() roll back their return address to the beginning of __arch_cpu_idle(), so that TIF_NEED_RESCHED is checked again before going back to sleep.

However, idle IRQs can also queue timers that may require a tick reprogramming through a new generic idle loop iteration. Those timers go unnoticed here because __arch_cpu_idle() only checks TIF_NEED_RESCHED; it doesn't check for pending timers.

Fix this by fast-forwarding the idle IRQs' return address to the end of the idle routine instead of the beginning, so that the generic idle loop can handle both TIF_NEED_RESCHED and pending timers.

Fixes: 0603839b18f4 ("LoongArch: Add exception/interrupt handling")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-02-12  loongarch: dts: remove non-existent DAC from 2k1000-ref  [Conor Dooley]
The 2k1000 reference boards do not have a Rohm DAC on them as far as I can tell, and they certainly do not have a dh2228fv, as this device does not actually exist! Remove the dac nodes from the devicetrees as it is not acceptable to pretend to have a device on a board in order to bind the spidev driver in Linux. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240717-preacher-sandal-2aeffa322b9f@spud Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
2025-02-08  lib/crc32: remove "_le" from crc32c base and arch functions  [Eric Biggers]
Following the standardization on crc32c() as the lib entry point for the Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(), and __crc32c_le(), make the same change to the underlying base and arch functions that implement it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
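In practice this is a pure rename of the underlying functions, e.g. (prototypes shown for illustration only, following the existing lib/crc32 parameter list):

  /* before */
  u32 crc32c_le_base(u32 crc, const u8 *p, size_t len);
  u32 crc32c_le_arch(u32 crc, const u8 *p, size_t len);

  /* after */
  u32 crc32c_base(u32 crc, const u8 *p, size_t len);
  u32 crc32c_arch(u32 crc, const u8 *p, size_t len);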
2025-01-28  Merge tag 'loongarch-6.14' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Migrate to the generic rule for built-in DTB - Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is enabled - Derive timer max_delta from PRCFG1's timer_bits - Correct the cacheinfo sharing information - Add pgprot_nx() implementation - Add debugfs entries to switch SFB/TSO state - Change the maximum number of watchpoints - Some bug fixes and other small changes * tag 'loongarch-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Extend the maximum number of watchpoints LoongArch: Change 8 to 14 for LOONGARCH_MAX_{BRP,WRP} LoongArch: Add debugfs entries to switch SFB/TSO state LoongArch: Fix warnings during S3 suspend LoongArch: Adjust SETUP_SLEEP and SETUP_WAKEUP LoongArch: Refactor bug_handler() implementation LoongArch: Add pgprot_nx() implementation LoongArch: Correct the __switch_to() prototype in comments LoongArch: Correct the cacheinfo sharing information LoongArch: Derive timer max_delta from PRCFG1's timer_bits LoongArch: Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is enabled LoongArch: Migrate to the generic rule for built-in DTB
2025-01-26  Merge tag 'mm-stable-2025-01-26-14-59' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. 
This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. 
To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-26  LoongArch: Extend the maximum number of watchpoints  [Tiezhu Yang]
The maximum number of load/store watchpoints and fetch instruction watchpoints is 14 each according to the LoongArch Reference Manual, so extend the maximum number of watchpoints from 8 to 14 for ptrace. An initial version simply changed 8 to 14 in the definition of struct user_watch_state, but that would break the UAPI, so add a new struct user_watch_state_v2 instead.

As far as can be told, the only userspace users of this struct are GDB and LLDB, and the analysis shows no software compatibility problems between applications and the kernel; compatibility was considered during development and testing. When a userspace application reads the watchpoint state, it specifies a length that is no bigger than sizeof(struct user_watch_state) or sizeof(struct user_watch_state_v2), and the actual length is the minimum of the application's and the kernel's values, as handled by the generic ptrace code:

  kernel/ptrace.c: ptrace_regset():

    kiov->iov_len = min(kiov->iov_len,
                        (__kernel_size_t) (regset->n * regset->size));

    if (req == PTRACE_GETREGSET)
        return copy_regset_to_user(task, view, regset_no, 0,
                                   kiov->iov_len, kiov->iov_base);
    else
        return copy_regset_from_user(task, view, regset_no, 0,
                                     kiov->iov_len, kiov->iov_base);

For example, all four combinations work well:

  (1) older kernel + older gdb: the actual length is 8+(8+8+4+4)*8=200;
  (2) newer kernel + newer gdb: the actual length is 8+(8+8+4+4)*14=344;
  (3) older kernel + newer gdb: the actual length is 8+(8+8+4+4)*8=200;
  (4) newer kernel + older gdb: the actual length is 8+(8+8+4+4)*8=200.

Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
Cc: stable@vger.kernel.org
Fixes: 1a69f7a161a7 ("LoongArch: ptrace: Expose hardware breakpoints to debuggers")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
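The size arithmetic above (8 + (8+8+4+4)*14 = 344) implies a layout along the following lines for the new UAPI struct; this is shown for illustration only and simply mirrors the existing user_watch_state with a larger array:

  struct user_watch_state_v2 {
          __u64   dbg_info;
          struct {
                  __u64   addr;
                  __u64   mask;
                  __u32   ctrl;
                  __u32   pad;
          } dbg_regs[14];
  };      /* sizeof == 8 + (8+8+4+4) * 14 = 344 bytes */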
2025-01-26  LoongArch: Change 8 to 14 for LOONGARCH_MAX_{BRP,WRP}  [Tiezhu Yang]
The maximum number of load/store watchpoints and fetch instruction watchpoints is 14 each according to LoongArch Reference Manual, so change 8 to 14 for the related code. Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints Cc: stable@vger.kernel.org Fixes: edffa33c7bb5 ("LoongArch: Add hardware breakpoints/watchpoints support") Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-26  LoongArch: Add debugfs entries to switch SFB/TSO state  [Huacai Chen]
We need to switch SFB (Store Fill Buffer) and TSO (Total Store Order) state at runtime to debug memory management and KVM virtualization, so add two debugfs entries "sfb_state" and "tso_state" under the directory /sys/kernel/debug/loongarch.

  Query SFB:   cat /sys/kernel/debug/loongarch/sfb_state
  Enable SFB:  echo 1 > /sys/kernel/debug/loongarch/sfb_state
  Disable SFB: echo 0 > /sys/kernel/debug/loongarch/sfb_state

  Query TSO:   cat /sys/kernel/debug/loongarch/tso_state
  Switch TSO:  echo [TSO] > /sys/kernel/debug/loongarch/tso_state

Available [TSO] states:
  0 (No Load No Store)
  1 (All Load No Store)
  3 (Same Load No Store)
  4 (No Load All Store)
  5 (All Load All Store)
  7 (Same Load All Store)

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
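A minimal sketch of how such entries are typically wired up with debugfs (the fops symbols and initcall shown here are hypothetical names for illustration, not taken from the patch):

  static struct dentry *loongarch_debugfs_dir;

  static int __init loongarch_debugfs_init(void)
  {
          loongarch_debugfs_dir = debugfs_create_dir("loongarch", NULL);

          debugfs_create_file("sfb_state", 0644, loongarch_debugfs_dir,
                              NULL, &sfb_state_fops);   /* hypothetical fops */
          debugfs_create_file("tso_state", 0644, loongarch_debugfs_dir,
                              NULL, &tso_state_fops);   /* hypothetical fops */
          return 0;
  }
  arch_initcall(loongarch_debugfs_init);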
2025-01-26  LoongArch: Fix warnings during S3 suspend  [Huacai Chen]
The enable_gpe_wakeup() function calls acpi_enable_all_wakeup_gpes(), and the latter may call the preempt_schedule_common() function, resulting in a thread switch and causing the CPU to be in an interrupt-enabled state after the enable_gpe_wakeup() function returns, leading to the following warnings:

[ C0] WARNING: ... at kernel/time/timekeeping.c:845 ktime_get+0xbc/0xc8
[ C0] ...
[ C0] Call Trace:
[ C0] [<90000000002243b4>] show_stack+0x64/0x188
[ C0] [<900000000164673c>] dump_stack_lvl+0x60/0x88
[ C0] [<90000000002687e4>] __warn+0x8c/0x148
[ C0] [<90000000015e9978>] report_bug+0x1c0/0x2b0
[ C0] [<90000000016478e4>] do_bp+0x204/0x3b8
[ C0] [<90000000025b1924>] exception_handlers+0x1924/0x10000
[ C0] [<9000000000343bbc>] ktime_get+0xbc/0xc8
[ C0] [<9000000000354c08>] tick_sched_timer+0x30/0xb0
[ C0] [<90000000003408e0>] __hrtimer_run_queues+0x160/0x378
[ C0] [<9000000000341f14>] hrtimer_interrupt+0x144/0x388
[ C0] [<9000000000228348>] constant_timer_interrupt+0x38/0x48
[ C0] [<90000000002feba4>] __handle_irq_event_percpu+0x64/0x1e8
[ C0] [<90000000002fed48>] handle_irq_event_percpu+0x20/0x80
[ C0] [<9000000000306b9c>] handle_percpu_irq+0x5c/0x98
[ C0] [<90000000002fd4a0>] generic_handle_domain_irq+0x30/0x48
[ C0] [<9000000000d0c7b0>] handle_cpu_irq+0x70/0xa8
[ C0] [<9000000001646b30>] handle_loongarch_irq+0x30/0x48
[ C0] [<9000000001646bc8>] do_vint+0x80/0xe0
[ C0] [<90000000002aea1c>] finish_task_switch.isra.0+0x8c/0x2a8
[ C0] [<900000000164e34c>] __schedule+0x314/0xa48
[ C0] [<900000000164ead8>] schedule+0x58/0xf0
[ C0] [<9000000000294a2c>] worker_thread+0x224/0x498
[ C0] [<900000000029d2f0>] kthread+0xf8/0x108
[ C0] [<9000000000221f28>] ret_from_kernel_thread+0xc/0xa4
[ C0]
[ C0] ---[ end trace 0000000000000000 ]---

The root cause is that acpi_enable_all_wakeup_gpes() uses a mutex to protect acpi_hw_enable_all_wakeup_gpes(), and acpi_ut_acquire_mutex() may cause a thread switch. Since there is no longer concurrent execution during loongarch_acpi_suspend(), we can call acpi_hw_enable_all_wakeup_gpes() directly in enable_gpe_wakeup(). The solution is similar to commit 22db06337f590d01 ("ACPI: sleep: Avoid breaking S3 wakeup due to might_sleep()").

Fixes: 366bb35a8e48 ("LoongArch: Add suspend (ACPI S3) support")
Signed-off-by: Qunqin Zhao <zhaoqunqin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
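With that, the fix essentially boils down to something like the following sketch of enable_gpe_wakeup() (illustrative, assuming the function shape described above):

  static void enable_gpe_wakeup(void)
  {
          /* Call the ACPICA internal directly: no mutex is taken, so no
           * reschedule can happen while suspending with IRQs expected off. */
          acpi_hw_enable_all_wakeup_gpes();
  }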
2025-01-25  mm/memblock: add memblock_alloc_or_panic interface  [Guo Weikang]
Before SLUB initialization, various subsystems used memblock_alloc to allocate memory. In most cases, when memory allocation fails, an immediate panic is required. To simplify this behavior and reduce repetitive checks, introduce `memblock_alloc_or_panic`. This function ensures that memory allocation failures result in a panic automatically, improving code readability and consistency across subsystems that require this behavior. [guoweikang.kernel@gmail.com: arch/s390: save_area_alloc default failure behavior changed to panic] Link: https://lkml.kernel.org/r/20250109033136.2845676-1-guoweikang.kernel@gmail.com Link: https://lore.kernel.org/lkml/Z2fknmnNtiZbCc7x@kernel.org/ Link: https://lkml.kernel.org/r/20250102072528.650926-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
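Conceptually the new interface is a thin wrapper along these lines (simplified sketch; the in-tree helper in <linux/memblock.h> may differ in detail, e.g. in how it reports the caller):

  static inline void *memblock_alloc_or_panic(phys_addr_t size, phys_addr_t align)
  {
          void *addr = memblock_alloc(size, align);

          if (!addr)
                  panic("%s: Failed to allocate %llu bytes\n",
                        __func__, (unsigned long long)size);
          return addr;
  }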
2025-01-25  asm-generic: pgalloc: provide generic __pgd_{alloc,free}  [Kevin Brodsky]
We already have a generic implementation of alloc/free up to P4D level, as well as pgd_free(). Let's finish the work and add a generic PGD-level alloc helper as well. Unlike at lower levels, almost all architectures need some specific magic at PGD level (typically initialising PGD entries), so introducing a generic pgd_alloc() isn't worth it. Instead we introduce two new helpers, __pgd_alloc() and __pgd_free(), and make use of them in the arch-specific pgd_alloc() and pgd_free() wherever possible. To accommodate as many arch as possible, __pgd_alloc() takes a page allocation order. Because pagetable_alloc() allocates zeroed pages, explicit zeroing in pgd_alloc() becomes redundant and we can get rid of it. Some trivial implementations of pgd_free() also become unnecessary once __pgd_alloc() is used; remove them. Another small improvement is consistent accounting of PGD pages by using GFP_PGTABLE_{USER,KERNEL} as appropriate. Not all PGD allocations can be handled by the generic helpers. In particular, multiple architectures allocate PGDs from a kmem_cache, and those PGDs may not be page-sized. Link: https://lkml.kernel.org/r/20250103184415.2744423-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
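A rough sketch of what such generic helpers could look like, under the constraints described above (an allocation order parameter and consistent GFP_PGTABLE_{USER,KERNEL} accounting); this is illustrative, not the exact in-tree code:

  static inline pgd_t *__pgd_alloc(struct mm_struct *mm, unsigned int order)
  {
          gfp_t gfp = GFP_PGTABLE_USER;
          struct ptdesc *ptdesc;

          if (mm == &init_mm)
                  gfp = GFP_PGTABLE_KERNEL;

          /* Pages come back zeroed: GFP_PGTABLE_* include __GFP_ZERO. */
          ptdesc = pagetable_alloc(gfp, order);
          if (!ptdesc)
                  return NULL;
          return ptdesc_address(ptdesc);
  }

  static inline void __pgd_free(struct mm_struct *mm, pgd_t *pgd)
  {
          pagetable_free(virt_to_ptdesc(pgd));
  }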
2025-01-25  mm: pgtable: introduce pagetable_dtor()  [Qi Zheng]
The pagetable_p*_dtor() are exactly the same except for the handling of ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor() to do this. Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that ptlock and page table pages can be freed together (regardless of whether RCU is used). This prevents the use-after-free problem where the ptlock is freed immediately but the page table pages is freed later via RCU. Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
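Per the description, the unified destructor ends up roughly like this (a simplified sketch; helper names follow the existing ptdesc API, but details may differ from the actual patch):

  static inline void pagetable_dtor(struct ptdesc *ptdesc)
  {
          struct folio *folio = ptdesc_folio(ptdesc);

          /* ptlock_free() now tolerates ptdesc->ptl being NULL, so the same
           * destructor works for PTE, PMD, PUD and P4D tables. */
          ptlock_free(ptdesc);
          __folio_clear_pgtable(folio);
          lruvec_stat_sub_folio(folio, NR_PAGETABLE);
  }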
2025-01-25  mm: add build-time option for hotplug memory default online type  [Gregory Price]
Memory hotplug presently auto-onlines memory into a zone the kernel deems appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y. The memhp_default_state boot param enables runtime config, but it's not possible to do this at build-time. Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.

Selections:
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE        => mhp_default_online_type = "offline"
    Memory will not be onlined automatically.
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO    => mhp_default_online_type = "online"
    Memory will be onlined automatically in a zone deemed appropriate by the kernel.
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL  => mhp_default_online_type = "online_kernel"
    Memory will be onlined automatically. The zone may allow kernel data (e.g. ZONE_NORMAL).
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE => mhp_default_online_type = "online_movable"
    Memory will be onlined automatically. The zone will be ZONE_MOVABLE.

Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior. Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.

[gourry@gourry.net: update KConfig comments]
Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  [Linus Torvalds]
Pull kvm updates from Paolo Bonzini: "Loongarch: - Clear LLBCTL if secondary mmu mapping changes - Add hypercall service support for usermode VMM x86: - Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a direct call to kvm_tdp_page_fault() when RETPOLINE is enabled - Ensure that all SEV code is compiled out when disabled in Kconfig, even if building with less brilliant compilers - Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes - Use str_enabled_disabled() to replace open coded strings - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache prior to every VM-Enter - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM - Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU - Make the completion of hypercalls go through the complete_hypercall function pointer argument, no matter if the hypercall exits to userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically went to userspace, and all the others did not; the new code need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at all whether there was an exit to userspace or not - As part of enabling TDX virtual machines, support support separation of private/shared EPT into separate roots. When TDX will be enabled, operations on private pages will need to go through the privileged TDX Module via SEAMCALLs; as a result, they are limited and relatively slow compared to reading a PTE. The patches included in 6.14 allow KVM to keep a mirror of the private EPT in host memory, and define entries in kvm_x86_ops to operate on external page tables such as the TDX private EPT - The recently introduced conversion of the NX-page reclamation kthread to vhost_task moved the task under the main process. The task is created as soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that didn't expect to see any child task of the VM process until it started creating its own userspace threads. In particular crosvm refuses to fork() if procfs shows any child task, so unbreak it by creating the task lazily. This is arguably a userspace bug, as there can be other kinds of legitimate worker tasks and they wouldn't impede fork(); but it's not like userspace has a way to distinguish kernel worker tasks right now. Should they show as "Kthread: 1" in proc/.../status? 
x86 - Intel: - Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit while L2 is active, while ultimately results in a hardware-accelerated L1 EOI effectively being lost - Honor event priority when emulating Posted Interrupt delivery during nested VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the interrupt - Rework KVM's processing of the Page-Modification Logging buffer to reap entries in the same order they were created, i.e. to mark gfns dirty in the same order that hardware marked the page/PTE dirty - Misc cleanups Generic: - Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when setting memory regions and add a dedicated API for setting KVM-internal memory regions. The API can then explicitly disallow all flags for KVM-internal memory regions - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug where KVM would return a pointer to a vCPU prior to it being fully online, and give kvm_for_each_vcpu() similar treatment to fix a similar flaw - Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a bug where userspace could coerce KVM into handling the ioctl on a vCPU that isn't yet onlined - Gracefully handle xarray insertion failures; even though such failures are impossible in practice after xa_reserve(), reserving an entry is always followed by xa_store() which does not know (or differentiate) whether there was an xa_reserve() before or not RISC-V: - Zabha, Svvptc, and Ziccrse extension support for guests. None of them require anything in KVM except for detecting them and marking them as supported; Zabha adds byte and halfword atomic operations, while the others are markers for specific operation of the TLB and of LL/SC instructions respectively - Virtualize SBI system suspend extension for Guest/VM - Support firmware counters which can be used by the guests to collect statistics about traps that occur in the host Selftests: - Rework vcpu_get_reg() to return a value instead of using an out-param, and update all affected arch code accordingly - Convert the max_guest_memory_test into a more generic mmu_stress_test. The basic gist of the "conversion" is to have the test do mprotect() on guest memory while vCPUs are accessing said memory, e.g. to verify KVM and mmu_notifiers are working as intended - Play nice with treewrite builds of unsupported architectures, e.g. arm (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the target architecture is actually one KVM selftests supports - Use the kernel's $(ARCH) definition instead of the target triple for arch specific directories, e.g. arm64 instead of aarch64, mainly so as not to be different from the rest of the kernel - Ensure that format strings for logging statements are checked by the compiler even when the logging statement itself is disabled - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. 
It seems that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that events are counting correctly without actually knowing what the events count given the underlying hardware; this can happen if Intel reuses a formerly microarchitecture-specific event encoding as an architectural event, as was the case for Top-Down Slots" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits) kvm: defer huge page recovery vhost task to later KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault() KVM: Disallow all flags for KVM-internal memslots KVM: x86: Drop double-underscores from __kvm_set_memory_region() KVM: Add a dedicated API for setting KVM-internal memslots KVM: Assert slots_lock is held when setting memory regions KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API) LoongArch: KVM: Add hypercall service support for usermode VMM LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup() KVM: VMX: read the PML log in the same order as it was written KVM: VMX: refactor PML terminology KVM: VMX: Fix comment of handle_vmx_instruction() KVM: VMX: Reinstate __exit attribute for vmx_exit() KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup() KVM: x86: Avoid double RDPKRU when loading host/guest PKRU KVM: x86: Use LVT_TIMER instead of an open coded literal RISC-V: KVM: Add new exit statstics for redirected traps RISC-V: KVM: Update firmware counters for various events RISC-V: KVM: Redirect instruction access fault trap to guest ...
2025-01-25  LoongArch: Adjust SETUP_SLEEP and SETUP_WAKEUP  [Huacai Chen]
SETUP_SLEEP should only save the GPR context, which is symmetric to SETUP_WAKEUP, so move the acpi_saved_sp handling out of SETUP_SLEEP. Move "addi.d sp, sp, PT_SIZE" into SETUP_WAKEUP for the same reason. No functional changes. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25  LoongArch: Refactor bug_handler() implementation  [Huacai Chen]
1. Return early for user mode triggered exceptions of all types.
2. Give a chance to call fixup_exception() for the default types (like S390 does).

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25  LoongArch: Add pgprot_nx() implementation  [Huacai Chen]
Commit cca98e9f8b5ebcd964 ("mm: enforce that vmap can't map pages executable") enforces the W^X protection by not allowing remapping of existing pages as executable. Add the LoongArch bits so that LoongArch can benefit from the same protection.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
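On LoongArch this plausibly reduces to setting the no-execute bit in the protection value, along these lines (a sketch assuming the existing _PAGE_NO_EXEC bit, not necessarily the exact patch):

  #define pgprot_nx(prot) \
          __pgprot(pgprot_val(prot) | _PAGE_NO_EXEC)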
2025-01-25  LoongArch: Correct the __switch_to() prototype in comments  [Huacai Chen]
Correct the __switch_to() prototype in comments, keeping it the same as the declaration in switch_to.h.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25  LoongArch: Correct the cacheinfo sharing information  [Huacai Chen]
SMT cores and their sibling cores share the same L1 and L2 private caches (of course the last level cache is also shared), so correct the cacheinfo sharing information to let shared_cpu_map correctly reflect this relationship. Below is the output of "lscpu" on Loongson-3A6000 (4 cores, 8 threads).

1. Before patch:
  L1d: 512 KiB (8 instances)
  L1i: 512 KiB (8 instances)
  L2:  2 MiB (8 instances)
  L3:  16 MiB (1 instance)

2. After patch:
  L1d: 256 KiB (4 instances)
  L1i: 256 KiB (4 instances)
  L2:  1 MiB (4 instances)
  L3:  16 MiB (1 instance)

Reported-by: Chao Li <lichao@loongson.cn>
Signed-off-by: Juxin Gao <gaojuxin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25  LoongArch: Derive timer max_delta from PRCFG1's timer_bits  [Jiaxun Yang]
As per the architecture spec, the maximum number of timer bits is configurable and should not be hardcoded in any way. Probe the timer bits from PRCFG1 and use that to determine the clockevent's max_delta, so that it conforms to the spec.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25  LoongArch: Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is enabled  [Jiaxun Yang]
When ARCH_IOREMAP is enabled, we use the always-accessible DMW for ioremap(). It makes no sense to create a dedicated mapping for earlycon given that we can access the region via DMW.

Disable FIX_EARLYCON_MEM when ARCH_IOREMAP is selected. This can ease debugging of early mapping issues.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-25  LoongArch: Migrate to the generic rule for built-in DTB  [Masahiro Yamada]
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot DTBs") introduced generic support for built-in DTBs. Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-24  Merge tag 'v6.14-p1' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Remove physical address skcipher walking - Fix boot-up self-test race Algorithms: - Optimisations for x86/aes-gcm - Optimisations for x86/aes-xts - Remove VMAC - Remove keywrap Drivers: - Remove n2 Others: - Fixes for padata UAF - Fix potential rhashtable deadlock by moving schedule_work outside lock" * tag 'v6.14-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (75 commits) rhashtable: Fix rhashtable_try_insert test dt-bindings: crypto: qcom,inline-crypto-engine: Document the SM8750 ICE dt-bindings: crypto: qcom,prng: Document SM8750 RNG dt-bindings: crypto: qcom-qce: Document the SM8750 crypto engine crypto: asymmetric_keys - Remove unused key_being_used_for[] padata: avoid UAF for reorder_work padata: fix UAF in padata_reorder padata: add pd get/put refcnt helper crypto: skcipher - call cond_resched() directly crypto: skcipher - optimize initializing skcipher_walk fields crypto: skcipher - clean up initialization of skcipher_walk::flags crypto: skcipher - fold skcipher_walk_skcipher() into skcipher_walk_virt() crypto: skcipher - remove redundant check for SKCIPHER_WALK_SLOW crypto: skcipher - remove redundant clamping to page size crypto: skcipher - remove unnecessary page alignment of bounce buffer crypto: skcipher - document skcipher_walk_done() and rename some vars crypto: omap - switch from scatter_walk to plain offset crypto: powerpc/p10-aes-gcm - simplify handling of linear associated data crypto: bcm - Drop unused setting of local 'ptr' variable crypto: hisilicon/qm - support new function communication ...
2025-01-22  Merge tag 'crc-for-linus' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be directly accessible via the library API, instead of requiring the crypto API. This is much simpler and more efficient. - Convert some users such as ext4 to use the CRC32 library API instead of the crypto API. More conversions like this will come later. - Add a KUnit test that tests and benchmarks multiple CRC variants. Remove older, less-comprehensive tests that are made redundant by this. - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm volunteering to maintain it. I have additional cleanups and optimizations planned for future cycles. * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits) MAINTAINERS: add entry for CRC library powerpc/crc: delete obsolete crc-vpmsum_test.c lib/crc32test: delete obsolete crc32test.c lib/crc16_kunit: delete obsolete crc16_kunit.c lib/crc_kunit.c: add KUnit test suite for CRC library functions powerpc/crc-t10dif: expose CRC-T10DIF function through lib arm64/crc-t10dif: expose CRC-T10DIF function through lib arm/crc-t10dif: expose CRC-T10DIF function through lib x86/crc-t10dif: expose CRC-T10DIF function through lib crypto: crct10dif - expose arch-optimized lib function lib/crc-t10dif: add support for arch overrides lib/crc-t10dif: stop wrapping the crypto API scsi: target: iscsi: switch to using the crc32c library f2fs: switch to using the crc32 library jbd2: switch to using the crc32c library ext4: switch to using the crc32c library lib/crc32: make crc32c() go directly to lib bcachefs: Explicitly select CRYPTO from BCACHEFS_FS x86/crc32: expose CRC32 functions through lib x86/crc32: update prototype for crc32_pclmul_le_16() ...
2025-01-21  Merge tag 'ftrace-v6.14' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Have fprobes built on top of function graph infrastructure The fprobe logic is an optimized kprobe that uses ftrace to attach to functions when a probe is needed at the start or end of the function. The fprobe and kretprobe logic implements a similar method as the function graph tracer to trace the end of the function. That is to hijack the return address and jump to a trampoline to do the trace when the function exits. To do this, a shadow stack needs to be created to store the original return address. Fprobes and function graph do this slightly differently. Fprobes (and kretprobes) has slots per callsite that are reserved to save the return address. This is fine when just a few points are traced. But users of fprobes, such as BPF programs, are starting to add many more locations, and this method does not scale. The function graph tracer was created to trace all functions in the kernel. In order to do this, when function graph tracing is started, every task gets its own shadow stack to hold the return address that is going to be traced. The function graph tracer has been updated to allow multiple users to use its infrastructure. Now have fprobes be one of those users. This will also allow for the fprobe and kretprobe methods to trace the return address to become obsolete. With new technologies like CFI that need to know about these methods of hijacking the return address, going toward a solution that has only one method of doing this will make the kernel less complex. - Cleanup with guard() and free() helpers There were several places in the code that had a lot of "goto out" in the error paths to either unlock a lock or free some memory that was allocated. But this is error prone. Convert the code over to use the guard() and free() helpers that let the compiler unlock locks or free memory when the function exits. - Remove disabling of interrupts in the function graph tracer When function graph tracer was first introduced, it could race with interrupts and NMIs. To prevent that race, it would disable interrupts and not trace NMIs. But the code has changed to allow NMIs and also interrupts. This change was done a long time ago, but the disabling of interrupts was never removed. Remove the disabling of interrupts in the function graph tracer is it is not needed. This greatly improves its performance. - Allow the :mod: command to enable tracing module functions on the kernel command line. The function tracer already has a way to enable functions to be traced in modules by writing ":mod:<module>" into set_ftrace_filter. That will enable either all the functions for the module if it is loaded, or if it is not, it will cache that command, and when the module is loaded that matches <module>, its functions will be enabled. This also allows init functions to be traced. But currently events do not have that feature. Because enabling function tracing can be done very early at boot up (before scheduling is enabled), the commands that can be done when function tracing is started is limited. Having the ":mod:" command to trace module functions as they are loaded is very useful. Update the kernel command line function filtering to allow it. 
* tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits) ftrace: Implement :mod: cache filtering on kernel command line tracing: Adopt __free() and guard() for trace_fprobe.c bpf: Use ftrace_get_symaddr() for kprobe_multi probes ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr Documentation: probes: Update fprobe on function-graph tracer selftests/ftrace: Add a test case for repeating register/unregister fprobe selftests: ftrace: Remove obsolate maxactive syntax check tracing/fprobe: Remove nr_maxactive from fprobe fprobe: Add fprobe_header encoding feature fprobe: Rewrite fprobe on function-graph tracer s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS tracing: Add ftrace_fill_perf_regs() for perf event tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs fprobe: Use ftrace_regs in fprobe exit handler fprobe: Use ftrace_regs in fprobe entry handler fgraph: Pass ftrace_regs to retfunc fgraph: Replace fgraph_ret_regs with ftrace_regs ...
2025-01-21  Merge tag 'objtool-core-2025-01-20' of ...  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Introduce the generic section-based annotation infrastructure a.k.a. ASM_ANNOTATE/ANNOTATE (Peter Zijlstra) - Convert various facilities to ASM_ANNOTATE/ANNOTATE: (Peter Zijlstra) - ANNOTATE_NOENDBR - ANNOTATE_RETPOLINE_SAFE - instrumentation_{begin,end}() - VALIDATE_UNRET_BEGIN - ANNOTATE_IGNORE_ALTERNATIVE - ANNOTATE_INTRA_FUNCTION_CALL - {.UN}REACHABLE - Optimize the annotation-sections parsing code (Peter Zijlstra) - Centralize annotation definitions in <linux/objtool.h> - Unify & simplify the barrier_before_unreachable()/unreachable() definitions (Peter Zijlstra) - Convert unreachable() calls to BUG() in x86 code, as unreachable() has unreliable code generation (Peter Zijlstra) - Remove annotate_reachable() and annotate_unreachable(), as it's unreliable against compiler optimizations (Peter Zijlstra) - Fix non-standard ANNOTATE_REACHABLE annotation order (Peter Zijlstra) - Robustify the annotation code by warning about unknown annotation types (Peter Zijlstra) - Allow arch code to discover jump table size, in preparation of annotated jump table support (Ard Biesheuvel) * tag 'objtool-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Convert unreachable() to BUG() objtool: Allow arch code to discover jump table size objtool: Warn about unknown annotation types objtool: Fix ANNOTATE_REACHABLE to be a normal annotation objtool: Convert {.UN}REACHABLE to ANNOTATE objtool: Remove annotate_{,un}reachable() loongarch: Use ASM_REACHABLE x86: Convert unreachable() to BUG() unreachable: Unify objtool: Collect more annotations in objtool.h objtool: Collapse annotate sequences objtool: Convert ANNOTATE_INTRA_FUNCTION_CALL to ANNOTATE objtool: Convert ANNOTATE_IGNORE_ALTERNATIVE to ANNOTATE objtool: Convert VALIDATE_UNRET_BEGIN to ANNOTATE objtool: Convert instrumentation_{begin,end}() to ANNOTATE objtool: Convert ANNOTATE_RETPOLINE_SAFE to ANNOTATE objtool: Convert ANNOTATE_NOENDBR to ANNOTATE objtool: Generic annotation infrastructure
2025-01-13  LoongArch: KVM: Add hypercall service support for usermode VMM  [Bibo Mao]
Some VMMs provide special hypercall services in usermode. KVM should not handle such usermode hypercall services itself, but instead pass them on to usermode and let the usermode VMM handle them. Here a new code, KVM_HCALL_CODE_USER_SERVICE, is added for the usermode hypercall service, and KVM makes all six registers visible to the usermode VMM.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-13  LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed  [Bibo Mao]
LLBCTL is a guest CSR register separate from the host's: a host exception ERET instruction clears the host LLBCTL CSR register, and a guest exception clears the guest LLBCTL CSR register. Consider the following interleaving of atomic64_fetch_add_unless() on two VCPUs:

  VCPU0 atomic64_fetch_add_unless       VCPU1 atomic64_fetch_add_unless
  ll.d  %[p], %[c]
  beq   %[p], %[u], 1f

  Here the secondary mmu mapping is changed and the host hpa page is
  replaced with a new page, so VCPU1 will execute the atomic instructions
  on the new page.

                                        ll.d  %[p], %[c]
                                        beq   %[p], %[u], 1f
                                        add.d %[rc], %[p], %[a]
                                        sc.d  %[rc], %[c]

  add.d %[rc], %[p], %[a]
  sc.d  %[rc], %[c]

LLBCTL is set on VCPU0 and it indicates that the memory has not been modified by other VCPUs, so sc.d will modify the memory directly. Therefore, clear WCLLB of the guest LLBCTL register when the mapping is changed.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-01-04  EDAC: Add an EDAC driver for the Loongson memory controller  [Zhao Qunqin]
Add ECC support for Loongson SoC DDR controller. This driver reports single bit errors (CE) only. Only ACPI firmware is supported. [ bp: Document what last_ce_count is for. ] Signed-off-by: Zhao Qunqin <zhaoqunqin@loongson.cn> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/r/20241219124846.1876-1-zhaoqunqin@loongson.cn Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-01-04  crypto: vmac - remove unused VMAC algorithm  [Eric Biggers]
Remove the vmac64 template, as it has no known users. It also continues to have longstanding bugs such as alignment violations (see https://lore.kernel.org/r/20241226134847.6690-1-evepolonium@gmail.com/). This code was added in 2009 by commit f1939f7c5645 ("crypto: vmac - New hash algorithm for intel_txt support"). Based on the mention of intel_txt support in the commit title, it seems it was added as a prerequisite for the contemporaneous patch "intel_txt: add s3 userspace memory integrity verification" (https://lore.kernel.org/r/4ABF2B50.6070106@intel.com/). In the design proposed by that patch, when an Intel Trusted Execution Technology (TXT) enabled system resumed from suspend, the "tboot" trusted executable launched the Linux kernel without verifying userspace memory, and then the Linux kernel used VMAC to verify userspace memory. However, that patch was never merged, as reviewers had objected to the design. It was later reworked into commit 4bd96a7a8185 ("x86, tboot: Add support for S3 memory integrity protection") which made tboot verify the memory instead. Thus the VMAC support in Linux was never used. No in-tree user has appeared since then, other than potentially the usual components that allow specifying arbitrary hash algorithms by name, namely AF_ALG and dm-integrity. However there are no indications that VMAC is being used with these components. Debian Code Search and web searches for "vmac64" (the actual algorithm name) do not return any results other than the kernel itself, suggesting that it does not appear in any other code or documentation. Explicitly grepping the source code of the usual suspects (libell, iwd, cryptsetup) finds no matches either. Before 2018, the vmac code was also completely broken due to using a hardcoded nonce and the wrong endianness for the MAC. It was then fixed by commit ed331adab35b ("crypto: vmac - add nonced version with big endian digest") and commit 0917b873127c ("crypto: vmac - remove insecure version with hardcoded nonce"). These were intentionally breaking changes that changed all the computed MAC values as well as the algorithm name ("vmac" to "vmac64"). No complaints were ever received about these breaking changes, strongly suggesting the absence of users. The reason I had put some effort into fixing this code in 2018 is because it was used by an out-of-tree driver. But if it is still needed in that particular out-of-tree driver, the code can be carried in that driver instead. There is no need to carry it upstream. Cc: Atharva Tiwari <evepolonium@gmail.com> Cc: Shane Wang <shane.wang@intel.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-26  fprobe: Add fprobe_header encoding feature  [Masami Hiramatsu (Google)]
Fprobe store its data structure address and size on the fgraph return stack by __fprobe_header. But most 64bit architecture can combine those to one unsigned long value because 4 MSB in the kernel address are the same. With this encoding, fprobe can consume less space on ret_stack. This introduces asm/fprobe.h to define arch dependent encode/decode macros. Note that since fprobe depends on CONFIG_HAVE_FUNCTION_GRAPH_FREGS, currently only arm64, loongarch, riscv, s390 and x86 are supported. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173519005783.391279.5307910947400277525.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
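The encoding idea can be sketched generically as follows; the macro and function names, and the 4-bit split, are illustrative assumptions rather than the actual asm/fprobe.h definitions:

  /* Assume kernel pointers whose top 4 bits are identical (all ones), so a
   * 4-bit size can be stored there and the bits restored on decode. */
  #define FPROBE_ADDR_MSB_MASK    (0xfUL << (BITS_PER_LONG - 4))

  static inline unsigned long fprobe_encode(void *fp, unsigned long size)
  {
          return ((unsigned long)fp & ~FPROBE_ADDR_MSB_MASK) |
                 (size << (BITS_PER_LONG - 4));
  }

  static inline void *fprobe_decode(unsigned long val, unsigned long *size)
  {
          *size = val >> (BITS_PER_LONG - 4);
          return (void *)((val & ~FPROBE_ADDR_MSB_MASK) | FPROBE_ADDR_MSB_MASK);
  }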
2024-12-26  fprobe: Rewrite fprobe on function-graph tracer  [Masami Hiramatsu (Google)]
Rewrite fprobe implementation on function-graph tracer. Major API changes are: - 'nr_maxactive' field is deprecated. - This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and CONFIG_HAVE_FUNCTION_GRAPH_FREGS. So currently works only on x86_64. - Currently the entry size is limited in 15 * sizeof(long). - If there is too many fprobe exit handler set on the same function, it will fail to probe. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/173519003970.391279.14406792285453830996.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26  ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC  [Masami Hiramatsu (Google)]
Add CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig in addition to ftrace_graph_func macro check. This is for the other feature (e.g. FPROBE) which requires to access ftrace_regs from fgraph_ops::entryfunc() can avoid compiling if the fgraph can not pass the valid ftrace_regs. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173519001472.391279.1174901685282588467.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-26fprobe: Use ftrace_regs in fprobe exit handlerMasami Hiramatsu (Google)
Change the fprobe exit handler to use the ftrace_regs structure instead of pt_regs. This also introduces HAVE_FTRACE_REGS_HAVING_PT_REGS, which means that ftrace_regs includes pt_regs, so ftrace_regs can provide pt_regs without memory allocation. Fprobe gains a new dependency on it. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: bpf <bpf@vger.kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Song Liu <song@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Matt Bobrowski <mattbobrowski@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Hao Luo <haoluo@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/173518995092.391279.6765116450352977627.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
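For illustration, a sketch of an exit handler taking advantage of this; the handler signature is assumed from the description above, and ftrace_get_regs() returns NULL unless ftrace_regs carries a full pt_regs (i.e. HAVE_FTRACE_REGS_HAVING_PT_REGS).

#include <linux/fprobe.h>

static void sample_exit(struct fprobe *fp, unsigned long entry_ip,
                        unsigned long ret_ip, struct ftrace_regs *fregs,
                        void *entry_data)
{
        /* Non-NULL only where ftrace_regs embeds a full pt_regs;
         * no memory allocation is involved either way. */
        struct pt_regs *regs = ftrace_get_regs(fregs);

        if (regs)
                pr_debug("returning to %pS, user_mode=%d\n",
                         (void *)ret_ip, user_mode(regs));
}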
2024-12-26fgraph: Replace fgraph_ret_regs with ftrace_regsMasami Hiramatsu (Google)
Use ftrace_regs instead of fgraph_ret_regs for tracing the return value in the function_graph tracer, in order to simplify the callback interface. CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by CONFIG_HAVE_FUNCTION_GRAPH_FREGS. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173518991508.391279.16635322774382197642.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
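A sketch of a function_graph return callback reading the return value through ftrace_regs; the retfunc signature shown here is an assumption based on the description above.

#include <linux/ftrace.h>

static void sample_retfunc(struct ftrace_graph_ret *trace,
                           struct fgraph_ops *gops,
                           struct ftrace_regs *fregs)
{
        /* With CONFIG_HAVE_FUNCTION_GRAPH_FREGS the return value comes
         * from ftrace_regs rather than the old fgraph_ret_regs. */
        unsigned long retval = ftrace_regs_get_return_value(fregs);

        trace_printk("%ps returned %lu\n", (void *)trace->func, retval);
}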
2024-12-26fgraph: Pass ftrace_regs to entryfuncMasami Hiramatsu (Google)
Pass ftrace_regs to fgraph_ops::entryfunc(). If ftrace_regs is not available, NULL is passed instead. The user callback can access some registers (including the return address) via this ftrace_regs. Note that ftrace_regs can be NULL when the arch does NOT define HAVE_DYNAMIC_FTRACE_WITH_ARGS or HAVE_DYNAMIC_FTRACE_WITH_REGS. More specifically, it can be NULL if HAVE_DYNAMIC_FTRACE_WITH_REGS is defined but HAVE_DYNAMIC_FTRACE_WITH_ARGS is not, and the ftrace ops used to register the function callback does not set FTRACE_OPS_FL_SAVE_REGS. In that case, ftrace_regs can be NULL in the user callback. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173518990044.391279.17406984900626078579.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
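A sketch of an entryfunc that copes with a NULL ftrace_regs as described above; the signature is assumed, and the return-value convention (non-zero means "trace this call") follows the existing fgraph_ops behaviour.

#include <linux/ftrace.h>

static int sample_entryfunc(struct ftrace_graph_ent *trace,
                            struct fgraph_ops *gops,
                            struct ftrace_regs *fregs)
{
        /* fregs may be NULL when the arch cannot supply it (see above). */
        if (fregs)
                trace_printk("entering %ps, ip=%lx\n", (void *)trace->func,
                             ftrace_regs_get_instruction_pointer(fregs));

        return 1;
}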
2024-12-03LoongArch: KVM: Protect kvm_io_bus_{read,write}() with SRCUHuacai Chen
When we enable lockdep we get such a warning: ============================= WARNING: suspicious RCU usage 6.12.0-rc7+ #1891 Tainted: G W ----------------------------- arch/loongarch/kvm/../../../virt/kvm/kvm_main.c:5945 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-loo/948: #0: 90000001184a00a8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xf4/0xe20 [kvm] stack backtrace: CPU: 2 UID: 0 PID: 948 Comm: qemu-system-loo Tainted: G W 6.12.0-rc7+ #1891 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 900000012c578000 900000012c57b940 0000000000000000 900000012c57b948 9000000007e53788 900000000815bcc8 900000000815bcc0 900000012c57b7b0 0000000000000001 0000000000000001 4b031894b9d6b725 0000000005dec000 9000000100427b00 00000000000003d2 0000000000000001 000000000000002d 0000000000000003 0000000000000030 00000000000003b4 0000000005dec000 0000000000000000 900000000806d000 9000000007e53788 00000000000000b4 0000000000000004 0000000000000004 0000000000000000 0000000000000000 9000000107baf600 9000000008916000 9000000007e53788 9000000005924778 000000001fe001e5 00000000000000b0 0000000000000007 0000000000000000 0000000000071c1d ... Call Trace: [<9000000005924778>] show_stack+0x38/0x180 [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [<90000000059eb754>] lockdep_rcu_suspicious+0x194/0x240 [<ffff80000221f47c>] kvm_io_bus_read+0x19c/0x1e0 [kvm] [<ffff800002225118>] kvm_emu_mmio_read+0xd8/0x440 [kvm] [<ffff8000022254bc>] kvm_handle_read_fault+0x3c/0xe0 [kvm] [<ffff80000222b3c8>] kvm_handle_exit+0x228/0x480 [kvm] Fix it by protecting kvm_io_bus_{read,write}() with SRCU. Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
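The fix follows the standard SRCU read-side pattern; a hedged sketch (the wrapper name is hypothetical, while srcu_read_lock()/srcu_read_unlock() and kvm_io_bus_read() are the real interfaces):

#include <linux/kvm_host.h>

/* Hypothetical helper showing the locking pattern, not the actual patch. */
static int mmio_bus_read_locked(struct kvm_vcpu *vcpu, gpa_t addr,
                                int len, void *val)
{
        int idx, ret;

        idx = srcu_read_lock(&vcpu->kvm->srcu);
        ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, len, val);
        srcu_read_unlock(&vcpu->kvm->srcu, idx);

        return ret;
}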
2024-12-02objtool: Fix ANNOTATE_REACHABLE to be a normal annotationPeter Zijlstra
Currently REACHABLE is unusual in being placed on the instruction after the one it modifies. Since all REACHABLE annotations have an explicit instruction, flip them around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20241128094312.494176035@infradead.org
2024-12-02objtool: Convert {.UN}REACHABLE to ANNOTATEPeter Zijlstra
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20241128094312.353431347@infradead.org
2024-12-02loongarch: Use ASM_REACHABLEPeter Zijlstra
annotate_reachable() is unreliable since the compiler is free to place random code in between two consecutive asm() statements. This removes the last and only annotate_reachable() user. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20241128094312.133437051@infradead.org
2024-12-02LoongArch: KVM: Protect kvm_check_requests() with SRCUHuacai Chen
When we enable lockdep we get such a warning: ============================= WARNING: suspicious RCU usage 6.12.0-rc7+ #1891 Tainted: G W ----------------------------- include/linux/kvm_host.h:1043 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by qemu-system-loo/948: #0: 90000001184a00a8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0xf4/0xe20 [kvm] stack backtrace: CPU: 0 UID: 0 PID: 948 Comm: qemu-system-loo Tainted: G W 6.12.0-rc7+ #1891 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 900000012c578000 900000012c57b920 0000000000000000 900000012c57b928 9000000007e53788 900000000815bcc8 900000000815bcc0 900000012c57b790 0000000000000001 0000000000000001 4b031894b9d6b725 0000000004dec000 90000001003299c0 0000000000000414 0000000000000001 000000000000002d 0000000000000003 0000000000000030 00000000000003b4 0000000004dec000 90000001184a0000 900000000806d000 9000000007e53788 00000000000000b4 0000000000000004 0000000000000004 0000000000000000 0000000000000000 9000000107baf600 9000000008916000 9000000007e53788 9000000005924778 0000000010000044 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000005924778>] show_stack+0x38/0x180 [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [<90000000059eb754>] lockdep_rcu_suspicious+0x194/0x240 [<ffff8000022143bc>] kvm_gfn_to_hva_cache_init+0xfc/0x120 [kvm] [<ffff80000222ade4>] kvm_pre_enter_guest+0x3a4/0x520 [kvm] [<ffff80000222b3dc>] kvm_handle_exit+0x23c/0x480 [kvm] Fix it by protecting kvm_check_requests() with SRCU. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-12-02LoongArch: BPF: Adjust the parameter of emit_jirl()Tiezhu Yang
The branch instructions beq, bne, blt, bge, bltu, bgeu and jirl all belong to the reg2i16 format, but the operand order differs for the jirl instruction. So adjust the parameter order of emit_jirl() to make it more readable and to correspond with the Instruction Set Architecture manual. Here are the instruction formats: beq rj, rd, offs16 bne rj, rd, offs16 blt rj, rd, offs16 bge rj, rd, offs16 bltu rj, rd, offs16 bgeu rj, rd, offs16 jirl rd, rj, offs16 Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#branch-instructions Suggested-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
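As a sketch of the resulting call order, emitting the conventional return "jirl $zero, $ra, 0" would now look as below; the emitter's first argument is assumed to be the instruction word, and only the rd, rj, offs16 operand order is taken from the commit.

static void emit_func_return(union loongarch_instruction *insn)
{
        /* jirl $zero, $ra, 0 == "return"; operands follow rd, rj, offs16. */
        emit_jirl(insn, LOONGARCH_GPR_ZERO, LOONGARCH_GPR_RA, 0);
}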
2024-12-02LoongArch: Add architecture specific huge_pte_clear()Bibo Mao
When executing mm selftests run_vmtests.sh, there is such an error: BUG: Bad page state in process uffd-unit-tests pfn:00000 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0 flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff) raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 #184 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000 900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8 900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001 0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000 0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000 000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940 0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000 0000000000000000 90000000028f8940 ffff800000000000 0000000000000000 0000000000000000 0000000000000000 9000000000223a94 000000012001839c 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000000223a94>] show_stack+0x5c/0x180 [<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0 [<900000000056aa08>] bad_page+0x1a0/0x1f0 [<9000000000574978>] free_unref_folios+0xbf0/0xd20 [<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8 [<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260 [<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0 [<9000000000547f30>] tlb_finish_mmu+0xa8/0x218 [<9000000000543cb8>] exit_mmap+0x1a0/0x360 [<9000000000247658>] __mmput+0x78/0x200 [<900000000025583c>] do_exit+0x43c/0xde8 [<9000000000256490>] do_group_exit+0x68/0x110 [<9000000000256554>] sys_exit_group+0x1c/0x20 [<9000000001c413b4>] do_syscall+0x94/0x130 [<90000000002216d8>] handle_syscall+0xb8/0x158 Disabling lock debugging due to kernel taint BUG: non-zero pgtables_bytes on freeing mm: -16384 On LoongArch system, invalid huge pte entry should be invalid_pte_table or a single _PAGE_HUGE bit rather than a zero value. And it should be the same with invalid pmd entry, since pmd_none() is called by function free_pgd_range() and pmd_none() return 0 by huge_pte_clear(). So single _PAGE_HUGE bit is also treated as a valid pte table and free_pte_range() will be called in free_pmd_range(). free_pmd_range() pmd = pmd_offset(pud, addr); do { next = pmd_addr_end(addr, end); if (pmd_none_or_clear_bad(pmd)) continue; free_pte_range(tlb, pmd, addr); } while (pmd++, addr = next, addr != end); Here invalid_pte_table is used for both invalid huge pte entry and pmd entry. Cc: stable@vger.kernel.org Fixes: 09cfefb7fa70 ("LoongArch: Add memory management") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
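A sketch of what the architecture-specific helper amounts to, based on the reasoning above (the exact in-tree implementation may differ; the point is that invalid_pte_table, not zero, marks the entry as empty):

static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
                                  pte_t *ptep, unsigned long sz)
{
        /* A zero entry makes pmd_none() return false, so the entry would be
         * walked as a pte table; write invalid_pte_table instead. */
        set_pte_at(mm, addr, ptep, __pte((unsigned long)invalid_pte_table));
}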
2024-12-02LoongArch/irq: Use seq_put_decimal_ull_width() for decimal valuesDavid Wang
Performance improvement for reading /proc/interrupts on LoongArch. On a system with n CPUs and m interrupts, n*m decimal values are emitted via seq_printf(.."%10u "..), which is less efficient than seq_put_decimal_ull_width(). Stress reading /proc/interrupts indicates a ~30% performance improvement with this patch (and its friends). Signed-off-by: David Wang <00107082@163.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
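For comparison, the two calls side by side in a hypothetical show() helper; note that seq_put_decimal_ull_width() emits the delimiter before the value.

#include <linux/seq_file.h>

static void show_irq_count(struct seq_file *p, unsigned int cnt)
{
        seq_printf(p, "%10u ", cnt);                /* old: printf parsing per value */
        seq_put_decimal_ull_width(p, " ", cnt, 10); /* new: direct decimal conversion */
}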
2024-12-02LoongArch: Fix reserving screen info memory for above-4G firmwareHuacai Chen
Since screen_info.lfb_base is a __u32 type, an above-4G address needs an ext_lfb_base to hold its upper 32 bits. In init_screen_info() we can use __screen_info_lfb_base() to handle this case when reserving screen info memory. Signed-off-by: Xuefeng Zhao <zhaoxuefeng@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
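The 64-bit base is assembled roughly as in the sketch below; this is an illustration of what __screen_info_lfb_base() provides rather than code copied from the patch.

#include <linux/screen_info.h>

static u64 lfb_base_of(const struct screen_info *si)
{
        u64 base = si->lfb_base;

        if (si->capabilities & VIDEO_CAPABILITY_64BIT_BASE)
                base |= (u64)si->ext_lfb_base << 32;

        return base;
}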
2024-12-01loongarch/crc32: expose CRC32 functions through libEric Biggers
Move the loongarch CRC32 assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It also remains usable via the crypto API, through the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed. Note: to see the diff from arch/loongarch/crypto/crc32-loongarch.c to arch/loongarch/lib/crc32-loongarch.c, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: WangYuli <wangyuli@uniontech.com> Link: https://lore.kernel.org/r/20241202010844.144356-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
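Using the library interface then looks like the sketch below, with no crypto API round trip (the helper name is illustrative):

#include <linux/crc32.h>

static u32 checksum_buf(const u8 *data, size_t len)
{
        /* Dispatches to the arch-optimized LoongArch code when available. */
        return crc32_le(~0U, data, len);
}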
2024-11-27Merge tag 'dmaengine-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New hardware support: - Qualcomm SAR2130P GPI dma support - Sifive PIC64GX pdma support - Rcar r7s72100 support and associated updates Updates: - STM32 DMA3 updates for packing/unpacking mode and prevention of additional xfers - Simplification of devm_acpi_dma_controller_register() and associated cleanup including headers - loongson prefix renames - Switch back to platform_driver::remove() subsystem update" * tag 'dmaengine-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: loongson2-apb: Rename the prefix ls2x to loongson2 dt-bindings: dma: sifive pdma: Add PIC64GX to compatibles dmaengine: fix typo in the comment dmaengine: stm32-dma3: clamp AXI burst using match data dmaengine: stm32-dma3: prevent LL refactoring thanks to DT configuration dt-bindings: dma: stm32-dma3: prevent additional transfers dmaengine: stm32-dma3: refactor HW linked-list to optimize memory accesses dmaengine: stm32-dma3: prevent pack/unpack thanks to DT configuration dt-bindings: dma: stm32-dma3: prevent packing/unpacking mode dmaengine: idxd: Move DSA/IAA device IDs to IDXD driver dt-bindings: dma: qcom,gpi: Add SAR2130P compatible dmaengine: Switch back to struct platform_driver::remove() dmaengine: ep93xx: Fix unsigned compared against 0 dmaengine: acpi: Clean up headers dmaengine: acpi: Simplify devm_acpi_dma_controller_register() dmaengine: acpi: Drop unused devm_acpi_dma_controller_free() dmaengine: sh: rz-dmac: add r7s72100 support dt-bindings: dma: rz-dmac: Document RZ/A1H SoC