linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-03-25	Merge tag 'timers-cleanups-2025-03-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ...
2025-03-24	Merge tag 'perf-core-2025-03-22' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Core: - Move perf_event sysctls into kernel/events/ (Joel Granados) - Use POLLHUP for pinned events in error (Namhyung Kim) - Avoid the read if the count is already updated (Peter Zijlstra) - Allow the EPOLLRDNORM flag for poll (Tao Chen) - locking/percpu-rwsem: Add guard support [ NOTE: this got (mis-)merged into the perf tree due to related work ] (Peter Zijlstra) perf_pmu_unregister() related improvements: (Peter Zijlstra) - Simplify the perf_event_alloc() error path - Simplify the perf_pmu_register() error path - Simplify perf_pmu_register() - Simplify perf_init_event() - Simplify perf_event_alloc() - Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count - Add this_cpc() helper - Introduce perf_free_addr_filters() - Robustify perf_event_free_bpf_prog() - Simplify the perf_mmap() control flow - Further simplify perf_mmap() - Remove retry loop from perf_mmap() - Lift event->mmap_mutex in perf_mmap() - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes - Fix perf_mmap() failure path Uprobes: - Harden x86 uretprobe syscall trampoline check (Jiri Olsa) - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang) - Remove the spinlock within handle_singlestep() (Liao Chang) x86 Intel PMU enhancements: - Support PEBS counters snapshotting (Kan Liang) - Fix intel_pmu_read_event() (Kan Liang) - Extend per event callchain limit to branch stack (Kan Liang) - Fix system-wide LBR profiling (Kan Liang) - Allocate bts_ctx only if necessary (Li RongQing) - Apply static call for drain_pebs (Peter Zijlstra) x86 AMD PMU enhancements: (Ravi Bangoria) - Remove pointless sample period check - Fix ->config to sample period calculation for OP PMU - Fix perf_ibs_op.cnt_mask for CurCnt - Don't allow freq mode event creation through ->config interface - Add PMU specific minimum period - Add ->check_period() callback - Ceil sample_period to min_period - Add support for OP Load Latency Filtering - Update DTLB/PageSize decode logic Hardware breakpoints: - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar Bhaskar) Hardlockup detector improvements: (Li Huafei) - perf_event memory leak - Warn if watchdog_ev is leaked Fixes and cleanups: - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)" * tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf: Fix __percpu annotation perf: Clean up pmu specific data perf/x86: Remove swap_task_ctx() perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode perf: Supply task information to sched_task() perf: attach/detach PMU specific data locking/percpu-rwsem: Add guard support perf: Save PMU specific data in task_struct perf: Extend per event callchain limit to branch stack perf/ring_buffer: Allow the EPOLLRDNORM flag for poll perf/core: Use POLLHUP for pinned events in error perf/core: Use sysfs_emit() instead of scnprintf() perf/core: Remove optional 'size' arguments from strscpy() calls perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions uprobes/x86: Harden uretprobe syscall trampoline check watchdog/hardlockup/perf: Warn if watchdog_ev is leaked watchdog/hardlockup/perf: Fix perf_event memory leak perf/x86: Annotate struct bts_buffer::buf with __counted_by() perf/core: Clean up perf_try_init_event() perf/core: Fix perf_mmap() failure path ...
2025-03-24	Merge tag 'sched-core-2025-03-22' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Cancel the slice protection of the idle entity (Zihan Zhou) - Reduce the default slice to avoid tasks getting an extra tick (Zihan Zhou) - Force propagating min_slice of cfs_rq when {en,de}queue tasks (Tianchen Ding) - Refactor can_migrate_task() to elimate looping (I Hsin Cheng) - Add unlikey branch hints to several system calls (Colin Ian King) - Optimize current_clr_polling() on certain architectures (Yujun Dong) Deadline scheduler: (Juri Lelli) - Remove redundant dl_clear_root_domain call - Move dl_rebuild_rd_accounting to cpuset.h Uclamp: - Use the uclamp_is_used() helper instead of open-coding it (Xuewen Yan) - Optimize sched_uclamp_used static key enabling (Xuewen Yan) Scheduler topology support: (Juri Lelli) - Ignore special tasks when rebuilding domains - Add wrappers for sched_domains_mutex - Generalize unique visiting of root domains - Rebuild root domain accounting after every update - Remove partition_and_rebuild_sched_domains - Stop exposing partition_sched_domains_locked RSEQ: (Michael Jeanson) - Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y - Fix segfault on registration when rseq_cs is non-zero - selftests: Add rseq syscall errors test - selftests: Ensure the rseq ABI TLS is actually 1024 bytes Membarriers: - Fix redundant load of membarrier_state (Nysal Jan K.A.) Scheduler debugging: - Introduce and use preempt_model_str() (Sebastian Andrzej Siewior) - Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar) Fixes and cleanups: - Always save/restore x86 TSC sched_clock() on suspend/resume (Guilherme G. Piccoli) - Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian Andrzej Siewior)" * tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling() sched/debug: Remove CONFIG_SCHED_DEBUG sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional sched/debug: Make 'const_debug' tunables unconditional __read_mostly sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() rseq/selftests: Fix namespace collision with rseq UAPI header include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h sched/topology: Stop exposing partition_sched_domains_locked cgroup/cpuset: Remove partition_and_rebuild_sched_domains sched/topology: Remove redundant dl_clear_root_domain call sched/deadline: Rebuild root domain accounting after every update sched/deadline: Generalize unique visiting of root domains sched/topology: Wrappers for sched_domains_mutex sched/deadline: Ignore special tasks when rebuilding domains tracing: Use preempt_model_str() xtensa: Rely on generic printing of preemption model x86: Rely on generic printing of preemption model s390: Rely on generic printing of preemption model ...
2025-03-24	Merge tag 'bitmap-for-6.15' of https://github.com/norov/linux	Linus Torvalds
	Pull bitmap updates from Yury Norov: - cpumask_next_wrap() rework (me) - GENMASK() simplification (I Hsin) - rust bindings for cpumasks (Viresh and me) - scattered cleanups (Andy, Tamir, Vincent, Ignacio and Joel) * tag 'bitmap-for-6.15' of https://github.com/norov/linux: (22 commits) cpumask: align text in comment riscv: fix test_and_{set,clear}_bit ordering documentation treewide: fix typo 'unsigned __init128' -> 'unsigned __int128' MAINTAINERS: add rust bindings entry for bitmap API rust: Add cpumask helpers uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)" cpumask: drop cpumask_next_wrap_old() PCI: hv: Switch hv_compose_multi_msi_req_get_cpu() to using cpumask_next_wrap() scsi: lpfc: rework lpfc_next_{online,present}_cpu() scsi: lpfc: switch lpfc_irq_rebalance() to using cpumask_next_wrap() s390: switch stop_machine_yield() to using cpumask_next_wrap() padata: switch padata_find_next() to using cpumask_next_wrap() cpumask: use cpumask_next_wrap() where appropriate cpumask: re-introduce cpumask_next{,_and}_wrap() cpumask: deprecate cpumask_next_wrap() powerpc/xmon: simplify xmon_batch_next_cpu() ibmvnic: simplify ibmvnic_set_queue_affinity() virtio_net: simplify virtnet_set_affinity() objpool: rework objpool_pop() cpumask: add for_each_{possible,online}_cpu_wrap ...
2025-03-24	Merge tag 'seccomp-v6.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - avoid the lock trip seccomp_filter_release in common case (Mateusz Guzik) - remove unused 'sd' argument through-out (Oleg Nesterov) - selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64 * tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: avoid the lock trip seccomp_filter_release in common case seccomp: remove the 'sd' argument from __seccomp_filter() seccomp: remove the 'sd' argument from __secure_computing() seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER seccomp/mips: change syscall_trace_enter() to use secure_computing() selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
2025-03-24	Merge tag 'execve-v6.15-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - elf: Define and use note name macros (Akihiko Odaki) - elf: add remaining SHF_ flag macros (Timur Tabi) - binfmt: Remove loader from linux_binprm struct (Yonatan Goldschmidt) - binfmt_elf_fdpic: fix variable set but not used warning (sunliming) * tag 'execve-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_elf_fdpic: fix variable set but not used warning elf: add remaining SHF_ flag macros binfmt: Remove loader from linux_binprm struct crash: Remove KEXEC_CORE_NOTE_NAME s390/crash: Use note name macros crash: Use note name macros powerpc/crash: Use note name macros binfmt_elf: Use note name macros elf: Define note name macros
2025-03-24	Merge tag 'vfs-6.15-rc1.sysv' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs sysv removal from Christian Brauner: "This removes the sysv filesystem. We've discussed this various times. It's time to try" * tag 'vfs-6.15-rc1.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: sysv: Remove the filesystem
2025-03-24	Merge tag 'vfs-6.15-rc1.mount' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Mount notifications The day has come where we finally provide a new api to listen for mount topology changes outside of /proc/<pid>/mountinfo. A mount namespace file descriptor can be supplied and registered with fanotify to listen for mount topology changes. Currently notifications for mount, umount and moving mounts are generated. The generated notification record contains the unique mount id of the mount. The listmount() and statmount() api can be used to query detailed information about the mount using the received unique mount id. This allows userspace to figure out exactly how the mount topology changed without having to generating diffs of /proc/<pid>/mountinfo in userspace. - Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount api - Support detached mounts in overlayfs Since last cycle we support specifying overlayfs layers via file descriptors. However, we don't allow detached mounts which means userspace cannot user file descriptors received via open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to attach them to a mount namespace via move_mount() first. This is cumbersome and means they have to undo mounts via umount(). Allow them to directly use detached mounts. - Allow to retrieve idmappings with statmount Currently it isn't possible to figure out what idmapping has been attached to an idmapped mount. Add an extension to statmount() which allows to read the idmapping from the mount. - Allow creating idmapped mounts from mounts that are already idmapped So far it isn't possible to allow the creation of idmapped mounts from already idmapped mounts as this has significant lifetime implications. Make the creation of idmapped mounts atomic by allow to pass struct mount_attr together with the open_tree_attr() system call allowing to solve these issues without complicating VFS lookup in any way. The system call has in general the benefit that creating a detached mount and applying mount attributes to it becomes an atomic operation for userspace. - Add a way to query statmount() for supported options Allow userspace to query which mount information can be retrieved through statmount(). - Allow superblock owners to force unmount * tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits) umount: Allow superblock owners to force umount selftests: add tests for mount notification selinux: add FILE__WATCH_MOUNTNS samples/vfs: fix printf format string for size_t fs: allow changing idmappings fs: add kflags member to struct mount_kattr fs: add open_tree_attr() fs: add copy_mount_setattr() helper fs: add vfs_open_tree() helper statmount: add a new supported_mask field samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP selftests: add tests for using detached mount with overlayfs samples/vfs: check whether flag was raised statmount: allow to retrieve idmappings uidgid: add map_id_range_up() fs: allow detached mounts in clone_private_mount() selftests/overlayfs: test specifying layers as O_PATH file descriptors fs: support O_PATH fds with FSCONFIG_SET_FD vfs: add notifications for mount attach and detach fanotify: notify on mount attach and detach ...
2025-03-24	Merge tag 'vfs-6.15-rc1.misc' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Add CONFIG_DEBUG_VFS infrastucture: - Catch invalid modes in open - Use the new debug macros in inode_set_cached_link() - Use debug-only asserts around fd allocation and install - Place f_ref to 3rd cache line in struct file to resolve false sharing Cleanups: - Start using anon_inode_getfile_fmode() helper in various places - Don't take f_lock during SEEK_CUR if exclusion is guaranteed by f_pos_lock - Add unlikely() to kcmp() - Remove legacy ->remount_fs method from ecryptfs after port to the new mount api - Remove invalidate_inodes() in favour of evict_inodes() - Simplify ep_busy_loopER by removing unused argument - Avoid mmap sem relocks when coredumping with many missing pages - Inline getname() - Inline new_inode_pseudo() and de-staticize alloc_inode() - Dodge an atomic in putname if ref == 1 - Consistently deref the files table with rcu_dereference_raw() - Dedup handling of struct filename init and refcounts bumps - Use wq_has_sleeper() in end_dir_add() - Drop the lock trip around I_NEW wake up in evict() - Load the ->i_sb pointer once in inode_sb_list_{add,del} - Predict not reaching the limit in alloc_empty_file() - Tidy up do_sys_openat2() with likely/unlikely - Call inode_sb_list_add() outside of inode hash lock - Sort out fd allocation vs dup2 race commentary - Turn page_offset() into a wrapper around folio_pos() - Remove locking in exportfs around ->get_parent() call - try_lookup_one_len() does not need any locks in autofs - Fix return type of several functions from long to int in open - Fix return type of several functions from long to int in ioctls Fixes: - Fix watch queue accounting mismatch" * tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) fs: sort out fd allocation vs dup2 race commentary, take 2 fs: call inode_sb_list_add() outside of inode hash lock fs: tidy up do_sys_openat2() with likely/unlikely fs: predict not reaching the limit in alloc_empty_file() fs: load the ->i_sb pointer once in inode_sb_list_{add,del} fs: drop the lock trip around I_NEW wake up in evict() fs: use wq_has_sleeper() in end_dir_add() VFS/autofs: try_lookup_one_len() does not need any locks fs: dedup handling of struct filename init and refcounts bumps fs: consistently deref the files table with rcu_dereference_raw() exportfs: remove locking around ->get_parent() call. fs: use debug-only asserts around fd allocation and install fs: dodge an atomic in putname if ref == 1 vfs: Remove invalidate_inodes() ecryptfs: remove NULL remount_fs from super_operations watch_queue: fix pipe accounting mismatch fs: place f_ref to 3rd cache line in struct file to resolve false sharing epoll: simplify ep_busy_loop by removing always 0 argument fs: Turn page_offset() into a wrapper around folio_pos() kcmp: improve performance adding an unlikely hint to task comparisons ...
2025-03-21	net: remove sb1000 cable modem driver	Arnd Bergmann
	This one is hilariously outdated, it provided a faster downlink over TV cable for users of analog modems in the 1990s, through an ISA card. The web page for the userspace tools has been broken for 25 years, and the driver has only ever seen mechanical updates. Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	crypto: lib/chacha - remove unused arch-specific init support	Eric Biggers
	All implementations of chacha_init_arch() just call chacha_init_generic(), so it is pointless. Just delete it, and replace chacha_init() with what was previously chacha_init_generic(). Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-20	Merge branch 'kvm-nvmx-and-vm-teardown' into HEAD	Paolo Bonzini
	The immediate issue being fixed here is a nVMX bug where KVM fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI). However, checking for a pending interrupt accesses the legacy PIC, and x86's kvm_arch_destroy_vm() currently frees the PIC before destroying vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results in a NULL pointer deref; that's a prerequisite for the nVMX fix. The remaining patches attempt to bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. E.g. KVM currently unloads each vCPU's MMUs in a separate operation from destroying vCPUs, all because when guest SMP support was added, KVM had a kludgy MMU teardown flow that broke when a VM had more than one 1 vCPU. And that oddity lived on, for 18 years... Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-17	arch, mm: make releasing of memory to page allocator more explicit	Mike Rapoport (Microsoft)
	The point where the memory is released from memblock to the buddy allocator is hidden inside arch-specific mem_init()s and the call to memblock_free_all() is needlessly duplicated in every artiste cure and after introduction of arch_mm_preinit() hook, mem_init() implementation on many architecture only contains the call to memblock_free_all(). Pull memblock_free_all() call into mm_core_init() and drop mem_init() on relevant architectures to make it more explicit where the free memory is released from memblock to the buddy allocator and to reduce code duplication in architecture specific code. Link: https://lkml.kernel.org/r/20250313135003.836600-14-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Tested-by: Mark Brown <broonie@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren (csky) <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Weinberger <richard@nod.at> Cc: Russel King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17	arch, mm: introduce arch_mm_preinit	Mike Rapoport (Microsoft)
	Currently, implementation of mem_init() in every architecture consists of one or more of the following: * initializations that must run before page allocator is active, for instance swiotlb_init() * a call to memblock_free_all() to release all the memory to the buddy allocator * initializations that must run after page allocator is ready and there is no arch-specific hook other than mem_init() for that, like for example register_page_bootmem_info() in x86 and sparc64 or simple setting of mem_init_done = 1 in several architectures * a bunch of semi-related stuff that apparently had no better place to live, for example a ton of BUILD_BUG_ON()s in parisc. Introduce arch_mm_preinit() that will be the first thing called from mm_core_init(). On architectures that have initializations that must happen before the page allocator is ready, move those into arch_mm_preinit() along with the code that does not depend on ordering with page allocator setup. On several architectures this results in reduction of mem_init() to a single call to memblock_free_all() that allows its consolidation next. Link: https://lkml.kernel.org/r/20250313135003.836600-13-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Tested-by: Mark Brown <broonie@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren (csky) <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Weinberger <richard@nod.at> Cc: Russel King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17	arch, mm: streamline HIGHMEM freeing	Mike Rapoport (Microsoft)
	All architectures that support HIGHMEM have their code that frees high memory pages to the buddy allocator while __free_memory_core() is limited to freeing only low memory. There is no actual reason for that. The memory map is completely ready by the time memblock_free_all() is called and high pages can be released to the buddy allocator along with low memory. Remove low memory limit from __free_memory_core() and drop per-architecture code that frees high memory pages. Link: https://lkml.kernel.org/r/20250313135003.836600-12-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Tested-by: Mark Brown <broonie@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren (csky) <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Weinberger <richard@nod.at> Cc: Russel King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17	arch, mm: set max_mapnr when allocating memory map for FLATMEM	Mike Rapoport (Microsoft)
	max_mapnr is essentially the size of the memory map for systems that use FLATMEM. There is no reason to calculate it in each and every architecture when it's anyway calculated in alloc_node_mem_map(). Drop setting of max_mapnr from architecture code and set it once in alloc_node_mem_map(). While on it, move definition of mem_map and max_mapnr to mm/mm_init.c so there won't be two copies for MMU and !MMU variants. Link: https://lkml.kernel.org/r/20250313135003.836600-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Tested-by: Mark Brown <broonie@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Guo Ren (csky) <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Richard Weinberger <richard@nod.at> Cc: Russel King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17	powerpc: Rely on generic printing of preemption model	Sebastian Andrzej Siewior
	After the first printk in __die() there is show_regs() -> show_regs_print_info() which prints the current preemption model. Remove the preempion model from the arch code. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20250314160810.2373416-6-bigeasy@linutronix.de
2025-03-17	perf: Supply task information to sched_task()	Kan Liang
	To save/restore LBR call stack data in system-wide mode, the task_struct information is required. Extend the parameters of sched_task() to supply task_struct information. When schedule in, the LBR call stack data for new task will be restored. When schedule out, the LBR call stack data for old task will be saved. Only need to pass the required task_struct information. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250314172700.438923-4-kan.liang@linux.intel.com
2025-03-17	mm: rename GENERIC_PTDUMP and PTDUMP_CORE	Anshuman Khandual
	Platforms subscribe into generic ptdump implementation via GENERIC_PTDUMP. But generic ptdump gets enabled via PTDUMP_CORE. These configs combination is confusing as they sound very similar and does not differentiate between platform's feature subscription and feature enablement for ptdump. Rename the configs as ARCH_HAS_PTDUMP and PTDUMP making it more clear and improve readability. Link: https://lkml.kernel.org/r/20250226122404.1927473-6-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17	arch/powerpc: drop GENERIC_PTDUMP from mpc885_ads_defconfig	Anshuman Khandual
	GENERIC_PTDUMP gets selected on powerpc explicitly and hence can be dropped off from mpc885_ads_defconfig. Replace with CONFIG_PTDUMP_DEBUGFS instead. Link: https://lkml.kernel.org/r/20250226122404.1927473-3-anshuman.khandual@arm.com Fixes: e084728393a5 ("powerpc/ptdump: Convert powerpc to GENERIC_PTDUMP") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Steven Price <steven.price@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	powerpc/crash: use generic crashkernel reservation	Sourabh Jain
	Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Along with moving to the generic crashkernel reservation, the code related to finding the base address for the crashkernel has been separated into its own function name get_crash_base() for better readability and maintainability. Link: https://lkml.kernel.org/r/20250131113830.925179-8-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	powerpc: insert System RAM resource to prevent crashkernel conflict	Sourabh Jain
	The next patch in the series with title "powerpc/crash: use generic crashkernel reservation" enables powerpc to use generic crashkernel reservation instead of custom implementation. This leads to exporting of `Crash Kernel` memory to iomem_resource (/proc/iomem) via insert_crashkernel_resources():kernel/crash_reserve.c or at another place in the same file if HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY is set. The add_system_ram_resources():arch/powerpc/mm/mem.c adds `System RAM` to iomem_resource using request_resource(). This creates a conflict with `Crash Kernel`, which is added by the generic crashkernel reservation code. As a result, the kernel ultimately fails to add `System RAM` to iomem_resource. Consequently, it does not appear in /proc/iomem. There are multiple approches tried to avoid this: 1. Don't add Crash Kernel to iomem_resource: - This has two issues. First, it requires adding an architecture-specific hook in the generic code. There are already two code paths to choose when to add `Crash Kernel` to iomem_resource. This adds one more code path to skip it. Second, what if `Crash Kernel` is required in /proc/iomem in the future? Many architectures do export it. 2. Don't add `System RAM` to iomem_resource by reverting commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"): - It's not ideal to export `System RAM` via /proc/iomem, but since it already done ealier and userspace tools like kdump and kdump-utils rely on `System RAM` from /proc/iomem, removing it will break userspace. 3. Add Crash Kernel along with System RAM to /proc/iomem: This patch takes the third approach by updating add_system_ram_resources() to use insert_resource() instead of the request_resource() API to add the `System RAM` resource to iomem_resource. insert_resource() allows inserting resources even if they overlap with existing ones. Since `Crash Kernel` and `System RAM` resources are added to iomem_resource early in the boot, any other conflict is not expected. With the changes introduced here and in the next patch, "powerpc/crash: use generic crashkernel reservation," /proc/iomem now exports `System RAM` and `Crash Kernel` as shown below: $ cat /proc/iomem 00000000-3ffffffff : System RAM 10000000-4fffffff : Crash kernel The kdump script is capable of handling `System RAM` and `Crash Kernel` in the above format. The same format is used in other architectures. Link: https://lkml.kernel.org/r/20250131113830.925179-7-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	powerpc/crash: preserve user-specified memory limit	Sourabh Jain
	Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory bug") fails crashkernel parsing if the crash size is found to be higher than system RAM, which makes the memory_limit adjustment code ineffective due to an early exit from reserve_crashkernel(). Regardless lets not violate the user-specified memory limit by adjusting it. Remove this adjustment to ensure all reservations stay within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the user-specified memory limit") did the same for fadump. Link: https://lkml.kernel.org/r/20250131113830.925179-6-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	powerpc/crash: use generic APIs to locate memory hole for kdump	Sourabh Jain
	On PowerPC, the memory reserved for the crashkernel can contain components like RTAS, TCE, OPAL, etc., which should be avoided when loading kexec segments into crashkernel memory. Due to these special components, PowerPC has its own set of APIs to locate holes in the crashkernel memory for loading kexec segments for kdump. However, for loading kexec segments in the kexec case, PowerPC already uses generic APIs to locate holes. The previous patch in this series, titled "crash: Let arch decide usable memory range in reserved area," introduced arch-specific hook to handle such special regions in the crashkernel area. So, switch PowerPC to use the generic APIs to locate memory holes for kdump and remove the redundant PowerPC-specific APIs. Link: https://lkml.kernel.org/r/20250131113830.925179-5-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	mm/hugetlb: enable bootmem allocation from CMA areas	Frank van der Linden
	If hugetlb_cma_only is enabled, we know that hugetlb pages can only be allocated from CMA. Now that there is an interface to do early reservations from a CMA area (returning memblock memory), it can be used to allocate hugetlb pages from CMA. This also allows for doing pre-HVO on these pages (if enabled). Make sure to initialize the page structures and associated data correctly. Create a flag to signal that a hugetlb page has been allocated from CMA to make things a little easier. Some configurations of powerpc have a special hugetlb bootmem allocator, so introduce a boolean arch_specific_huge_bootmem_alloc that returns true if such an allocator is present. In that case, CMA bootmem allocations can't be used, so check that function before trying. Link: https://lkml.kernel.org/r/20250228182928.2645936-27-fvdl@google.com Signed-off-by: Frank van der Linden <fvdl@google.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	mm/hugetlb: add pre-HVO framework	Frank van der Linden
	Define flags for pre-HVOed bootmem hugetlb pages, and act on them. The most important flag is the HVO flag, signalling that a bootmem allocated gigantic page has already been HVO-ed. If this flag is seen by the hugetlb bootmem gather code, the page is marked as HVO optimized. The HVO code will then not try to optimize it again. Instead, it will just map the tail page mirror pages read-only, completing the HVO steps. No functional change, as nothing sets the flags yet. Link: https://lkml.kernel.org/r/20250228182928.2645936-18-fvdl@google.com Signed-off-by: Frank van der Linden <fvdl@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	mm/bootmem_info: export register_page_bootmem_memmap	Frank van der Linden
	If other mm code wants to use this function for early memmap inialization (on the platforms that have it), it should be made available properly, not just unconditionally in mm.h Make this function available for such cases. Link: https://lkml.kernel.org/r/20250228182928.2645936-10-fvdl@google.com Signed-off-by: Frank van der Linden <fvdl@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	mm/ioremap: pass pgprot_t to ioremap_prot() instead of unsigned long	Ryan Roberts
	ioremap_prot() currently accepts pgprot_val parameter as an unsigned long, thus implicitly assuming that pgprot_val and pgprot_t could never be bigger than unsigned long. But this assumption soon will not be true on arm64 when using D128 pgtables. In 128 bit page table configuration, unsigned long is 64 bit, but pgprot_t is 128 bit. Passing platform abstracted pgprot_t argument is better as compared to size based data types. Let's change the parameter to directly pass pgprot_t like another similar helper generic_ioremap_prot(). Without this change in place, D128 configuration does not work on arm64 as the top 64 bits gets silently stripped when passing the protection value to this function. Link: https://lkml.kernel.org/r/20250218101954.415331-1-anshuman.khandual@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Co-developed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16	cxl: Remove driver	Andrew Donnellan
	Remove the cxl driver that provides support for the IBM Coherent Accelerator Processor Interface. Revert or clean up associated code in arch/powerpc that is no longer necessary. cxl has received minimal maintenance for several years, and is not supported on the Power10 processor. We aren't aware of any users who are likely to be using recent kernels. Thanks to Mikey Neuling, Ian Munsie, Daniel Axtens, Frederic Barrat, Christophe Lombard, Philippe Bergheaud, Vaibhav Jain and Alastair D'Silva for their work on this driver over the years. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20250219070007.177725-2-ajd@linux.ibm.com
2025-03-15	crypto: skcipher - Make skcipher_walk src.virt.addr const	Herbert Xu
	Mark the src.virt.addr field in struct skcipher_walk as a pointer to const data. This guarantees that the user won't modify the data which should be done through dst.virt.addr to ensure that flushing is done when necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-14	cpu/SMT: Provide a default topology_is_primary_thread()	Yicong Yang
	Currently if architectures want to support HOTPLUG_SMT they need to provide a topology_is_primary_thread() telling the framework which thread in the SMT cannot offline. However arm64 doesn't have a restriction on which thread in the SMT cannot offline, a simplest choice is that just make 1st thread as the "primary" thread. So just make this as the default implementation in the framework and let architectures like x86 that have special primary thread to override this function (which they've already done). There's no need to provide a stub function if !CONFIG_SMP or !CONFIG_HOTPLUG_SMT. In such case the testing CPU is already the 1st CPU in the SMT so it's always the primary thread. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20250311075143.61078-2-yangyicong@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-03-12	spufs: fix a leak in spufs_create_context()	Al Viro
	Leak fixes back in 2008 missed one case - if we are trying to set affinity and spufs_mkdir() fails, we need to drop the reference to neighbor. Fixes: 58119068cb27 "[POWERPC] spufs: Fix memory leak on SPU affinity" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-03-12	spufs: fix gang directory lifetimes	Al Viro
	prior to "[POWERPC] spufs: Fix gang destroy leaks" we used to have a problem with gang lifetimes - creation of a gang returns opened gang directory, which normally gets removed when that gets closed, but if somebody has created a context belonging to that gang and kept it alive until the gang got closed, removal failed and we ended up with a leak. Unfortunately, it had been fixed the wrong way. Dentry of gang directory was no longer pinned, and rmdir on close was gone. One problem was that failure of open kept calling simple_rmdir() as cleanup, which meant an unbalanced dput(). Another bug was in the success case - gang creation incremented link count on root directory, but that was no longer undone when gang got destroyed. Fix consists of * reverting the commit in question * adding a counter to gang, protected by ->i_rwsem of gang directory inode. * having it set to 1 at creation time, dropped in both spufs_dir_close() and spufs_gang_close() and bumped in spufs_create_context(), provided that it's not 0. * using simple_recursive_removal() to take the gang directory out when counter reaches zero. Fixes: 877907d37da9 "[POWERPC] spufs: Fix gang destroy leaks" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-03-12	spufs: fix a leak on spufs_new_file() failure	Al Viro
	It's called from spufs_fill_dir(), and caller of that will do spufs_rmdir() in case of failure. That does remove everything we'd managed to create, but... the problem dentry is still negative. IOW, it needs to be explicitly dropped. Fixes: 3f51dd91c807 "[PATCH] spufs: fix spufs_fill_dir error path" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-03-11	powerpc: asm/io.h: remove split ioread64/iowrite64 helpers	Arnd Bergmann
	In previous kernels, there were conflicting definitions for what ioread64_lo_hi() and similar functions were supposed to do on architectures with native 64-bit MMIO. Based on the actual usage in drivers, they are in fact expected to be a pair of 32-bit accesses on all architectures, which makes the powerpc64 definition wrong. Remove it and use the generic implementation instead. Drivers that want to have split lo/hi or hi/lo accesses on 32-bit architectures but can use 64-bit accesses where supported should instead use ioread64()/iowrite64() after including the corresponding header file. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-10	Merge 6.14-rc6 into driver-core-next	Greg Kroah-Hartman
	We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-10	lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C	Eric Biggers
	All modules that need CONFIG_LIBCRC32C already select it, so there is no need to bother users about the option. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250304230712.167600-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-10	powerpc/ftrace: Use RCU in all users of __module_text_address().	Sebastian Andrzej Siewior
	__module_text_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_text_address() with RCU. Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-trace-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20250108090457.512198-21-bigeasy@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-03-10	powerpc/kexec: fix physical address calculation in clear_utlb_entry()	Christophe Leroy
	In relocate_32.S, function clear_utlb_entry() goes into real mode. To do so, it has to calculate the physical address based on the virtual address. To get the virtual address it uses 'bl' which is problematic (see commit c974809a26a1 ("powerpc/vdso: Avoid link stack corruption in __get_datapage()")). In addition, the calculation is done on a wrong address because 'bl' loads LR with the address of the following instruction, not the address of the target. So when the target is not the instruction following the 'bl' instruction, it may lead to unexpected behaviour. Fix it by re-writing the code so that is goes via another path which is based 'bcl 20,31,.+4' which is the right instruction to use for that. Fixes: 683430200315 ("powerpc/47x: Kernel support for KEXEC") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/dc4f9616fba9c05c5dbf9b4b5480eb1c362adc17.1741256651.git.christophe.leroy@csgroup.eu
2025-03-10	crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD	Christophe Leroy
	The following build warning has been reported: arch/powerpc/crypto/ghashp8-ppc.o: warning: objtool: .text+0x22c: unannotated intra-function call This happens due to commit bb7f054f4de2 ("objtool/powerpc: Add support for decoding all types of uncond branches") Disassembly of arch/powerpc/crypto/ghashp8-ppc.o shows: arch/powerpc/crypto/ghashp8-ppc.o: file format elf64-powerpcle Disassembly of section .text: 0000000000000140 <gcm_ghash_p8>: 140: f8 ff 00 3c lis r0,-8 ... 20c: 20 00 80 4e blr 210: 00 00 00 00 .long 0x0 214: 00 0c 14 00 .long 0x140c00 218: 00 00 04 00 .long 0x40000 21c: 00 00 00 00 .long 0x0 220: 47 48 41 53 rlwimi. r1,r26,9,1,3 224: 48 20 66 6f xoris r6,r27,8264 228: 72 20 50 6f xoris r16,r26,8306 22c: 77 65 72 49 bla 1726574 <gcm_ghash_p8+0x1726434> <== ... It corresponds to the following code in ghashp8-ppc.o : _GLOBAL(gcm_ghash_p8) lis 0,0xfff8 ... blr .long 0 .byte 0,12,0x14,0,0,0,4,0 .long 0 .size gcm_ghash_p8,.-gcm_ghash_p8 .byte 71,72,65,83,72,32,102,111,114,32,80,111,119,101,114,73,83,65,32,50,46,48,55,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 .align 2 .align 2 In fact this is raw data that is after the function end and that is not text so shouldn't be disassembled as text. But ghashp8-ppc.S is generated by a perl script and should have been marked as OBJECT_FILES_NON_STANDARD. Now that 'bla' is understood as a call instruction, that raw data is mis-interpreted as an infra-function call. Mark ghashp8-ppc.o as a OBJECT_FILES_NON_STANDARD to avoid this warning. Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/8c4c3fc2-2bd7-4148-af68-2f504d6119e0@linux.ibm.com Fixes: 109303336a0c ("crypto: vmx - Move to arch/powerpc/crypto") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/7aa7eb73fe6bc95ac210510e22394ca0ae227b69.1741128786.git.christophe.leroy@csgroup.eu
2025-03-10	powerpc: Fix 'intra_function_call not a direct call' warning	Christophe Leroy
	The following build warning have been reported: arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe84: intra_function_call not a direct call arch/powerpc/kernel/switch.o: warning: objtool: .text+0x4: intra_function_call not a direct call This happens due to commit bb7f054f4de2 ("objtool/powerpc: Add support for decoding all types of uncond branches") because that commit decodes 'bl .+4' as a normal instruction because that instruction is used by clang instead of 'bcl 20,31,+.4' for relocatable code. The solution is simply to remove the ANNOTATE_INTRA_FUNCTION_CALL annotation now that the instruction is not seen as a function call anymore. Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/8c4c3fc2-2bd7-4148-af68-2f504d6119e0@linux.ibm.com Fixes: bb7f054f4de2 ("objtool/powerpc: Add support for decoding all types of uncond branches") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/88876fb4e412203452e57d1037a1341cf15ccc7b.1741128981.git.christophe.leroy@csgroup.eu
2025-03-08	powerpc/vdso: Prepare introduction of struct vdso_clock	Nam Cao
	To support multiple PTP clocks, the VDSO data structure needs to be reworked. All clock specific data will end up in struct vdso_clock and in struct vdso_time_data there will be array of VDSO clocks. At the moment, vdso_clock is simply a define which maps vdso_clock to vdso_time_data. To prepare for the rework of the data structures, replace the struct vdso_time_data pointer with a struct vdso_clock pointer where applicable. No functional change. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250303-vdso-clock-v1-17-c1b5c69a166f@linutronix.de
2025-03-07	powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu'	Vaibhav Jain
	Commit 176cda0619b6 ("powerpc/perf: Add perf interface to expose vpa counters") introduced 'vpa_pmu' to expose Book3s-HV nested APIv2 provided L1<->L2 context switch latency counters to L1 user-space via perf-events. However the newly introduced PMU named 'vpa_pmu' doesn't assign ownership of the PMU to the module 'vpa_pmu'. Consequently the module 'vpa_pmu' can be unloaded while one of the perf-events are still active, which can lead to kernel oops and panic of the form below on a Pseries-LPAR: BUG: Kernel NULL pointer dereference on read at 0x00000058 <snip> NIP [c000000000506cb8] event_sched_out+0x40/0x258 LR [c00000000050e8a4] __perf_remove_from_context+0x7c/0x2b0 Call Trace: [c00000025fc3fc30] [c00000025f8457a8] 0xc00000025f8457a8 (unreliable) [c00000025fc3fc80] [fffffffffffffee0] 0xfffffffffffffee0 [c00000025fc3fcd0] [c000000000501e70] event_function+0xa8/0x120 <snip> Kernel panic - not syncing: Aiee, killing interrupt handler! Fix this by adding the module ownership to 'vpa_pmu' so that the module 'vpa_pmu' is ref-counted and prevented from being unloaded when perf-events are initialized. Fixes: 176cda0619b6 ("powerpc/perf: Add perf interface to expose vpa counters") Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250204153527.125491-1-vaibhav@linux.ibm.com
2025-03-07	KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests	Amit Machhiwal
	Currently on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only available for KVM Guests running on PowerNV and not for the KVM guests running on pSeries hypervisors. This prevents a pSeries L2 guest from leveraging the in-kernel acceleration for H_PUT_TCE_INDIRECT and H_STUFF_TCE hcalls that results in slow startup times for large memory guests. Support for VFIO on pSeries was restored in commit f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries"), making it possible to re-enable this capability on pSeries hosts. This change enables KVM_CAP_SPAPR_TCE_VFIO for nested PAPR guests on pSeries, while maintaining the existing behavior on PowerNV. Booting an L2 guest with 128GB of memory shows an average 11% improvement in startup time. Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries") Cc: stable@vger.kernel.org Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250220070002.1478849-1-amachhiw@linux.ibm.com
2025-03-07	powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7	Andreas Schwab
	Similar to the PowerMac3,1, the PowerBook6,7 is missing the #size-cells property on the i2s node. Depends-on: commit 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling") Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> [maddy: added "commit" work in depends-on to avoid checkpatch error] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/875xmizl6a.fsf@igel.home
2025-03-02	crypto: lib/Kconfig - Hide arch options from user	Herbert Xu
	The ARCH_MAY_HAVE patch missed arm64, mips and s390. But it may also lead to arch options being enabled but ineffective because of modular/built-in conflicts. As the primary user of all these options wireguard is selecting the arch options anyway, make the same selections at the lib/crypto option level and hide the arch options from the user. Instead of selecting them centrally from lib/crypto, simply set the default of each arch option as suggested by Eric Biggers. Change the Crypto API generic algorithms to select the top-level lib/crypto options instead of the generic one as otherwise there is no way to enable the arch options (Eric Biggers). Introduce a set of INTERNAL options to work around dependency cycles on the CONFIG_CRYPTO symbol. Fixes: 1047e21aecdf ("crypto: lib/Kconfig - Fix lib built-in failure when arch is modular") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@kernel.org> Closes: https://lore.kernel.org/oe-kbuild-all/202502232152.JC84YDLp-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-01	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Ryan's been hard at work finding and fixing mm bugs in the arm64 code, so here's a small crop of fixes for -rc5. The main changes are to fix our zapping of non-present PTEs for hugetlb entries created using the contiguous bit in the page-table rather than a block entry at the level above. Prior to these fixes, we were pulling the contiguous bit back out of the PTE in order to determine the size of the hugetlb page but this is clearly bogus if the thing isn't present and consequently both the clearing of the PTE(s) and the TLB invalidation were unreliable. Although the problem was found by code inspection, we really don't want this sitting around waiting to trigger and the changes are CC'd to stable accordingly. Note that the diffstat looks a lot worse than it really is; huge_ptep_get_and_clear() now takes a size argument from the core code and so all the arch implementations of that have been updated in a pretty mechanical fashion. - Fix a sporadic boot failure due to incorrect randomization of the linear map on systems that support it - Fix the zapping (both clearing the entries and invalidating the TLB) of hugetlb PTEs constructed using the contiguous bit" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() arm64/mm: Fix Boot panic on Ampere Altra
2025-02-27	mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()	Ryan Roberts
	In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-02-26	KVM: Drop kvm_arch_sync_events() now that all implementations are nops	Sean Christopherson
	Remove kvm_arch_sync_events() now that x86 no longer uses it (no other arch has ever used it). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Message-ID: <20250224235542.2562848-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-26	powerpc/microwatt: Add SMP support	Paul Mackerras
	This adds support for Microwatt systems with more than one core, and updates the device tree for a 2-core version. The secondary CPUs are started and sent to spin in __secondary_hold very early on, in the platform probe function. The reason for doing this is so that they are there when smp_release_cpus() gets called, which is before the platform init_smp function or even the platform setup_arch function gets called. Note that having two CPUs in the device tree doesn't preclude operation with only one CPU. The SYSCON_CPU_CTRL register has a read-only field which indicates the number of CPU cores, so microwatt_init_smp() will only start as many CPU cores as are present in the system, and any extra CPU device-tree nodes will just be ignored. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/Z5xt8aooKyXZv6Kf@thinks.paulus.ozlabs.org