Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
- Two changes to prepare for the future Rust 1.91.0 release (expected
2025-10-30, currently in nightly): a target specification format
change and a renamed, soon-to-be-stabilized 'core' function.
* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: support Rust >= 1.91.0 target spec
rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
|
|
As part of the stabilization of Location::file_with_nul(), it was brought
up that the with_nul() suffix usually means something else in Rust APIs,
so the API is being renamed prior to stabilization [1].
Thus, use the new name on new rustc versions.
Link: https://www.github.com/rust-lang/rust/pull/145928 [1]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250827-file_as_c_str-v1-1-d3f5a3916a9c@google.com
[ Kept `cfg` separation. Reworded slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
After an innocuous change in -next that modified a structure that
contains __counted_by, clang-19 start crashing when building certain
files in drivers/gpu/drm/xe. When assertions are enabled, the more
descriptive failure is:
clang: clang/lib/AST/RecordLayoutBuilder.cpp:3335: const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: Assertion `D && "Cannot get layout of forward declarations!"' failed.
According to a reverse bisect, a tangential change to the LLVM IR
generation phase of clang during the LLVM 20 development cycle [1]
resolves this problem. Bump the version of clang that enables
CONFIG_CC_HAS_COUNTED_BY to 20.1.0 to ensure that this issue cannot be
hit.
Link: https://github.com/llvm/llvm-project/commit/160fb1121cdf703c3ef5e61fb26c5659eb581489 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20250807-fix-counted_by-clang-19-v1-1-902c86c1d515@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
|
|
Linus added it in 2003, it later was removed. Put it back.
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Add support for cgroup "cpu.max" interface
- Code organization cleanup so that ext_idle.c doesn't depend on the
source-file-inclusion build method of sched/
- Drop UP paths in accordance with sched core changes
- Documentation and other misc changes
* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_bpf_reenqueue_local() reference
sched_ext: Drop kfuncs marked for removal in 6.15
sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
sched_ext: Add support for cgroup bandwidth control interface
sched_ext, sched/core: Factor out struct scx_task_group
sched_ext: Return NULL in llc_span
sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
sched_ext: Always use SMP versions in kernel/sched/ext.h
sched_ext: Always use SMP versions in kernel/sched/ext.c
sched_ext: Documentation: Clarify time slice handling in task lifecycle
sched_ext: Make scx_locked_rq() inline
sched_ext: Make scx_rq_bypassing() inline
sched_ext: idle: Make local functions static in ext_idle.c
sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"As usual, many cleanups. The below blurbiage describes 42 patchsets.
21 of those are partially or fully cleanup work. "cleans up",
"cleanup", "maintainability", "rationalizes", etc.
I never knew the MM code was so dirty.
"mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
mapped VMAs were not eligible for merging with existing adjacent
VMAs.
"mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
adds a new kernel module which simplifies the setup and usage of
DAMON in production environments.
"stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
is a cleanup to the writeback code which removes a couple of
pointers from struct writeback_control.
"drivers/base/node.c: optimization and cleanups" (Donet Tom)
contains largely uncorrelated cleanups to the NUMA node setup and
management code.
"mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
does some maintenance work on the userfaultfd code.
"Readahead tweaks for larger folios" (Ryan Roberts)
implements some tuneups for pagecache readahead when it is reading
into order>0 folios.
"selftests/mm: Tweaks to the cow test" (Mark Brown)
provides some cleanups and consistency improvements to the
selftests code.
"Optimize mremap() for large folios" (Dev Jain)
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
"Remove zero_user()" (Matthew Wilcox)
expunges zero_user() in favor of the more modern memzero_page().
"mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
addresses some warts which David noticed in the huge page code.
These were not known to be causing any issues at this time.
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
provides some cleanup and consolidation work in DAMON.
"use vm_flags_t consistently" (Lorenzo Stoakes)
uses vm_flags_t in places where we were inappropriately using other
types.
"mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
increases the reliability of large page allocation in the memfd
code.
"mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
removes several now-unneeded PFN_* flags.
"mm/damon: decouple sysfs from core" (SeongJae Park)
implememnts some cleanup and maintainability work in the DAMON
sysfs layer.
"madvise cleanup" (Lorenzo Stoakes)
does quite a lot of cleanup/maintenance work in the madvise() code.
"madvise anon_name cleanups" (Vlastimil Babka)
provides additional cleanups on top or Lorenzo's effort.
"Implement numa node notifier" (Oscar Salvador)
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory
on/offline notifier.
"Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
cleans up the pageblock isolation code and fixes a potential issue
which doesn't seem to cause any problems in practice.
"selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
adds additional drgn- and python-based DAMON selftests which are
more comprehensive than the existing selftest suite.
"Misc rework on hugetlb faulting path" (Oscar Salvador)
fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
"cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
rationalizes and cleans up the highmem-specific code in the CMA
allocator.
"mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
provides cleanups and future-preparedness to the migration code.
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
adds some tracepoints to some DAMON auto-tuning code.
"mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
does that.
"mm/damon: misc cleanups" (SeongJae Park)
also does what it claims.
"mm: folio_pte_batch() improvements" (David Hildenbrand)
cleans up the large folio PTE batching code.
"mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
facilitates dynamic alteration of DAMON's inter-node allocation
policy.
"Remove unmap_and_put_page()" (Vishal Moola)
provides a couple of page->folio conversions.
"mm: per-node proactive reclaim" (Davidlohr Bueso)
implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
"mm/damon: remove damon_callback" (SeongJae Park)
replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
"mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
implements a number of mremap cleanups (of course) in preparation
for adding new mremap() functionality: newly permit the remapping
of multiple VMAs when the user is specifying MREMAP_FIXED. It still
excludes some specialized situations where this cannot be performed
reliably.
"drop hugetlb_free_pgd_range()" (Anthony Yznaga)
switches some sparc hugetlb code over to the generic version and
removes the thus-unneeded hugetlb_free_pgd_range().
"mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
augments the present userspace-requested update of DAMON sysfs
monitoring files. Automatic update is now provided, along with a
tunable to control the update interval.
"Some randome fixes and cleanups to swapfile" (Kemeng Shi)
does what is claims.
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
provides (and uses) a means by which debug-style functions can grab
a copy of a pageframe and inspect it locklessly without tripping
over the races inherent in operating on the live pageframe
directly.
"use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
addresses the large contention issues which can be triggered by
reads from that procfs file. Latencies are reduced by more than
half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
"__folio_split() clean up" (Zi Yan)
cleans up __folio_split()!
"Optimize mprotect() for large folios" (Dev Jain)
provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
"selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
does some cleanup work in the selftests code.
"tools/testing: expand mremap testing" (Lorenzo Stoakes)
extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
"selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
extends the DAMON sysfs interface selftest so that it tests all
possible user-requested parameters. Rather than the present minimal
subset"
* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
MAINTAINERS: add missing headers to mempory policy & migration section
MAINTAINERS: add missing file to cgroup section
MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
MAINTAINERS: add missing zsmalloc file
MAINTAINERS: add missing files to page alloc section
MAINTAINERS: add missing shrinker files
MAINTAINERS: move memremap.[ch] to hotplug section
MAINTAINERS: add missing mm_slot.h file THP section
MAINTAINERS: add missing interval_tree.c to memory mapping section
MAINTAINERS: add missing percpu-internal.h file to per-cpu section
mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
selftests/damon: introduce _common.sh to host shared function
selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
selftests/damon/sysfs.py: test non-default parameters runtime commit
selftests/damon/sysfs.py: generalize DAMON context commit assertion
selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
selftests/damon/sysfs.py: test DAMOS filters commitment
selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
selftests/damon/sysfs.py: test DAMOS destinations commitment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Better tracking of maximum lag of tasks in presence of different
slices duration, for better handling of lag in the fair scheduler
(Vincent Guittot)
- Clean up and standardize #if/#else/#endif markers throughout the
entire scheduler code base (Ingo Molnar)
- Make SMP unconditional: build the SMP scheduler's data structures
and logic on UP kernel too, even though they are not used, to
simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
blocks from the scheduler (Ingo Molnar)
- Reorganize cgroup bandwidth control interface handling for better
interfacing with sched_ext (Tejun Heo)
Balancing:
- Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
Mason)
- Remove sched_domain_topology_level::flags to simplify the code
(Prateek Nayak)
- Simplify and clean up build_sched_topology() (Li Chen)
- Optimize build_sched_topology() on large machines (Li Chen)
Real-time scheduling:
- Add initial version of proxy execution: a mechanism for
mutex-owning tasks to inherit the scheduling context of higher
priority waiters.
Currently limited to a single runqueue and conditional on
CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
Valentin Schneider)
- Deadline scheduler (Juri Lelli):
- Fix dl_servers initialization order (Juri Lelli)
- Fix DL scheduler's root domain reinitialization logic (Juri
Lelli)
- Fix accounting bugs after global limits change (Juri Lelli)
- Fix scalability regression by implementing less agressive
dl_server handling (Peter Zijlstra)
PSI:
- Improve scalability by optimizing psi_group_change() cpu_clock()
usage (Peter Zijlstra)
Rust changes:
- Make Task, CondVar and PollCondVar methods inline to avoid
unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro (Fujita Tomonori)
- Introduce file_from_location() (Boqun Feng)
Debugging & instrumentation:
- Make clangd usable with scheduler source code files again (Peter
Zijlstra)
- tools: Add root_domains_dump.py which dumps root domains info (Juri
Lelli)
- tools: Add dl_bw_dump.py for printing bandwidth accounting info
(Juri Lelli)
Misc cleanups & fixes:
- Remove play_idle() (Feng Lee)
- Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
- Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
Claudio R. Goncalves)
- Correct the comment in place_entity() (wang wei)"
* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
sched/idle: Remove play_idle()
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
sched: Start blocked_on chain processing in find_proxy_task()
sched: Fix proxy/current (push,pull)ability
sched: Add an initial sketch of the find_proxy_task() function
sched: Fix runtime accounting w/ split exec & sched contexts
sched: Move update_curr_task logic into update_curr_se
locking/mutex: Add p->blocked_on wrappers for correctness checks
locking/mutex: Rework task_struct::blocked_on
sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
sched/topology: Remove sched_domain_topology_level::flags
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
x86/smpboot: moves x86_topology to static initialize and truncate
x86/smpboot: remove redundant CONFIG_SCHED_SMT
smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
tools/sched: Add root_domains_dump.py which dumps root domains info
sched/deadline: Fix accounting after global limits change
sched/deadline: Reset extra_bw to max_bw when clearing root domains
sched/deadline: Initialize dl_servers after SMP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:
- Switch the reference counting to a RCU based per-CPU reference to
address a performance bottleneck vs the single instance rcuref
variant
- Make the futex selftest build on 32-bit architectures which only
support 64-bit time_t, e.g. RISCV-32
- Cleanups and improvements in selftests and futex bench
* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
perf bench futex: Remove support for IMMUTABLE
selftests/futex: Remove support for IMMUTABLE
futex: Remove support for IMMUTABLE
futex: Make futex_private_hash_get() static
futex: Use RCU-based per-CPU reference counting instead of rcuref_t
selftests/futex: Adapt the private hash test to RCU related changes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO / other driver updates from Greg KH:
"Here is the big set of char/misc/iio and other smaller driver
subsystems for 6.17-rc1. It's a big set this time around, with the
huge majority being in the iio subsystem with new drivers and dts
files being added there.
Highlights include:
- IIO driver updates, additions, and changes making more code const
and cleaning up some init logic
- bus_type constant conversion changes
- misc device test functions added
- rust miscdevice minor fixup
- unused function removals for some drivers
- mei driver updates
- mhi driver updates
- interconnect driver updates
- Android binder updates and test infrastructure added
- small cdx driver updates
- small comedi fixes
- small nvmem driver updates
- small pps driver updates
- some acrn virt driver fixes for printk messages
- other small driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
binder: Use seq_buf in binder_alloc kunit tests
binder: Add copyright notice to new kunit files
misc: ti_fpc202: Switch to of_fwnode_handle()
bus: moxtet: Use dev_fwnode()
pc104: move PC104 option to drivers/Kconfig
drivers: virt: acrn: Don't use %pK through printk
comedi: fix race between polling and detaching
interconnect: qcom: Add Milos interconnect provider driver
dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
mei: more prints with client prefix
mei: bus: use cldev in prints
bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
bus: mhi: host: Detect events pointing to unexpected TREs
bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
bus: mhi: host: Use str_true_false() helper
bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
bus: mhi: host: Fix endianness of BHI vector table
bus: mhi: host: pci_generic: Disable runtime PM for QDU100
bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
...
|
|
Pull io_uring updates from Jens Axboe:
- Optimization to avoid reference counts on non-cloned registered
buffers. This is how these buffers were handled prior to having
cloning support, and we can still use that approach as long as the
buffers haven't been cloned to another ring.
- Cleanup and improvement for uring_cmd, where btrfs was the only user
of storing allocated data for the lifetime of the uring_cmd. Clean
that up so we can get rid of the need to do that.
- Avoid unnecessary memory copies in uring_cmd usage. This is
particularly important as a lot of uring_cmd usage necessitates the
use of 128b SQEs.
- A few updates for recv multishot, where it's now possible to add
fairness limits for limiting how much is transferred for each retry
loop. Additionally, recv multishot now supports an overall cap as
well, where once reached the multishot recv will terminate. The
latter is useful for buffer management and juggling many recv streams
at the same time.
- Add support for returning the TX timestamps via a new socket command.
This feature can work in either singleshot or multishot mode, where
the latter triggers a completion whenever new timestamps are
available. This is an alternative to using the existing error queue.
- Add support for an io_uring "mock" file, which is the start of being
able to do 100% targeted testing in terms of exercising io_uring
request handling. The idea is to have a file type that can be
anything the tester would like, and behave exactly how you want it to
behave in terms of hitting the code paths you want.
- Improve zcrx by using sgtables to de-duplicate and improve dma
address handling.
- Prep work for supporting larger pages for zcrx.
- Various little improvements and fixes.
* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
io_uring/zcrx: fix leaking pages on sg init fail
io_uring/zcrx: don't leak pages on account failure
io_uring/zcrx: fix null ifq on area destruction
io_uring: fix breakage in EXPERT menu
io_uring/cmd: remove struct io_uring_cmd_data
btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
io_uring/zcrx: account area memory
io_uring: export io_[un]account_mem
io_uring/net: Support multishot receive len cap
io_uring: deduplicate wakeup handling
io_uring/net: cast min_not_zero() type
io_uring/poll: cleanup apoll freeing
io_uring/net: allow multishot receive per-invocation cap
io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
io_uring/net: use passed in 'len' in io_recv_buf_select()
io_uring/zcrx: prepare fallback for larger pages
io_uring/zcrx: assert area type in io_zcrx_iov_page
io_uring/zcrx: allocate sgtable for umem areas
io_uring/zcrx: introduce io_populate_area_dma
...
|
|
Put the PC104 kconfig option in drivers/Kconfig along with
other buses (AMBA, EISA, PCI, CXL, PCCard, & RapidIO).
This localizes PC104 with option bus kconfig options to make
it easier to find.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: William Breathitt Gray <wbg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250722235431.3671754-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a dependency for IO_URING for the GCOV_PROFILE_URING symbol.
Without this patch the EXPERT config menu ends with
"Enable IO uring support" and the menu prompts for
GCOV_PROFILE_URING and IO_URING_MOCK_FILE are not subordinate to it.
This causes all of the EXPERT Kconfig options that follow
GCOV_PROFILE_URING to be display in the "upper" menu (General setup),
just following the EXPERT menu.
Fixes: 1802656ef890 ("io_uring: add GCOV_PROFILE_URING Kconfig option")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: io-uring@vger.kernel.org
Link: https://lore.kernel.org/r/20250720010456.2945344-1-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The vmstat_text array contains labels for counters displayed in
/proc/vmstat. It is important to keep the labels in sync with the
counters.
There is a BUILD_BUG_ON() check in vmstat_start() that ensures the size of
the vmstat_text is not smaller than VM_EVENT_COUNTERS. This helps to
catch cases where a new counter is added but the label is not. However,
it does not help if a counter is removed but the label remains.
It would be nice to make the BUILD_BUG_ON() check more strict to catch
such cases. However, when compiling with MEMCG enabled but
VM_EVENT_COUNTERS disabled, the vmstat_text array is larger than
NR_VMSTAT_ITEMS.
This issue arises because some elements of the vmstat_text array are
present when either MEMCG or VM_EVENT_COUNTERS is enabled, but
NR_VMSTAT_ITEMS only accounts for these elements if VM_EVENT_COUNTERS is
enabled.
Instead of adjusting the NR_VMSTAT_ITEMS definition to account for MEMCG,
make MEMCG select VM_EVENT_COUNTERS. VM_EVENT_COUNTERS is enabled in most
configurations anyway.
Link: https://lkml.kernel.org/r/20250604095111.533783-1-kirill.shutemov@linux.intel.com
Fixes: ebc5d83d0443 ("mm/memcontrol: use vmstat names for printing statistics")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.
Also uses this option to allow the rq->donor to be different from
rq->curr.
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
|
|
Avoid merge conflicts
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
The use of rcuref_t for reference counting introduces a performance bottleneck
when accessed concurrently by multiple threads during futex operations.
Replace rcuref_t with special crafted per-CPU reference counters. The
lifetime logic remains the same.
The newly allocate private hash starts in FR_PERCPU state. In this state, each
futex operation that requires the private hash uses a per-CPU counter (an
unsigned int) for incrementing or decrementing the reference count.
When the private hash is about to be replaced, the per-CPU counters are
migrated to a atomic_t counter mm_struct::futex_atomic.
The migration process:
- Waiting for one RCU grace period to ensure all users observe the
current private hash. This can be skipped if a grace period elapsed
since the private hash was assigned.
- futex_private_hash::state is set to FR_ATOMIC, forcing all users to
use mm_struct::futex_atomic for reference counting.
- After a RCU grace period, all users are guaranteed to be using the
atomic counter. The per-CPU counters can now be summed up and added to
the atomic_t counter. If the resulting count is zero, the hash can be
safely replaced. Otherwise, active users still hold a valid reference.
- Once the atomic reference count drops to zero, the next futex
operation will switch to the new private hash.
call_rcu_hurry() is used to speed up transition which otherwise might be
delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
side effects would be that on auto scaling the new hash is used later
and the SET_SLOTS prctl() will block longer.
[bigeasy: commit description + mm get/ put_async]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
|
|
io_uring commands provide an ioctl style interface for files to
implement file specific operations. io_uring provides many features and
advanced api to commands, and it's getting hard to test as it requires
specific files/devices.
Add basic infrastucture for creating special mock files that will be
implementing the cmd api and using various io_uring features we want to
test. It'll also be useful to test some more obscure read/write/polling
edge cases in the future.
Suggested-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93f21b0af58c1367a2b22635d5a7d694ad0272fc.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Chris Mason reported a performance regression on big iron. Reports of
this kind were usually reported as part of a micro benchmark but Chris'
test did mimic his real workload. This makes it a real regression.
The root cause is rcuref_get() which is invoked during each futex
operation. If all threads of an application do this simultaneously then
it leads to cache line bouncing and the performance drops.
Disable FUTEX_PRIVATE_HASH entirely for this cycle. The performance
regression will be addressed in the following cycle enabling the option
again.
Closes: https://lore.kernel.org/all/3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630145034.8JnINEaS@linutronix.de
|
|
Most of kernel debugging facilities take a nul-terminated string for
file names for a callsite (generated from __FILE__), however the Rust
courterpart, Location, would return a Rust string (not nul-terminated)
from method .file(). And such a string cannot be passed to C debugging
function directly.
There is ongoing work to support a Location::file_with_nul() [1], which
returns a nul-terminated string from a Location. Since it's still
working in progress, and it will take some time before the feature
finally gets stabilized and the kernel's minimal rustc version might
also take a while to bump to a version that at least has that feature,
introduce a file_from_location() function, which returns a warning
string if Location::file_with_nul() is not available.
This should work in most cases because as for now the known usage of
Location::file_with_nul() is only in debugging code (e.g. might_sleep())
and there might be other information reported by the debugging code that
could help locate the problematic function, so missing the file name is
fine at the moment.
Link: https://github.com/rust-lang/rust/issues/141727 [1]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250619151007.61767-2-boqun.feng@gmail.com
|
|
From 077814f57f8acce13f91dc34bbd2b7e4911fbf25 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 13 Jun 2025 15:06:47 -1000
- Add CONFIG_GROUP_SCHED_BANDWIDTH which is selected by both
CONFIG_CFS_BANDWIDTH and EXT_GROUP_SCHED.
- Put bandwidth control interface files for both cgroup v1 and v2 under
CONFIG_GROUP_SCHED_BANDWIDTH.
- Update tg_bandwidth() to fetch configuration parameters from fair if
CONFIG_CFS_BANDWIDTH, SCX otherwise.
- Update tg_set_bandwidth() to update the parameters for both fair and SCX.
- Add bandwidth control parameters to struct scx_cgroup_init_args.
- Add sched_ext_ops.cgroup_set_bandwidth() which is invoked on bandwidth
control parameter updates.
- Update scx_qmap and maximal selftest to test the new feature.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- KUnit '#[test]'s:
- Support KUnit-mapped 'assert!' macros.
The support that landed last cycle was very basic, and the
'assert!' macros panicked since they were the standard library
ones. Now, they are mapped to the KUnit ones in a similar way to
how is done for doctests, reusing the infrastructure there.
With this, a failing test like:
#[test]
fn my_first_test() {
assert_eq!(42, 43);
}
will report:
# my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251
Expected 42 == 43 to be true, but is false
# my_first_test.speed: normal
not ok 1 my_first_test
- Support tests with checked 'Result' return types.
The return value of test functions that return a 'Result' will
be checked, thus one can now easily catch errors when e.g. using
the '?' operator in tests.
With this, a failing test like:
#[test]
fn my_test() -> Result {
f()?;
Ok(())
}
will report:
# my_test: ASSERTION FAILED at rust/kernel/lib.rs:321
Expected is_test_result_ok(my_test()) to be true, but is false
# my_test.speed: normal
not ok 1 my_test
- Add 'kunit_tests' to the prelude.
- Clarify the remaining language unstable features in use.
- Compile 'core' with edition 2024 for Rust >= 1.87.
- Workaround 'bindgen' issue with forward references to 'enum' types.
- objtool: relax slice condition to cover more 'noreturn' functions.
- Use absolute paths in macros referencing 'core' and 'kernel'
crates.
- Skip '-mno-fdpic' flag for bindgen in GCC 32-bit arm builds.
- Clean some 'doc_markdown' lint hits -- we may enable it later on.
'kernel' crate:
- 'alloc' module:
- 'Box': support for type coercion, e.g. 'Box<T>' to 'Box<dyn U>'
if 'T' implements 'U'.
- 'Vec': implement new methods (prerequisites for nova-core and
binder): 'truncate', 'resize', 'clear', 'pop',
'push_within_capacity' (with new error type 'PushError'),
'drain_all', 'retain', 'remove' (with new error type
'RemoveError'), insert_within_capacity' (with new error type
'InsertError').
In addition, simplify 'push' using 'spare_capacity_mut', split
'set_len' into 'inc_len' and 'dec_len', add type invariant 'len
<= capacity' and simplify 'truncate' using 'dec_len'.
- 'time' module:
- Morph the Rust hrtimer subsystem into the Rust timekeeping
subsystem, covering delay, sleep, timekeeping, timers. This new
subsystem has all the relevant timekeeping C maintainers listed
in the entry.
- Replace 'Ktime' with 'Delta' and 'Instant' types to represent a
duration of time and a point in time.
- Temporarily add 'Ktime' to 'hrtimer' module to allow 'hrtimer'
to delay converting to 'Instant' and 'Delta'.
- 'xarray' module:
- Add a Rust abstraction for the 'xarray' data structure. This
abstraction allows Rust code to leverage the 'xarray' to store
types that implement 'ForeignOwnable'. This support is a
dependency for memory backing feature of the Rust null block
driver, which is waiting to be merged.
- Set up an entry in 'MAINTAINERS' for the XArray Rust support.
Patches will go to the new Rust XArray tree and then via the
Rust subsystem tree for now.
- Allow 'ForeignOwnable' to carry information about the pointed-to
type. This helps asserting alignment requirements for the
pointer passed to the foreign language.
- 'container_of!': retain pointer mut-ness and add a compile-time
check of the type of the first parameter ('$field_ptr').
- Support optional message in 'static_assert!'.
- Add C FFI types (e.g. 'c_int') to the prelude.
- 'str' module: simplify KUnit tests 'format!' macro, convert
'rusttest' tests into KUnit, take advantage of the '-> Result'
support in KUnit '#[test]'s.
- 'list' module: add examples for 'List', fix path of
'assert_pinned!' (so far unused macro rule).
- 'workqueue' module: remove 'HasWork::OFFSET'.
- 'page' module: add 'inline' attribute.
'macros' crate:
- 'module' macro: place 'cleanup_module()' in '.exit.text' section.
'pin-init' crate:
- Add 'Wrapper<T>' trait for creating pin-initializers for wrapper
structs with a structurally pinned value such as 'UnsafeCell<T>' or
'MaybeUninit<T>'.
- Add 'MaybeZeroable' derive macro to try to derive 'Zeroable', but
not error if not all fields implement it. This is needed to derive
'Zeroable' for all bindgen-generated structs.
- Add 'unsafe fn cast_[pin_]init()' functions to unsafely change the
initialized type of an initializer. These are utilized by the
'Wrapper<T>' implementations.
- Add support for visibility in 'Zeroable' derive macro.
- Add support for 'union's in 'Zeroable' derive macro.
- Upstream dev news: streamline CI, fix some bugs. Add new workflows
to check if the user-space version and the one in the kernel tree
have diverged. Use the issues tab [1] to track them, which should
help folks report and diagnose issues w.r.t. 'pin-init' better.
[1] https://github.com/rust-for-linux/pin-init/issues
Documentation:
- Testing: add docs on the new KUnit '#[test]' tests.
- Coding guidelines: explain that '///' vs. '//' applies to private
items too. Add section on C FFI types.
- Quick Start guide: update Ubuntu instructions and split them into
"25.04" and "24.04 LTS and older".
And a few other cleanups and improvements"
* tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (78 commits)
rust: list: Fix typo `much` in arc.rs
rust: check type of `$ptr` in `container_of!`
rust: workqueue: remove HasWork::OFFSET
rust: retain pointer mut-ness in `container_of!`
Documentation: rust: testing: add docs on the new KUnit `#[test]` tests
Documentation: rust: rename `#[test]`s to "`rusttest` host tests"
rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s
rust: str: simplify KUnit tests `format!` macro
rust: str: convert `rusttest` tests into KUnit
rust: add `kunit_tests` to the prelude
rust: kunit: support checked `-> Result`s in KUnit `#[test]`s
rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s
rust: make section names plural
rust: list: fix path of `assert_pinned!`
rust: compile libcore with edition 2024 for 1.87+
rust: dma: add missing Markdown code span
rust: task: add missing Markdown code spans and intra-doc links
rust: pci: fix docs related to missing Markdown code spans
rust: alloc: add missing Markdown code span
rust: alloc: add missing Markdown code spans
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "zram: support algorithm-specific parameters" from Sergey Senozhatsky
adds infrastructure for passing algorithm-specific parameters into
zram. A single parameter `winbits' is implemented at this time.
- "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
charging nmi-safe, which is required by BFP, which can operate in NMI
context.
- "Some random fixes and cleanup to shmem" from Kemeng Shi implements
small fixes and cleanups in the shmem code.
- "Skip mm selftests instead when kernel features are not present" from
Zi Yan fixes some issues in the MM selftest code.
- "mm/damon: build-enable essential DAMON components by default" from
SeongJae Park reworks DAMON Kconfig to make it easier to enable
CONFIG_DAMON.
- "sched/numa: add statistics of numa balance task migration" from Libo
Chen adds more info into sysfs and procfs files to improve visibility
into the NUMA balancer's task migration activity.
- "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
provides various updates to some of the MM selftests to make them
play better with the overall containing framework.
* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
mm/khugepaged: clean up refcount check using folio_expected_ref_count()
selftests/mm: fix test result reporting in gup_longterm
selftests/mm: report unique test names for each cow test
selftests/mm: add helper for logging test start and results
selftests/mm: use standard ksft_finished() in cow and gup_longterm
selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
sched/numa: add statistics of numa balance task
sched/numa: fix task swap by skipping kernel threads
tools/testing: check correct variable in open_procmap()
tools/testing/vma: add missing function stub
mm/gup: update comment explaining why gup_fast() disables IRQs
selftests/mm: two fixes for the pfnmap test
mm/khugepaged: fix race with folio split/free using temporary reference
mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
mmu_notifiers: remove leftover stub macros
selftests/mm: deduplicate test names in madv_populate
kcov: rust: add flags for KCOV with Rust
mm: rust: make CONFIG_MMU ifdefs more narrow
mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
mm/damon/Kconfig: enable CONFIG_DAMON by default
...
|
|
There are archs which have NMI but does not support this_cpu_* ops safely
in the nmi context but they support safe atomic ops in nmi context. For
such archs, let's add infra to use atomic ops for the memcg stats which
can be updated in nmi.
At the moment, the memcg stats which get updated in the objcg charging
path are MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B.
Rather than adding support for all memcg stats to be nmi safe, let's just
add infra to make these three stats nmi safe which this patch is doing.
Link: https://lkml.kernel.org/r/20250519063142.111219-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "memcg: nmi-safe kmem charging", v4.
Users can attached their BPF programs at arbitrary execution points in the
kernel and such BPF programs may run in nmi context. In addition, these
programs can trigger memcg charged kernel allocations in the nmi context.
However memcg charging infra for kernel memory is not equipped to handle
nmi context for all architectures.
This series removes the hurdles to enable kmem charging in the nmi context
for most of the archs. For archs without CONFIG_HAVE_NMI, this series is
a noop. For archs with NMI support and have
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, the previous work to make memcg
stats re-entrant is sufficient for allowing kmem charging in nmi context.
For archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and with ARCH_HAVE_NMI_SAFE_CMPXCHG,
this series added infra to support kmem charging in nmi context. Lastly
those archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and ARCH_HAVE_NMI_SAFE_CMPXCHG, kmem
charging in nmi context is not supported at all.
Mostly used archs have support for CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
and this series should be almost a noop (other than making
memcg_rstat_updated nmi safe) for such archs.
This patch (of 5):
The memcg accounting and stats uses this_cpu* and atomic* ops. There are
archs which define CONFIG_HAVE_NMI but does not define
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and ARCH_HAVE_NMI_SAFE_CMPXCHG, so
memcg accounting for such archs in nmi context is not possible to support.
Let's just disable memcg accounting in nmi context for such archs.
Link: https://lkml.kernel.org/r/20250519063142.111219-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20250519063142.111219-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Update overflow helpers to ease refactoring of on-stack flex array
instances (Gustavo A. R. Silva, Kees Cook)
- lkdtm: Use SLAB_NO_MERGE instead of constructors (Harry Yoo)
- Simplify CONFIG_CC_HAS_COUNTED_BY (Jan Hendrik Farr)
- Disable u64 usercopy KUnit test on 32-bit SPARC (Thomas Weißschuh)
- Add missed designated initializers now exposed by fixed randstruct
(Nathan Chancellor, Kees Cook)
- Document compilers versions for __builtin_dynamic_object_size
- Remove ARM_SSP_PER_TASK GCC plugin
- Fix GCC plugin randstruct, add selftests, and restore COMPILE_TEST
builds
- Kbuild: induce full rebuilds when dependencies change with GCC
plugins, the Clang sanitizer .scl file, or the randstruct seed.
- Kbuild: Switch from -Wvla to -Wvla-larger-than=1
- Correct several __nonstring uses for -Wunterminated-string-initialization
* tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
Revert "hardening: Disable GCC randstruct for COMPILE_TEST"
lib/tests: randstruct: Add deep function pointer layout test
lib/tests: Add randstruct KUnit test
randstruct: gcc-plugin: Remove bogus void member
net: qede: Initialize qede_ll_ops with designated initializer
scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops
md/bcache: Mark __nonstring look-up table
integer-wrap: Force full rebuild when .scl file changes
randstruct: Force full rebuild when seed changes
gcc-plugins: Force full rebuild when plugins change
kbuild: Switch from -Wvla to -Wvla-larger-than=1
hardening: simplify CONFIG_CC_HAS_COUNTED_BY
overflow: Fix direct struct member initialization in _DEFINE_FLEX()
kunit/overflow: Add tests for STACK_FLEX_ARRAY_SIZE() helper
overflow: Add STACK_FLEX_ARRAY_SIZE() helper
input/joystick: magellan: Mark __nonstring look-up table const
watchdog: exar: Shorten identity name to fit correctly
mod_devicetable: Enlarge the maximum platform_device_id name length
overflow: Clarify expectations for getting DEFINE_FLEX variable sizes
compiler_types: Identify compiler versions for __builtin_dynamic_object_size
...
|
|
The KUnit `#[test]` support that landed recently is very basic and does
not map the `assert*!` macros into KUnit like the doctests do, so they
panic at the moment.
Thus implement the custom mapping in a similar way to doctests, reusing
the infrastructure there.
In Rust 1.88.0, the `file()` method in `Span` may be stable [1]. However,
it was changed recently (from `SourceFile`), so we need to do something
different in previous versions. Thus create a helper for it and use it
to get the path.
With this, a failing test suite like:
#[kunit_tests(my_test_suite)]
mod tests {
use super::*;
#[test]
fn my_first_test() {
assert_eq!(42, 43);
}
#[test]
fn my_second_test() {
assert!(42 >= 43);
}
}
will properly map back to KUnit, printing something like:
[ 1.924325] KTAP version 1
[ 1.924421] # Subtest: my_test_suite
[ 1.924506] # speed: normal
[ 1.924525] 1..2
[ 1.926385] # my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251
[ 1.926385] Expected 42 == 43 to be true, but is false
[ 1.928026] # my_first_test.speed: normal
[ 1.928075] not ok 1 my_first_test
[ 1.928723] # my_second_test: ASSERTION FAILED at rust/kernel/lib.rs:256
[ 1.928723] Expected 42 >= 43 to be true, but is false
[ 1.929834] # my_second_test.speed: normal
[ 1.929868] not ok 2 my_second_test
[ 1.930032] # my_test_suite: pass:0 fail:2 skip:0 total:2
[ 1.930153] # Totals: pass:0 fail:2 skip:0 total
Link: https://github.com/rust-lang/rust/pull/140514 [1]
Reviewed-by: David Gow <davidgow@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250502215133.1923676-2-ojeda@kernel.org
[ Required `KUNIT=y` like for doctests. Used the `cfg_attr` from the
TODO comment and clarified its comment now that the stabilization is
in beta and thus quite likely stable in Rust 1.88.0. Simplified the
`new_body` code by introducing a new variable. Added
`#[allow(clippy::incompatible_msrv)]`. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core & fair scheduler changes:
- Tweak wait_task_inactive() to force dequeue sched_delayed tasks
(John Stultz)
- Adhere to place_entity() constraints (Peter Zijlstra)
- Allow decaying util_est when util_avg > CPU capacity (Pierre
Gondois)
- Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan)
Energy management:
- Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak)
- cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings
change (K Prateek Nayak)
- Align uclamp and util_est and call before freq update (Xuewen Yan)
CPU isolation:
- Make use of more than one housekeeping CPU (Phil Auld)
RT scheduler:
- Fix race in push_rt_task() (Harshit Agarwal)
- Add kernel cmdline option for rt_group_sched (Michal Koutný)
Scheduler topology support:
- Improve topology_span_sane speed (Steve Wahl)
Scheduler debugging:
- Move and extend the sched_process_exit() tracepoint (Andrii
Nakryiko)
- Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný)
- Fix trace_sched_switch(.prev_state) (Peter Zijlstra)
- Untangle cond_resched() and live-patching (Peter Zijlstra)
Fixes and cleanups:
- Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter
Zijlstra, Xuewen Yan)"
* tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
sched/uclamp: Align uclamp and util_est and call before freq update
sched/util_est: Simplify condition for util_est_{en,de}queue()
sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE
sched,livepatch: Untangle cond_resched() and live-patching
sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks
sched/fair: Adhere to place_entity() constraints
sched/debug: Print the local group's asym_prefer_cpu
cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
sched/topology: Introduce sched_update_asym_prefer_cpu()
sched/fair: Use READ_ONCE() to read sg->asym_prefer_cpu
sched/isolation: Make use of more than one housekeeping cpu
sched/rt: Fix race in push_rt_task
sched: Add annotations to RT_GROUP_SCHED fields
sched: Add RT_GROUP WARN checks for non-root task_groups
sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled
sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED
sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
sched: Add commadline option for RT_GROUP_SCHED toggling
sched: Always initialize rt_rq's task_group
sched: Remove unneeed macro wrap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Futexes:
- Add support for task local hash maps (Sebastian Andrzej Siewior,
Peter Zijlstra)
- Implement the FUTEX2_NUMA ABI, which feature extends the futex
interface to be NUMA-aware. On NUMA-aware futexes a second u32 word
containing the NUMA node is added to after the u32 futex value word
(Peter Zijlstra)
- Implement the FUTEX2_MPOL ABI, which feature extends the futex
interface to be mempolicy-aware as well, to further refine futex
node mappings and lookups (Peter Zijlstra)
Locking primitives:
- Misc cleanups (Andy Shevchenko, Borislav Petkov, Colin Ian King,
Ingo Molnar, Nam Cao, Peter Zijlstra)
Lockdep:
- Prevent abuse of lockdep subclasses (Waiman Long)
- Add number of dynamic keys to /proc/lockdep_stats (Waiman Long)
Plus misc cleanups and fixes"
* tag 'locking-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized"
futex: Correct the kernedoc return value for futex_wait_setup().
tools headers: Synchronize prctl.h ABI header
futex: Use RCU_INIT_POINTER() in futex_mm_init().
selftests/futex: Use TAP output in futex_numa_mpol
selftests/futex: Use TAP output in futex_priv_hash
futex: Fix kernel-doc comments
futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()
futex: Fix outdated comment in struct restart_block
locking/lockdep: Add number of dynamic keys to /proc/lockdep_stats
locking/lockdep: Prevent abuse of lockdep subclass
locking/lockdep: Move hlock_equal() to the respective #ifdeffery
futex,selftests: Add another FUTEX2_NUMA selftest
selftests/futex: Add futex_numa_mpol
selftests/futex: Add futex_priv_hash
selftests/futex: Build without headers nonsense
tools/perf: Allow to select the number of hash buckets
tools headers: Synchronize prctl.h ABI header
futex: Implement FUTEX2_MPOL
futex: Implement FUTEX2_NUMA
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Use folios for symlinks in the page cache
FUSE already uses folios for its symlinks. Mirror that conversion
in the generic code and the NFS code. That lets us get rid of a few
folio->page->folio conversions in this path, and some of the few
remaining users of read_cache_page() / read_mapping_page()
- Try and make a few filesystem operations killable on the VFS
inode->i_mutex level
- Add sysctl vfs_cache_pressure_denom for bulk file operations
Some workloads need to preserve more dentries than we currently
allow through out sysctl interface
A HDFS servers with 12 HDDs per server, on a HDFS datanode startup
involves scanning all files and caching their metadata (including
dentries and inodes) in memory. Each HDD contains approximately 2
million files, resulting in a total of ~20 million cached dentries
after initialization
To minimize dentry reclamation, they set vfs_cache_pressure to 1.
Despite this configuration, memory pressure conditions can still
trigger reclamation of up to 50% of cached dentries, reducing the
cache from 20 million to approximately 10 million entries. During
the subsequent cache rebuild period, any HDFS datanode restart
operation incurs substantial latency penalties until full cache
recovery completes
To maintain service stability, more dentries need to be preserved
during memory reclamation. The current minimum reclaim ratio (1/100
of total dentries) remains too aggressive for such workload. This
patch introduces vfs_cache_pressure_denom for more granular cache
pressure control
The configuration [vfs_cache_pressure=1,
vfs_cache_pressure_denom=10000] effectively maintains the full 20
million dentry cache under memory pressure, preventing datanode
restart performance degradation
- Avoid some jumps in inode_permission() using likely()/unlikely()
- Avid a memory access which is most likely a cache miss when
descending into devcgroup_inode_permission()
- Add fastpath predicts for stat() and fdput()
- Anonymous inodes currently don't come with a proper mode causing
issues in the kernel when we want to add useful VFS debug assert.
Fix that by giving them a proper mode and masking it off when we
report it to userspace which relies on them not having any mode
- Anonymous inodes currently allow to change inode attributes because
the VFS falls back to simple_setattr() if i_op->setattr isn't
implemented. This means the ownership and mode for every single
user of anon_inode_inode can be changed. Block that as it's either
useless or actively harmful. If specific ownership is needed the
respective subsystem should allocate anonymous inodes from their
own private superblock
- Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock
- Add proper tests for anonymous inode behavior
- Make it easy to detect proper anonymous inodes and to ensure that
we can detect them in codepaths such as readahead()
Cleanups:
- Port pidfs to the new anon_inode_{g,s}etattr() helpers
- Try to remove the uselib() system call
- Add unlikely branch hint return path for poll
- Add unlikely branch hint on return path for core_sys_select
- Don't allow signals to interrupt getdents copying for fuse
- Provide a size hint to dir_context for during readdir()
- Use writeback_iter directly in mpage_writepages
- Update compression and mtime descriptions in initramfs
documentation
- Update main netfs API document
- Remove useless plus one in super_cache_scan()
- Remove unnecessary NULL-check guards during setns()
- Add separate separate {get,put}_cgroup_ns no-op cases
Fixes:
- Fix typo in root= kernel parameter description
- Use KERN_INFO for infof()|info_plog()|infofc()
- Correct comments of fs_validate_description()
- Mark an unlikely if condition with unlikely() in
vfs_parse_monolithic_sep()
- Delete macro fsparam_u32hex()
- Remove unused and problematic validate_constant_table()
- Fix potential unsigned integer underflow in fs_name()
- Make file-nr output the total allocated file handles"
* tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits)
fs: Pass a folio to page_put_link()
nfs: Use a folio in nfs_get_link()
fs: Convert __page_get_link() to use a folio
fs/read_write: make default_llseek() killable
fs/open: make do_truncate() killable
fs/open: make chmod_common() and chown_common() killable
include/linux/fs.h: add inode_lock_killable()
readdir: supply dir_context.count as readdir buffer size hint
vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
fuse: don't allow signals to interrupt getdents copying
Documentation: fix typo in root= kernel parameter description
include/cgroup: separate {get,put}_cgroup_ns no-op case
kernel/nsproxy: remove unnecessary guards
fs: use writeback_iter directly in mpage_writepages
fs: remove useless plus one in super_cache_scan()
fs: add S_ANON_INODE
fs: remove uselib() system call
device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
fs/fs_parse: Remove unused and problematic validate_constant_table()
fs: touch up predicts in inode_permission()
...
|
|
This is a leftover from commit 98e20e5e13d2 ("bpfilter: remove bpfilter").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Simplifies CONFIG_CC_HAS_COUNTED_BY by removing the build test and
relying solely on gcc/clang version numbering (GCC_VERSION >= 150100 and
CLANG_VERSION >= 190103).
The build test was used to allow unreleased gcc 15.0 builds to use the
__counted_by attribute. Now that gcc 15.1.0 has been released, this is
not needed anymore. Note: This will disable __counted_by on unreleased
gcc 15.0 builds.
clang version support for __counted_by remains unchanged.
Link: https://lore.kernel.org/all/Zw8iawAF5W2uzGuh@archlinux/T/#m204c09f63c076586a02d194b87dffc7e81b8de7b
Link: https://lore.kernel.org/r/20241029140036.577804-2-kernel@jfarr.cc
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jan Hendrik Farr <kernel@jfarr.cc>
Link: https://lore.kernel.org/r/20250430184231.671365-2-kernel@jfarr.cc
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Starting with Rust 1.88.0 (expected 2025-06-26) [1][2], `rustc` may
introduce a new lint that catches unnecessary transmutes, e.g.:
error: unnecessary transmute
--> rust/uapi/uapi_generated.rs:23242:18
|
23242 | unsafe { ::core::mem::transmute(self._bitfield_1.get(0usize, 1u8) as u8) }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace this with: `(self._bitfield_1.get(0usize, 1u8) as u8 == 1)`
|
= note: `-D unnecessary-transmutes` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unnecessary_transmutes)]`
There are a lot of them (at least 300), but luckily they are all in
`bindgen`-generated code.
Thus clean all up by allowing it there.
Since unknown lints trigger a lint itself in older compilers, do it
conditionally so that we can keep the `unknown_lints` lint enabled.
Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust/pull/136083 [1]
Link: https://github.com/rust-lang/rust/issues/136067 [2]
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250502140237.1659624-4-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Extend the futex2 interface to be aware of mempolicy.
When FUTEX2_MPOL is specified and there is a MPOL_PREFERRED or
home_node specified covering the futex address, use that hash-map.
Notably, in this case the futex will go to the global node hashtable,
even if it is a PRIVATE futex.
When FUTEX2_NUMA|FUTEX2_MPOL is specified and the user specified node
value is FUTEX_NO_NODE, the MPOL lookup (as described above) will be
tried first before reverting to setting node to the local node.
[bigeasy: add CONFIG_FUTEX_MPOL, add MPOL to FUTEX2_VALID_MASK, write
the node only to user if FUTEX_NO_NODE was supplied]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-18-bigeasy@linutronix.de
|
|
The futex hash is system wide and shared by all tasks. Each slot
is hashed based on futex address and the VMA of the thread. Due to
randomized VMAs (and memory allocations) the same logical lock (pointer)
can end up in a different hash bucket on each invocation of the
application. This in turn means that different applications may share a
hash bucket on the first invocation but not on the second and it is not
always clear which applications will be involved. This can result in
high latency's to acquire the futex_hash_bucket::lock especially if the
lock owner is limited to a CPU and can not be effectively PI boosted.
Introduce basic infrastructure for process local hash which is shared by
all threads of process. This hash will only be used for a
PROCESS_PRIVATE FUTEX operation.
The hashmap can be allocated via:
prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, num);
A `num' of 0 means that the global hash is used instead of a private
hash.
Other values for `num' specify the number of slots for the hash and the
number must be power of two, starting with two.
The prctl() returns zero on success. This function can only be used
before a thread is created.
The current status for the private hash can be queried via:
num = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);
which return the current number of slots. The value 0 means that the
global hash is used. Values greater than 0 indicate the number of slots
that are used. A negative number indicates an error.
For optimisation, for the private hash jhash2() uses only two arguments
the address and the offset. This omits the VMA which is always the same.
[peterz: Use 0 for global hash. A bit shuffling and renaming. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250416162921.513656-13-bigeasy@linutronix.de
|
|
This system call has been deprecated for quite a while now.
Let's try and remove it from the kernel completely.
Link: https://lore.kernel.org/20250415-kanufahren-besten-02ac00e6becd@brauner
Acked-by: Kees Cook <kees@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This odd system call will be removed in the future. Let's decouple it
from CONFIG_EXPERT and switch the default to n as a first step.
Link: https://lore.kernel.org/20250415-dezimieren-wertpapier-9fd18a211a41@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Only simple implementation with a static key wrapper, it will be wired
in later.
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250310170442.504716-5-mkoutny@suse.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix a nonsensical Kconfig combination
- Remove an unnecessary rseq-notification
* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Eliminate useless task_work on execve
sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
|
|
Pull ARM and clkdev updates from Russell King:
- Simplify ARM_MMU_KEEP usage
- Add Rust support for ARM architecture version 7
- Align IPIs reported in /proc/interrupts
- require linker to support KEEP within OVERLAY
- add KEEP() for ARM vectors
- add __printf() attribute for clkdev functions
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9445/1: clkdev: Mark some functions with __printf() attribute
ARM: 9444/1: add KEEP() keyword to ARM_VECTORS
ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE
ARM: 9442/1: smp: Fix IPI alignment in /proc/interrupts
ARM: 9441/1: rust: Enable Rust support for ARMv7
ARM: 9439/1: arm32: simplify ARM_MMU_KEEP usage
|
|
kernel/sched/isolation.c obviously makes no sense without CONFIG_SMP, but
the Kconfig entry we have right now:
config CPU_ISOLATION
bool "CPU isolation"
depends on SMP || COMPILE_TEST
allows the creation of pointless .config's which cause
build failures.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250330134955.GA7910@redhat.com
Closes: https://lore.kernel.org/oe-kbuild-all/202503260646.lrUqD3j5-lkp@intel.com/
|
|
Patch series "mseal system mappings", v9.
As discussed during mseal() upstream process [1], mseal() protects the
VMAs of a given virtual memory range against modifications, such as the
read/write (RW) and no-execute (NX) bits. For complete descriptions of
memory sealing, please see mseal.rst [2].
The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.
The system mappings are readonly only, memory sealing can protect them
from ever changing to writable or unmmap/remapped as different attributes.
System mappings such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode), are created by the kernel during
program initialization, and could be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not established
during program startup. However, its lifetime is the same as the
process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000), which
is outside the mm managed range. This means mprotect, munmap, and mremap
won't work on the vsyscall. Since sealing doesn't enhance the vsyscall's
security, it is skipped in this patch. If we ever seal the vsyscall, it
is probably only for decorative purpose, i.e. showing the 'sl' flag in
the /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag. Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime. The existence of this flag for an architecture implies that it
does not require the remapping of thest system mappings during process
lifetime, so sealing these mappings is safe from a kernel perspective.
This version covers x86-64 and arm64 archiecture as minimum viable feature.
While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel. Other
architectures can choose whether or not to adopt this feature. Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace. The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4]. The UML kernel can add this
support when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing
from previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
than its actual size). The dynamic linker, which relies on PT_LOAD
information to determine mapping size, would then split and partially
seal the vdso mapping. Since each architecture has its own vdso/vvar
code, fixing this in the kernel would require going through each
archiecture. Our initial goal was to enable sealing readonly mappings,
e.g. .text, across all architectures, sealing vdso from kernel since
creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not
length information. Similar issues might exist for other special
mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there
is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android, and
other high security systems.
Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
Android to ensure the sealing doesn't affect the functionality of
Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
normal operations such as browsing the web, open/edit doc are OK.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
This patch (of 7):
Provide infrastructure to mseal system mappings. Establish two kernel
configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future
patches.
Link: https://lkml.kernel.org/r/20250305021711.3867874-1-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250305021711.3867874-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ld.lld prior to 21.0.0 does not support using the KEEP keyword within an
overlay description, which may be needed to avoid discarding necessary
sections within an overlay with '--gc-sections', which can be enabled
for the kernel via CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
Disallow CONFIG_LD_DEAD_CODE_DATA_ELIMINATION without support for KEEP
within OVERLAY and introduce a macro, OVERLAY_KEEP, that can be used to
conditionally add KEEP when it is properly supported to avoid breaking
old versions of ld.lld.
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/381599f1fe973afad3094e55ec99b1620dba7d8c
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Ingo Molnar:
"x86 CPU features support:
- Generate the <asm/cpufeaturemasks.h> header based on build config
(H. Peter Anvin, Xin Li)
- x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
- Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
- Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
Jackman)
- Utilize CPU-type for CPU matching (Pawan Gupta)
- Warn about unmet CPU feature dependencies (Sohil Mehta)
- Prepare for new Intel Family numbers (Sohil Mehta)
Percpu code:
- Standardize & reorganize the x86 percpu layout and related cleanups
(Brian Gerst)
- Convert the stackprotector canary to a regular percpu variable
(Brian Gerst)
- Add a percpu subsection for cache hot data (Brian Gerst)
- Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
- Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
MM:
- Add support for broadcast TLB invalidation using AMD's INVLPGB
instruction (Rik van Riel)
- Rework ROX cache to avoid writable copy (Mike Rapoport)
- PAT: restore large ROX pages after fragmentation (Kirill A.
Shutemov, Mike Rapoport)
- Make memremap(MEMREMAP_WB) map memory as encrypted by default
(Kirill A. Shutemov)
- Robustify page table initialization (Kirill A. Shutemov)
- Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
- Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
(Matthew Wilcox)
KASLR:
- x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
Singh)
CPU bugs:
- Implement FineIBT-BHI mitigation (Peter Zijlstra)
- speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
- speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
Gupta)
- RFDS: Exclude P-only parts from the RFDS affected list (Pawan
Gupta)
System calls:
- Break up entry/common.c (Brian Gerst)
- Move sysctls into arch/x86 (Joel Granados)
Intel LAM support updates: (Maciej Wieczor-Retman)
- selftests/lam: Move cpu_has_la57() to use cpuinfo flag
- selftests/lam: Skip test if LAM is disabled
- selftests/lam: Test get_user() LAM pointer handling
AMD SMN access updates:
- Add SMN offsets to exclusive region access (Mario Limonciello)
- Add support for debugfs access to SMN registers (Mario Limonciello)
- Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
Power management updates: (Patryk Wlazlyn)
- Allow calling mwait_play_dead with an arbitrary hint
- ACPI/processor_idle: Add FFH state handling
- intel_idle: Provide the default enter_dead() handler
- Eliminate mwait_play_dead_cpuid_hint()
Build system:
- Raise the minimum GCC version to 8.1 (Brian Gerst)
- Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)
Kconfig: (Arnd Bergmann)
- Add cmpxchg8b support back to Geode CPUs
- Drop 32-bit "bigsmp" machine support
- Rework CONFIG_GENERIC_CPU compiler flags
- Drop configuration options for early 64-bit CPUs
- Remove CONFIG_HIGHMEM64G support
- Drop CONFIG_SWIOTLB for PAE
- Drop support for CONFIG_HIGHPTE
- Document CONFIG_X86_INTEL_MID as 64-bit-only
- Remove old STA2x11 support
- Only allow CONFIG_EISA for 32-bit
Headers:
- Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
headers (Thomas Huth)
Assembly code & machine code patching:
- x86/alternatives: Simplify alternative_call() interface (Josh
Poimboeuf)
- x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
- KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
- x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
- x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
- x86/kexec: Merge x86_32 and x86_64 code using macros from
<asm/asm.h> (Uros Bizjak)
- Use named operands in inline asm (Uros Bizjak)
- Improve performance by using asm_inline() for atomic locking
instructions (Uros Bizjak)
Earlyprintk:
- Harden early_serial (Peter Zijlstra)
NMI handler:
- Add an emergency handler in nmi_desc & use it in
nmi_shootdown_cpus() (Waiman Long)
Miscellaneous fixes and cleanups:
- by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
Kuznetsov, Xin Li, liuye"
* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
x86/asm: Make asm export of __ref_stack_chk_guard unconditional
x86/mm: Only do broadcast flush from reclaim if pages were unmapped
perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
x86/asm: Use asm_inline() instead of asm() in clwb()
x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
x86/hweight: Use asm_inline() instead of asm()
x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
x86/hweight: Use named operands in inline asm()
x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
x86/head/64: Avoid Clang < 17 stack protector in startup code
x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
x86/cpu/intel: Limit the non-architectural constant_tsc model checks
x86/mm/pat: Replace Intel x86_model checks with VFM ones
x86/cpu/intel: Fix fast string initialization for extended Families
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Add deprecation info messages to cgroup1-only features
- rstat updates including a bug fix and breaking up a critical section
to reduce interrupt latency impact
- Other misc and doc updates
* tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: rstat: Cleanup flushing functions and locking
cgroup/rstat: avoid disabling irqs for O(num_cpu)
mm: Fix a build breakage in memcontrol-v1.c
blk-cgroup: Simplify policy files registration
cgroup: Update file naming comment
cgroup: Add deprecation message to legacy freezer controller
mm: Add transformation message for per-memcg swappiness
RFC cgroup/cpuset-v1: Add deprecation messages to sched_relax_domain_level
cgroup/cpuset-v1: Add deprecation messages to memory_migrate
cgroup/cpuset-v1: Add deprecation messages to mem_exclusive and mem_hardwall
cgroup: Print message when /proc/cgroups is read on v2-only system
cgroup/blkio: Add deprecation messages to reset_stats
cgroup/cpuset-v1: Add deprecation messages to memory_spread_page and memory_spread_slab
cgroup/cpuset-v1: Add deprecation messages to sched_load_balance and memory_pressure_enabled
cgroup, docs: Be explicit about independence of RT_GROUP_SCHED and non-cpu controllers
cgroup/rstat: Fix forceidle time in cpu.stat
cgroup/misc: Remove unused misc_cg_res_total_usage
cgroup/cpuset: Move procfs cpuset attribute under cgroup-v1.c
cgroup: update comment about dropping cgroup kn refs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As usual, it's scattered changes all over. Patches touching things
outside of our traditional areas in the tree have been Acked by
maintainers or were trivial changes:
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile
time (Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of
memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
it
- Introduce __nonstring_array for silencing future GCC 15 warnings"
* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
compiler_types: Introduce __nonstring_array
hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
x86/build: Remove -ffreestanding on i386 with GCC
ubsan/overflow: Enable ignorelist parsing and add type filter
ubsan/overflow: Enable pattern exclusions
ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
samples/check-exec: Fix script name
yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
kbuild: clang: Support building UM with SUBARCH=i386
loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
lib/string_choices: Rearrange functions in sorted order
string.h: Validate memtostr*()/strtomem*() arguments more carefully
compiler.h: Introduce __must_be_noncstr()
nilfs2: Mark on-disk strings as nonstring
uapi: stddef.h: Introduce __kernel_nonstring
x86/tdx: Mark message.bytes as nonstring
string: kunit: Mark nonstring test strings as __nonstring
scsi: qla2xxx: Mark device strings as nonstring
scsi: mpt3sas: Mark device strings as nonstring
scsi: mpi3mr: Mark device strings as nonstring
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs initramfs updates from Christian Brauner:
"This adds basic kunit test coverage for initramfs unpacking and cleans
up some buffer handling issues and inefficiencies"
* tag 'vfs-6.15-rc1.initramfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
MAINTAINERS: append initramfs files to the VFS section
initramfs: avoid static buffer for error message
initramfs: fix hardlink hash leak without TRAILER
initramfs: reuse name_len for dir mtime tracking
initramfs: allocate heap buffers together
initramfs: avoid memcpy for hex header fields
vsprintf: add simple_strntoul
initramfs_test: kunit tests for initramfs unpacking
init: add initramfs_internal.h
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
GCC has expanded support of the "nonstring" attribute so that it can be
applied to arrays of character arrays[1], which is needed to identify
correct static initialization of those kinds of objects. Since this was
not supported prior to GCC 15, we need to distinguish the usage of Linux's
existing __nonstring macro for the attribute for non-multi-dimensional
char arrays. Until GCC 15 is the minimum version, use __nonstring_array to
mark arrays of non-string character arrays. (Regular non-string character
arrays can continue to use __nonstring.) Once GCC 15 is the minimum
compiler version we can replace all uses of __nonstring_array with just
__nonstring and remove this macro.
This allows for changes like this:
-static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst = {
+static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst = {
ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT,
Which will silence the coming -Wunterminated-string-initialization
warnings in GCC 15:
In file included from ../include/acpi/actbl.h:371, from ../include/acpi/acpi.h:26, from ../include/linux/acpi.h:26,
from ../drivers/acpi/tables.c:19:
../include/acpi/actbl1.h:30:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization]
30 | #define ACPI_SIG_BERT "BERT" /* Boot Error Record Table */
| ^~~~~~ ../drivers/acpi/tables.c:400:9: note: in expansion of macro 'ACPI_SIG_BERT' 400 | ACPI_SIG_BERT, ACPI_SIG_BGRT, ACPI_SIG_CPEP, ACPI_SIG_ECDT,
| ^~~~~~~~~~~~~ ../include/acpi/actbl1.h:31:33: warning: initializer-string for array of 'char' truncates NUL terminator but destination lacks 'nonstring' attribute (5 chars into 4 available) [-Wunterminated-string-initialization]
31 | #define ACPI_SIG_BGRT "BGRT" /* Boot Graphics Resource Table */
| ^~~~~~
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1]
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/r/20250310214244.work.194-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The kernel cannot currently self-parse BTF containing Rust debug
information. pahole uses the language of the CU to determine whether to
filter out debug information when generating the BTF. When LTO is
enabled, Rust code can cross CU boundaries, resulting in Rust debug
information in CUs labeled as C. This results in a system which cannot
parse its own BTF.
Signed-off-by: Matthew Maurer <mmaurer@google.com>
Cc: stable@vger.kernel.org
Fixes: c1177979af9c ("btf, scripts: Exclude Rust CUs with pahole")
Link: https://lore.kernel.org/r/20250108-rust-btf-lto-incompat-v1-1-60243ff6d820@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|