| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in ib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
|
|
Pull drm updates from Dave Airlie:
"There was a rather late merge of a new color pipeline feature, that
some userspace projects are blocked on, and has seen a lot of work in
amdgpu. This should have seen some time in -next. There is additional
support for this for Intel, that if it arrives in the next day or two
I'll pass it on in another pull request and you can decide if you want
to take it.
Highlights:
- Arm Ethos NPU accelerator driver
- new DRM color pipeline support
- amdgpu will now run discrete SI/CIK cards instead of radeon, which
enables vulkan support in userspace
- msm gets gen8 gpu support
- initial Xe3P support in xe
Full detail summary:
New driver:
- Arm Ethos-U65/U85 accel driver
Core:
- support the drm color pipeline in vkms/amdgfx
- add support for drm colorop pipeline
- add COLOR PIPELINE plane property
- add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE
- throttle dirty worker with vblank
- use drm_for_each_bridge_in_chain_scoped in drm's bridge code
- Ensure drm_client_modeset tests are enabled in UML
- add simulated vblank interrupt - use in drivers
- dumb buffer sizing helper
- move freeing of drm client memory to driver
- crtc sharpness strength property
- stop using system_wq in scheduler/drivers
- support emergency restore in drm-client
Rust:
- make slice::as_flattened usable on all supported rustc
- add FromBytes::from_bytes_prefix() method
- remove redundant device ptr from Rust GEM object
- Change how AlwaysRefCounted is implemented for GEM objects
gpuvm:
- Add deferred vm_bo cleanup to GPUVM (for rust)
atomic:
- cleanup and improve state handling interfaces
buddy:
- optimize block management
dma-buf:
- heaps: Create heap per CMA reserved location
- improve userspace documentation
dp:
- add POST_LT_ADJ_REQ training sequence
- DPCD dSC quirk for synaptics panamera devices
- helpers to query branch DSC max throughput
ttm:
- Rename ttm_bo_put to ttm_bo_fini
- allow page protection flags on risc-v
- rework pipelined eviction fence handling
amdgpu:
- enable amdgpu by default for SI/CI dGPUs
- enable DC by default on SI
- refactor CIK/SI enablement
- add ABM KMS property
- Re-enable DM idle optimizations
- DC Analog encoders support
- Powerplay fixes for fiji/iceland
- Enable DC on bonaire by default
- HMM cleanup
- Add new RAS framework
- DML2.1 updates
- YCbCr420 fixes
- DC FP fixes
- DMUB fixes
- LTTPR fixes
- DTBCLK fixes
- DMU cursor offload handling
- Userq validation improvements
- Unify shutdown callback handling
- Suspend improvements
- Power limit code cleanup
- SR-IOV fixes
- AUX backlight fixes
- DCN 3.5 fixes
- HDMI compliance fixes
- DCN 4.0.1 cursor updates
- DCN interrupt fix
- DC KMS full update improvements
- Add additional HDCP traces
- DCN 3.2 fixes
- DP MST fixes
- Add support for new SR-IOV mailbox interface
- UQ reset support
- HDP flush rework
- VCE1 support
amdkfd:
- HMM cleanups
- Relax checks on save area overallocations
- Fix GPU mappings after prefetch
radeon:
- refactor CIK/SI enablement
xe:
- Initial Xe3P support
- panic support on VRAM for display
- fix stolen size check
- Loosen used tracking restriction
- New SR-IOV debugfs structure and debugfs updates
- Hide the GPU madvise flag behind a VM_BIND flag
- Always expose VRAM provisioning data on discrete GPUs
- Allow VRAM mappings for userptr when used with SVM
- Allow pinning of p2p dma-buf
- Use per-tile debugfs where appropriate
- Add documentation for Execution Queues
- PF improvements
- VF migration recovery redesign work
- User / Kernel VRAM partitioning
- Update Tile-based messages
- Allow configfs to disable specific GT types
- VF provisioning and migration improvements
- use SVM range helpers in PT layer
- Initial CRI support
- access VF registers using dedicated MMIO view
- limit number of jobs per exec queue
- add sriov_admin sysfs tree
- more crescent island specific support
- debugfs residency counter
- SRIOV migration work
- runtime registers for GFX 35
i915:
- add initial Xe3p_LPD display version 35 support
- Enable LNL+ content adaptive sharpness filter
- Use optimized VRR guardband
- Enable Xe3p LT PHY
- enable FBC support for Xe3p_LPD display
- add display 30.02 firmware support
- refactor SKL+ watermark latency setup
- refactor fbdev handling
- call i915/xe runtime PM via function pointers
- refactor i915/xe stolen memory/display interfaces
- use display version instead of gfx version in display code
- extend i915_display_info with Type-C port details
- lots of display cleanups/refactorings
- set O_LARGEFILE in __create_shmem
- skuip guc communication warning on reset
- fix time conversions
- defeature DRRS on LNL+
- refactor intel_frontbuffer split between i915/xe/display
- convert inteL_rom interfaces to struct drm_device
- unify display register polling interfaces
- aovid lock inversion when pinning to GGTT on CHV/BXT+VTD
panel:
- Add KD116N3730A08/A12, chromebook mt8189
- JT101TM023, LQ079L1SX01,
- GLD070WX3-SL01 MIPI DSI
- Samsung LTL106AL0, Samsung LTL106AL01
- Raystar RFF500F-AWH-DNN
- Winstar WF70A8SYJHLNGA
- Wanchanglong w552946aaa
- Samsung SOFEF00
- Lenovo X13s panel
- ilitek-ili9881c - add rpi 5" support
- visionx-rm69299 - add backlight support
- edp - support AUI B116XAN02.0
bridge:
- improve ref counting
- ti-sn65dsi86 - add support for DP mode with HPD
- synopsis: support CEC, init timer with correct freq
- ASL CS5263 DP-to-HDMI bridge support
nova-core:
- introduce bitfield! macro
- introduce safe integer converters
- GSP inits to fully booted state on Ampere
- Use more future-proof register for GPU identification
nova-drm:
- select NOVA_CORE
- 64-bit only
nouveau:
- improve reclocking on tegra 186+
- add large page and compression support
msm:
- GPU:
- Gen8 support: A840 (Kaanapali) and X2-85 (Glymur)
- A612 support
- MDSS:
- Added support for Glymur and QCS8300 platforms
- DPU:
- Enabled Quad-Pipe support, unlocking higher resolutions support
- Added support for Glymur platform
- Documented DPU on QCS8300 platform as supported
- DisplayPort:
- Added support for Glymur platform
- Added support lame remapping inside DP block
- Documented DisplayPort controller on QCS8300 and SM6150/QCS615
as supported
tegra:
- NVJPG driver
panfrost:
- display JM contexts over debugfs
- export JM contexts to userspace
- improve error and job handling
panthor:
- support custom ASN_HASH for mt8196
- support mali-G1 GPU
- flush shmem write before mapping buffers uncached
- make timeout per-queue instead of per-job
mediatek:
- MT8195/88 HDMIv2/DDCv2 support
rockchip:
- dsi: add support for RK3368
amdxdna:
- enhance runtime PM
- last hardware error reading uapi
- support firmware debug output
- add resource and telemetry data uapi
- preemption support
imx:
- add driver for HDMI TX Parallel audio interface
ivpu:
- add support for user-managed preemption buffer
- add userptr support
- update JSM firware API to 3.33.0
- add better alloc/free warnings
- fix page fault in unbind all bos
- rework bind/unbind of imported buffers
- enable MCA ECC signalling
- split fw runtime and global memory buffers
- add fdinfo memory statistics
tidss:
- convert to drm logging
- logging cleanup
ast:
- refactor generation init paths
- add per chip generation detect_tx_chip
- set quirks for each chip model
atmel-hlcdc:
- set LCDC_ATTRE register in plane disable
- set correct values for plane scaler
solomon:
- use drm helper for get_modes and move_valid
sitronix:
- fix output position when clearing screens
qaic:
- support dma-buf exports
- support new firmware's READ_DATA implementation
- sahara AIC200 image table update
- add sysfs support
- add coredump support
- add uevents support
- PM support
sun4i:
- layer refactors to decouple plane from output
- improve DE33 support
vc4:
- switch to generic CEC helpers
komeda:
- use drm_ logging functions
vkms:
- configfs support for display configuration
vgem:
- fix fence timer deadlock
etnaviv:
- add HWDB entry for GC8000 Nano Ultra VIP r6205"
* tag 'drm-next-2025-12-03' of https://gitlab.freedesktop.org/drm/kernel: (1869 commits)
Revert "drm/amd: Skip power ungate during suspend for VPE"
drm/amdgpu: use common defines for HUB faults
drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handling
drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handling
drm/amdgpu: use static ids for ACP platform devs
drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ protected-fence fix
drm/amdgpu: Forward VMID reservation errors
drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ring
drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
drm/amdgpu/gmc6: Cache VM fault info
drm/amdgpu/gmc6: Don't print MC client as it's unknown
drm/amdgpu/cz_ih: Enable soft IRQ handler ring
drm/amdgpu/tonga_ih: Enable soft IRQ handler ring
drm/amdgpu/iceland_ih: Enable soft IRQ handler ring
drm/amdgpu/cik_ih: Enable soft IRQ handler ring
drm/amdgpu/si_ih: Enable soft IRQ handler ring
drm/amd/display: fix typo in display_mode_core_structs.h
drm/amd/display: fix Smart Power OLED not working after S4
drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
[ Editor's note: the core of this actually came in early through a
shared branch and a few other trees - Linus ]
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: add target to build a cpio containing modules
initramfs: add gen_init_cpio to hostprogs unconditionally
kbuild: allow architectures to override CC_CAN_LINK
init: deduplicate cc-can-link.sh invocations
kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
scripts: headers_install.sh: Remove two outdated config leak ignore entries
scripts/clang-tools: Handle included .c files in gen_compile_commands
kbuild: uapi: Drop types.h check from headers_check.pl
kbuild: Rename Makefile.extrawarn to Makefile.warn
MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
kbuild: uapi: reuse KBUILD_USERCFLAGS
kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
kbuild: Use objtree for module signing key path
btrfs: send: make use of -fms-extensions for defining struct fs_path
|
|
Move KHO to kernel/liveupdate/ in preparation of placing all Live Update
core kernel related files to the same place.
[pasha.tatashin@soleen.com: disable the menu when DEFERRED_STRUCT_PAGE_INIT]
Link: https://lkml.kernel.org/r/CA+CK2bAvh9Oa2SLfsbJ8zztpEjrgr_hr-uGgF1coy8yoibT39A@mail.gmail.com
Link: https://lkml.kernel.org/r/20251101142325.1326536-8-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Pratyush Yadav <pratyush@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The generic test for CC_CAN_LINK assumes that all architectures use -m32
and -m64 to switch between 32-bit and 64-bit compilation. This is overly
simplistic. Architectures may use other flags (-mabi, -m31, etc.) or may
also require byte order handling (-mlittle-endian, -EL). Expressing all
of the different possibilities will be very complicated and brittle.
Instead allow architectures to supply their own logic which will be
easy to understand and evolve.
Both the boolean ARCH_HAS_CC_CAN_LINK and the string ARCH_USERFLAGS need
to be implemented as kconfig does not allow the reuse of string options.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-3-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
|
|
The command to invoke scripts/cc-can-link.sh is very long and new usages
are about to be added.
Add a helper variable to make the code easier to read and maintain.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251114-kbuild-userprogs-bits-v3-2-4dee0d74d439@linutronix.de
Signed-off-by: Nicolas Schier <nsc@kernel.org>
|
|
In Rust 1.80, the previously unstable `slice::flatten` family of methods
have been stabilized and renamed to `slice::as_flattened`.
This creates an issue as we want to use `as_flattened`, but need to
support the MSRV (which at the moment is Rust 1.78) where it is named
`flatten`.
Solve this by enabling the `slice_flatten` feature, and providing an
`as_flattened` implementation through an extension trait for compiler
versions where it is not available.
The trait is then exported from the prelude, making the `as_flattened`
family of methods transparently available for all supported compiler
versions.
This extension trait can be removed once the MSRV passes 1.80.
Suggested-by: Miguel Ojeda <ojeda@kernel.org>
Link: https://lore.kernel.org/all/CANiq72kK4pG=O35NwxPNoTO17oRcg1yfGcvr3==Fi4edr+sfmw@mail.gmail.com/
Acked-by: Danilo Krummrich <dakr@kernel.org>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Message-ID: <20251110-gsp_boot-v9-8-8ae4058e3c0e@nvidia.com>
Message-ID: <20251104-b4-as-flattened-v3-1-6cb9c26b45cd@nvidia.com>
|
|
The kernel cmdline length is allowed to be longer than what printk can
handle. When this happens the cmdline that's printed to the kernel ring
buffer at bootup is cutoff and some kernel cmdline options are "hidden"
from the logs. This undercuts the usefulness of the log message.
Specifically, grepping for COMMAND_LINE_SIZE shows that 2048 is common and
some architectures even define it as 4096. s390 allows a CONFIG-based
maximum up to 1MB (though it's not expected that anyone will go over the
default max of 4096 [1]).
The maximum message pr_notice() seems to be able to handle (based on
experiment) is 1021 characters. This appears to be based on the current
value of PRINTKRB_RECORD_MAX as 1024 and the fact that pr_notice() spends
2 characters on the loglevel prefix and we have a '\n' at the end.
While it would be possible to increase the limits of printk() (and
therefore pr_notice()) somewhat, it doesn't appear possible to increase it
enough to fully include a 2048-character cmdline without breaking
userspace. Specifically on at least two tested userspaces (ChromeOS plus
the Debian-based distro I'm typing this message on) the `dmesg` tool reads
lines from `/dev/kmsg` in 2047-byte chunks. As per
`Documentation/ABI/testing/dev-kmsg`:
Every read() from the opened device node receives one record
of the kernel's printk buffer.
...
Messages in the record ring buffer get overwritten as whole,
there are never partial messages received by read().
We simply can't fit a 2048-byte cmdline plus the "Kernel command line:"
prefix plus info about time/log_level/etc in a 2047-byte read.
The above means that if we want to avoid the truncation we need to do some
type of wrapping of the cmdline when printing.
Add wrapping to the printout of the kernel command line. By default, the
wrapping is set to 1021 characters to avoid breaking anyone, but allow
wrapping to be set lower by a Kconfig knob
"CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN". Any tools that are correctly parsing
the cmdline today (because it is less than 1021 characters) will see no
difference in their behavior. The format of wrapped output is designed to
be matched by anyone using "grep" to search for the cmdline and also to be
easy for tools to handle. Anyone who is sure their tools (if any) handle
the wrapped format can choose a lower wrapping value and have prettier
output.
Setting CONFIG_CMDLINE_LOG_WRAP_IDEAL_LEN to 0 fully disables the wrapping
logic. This means that long command lines will be truncated again, but
this config could be set if command lines are expected to be long and
userspace is known not to handle parsing logs with the wrapping.
Wrapping is based on spaces, ignoring quotes. All lines are prefixed with
"Kernel command line: " and lines that are not the last line have a " \"
suffix added to them. The prefix and suffix count towards the line length
for wrapping purposes. The ideal length will be exceeded if no
appropriate place to wrap is found.
The wrapping function added here is fairly generic and could be made a
library function (somewhat like print_hex_dump()) if it's needed elsewhere
in the kernel. However, having printk() directly incorporate this
wrapping would be unlikely to be a good idea since it would break
printouts into more than one record without any obvious common line prefix
to tie lines together. It would also be extra overhead when, in general,
kernel log message should simply be kept smaller than 1021 bytes. For
some discussion on this topic, see responses to the v1 posting of this
patch [2].
[akpm@linux-foundation.org: make print_kernel_cmdline __init]
[dianders@chromium.org: v4]
Link: https://lkml.kernel.org/r/20251027082204.v4.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid
Link: https://lkml.kernel.org/r/20251023113257.v3.1.I095f1e2c6c27f9f4de0b4841f725f356c643a13f@changeid
Link: https://lore.kernel.org/r/20251021131633.26700Dd6-hca@linux.ibm.com [1]
Link: https://lore.kernel.org/r/CAD=FV=VNyt1zG_8pS64wgV8VkZWiWJymnZ-XCfkrfaAhhFSKcA@mail.gmail.com [2]
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Chant <achant@google.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Francesco Valla <francesco@valla.it>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: guoweikang <guoweikang.kernel@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Hendrik Farr <kernel@jfarr.cc>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. If case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.
This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.
This results in the following improvements:
Kernel build Before After Reduction
exit to user 80692981 80514451
signal checks: 32581 121 99%
slowpath runs: 1201408 1.49% 198 0.00% 100%
fastpath runs: 675941 0.84% N/A
id updates: 1233989 1.53% 50541 0.06% 96%
cs checks: 1125366 1.39% 0 0.00% 100%
cs cleared: 1125366 100% 0 100%
cs fixup: 0 0% 0
RSEQ selftests Before After Reduction
exit to user: 386281778 387373750
signal checks: 35661203 0 100%
slowpath runs: 140542396 36.38% 100 0.00% 100%
fastpath runs: 9509789 2.51% N/A
id updates: 176203599 45.62% 9087994 2.35% 95%
cs checks: 175587856 45.46% 4728394 1.22% 98%
cs cleared: 172359544 98.16% 1319307 27.90% 99%
cs fixup: 3228312 1.84% 3409087 72.10%
The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to
user invocations, they are relative to the actual 'cs check' invocations.
While some of this could have been avoided in the original code, like the
obvious clearing of CS when it's already clear, the main problem of going
through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
notify handler is invoked more than once before going out to user
space. Doing this once when everything has stabilized is the only solution
to avoid this.
The initial attempt to completely decouple it from the TIF work turned out
to be suboptimal for workloads, which do a lot of quick and short system
calls. Even if the fast path decision is only 4 instructions (including a
conditional branch), this adds up quickly and becomes measurable when the
rate for actually having to handle rseq is in the low single digit
percentage range of user/kernel transitions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de
|
|
Config based debug is rarely turned on and is not available easily when
things go wrong.
Provide a static branch to allow permanent integration of debug mechanisms
along with the usual toggles in Kconfig, command line and debugfs.
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.089270547@linutronix.de
|
|
Analyzing the call frequency without actually using tracing is helpful for
analysis of this infrastructure. The overhead is minimal as it just
increments a per CPU counter associated to each operation.
The debugfs readout provides a racy sum of all counters.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.027916598@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Add KUnit test for the printk ring buffer
- Fix the check of the maximal record size which is allowed to be
stored into the printk ring buffer. It prevents corruptions of the
ring buffer.
Note that printk() is on the safe side. The messages are limited by
1kB buffer and are always small enough for the minimal log buffer
size 4kB, see CONFIG_LOG_BUF_SHIFT definition.
* tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: ringbuffer: Fix data block max size check
printk: kunit: support offstack cpumask
printk: kunit: Fix __counted_by() in struct prbtest_rbdata
printk: ringbuffer: Explain why the KUnit test ignores failed writes
printk: ringbuffer: Add KUnit test
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:
- Further consolidation of the VDSO infrastructure and the common data
store
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests: vDSO: Drop vdso_test_clock_getres
selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
selftests: vDSO: vdso_test_abi: Use explicit indices for name array
selftests: vDSO: vdso_test_abi: Drop clock availability tests
selftests: vDSO: vdso_test_abi: Use ksft_finished()
selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
vdso: Drop kconfig GENERIC_COMPAT_VDSO
vdso: Drop kconfig GENERIC_VDSO_32
riscv: vdso: Untangle Kconfig logic
time: Build generic update_vsyscall() only with generic time vDSO
vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
vdso: Move ENABLE_COMPAT_VDSO from core to arm64
ARM: VDSO: Remove cntvct_ok global variable
vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Refactor SCLP memory hotplug code
- Introduce common boot_panic() decompressor helper macro and use it to
get rid of nearly few identical implementations
- Take into account additional key generation flags and forward it to
the ep11 implementation. With that allow users to modify the key
generation process, e.g. provide valid combinations of XCP_BLOB_*
flags
- Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
debug facility and HMC driver
- Add DAX support for DCSS memory block devices
- Make the compiler statement attribute "assume" available with a new
__assume macro
- Rework ffs() and fls() family bitops functions, including source code
improvements and generated code optimizations. Use the newly
introduced __assume macro for that
- Enable additional network features in default configurations
- Use __GFP_ACCOUNT flag for user page table allocations to add missing
kmemcg accounting
- Add WQ_PERCPU flag to explicitly request the use of the per-CPU
workqueue for 3590 tape driver
- Switch power reading to the per-CPU and the Hiperdispatch to the
default workqueue
- Add memory allocation profiling hooks to allow better profiling data
and the /proc/allocinfo output similar to other architectures
* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
s390/mm: Add memory allocation profiling hooks
s390: Replace use of system_wq with system_dfl_wq
s390/diag324: Replace use of system_wq with system_percpu_wq
s390/tape: Add WQ_PERCPU to alloc_workqueue users
s390/bitops: Switch to generic ffs() if supported by compiler
s390/bitops: Switch to generic fls(), fls64(), etc.
s390/mm: Use __GFP_ACCOUNT for user page table allocations
s390/configs: Enable additional network features
s390/bitops: Cleanup __flogr()
s390/bitops: Use __assume() for __flogr() inline assembly return value
compiler_types: Add __assume macro
s390/bitops: Limit return value range of __flogr()
s390/dcssblk: Add DAX support
s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/pkey: Forward keygenflags to ep11_unwrapkey
s390/boot: Add common boot_panic() code
s390/bitops: Optimize inlining
s390/bitops: Slightly optimize ffs() and fls64()
s390/sclp: Move memory hotplug code for better modularity
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"One notable addition is the creation of the 'transitional' keyword for
kconfig so CONFIG renaming can go more smoothly.
This has been a long-standing deficiency, and with the renaming of
CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
support), this came up again.
The breadth of the diffstat is mainly this renaming.
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"
* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/string_choices: Add str_assert_deassert() helper
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
kconfig: Add transitional symbol attribute for migration support
kconfig: Fix BrokenPipeError warnings in selftests
gcc-plugins: Remove TODO_verify_il for GCC >= 16
stddef: Introduce __TRAILING_OVERLAP()
stddef: Remove token-pasting in TRAILING_OVERLAP()
lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add "initramfs_options" parameter to set initramfs mount options.
This allows to add specific mount options to the rootfs to e.g.,
limit the memory size
- Add RWF_NOSIGNAL flag for pwritev2()
Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
signal from being raised when writing on disconnected pipes or
sockets. The flag is handled directly by the pipe filesystem and
converted to the existing MSG_NOSIGNAL flag for sockets
- Allow to pass pid namespace as procfs mount option
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs
mount is auto-selected based on the mounting process's active
pidns, and the pidns itself is basically hidden once the mount has
been constructed)
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns
of a procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes
creates a procfs mount from an empty pidns so that user
namespaced containers can be nested (without this, the nested
containers would fail to mount procfs)
But this requires forking off a helper process because you cannot
just one-shot this using mount(2)
* Container runtimes in general need to fork into a container
before configuring its mounts, which can lead to security issues
in the case of shared-pidns containers (a privileged process in
the pidns can interact with your container runtime process)
While SUID_DUMP_DISABLE and user namespaces make this less of an
issue, the strict need for this due to a minor uAPI wart is kind
of unfortunate
Things would be much easier if there was a way for userspace to
just specify the pidns they want. So this pull request contains
changes to implement a new "pidns" argument which can be set
using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
Cleanups:
- Remove the last references to EXPORT_OP_ASYNC_LOCK
- Make file_remove_privs_flags() static
- Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
- Use try_cmpxchg() in start_dir_add()
- Use try_cmpxchg() in sb_init_done_wq()
- Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
- Remove vfs_ioctl() export
- Replace rwlock() with spinlock in epoll code as rwlock causes
priority inversion on preempt rt kernels
- Make ns_entries in fs/proc/namespaces const
- Use a switch() statement() in init_special_inode() just like we do
in may_open()
- Use struct_size() in dir_add() in the initramfs code
- Use str_plural() in rd_load_image()
- Replace strcpy() with strscpy() in find_link()
- Rename generic_delete_inode() to inode_just_drop() and
generic_drop_inode() to inode_generic_drop()
- Remove unused arguments from fcntl_{g,s}et_rw_hint()
Fixes:
- Document @name parameter for name_contains_dotdot() helper
- Fix spelling mistake
- Always return zero from replace_fd() instead of the file descriptor
number
- Limit the size for copy_file_range() in compat mode to prevent a
signed overflow
- Fix debugfs mount options not being applied
- Verify the inode mode when loading it from disk in minixfs
- Verify the inode mode when loading it from disk in cramfs
- Don't trigger automounts with RESOLVE_NO_XDEV
If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
through automounts, but could still trigger them
- Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
tracepoints
- Fix unused variable warning in rd_load_image() on s390
- Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
- Use ns_capable_noaudit() when determining net sysctl permissions
- Don't call path_put() under namespace semaphore in listmount() and
statmount()"
* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
fcntl: trim arguments
listmount: don't call path_put() under namespace semaphore
statmount: don't call path_put() under namespace semaphore
pid: use ns_capable_noaudit() when determining net sysctl permissions
fs: rename generic_delete_inode() and generic_drop_inode()
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
initramfs: Replace strcpy() with strscpy() in find_link()
initrd: Use str_plural() in rd_load_image()
initramfs: Use struct_size() helper to improve dir_add()
initrd: Fix unused variable warning in rd_load_image() on s390
fs: use the switch statement in init_special_inode()
fs/proc/namespaces: make ns_entries const
filelock: add FL_RECLAIM to show_fl_flags() macro
eventpoll: Replace rwlock with spinlock
selftests/proc: add tests for new pidns APIs
procfs: add "pidns" mount option
pidns: move is-ancestor logic to helper
openat2: don't trigger automounts with RESOLVE_NO_XDEV
namei: move cross-device check to __traverse_mounts
namei: remove LOOKUP_NO_XDEV check from handle_mounts
...
|
|
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.
And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).
This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.
Just use an empty asm statement instead.
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e2ffa15b9baa ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17")
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).
Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:
{
__label__ local_lbl;
...
unsafe_get_user(uval, uaddr, local_lbl);
...
return 0;
local_lbl:
return -EFAULT;
}
when two such scopes exist in the same function:
error: cannot jump from this asm goto statement to one of its possible targets
There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.
That issue prevents using local labels for a cleanup based user access
mechanism.
After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.
But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.
The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:
void bar(void **);
void* baz(void);
int foo (void) {
{
asm goto("jmp %l0"::::l0);
return 0;
l0:
return 1;
}
void *x __attribute__((cleanup(bar))) = baz();
{
asm goto("jmp %l0"::::l1);
return 42;
l1:
return 0xff;
}
}
Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.
That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.
Thanks to Nathan for pointing to the relevant clang issue!
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: https://github.com/llvm/llvm-project/commit/f023f5cdb2e6c19026f04a15b5a935c041835d14
|
|
Make the statement attribute "assume" with a new __assume macro available.
The assume attribute is used to indicate that a certain condition is
assumed to be true. Compilers may or may not use this indication to
generate optimized code. If this condition is violated at runtime, the
behavior is undefined.
Note that the clang documentation states that optimizers may react
differently to this attribute, and this may even have a negative
performance impact. Therefore this attribute should be used with care.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
INITRAMFS_PRESERVE_MTIME is only used in init/initramfs.c and
init/initramfs_test.c. Hence add a dependency on BLK_DEV_INITRD, to
prevent asking the user about this feature when configuring a kernel
without initramfs support.
Fixes: 1274aea127b2e8c9 ("initramfs: add INITRAMFS_PRESERVE_MTIME Kconfig option")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
- Two changes to prepare for the future Rust 1.91.0 release (expected
2025-10-30, currently in nightly): a target specification format
change and a renamed, soon-to-be-stabilized 'core' function.
* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: support Rust >= 1.91.0 target spec
rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
|
|
All architectures implementing time-related functionality in the vDSO are
using the generic vDSO library which handles time namespaces properly.
Remove the now unnecessary Kconfig symbol.
Enables the use of time namespaces on architectures, which use the
generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de
|
|
As part of the stabilization of Location::file_with_nul(), it was brought
up that the with_nul() suffix usually means something else in Rust APIs,
so the API is being renamed prior to stabilization [1].
Thus, use the new name on new rustc versions.
Link: https://www.github.com/rust-lang/rust/pull/145928 [1]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250827-file_as_c_str-v1-1-d3f5a3916a9c@google.com
[ Kept `cfg` separation. Reworded slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
After an innocuous change in -next that modified a structure that
contains __counted_by, clang-19 start crashing when building certain
files in drivers/gpu/drm/xe. When assertions are enabled, the more
descriptive failure is:
clang: clang/lib/AST/RecordLayoutBuilder.cpp:3335: const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: Assertion `D && "Cannot get layout of forward declarations!"' failed.
According to a reverse bisect, a tangential change to the LLVM IR
generation phase of clang during the LLVM 20 development cycle [1]
resolves this problem. Bump the version of clang that enables
CONFIG_CC_HAS_COUNTED_BY to 20.1.0 to ensure that this issue cannot be
hit.
Link: https://github.com/llvm/llvm-project/commit/160fb1121cdf703c3ef5e61fb26c5659eb581489 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20250807-fix-counted_by-clang-19-v1-1-902c86c1d515@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
|
|
Linus added it in 2003, it later was removed. Put it back.
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Add support for cgroup "cpu.max" interface
- Code organization cleanup so that ext_idle.c doesn't depend on the
source-file-inclusion build method of sched/
- Drop UP paths in accordance with sched core changes
- Documentation and other misc changes
* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Fix scx_bpf_reenqueue_local() reference
sched_ext: Drop kfuncs marked for removal in 6.15
sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
sched_ext: Add support for cgroup bandwidth control interface
sched_ext, sched/core: Factor out struct scx_task_group
sched_ext: Return NULL in llc_span
sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
sched_ext: Always use SMP versions in kernel/sched/ext.h
sched_ext: Always use SMP versions in kernel/sched/ext.c
sched_ext: Documentation: Clarify time slice handling in task lifecycle
sched_ext: Make scx_locked_rq() inline
sched_ext: Make scx_rq_bypassing() inline
sched_ext: idle: Make local functions static in ext_idle.c
sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"As usual, many cleanups. The below blurbiage describes 42 patchsets.
21 of those are partially or fully cleanup work. "cleans up",
"cleanup", "maintainability", "rationalizes", etc.
I never knew the MM code was so dirty.
"mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
mapped VMAs were not eligible for merging with existing adjacent
VMAs.
"mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
adds a new kernel module which simplifies the setup and usage of
DAMON in production environments.
"stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
is a cleanup to the writeback code which removes a couple of
pointers from struct writeback_control.
"drivers/base/node.c: optimization and cleanups" (Donet Tom)
contains largely uncorrelated cleanups to the NUMA node setup and
management code.
"mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
does some maintenance work on the userfaultfd code.
"Readahead tweaks for larger folios" (Ryan Roberts)
implements some tuneups for pagecache readahead when it is reading
into order>0 folios.
"selftests/mm: Tweaks to the cow test" (Mark Brown)
provides some cleanups and consistency improvements to the
selftests code.
"Optimize mremap() for large folios" (Dev Jain)
does that. A 37% reduction in execution time was measured in a
memset+mremap+munmap microbenchmark.
"Remove zero_user()" (Matthew Wilcox)
expunges zero_user() in favor of the more modern memzero_page().
"mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
addresses some warts which David noticed in the huge page code.
These were not known to be causing any issues at this time.
"mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
provides some cleanup and consolidation work in DAMON.
"use vm_flags_t consistently" (Lorenzo Stoakes)
uses vm_flags_t in places where we were inappropriately using other
types.
"mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
increases the reliability of large page allocation in the memfd
code.
"mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
removes several now-unneeded PFN_* flags.
"mm/damon: decouple sysfs from core" (SeongJae Park)
implememnts some cleanup and maintainability work in the DAMON
sysfs layer.
"madvise cleanup" (Lorenzo Stoakes)
does quite a lot of cleanup/maintenance work in the madvise() code.
"madvise anon_name cleanups" (Vlastimil Babka)
provides additional cleanups on top or Lorenzo's effort.
"Implement numa node notifier" (Oscar Salvador)
creates a standalone notifier for NUMA node memory state changes.
Previously these were lumped under the more general memory
on/offline notifier.
"Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
cleans up the pageblock isolation code and fixes a potential issue
which doesn't seem to cause any problems in practice.
"selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
adds additional drgn- and python-based DAMON selftests which are
more comprehensive than the existing selftest suite.
"Misc rework on hugetlb faulting path" (Oscar Salvador)
fixes a rather obscure deadlock in the hugetlb fault code and
follows that fix with a series of cleanups.
"cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
rationalizes and cleans up the highmem-specific code in the CMA
allocator.
"mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
provides cleanups and future-preparedness to the migration code.
"mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
adds some tracepoints to some DAMON auto-tuning code.
"mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
does that.
"mm/damon: misc cleanups" (SeongJae Park)
also does what it claims.
"mm: folio_pte_batch() improvements" (David Hildenbrand)
cleans up the large folio PTE batching code.
"mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
facilitates dynamic alteration of DAMON's inter-node allocation
policy.
"Remove unmap_and_put_page()" (Vishal Moola)
provides a couple of page->folio conversions.
"mm: per-node proactive reclaim" (Davidlohr Bueso)
implements a per-node control of proactive reclaim - beyond the
current memcg-based implementation.
"mm/damon: remove damon_callback" (SeongJae Park)
replaces the damon_callback interface with a more general and
powerful damon_call()+damos_walk() interface.
"mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
implements a number of mremap cleanups (of course) in preparation
for adding new mremap() functionality: newly permit the remapping
of multiple VMAs when the user is specifying MREMAP_FIXED. It still
excludes some specialized situations where this cannot be performed
reliably.
"drop hugetlb_free_pgd_range()" (Anthony Yznaga)
switches some sparc hugetlb code over to the generic version and
removes the thus-unneeded hugetlb_free_pgd_range().
"mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
augments the present userspace-requested update of DAMON sysfs
monitoring files. Automatic update is now provided, along with a
tunable to control the update interval.
"Some randome fixes and cleanups to swapfile" (Kemeng Shi)
does what is claims.
"mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
provides (and uses) a means by which debug-style functions can grab
a copy of a pageframe and inspect it locklessly without tripping
over the races inherent in operating on the live pageframe
directly.
"use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
addresses the large contention issues which can be triggered by
reads from that procfs file. Latencies are reduced by more than
half in some situations. The series also introduces several new
selftests for the /proc/pid/maps interface.
"__folio_split() clean up" (Zi Yan)
cleans up __folio_split()!
"Optimize mprotect() for large folios" (Dev Jain)
provides some quite large (>3x) speedups to mprotect() when dealing
with large folios.
"selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
does some cleanup work in the selftests code.
"tools/testing: expand mremap testing" (Lorenzo Stoakes)
extends the mremap() selftest in several ways, including adding
more checking of Lorenzo's recently added "permit mremap() move of
multiple VMAs" feature.
"selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
extends the DAMON sysfs interface selftest so that it tests all
possible user-requested parameters. Rather than the present minimal
subset"
* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
MAINTAINERS: add missing headers to mempory policy & migration section
MAINTAINERS: add missing file to cgroup section
MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
MAINTAINERS: add missing zsmalloc file
MAINTAINERS: add missing files to page alloc section
MAINTAINERS: add missing shrinker files
MAINTAINERS: move memremap.[ch] to hotplug section
MAINTAINERS: add missing mm_slot.h file THP section
MAINTAINERS: add missing interval_tree.c to memory mapping section
MAINTAINERS: add missing percpu-internal.h file to per-cpu section
mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
selftests/damon: introduce _common.sh to host shared function
selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
selftests/damon/sysfs.py: test non-default parameters runtime commit
selftests/damon/sysfs.py: generalize DAMON context commit assertion
selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
selftests/damon/sysfs.py: test DAMOS filters commitment
selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
selftests/damon/sysfs.py: test DAMOS destinations commitment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Better tracking of maximum lag of tasks in presence of different
slices duration, for better handling of lag in the fair scheduler
(Vincent Guittot)
- Clean up and standardize #if/#else/#endif markers throughout the
entire scheduler code base (Ingo Molnar)
- Make SMP unconditional: build the SMP scheduler's data structures
and logic on UP kernel too, even though they are not used, to
simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
blocks from the scheduler (Ingo Molnar)
- Reorganize cgroup bandwidth control interface handling for better
interfacing with sched_ext (Tejun Heo)
Balancing:
- Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
Mason)
- Remove sched_domain_topology_level::flags to simplify the code
(Prateek Nayak)
- Simplify and clean up build_sched_topology() (Li Chen)
- Optimize build_sched_topology() on large machines (Li Chen)
Real-time scheduling:
- Add initial version of proxy execution: a mechanism for
mutex-owning tasks to inherit the scheduling context of higher
priority waiters.
Currently limited to a single runqueue and conditional on
CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
Valentin Schneider)
- Deadline scheduler (Juri Lelli):
- Fix dl_servers initialization order (Juri Lelli)
- Fix DL scheduler's root domain reinitialization logic (Juri
Lelli)
- Fix accounting bugs after global limits change (Juri Lelli)
- Fix scalability regression by implementing less agressive
dl_server handling (Peter Zijlstra)
PSI:
- Improve scalability by optimizing psi_group_change() cpu_clock()
usage (Peter Zijlstra)
Rust changes:
- Make Task, CondVar and PollCondVar methods inline to avoid
unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)
- Add might_sleep() support for Rust code: Rust's "#[track_caller]"
mechanism is used so that Rust's might_sleep() doesn't need to be
defined as a macro (Fujita Tomonori)
- Introduce file_from_location() (Boqun Feng)
Debugging & instrumentation:
- Make clangd usable with scheduler source code files again (Peter
Zijlstra)
- tools: Add root_domains_dump.py which dumps root domains info (Juri
Lelli)
- tools: Add dl_bw_dump.py for printing bandwidth accounting info
(Juri Lelli)
Misc cleanups & fixes:
- Remove play_idle() (Feng Lee)
- Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
- Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
Claudio R. Goncalves)
- Correct the comment in place_entity() (wang wei)"
* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
sched/idle: Remove play_idle()
sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
sched: Start blocked_on chain processing in find_proxy_task()
sched: Fix proxy/current (push,pull)ability
sched: Add an initial sketch of the find_proxy_task() function
sched: Fix runtime accounting w/ split exec & sched contexts
sched: Move update_curr_task logic into update_curr_se
locking/mutex: Add p->blocked_on wrappers for correctness checks
locking/mutex: Rework task_struct::blocked_on
sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
sched/topology: Remove sched_domain_topology_level::flags
x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
x86/smpboot: moves x86_topology to static initialize and truncate
x86/smpboot: remove redundant CONFIG_SCHED_SMT
smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
tools/sched: Add root_domains_dump.py which dumps root domains info
sched/deadline: Fix accounting after global limits change
sched/deadline: Reset extra_bw to max_bw when clearing root domains
sched/deadline: Initialize dl_servers after SMP
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:
- Switch the reference counting to a RCU based per-CPU reference to
address a performance bottleneck vs the single instance rcuref
variant
- Make the futex selftest build on 32-bit architectures which only
support 64-bit time_t, e.g. RISCV-32
- Cleanups and improvements in selftests and futex bench
* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
perf bench futex: Remove support for IMMUTABLE
selftests/futex: Remove support for IMMUTABLE
futex: Remove support for IMMUTABLE
futex: Make futex_private_hash_get() static
futex: Use RCU-based per-CPU reference counting instead of rcuref_t
selftests/futex: Adapt the private hash test to RCU related changes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO / other driver updates from Greg KH:
"Here is the big set of char/misc/iio and other smaller driver
subsystems for 6.17-rc1. It's a big set this time around, with the
huge majority being in the iio subsystem with new drivers and dts
files being added there.
Highlights include:
- IIO driver updates, additions, and changes making more code const
and cleaning up some init logic
- bus_type constant conversion changes
- misc device test functions added
- rust miscdevice minor fixup
- unused function removals for some drivers
- mei driver updates
- mhi driver updates
- interconnect driver updates
- Android binder updates and test infrastructure added
- small cdx driver updates
- small comedi fixes
- small nvmem driver updates
- small pps driver updates
- some acrn virt driver fixes for printk messages
- other small driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
binder: Use seq_buf in binder_alloc kunit tests
binder: Add copyright notice to new kunit files
misc: ti_fpc202: Switch to of_fwnode_handle()
bus: moxtet: Use dev_fwnode()
pc104: move PC104 option to drivers/Kconfig
drivers: virt: acrn: Don't use %pK through printk
comedi: fix race between polling and detaching
interconnect: qcom: Add Milos interconnect provider driver
dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
mei: more prints with client prefix
mei: bus: use cldev in prints
bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
bus: mhi: host: Detect events pointing to unexpected TREs
bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
bus: mhi: host: Use str_true_false() helper
bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
bus: mhi: host: Fix endianness of BHI vector table
bus: mhi: host: pci_generic: Disable runtime PM for QDU100
bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
...
|
|
Pull io_uring updates from Jens Axboe:
- Optimization to avoid reference counts on non-cloned registered
buffers. This is how these buffers were handled prior to having
cloning support, and we can still use that approach as long as the
buffers haven't been cloned to another ring.
- Cleanup and improvement for uring_cmd, where btrfs was the only user
of storing allocated data for the lifetime of the uring_cmd. Clean
that up so we can get rid of the need to do that.
- Avoid unnecessary memory copies in uring_cmd usage. This is
particularly important as a lot of uring_cmd usage necessitates the
use of 128b SQEs.
- A few updates for recv multishot, where it's now possible to add
fairness limits for limiting how much is transferred for each retry
loop. Additionally, recv multishot now supports an overall cap as
well, where once reached the multishot recv will terminate. The
latter is useful for buffer management and juggling many recv streams
at the same time.
- Add support for returning the TX timestamps via a new socket command.
This feature can work in either singleshot or multishot mode, where
the latter triggers a completion whenever new timestamps are
available. This is an alternative to using the existing error queue.
- Add support for an io_uring "mock" file, which is the start of being
able to do 100% targeted testing in terms of exercising io_uring
request handling. The idea is to have a file type that can be
anything the tester would like, and behave exactly how you want it to
behave in terms of hitting the code paths you want.
- Improve zcrx by using sgtables to de-duplicate and improve dma
address handling.
- Prep work for supporting larger pages for zcrx.
- Various little improvements and fixes.
* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
io_uring/zcrx: fix leaking pages on sg init fail
io_uring/zcrx: don't leak pages on account failure
io_uring/zcrx: fix null ifq on area destruction
io_uring: fix breakage in EXPERT menu
io_uring/cmd: remove struct io_uring_cmd_data
btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
io_uring/zcrx: account area memory
io_uring: export io_[un]account_mem
io_uring/net: Support multishot receive len cap
io_uring: deduplicate wakeup handling
io_uring/net: cast min_not_zero() type
io_uring/poll: cleanup apoll freeing
io_uring/net: allow multishot receive per-invocation cap
io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
io_uring/net: use passed in 'len' in io_recv_buf_select()
io_uring/zcrx: prepare fallback for larger pages
io_uring/zcrx: assert area type in io_zcrx_iov_page
io_uring/zcrx: allocate sgtable for umem areas
io_uring/zcrx: introduce io_populate_area_dma
...
|
|
Put the PC104 kconfig option in drivers/Kconfig along with
other buses (AMBA, EISA, PCI, CXL, PCCard, & RapidIO).
This localizes PC104 with option bus kconfig options to make
it easier to find.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: William Breathitt Gray <wbg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250722235431.3671754-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a dependency for IO_URING for the GCOV_PROFILE_URING symbol.
Without this patch the EXPERT config menu ends with
"Enable IO uring support" and the menu prompts for
GCOV_PROFILE_URING and IO_URING_MOCK_FILE are not subordinate to it.
This causes all of the EXPERT Kconfig options that follow
GCOV_PROFILE_URING to be display in the "upper" menu (General setup),
just following the EXPERT menu.
Fixes: 1802656ef890 ("io_uring: add GCOV_PROFILE_URING Kconfig option")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: io-uring@vger.kernel.org
Link: https://lore.kernel.org/r/20250720010456.2945344-1-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The vmstat_text array contains labels for counters displayed in
/proc/vmstat. It is important to keep the labels in sync with the
counters.
There is a BUILD_BUG_ON() check in vmstat_start() that ensures the size of
the vmstat_text is not smaller than VM_EVENT_COUNTERS. This helps to
catch cases where a new counter is added but the label is not. However,
it does not help if a counter is removed but the label remains.
It would be nice to make the BUILD_BUG_ON() check more strict to catch
such cases. However, when compiling with MEMCG enabled but
VM_EVENT_COUNTERS disabled, the vmstat_text array is larger than
NR_VMSTAT_ITEMS.
This issue arises because some elements of the vmstat_text array are
present when either MEMCG or VM_EVENT_COUNTERS is enabled, but
NR_VMSTAT_ITEMS only accounts for these elements if VM_EVENT_COUNTERS is
enabled.
Instead of adjusting the NR_VMSTAT_ITEMS definition to account for MEMCG,
make MEMCG select VM_EVENT_COUNTERS. VM_EVENT_COUNTERS is enabled in most
configurations anyway.
Link: https://lkml.kernel.org/r/20250604095111.533783-1-kirill.shutemov@linux.intel.com
Fixes: ebc5d83d0443 ("mm/memcontrol: use vmstat names for printing statistics")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.
Also uses this option to allow the rq->donor to be different from
rq->curr.
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
|
|
Avoid merge conflicts
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
The use of rcuref_t for reference counting introduces a performance bottleneck
when accessed concurrently by multiple threads during futex operations.
Replace rcuref_t with special crafted per-CPU reference counters. The
lifetime logic remains the same.
The newly allocate private hash starts in FR_PERCPU state. In this state, each
futex operation that requires the private hash uses a per-CPU counter (an
unsigned int) for incrementing or decrementing the reference count.
When the private hash is about to be replaced, the per-CPU counters are
migrated to a atomic_t counter mm_struct::futex_atomic.
The migration process:
- Waiting for one RCU grace period to ensure all users observe the
current private hash. This can be skipped if a grace period elapsed
since the private hash was assigned.
- futex_private_hash::state is set to FR_ATOMIC, forcing all users to
use mm_struct::futex_atomic for reference counting.
- After a RCU grace period, all users are guaranteed to be using the
atomic counter. The per-CPU counters can now be summed up and added to
the atomic_t counter. If the resulting count is zero, the hash can be
safely replaced. Otherwise, active users still hold a valid reference.
- Once the atomic reference count drops to zero, the next futex
operation will switch to the new private hash.
call_rcu_hurry() is used to speed up transition which otherwise might be
delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
side effects would be that on auto scaling the new hash is used later
and the SET_SLOTS prctl() will block longer.
[bigeasy: commit description + mm get/ put_async]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
|
|
io_uring commands provide an ioctl style interface for files to
implement file specific operations. io_uring provides many features and
advanced api to commands, and it's getting hard to test as it requires
specific files/devices.
Add basic infrastucture for creating special mock files that will be
implementing the cmd api and using various io_uring features we want to
test. It'll also be useful to test some more obscure read/write/polling
edge cases in the future.
Suggested-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93f21b0af58c1367a2b22635d5a7d694ad0272fc.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Chris Mason reported a performance regression on big iron. Reports of
this kind were usually reported as part of a micro benchmark but Chris'
test did mimic his real workload. This makes it a real regression.
The root cause is rcuref_get() which is invoked during each futex
operation. If all threads of an application do this simultaneously then
it leads to cache line bouncing and the performance drops.
Disable FUTEX_PRIVATE_HASH entirely for this cycle. The performance
regression will be addressed in the following cycle enabling the option
again.
Closes: https://lore.kernel.org/all/3ad05298-351e-4d61-9972-ca45a0a50e33@meta.com/
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250630145034.8JnINEaS@linutronix.de
|
|
Most of kernel debugging facilities take a nul-terminated string for
file names for a callsite (generated from __FILE__), however the Rust
courterpart, Location, would return a Rust string (not nul-terminated)
from method .file(). And such a string cannot be passed to C debugging
function directly.
There is ongoing work to support a Location::file_with_nul() [1], which
returns a nul-terminated string from a Location. Since it's still
working in progress, and it will take some time before the feature
finally gets stabilized and the kernel's minimal rustc version might
also take a while to bump to a version that at least has that feature,
introduce a file_from_location() function, which returns a warning
string if Location::file_with_nul() is not available.
This should work in most cases because as for now the known usage of
Location::file_with_nul() is only in debugging code (e.g. might_sleep())
and there might be other information reported by the debugging code that
could help locate the problematic function, so missing the file name is
fine at the moment.
Link: https://github.com/rust-lang/rust/issues/141727 [1]
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250619151007.61767-2-boqun.feng@gmail.com
|
|
From 077814f57f8acce13f91dc34bbd2b7e4911fbf25 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 13 Jun 2025 15:06:47 -1000
- Add CONFIG_GROUP_SCHED_BANDWIDTH which is selected by both
CONFIG_CFS_BANDWIDTH and EXT_GROUP_SCHED.
- Put bandwidth control interface files for both cgroup v1 and v2 under
CONFIG_GROUP_SCHED_BANDWIDTH.
- Update tg_bandwidth() to fetch configuration parameters from fair if
CONFIG_CFS_BANDWIDTH, SCX otherwise.
- Update tg_set_bandwidth() to update the parameters for both fair and SCX.
- Add bandwidth control parameters to struct scx_cgroup_init_args.
- Add sched_ext_ops.cgroup_set_bandwidth() which is invoked on bandwidth
control parameter updates.
- Update scx_qmap and maximal selftest to test the new feature.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The KUnit test validates the correct operation of the ringbuffer.
A separate dedicated ringbuffer is used so that the global printk
ringbuffer is not touched.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20250612-printk-ringbuffer-test-v3-1-550c088ee368@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- KUnit '#[test]'s:
- Support KUnit-mapped 'assert!' macros.
The support that landed last cycle was very basic, and the
'assert!' macros panicked since they were the standard library
ones. Now, they are mapped to the KUnit ones in a similar way to
how is done for doctests, reusing the infrastructure there.
With this, a failing test like:
#[test]
fn my_first_test() {
assert_eq!(42, 43);
}
will report:
# my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251
Expected 42 == 43 to be true, but is false
# my_first_test.speed: normal
not ok 1 my_first_test
- Support tests with checked 'Result' return types.
The return value of test functions that return a 'Result' will
be checked, thus one can now easily catch errors when e.g. using
the '?' operator in tests.
With this, a failing test like:
#[test]
fn my_test() -> Result {
f()?;
Ok(())
}
will report:
# my_test: ASSERTION FAILED at rust/kernel/lib.rs:321
Expected is_test_result_ok(my_test()) to be true, but is false
# my_test.speed: normal
not ok 1 my_test
- Add 'kunit_tests' to the prelude.
- Clarify the remaining language unstable features in use.
- Compile 'core' with edition 2024 for Rust >= 1.87.
- Workaround 'bindgen' issue with forward references to 'enum' types.
- objtool: relax slice condition to cover more 'noreturn' functions.
- Use absolute paths in macros referencing 'core' and 'kernel'
crates.
- Skip '-mno-fdpic' flag for bindgen in GCC 32-bit arm builds.
- Clean some 'doc_markdown' lint hits -- we may enable it later on.
'kernel' crate:
- 'alloc' module:
- 'Box': support for type coercion, e.g. 'Box<T>' to 'Box<dyn U>'
if 'T' implements 'U'.
- 'Vec': implement new methods (prerequisites for nova-core and
binder): 'truncate', 'resize', 'clear', 'pop',
'push_within_capacity' (with new error type 'PushError'),
'drain_all', 'retain', 'remove' (with new error type
'RemoveError'), insert_within_capacity' (with new error type
'InsertError').
In addition, simplify 'push' using 'spare_capacity_mut', split
'set_len' into 'inc_len' and 'dec_len', add type invariant 'len
<= capacity' and simplify 'truncate' using 'dec_len'.
- 'time' module:
- Morph the Rust hrtimer subsystem into the Rust timekeeping
subsystem, covering delay, sleep, timekeeping, timers. This new
subsystem has all the relevant timekeeping C maintainers listed
in the entry.
- Replace 'Ktime' with 'Delta' and 'Instant' types to represent a
duration of time and a point in time.
- Temporarily add 'Ktime' to 'hrtimer' module to allow 'hrtimer'
to delay converting to 'Instant' and 'Delta'.
- 'xarray' module:
- Add a Rust abstraction for the 'xarray' data structure. This
abstraction allows Rust code to leverage the 'xarray' to store
types that implement 'ForeignOwnable'. This support is a
dependency for memory backing feature of the Rust null block
driver, which is waiting to be merged.
- Set up an entry in 'MAINTAINERS' for the XArray Rust support.
Patches will go to the new Rust XArray tree and then via the
Rust subsystem tree for now.
- Allow 'ForeignOwnable' to carry information about the pointed-to
type. This helps asserting alignment requirements for the
pointer passed to the foreign language.
- 'container_of!': retain pointer mut-ness and add a compile-time
check of the type of the first parameter ('$field_ptr').
- Support optional message in 'static_assert!'.
- Add C FFI types (e.g. 'c_int') to the prelude.
- 'str' module: simplify KUnit tests 'format!' macro, convert
'rusttest' tests into KUnit, take advantage of the '-> Result'
support in KUnit '#[test]'s.
- 'list' module: add examples for 'List', fix path of
'assert_pinned!' (so far unused macro rule).
- 'workqueue' module: remove 'HasWork::OFFSET'.
- 'page' module: add 'inline' attribute.
'macros' crate:
- 'module' macro: place 'cleanup_module()' in '.exit.text' section.
'pin-init' crate:
- Add 'Wrapper<T>' trait for creating pin-initializers for wrapper
structs with a structurally pinned value such as 'UnsafeCell<T>' or
'MaybeUninit<T>'.
- Add 'MaybeZeroable' derive macro to try to derive 'Zeroable', but
not error if not all fields implement it. This is needed to derive
'Zeroable' for all bindgen-generated structs.
- Add 'unsafe fn cast_[pin_]init()' functions to unsafely change the
initialized type of an initializer. These are utilized by the
'Wrapper<T>' implementations.
- Add support for visibility in 'Zeroable' derive macro.
- Add support for 'union's in 'Zeroable' derive macro.
- Upstream dev news: streamline CI, fix some bugs. Add new workflows
to check if the user-space version and the one in the kernel tree
have diverged. Use the issues tab [1] to track them, which should
help folks report and diagnose issues w.r.t. 'pin-init' better.
[1] https://github.com/rust-for-linux/pin-init/issues
Documentation:
- Testing: add docs on the new KUnit '#[test]' tests.
- Coding guidelines: explain that '///' vs. '//' applies to private
items too. Add section on C FFI types.
- Quick Start guide: update Ubuntu instructions and split them into
"25.04" and "24.04 LTS and older".
And a few other cleanups and improvements"
* tag 'rust-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (78 commits)
rust: list: Fix typo `much` in arc.rs
rust: check type of `$ptr` in `container_of!`
rust: workqueue: remove HasWork::OFFSET
rust: retain pointer mut-ness in `container_of!`
Documentation: rust: testing: add docs on the new KUnit `#[test]` tests
Documentation: rust: rename `#[test]`s to "`rusttest` host tests"
rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s
rust: str: simplify KUnit tests `format!` macro
rust: str: convert `rusttest` tests into KUnit
rust: add `kunit_tests` to the prelude
rust: kunit: support checked `-> Result`s in KUnit `#[test]`s
rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s
rust: make section names plural
rust: list: fix path of `assert_pinned!`
rust: compile libcore with edition 2024 for 1.87+
rust: dma: add missing Markdown code span
rust: task: add missing Markdown code spans and intra-doc links
rust: pci: fix docs related to missing Markdown code spans
rust: alloc: add missing Markdown code span
rust: alloc: add missing Markdown code spans
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- "zram: support algorithm-specific parameters" from Sergey Senozhatsky
adds infrastructure for passing algorithm-specific parameters into
zram. A single parameter `winbits' is implemented at this time.
- "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
charging nmi-safe, which is required by BFP, which can operate in NMI
context.
- "Some random fixes and cleanup to shmem" from Kemeng Shi implements
small fixes and cleanups in the shmem code.
- "Skip mm selftests instead when kernel features are not present" from
Zi Yan fixes some issues in the MM selftest code.
- "mm/damon: build-enable essential DAMON components by default" from
SeongJae Park reworks DAMON Kconfig to make it easier to enable
CONFIG_DAMON.
- "sched/numa: add statistics of numa balance task migration" from Libo
Chen adds more info into sysfs and procfs files to improve visibility
into the NUMA balancer's task migration activity.
- "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
provides various updates to some of the MM selftests to make them
play better with the overall containing framework.
* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
mm/khugepaged: clean up refcount check using folio_expected_ref_count()
selftests/mm: fix test result reporting in gup_longterm
selftests/mm: report unique test names for each cow test
selftests/mm: add helper for logging test start and results
selftests/mm: use standard ksft_finished() in cow and gup_longterm
selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
sched/numa: add statistics of numa balance task
sched/numa: fix task swap by skipping kernel threads
tools/testing: check correct variable in open_procmap()
tools/testing/vma: add missing function stub
mm/gup: update comment explaining why gup_fast() disables IRQs
selftests/mm: two fixes for the pfnmap test
mm/khugepaged: fix race with folio split/free using temporary reference
mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
mmu_notifiers: remove leftover stub macros
selftests/mm: deduplicate test names in madv_populate
kcov: rust: add flags for KCOV with Rust
mm: rust: make CONFIG_MMU ifdefs more narrow
mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
mm/damon/Kconfig: enable CONFIG_DAMON by default
...
|
|
There are archs which have NMI but does not support this_cpu_* ops safely
in the nmi context but they support safe atomic ops in nmi context. For
such archs, let's add infra to use atomic ops for the memcg stats which
can be updated in nmi.
At the moment, the memcg stats which get updated in the objcg charging
path are MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B.
Rather than adding support for all memcg stats to be nmi safe, let's just
add infra to make these three stats nmi safe which this patch is doing.
Link: https://lkml.kernel.org/r/20250519063142.111219-3-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "memcg: nmi-safe kmem charging", v4.
Users can attached their BPF programs at arbitrary execution points in the
kernel and such BPF programs may run in nmi context. In addition, these
programs can trigger memcg charged kernel allocations in the nmi context.
However memcg charging infra for kernel memory is not equipped to handle
nmi context for all architectures.
This series removes the hurdles to enable kmem charging in the nmi context
for most of the archs. For archs without CONFIG_HAVE_NMI, this series is
a noop. For archs with NMI support and have
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS, the previous work to make memcg
stats re-entrant is sufficient for allowing kmem charging in nmi context.
For archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and with ARCH_HAVE_NMI_SAFE_CMPXCHG,
this series added infra to support kmem charging in nmi context. Lastly
those archs with NMI support but without
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and ARCH_HAVE_NMI_SAFE_CMPXCHG, kmem
charging in nmi context is not supported at all.
Mostly used archs have support for CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
and this series should be almost a noop (other than making
memcg_rstat_updated nmi safe) for such archs.
This patch (of 5):
The memcg accounting and stats uses this_cpu* and atomic* ops. There are
archs which define CONFIG_HAVE_NMI but does not define
CONFIG_ARCH_HAS_NMI_SAFE_THIS_CPU_OPS and ARCH_HAVE_NMI_SAFE_CMPXCHG, so
memcg accounting for such archs in nmi context is not possible to support.
Let's just disable memcg accounting in nmi context for such archs.
Link: https://lkml.kernel.org/r/20250519063142.111219-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20250519063142.111219-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Update overflow helpers to ease refactoring of on-stack flex array
instances (Gustavo A. R. Silva, Kees Cook)
- lkdtm: Use SLAB_NO_MERGE instead of constructors (Harry Yoo)
- Simplify CONFIG_CC_HAS_COUNTED_BY (Jan Hendrik Farr)
- Disable u64 usercopy KUnit test on 32-bit SPARC (Thomas Weißschuh)
- Add missed designated initializers now exposed by fixed randstruct
(Nathan Chancellor, Kees Cook)
- Document compilers versions for __builtin_dynamic_object_size
- Remove ARM_SSP_PER_TASK GCC plugin
- Fix GCC plugin randstruct, add selftests, and restore COMPILE_TEST
builds
- Kbuild: induce full rebuilds when dependencies change with GCC
plugins, the Clang sanitizer .scl file, or the randstruct seed.
- Kbuild: Switch from -Wvla to -Wvla-larger-than=1
- Correct several __nonstring uses for -Wunterminated-string-initialization
* tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
Revert "hardening: Disable GCC randstruct for COMPILE_TEST"
lib/tests: randstruct: Add deep function pointer layout test
lib/tests: Add randstruct KUnit test
randstruct: gcc-plugin: Remove bogus void member
net: qede: Initialize qede_ll_ops with designated initializer
scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops
md/bcache: Mark __nonstring look-up table
integer-wrap: Force full rebuild when .scl file changes
randstruct: Force full rebuild when seed changes
gcc-plugins: Force full rebuild when plugins change
kbuild: Switch from -Wvla to -Wvla-larger-than=1
hardening: simplify CONFIG_CC_HAS_COUNTED_BY
overflow: Fix direct struct member initialization in _DEFINE_FLEX()
kunit/overflow: Add tests for STACK_FLEX_ARRAY_SIZE() helper
overflow: Add STACK_FLEX_ARRAY_SIZE() helper
input/joystick: magellan: Mark __nonstring look-up table const
watchdog: exar: Shorten identity name to fit correctly
mod_devicetable: Enlarge the maximum platform_device_id name length
overflow: Clarify expectations for getting DEFINE_FLEX variable sizes
compiler_types: Identify compiler versions for __builtin_dynamic_object_size
...
|