Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core & fair scheduler changes:
- Cancel the slice protection of the idle entity (Zihan Zhou)
- Reduce the default slice to avoid tasks getting an extra tick
(Zihan Zhou)
- Force propagating min_slice of cfs_rq when {en,de}queue tasks
(Tianchen Ding)
- Refactor can_migrate_task() to elimate looping (I Hsin Cheng)
- Add unlikey branch hints to several system calls (Colin Ian King)
- Optimize current_clr_polling() on certain architectures (Yujun
Dong)
Deadline scheduler: (Juri Lelli)
- Remove redundant dl_clear_root_domain call
- Move dl_rebuild_rd_accounting to cpuset.h
Uclamp:
- Use the uclamp_is_used() helper instead of open-coding it (Xuewen
Yan)
- Optimize sched_uclamp_used static key enabling (Xuewen Yan)
Scheduler topology support: (Juri Lelli)
- Ignore special tasks when rebuilding domains
- Add wrappers for sched_domains_mutex
- Generalize unique visiting of root domains
- Rebuild root domain accounting after every update
- Remove partition_and_rebuild_sched_domains
- Stop exposing partition_sched_domains_locked
RSEQ: (Michael Jeanson)
- Update kernel fields in lockstep with CONFIG_DEBUG_RSEQ=y
- Fix segfault on registration when rseq_cs is non-zero
- selftests: Add rseq syscall errors test
- selftests: Ensure the rseq ABI TLS is actually 1024 bytes
Membarriers:
- Fix redundant load of membarrier_state (Nysal Jan K.A.)
Scheduler debugging:
- Introduce and use preempt_model_str() (Sebastian Andrzej Siewior)
- Make CONFIG_SCHED_DEBUG unconditional (Ingo Molnar)
Fixes and cleanups:
- Always save/restore x86 TSC sched_clock() on suspend/resume
(Guilherme G. Piccoli)
- Misc fixes and cleanups (Thorsten Blum, Juri Lelli, Sebastian
Andrzej Siewior)"
* tag 'sched-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
cpuidle, sched: Use smp_mb__after_atomic() in current_clr_polling()
sched/debug: Remove CONFIG_SCHED_DEBUG
sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
sched/debug: Make 'const_debug' tunables unconditional __read_mostly
sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
rseq/selftests: Fix namespace collision with rseq UAPI header
include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
sched/topology: Stop exposing partition_sched_domains_locked
cgroup/cpuset: Remove partition_and_rebuild_sched_domains
sched/topology: Remove redundant dl_clear_root_domain call
sched/deadline: Rebuild root domain accounting after every update
sched/deadline: Generalize unique visiting of root domains
sched/topology: Wrappers for sched_domains_mutex
sched/deadline: Ignore special tasks when rebuilding domains
tracing: Use preempt_model_str()
xtensa: Rely on generic printing of preemption model
x86: Rely on generic printing of preemption model
s390: Rely on generic printing of preemption model
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Boqun Feng:
"Documentation:
- Add broken-timing possibility to stallwarn.rst
- Improve discussion of this_cpu_ptr(), add raw_cpu_ptr()
- Document self-propagating callbacks
- Point call_srcu() to call_rcu() for detailed memory ordering
- Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
- Clarify RCU_LAZY and RCU_LAZY_DEFAULT_OFF help text
- Remove references to old grace-period-wait primitives
srcu:
- Introduce srcu_read_{un,}lock_fast(), which is similar to
srcu_read_{un,}lock_lite(): avoid smp_mb()s in lock and unlock
at the cost of calling synchronize_rcu() in synchronize_srcu()
Moreover, by returning the percpu offset of the counter at
srcu_read_lock_fast() time, srcu_read_unlock_fast() can avoid
extra pointer dereferencing, which makes it faster than
srcu_read_{un,}lock_lite()
srcu_read_{un,}lock_fast() are intended to replace
rcu_read_{un,}lock_trace() if possible
RCU torture:
- Add get_torture_init_jiffies() to return the start time of the test
- Add a test_boost_holdoff module parameter to allow delaying
boosting tests when building rcutorture as built-in
- Add grace period sequence number logging at the beginning and end
of failure/close-call results
- Switch to hexadecimal for the expedited grace period sequence
number in the rcu_exp_grace_period trace point
- Make cur_ops->format_gp_seqs take buffer length
- Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
- Complain when invalid SRCU reader_flavor is specified
- Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing, which forces SRCU
uses atomics even when percpu ops are NMI safe, and use the Kconfig
for SRCU lockdep testing
Misc:
- Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
- Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
- Fix get_state_synchronize_rcu_full() GP-start detection
- Move RCU Tasks self-tests to core_initcall()
- Print segment lengths in show_rcu_nocb_gp_state()
- Make RCU watch ct_kernel_exit_state() warning
- Flush console log from kernel_power_off()
- rcutorture: Allow a negative value for nfakewriters
- rcu: Update TREE05.boot to test normal synchronize_rcu()
- rcu: Use _full() API to debug synchronize_rcu()
Make RCU handle PREEMPT_LAZY better:
- Fix header guard for rcu_all_qs()
- rcu: Rename PREEMPT_AUTO to PREEMPT_LAZY
- Update __cond_resched comment about RCU quiescent states
- Handle unstable rdp in rcu_read_unlock_strict()
- Handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
- osnoise: Provide quiescent states
- Adjust rcutorture with possible PREEMPT_RCU=n && PREEMPT_COUNT=y
combination
- Limit PREEMPT_RCU configurations
- Make rcutorture senario TREE07 and senario TREE10 use
PREEMPT_LAZY=y"
* tag 'rcu-next-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (59 commits)
rcutorture: Make scenario TREE07 build CONFIG_PREEMPT_LAZY=y
rcutorture: Make scenario TREE10 build CONFIG_PREEMPT_LAZY=y
rcu: limit PREEMPT_RCU configurations
rcutorture: Update ->extendables check for lazy preemption
rcutorture: Update rcutorture_one_extend_check() for lazy preemption
osnoise: provide quiescent states
rcu: Use _full() API to debug synchronize_rcu()
rcu: Update TREE05.boot to test normal synchronize_rcu()
rcutorture: Allow a negative value for nfakewriters
Flush console log from kernel_power_off()
context_tracking: Make RCU watch ct_kernel_exit_state() warning
rcu/nocb: Print segment lengths in show_rcu_nocb_gp_state()
rcu-tasks: Move RCU Tasks self-tests to core_initcall()
rcu: Fix get_state_synchronize_rcu_full() GP-start detection
torture: Make SRCU lockdep testing use srcu_read_lock_nmisafe()
srcu: Add FORCE_NEED_SRCU_NMI_SAFE Kconfig for testing
rcutorture: Complain when invalid SRCU reader_flavor is specified
rcutorture: Move RCU_TORTURE_TEST_{CHK_RDR_STATE,LOG_CPU} to bool
rcutorture: Make cur_ops->format_gp_seqs take buffer length
rcutorture: Add ftrace-compatible timestamp to GP# failure/close-call output
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- 32bit s390 support
- opendir() and friends
- openat() support
- sscanf() support
- various cleanups
[ Paul has just forwarded the pull request from Thomas Weißschuh, so
the tag signature is from Thomas, not Paul - Linus ]
* tag 'nolibc-20250308-for-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (26 commits)
tools/nolibc: don't use asm/ UAPI headers
selftests/nolibc: stop testing constructor order
selftests/nolibc: use O_RDONLY flag instead of 0
tools/nolibc: drop outdated example from overview comment
tools/nolibc: process open() vararg as mode_t
tools/nolibc: always use openat(2) instead of open(2)
tools/nolibc: add support for openat(2)
selftests/nolibc: add armthumb configuration
selftests/nolibc: explicitly enable ARM mode
Revert "selftests: kselftest: Fix build failure with NOLIBC"
tools/nolibc: add support for [v]sscanf()
tools/nolibc: add support for 32-bit s390
selftests/nolibc: rename s390 to s390x
selftests/nolibc: only run constructor tests on nolibc
selftests/nolibc: split up architecture list in run-tests.sh
tools/nolibc: add support for directory access
tools/nolibc: add support for sys_llseek()
selftests/nolibc: always keep test kernel configuration up to date
selftests/nolibc: execute defconfig before other targets
selftests/nolibc: drop call to mrproper target
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Add mechanism to count and report internal events. This significantly
improves visibility on subtle corner conditions.
- The default idle CPU selection logic is revamped and improved in
multiple ways including being made topology aware.
- sched_ext was disabling ttwu_queue for simplicity, which can be
costly when hardware topology is more complex. Implement
SCX_OPS_ALLOWED_QUEUED_WAKEUP so that BPF schedulers can selectively
enable ttwu_queue.
- tools/sched_ext updates to improve compatibility among others.
- Other misc updates and fixes.
- sched_ext/for-6.14-fixes were pulled a few times to receive
prerequisite fixes and resolve conflicts.
* tag 'sched_ext-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (42 commits)
sched_ext: idle: Refactor scx_select_cpu_dfl()
sched_ext: idle: Honor idle flags in the built-in idle selection policy
sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()
sched_ext: Add trace point to track sched_ext core events
sched_ext: Change the event type from u64 to s64
sched_ext: Documentation: add task lifecycle summary
tools/sched_ext: Provide a compatible helper for scx_bpf_events()
selftests/sched_ext: Add NUMA-aware scheduler test
tools/sched_ext: Provide consistent access to scx flags
sched_ext: idle: Fix scx_bpf_pick_any_cpu_node() behavior
sched_ext: idle: Introduce scx_bpf_nr_node_ids()
sched_ext: idle: Introduce node-aware idle cpu kfunc helpers
sched_ext: idle: Per-node idle cpumasks
sched_ext: idle: Introduce SCX_OPS_BUILTIN_IDLE_PER_NODE
sched_ext: idle: Make idle static keys private
sched/topology: Introduce for_each_node_numadist() iterator
mm/numa: Introduce nearest_node_nodemask()
nodemask: numa: reorganize inclusion path
nodemask: add nodes_copy()
tools/sched_ext: Sync with scx repo
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
- avoid the lock trip seccomp_filter_release in common case (Mateusz
Guzik)
- remove unused 'sd' argument through-out (Oleg Nesterov)
- selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
* tag 'seccomp-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: avoid the lock trip seccomp_filter_release in common case
seccomp: remove the 'sd' argument from __seccomp_filter()
seccomp: remove the 'sd' argument from __secure_computing()
seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER
seccomp/mips: change syscall_trace_enter() to use secure_computing()
selftests/seccomp: Add hard-coded __NR_uretprobe for x86_64
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull lib kunit selftest move from Kees Cook:
"This is a one-off tree to coordinate the move of selftests out of lib/
and into lib/tests/. A separate tree was used for this to keep the
paths sane with all the work in the same place.
- move lib/ selftests into lib/tests/ (Kees Cook, Gabriela
Bittencourt, Luis Felipe Hernandez, Lukas Bulwahn, Tamir
Duberstein)
- lib/math: Add int_log test suite (Bruno Sobreira França)
- lib/math: Add Kunit test suite for gcd() (Yu-Chun Lin)
- lib/tests/kfifo_kunit.c: add tests for the kfifo structure (Diego
Vieira)
- unicode: refactor selftests into KUnit (Gabriela Bittencourt)
- lib/prime_numbers: convert self-test to KUnit (Tamir Duberstein)
- printf: convert self-test to KUnit (Tamir Duberstein)
- scanf: convert self-test to KUnit (Tamir Duberstein)"
* tag 'move-lib-kunit-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (21 commits)
scanf: break kunit into test cases
scanf: convert self-test to KUnit
scanf: remove redundant debug logs
scanf: implicate test line in failure messages
printf: implicate test line in failure messages
printf: break kunit into test cases
printf: convert self-test to KUnit
kunit/fortify: Replace "volatile" with OPTIMIZER_HIDE_VAR()
kunit/fortify: Expand testing of __compiletime_strlen()
kunit/stackinit: Use fill byte different from Clang i386 pattern
kunit/overflow: Fix DEFINE_FLEX tests for counted_by
selftests: remove reference to prime_numbers.sh
MAINTAINERS: adjust entries in FORTIFY_SOURCE and KERNEL HARDENING
lib/prime_numbers: convert self-test to KUnit
lib/math: Add Kunit test suite for gcd()
unicode: kunit: change tests filename and path
unicode: kunit: refactor selftest to kunit tests
lib/tests/kfifo_kunit.c: add tests for the kfifo structure
lib: Move KUnit tests into tests/ subdirectory
lib/math: Add int_log test suite
...
|
|
Extend flood test to configure FDB entry with unresolved destination IP,
check that packets are not sent twice.
Without the previous patch which handles such scenario in mlxsw, the
tests fail:
$ TESTS='test_flood' ./vxlan_bridge_1d.sh
Running tests with UDP port 4789
TEST: VXLAN: flood [ OK ]
TEST: VXLAN: flood, unresolved FDB entry [FAIL]
vx2 ns2: Expected to capture 10 packets, got 20.
$ TESTS='test_flood' ./vxlan_bridge_1q.sh
INFO: Running tests with UDP port 4789
TEST: VXLAN: flood vlan 10 [ OK ]
TEST: VXLAN: flood vlan 20 [ OK ]
TEST: VXLAN: flood vlan 10, unresolved FDB entry [FAIL]
vx10 ns2: Expected to capture 10 packets, got 20.
TEST: VXLAN: flood vlan 20, unresolved FDB entry [FAIL]
vx20 ns2: Expected to capture 10 packets, got 20.
With the previous patch, the tests pass.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/7bc96e317531f3bf06319fb2ea447bd8666f29fa.1742224300.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The test_rss_context_dump() test assumes the indirection table is always
supported, which is not true for all drivers, e.g., virtio_net when
VIRTIO_NET_F_RSS is disabled.
Skip the check if 'indir' is not present.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250318112426.386651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add more imix_weights test cases (for incomplete input).
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250317090401.1240704-2-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount namespace updates from Christian Brauner:
"This expands the ability of anonymous mount namespaces:
- Creating detached mounts from detached mounts
Currently, detached mounts can only be created from attached
mounts. This limitaton prevents various use-cases. For example, the
ability to mount a subdirectory without ever having to make the
whole filesystem visible first.
The current permission modelis:
(1) Check that the caller is privileged over the owning user
namespace of it's current mount namespace.
(2) Check that the caller is located in the mount namespace of the
mount it wants to create a detached copy of.
While it is not strictly necessary to do it this way it is
consistently applied in the new mount api. This model will also be
used when allowing the creation of detached mount from another
detached mount.
The (1) requirement can simply be met by performing the same check
as for the non-detached case, i.e., verify that the caller is
privileged over its current mount namespace.
To meet the (2) requirement it must be possible to infer the origin
mount namespace that the anonymous mount namespace of the detached
mount was created from.
The origin mount namespace of an anonymous mount is the mount
namespace that the mounts that were copied into the anonymous mount
namespace originate from.
In order to check the origin mount namespace of an anonymous mount
namespace the sequence number of the original mount namespace is
recorded in the anonymous mount namespace.
With this in place it is possible to perform an equivalent check
(2') to (2). The origin mount namespace of the anonymous mount
namespace must be the same as the caller's mount namespace. To
establish this the sequence number of the caller's mount namespace
and the origin sequence number of the anonymous mount namespace are
compared.
The caller is always located in a non-anonymous mount namespace
since anonymous mount namespaces cannot be setns()ed into. The
caller's mount namespace will thus always have a valid sequence
number.
The owning namespace of any mount namespace, anonymous or
non-anonymous, can never change. A mount attached to a
non-anonymous mount namespace can never change mount namespace.
If the sequence number of the non-anonymous mount namespace and the
origin sequence number of the anonymous mount namespace match, the
owning namespaces must match as well.
Hence, the capability check on the owning namespace of the caller's
mount namespace ensures that the caller has the ability to copy the
mount tree.
- Allow mount detached mounts on detached mounts
Currently, detached mounts can only be mounted onto attached
mounts. This limitation makes it impossible to assemble a new
private rootfs and move it into place. Instead, a detached tree
must be created, attached, then mounted open and then either moved
or detached again. Lift this restriction.
In order to allow mounting detached mounts onto other detached
mounts the same permission model used for creating detached mounts
from detached mounts can be used (cf. above).
Allowing to mount detached mounts onto detached mounts leaves three
cases to consider:
(1) The source mount is an attached mount and the target mount is
a detached mount. This would be equivalent to moving a mount
between different mount namespaces. A caller could move an
attached mount to a detached mount. The detached mount can now
be freely attached to any mount namespace. This changes the
current delegatioh model significantly for no good reason. So
this will fail.
(2) Anonymous mount namespaces are always attached fully, i.e., it
is not possible to only attach a subtree of an anoymous mount
namespace. This simplifies the implementation and reasoning.
Consequently, if the anonymous mount namespace of the source
detached mount and the target detached mount are the identical
the mount request will fail.
(3) The source mount's anonymous mount namespace is different from
the target mount's anonymous mount namespace.
In this case the source anonymous mount namespace of the
source mount tree must be freed after its mounts have been
moved to the target anonymous mount namespace. The source
anonymous mount namespace must be empty afterwards.
By allowing to mount detached mounts onto detached mounts a caller
may do the following:
fd_tree1 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE)
fd_tree2 = open_tree(-EBADF, "/tmp", OPEN_TREE_CLONE)
fd_tree1 and fd_tree2 refer to two different detached mount trees
that belong to two different anonymous mount namespace.
It is important to note that fd_tree1 and fd_tree2 both refer to
the root of their respective anonymous mount namespaces.
By allowing to mount detached mounts onto detached mounts the
caller may now do:
move_mount(fd_tree1, "", fd_tree2, "",
MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH)
This will cause the detached mount referred to by fd_tree1 to be
mounted on top of the detached mount referred to by fd_tree2.
Thus, the detached mount fd_tree1 is moved from its separate
anonymous mount namespace into fd_tree2's anonymous mount
namespace.
It also means that while fd_tree2 continues to refer to the root of
its respective anonymous mount namespace fd_tree1 doesn't anymore.
This has the consequence that only fd_tree2 can be moved to another
anonymous or non-anonymous mount namespace. Moving fd_tree1 will
now fail as fd_tree1 doesn't refer to the root of an anoymous mount
namespace anymore.
Now fd_tree1 and fd_tree2 refer to separate detached mount trees
referring to the same anonymous mount namespace.
This is conceptually fine. The new mount api does allow for this to
happen already via:
mount -t tmpfs tmpfs /mnt
mkdir -p /mnt/A
mount -t tmpfs tmpfs /mnt/A
fd_tree3 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE | AT_RECURSIVE)
fd_tree4 = open_tree(-EBADF, "/mnt/A", 0)
Both fd_tree3 and fd_tree4 refer to two different detached mount
trees but both detached mount trees refer to the same anonymous
mount namespace. An as with fd_tree1 and fd_tree2, only fd_tree3
may be moved another mount namespace as fd_tree3 refers to the root
of the anonymous mount namespace just while fd_tree4 doesn't.
However, there's an important difference between the
fd_tree3/fd_tree4 and the fd_tree1/fd_tree2 example.
Closing fd_tree4 and releasing the respective struct file will have
no further effect on fd_tree3's detached mount tree.
However, closing fd_tree3 will cause the mount tree and the
respective anonymous mount namespace to be destroyed causing the
detached mount tree of fd_tree4 to be invalid for further mounting.
By allowing to mount detached mounts on detached mounts as in the
fd_tree1/fd_tree2 example both struct files will affect each other.
Both fd_tree1 and fd_tree2 refer to struct files that have
FMODE_NEED_UNMOUNT set.
To handle this we use the fact that @fd_tree1 will have a parent
mount once it has been attached to @fd_tree2.
When dissolve_on_fput() is called the mount that has been passed in
will refer to the root of the anonymous mount namespace. If it
doesn't it would mean that mounts are leaked. So before allowing to
mount detached mounts onto detached mounts this would be a bug.
Now that detached mounts can be mounted onto detached mounts it
just means that the mount has been attached to another anonymous
mount namespace and thus dissolve_on_fput() must not unmount the
mount tree or free the anonymous mount namespace as the file
referring to the root of the namespace hasn't been closed yet.
If it had been closed yet it would be obvious because the mount
namespace would be NULL, i.e., the @fd_tree1 would have already
been unmounted. If @fd_tree1 hasn't been unmounted yet and has a
parent mount it is safe to skip any cleanup as closing @fd_tree2
will take care of all cleanup operations.
- Allow mount propagation for detached mount trees
In commit ee2e3f50629f ("mount: fix mounting of detached mounts
onto targets that reside on shared mounts") I fixed a bug where
propagating the source mount tree of an anonymous mount namespace
into a target mount tree of a non-anonymous mount namespace could
be used to trigger an integer overflow in the non-anonymous mount
namespace causing any new mounts to fail.
The cause of this was that the propagation algorithm was unable to
recognize mounts from the source mount tree that were already
propagated into the target mount tree and then reappeared as
propagation targets when walking the destination propagation mount
tree.
When fixing this I disabled mount propagation into anonymous mount
namespaces. Make it possible for anonymous mount namespace to
receive mount propagation events correctly. This is now also a
correctness issue now that we allow mounting detached mount trees
onto detached mount trees.
Mark the source anonymous mount namespace with MNTNS_PROPAGATING
indicating that all mounts belonging to this mount namespace are
currently in the process of being propagated and make the
propagation algorithm discard those if they appear as propagation
targets"
* tag 'vfs-6.15-rc1.mount.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
selftests: test subdirectory mounting
selftests: add test for detached mount tree propagation
fs: namespace: fix uninitialized variable use
mount: handle mount propagation for detached mount trees
fs: allow creating detached mounts from fsmount() file descriptors
selftests: seventh test for mounting detached mounts onto detached mounts
selftests: sixth test for mounting detached mounts onto detached mounts
selftests: fifth test for mounting detached mounts onto detached mounts
selftests: fourth test for mounting detached mounts onto detached mounts
selftests: third test for mounting detached mounts onto detached mounts
selftests: second test for mounting detached mounts onto detached mounts
selftests: first test for mounting detached mounts onto detached mounts
fs: mount detached mounts onto detached mounts
fs: support getname_maybe_null() in move_mount()
selftests: create detached mounts from detached mounts
fs: create detached mounts from detached mounts
fs: add may_copy_tree()
fs: add fastpath for dissolve_on_fput()
fs: add assert for move_mount()
fs: add mnt_ns_empty() helper
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs nsfs updates from Christian Brauner:
"This contains non-urgent fixes for nsfs to validate ioctls before
performing any relevant operations.
We alredy did this for a few other filesystems last cycle"
* tag 'vfs-6.15-rc1.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/nsfs: add ioctl validation tests
nsfs: validate ioctls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs overlayfs updates from Christian Brauner:
"Currently overlayfs uses the mounter's credentials for its
override_creds() calls. That provides a consistent permission model.
This patches allows a caller to instruct overlayfs to use its
credentials instead. The caller must be located in the same user
namespace hierarchy as the user namespace the overlayfs instance will
be mounted in. This provides a consistent and simple security model.
With this it is possible to e.g., mount an overlayfs instance where
the mounter must have CAP_SYS_ADMIN but the credentials used for
override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
of custom fs{g,u}id different from the callers and other tweaks"
* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/ovl: add third selftest for "override_creds"
selftests/ovl: add second selftest for "override_creds"
selftests/filesystems: add utils.{c,h}
selftests/ovl: add first selftest for "override_creds"
ovl: allow to specify override credentials
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs pidfs updates from Christian Brauner:
- Allow retrieving exit information after a process has been reaped
through pidfds via the new PIDFD_INTO_EXIT extension for the
PIDFD_GET_INFO ioctl. Various tools need access to information about
a process/task even after it has already been reaped.
Pidfd polling allows waiting on either task exit or for a task to
have been reaped. The contract for PIDFD_INFO_EXIT is simply that
EPOLLHUP must be observed before exit information can be retrieved,
i.e., exit information is only provided once the task has been reaped
and then can be retrieved as long as the pidfd is open.
- Add PIDFD_SELF_{THREAD,THREAD_GROUP} sentinels allowing userspace to
forgo allocating a file descriptor for their own process. This is
useful in scenarios where users want to act on their own process
through pidfds and is akin to AT_FDCWD.
- Improve premature thread-group leader and subthread exec behavior
when polling on pidfds:
(1) During a multi-threaded exec by a subthread, i.e.,
non-thread-group leader thread, all other threads in the
thread-group including the thread-group leader are killed and the
struct pid of the thread-group leader will be taken over by the
subthread that called exec. IOW, two tasks change their TIDs.
(2) A premature thread-group leader exit means that the thread-group
leader exited before all of the other subthreads in the
thread-group have exited.
Both cases lead to inconsistencies for pidfd polling with
PIDFD_THREAD. Any caller that holds a PIDFD_THREAD pidfd to the
current thread-group leader may or may not see an exit notification
on the file descriptor depending on when poll is performed. If the
poll is performed before the exec of the subthread has concluded an
exit notification is generated for the old thread-group leader. If
the poll is performed after the exec of the subthread has concluded
no exit notification is generated for the old thread-group leader.
The correct behavior is to simply not generate an exit notification
on the struct pid of a subhthread exec because the struct pid is
taken over by the subthread and thus remains alive.
But this is difficult to handle because a thread-group may exit
premature as mentioned in (2). In that case an exit notification is
reliably generated but the subthreads may continue to run for an
indeterminate amount of time and thus also may exec at some point.
After this pull no exit notifications will be generated for a
PIDFD_THREAD pidfd for a thread-group leader until all subthreads
have been reaped. If a subthread should exec before no exit
notification will be generated until that task exits or it creates
subthreads and repeates the cycle.
This means an exit notification indicates the ability for the father
to reap the child.
* tag 'vfs-6.15-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits)
selftests/pidfd: third test for multi-threaded exec polling
selftests/pidfd: second test for multi-threaded exec polling
selftests/pidfd: first test for multi-threaded exec polling
pidfs: improve multi-threaded exec and premature thread-group leader exit polling
pidfs: ensure that PIDFS_INFO_EXIT is available
selftests/pidfd: add seventh PIDFD_INFO_EXIT selftest
selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest
selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
selftests/pidfd: add third PIDFD_INFO_EXIT selftest
selftests/pidfd: add second PIDFD_INFO_EXIT selftest
selftests/pidfd: add first PIDFD_INFO_EXIT selftest
selftests/pidfd: expand common pidfd header
pidfs/selftests: ensure correct headers for ioctl handling
selftests/pidfd: fix header inclusion
pidfs: allow to retrieve exit information
pidfs: record exit code and cgroupid at exit
pidfs: use private inode slab cache
pidfs: move setting flags into pidfs_alloc_file()
pidfd: rely on automatic cleanup in __pidfd_prepare()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Mount notifications
The day has come where we finally provide a new api to listen for
mount topology changes outside of /proc/<pid>/mountinfo. A mount
namespace file descriptor can be supplied and registered with
fanotify to listen for mount topology changes.
Currently notifications for mount, umount and moving mounts are
generated. The generated notification record contains the unique
mount id of the mount.
The listmount() and statmount() api can be used to query detailed
information about the mount using the received unique mount id.
This allows userspace to figure out exactly how the mount topology
changed without having to generating diffs of /proc/<pid>/mountinfo
in userspace.
- Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
api
- Support detached mounts in overlayfs
Since last cycle we support specifying overlayfs layers via file
descriptors. However, we don't allow detached mounts which means
userspace cannot user file descriptors received via
open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
attach them to a mount namespace via move_mount() first.
This is cumbersome and means they have to undo mounts via umount().
Allow them to directly use detached mounts.
- Allow to retrieve idmappings with statmount
Currently it isn't possible to figure out what idmapping has been
attached to an idmapped mount. Add an extension to statmount() which
allows to read the idmapping from the mount.
- Allow creating idmapped mounts from mounts that are already idmapped
So far it isn't possible to allow the creation of idmapped mounts
from already idmapped mounts as this has significant lifetime
implications. Make the creation of idmapped mounts atomic by allow to
pass struct mount_attr together with the open_tree_attr() system call
allowing to solve these issues without complicating VFS lookup in any
way.
The system call has in general the benefit that creating a detached
mount and applying mount attributes to it becomes an atomic operation
for userspace.
- Add a way to query statmount() for supported options
Allow userspace to query which mount information can be retrieved
through statmount().
- Allow superblock owners to force unmount
* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
umount: Allow superblock owners to force umount
selftests: add tests for mount notification
selinux: add FILE__WATCH_MOUNTNS
samples/vfs: fix printf format string for size_t
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
statmount: add a new supported_mask field
samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
selftests: add tests for using detached mount with overlayfs
samples/vfs: check whether flag was raised
statmount: allow to retrieve idmappings
uidgid: add map_id_range_up()
fs: allow detached mounts in clone_private_mount()
selftests/overlayfs: test specifying layers as O_PATH file descriptors
fs: support O_PATH fds with FSCONFIG_SET_FD
vfs: add notifications for mount attach and detach
fanotify: notify on mount attach and detach
...
|
|
Add ublk stripe target which can take 1~4 underlying backing files
or block device, with stripe size 4k ~ 512K.
Add two basic tests(write verify & mkfs/mount/umount) over ublk/stripe.
This target is helpful to cover multiple IOs aiming at same
fixed/registered IO kernel buffer.
It is also capable of verifying vectored registered (kernel)buffers
in future for zero copy, so far it isn't supported yet.
Todo: support vectored registered kernel buffer for ublk/zc.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use the added target io handling helpers for simplifying loop io
completion.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Enable zero copy for null target so that we can evaluate performance
from zero copy or not.
Also this should be the simplest ublk zero copy implementation, which
can be served as zc example.
Add test for covering 'add -t null -z'.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
- pass 'truct dev_ctx *ctx' to target init function
- add 'private_data' to 'struct ublk_dev' for storing target specific data
- add 'private_data' to 'struct ublk_io' for storing per-IO data
- add 'tgt_ios' to 'struct ublk_io' for counting how many io_uring ios
for handling the current io command
- add helper ublk_get_io() for supporting stripe target
- add two helpers for simplifying target io handling
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move two functions for initializing & de-initializing backing file
into common.c.
Also move one common helper into kublk.h.
Prepare for supporting ublk-stripe.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Increase max buffer size to 1MB, and 64KB is too small to evaluate
performance with builtin ublk server implementation.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Unify the sqe allocator helper, and we will use it for supporting
more cases, such as ublk stripe, in which variable sqe allocation
is required.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
block layer, ublk and io_uring might re-order IO in the past
- plug
- queue ublk io command via task work
Add one test for verifying if sequential WRITE IO is dispatched in order.
- null target is taken, so we can just observe io order from
`tracepoint:block:block_rq_complete` which represents the dispatch order
- WRITE IO is taken because READ may come from system-wide utility
Cc: Uday Shankar <ushankar@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250322093218.431419-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
number is invalid
syzbot reported out-of-bounds read in check_atomic_load/store() when the
register number is invalid in this context:
https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c
To avoid the issue from now on, let's add tests where the register number
is invalid for load-acquire/store-release.
After discussion with Eduard, I decided to use R15 as invalid register
because the actual slab-out-of-bounds read issue occurs when the register
number is R12 or larger.
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-6-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
syzbot reported the following splat [0].
In check_atomic_load/store(), register validity is not checked before
atomic_ptr_type_ok(). This causes the out-of-bounds read in is_ctx_reg()
called from atomic_ptr_type_ok() when the register number is MAX_BPF_REG
or greater.
Call check_load_mem()/check_store_reg() before atomic_ptr_type_ok()
to avoid the OOB read.
However, some tests introduced by commit ff3afe5da998 ("selftests/bpf: Add
selftests for load-acquire and store-release instructions") assume
calling atomic_ptr_type_ok() before checking register validity.
Therefore the swapping of order unintentionally changes verifier messages
of these tests.
For example in the test load_acquire_from_pkt_pointer(), expected message
is 'BPF_ATOMIC loads from R2 pkt is not allowed' although actual messages
are different.
validate_msgs:FAIL:754 expect_msg
VERIFIER LOG:
=============
Global function load_acquire_from_pkt_pointer() doesn't return scalar. Only those are supported.
0: R1=ctx() R10=fp0
; asm volatile ( @ verifier_load_acquire.c:140
0: (61) r2 = *(u32 *)(r1 +0) ; R1=ctx() R2_w=pkt(r=0)
1: (d3) r0 = load_acquire((u8 *)(r2 +0))
invalid access to packet, off=0 size=1, R2(id=0,off=0,r=0)
R2 offset is outside of the packet
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
=============
EXPECTED SUBSTR: 'BPF_ATOMIC loads from R2 pkt is not allowed'
#505/19 verifier_load_acquire/load-acquire from pkt pointer:FAIL
This is because instructions in the test don't pass check_load_mem() and
therefore don't enter the atomic_ptr_type_ok() path.
In this case, we have to modify instructions so that they pass the
check_load_mem() and trigger atomic_ptr_type_ok().
Similarly for store-release tests, we need to modify instructions so that
they pass check_store_reg().
Like load_acquire_from_pkt_pointer(), modify instructions in:
load_acquire_from_sock_pointer()
store_release_to_ctx_pointer()
store_release_to_pkt_pointer()
Also in store_release_to_sock_pointer(), check_store_reg() returns error
early and atomic_ptr_type_ok() is not triggered, since write to sock
pointer is not possible in general.
We might be able to remove the test, but for now let's leave it and just
change the expected message.
[0]
BUG: KASAN: slab-out-of-bounds in is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
BUG: KASAN: slab-out-of-bounds in atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
Read of size 4 at addr ffff888141b0d690 by task syz-executor143/5842
CPU: 1 UID: 0 PID: 5842 Comm: syz-executor143 Not tainted 6.14.0-rc3-syzkaller-gf28214603dc6 #0
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0x16e/0x5b0 mm/kasan/report.c:521
kasan_report+0x143/0x180 mm/kasan/report.c:634
is_ctx_reg kernel/bpf/verifier.c:6185 [inline]
atomic_ptr_type_ok+0x3d7/0x550 kernel/bpf/verifier.c:6223
check_atomic_store kernel/bpf/verifier.c:7804 [inline]
check_atomic kernel/bpf/verifier.c:7841 [inline]
do_check+0x89dd/0xedd0 kernel/bpf/verifier.c:19334
do_check_common+0x1678/0x2080 kernel/bpf/verifier.c:22600
do_check_main kernel/bpf/verifier.c:22691 [inline]
bpf_check+0x165c8/0x1cca0 kernel/bpf/verifier.c:23821
Reported-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a5964227adc0f904549c
Tested-by: syzbot+a5964227adc0f904549c@syzkaller.appspotmail.com
Fixes: e24bbad29a8d ("bpf: Introduce load-acquire and store-release instructions")
Fixes: ff3afe5da998 ("selftests/bpf: Add selftests for load-acquire and store-release instructions")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250322045340.18010-5-enjuk@amazon.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
create_pagecache_thp_and_fd() was previously writing a file sized at twice
the PMD size by making a per-byte write syscall. This was quite slow when
the PMD size is 4M, but completely intolerable for 32M (PMD size for
arm64's 16K page size), and 512M (PMD size for arm64's 64K page size).
The byte pattern has a 256 byte period, so let's create a 1K buffer and
fill it with exactly 4 periods. Then we can write the buffer as many
times as is required to fill the file. This makes things much more
tolerable.
The test now passes for 16K page size. It still fails for 64K page size
because MAX_PAGECACHE_ORDER is too small for 512M folio size (I think).
Link: https://lkml.kernel.org/r/20250318174343.243631-3-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
uffd-unit-tests uses a memory area with a fixed 32M size. Then it
calculates the number of pages by dividing by page_size, which itself is
either the base page size or the PMD huge page size depending on the test
config. For the latter, we end up with nr_pages=1 for arm64 16K base
pages, and nr_pages=0 for 64K base pages. This doesn't end well.
So let's make the 32M size a floor and also ensure that we have at least 2
pages given the PMD size. With this change, the tests pass on arm64 64K
base page size configuration.
Link: https://lkml.kernel.org/r/20250318174343.243631-2-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Rafael Aquini <raquini@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As discussed here:
https://lore.kernel.org/lkml/Z9RRkL1hom48z3Tt@google.com/
This code could benefit from some more commentary.
To avoid needing to comment the same thing in multiple places (I guess
more of these SKIPs will need to be added over time, for now I am only
like 20% of the way through Project Run run_vmtests.sh Successfully), add
a dummy "skip tests for this specific reason" function that basically just
serves as a hook to hang comments on.
Link: https://lkml.kernel.org/r/20250317-9pfs-comments-v1-1-9ac96043e146@google.com
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Firstly ublk char device node may not be created by udev yet, so wait
a while until it can be opened or timeout.
Secondly delete created ublk device in case of start failure, otherwise
the device becomes zombie.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250321135324.259677-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Lei Chen reported a bug with CLOCK_MONOTONIC_COARSE having inconsistencies
when NTP is adjusting the clock frequency.
This has gone seemingly undetected for ~15 years, illustrating a clear gap
in our testing.
The skew_consistency test is intended to catch this sort of problem, but
was focused on only evaluating CLOCK_MONOTONIC, and thus missed the problem
on CLOCK_MONOTONIC_COARSE.
So adjust the test to run with all clockids for 60 seconds each instead of
10 minutes with just CLOCK_MONOTONIC.
Reported-by: Lei Chen <lei.chen@smartx.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250320200306.1712599-2-jstultz@google.com
Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
|
|
Expands the self-tests to include the 'release' feature in
sysdata.
Verifies that enabling the 'release' feature appends the
correct data and ensures that disabling it functions as expected.
When enabled, the message should have an item similar to in the
userdata: `release=$(uname -r)`
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature. For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons). However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.
Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts. The solution is to only update a
file related to the lower ABI impacted by this issue. All the ABI files
are then used to create a bitmask of fixes.
The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.
The actual errata will come with dedicated commits. The description is
not actually used in the code but serves as documentation.
Create the landlock_abi_version symbol and use its value to check errata
consistency.
Update test_base's create_ruleset_checks_ordering tests and add errata
tests.
This commit is backportable down to the first version of Landlock.
Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
For loop target, write cache isn't enabled, and each write isn't be
marked as DSYNC too.
Fix it by enabling write cache, meantime fix FLUSH implementation
by not taking LBA range into account, and there isn't such info
for FLUSH command.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250321004758.152572-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some user decides test result by exit code only, and wouldn't like to be
bothered by the test result.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ublk_drv may be built-in, so don't show modprobe failure, and we
do check `/dev/ublk-control` for skipping test if ublk_drv isn't
enabled.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add one dependency helper which can include new uapi definition which
isn't synced from kernel.
This way also helps a lot for downstream test deployment.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250320013743.4167489-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-03-13
The following pull-request contains BPF updates for your *net-next* tree.
We've added 4 non-merge commits during the last 3 day(s) which contain
a total of 2 files changed, 35 insertions(+), 12 deletions(-).
The main changes are:
1) bpf_getsockopt support for TCP_BPF_RTO_MIN and TCP_BPF_DELACK_MAX,
from Jason Xing
bpf-next-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Add bpf_getsockopt() for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN
tcp: bpf: Support bpf_getsockopt for TCP_BPF_DELACK_MAX
tcp: bpf: Support bpf_getsockopt for TCP_BPF_RTO_MIN
tcp: bpf: Introduce bpf_sol_tcp_getsockopt to support TCP_BPF flags
====================
Link: https://patch.msgid.link/20250313221620.2512684-1-martin.lau@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc8).
Conflict:
tools/testing/selftests/net/Makefile
03544faad761 ("selftest: net: add proc_net_pktgen")
3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops")
tools/testing/selftests/net/config:
85cb3711acb8 ("selftests: net: Add test cases for link and peer netns")
3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops")
Adjacent commits:
tools/testing/selftests/net/Makefile
c935af429ec2 ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY")
355d940f4d5a ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There are scenarios where env.{sub,}test_state->stdout_saved, can be
NULL, e.g. sometimes when the watchdog timeout kicks in, or if the
open_memstream syscall is not available.
Avoid crashing test_progs by adding an explicit NULL check prior the
fclose() call.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250318081648.122523-1-bjorn@kernel.org
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.15
- Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM
- Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace
- Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers
- Paravirtual interface for discovering the set of CPU implementations
where a VM may run, addressing a longstanding issue of guest CPU
errata awareness in big-little systems and cross-implementation VM
migration
- Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation
- pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat
- Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events
|
|
KVM/riscv changes for 6.15
- Disable the kernel perf counter during configure
- KVM selftests improvements for PMU
- Fix warning at the time of KVM module removal
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bluetooth and ipsec.
This contains a last minute revert of a recent GRE patch, mostly to
allow me stating there are no known regressions outstanding.
Current release - regressions:
- revert "gre: Fix IPv6 link-local address generation."
- eth: ti: am65-cpsw: fix NAPI registration sequence
Previous releases - regressions:
- ipv6: fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
- mptcp: fix data stream corruption in the address announcement
- bluetooth: fix connection regression between LE and non-LE adapters
- can:
- flexcan: only change CAN state when link up in system PM
- ucan: fix out of bound read in strscpy() source
Previous releases - always broken:
- lwtunnel: fix reentry loops
- ipv6: fix TCP GSO segmentation with NAT
- xfrm: force software GSO only in tunnel mode
- eth: ti: icssg-prueth: add lock to stats
Misc:
- add Andrea Mayer as a maintainer of SRv6"
* tag 'net-6.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
MAINTAINERS: Add Andrea Mayer as a maintainer of SRv6
Revert "gre: Fix IPv6 link-local address generation."
Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."
net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES
tools headers: Sync uapi/asm-generic/socket.h with the kernel sources
mptcp: Fix data stream corruption in the address announcement
selftests: net: test for lwtunnel dst ref loops
net: ipv6: ioam6: fix lwtunnel_output() loop
net: lwtunnel: fix recursion loops
net: ti: icssg-prueth: Add lock to stats
net: atm: fix use after free in lec_send()
xsk: fix an integer overflow in xp_create_and_assign_umem()
net: stmmac: dwc-qos-eth: use devm_kzalloc() for AXI data
selftests: drv-net: use defer in the ping test
phy: fix xa_alloc_cyclic() error handling
dpll: fix xa_alloc_cyclic() error handling
devlink: fix xa_alloc_cyclic() error handling
ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().
ipv6: Fix memleak of nhc_pcpu_rth_output in fib_check_nh_v6_gw().
net: ipv6: fix TCP GSO segmentation with NAT
...
|
|
devices."
This reverts commit 6f50175ccad4278ed3a9394c00b797b75441bd6e.
Commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation.") is
going to be reverted. So let's revert the corresponding kselftest
first.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/259a9e98f7f1be7ce02b53d0b4afb7c18a8ff747.1742418408.git.gnault@redhat.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-4-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Ensure that during a multi-threaded exec and premature thread-group
leader exit no exit notification is generated.
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-3-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add first test for premature thread-group leader exit.
Link: https://lore.kernel.org/r/20250320-work-pidfs-thread_group-v4-2-da678ce805bf@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note
on selftest posting") in net-next, the selftest is therefore shipped in
this series. However, this selftest does not really test this series. It
needs this series to avoid crashing the kernel. What it really tests,
thanks to kmemleak, is what was fixed by the following commits:
- commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and
ioam6 lwtunnels")
- commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and
ioam6 lwtunnels")
- commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6
lwt")
- commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl
lwt")
- commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel")
- commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila
lwtunnel")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch checks if the newly added net.mptcp.path_manager is mapped
successfully from or to the old net.mptcp.pm_type in userspace_pm.sh.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-12-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
test_xdp_vlan.sh isn't used by the BPF CI.
Migrate test_xdp_vlan.sh in prog_tests/xdp_vlan.c.
It uses the same BPF programs located in progs/test_xdp_vlan.c and the
same network topology.
Remove test_xdp_vlan*.sh and their Makefile entries.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250221-xdp_vlan-v1-2-7d29847169af@bootlin.com/
|
|
The __load() helper expects BPF sections to be names 'xdp' or 'tc'
Rename BPF sections so they can be loaded with the __load() helper in
upcoming patch.
Rename the BPF functions with their previous section's name.
Update the 'ip link' commands in the script to use the program name
instead of the section name to load the BPF program.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250221-xdp_vlan-v1-1-7d29847169af@bootlin.com
|
|
* kvm-arm64/writable-midr:
: Writable implementation ID registers, courtesy of Sebastian Ott
:
: Introduce a new capability that allows userspace to set the
: ID registers that identify a CPU implementation: MIDR_EL1, REVIDR_EL1,
: and AIDR_EL1. Also plug a hole in KVM's trap configuration where
: SMIDR_EL1 was readable at EL1, despite the fact that KVM does not
: support SME.
KVM: arm64: Fix documentation for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
KVM: arm64: Copy MIDR_EL1 into hyp VM when it is writable
KVM: arm64: Copy guest CTR_EL0 into hyp VM
KVM: selftests: arm64: Test writes to MIDR,REVIDR,AIDR
KVM: arm64: Allow userspace to change the implementation ID registers
KVM: arm64: Load VPIDR_EL2 with the VM's MIDR_EL1 value
KVM: arm64: Maintain per-VM copy of implementation ID regs
KVM: arm64: Set HCR_EL2.TID1 unconditionally
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|