Age | Commit message (Collapse) | Author |
|
The panic can be reproduced by executing the command:
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress --rx-strp 100000
Then a kernel panic was captured:
'''
[ 657.460555] kernel BUG at net/core/skbuff.c:2178!
[ 657.462680] Tainted: [W]=WARN
[ 657.463287] Workqueue: events sk_psock_backlog
...
[ 657.469610] <TASK>
[ 657.469738] ? die+0x36/0x90
[ 657.469916] ? do_trap+0x1d0/0x270
[ 657.470118] ? pskb_expand_head+0x612/0xf40
[ 657.470376] ? pskb_expand_head+0x612/0xf40
[ 657.470620] ? do_error_trap+0xa3/0x170
[ 657.470846] ? pskb_expand_head+0x612/0xf40
[ 657.471092] ? handle_invalid_op+0x2c/0x40
[ 657.471335] ? pskb_expand_head+0x612/0xf40
[ 657.471579] ? exc_invalid_op+0x2d/0x40
[ 657.471805] ? asm_exc_invalid_op+0x1a/0x20
[ 657.472052] ? pskb_expand_head+0xd1/0xf40
[ 657.472292] ? pskb_expand_head+0x612/0xf40
[ 657.472540] ? lock_acquire+0x18f/0x4e0
[ 657.472766] ? find_held_lock+0x2d/0x110
[ 657.472999] ? __pfx_pskb_expand_head+0x10/0x10
[ 657.473263] ? __kmalloc_cache_noprof+0x5b/0x470
[ 657.473537] ? __pfx___lock_release.isra.0+0x10/0x10
[ 657.473826] __pskb_pull_tail+0xfd/0x1d20
[ 657.474062] ? __kasan_slab_alloc+0x4e/0x90
[ 657.474707] sk_psock_skb_ingress_enqueue+0x3bf/0x510
[ 657.475392] ? __kasan_kmalloc+0xaa/0xb0
[ 657.476010] sk_psock_backlog+0x5cf/0xd70
[ 657.476637] process_one_work+0x858/0x1a20
'''
The panic originates from the assertion BUG_ON(skb_shared(skb)) in
skb_linearize(). A previous commit(see Fixes tag) introduced skb_get()
to avoid race conditions between skb operations in the backlog and skb
release in the recvmsg path. However, this caused the panic to always
occur when skb_linearize is executed.
The "--rx-strp 100000" parameter forces the RX path to use the strparser
module which aggregates data until it reaches 100KB before calling sockmap
logic. The 100KB payload exceeds MAX_MSG_FRAGS, triggering skb_linearize.
To fix this issue, just move skb_get into sk_psock_skb_ingress_enqueue.
'''
sk_psock_backlog:
sk_psock_handle_skb
skb_get(skb) <== we move it into 'sk_psock_skb_ingress_enqueue'
sk_psock_skb_ingress____________
↓
|
| → sk_psock_skb_ingress_self
| sk_psock_skb_ingress_enqueue
sk_psock_verdict_apply_________________↑ skb_linearize
'''
Note that for verdict_apply path, the skb_get operation is unnecessary so
we add 'take_ref' param to control it's behavior.
Fixes: a454d84ee20b ("bpf, sockmap: Fix skb refcnt race after locking changes")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-4-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In the !ingress path under sk_psock_handle_skb(), when sending data to the
remote under snd_buf limitations, partial skb data might be transmitted.
Although we preserved the partial transmission state (offset/length), the
state wasn't properly consumed during retries. This caused the retry path
to resend the entire skb data instead of continuing from the previous
offset, resulting in data overlap at the receiver side.
Fixes: 405df89dd52c ("bpf, sockmap: Improved check for empty queue")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We call skb_bpf_redirect_clear() to clean _sk_redir before handling skb in
backlog, but when sk_psock_handle_skb() return EAGAIN due to sk_rcvbuf
limit, the redirect info in _sk_redir is not recovered.
Fix skb redir loss during EAGAIN retries by restoring _sk_redir
information using skb_bpf_set_redir().
Before this patch:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
Setting up benchmark 'sockmap'...
create socket fd c1:13 p1:14 c2:15 p2:16
Benchmark 'sockmap' started.
Send Speed 1343.172 MB/s, BPF Speed 1343.238 MB/s, Rcv Speed 65.271 MB/s
Send Speed 1352.022 MB/s, BPF Speed 1352.088 MB/s, Rcv Speed 0 MB/s
Send Speed 1354.105 MB/s, BPF Speed 1354.105 MB/s, Rcv Speed 0 MB/s
Send Speed 1355.018 MB/s, BPF Speed 1354.887 MB/s, Rcv Speed 0 MB/s
'''
Due to the high send rate, the RX processing path may frequently hit the
sk_rcvbuf limit. Once triggered, incorrect _sk_redir will cause the flow
to mistakenly enter the "!ingress" path, leading to send failures.
(The Rcv speed depends on tcp_rmem).
After this patch:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
Setting up benchmark 'sockmap'...
create socket fd c1:13 p1:14 c2:15 p2:16
Benchmark 'sockmap' started.
Send Speed 1347.236 MB/s, BPF Speed 1347.367 MB/s, Rcv Speed 65.402 MB/s
Send Speed 1353.320 MB/s, BPF Speed 1353.320 MB/s, Rcv Speed 65.536 MB/s
Send Speed 1353.186 MB/s, BPF Speed 1353.121 MB/s, Rcv Speed 65.536 MB/s
'''
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Jiayuan Chen says:
====================
bpf: fix ktls panic with sockmap and add tests
We can reproduce the issue using the existing test program:
'./test_sockmap --ktls'
Or use the selftest I provided, which will cause a panic:
------------[ cut here ]------------
kernel BUG at lib/iov_iter.c:629!
PKRU: 55555554
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? iov_iter_revert+0x178/0x180
? iov_iter_revert+0x178/0x180
? do_error_trap+0x7d/0x110
? iov_iter_revert+0x178/0x180
? exc_invalid_op+0x50/0x70
? iov_iter_revert+0x178/0x180
? asm_exc_invalid_op+0x1a/0x20
? iov_iter_revert+0x178/0x180
? iov_iter_revert+0x5c/0x180
tls_sw_sendmsg_locked.isra.0+0x794/0x840
tls_sw_sendmsg+0x52/0x80
? inet_sendmsg+0x1f/0x70
__sys_sendto+0x1cd/0x200
? find_held_lock+0x2b/0x80
? syscall_trace_enter+0x140/0x270
? __lock_release.isra.0+0x5e/0x170
? find_held_lock+0x2b/0x80
? syscall_trace_enter+0x140/0x270
? lockdep_hardirqs_on_prepare+0xda/0x190
? ktime_get_coarse_real_ts64+0xc2/0xd0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x90/0x170
1. It looks like the issue started occurring after bpf being introduced to
ktls and later the addition of assertions to iov_iter has caused a panic.
If my fix tag is incorrect, please assist me in correcting the fix tag.
2. I make minimal changes for now, it's enough to make ktls work
correctly.
---
v1->v2: Added more content to the commit message
https://lore.kernel.org/all/20250123171552.57345-1-mrpre@163.com/#r
---
====================
Link: https://patch.msgid.link/20250219052015.274405-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
add ktls selftest for sockmap
Test results:
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls disconnect_after_delete IPv4 SOCKMAP:OK
sockmap_ktls/sockmap_ktls update_fails_when_sock_has_ulp IPv4 SOCKMAP:OK
sockmap_ktls/tls simple offload:OK
sockmap_ktls/tls tx cork:OK
sockmap_ktls/tls tx cork with push:OK
sockmap_ktls/tls simple offload:OK
sockmap_ktls/tls tx cork:OK
sockmap_ktls/tls tx cork with push:OK
sockmap_ktls:OK
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250219052015.274405-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
[ 2172.936997] ------------[ cut here ]------------
[ 2172.936999] kernel BUG at lib/iov_iter.c:629!
......
[ 2172.944996] PKRU: 55555554
[ 2172.945155] Call Trace:
[ 2172.945299] <TASK>
[ 2172.945428] ? die+0x36/0x90
[ 2172.945601] ? do_trap+0xdd/0x100
[ 2172.945795] ? iov_iter_revert+0x178/0x180
[ 2172.946031] ? iov_iter_revert+0x178/0x180
[ 2172.946267] ? do_error_trap+0x7d/0x110
[ 2172.946499] ? iov_iter_revert+0x178/0x180
[ 2172.946736] ? exc_invalid_op+0x50/0x70
[ 2172.946961] ? iov_iter_revert+0x178/0x180
[ 2172.947197] ? asm_exc_invalid_op+0x1a/0x20
[ 2172.947446] ? iov_iter_revert+0x178/0x180
[ 2172.947683] ? iov_iter_revert+0x5c/0x180
[ 2172.947913] tls_sw_sendmsg_locked.isra.0+0x794/0x840
[ 2172.948206] tls_sw_sendmsg+0x52/0x80
[ 2172.948420] ? inet_sendmsg+0x1f/0x70
[ 2172.948634] __sys_sendto+0x1cd/0x200
[ 2172.948848] ? find_held_lock+0x2b/0x80
[ 2172.949072] ? syscall_trace_enter+0x140/0x270
[ 2172.949330] ? __lock_release.isra.0+0x5e/0x170
[ 2172.949595] ? find_held_lock+0x2b/0x80
[ 2172.949817] ? syscall_trace_enter+0x140/0x270
[ 2172.950211] ? lockdep_hardirqs_on_prepare+0xda/0x190
[ 2172.950632] ? ktime_get_coarse_real_ts64+0xc2/0xd0
[ 2172.951036] __x64_sys_sendto+0x24/0x30
[ 2172.951382] do_syscall_64+0x90/0x170
......
After calling bpf_exec_tx_verdict(), the size of msg_pl->sg may increase,
e.g., when the BPF program executes bpf_msg_push_data().
If the BPF program sets cork_bytes and sg.size is smaller than cork_bytes,
it will return -ENOSPC and attempt to roll back to the non-zero copy
logic. However, during rollback, msg->msg_iter is reset, but since
msg_pl->sg.size has been increased, subsequent executions will exceed the
actual size of msg_iter.
'''
iov_iter_revert(&msg->msg_iter, msg_pl->sg.size - orig_size);
'''
The changes in this commit are based on the following considerations:
1. When cork_bytes is set, rolling back to non-zero copy logic is
pointless and can directly go to zero-copy logic.
2. We can not calculate the correct number of bytes to revert msg_iter.
Assume the original data is "abcdefgh" (8 bytes), and after 3 pushes
by the BPF program, it becomes 11-byte data: "abc?de?fgh?".
Then, we set cork_bytes to 6, which means the first 6 bytes have been
processed, and the remaining 5 bytes "?fgh?" will be cached until the
length meets the cork_bytes requirement.
However, some data in "?fgh?" is not within 'sg->msg_iter'
(but in msg_pl instead), especially the data "?" we pushed.
So it doesn't seem as simple as just reverting through an offset of
msg_iter.
3. For non-TLS sockets in tcp_bpf_sendmsg, when a "cork" situation occurs,
the user-space send() doesn't return an error, and the returned length is
the same as the input length parameter, even if some data is cached.
Additionally, I saw that the current non-zero-copy logic for handling
corking is written as:
'''
line 1177
else if (ret != -EAGAIN) {
if (ret == -ENOSPC)
ret = 0;
goto send_end;
'''
So it's ok to just return 'copied' without error when a "cork" situation
occurs.
Fixes: fcb14cb1bdac ("new iov_iter flavour - ITER_UBUF")
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20250219052015.274405-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For systems with missing iptables-legacy tool this selftest fails.
Add check to find if iptables-legacy tool is available and skip the
test if the tool is missing.
Fixes: de9c8d848d90 ("selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test")
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250409095633.33653-1-skb99@linux.ibm.com
|
|
The link_create.flags are currently not used for multi-uprobes, so return
-EINVAL if it is set, same as for other attach APIs.
We allow target_fd to have an arbitrary value for multi-uprobe, though,
as there are existing users (libbpf) relying on this.
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250407035752.1108927-2-chen.dylane@linux.dev
|
|
The link_create.flags are currently not used for multi-kprobes, so return
-EINVAL if it is set, same as for other attach APIs.
We allow target_fd, on the other hand, to have an arbitrary value for
multi-kprobe, as there are existing users (libbpf) relying on this.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250407035752.1108927-1-chen.dylane@linux.dev
|
|
Mykyta Yatsenko says:
====================
libbpf: introduce line_info and func_info getters
From: Mykyta Yatsenko <yatsenko@meta.com>
This patchset introduces new libbpf API getters that enable the retrieval
of .BTF.ext line and func info.
This change enables users to load bpf_program directly using bpf_prog_load,
bypassing the higher-level bpf_object__load API. Providing line and
function info is essential for BPF program verification in some cases.
v3 -> v5
* Fix tests on s390x, nits.
v2 -> v3
* Return ENOTSUPP if func or line info struct size differs from the one in
uapi linux headers.
* Add selftests.
v1 -> v2
Move bpf_line_info_min and bpf_func_info_min from libbpf_internal.h to
btf.h. Did not remove _min suffix, because there already are bpf_line_info
and bpf_func_info structs in uapi/../bpf.h.
====================
Link: https://patch.msgid.link/20250408234417.452565-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Add selftests checking that line and func info retrieved by newly added
libbpf APIs are the same as returned by kernel via bpf_prog_get_info_by_fd.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408234417.452565-3-mykyta.yatsenko5@gmail.com
|
|
Introducing new libbpf API getters for BTF.ext func and line info,
namely:
bpf_program__func_info
bpf_program__func_info_cnt
bpf_program__line_info
bpf_program__line_info_cnt
This change enables scenarios, when user needs to load bpf_program
directly using `bpf_prog_load`, instead of higher-level
`bpf_object__load`. Line and func info are required for checking BTF
info in verifier; verification may fail without these fields if, for
example, program calls `bpf_obj_new`.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408234417.452565-2-mykyta.yatsenko5@gmail.com
|
|
Extend commit e3c9abd0d14b ("selftests/bpf: Implement setting global
variables in veristat") to support applying presets to members of
the global structs or unions in veristat.
For example:
```
./veristat set_global_vars.bpf.o -G "union1.struct3.var_u8_h = 0xBB"
```
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250408104544.140317-1-mykyta.yatsenko5@gmail.com
|
|
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/bpf/20250401061546.1990156-1-nichen@iscas.ac.cn
|
|
Anton Protopopov says:
====================
likely/unlikely for bpf_helpers and a small comment fix
These two commits fix a comment describing bpf_attr in <linux/bpf.h>
and add likely/unlikely macros to <bph/bpf_helpers.h> to be consumed
by selftests and, later, by the static_branch_likely/unlikely macros.
v1 -> v2:
* squash libbpf and selftests fixes into one patch (Andrii)
====================
Link: https://patch.msgid.link/20250331203618.1973691-1-a.s.protopopov@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
A few selftests and, more importantly, consequent changes to the
bpf_helpers.h file, use likely/unlikely macros, so define them here
and remove duplicate definitions from existing selftests.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-3-a.s.protopopov@gmail.com
|
|
The map_fd field of the bpf_attr union is used in the BPF_MAP_FREEZE
syscall. Explicitly mention this in the comments.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250331203618.1973691-2-a.s.protopopov@gmail.com
|
|
Since memfd_create() is not consistently available across different
bionic libc implementations, using memfd_create() directly can break
some Android builds:
tools/lib/bpf/linker.c:576:7: error: implicit declaration of function 'memfd_create' [-Werror,-Wimplicit-function-declaration]
576 | fd = memfd_create(filename, 0);
| ^
To fix this, relocate and inline the sys_memfd_create() helper so that
it can be used in "linker.c". Similar issues were previously fixed by
commit 9fa5e1a180aa ("libbpf: Call memfd_create() syscall directly").
Fixes: 6d5e5e5d7ce1 ("libbpf: Extend linker API to support in-memory ELF files")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250330211325.530677-1-cmllamas@google.com
|
|
Pull smb server fixes from Steve French:
"Four ksmbd SMB3 server fixes, all also for stable"
* tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix null pointer dereference in alloc_preauth_hash()
ksmbd: validate zero num_subauth before sub_auth is accessed
ksmbd: fix overflow in dacloffset bounds check
ksmbd: fix session use-after-free in multichannel connection
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
"Persistent buffer cleanups and simplifications.
It was mistaken that the physical memory returned from "reserve_mem"
had to be vmap()'d to get to it from a virtual address. But
reserve_mem already maps the memory to the virtual address of the
kernel so a simple phys_to_virt() can be used to get to the virtual
address from the physical memory returned by "reserve_mem". With this
new found knowledge, the code can be cleaned up and simplified.
- Enforce that the persistent memory is page aligned
As the buffers using the persistent memory are all going to be
mapped via pages, make sure that the memory given to the tracing
infrastructure is page aligned. If it is not, it will print a
warning and fail to map the buffer.
- Use phys_to_virt() to get the virtual address from reserve_mem
Instead of calling vmap() on the physical memory returned from
"reserve_mem", use phys_to_virt() instead.
As the memory returned by "memmap" or any other means where a
physical address is given to the tracing infrastructure, it still
needs to be vmap(). Since this memory can never be returned back to
the buddy allocator nor should it ever be memmory mapped to user
space, flag this buffer and up the ref count. The ref count will
keep it from ever being freed, and the flag will prevent it from
ever being memory mapped to user space.
- Use vmap_page_range() for memmap virtual address mapping
For the memmap buffer, instead of allocating an array of struct
pages, assigning them to the contiguous phsycial memory and then
passing that to vmap(), use vmap_page_range() instead
- Replace flush_dcache_folio() with flush_kernel_vmap_range()
Instead of calling virt_to_folio() and passing that to
flush_dcache_folio(), just call flush_kernel_vmap_range() directly.
This also fixes a bug where if a subbuffer was bigger than
PAGE_SIZE only the PAGE_SIZE portion would be flushed"
* tag 'trace-ringbuffer-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Use flush_kernel_vmap_range() over flush_dcache_folio()
tracing: Use vmap_page_range() to map memmap ring buffer
tracing: Have reserve_mem use phys_to_virt() and separate from memmap buffer
tracing: Enforce the persistent ring buffer to be page aligned
|
|
Pull more block updates from Jens Axboe:
- NVMe pull request via Keith:
- PCI endpoint target cleanup (Damien)
- Early import for uring_cmd fixed buffer (Caleb)
- Multipath documentation and notification improvements (John)
- Invalid pci sq doorbell write fix (Maurizio)
- Queue init locking fix
- Remove dead nsegs parameter from blk_mq_get_new_requests()
* tag 'block-6.15-20250403' of git://git.kernel.dk/linux:
block: don't grab elevator lock during queue initialization
nvme-pci: skip nvme_write_sq_db on empty rqlist
nvme-multipath: change the NVME_MULTIPATH config option
nvme: update the multipath warning in nvme_init_ns_head
nvme/ioctl: move fixed buffer lookup to nvme_uring_cmd_io()
nvme/ioctl: move blk_mq_free_request() out of nvme_map_user_request()
nvme/ioctl: don't warn on vectorized uring_cmd with fixed buffer
nvmet: pci-epf: Keep completion queues mapped
block: remove unused nseg parameter
|
|
Pull more io_uring updates from Jens Axboe:
"Set of fixes/updates for io_uring that should go into this release.
The ublk bits could've gone via either tree - usually I put them in
block, but they got a bit mixed this series with the zero-copy
supported that ended up dipping into both trees.
This contains:
- Fix for sendmsg zc, include in pinned pages accounting like we do
for the other zc types
- Series for ublk fixing request aborting, doing various little
cleanups, fixing some zc issues, and adding queue_rqs support
- Another ublk series doing some code cleanups
- Series cleaning up the io_uring send path, mostly in preparation
for registered buffers
- Series doing little MSG_RING cleanups
- Fix for the newly added zc rx, fixing len being 0 for the last
invocation of the callback
- Add vectored registered buffer support for ublk. With that, then
ublk also supports this feature in the kernel revision where it
could generically introduced for rw/net
- A bunch of selftest additions for ublk. This is the majority of the
diffstat
- Silence a KCSAN data race warning for io-wq
- Various little cleanups and fixes"
* tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux: (44 commits)
io_uring: always do atomic put from iowq
selftests: ublk: enable zero copy for stripe target
io_uring: support vectored kernel fixed buffer
block: add for_each_mp_bvec()
io_uring: add validate_fixed_range() for validate fixed buffer
selftests: ublk: kublk: fix an error log line
selftests: ublk: kublk: use ioctl-encoded opcodes
io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0
io_uring/net: avoid import_ubuf for regvec send
io_uring/rsrc: check size when importing reg buffer
io_uring: cleanup {g,s]etsockopt sqe reading
io_uring: hide caches sqes from drivers
io_uring: make zcrx depend on CONFIG_IO_URING
io_uring: add req flag invariant build assertion
Documentation: ublk: remove dead footnote
selftests: ublk: specify io_cmd_buf pointer type
ublk: specify io_cmd_buf pointer type
io_uring: don't pass ctx to tw add remote helper
io_uring/msg: initialise msg request opcode
io_uring/msg: rename io_double_lock_ctx()
...
|
|
Don't use a scoped guard that only protects the next statement.
Use a regular guard to make sure that the namespace semaphore is held
across the whole function.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reported-by: Leon Romanovsky <leon@kernel.org>
Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/
Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull more bcachefs updates from Kent Overstreet:
"More notable fixes:
- Fix for striping behaviour on tiering filesystems where replicas
exceeds durability on destination target
- Fix a race in device removal where deleting alloc info races with
the discard worker
- Some small stack usage improvements: this is just enough for KMSAN
builds to not blow the stack, more is queued up for 6.16"
* tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix "journal stuck" during recovery
bcachefs: backpointer_get_key: check for null from peek_slot()
bcachefs: Fix null ptr deref in invalidate_one_bucket()
bcachefs: Fix check_snapshot_exists() restart handling
bcachefs: use nonblocking variant of print_string_as_lines in error path
bcachefs: Fix scheduling while atomic from logging changes
bcachefs: Add error handling for zlib_deflateInit2()
bcachefs: add missing selection of XARRAY_MULTI
bcachefs: bch_dev_usage_full
bcachefs: Kill btree_iter.trans
bcachefs: do_trace_key_cache_fill()
bcachefs: Split up bch_dev.io_ref
bcachefs: fix ref leak in btree_node_read_all_replicas
bcachefs: Fix null ptr deref in bch2_write_endio()
bcachefs: Fix field spanning write warning
bcachefs: Fix striping behaviour
|
|
Pull 9p updates from Dominique Martinet:
- fix handling of bogus (negative/too long) replies
- fix crash on mkdir with ACLs (... looks like nobody is using ACLs
with semi-recent kernels...)
- ipv6 support for trans=tcp
- minor concurrency fix to make syzbot happy
- minor cleanup
* tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux:
docs: fs/9p: Add missing "not" in cache documentation
9p: Use hashtable.h for hash_errmap
Documentation/fs/9p: fix broken link
9p/trans_fd: mark concurrent read and writes to p9_conn->err
9p/net: return error on bogus (longer than requested) replies
9p/net: fix improper handling of bogus negative read/write replies
fs/9p: fix NULL pointer dereference on mkdir
net/9p/fd: support ipv6 for trans=tcp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"We see a net reduction of the number of lines of code thanks to the
removal of a now unused driver and a testing tool that is not used
anymore. Apart from this, the max31335 driver gets support for a new
part number and pm8xxx gets UEFI support.
Core:
- setdate is removed as it has better replacements
- skip alarms with a second resolution when we know the RTC doesn't
support those.
Subsystem:
- remove unnecessary private struct members
- use devm_pm_set_wake_irq were relevant
Drivers:
- ds1307: stop disabling alarms on probe for DS1337, DS1339, DS1341
and DS3231
- max31335: add max31331 support
- pcf50633 is removed as support for the related SoC has been removed
- pcf85063: properly handle POR failures"
* tag 'rtc-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
rtc: remove 'setdate' test program
selftest: rtc: skip some tests if the alarm only supports minutes
rtc: mt6397: drop unused defines
rtc: pcf85063: replace dev_err+return with return dev_err_probe
rtc: pcf85063: do a SW reset if POR failed
rtc: max31335: Add driver support for max31331
dt-bindings: rtc: max31335: Add max31331 support
rtc: cros-ec: Avoid a couple of -Wflex-array-member-not-at-end warnings
dt-bindings: rtc: pcf2127: Reference spi-peripheral-props.yaml
rtc: rzn1: implement one-second accuracy for alarms
rtc: pcf50633: Remove
rtc: pm8xxx: implement qcom,no-alarm flag for non-HLOS owned alarm
rtc: pm8xxx: mitigate flash wear
rtc: pm8xxx: add support for uefi offset
dt-bindings: rtc: qcom-pm8xxx: document qcom,no-alarm flag
rtc: rv3032: drop WADA
rtc: rv3032: fix EERD location
rtc: pm8xxx: switch to devm_device_init_wakeup
rtc: pm8xxx: fix possible race condition
rtc: mpfs: switch to devm_device_init_wakeup
...
|
|
Pull ARM and clkdev updates from Russell King:
- Simplify ARM_MMU_KEEP usage
- Add Rust support for ARM architecture version 7
- Align IPIs reported in /proc/interrupts
- require linker to support KEEP within OVERLAY
- add KEEP() for ARM vectors
- add __printf() attribute for clkdev functions
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9445/1: clkdev: Mark some functions with __printf() attribute
ARM: 9444/1: add KEEP() keyword to ARM_VECTORS
ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE
ARM: 9442/1: smp: Fix IPI alignment in /proc/interrupts
ARM: 9441/1: rust: Enable Rust support for ARMv7
ARM: 9439/1: arm32: simplify ARM_MMU_KEEP usage
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix max_pfn calculation when hotplugging memory so that it never
decreases
- Fix dereference of unused source register in the MOPS SET operation
fault handling
- Fix NULL calling in do_compat_alignment_fixup() when the 32-bit user
space does an unaligned LDREX/STREX
- Add the HiSilicon HIP09 processor to the Spectre-BHB affected CPUs
- Drop unused code pud accessors (special/mkspecial)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Don't call NULL in do_compat_alignment_fixup()
arm64: Add support for HIP09 Spectre-BHB mitigation
arm64: mm: Drop dead code for pud special bit handling
arm64: mops: Do not dereference src reg for a set operation
arm64: mm: Correct the update of max_pfn
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix BPF selftests expectations of assembler output and struct layout
(Song Liu and Yonghong Song)
- Fix XSK error code when queue is full (Wang Liang)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Fix verifier_private_stack test failure
selftests/bpf: Fix verifier_bpf_fastcall test
selftests/bpf: Fix tests after fields reorder in struct file
xsk: Fix __xsk_generic_xmit() error code when cq is full
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-MM updates from Andrew Morton:
"One bugfix and a couple of small late-arriving updates"
* tag 'mm-nonmm-stable-2025-04-02-22-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: scatterlist: fix sg_split_phys to preserve original scatterlist offsets
lib/sort.c: add _nonatomic() variants with cond_resched()
mailmap: add an entry for Nicolas Schier
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
Rapoport fixes a couple of issues with the just-merged "arch, mm:
reduce code duplication in mem_init()" series
- The series "MAINTAINERS: add my isub-entries to MM part." from Mike
Rapoport does some maintenance on MAINTAINERS
- The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
cleanup work to the page mapping code
- The series "mseal system mappings" from Jeff Xu permits sealing of
"system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode)
- Plus the usual shower of singleton patches
* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
mseal sysmap: add arch-support txt
mseal sysmap: enable s390
selftest: test system mappings are sealed
mseal sysmap: update mseal.rst
mseal sysmap: uprobe mapping
mseal sysmap: enable arm64
mseal sysmap: enable x86-64
mseal sysmap: generic vdso vvar mapping
selftests: x86: test_mremap_vdso: skip if vdso is msealed
mseal sysmap: kernel config and header change
mm: pgtable: remove tlb_remove_page_ptdesc()
x86: pgtable: convert to use tlb_remove_ptdesc()
riscv: pgtable: unconditionally use tlb_remove_ptdesc()
mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
mm: pgtable: make generic tlb_remove_table() use struct ptdesc
microblaze/mm: put mm_cmdline_setup() in .init.text section
mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
MAINTAINERS: mm: add entry for secretmem
MAINTAINERS: mm: add entry for numa memblocks and numa emulation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM hotfixes from Andrew Morton:
"Five hotfixes. Three are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
All patches are for MM"
* tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
mm/hugetlb: move hugetlb_sysctl_init() to the __init section
mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
mm/hugetlb_vmemmap: fix memory loads ordering
mm/userfaultfd: fix release hang over concurrent GUP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Calling scx_bpf_create_dsq() with the same ID would succeed creating
duplicate DSQs. Fix it to return -EEXIST.
- scx_select_cpu_dfl() fixes and cleanups.
- Synchronize tool/sched_ext with external scheduler repo. While this
isn't a fix. There's no risk to the kernel and it's better if they
stay synced closer.
* tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
tools/sched_ext: Sync with scx repo
sched_ext: initialize built-in idle state before ops.init()
sched_ext: create_dsq: Return -EEXIST on duplicate request
sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
sched_ext: idle: Fix return code of scx_select_cpu_dfl()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix build error when CONFIG_PROBE_EVENTS_BTF_ARGS is not enabled
The tracing of arguments in the function tracer depends on some
functions that are only defined when PROBE_EVENTS_BTF_ARGS is
enabled. In fact, PROBE_EVENTS_BTF_ARGS also depends on all the same
configs as the function argument tracing requires. Just have the
function argument tracing depend on PROBE_EVENTS_BTF_ARGS.
- Free module_delta for persistent ring buffer instance
When an instance holds the persistent ring buffer, it allocates a
helper array to hold the deltas between where modules are loaded on
the last boot and the current boot. This array needs to be freed when
the instance is freed.
- Add cond_resched() to loop in ftrace_graph_set_hash()
The hash functions in ftrace loop over every function that can be
enabled by ftrace. This can be 50,000 functions or more. This loop is
known to trigger soft lockup warnings and requires a cond_resched().
The loop in ftrace_graph_set_hash() was missing it.
- Fix the event format verifier to include "%*p.." arguments
To prevent events from dereferencing stale pointers that can happen
if a trace event uses a dereferece pointer to something that was not
copied into the ring buffer and can be freed by the time the trace is
read, a verifier is called. At boot or module load, the verifier
scans the print format string for pointers that can be dereferenced
and it checks the arguments to make sure they do not contain
something that can be freed. The "%*p" was not handled, which would
add another argument and cause the verifier to not only not verify
this pointer, but it will look at the wrong argument for every
pointer after that.
- Fix mcount sorttable building for different endian type target
When modifying the ELF file to sort the mcount_loc table in the
sorttable.c code, the endianess of the file and the host is used to
determine if the bytes need to be swapped when calculations are done.
A change was made to the sorting of the mcount_loc that read the
values from the ELF file into an array and the swap happened on the
filling of the array. But one of the calculations of the array still
did the swap when it did not need to. This caused building on a
little endian machine for a big endian target to not find the mcount
function in the 'nm' table and it zeroed it out, causing there to be
no functions available to trace.
- Add goto out_unlock jump to rv_register_monitor() on error path
One of the error paths in rv_register_monitor() just returned the
error when it should have jumped to the out_unlock label to release
the mutex.
* tag 'trace-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix missing unlock on double nested monitors return path
scripts/sorttable: Fix endianness handling in build-time mcount sort
tracing: Verify event formats that have "%*p.."
ftrace: Add cond_resched() to ftrace_graph_set_hash()
tracing: Free module_delta on freeing of persistent ring buffer
ftrace: Have tracing function args depend on PROBE_EVENTS_BTF_ARGS
|
|
If we crash when the journal pin fifo is completely full - i.e. we're
at the maximum number of dirty journal entries - that may put us in a
sticky situation in recovery, as journal replay will need to be able to
open new journal entries in order to get going.
bch2_fs_journal_start() already had provisions for resizing the journal
pin fifo if needed, but it needs a fudge factor to ensure there's room
for journal replay.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
peek_slot() doesn't normally return bkey_s_c_null - except when we ask
for a key at a btree level that doesn't exist, which can happen here.
We might want to revisit this, but we'll have to look over all the
places where we use peek_slot() on interior nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_backpointer_get_key() returns bkey_s_c_null when the target isn't
found.
backpointer_get_key() flags the error, so there's nothing else to do
here - just skip it and move on.
Link: https://github.com/koverstreet/bcachefs/issues/847
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Codepaths that create entries in the snapshots btree currently call
bch2_mark_snapshot(), which updates the in-memory snapshot table, before
transaction commit.
This is because bch2_mark_snapshot() is an atomic trigger, run with
btree write locks held, and isn't allowed to fail - but it might need to
reallocate the table, hence we call it early when we're still allowed to
fail.
This is generally harmless - if we fail, we'll have left an entry in the
snapshots table around, but nothing will reference it and it'll get
overwritten if reused by another transaction.
But check_snapshot_exists(), which reconstructs snapshots when the
snapshots btree has been corrupted or lost, was erronously rechecking if
the snapshot exists inside the transaction commit loop - so on
transaction restart (in this case mem_realloced), the second iteration
would return without repairing.
This code needs some cleanup: splitting out a "maybe realloc snapshots
table" helper would have avoided this, that will be in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.
Replace calls to it with the nonblocking variant which is safe to call.
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Two fixes from the recent logging changes:
bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt
context, or with rcu_read_lock() held.
The one syzbot found is in
bch2_bkey_pick_read_device
bch2_dev_rcu
bch2_fs_inconsistent
We're starting to switch to lift the printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoid
garbling), so that conversion will help with spotting these in the
future; when we declare a printbuf it must be flagged if we're in an
atomic context.
Secondly, in btree_node_write_endio:
00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60
00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [<0000000000000000>] 0x0
00085 Preemption disabled at:
00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085 show_stack+0x1c/0x30 (C)
00085 dump_stack_lvl+0x84/0xc0
00085 dump_stack+0x14/0x20
00085 __might_resched+0x180/0x288
00085 __might_sleep+0x4c/0x88
00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085 krealloc_noprof+0x1a0/0x2d8
00085 bch2_printbuf_make_room+0x9c/0x120
00085 bch2_prt_printf+0x60/0x1b8
00085 btree_node_write_endio+0x1b0/0x2d8
00085 bio_endio+0x138/0x1f0
00085 btree_node_write_endio+0xe8/0x2d8
00085 bio_endio+0x138/0x1f0
00085 blk_update_request+0x220/0x4c0
00085 blk_mq_end_request+0x28/0x148
00085 virtblk_request_done+0x64/0xe8
00085 blk_mq_complete_request+0x34/0x40
00085 virtblk_done+0x78/0x130
00085 vring_interrupt+0x6c/0xb0
00085 __handle_irq_event_percpu+0x8c/0x2e0
00085 handle_irq_event+0x50/0xb0
00085 handle_fasteoi_irq+0xc4/0x250
00085 handle_irq_desc+0x44/0x60
00085 generic_handle_domain_irq+0x20/0x30
00085 gic_handle_irq+0x54/0xc8
00085 call_on_irq_stack+0x24/0x40
Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In attempt_compress(), the return value of zlib_deflateInit2() needs to be
checked. A proper implementation can be found in pstore_compress().
Add an error check and return 0 immediately if the initialzation fails.
Fixes: 986e9842fb68 ("bcachefs: Compression levels")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c.
queue freeze lock depends on fs_reclaim.
So don't grab elevator lock during queue initialization which needs to
call kmalloc(GFP_KERNEL), and we can cut the dependency between
->elevator_lock and fs_reclaim, then the lockdep warning can be killed.
This way is safe because elevator setting isn't ready to run during
queue initialization.
There isn't such issue in __blk_mq_update_nr_hw_queues() because
memalloc_noio_save() is called before acquiring elevator lock.
Fixes the following lockdep warning:
https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/
Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com
Cc: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250403105402.1334206-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_uring always switches requests to atomic refcounting for iowq
execution before there is any parallilism by setting REQ_F_REFCOUNT,
and the flag is not cleared until the request completes. That should be
fine as long as the compiler doesn't make up a non existing value for
the flags, however KCSAN still complains when the request owner changes
oter flag bits:
BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work
...
read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0:
req_ref_put_and_test io_uring/refs.h:22 [inline]
Skip REQ_F_REFCOUNT checks for iowq, we know it's set.
Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire update from Takashi Sakamoto:
"A single commit to use the common helper function for on-stack
trailing array to enqueue any isochronous packet by the requests
from userspace applications"
* tag 'firewire-updates-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: avoid -Wflex-array-member-not-at-end warning
|
|
Several verifier_private_stack tests failed with latest bpf-next.
For example, for 'Private stack, single prog' subtest, the
jitted code:
func #0:
0: f3 0f 1e fa endbr64
4: 0f 1f 44 00 00 nopl (%rax,%rax)
9: 0f 1f 00 nopl (%rax)
c: 55 pushq %rbp
d: 48 89 e5 movq %rsp, %rbp
10: f3 0f 1e fa endbr64
14: 49 b9 58 74 8a 8f 7d 60 00 00 movabsq $0x607d8f8a7458, %r9
1e: 65 4c 03 0c 25 28 c0 48 87 addq %gs:-0x78b73fd8, %r9
27: bf 2a 00 00 00 movl $0x2a, %edi
2c: 49 89 b9 00 ff ff ff movq %rdi, -0x100(%r9)
33: 31 c0 xorl %eax, %eax
35: c9 leave
36: e9 20 5d 0f e1 jmp 0xffffffffe10f5d5b
The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected
regex 'addq %gs:0x{{.*}}, %r9' and this caused test failure.
Fix it by changing '%gs:0x{{.*}}' to '%gs:{{.*}}' to accommodate the
possible negative offset. A few other subtests are fixed in a similar way.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit [1] moves percpu data on x86 from address 0x000... to address
0xfff...
Before [1]:
159020: 0000000000030700 0 OBJECT GLOBAL DEFAULT 23 pcpu_hot
After [1]:
152602: ffffffff83a3e034 4 OBJECT GLOBAL DEFAULT 35 pcpu_hot
As a result, verifier_bpf_fastcall tests should now expect a negative
value for pcpu_hot, IOW, the disassemble should show "r=" instead of
"w=".
Fix this in the test.
Note that, a later change created a new variable "cpu_number" for
bpf_get_smp_processor_id() [2]. The inlining logic is updated properly
as part of this change, so there is no need to fix anything on the
kernel side.
[1] commit 9d7de2aa8b41 ("x86/percpu/64: Use relative percpu offsets")
[2] commit 01c7bc5198e9 ("x86/smp: Move cpu number to percpu hot section")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The change in struct file [1] moved f_ref to the 3rd cache line.
It made *(u64 *)file dereference invalid from the verifier point of view,
because btf_struct_walk() walks into f_lock field, which is 4-byte long.
Fix the selftests to deference the file pointer as a 4-byte access.
[1] commit e249056c91a2 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When the cq reservation is failed, the error code is not set which is
initialized to zero in __xsk_generic_xmit(). That means the packet is not
send successfully but sendto() return ok.
Considering the impact on uapi, return -EAGAIN is a good idea. The cq is
full usually because it is not released in time, try to send msg again is
appropriate.
The bug was at the very early implementation of xsk, so the Fixes tag
targets the commit that introduced the changes in
xsk_cq_reserve_addr_locked where this fix depends on.
Fixes: e6c4047f5122 ("xsk: Use xsk_buff_pool directly for cq functions")
Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250227081052.4096337-1-wangliang74@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mikulas Patocka:
- dm-crypt: switch to using the crc32 library
- dm-verity, dm-integrity, dm-crypt: documentation improvement
- dm-vdo fixes
- dm-stripe: enable inline crypto passthrough
- dm-integrity: set ti->error on memory allocation failure
- dm-bufio: remove unused return value
- dm-verity: do forward error correction on metadata I/O errors
- dm: fix unconditional IO throttle caused by REQ_PREFLUSH
- dm cache: prevent BUG_ON by blocking retries on failed device resumes
- dm cache: support shrinking the origin device
- dm: restrict dm device size to 2^63-512 bytes
- dm-delay: support zoned devices
- dm-verity: support block number limits for different ioprio classes
- dm-integrity: fix non-constant-time tag verification (security bug)
- dm-verity, dm-ebs: fix prefetch-vs-suspend race
* tag 'for-6.15/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (27 commits)
dm-ebs: fix prefetch-vs-suspend race
dm-verity: fix prefetch-vs-suspend race
dm-integrity: fix non-constant-time tag verification
dm-verity: support block number limits for different ioprio classes
dm-delay: support zoned devices
dm: restrict dm device size to 2^63-512 bytes
dm cache: support shrinking the origin device
dm cache: prevent BUG_ON by blocking retries on failed device resumes
dm vdo indexer: reorder uds_request to reduce padding
dm: fix unconditional IO throttle caused by REQ_PREFLUSH
dm vdo: rework processing of loaded refcount byte arrays
dm vdo: remove remaining ring references
dm-verity: do forward error correction on metadata I/O errors
dm-bufio: remove unused return value
dm-integrity: set ti->error on memory allocation failure
dm: Enable inline crypto passthrough for striped target
dm vdo slab-depot: read refcount blocks in large chunks at load time
dm vdo vio-pool: allow variable-sized metadata vios
dm vdo vio-pool: support pools with multiple data blocks per vio
dm vdo vio-pool: add a pool pointer to pooled_vio
...
|
|
A quick fix for what I assume is a typo.
Signed-off-by: Tingmao Wang <m@maowtm.org>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Message-ID: <20250330213443.98434-1-m@maowtm.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|