Age | Commit message (Collapse) | Author |
|
This is needed in the context of Tetragon to test cgroup_skb programs
using BPF_PROG_TEST_RUN with direct packet access.
Commit b39b5f411dcf ("bpf: add cg_skb_is_valid_access for
BPF_PROG_TYPE_CGROUP_SKB") added direct packet access for cgroup_skb
programs and following commit 2cb494a36c98 ("bpf: add tests for direct
packet access from CGROUP_SKB") added tests to the verifier to ensure
that access to skb fields was possible and also fixed
bpf_prog_test_run_skb. However, is_direct_pkt_access was never set to
true for this program type, so data pointers were not computed when
using prog_test_run, making data_end always equal to zero (data_meta is
not accessible for cgroup_skb).
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Link: https://lore.kernel.org/r/20241125152603.375898-1-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since j1939_session_skb_queue() does an extra skb_get() for each new
skb, do the same for the initial one in j1939_session_new() to avoid
refcount underflow.
Reported-by: syzbot+d4e8dc385d9258220c31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d4e8dc385d9258220c31
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241105094823.2403806-1-dmantipov@yandex.ru
[mkl: clean up commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The continual trickle of small conversion patches is grating on me, and
is really not helping. Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:
/*
* .remove_new() is a relic from a prototype conversion of .remove().
* New drivers are supposed to implement .remove(). Once all drivers are
* converted to not use .remove_new any more, it will be dropped.
*/
This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.
I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.
Then I just removed the old (sic) .remove_new member function, and this
is the end result. No more unnecessary conversion noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syzbot found a NULL deref [1] in modify_prefix_route(), caused by one
fib6_info without a fib6_table pointer set.
This can happen for net->ipv6.fib6_null_entry
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 1 UID: 0 PID: 5837 Comm: syz-executor888 Not tainted 6.12.0-syzkaller-09567-g7eef7e306d3c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:__lock_acquire+0xe4/0x3c40 kernel/locking/lockdep.c:5089
Code: 08 84 d2 0f 85 15 14 00 00 44 8b 0d ca 98 f5 0e 45 85 c9 0f 84 b4 0e 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 96 2c 00 00 49 8b 04 24 48 3d a0 07 7f 93 0f 84
RSP: 0018:ffffc900035d7268 EFLAGS: 00010006
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000006 RSI: 1ffff920006bae5f RDI: 0000000000000030
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff90608e17 R11: 0000000000000001 R12: 0000000000000030
R13: ffff888036334880 R14: 0000000000000000 R15: 0000000000000000
FS: 0000555579e90380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc59cc4278 CR3: 0000000072b54000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
modify_prefix_route+0x30b/0x8b0 net/ipv6/addrconf.c:4831
inet6_addr_modify net/ipv6/addrconf.c:4923 [inline]
inet6_rtm_newaddr+0x12c7/0x1ab0 net/ipv6/addrconf.c:5055
rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6920
netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2541
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg net/socket.c:726 [inline]
____sys_sendmsg+0xaaf/0xc90 net/socket.c:2583
___sys_sendmsg+0x135/0x1e0 net/socket.c:2637
__sys_sendmsg+0x16e/0x220 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd1dcef8b79
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc59cc4378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd1dcef8b79
RDX: 0000000000040040 RSI: 0000000020000140 RDI: 0000000000000004
RBP: 00000000000113fd R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 00007ffc59cc438c
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
</TASK>
Fixes: 5eb902b8e719 ("net/ipv6: Remove expired routes with a separated list of routes.")
Reported-by: syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67461f7f.050a0220.1286eb.0021.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Kui-Feng Lee <thinker.li@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is:
WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:ip_rt_bug+0x14/0x20
Call Trace:
<IRQ>
ip_send_skb+0x14/0x40
__icmp_send+0x42d/0x6a0
ipv4_link_failure+0xe2/0x1d0
arp_error_report+0x3c/0x50
neigh_invalidate+0x8d/0x100
neigh_timer_handler+0x2e1/0x330
call_timer_fn+0x21/0x120
__run_timer_base.part.0+0x1c9/0x270
run_timer_softirq+0x4c/0x80
handle_softirqs+0xac/0x280
irq_exit_rcu+0x62/0x80
sysvec_apic_timer_interrupt+0x77/0x90
The script below reproduces this scenario:
ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
dir out priority 0 ptype main flag localok icmp
ip l a veth1 type veth
ip a a 192.168.141.111/24 dev veth0
ip l s veth0 up
ping 192.168.141.155 -c 1
icmp_route_lookup() create input routes for locally generated packets
while xfrm relookup ICMP traffic.Then it will set input route
(dst->out = ip_rt_bug) to skb for DESTUNREACH.
For ICMP err triggered by locally generated packets, dst->dev of output
route is loopback. Generally, xfrm relookup verification is not required
on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1).
Skip icmp relookup for locally generated packets to fix it.
Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support")
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot is able to feed a packet with 14 bytes, pretending
it is a vlan one.
Since fill_frame_info() is relying on skb->mac_len already,
extend the check to cover this case.
BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:709 [inline]
BUG: KMSAN: uninit-value in hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724
fill_frame_info net/hsr/hsr_forward.c:709 [inline]
hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724
hsr_dev_xmit+0x2f0/0x350 net/hsr/hsr_device.c:235
__netdev_start_xmit include/linux/netdevice.h:5002 [inline]
netdev_start_xmit include/linux/netdevice.h:5011 [inline]
xmit_one net/core/dev.c:3590 [inline]
dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3606
__dev_queue_xmit+0x366a/0x57d0 net/core/dev.c:4434
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3146 [inline]
packet_sendmsg+0x91ae/0xa6f0 net/packet/af_packet.c:3178
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
__sys_sendto+0x594/0x750 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2200
x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4091 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
__alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1323 [inline]
alloc_skb_with_frags+0xc8/0xd00 net/core/skbuff.c:6612
sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2881
packet_alloc_skb net/packet/af_packet.c:2995 [inline]
packet_snd net/packet/af_packet.c:3089 [inline]
packet_sendmsg+0x74c6/0xa6f0 net/packet/af_packet.c:3178
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:726
__sys_sendto+0x594/0x750 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0x125/0x1d0 net/socket.c:2200
x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 48b491a5cc74 ("net: hsr: fix mac_len checks")
Reported-by: syzbot+671e2853f9851d039551@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6745dc7f.050a0220.21d33d.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WingMan Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: MD Danish Anwar <danishanwar@ti.com>
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: George McCollister <george.mccollister@gmail.com>
Link: https://patch.msgid.link/20241126144344.4177332-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the length of a GSO packet in the tbf qdisc is larger than the burst
size configured the packet will be segmented by the tbf_segment function.
Whenever this function is used to enqueue SKBs, the backlog statistic of
the tbf is not increased correctly. This can lead to underflows of the
'backlog' byte-statistic value when these packets are dequeued from tbf.
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configured the tbf on
the outgoing interface of the machine as follows (burstsize = 1 MTU):
$ tc qdisc add dev <oif> root handle 1: tbf rate 50Mbit burst 1514 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on this machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
The 'backlog' byte-statistic has incorrect values while traffic is
transferred, e.g., high values due to u32 underflows. When the transfer
is stopped, the value is != 0, which should never happen.
This patch fixes this bug by updating the statistics correctly, even if
single SKBs of a GSO SKB cannot be enqueued.
Fixes: e43ac79a4bc6 ("sch_tbf: segment too big GSO packets")
Signed-off-by: Martin Ottens <martin.ottens@fau.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241125174608.1484356-1-martin.ottens@fau.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported that netdev_core_pick_tx() was reading an uninitialized
field [1].
This is indeed hapening for timewait sockets after recent commits.
We can copy the original established socket sk_tx_queue_mapping
and sk_rx_queue_mapping fields, instead of adding more checks
in fast paths.
As a bonus, packets will use the same transmit queue than
prior ones, this potentially can avoid reordering.
[1]
BUG: KMSAN: uninit-value in netdev_pick_tx+0x5c7/0x1550
netdev_pick_tx+0x5c7/0x1550
netdev_core_pick_tx+0x1d2/0x4a0 net/core/dev.c:4312
__dev_queue_xmit+0x128a/0x57d0 net/core/dev.c:4394
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip_finish_output2+0x187c/0x1b70 net/ipv4/ip_output.c:236
__ip_finish_output+0x287/0x810
ip_finish_output+0x4b/0x600 net/ipv4/ip_output.c:324
NF_HOOK_COND include/linux/netfilter.h:303 [inline]
ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:434
dst_output include/net/dst.h:450 [inline]
ip_local_out net/ipv4/ip_output.c:130 [inline]
ip_send_skb net/ipv4/ip_output.c:1505 [inline]
ip_push_pending_frames+0x444/0x570 net/ipv4/ip_output.c:1525
ip_send_unicast_reply+0x18c1/0x1b30 net/ipv4/ip_output.c:1672
tcp_v4_send_reset+0x238d/0x2a40 net/ipv4/tcp_ipv4.c:910
tcp_v4_rcv+0x48f8/0x5750 net/ipv4/tcp_ipv4.c:2431
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905
gro_normal_list include/net/gro.h:515 [inline]
napi_complete_done+0x3d4/0x810 net/core/dev.c:6256
virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline]
virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013
__napi_poll+0xe7/0x980 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0x68/0x180 kernel/softirq.c:655
irq_exit_rcu+0x12/0x20 kernel/softirq.c:671
common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278
asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693
__preempt_count_sub arch/x86/include/asm/preempt.h:84 [inline]
kmsan_virt_addr_valid arch/x86/include/asm/kmsan.h:95 [inline]
virt_to_page_or_null+0xfb/0x150 mm/kmsan/shadow.c:75
kmsan_get_metadata+0x13e/0x1c0 mm/kmsan/shadow.c:141
kmsan_get_shadow_origin_ptr+0x4d/0xb0 mm/kmsan/shadow.c:102
get_shadow_origin_ptr mm/kmsan/instrumentation.c:38 [inline]
__msan_metadata_ptr_for_store_4+0x27/0x40 mm/kmsan/instrumentation.c:93
rcu_preempt_read_enter kernel/rcu/tree_plugin.h:390 [inline]
__rcu_read_lock+0x46/0x70 kernel/rcu/tree_plugin.h:413
rcu_read_lock include/linux/rcupdate.h:847 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:408 [inline]
batadv_nc_worker+0x114/0x19e0 net/batman-adv/network-coding.c:719
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xae0/0x1c40 kernel/workqueue.c:3310
worker_thread+0xea7/0x14f0 kernel/workqueue.c:3391
kthread+0x3e2/0x540 kernel/kthread.c:389
ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Uninit was created at:
__alloc_pages_noprof+0x9a7/0xe00 mm/page_alloc.c:4774
alloc_pages_mpol_noprof+0x299/0x990 mm/mempolicy.c:2265
alloc_pages_noprof+0x1bf/0x1e0 mm/mempolicy.c:2344
alloc_slab_page mm/slub.c:2412 [inline]
allocate_slab+0x320/0x12e0 mm/slub.c:2578
new_slab mm/slub.c:2631 [inline]
___slab_alloc+0x12ef/0x35e0 mm/slub.c:3818
__slab_alloc mm/slub.c:3908 [inline]
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
kmem_cache_alloc_noprof+0x57a/0xb20 mm/slub.c:4141
inet_twsk_alloc+0x11f/0x9d0 net/ipv4/inet_timewait_sock.c:188
tcp_time_wait+0x83/0xf50 net/ipv4/tcp_minisocks.c:309
tcp_rcv_state_process+0x145a/0x49d0
tcp_v4_do_rcv+0xbf9/0x11a0 net/ipv4/tcp_ipv4.c:1939
tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636
ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905
gro_normal_list include/net/gro.h:515 [inline]
napi_complete_done+0x3d4/0x810 net/core/dev.c:6256
virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline]
virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013
__napi_poll+0xe7/0x980 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068
handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0x68/0x180 kernel/softirq.c:655
irq_exit_rcu+0x12/0x20 kernel/softirq.c:671
common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278
asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693
CPU: 0 UID: 0 PID: 3962 Comm: kworker/u8:18 Not tainted 6.12.0-syzkaller-09073-g9f16d5e6f220 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: bat_events batadv_nc_worker
Fixes: 79636038d37e ("ipv4: tcp: give socket pointer to control skbs")
Fixes: 507a96737d99 ("ipv6: tcp: give socket pointer to control skbs")
Reported-by: syzbot+8b0959fc16551d55896b@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/674442bd.050a0220.1cc393.0072.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241125093039.3095790-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull 9p updates from Dominique Martinet:
- usbg: fix alloc failure handling & build-as-module
- xen: couple of fixes
- v9fs_cache_register/unregister code cleanup
* tag '9p-for-6.13-rc1' of https://github.com/martinetd/linux:
net/9p/usbg: allow building as standalone module
9p/xen: fix release of IRQ
9p/xen: fix init sequence
net/9p/usbg: fix handling of the failed kzalloc() memory allocation
fs/9p: replace functions v9fs_cache_{register|unregister} with direct calls
|
|
Pull ceph updates from Ilya Dryomov:
"A fix for the mount "device" string parser from Patrick and two cred
reference counting fixups from Max, marked for stable.
Also included a number of cleanups and a tweak to MAINTAINERS to avoid
unnecessarily CCing netdev list"
* tag 'ceph-for-6.13-rc1' of https://github.com/ceph/ceph-client:
ceph: fix cred leak in ceph_mds_check_access()
ceph: pass cred pointer to ceph_mds_auth_match()
ceph: improve caps debugging output
ceph: correct ceph_mds_cap_peer field name
ceph: correct ceph_mds_cap_item field name
ceph: miscellaneous spelling fixes
ceph: Use strscpy() instead of strcpy() in __get_snap_name()
ceph: Use str_true_false() helper in status_show()
ceph: requalify some char pointers as const
ceph: extract entity name from device id
MAINTAINERS: exclude net/ceph from networking
ceph: Remove fs/ceph deadcode
libceph: Remove unused ceph_crypto_key_encode
libceph: Remove unused ceph_osdc_watch_check
libceph: Remove unused pagevec functions
libceph: Remove unused ceph_pagelist functions
|
|
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- nfs/localio: fix for a memory corruption in nfs_local_read_done
- Revert "nfs: don't reuse partially completed requests in
nfs_lock_and_join_requests"
- nfsv4:
- ignore SB_RDONLY when mounting nfs
- Fix a use-after-free problem in open()
- sunrpc:
- clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
- timeout and cancel TLS handshake with -ETIMEDOUT
- fix one UAF issue caused by sunrpc kernel tcp socket
- Fix a hang in TLS sock_close if sk_write_pending
- pNFS/blocklayout: Fix device registration issues
Features and cleanups:
- localio cleanups from Mike Snitzer
- Clean up refcounting on the nfs version modules
- __counted_by() annotations
- nfs: make processes that are waiting for an I/O lock killable"
* tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
fs/nfs/io: make nfs_start_io_*() killable
nfs/blocklayout: Limit repeat device registration on failure
nfs/blocklayout: Don't attempt unregister for invalid block device
sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
nfs: ignore SB_RDONLY when mounting nfs
Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
Revert "fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private"
nfs/localio: must clear res.replen in nfs_local_read_done
NFSv4.0: Fix a use-after-free problem in the asynchronous open()
NFSv4.0: Fix the wake up of the next waiter in nfs_release_seqid()
SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
sunrpc: remove newlines from tracepoints
nfs: Annotate struct pnfs_commit_array with __counted_by()
nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
nfs/localio: remove redundant suid/sgid handling
NFS: Implement get_nfs_version()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - regressions:
- rtnetlink: fix rtnl_dump_ifinfo() error path
- bluetooth: remove the redundant sco_conn_put
Previous releases - regressions:
- netlink: fix false positive warning in extack during dumps
- sched: sch_fq: don't follow the fast path if Tx is behind now
- ipv6: delete temporary address if mngtmpaddr is removed or
unmanaged
- tcp: fix use-after-free of nreq in reqsk_timer_handler().
- bluetooth: fix slab-use-after-free Read in set_powered_sync
- l2tp: fix warning in l2tp_exit_net found
- eth:
- bnxt_en: fix receive ring space parameters when XDP is active
- lan78xx: fix double free issue with interrupt buffer allocation
- tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets
Previous releases - always broken:
- ipmr: fix tables suspicious RCU usage
- iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()
- eth:
- octeontx2-af: fix low network performance
- stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
- rtase: correct the speed for RTL907XD-V1
Misc:
- some documentation fixup"
* tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
ipmr: fix build with clang and DEBUG_NET disabled.
Documentation: tls_offload: fix typos and grammar
Fix spelling mistake
ipmr: fix tables suspicious RCU usage
ip6mr: fix tables suspicious RCU usage
ipmr: add debug check for mr table cleanup
selftests: rds: move test.py to TEST_FILES
net_sched: sch_fq: don't follow the fast path if Tx is behind now
tcp: Fix use-after-free of nreq in reqsk_timer_handler().
net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
net: Comment copy_from_sockptr() explaining its behaviour
rxrpc: Improve setsockopt() handling of malformed user input
llc: Improve setsockopt() handling of malformed user input
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
bnxt_en: Unregister PTP during PCI shutdown and suspend
bnxt_en: Refactor bnxt_ptp_init()
bnxt_en: Fix receive ring space parameters when XDP is active
bnxt_en: Fix queue start to update vnic RSS table
...
|
|
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler+0x156/0x3e0
Read of size 1 at addr ffff888111f322cd by task swapper/0/0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
Call Trace:
<IRQ>
dump_stack_lvl+0x68/0xa0
print_address_description.constprop.0+0x2c/0x3d0
print_report+0xb4/0x270
kasan_report+0xbd/0xf0
tcp_write_timer_handler+0x156/0x3e0
tcp_write_timer+0x66/0x170
call_timer_fn+0xfb/0x1d0
__run_timers+0x3f8/0x480
run_timer_softirq+0x9b/0x100
handle_softirqs+0x153/0x390
__irq_exit_rcu+0x103/0x120
irq_exit_rcu+0xe/0x20
sysvec_apic_timer_interrupt+0x76/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:default_idle+0xf/0x20
Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 f8 25 00 fb f4 <fa> c3 cc cc cc
cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
RSP: 0018:ffffffffa2007e28 EFLAGS: 00000242
RAX: 00000000000f3b31 RBX: 1ffffffff4400fc7 RCX: ffffffffa09c3196
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9f00590f
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102360835d
R10: ffff88811b041aeb R11: 0000000000000001 R12: 0000000000000000
R13: ffffffffa202d7c0 R14: 0000000000000000 R15: 00000000000147d0
default_idle_call+0x6b/0xa0
cpuidle_idle_call+0x1af/0x1f0
do_idle+0xbc/0x130
cpu_startup_entry+0x33/0x40
rest_init+0x11f/0x210
start_kernel+0x39a/0x420
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0x97/0xa0
common_startup_64+0x13e/0x141
</TASK>
Allocated by task 595:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x87/0x90
kmem_cache_alloc_noprof+0x12b/0x3f0
copy_net_ns+0x94/0x380
create_new_namespaces+0x24c/0x500
unshare_nsproxy_namespaces+0x75/0xf0
ksys_unshare+0x24e/0x4f0
__x64_sys_unshare+0x1f/0x30
do_syscall_64+0x70/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 100:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x54/0x70
kmem_cache_free+0x156/0x5d0
cleanup_net+0x5d3/0x670
process_one_work+0x776/0xa90
worker_thread+0x2e2/0x560
kthread+0x1a8/0x1f0
ret_from_fork+0x34/0x60
ret_from_fork_asm+0x1a/0x30
Reproduction script:
mkdir -p /mnt/nfsshare
mkdir -p /mnt/nfs/netns_1
mkfs.ext4 /dev/sdb
mount /dev/sdb /mnt/nfsshare
systemctl restart nfs-server
chmod 777 /mnt/nfsshare
exportfs -i -o rw,no_root_squash *:/mnt/nfsshare
ip netns add netns_1
ip link add name veth_1_peer type veth peer veth_1
ifconfig veth_1_peer 11.11.0.254 up
ip link set veth_1 netns netns_1
ip netns exec netns_1 ifconfig veth_1 11.11.0.1
ip netns exec netns_1 /root/iptables -A OUTPUT -d 11.11.0.254 -p tcp \
--tcp-flags FIN FIN -j DROP
(note: In my environment, a DESTROY_CLIENTID operation is always sent
immediately, breaking the nfs tcp connection.)
ip netns exec netns_1 timeout -s 9 300 mount -t nfs -o proto=tcp,vers=4.1 \
11.11.0.254:/mnt/nfsshare /mnt/nfs/netns_1
ip netns del netns_1
The reason here is that the tcp socket in netns_1 (nfs side) has been
shutdown and closed (done in xs_destroy), but the FIN message (with ack)
is discarded, and the nfsd side keeps sending retransmission messages.
As a result, when the tcp sock in netns_1 processes the received message,
it sends the message (FIN message) in the sending queue, and the tcp timer
is re-established. When the network namespace is deleted, the net structure
accessed by tcp's timer handler function causes problems.
To fix this problem, let's hold netns refcnt for the tcp kernel socket as
done in other modules. This is an ugly hack which can easily be backported
to earlier kernels. A proper fix which cleans up the interfaces will
follow, but may not be so easy to backport.
Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We've noticed a situation where an unstable TCP connection can cause the
TLS handshake to timeout waiting for userspace to complete it. When this
happens, we don't want to return from xs_tls_handshake_sync() with zero, as
this will cause the upper xprt to be set CONNECTED, and subsequent attempts
to transmit will be returned with -EPIPE. The sunrpc machine does not
recover from this situation and will spin attempting to transmit.
The return value of tls_handshake_cancel() can be used to detect a race
with completion:
* tls_handshake_cancel - cancel a pending handshake
* Return values:
* %true - Uncompleted handshake request was canceled
* %false - Handshake request already completed or not found
If true, we do not want the upper xprt to be connected, so return
-ETIMEDOUT. If false, its possible the handshake request was lost and
that may be the reason for our timeout. Again we do not want the upper
xprt to be connected, so return -ETIMEDOUT.
Ensure that we alway return an error from xs_tls_handshake_sync() if we
call tls_handshake_cancel().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Since transport->sock has been set to NULL during reset transport,
XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise, the
xs_tcp_set_socket_timeouts() may be triggered in xs_tcp_send_request()
to dereference the transport->sock that has been set to NULL.
Fixes: 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Sasha reported a build issue in ipmr::
net/ipv4/ipmr.c:320:13: error: function 'ipmr_can_free_table' is not \
needed and will not be emitted \
[-Werror,-Wunneeded-internal-declaration]
320 | static bool ipmr_can_free_table(struct net *net)
Apparently clang is too smart with BUILD_BUG_ON_INVALID(), let's
fallback to a plain WARN_ON_ONCE().
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.11-25635-g6813e2326f1e/testrun/26111580/suite/build/test/clang-nightly-lkftconfig/details/
Fixes: 11b6e701bce9 ("ipmr: add debug check for mr table cleanup")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/ee75faa926b2446b8302ee5fc30e129d2df73b90.1732810228.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
cgroup maximum depth is INT_MAX by default, there is a cgroup toggle to
restrict this maximum depth to a more reasonable value not to harm
performance. Remove unnecessary WARN_ON_ONCE which is reachable from
userspace.
Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Syzbot has reported the following BUG detected by KASAN:
BUG: KASAN: slab-out-of-bounds in strlen+0x58/0x70
Read of size 1 at addr ffff8881022da0c8 by task repro/5879
...
Call Trace:
<TASK>
dump_stack_lvl+0x241/0x360
? __pfx_dump_stack_lvl+0x10/0x10
? __pfx__printk+0x10/0x10
? _printk+0xd5/0x120
? __virt_addr_valid+0x183/0x530
? __virt_addr_valid+0x183/0x530
print_report+0x169/0x550
? __virt_addr_valid+0x183/0x530
? __virt_addr_valid+0x183/0x530
? __virt_addr_valid+0x45f/0x530
? __phys_addr+0xba/0x170
? strlen+0x58/0x70
kasan_report+0x143/0x180
? strlen+0x58/0x70
strlen+0x58/0x70
kstrdup+0x20/0x80
led_tg_check+0x18b/0x3c0
xt_check_target+0x3bb/0xa40
? __pfx_xt_check_target+0x10/0x10
? stack_depot_save_flags+0x6e4/0x830
? nft_target_init+0x174/0xc30
nft_target_init+0x82d/0xc30
? __pfx_nft_target_init+0x10/0x10
? nf_tables_newrule+0x1609/0x2980
? nf_tables_newrule+0x1609/0x2980
? rcu_is_watching+0x15/0xb0
? nf_tables_newrule+0x1609/0x2980
? nf_tables_newrule+0x1609/0x2980
? __kmalloc_noprof+0x21a/0x400
nf_tables_newrule+0x1860/0x2980
? __pfx_nf_tables_newrule+0x10/0x10
? __nla_parse+0x40/0x60
nfnetlink_rcv+0x14e5/0x2ab0
? __pfx_validate_chain+0x10/0x10
? __pfx_nfnetlink_rcv+0x10/0x10
? __lock_acquire+0x1384/0x2050
? netlink_deliver_tap+0x2e/0x1b0
? __pfx_lock_release+0x10/0x10
? netlink_deliver_tap+0x2e/0x1b0
netlink_unicast+0x7f8/0x990
? __pfx_netlink_unicast+0x10/0x10
? __virt_addr_valid+0x183/0x530
? __check_object_size+0x48e/0x900
netlink_sendmsg+0x8e4/0xcb0
? __pfx_netlink_sendmsg+0x10/0x10
? aa_sock_msg_perm+0x91/0x160
? __pfx_netlink_sendmsg+0x10/0x10
__sock_sendmsg+0x223/0x270
____sys_sendmsg+0x52a/0x7e0
? __pfx_____sys_sendmsg+0x10/0x10
__sys_sendmsg+0x292/0x380
? __pfx___sys_sendmsg+0x10/0x10
? lockdep_hardirqs_on_prepare+0x43d/0x780
? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
? exc_page_fault+0x590/0x8c0
? do_syscall_64+0xb6/0x230
do_syscall_64+0xf3/0x230
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
Since an invalid (without '\0' byte at all) byte sequence may be passed
from userspace, add an extra check to ensure that such a sequence is
rejected as possible ID and so never passed to 'kstrdup()' and further.
Reported-by: syzbot+6c8215822f35fdb35667@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6c8215822f35fdb35667
Fixes: 268cb38e1802 ("netfilter: x_tables: add LED trigger target")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:
vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()
At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.
Digging deeper into both LLVM and the kernel code reveals this to be a
undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, cause the following
IR to be generated:
define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
%1 = alloca [64 x i8], align 16
...
14: ; preds = %11
%15 = getelementptr inbounds i8, ptr %1, i64 63
%16 = load i8, ptr %15, align 1
%17 = tail call i1 @llvm.is.constant.i8(i8 %16)
%18 = icmp eq i8 %16, 0
%19 = select i1 %17, i1 %18, i1 false
br i1 %19, label %20, label %23
20: ; preds = %14
%21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
...
23: ; preds = %14, %11, %20
%24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
...
}
The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:
%13 = getelementptr inbounds i8, ptr %1, i64 63
br i1 undef, label %14, label %17
This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:
define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
%1 = alloca [64 x i8], align 16
...
12: ; preds = %11
%13 = getelementptr inbounds i8, ptr %1, i64 63
unreachable
}
In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.
Zero the on-stack buffer to avoid this possible UB.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/
Co-developed-by: Ruowen Qin <ruqin@redhat.com>
Signed-off-by: Ruowen Qin <ruqin@redhat.com>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Similar to the previous patch, plumb the RCU lock inside
the ipmr_get_table(), provided a lockless variant and apply
the latter in the few spots were the lock is already held.
Fixes: 709b46e8d90b ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT")
Fixes: f0ad0860d01e ("ipv4: ipmr: support multiple tables")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Several places call ip6mr_get_table() with no RCU nor RTNL lock.
Add RCU protection inside such helper and provide a lockless variant
for the few callers that already acquired the relevant lock.
Note that some users additionally reference the table outside the RCU
lock. That is actually safe as the table deletion can happen only
after all table accesses are completed.
Fixes: e2d57766e674 ("net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.")
Fixes: d7c31cbde4bc ("net: ip6mr: add RTM_GETROUTE netlink op")
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The multicast route tables lifecycle, for both ipv4 and ipv6, is
protected by RCU using the RTNL lock for write access. In many
places a table pointer escapes the RCU (or RTNL) protected critical
section, but such scenarios are actually safe because tables are
deleted only at namespace cleanup time or just after allocation, in
case of default rule creation failure.
Tables freed at namespace cleanup time are assured to be alive for the
whole netns lifetime; tables freed just after creation time are never
exposed to other possible users.
Ensure that the free conditions are respected in ip{,6}mr_free_table, to
document the locking schema and to prevent future possible introduction
of 'table del' operation from breaking it.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Recent kernels cause a lot of TCP retransmissions
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.24 GBytes 19.2 Gbits/sec 2767 442 KBytes
[ 5] 1.00-2.00 sec 2.23 GBytes 19.1 Gbits/sec 2312 350 KBytes
^^^^
Replacing the qdisc with pfifo makes retransmissions go away.
It appears that a flow may have a delayed packet with a very near
Tx time. Later, we may get busy processing Rx and the target Tx time
will pass, but we won't service Tx since the CPU is busy with Rx.
If Rx sees an ACK and we try to push more data for the delayed flow
we may fastpath the skb, not realizing that there are already "ready
to send" packets for this flow sitting in the qdisc.
Don't trust the fastpath if we are "behind" according to the projected
Tx time for next flow waiting in the Qdisc. Because we consider anything
within the offload window to be okay for fastpath we must consider
the entire offload window as "now".
Qdisc config:
qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
buckets 32768 orphan_mask 1023 bands 3 \
priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
low_rate_threshold 550Kbit \
refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
For iperf this change seems to do fine, the reordering is gone.
The fastpath still gets used most of the time:
gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
xx_behind 2731
where "xx_behind" counts how many times we hit the new "return false".
CC: stable@vger.kernel.org
Fixes: 076433bd78d7 ("net_sched: sch_fq: add fast path for mostly idle qdisc")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241124022148.3126719-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
__inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
Then, oreq should be passed to reqsk_put() instead of req; otherwise
use-after-free of nreq could happen when reqsk is migrated but the
retry attempt failed (e.g. due to timeout).
Let's pass oreq to reqsk_put().
Fixes: e8c526f2bdf1 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
Reported-by: Liu Jian <liujian56@huawei.com>
Closes: https://lore.kernel.org/netdev/1284490f-9525-42ee-b7b8-ccadf6606f6d@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- SCO: remove the redundant sco_conn_put
- MGMT: Fix slab-use-after-free Read in set_powered_sync
- MGMT: Fix possible deadlocks
* tag 'for-net-2024-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: SCO: remove the redundant sco_conn_put
Bluetooth: MGMT: Fix possible deadlocks
Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
====================
Link: https://patch.msgid.link/20241126165149.899213-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
copy_from_sockptr() does not return negative value on error; instead, it
reports the number of bytes that failed to copy. Since it's deprecated,
switch to copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(unsigned int)` check as
copy_safe_from_sockptr() by itself would also accept
optlen > sizeof(unsigned int). Which would allow a more lenient handling
of inputs.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
copy_from_sockptr() is used incorrectly: return value is the number of
bytes that could not be copied. Since it's deprecated, switch to
copy_safe_from_sockptr().
Note: Keeping the `optlen != sizeof(int)` check as copy_safe_from_sockptr()
by itself would also accept optlen > sizeof(int). Which would allow a more
lenient handling of inputs.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Pull nfsd updates from Chuck Lever:
"Jeff Layton contributed a scalability improvement to NFSD's NFSv4
backchannel session implementation. This improvement is intended to
increase the rate at which NFSD can safely recall NFSv4 delegations
from clients, to avoid the need to revoke them. Revoking requires a
slow state recovery process.
A wide variety of bug fixes and other incremental improvements make up
the bulk of commits in this series. As always I am grateful to the
NFSD contributors, reviewers, testers, and bug reporters who
participated during this cycle"
* tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits)
nfsd: allow for up to 32 callback session slots
nfs_common: must not hold RCU while calling nfsd_file_put_local
nfsd: get rid of include ../internal.h
nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur
NFSD: Add nfsd4_copy time-to-live
NFSD: Add a laundromat reaper for async copy state
NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations
NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD
NFSD: Free async copy information in nfsd4_cb_offload_release()
NFSD: Fix nfsd4_shutdown_copy()
NFSD: Add a tracepoint to record canceled async COPY operations
nfsd: make nfsd4_session->se_flags a bool
nfsd: remove nfsd4_session->se_bchannel
nfsd: make use of warning provided by refcount_t
nfsd: Don't fail OP_SETCLIENTID when there are too many clients.
svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
xdrgen: Remove program_stat_to_errno() call sites
xdrgen: Update the files included in client-side source code
xdrgen: Remove check for "nfs_ok" in C templates
xdrgen: Remove tracepoint call site
...
|
|
The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.
Potential problems with this strategy are as follows:
- If the actual sent bytes are smaller than tosend, we need to charge some
bytes back, as in line 487, which is okay but seems not clean.
- When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
miss uncharging (msg->sg.size - apply_bytes) bytes.
[...]
415 tosend = msg->sg.size;
416 if (psock->apply_bytes && psock->apply_bytes < tosend)
417 tosend = psock->apply_bytes;
[...]
443 sk_msg_return(sk, msg, tosend);
444 release_sock(sk);
446 origsize = msg->sg.size;
447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
448 msg, tosend, flags);
449 sent = origsize - msg->sg.size;
[...]
454 lock_sock(sk);
455 if (unlikely(ret < 0)) {
456 int free = sk_msg_free_nocharge(sk, msg);
458 if (!cork)
459 *copied -= free;
460 }
[...]
487 if (eval == __SK_REDIRECT)
488 sk_mem_charge(sk, tosend - sent);
[...]
When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
the following warning will be reported:
------------[ cut here ]------------
WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
Modules linked in:
CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x190/0x1a0
RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
FS: 0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x89/0x130
? inet_sock_destruct+0x190/0x1a0
? report_bug+0xfc/0x1e0
? handle_bug+0x5c/0xa0
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? inet_sock_destruct+0x190/0x1a0
__sk_destruct+0x25/0x220
sk_psock_destroy+0x2b2/0x310
process_scheduled_works+0xa3/0x3e0
worker_thread+0x117/0x240
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
bytes are finalized, and uncharge this value. When (ret < 0), we shall
invoke sk_msg_free.
Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
warning will be reported in selftest.
[...]
468 case __SK_DROP:
469 default:
470 sk_msg_free_partial(sk, msg, tosend);
471 sk_msg_apply_bytes(psock, tosend);
472 *copied -= (tosend + delta);
473 return -EACCES;
[...]
So instead of sk_msg_free_partial we can do sk_msg_free here.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 8ec95b94716a ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241016234838.3167769-3-zijianzhang@bytedance.com
|
|
When adding conn, it is necessary to increase and retain the conn reference
count at the same time.
Another problem was fixed along the way, conn_put is missing when hcon is NULL
in the timeout routine.
Fixes: e6720779ae61 ("Bluetooth: SCO: Use kref to track lifetime of sco_conn")
Reported-and-tested-by: syzbot+489f78df4709ac2bfdd3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=489f78df4709ac2bfdd3
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This fixes possible deadlocks like the following caused by
hci_cmd_sync_dequeue causing the destroy function to run:
INFO: task kworker/u19:0:143 blocked for more than 120 seconds.
Tainted: G W O 6.8.0-2024-03-19-intel-next-iLS-24ww14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u19:0 state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000
Workqueue: hci0 hci_cmd_sync_work [bluetooth]
Call Trace:
<TASK>
__schedule+0x374/0xaf0
schedule+0x3c/0xf0
schedule_preempt_disabled+0x1c/0x30
__mutex_lock.constprop.0+0x3ef/0x7a0
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x3c/0x50
mgmt_set_connectable_complete+0xa4/0x150 [bluetooth]
? kfree+0x211/0x2a0
hci_cmd_sync_dequeue+0xae/0x130 [bluetooth]
? __pfx_cmd_complete_rsp+0x10/0x10 [bluetooth]
cmd_complete_rsp+0x26/0x80 [bluetooth]
mgmt_pending_foreach+0x4d/0x70 [bluetooth]
__mgmt_power_off+0x8d/0x180 [bluetooth]
? _raw_spin_unlock_irq+0x23/0x40
hci_dev_close_sync+0x445/0x5b0 [bluetooth]
hci_set_powered_sync+0x149/0x250 [bluetooth]
set_powered_sync+0x24/0x60 [bluetooth]
hci_cmd_sync_work+0x90/0x150 [bluetooth]
process_one_work+0x13e/0x300
worker_thread+0x2f7/0x420
? __pfx_worker_thread+0x10/0x10
kthread+0x107/0x140
? __pfx_kthread+0x10/0x10
ret_from_fork+0x3d/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Tested-by: Kiran K <kiran.k@intel.com>
Fixes: f53e1c9c726d ("Bluetooth: MGMT: Fix possible crash on mgmt_index_removed")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This fixes the following crash:
==================================================================
BUG: KASAN: slab-use-after-free in set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
Read of size 8 at addr ffff888029b4dd18 by task kworker/u9:0/54
CPU: 1 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-01155-gf723224742fc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
q kasan_report+0x143/0x180 mm/kasan/report.c:601
set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:328
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
worker_thread+0x86d/0xd10 kernel/workqueue.c:3389
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 5247:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
kasan_kmalloc include/linux/kasan.h:211 [inline]
__kmalloc_cache_noprof+0x19c/0x2c0 mm/slub.c:4193
kmalloc_noprof include/linux/slab.h:681 [inline]
kzalloc_noprof include/linux/slab.h:807 [inline]
mgmt_pending_new+0x65/0x250 net/bluetooth/mgmt_util.c:269
mgmt_pending_add+0x36/0x120 net/bluetooth/mgmt_util.c:296
set_powered+0x3cd/0x5e0 net/bluetooth/mgmt.c:1394
hci_mgmt_cmd+0xc47/0x11d0 net/bluetooth/hci_sock.c:1712
hci_sock_sendmsg+0x7b8/0x11c0 net/bluetooth/hci_sock.c:1832
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
sock_write_iter+0x2dd/0x400 net/socket.c:1160
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa72/0xc90 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5246:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2256 [inline]
slab_free mm/slub.c:4477 [inline]
kfree+0x149/0x360 mm/slub.c:4598
settings_rsp+0x2bc/0x390 net/bluetooth/mgmt.c:1443
mgmt_pending_foreach+0xd1/0x130 net/bluetooth/mgmt_util.c:259
__mgmt_power_off+0x112/0x420 net/bluetooth/mgmt.c:9455
hci_dev_close_sync+0x665/0x11a0 net/bluetooth/hci_sync.c:5191
hci_dev_do_close net/bluetooth/hci_core.c:483 [inline]
hci_dev_close+0x112/0x210 net/bluetooth/hci_core.c:508
sock_do_ioctl+0x158/0x460 net/socket.c:1222
sock_ioctl+0x629/0x8e0 net/socket.c:1341
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83gv
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com
Tested-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=03d6270b6425df1605bf
Fixes: 275f3f648702 ("Bluetooth: Fix not checking MGMT cmd pending queue")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Following sequence in hsr_init_sk() is invalid :
skb_reset_mac_header(skb);
skb_reset_mac_len(skb);
skb_reset_network_header(skb);
skb_reset_transport_header(skb);
It is invalid because skb_reset_mac_len() needs the correct
network header, which should be after the mac header.
This patch moves the skb_reset_network_header()
and skb_reset_transport_header() before
the call to dev_hard_header().
As a result skb->mac_len is no longer set to a value
close to 65535.
Fixes: 48b491a5cc74 ("net: hsr: fix mac_len checks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: George McCollister <george.mccollister@gmail.com>
Link: https://patch.msgid.link/20241122171343.897551-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
RFC8981 section 3.4 says that existing temporary addresses must have their
lifetimes adjusted so that no temporary addresses should ever remain "valid"
or "preferred" longer than the incoming SLAAC Prefix Information. This would
strongly imply in Linux's case that if the "mngtmpaddr" address is deleted or
un-flagged as such, its corresponding temporary addresses must be cleared out
right away.
But now the temporary address is renewed even after ‘mngtmpaddr’ is removed
or becomes unmanaged as manage_tempaddrs() set temporary addresses
prefered/valid time to 0, and later in addrconf_verify_rtnl() all checkings
failed to remove the addresses. Fix this by deleting the temporary address
directly for these situations.
Fixes: 778964f2fdf0 ("ipv6/addrconf: fix timing bug in tempaddr regen")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Passing MSG_PEEK flag to skb_recv_datagram() increments skb refcount
(skb->users) and iucv_sock_recvmsg() does not decrement skb refcount
at exit.
This results in skb memory leak in skb_queue_purge() and WARN_ON in
iucv_sock_destruct() during socket close. To fix this decrease
skb refcount by one if MSG_PEEK is set in order to prevent memory
leak and WARN_ON.
WARNING: CPU: 2 PID: 6292 at net/iucv/af_iucv.c:286 iucv_sock_destruct+0x144/0x1a0 [af_iucv]
CPU: 2 PID: 6292 Comm: afiucv_test_msg Kdump: loaded Tainted: G W 6.10.0-rc7 #1
Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
Call Trace:
[<001587c682c4aa98>] iucv_sock_destruct+0x148/0x1a0 [af_iucv]
[<001587c682c4a9d0>] iucv_sock_destruct+0x80/0x1a0 [af_iucv]
[<001587c704117a32>] __sk_destruct+0x52/0x550
[<001587c704104a54>] __sock_release+0xa4/0x230
[<001587c704104c0c>] sock_close+0x2c/0x40
[<001587c702c5f5a8>] __fput+0x2e8/0x970
[<001587c7024148c4>] task_work_run+0x1c4/0x2c0
[<001587c7023b0716>] do_exit+0x996/0x1050
[<001587c7023b13aa>] do_group_exit+0x13a/0x360
[<001587c7023b1626>] __s390x_sys_exit_group+0x56/0x60
[<001587c7022bccca>] do_syscall+0x27a/0x380
[<001587c7049a6a0c>] __do_syscall+0x9c/0x160
[<001587c7049ce8a8>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<001587c682c4a9d4>] iucv_sock_destruct+0x84/0x1a0 [af_iucv]
Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support")
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Signed-off-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20241119152219.3712168-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In l2tp's net exit handler, we check that an IDR is empty before
destroying it:
WARN_ON_ONCE(!idr_is_empty(&pn->l2tp_tunnel_idr));
idr_destroy(&pn->l2tp_tunnel_idr);
By forcing memory allocation failures in idr_alloc_32, syzbot is able
to provoke a condition where idr_is_empty returns false despite there
being no items in the IDR. This turns out to be because the radix tree
of the IDR contains only internal radix-tree nodes and it is this that
causes idr_is_empty to return false. The internal nodes are cleaned by
idr_destroy.
Use idr_for_each to check that the IDR is empty instead of
idr_is_empty to avoid the problem.
Reported-by: syzbot+332fe1e67018625f63c9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=332fe1e67018625f63c9
Fixes: 73d33bd063c4 ("l2tp: avoid using drain_workqueue in l2tp_pre_exit_net")
Signed-off-by: James Chapman <jchapman@katalix.com>
Link: https://patch.msgid.link/20241118140411.1582555-1-jchapman@katalix.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the umem is shared, the DMA mapping is also shared between the xsk
pools, therefore it should stay valid as long as at least 1 user remains.
However, the pool also keeps the copies of DMA-related information that are
initialized in the same way in xp_init_dma_info(), but cleared by
xp_dma_unmap() only for the last remaining pool, this causes the problems
below.
The first one is that the commit adbf5a42341f ("ice: remove af_xdp_zc_qps
bitmap") relies on pool->dev to determine the presence of a ZC pool on a
given queue, avoiding internal bookkeeping. This works perfectly fine if
the UMEM is not shared, but reliably fails otherwise as stated in the
linked report.
The second one is pool->dma_pages which is dynamically allocated and
only freed in xp_dma_unmap(), this leads to a small memory leak. kmemleak
does not catch it, but by printing the allocation results after terminating
the userspace program it is possible to see that all addresses except the
one belonging to the last detached pool are still accessible through the
kmemleak dump functionality.
Always clear the DMA mapping information from the pool and free
pool->dma_pages when unmapping the pool, so that the only difference
between results of the last remaining user's call and the ones before would
be the destruction of the DMA mapping.
Fixes: adbf5a42341f ("ice: remove af_xdp_zc_qps bitmap")
Fixes: 921b68692abb ("xsk: Enable sharing of dma mappings")
Reported-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com>
Closes: https://lore.kernel.org/PA4P194MB10056F208AF221D043F57A3D86512@PA4P194MB1005.EURP194.PROD.OUTLOOK.COM
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20241122112912.89881-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Jordy says:
"
In the xsk_map_delete_elem function an unsigned integer
(map->max_entries) is compared with a user-controlled signed integer
(k). Due to implicit type conversion, a large unsigned value for
map->max_entries can bypass the intended bounds check:
if (k >= map->max_entries)
return -EINVAL;
This allows k to hold a negative value (between -2147483648 and -2),
which is then used as an array index in m->xsk_map[k], which results
in an out-of-bounds access.
spin_lock_bh(&m->lock);
map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry
old_xs = unrcu_pointer(xchg(map_entry, NULL)); // Oob write
if (old_xs)
xsk_map_sock_delete(old_xs, map_entry);
spin_unlock_bh(&m->lock);
The xchg operation can then be used to cause an out-of-bounds write.
Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead
to further memory corruption.
"
It indeed results in following splat:
[76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108
[76612.904330] #PF: supervisor write access in kernel mode
[76612.909639] #PF: error_code(0x0002) - not-present page
[76612.914855] PGD 0 P4D 0
[76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP
[76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470
[76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60
[76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31
[76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246
[76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000
[76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000
[76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007
[76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8
[76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0
[76613.005303] FS: 00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000
[76613.013517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0
[76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76613.041086] PKRU: 55555554
[76613.043842] Call Trace:
[76613.046331] <TASK>
[76613.048468] ? __die+0x20/0x60
[76613.051581] ? page_fault_oops+0x15a/0x450
[76613.055747] ? search_extable+0x22/0x30
[76613.059649] ? search_bpf_extables+0x5f/0x80
[76613.063988] ? exc_page_fault+0xa9/0x140
[76613.067975] ? asm_exc_page_fault+0x22/0x30
[76613.072229] ? xsk_map_delete_elem+0x2d/0x60
[76613.076573] ? xsk_map_delete_elem+0x23/0x60
[76613.080914] __sys_bpf+0x19b7/0x23c0
[76613.084555] __x64_sys_bpf+0x1a/0x20
[76613.088194] do_syscall_64+0x37/0xb0
[76613.091832] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[76613.096962] RIP: 0033:0x7f80b6d1e88d
[76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d
[76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003
[76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000
[76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8
[76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040
[76613.188129] </TASK>
Fix this by simply changing key type from int to u32.
Fixes: fbfc504a24f5 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Jordy Zomer <jordyzomer@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20241122121030.716788-2-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
vsock defines a BPF callback to be invoked when close() is called. However,
this callback is never actually executed. As a result, a closed vsock
socket is not automatically removed from the sockmap/sockhash.
Introduce a dummy vsock_close() and make vsock_release() call proto::close.
Note: changes in __vsock_release() look messy, but it's only due to indent
level reduction and variables xmas tree reorder.
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-3-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
|
|
When a verdict program simply passes a packet without redirection, sk_msg
is enqueued on sk_psock::ingress_msg. Add a missing check to poll().
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-1-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
|
|
When skb needs GSO and wrap around happens, if xo->seq.low (seqno of
the first skb segment) is before the last seq number but oseq (seqno
of the last segment) is after it, xo->seq.low is still bigger than
replay_esn->oseq while oseq is smaller than it, so the update of
replay_esn->oseq_hi is missed for this case wrap around because of
the change in the cited commit.
For example, if sending a packet with gso_segs=3 while old
replay_esn->oseq=0xfffffffe, we calculate:
xo->seq.low = 0xfffffffe + 1 = 0x0xffffffff
oseq = 0xfffffffe + 3 = 0x1
(oseq < replay_esn->oseq) is true, but (xo->seq.low <
replay_esn->oseq) is false, so replay_esn->oseq_hi is not incremented.
To fix this issue, change the outer checking back for the update of
replay_esn->oseq_hi. And add new checking inside for the update of
packet's oseq_hi.
Fixes: 4b549ccce941 ("xfrm: replay: Fix ESN wrap around for GSO")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Commit under fixes extended extack reporting to dumps.
It works under normal conditions, because extack errors are
usually reported during ->start() or the first ->dump(),
it's quite rare that the dump starts okay but fails later.
If the dump does fail later, however, the input skb will
already have the initiating message pulled, so checking
if bad attr falls within skb->data will fail.
Switch the check to using nlh, which is always valid.
syzbot found a way to hit that scenario by filling up
the receive queue. In this case we initiate a dump
but don't call ->dump() until there is read space for
an skb.
WARNING: CPU: 1 PID: 5845 at net/netlink/af_netlink.c:2210 netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209
RIP: 0010:netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209
Call Trace:
<TASK>
netlink_dump_done+0x513/0x970 net/netlink/af_netlink.c:2250
netlink_dump+0x91f/0xe10 net/netlink/af_netlink.c:2351
netlink_recvmsg+0x6bb/0x11d0 net/netlink/af_netlink.c:1983
sock_recvmsg_nosec net/socket.c:1051 [inline]
sock_recvmsg+0x22f/0x280 net/socket.c:1073
__sys_recvfrom+0x246/0x3d0 net/socket.c:2267
__do_sys_recvfrom net/socket.c:2285 [inline]
__se_sys_recvfrom net/socket.c:2281 [inline]
__x64_sys_recvfrom+0xde/0x100 net/socket.c:2281
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff37dd17a79
Reported-by: syzbot+d4373fa8042c06cefa84@syzkaller.appspotmail.com
Fixes: 8af4f60472fc ("netlink: support all extack types in dumps")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241119224432.1713040-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found that rtnl_dump_ifinfo() could return with a lock held [1]
Move code around so that rtnl_link_ops_put() and put_net()
can be called at the end of this function.
[1]
WARNING: lock held when returning to user space!
6.12.0-rc7-syzkaller-01681-g38f83a57aa8e #0 Not tainted
syz-executor399/5841 is leaving the kernel with locks still held!
1 lock held by syz-executor399/5841:
#0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
#0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
#0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rtnl_link_ops_get+0x22/0x250 net/core/rtnetlink.c:555
Fixes: 43c7ce69d28e ("rtnetlink: Protect struct rtnl_link_ops with SRCU.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241121194105.3632507-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is no reason only the usbg transport would not be its own module,
so make it tristate.
In particular, this fixes a couple of issues the current bool had:
- trans_usbg was apparently not compiled at all when NET_9P=m
- the workaround added in commit 2193ede180dd ("net/9p/usbg: fix
CONFIG_USB_GADGET dependency") became redundant because a tristate item
cannot be built-in when its dependency is a module, so we can depend on
USB_GADGET "normally" again.
Cc: Michael Grzeschik <m.grzeschik@pengutronix.de>
Link: https://lkml.kernel.org/r/ZzhWRPDNwu225NWz@codewreck.org
Message-ID: <20241122144754.1231919-1-asmadeus@codewreck.org>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
Kernel logs indicate an IRQ was double-freed.
Pass correct device ID during IRQ release.
Fixes: 71ebd71921e45 ("xen/9pfs: connect to the backend")
Signed-off-by: Alex Zenla <alex@edera.dev>
Signed-off-by: Alexander Merritt <alexander@edera.dev>
Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241121225100.5736-1-alexander@edera.dev>
[Dominique: remove confusing variable reset to 0]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Add BPF uprobe session support (Jiri Olsa)
- Optimize uprobe performance (Andrii Nakryiko)
- Add bpf_fastcall support to helpers and kfuncs (Eduard Zingerman)
- Avoid calling free_htab_elem() under hash map bucket lock (Hou Tao)
- Prevent tailcall infinite loop caused by freplace (Leon Hwang)
- Mark raw_tracepoint arguments as nullable (Kumar Kartikeya Dwivedi)
- Introduce uptr support in the task local storage map (Martin KaFai
Lau)
- Stringify errno log messages in libbpf (Mykyta Yatsenko)
- Add kmem_cache BPF iterator for perf's lock profiling (Namhyung Kim)
- Support BPF objects of either endianness in libbpf (Tony Ambardar)
- Add ksym to struct_ops trampoline to fix stack trace (Xu Kuohai)
- Introduce private stack for eligible BPF programs (Yonghong Song)
- Migrate samples/bpf tests to selftests/bpf test_progs (Daniel T. Lee)
- Migrate test_sock to selftests/bpf test_progs (Jordan Rife)
* tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (152 commits)
libbpf: Change hash_combine parameters from long to unsigned long
selftests/bpf: Fix build error with llvm 19
libbpf: Fix memory leak in bpf_program__attach_uprobe_multi
bpf: use common instruction history across all states
bpf: Add necessary migrate_disable to range_tree.
bpf: Do not alloc arena on unsupported arches
selftests/bpf: Set test path for token/obj_priv_implicit_token_envvar
selftests/bpf: Add a test for arena range tree algorithm
bpf: Introduce range_tree data structure and use it in bpf arena
samples/bpf: Remove unused variable in xdp2skb_meta_kern.c
samples/bpf: Remove unused variables in tc_l2_redirect_kern.c
bpftool: Cast variable `var` to long long
bpf, x86: Propagate tailcall info only for subprogs
bpf: Add kernel symbol for struct_ops trampoline
bpf: Use function pointers count as struct_ops links count
bpf: Remove unused member rcu from bpf_struct_ops_map
selftests/bpf: Add struct_ops prog private stack tests
bpf: Support private stack for struct_ops progs
selftests/bpf: Add tracing prog private stack tests
bpf, x86: Support private stack in jit
...
|
|
Large amount of mount hangs observed during hotplugging of 9pfs devices. The
9pfs Xen driver attempts to initialize itself more than once, causing the
frontend and backend to disagree: the backend listens on a channel that the
frontend does not send on, resulting in stalled processing.
Only allow initialization of 9p frontend once.
Fixes: c15fe55d14b3b ("9p/xen: fix connection sequence")
Signed-off-by: Alex Zenla <alex@edera.dev>
Signed-off-by: Alexander Merritt <alexander@edera.dev>
Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241119211633.38321-1-alexander@edera.dev>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A rather large update for timekeeping and timers:
- The final step to get rid of auto-rearming posix-timers
posix-timers are currently auto-rearmed by the kernel when the
signal of the timer is ignored so that the timer signal can be
delivered once the corresponding signal is unignored.
This requires to throttle the timer to prevent a DoS by small
intervals and keeps the system pointlessly out of low power states
for no value. This is a long standing non-trivial problem due to
the lock order of posix-timer lock and the sighand lock along with
life time issues as the timer and the sigqueue have different life
time rules.
Cure this by:
- Embedding the sigqueue into the timer struct to have the same
life time rules. Aside of that this also avoids the lookup of
the timer in the signal delivery and rearm path as it's just a
always valid container_of() now.
- Queuing ignored timer signals onto a seperate ignored list.
- Moving queued timer signals onto the ignored list when the
signal is switched to SIG_IGN before it could be delivered.
- Walking the ignored list when SIG_IGN is lifted and requeue the
signals to the actual signal lists. This allows the signal
delivery code to rearm the timer.
This also required to consolidate the signal delivery rules so they
are consistent across all situations. With that all self test
scenarios finally succeed.
- Core infrastructure for VFS multigrain timestamping
This is required to allow the kernel to use coarse grained time
stamps by default and switch to fine grained time stamps when inode
attributes are actively observed via getattr().
These changes have been provided to the VFS tree as well, so that
the VFS specific infrastructure could be built on top.
- Cleanup and consolidation of the sleep() infrastructure
- Move all sleep and timeout functions into one file
- Rework udelay() and ndelay() into proper documented inline
functions and replace the hardcoded magic numbers by proper
defines.
- Rework the fsleep() implementation to take the reality of the
timer wheel granularity on different HZ values into account.
Right now the boundaries are hard coded time ranges which fail
to provide the requested accuracy on different HZ settings.
- Update documentation for all sleep/timeout related functions
and fix up stale documentation links all over the place
- Fixup a few usage sites
- Rework of timekeeping and adjtimex(2) to prepare for multiple PTP
clocks
A system can have multiple PTP clocks which are participating in
seperate and independent PTP clock domains. So far the kernel only
considers the PTP clock which is based on CLOCK TAI relevant as
that's the clock which drives the timekeeping adjustments via the
various user space daemons through adjtimex(2).
The non TAI based clock domains are accessible via the file
descriptor based posix clocks, but their usability is very limited.
They can't be accessed fast as they always go all the way out to
the hardware and they cannot be utilized in the kernel itself.
As Time Sensitive Networking (TSN) gains traction it is required to
provide fast user and kernel space access to these clocks.
The approach taken is to utilize the timekeeping and adjtimex(2)
infrastructure to provide this access in a similar way how the
kernel provides access to clock MONOTONIC, REALTIME etc.
Instead of creating a duplicated infrastructure this rework
converts timekeeping and adjtimex(2) into generic functionality
which operates on pointers to data structures instead of using
static variables.
This allows to provide time accessors and adjtimex(2) functionality
for the independent PTP clocks in a subsequent step.
- Consolidate hrtimer initialization
hrtimers are set up by initializing the data structure and then
seperately setting the callback function for historical reasons.
That's an extra unnecessary step and makes Rust support less
straight forward than it should be.
Provide a new set of hrtimer_setup*() functions and convert the
core code and a few usage sites of the less frequently used
interfaces over.
The bulk of the htimer_init() to hrtimer_setup() conversion is
already prepared and scheduled for the next merge window.
- Drivers:
- Ensure that the global timekeeping clocksource is utilizing the
cluster 0 timer on MIPS multi-cluster systems.
Otherwise CPUs on different clusters use their cluster specific
clocksource which is not guaranteed to be synchronized with
other clusters.
- Mostly boring cleanups, fixes, improvements and code movement"
* tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits)
posix-timers: Fix spurious warning on double enqueue versus do_exit()
clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties
clocksource/drivers/gpx: Remove redundant casts
clocksource/drivers/timer-ti-dm: Fix child node refcount handling
dt-bindings: timer: actions,owl-timer: convert to YAML
clocksource/drivers/ralink: Add Ralink System Tick Counter driver
clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource
clocksource/drivers/timer-ti-dm: Don't fail probe if int not found
clocksource/drivers:sp804: Make user selectable
clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions
hrtimers: Delete hrtimer_init_on_stack()
alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack()
io_uring: Switch to use hrtimer_setup_on_stack()
sched/idle: Switch to use hrtimer_setup_on_stack()
hrtimers: Delete hrtimer_init_sleeper_on_stack()
wait: Switch to use hrtimer_setup_sleeper_on_stack()
timers: Switch to use hrtimer_setup_sleeper_on_stack()
net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack()
futex: Switch to use hrtimer_setup_sleeper_on_stack()
fs/aio: Switch to use hrtimer_setup_sleeper_on_stack()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"This contains a single series from Uros to replace uses of
<linux/random.h> with prandom.h or other more specific headers
as needed, in order to avoid a circular header issue.
Uros' goal is to be able to use percpu.h from prandom.h, which
will then allow him to define __percpu in percpu.h rather than
in compiler_types.h"
* tag 'random-6.13-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: Include <linux/percpu.h> in <linux/prandom.h>
random: Do not include <linux/prandom.h> in <linux/random.h>
netem: Include <linux/prandom.h> in sch_netem.c
lib/test_scanf: Include <linux/prandom.h> instead of <linux/random.h>
lib/test_parman: Include <linux/prandom.h> instead of <linux/random.h>
bpf/tests: Include <linux/prandom.h> instead of <linux/random.h>
lib/rbtree-test: Include <linux/prandom.h> instead of <linux/random.h>
random32: Include <linux/prandom.h> instead of <linux/random.h>
kunit: string-stream-test: Include <linux/prandom.h>
lib/interval_tree_test.c: Include <linux/prandom.h> instead of <linux/random.h>
bpf: Include <linux/prandom.h> instead of <linux/random.h>
scsi: libfcoe: Include <linux/prandom.h> instead of <linux/random.h>
fscrypt: Include <linux/once.h> in fs/crypto/keyring.c
mtd: tests: Include <linux/prandom.h> instead of <linux/random.h>
media: vivid: Include <linux/prandom.h> in vivid-vid-cap.c
drm/lib: Include <linux/prandom.h> instead of <linux/random.h>
drm/i915/selftests: Include <linux/prandom.h> instead of <linux/random.h>
crypto: testmgr: Include <linux/prandom.h> instead of <linux/random.h>
x86/kaslr: Include <linux/prandom.h> instead of <linux/random.h>
|