Age | Commit message (Collapse) | Author |
|
Merge in late fixes to prepare for the 6.13 net-next PR.
Conflicts:
include/linux/phy.h
41ffcd95015f net: phy: fix phylib's dual eee_enabled
721aa69e708b net: phy: convert eee_broken_modes to a linkmode bitmap
https://lore.kernel.org/all/20241118135512.1039208b@canb.auug.org.au/
drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
2160428bcb20 net: txgbe: fix null pointer to pcs
2160428bcb20 net: txgbe: remove GPIO interrupt controller
Adjacent commits:
include/linux/phy.h
41ffcd95015f net: phy: fix phylib's dual eee_enabled
516a5f11eb97 net: phy: respect cached advertising when re-enabling EEE
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzkaller reported a corrupted list in ieee802154_if_remove. [1]
Remove an IEEE 802.15.4 network interface after unregister an IEEE 802.15.4
hardware device from the system.
CPU0 CPU1
==== ====
genl_family_rcv_msg_doit ieee802154_unregister_hw
ieee802154_del_iface ieee802154_remove_interfaces
rdev_del_virtual_intf_deprecated list_del(&sdata->list)
ieee802154_if_remove
list_del_rcu
The net device has been unregistered, since the rcu grace period,
unregistration must be run before ieee802154_if_remove.
To avoid this issue, add a check for local->interfaces before deleting
sdata list.
[1]
kernel BUG at lib/list_debug.c:58!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 6277 Comm: syz-executor157 Not tainted 6.12.0-rc6-syzkaller-00005-g557329bcecc2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:__list_del_entry_valid_or_report+0xf4/0x140 lib/list_debug.c:56
Code: e8 a1 7e 00 07 90 0f 0b 48 c7 c7 e0 37 60 8c 4c 89 fe e8 8f 7e 00 07 90 0f 0b 48 c7 c7 40 38 60 8c 4c 89 fe e8 7d 7e 00 07 90 <0f> 0b 48 c7 c7 a0 38 60 8c 4c 89 fe e8 6b 7e 00 07 90 0f 0b 48 c7
RSP: 0018:ffffc9000490f3d0 EFLAGS: 00010246
RAX: 000000000000004e RBX: dead000000000122 RCX: d211eee56bb28d00
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffff88805b278dd8 R08: ffffffff8174a12c R09: 1ffffffff2852f0d
R10: dffffc0000000000 R11: fffffbfff2852f0e R12: dffffc0000000000
R13: dffffc0000000000 R14: dead000000000100 R15: ffff88805b278cc0
FS: 0000555572f94380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056262e4a3000 CR3: 0000000078496000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_del_entry_valid include/linux/list.h:124 [inline]
__list_del_entry include/linux/list.h:215 [inline]
list_del_rcu include/linux/rculist.h:157 [inline]
ieee802154_if_remove+0x86/0x1e0 net/mac802154/iface.c:687
rdev_del_virtual_intf_deprecated net/ieee802154/rdev-ops.h:24 [inline]
ieee802154_del_iface+0x2c0/0x5c0 net/ieee802154/nl-phy.c:323
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:744
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2607
___sys_sendmsg net/socket.c:2661 [inline]
__sys_sendmsg+0x292/0x380 net/socket.c:2690
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Reported-and-tested-by: syzbot+985f827280dc3a6e7e92@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985f827280dc3a6e7e92
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20241113095129.1457225-1-lizhi.xu@windriver.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
When the stream_verdict program returns SK_PASS, it places the received skb
into its own receive queue, but a recursive lock eventually occurs, leading
to an operating system deadlock. This issue has been present since v6.9.
'''
sk_psock_strp_data_ready
write_lock_bh(&sk->sk_callback_lock)
strp_data_ready
strp_read_sock
read_sock -> tcp_read_sock
strp_recv
cb.rcv_msg -> sk_psock_strp_read
# now stream_verdict return SK_PASS without peer sock assign
__SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
sk_psock_verdict_apply
sk_psock_skb_ingress_self
sk_psock_skb_ingress_enqueue
sk_psock_data_ready
read_lock_bh(&sk->sk_callback_lock) <= dead lock
'''
This topic has been discussed before, but it has not been fixed.
Previous discussion:
https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20241118030910.36230-2-mrpre@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ndev->npinfo pointer in __netpoll_setup() is RCU-protected but is being
accessed directly for a NULL check. While no RCU read lock is held in this
context, we should still use proper RCU primitives for consistency and
correctness.
Replace the direct NULL check with rcu_access_pointer(), which is the
appropriate primitive when only checking for NULL without dereferencing
the pointer. This function provides the necessary ordering guarantees
without requiring RCU read-side protection.
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241118-netpoll_rcu-v1-1-a1888dcb4a02@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The errno should be replaced with drop reasons in fib_validate_source(),
and the "-EINVAL" shouldn't be returned. And this causes a warning, which
is reported by syzkaller:
netlink: 'syz-executor371': attribute type 4 has an invalid length.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
WARNING: CPU: 0 PID: 5842 at net/core/skbuff.c:1219 sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Modules linked in:
CPU: 0 UID: 0 PID: 5842 Comm: syz-executor371 Not tainted 6.12.0-rc6-syzkaller-01362-ga58f00ed24b8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
RIP: 0010:sk_skb_reason_drop+0x87/0x380 net/core/skbuff.c:1241
Code: 00 00 00 fc ff df 41 8d 9e 00 00 fc ff bf 01 00 fc ff 89 de e8 ea 9f 08 f8 81 fb 00 00 fc ff 77 3a 4c 89 e5 e8 9a 9b 08 f8 90 <0f> 0b 90 eb 5e bf 01 00 00 00 89 ee e8 c8 9f 08 f8 85 ed 0f 8e 49
RSP: 0018:ffffc90003d57078 EFLAGS: 00010293
RAX: ffffffff898c3ec6 RBX: 00000000fffbffea RCX: ffff8880347a5a00
RDX: 0000000000000000 RSI: 00000000fffbffea RDI: 00000000fffc0001
RBP: dffffc0000000000 R08: ffffffff898c3eb6 R09: 1ffff110023eb7d4
R10: dffffc0000000000 R11: ffffed10023eb7d5 R12: dffffc0000000000
R13: ffff888011f5bdc0 R14: 00000000ffffffea R15: 0000000000000000
FS: 000055557d41e380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056519d31d608 CR3: 000000007854e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kfree_skb_reason include/linux/skbuff.h:1263 [inline]
ip_rcv_finish_core+0xfde/0x1b50 net/ipv4/ip_input.c:424
ip_list_rcv_finish net/ipv4/ip_input.c:610 [inline]
ip_sublist_rcv+0x3b1/0xab0 net/ipv4/ip_input.c:636
ip_list_rcv+0x42b/0x480 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x94e/0x980 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0xa51/0xe30 net/core/dev.c:5905
netif_receive_skb_list+0x55/0x4b0 net/core/dev.c:5957
xdp_recv_frames net/bpf/test_run.c:280 [inline]
xdp_test_run_batch net/bpf/test_run.c:361 [inline]
bpf_test_run_xdp_live+0x1b5e/0x21b0 net/bpf/test_run.c:390
bpf_prog_test_run_xdp+0x805/0x11e0 net/bpf/test_run.c:1318
bpf_prog_test_run+0x2e4/0x360 kernel/bpf/syscall.c:4266
__sys_bpf+0x48d/0x810 kernel/bpf/syscall.c:5671
__do_sys_bpf kernel/bpf/syscall.c:5760 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5758 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5758
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f18af25a8e9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee4090af8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18af25a8e9
RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Fix it by returning "-SKB_DROP_REASON_IP_LOCAL_SOURCE" instead of
"-EINVAL" in fib_validate_source().
Reported-by: syzbot+52fbd90f020788ec7709@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6738e539.050a0220.e1c64.0002.GAE@google.com/
Fixes: 82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
warnings"
This reverts commit 3bd9b9abdf1563a22041b7255baea6d449902f1a. We cannot
use the new tagged struct group because it throws C++ errors even under
"extern C".
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20241115204308.3821419-1-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The helper __lookup_addr() can be used in mptcp_pm_nl_get_local_id()
and mptcp_pm_nl_is_backup() to simplify the code, and avoid code
duplication.
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-2-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To return an endpoint to the userspace via Netlink, and to dump all of
them, the endpoint list was iterated while holding the pernet->lock, but
only to read the content of the list.
In these cases, the spin locks can be replaced by RCU read ones, and use
the _rcu variants to iterate over the entries list in a lockless way.
Note that the __lookup_addr_by_id() helper has been modified to use the
_rcu variants of list_for_each_entry(), but with an extra conditions, so
it can be called either while the RCU read lock is held, or when the
associated pernet->lock is held.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241115-net-next-mptcp-pm-lockless-dump-v1-1-f4a1bcb4ca2c@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict
data checking") added strict checking. The err variable is not cleared,
so if we find no table to dump we will return the validation error even
if user did not want strict checking.
I think the only way to hit this is to send an buggy request, and ask
for a table which doesn't exist, so there's no point treating this
as a real fix. I only noticed it because a syzbot repro depended on it
to trigger another bug.
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241115003221.733593-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
"Thirteen patches, all focused on moving away from the current 'secid'
LSM identifier to a richer 'lsm_prop' structure.
This move will help reduce the translation that is necessary in many
LSMs, offering better performance, and make it easier to support
different LSMs in the future"
* tag 'lsm-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: remove lsm_prop scaffolding
netlabel,smack: use lsm_prop for audit data
audit: change context data from secid to lsm_prop
lsm: create new security_cred_getlsmprop LSM hook
audit: use an lsm_prop in audit_names
lsm: use lsm_prop in security_inode_getsecid
lsm: use lsm_prop in security_current_getsecid
audit: update shutdown LSM data
lsm: use lsm_prop in security_ipc_getsecid
audit: maintain an lsm_prop in audit_context
lsm: add lsmprop_to_secctx hook
lsm: use lsm_prop in security_audit_rule_match
lsm: add the lsm_prop data structure
|
|
There's issue as follows:
RPC: Registered rdma transport module.
RPC: Registered rdma backchannel transport module.
RPC: Unregistered rdma transport module.
RPC: Unregistered rdma backchannel transport module.
BUG: unable to handle page fault for address: fffffbfff80c609a
PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0
Call Trace:
<TASK>
__die+0x1f/0x70
page_fault_oops+0x2cd/0x860
spurious_kernel_fault+0x36/0x450
do_kern_addr_fault+0xca/0x100
exc_page_fault+0x128/0x150
asm_exc_page_fault+0x26/0x30
percpu_counter_destroy_many+0xf7/0x2a0
mmdrop+0x209/0x350
finish_task_switch.isra.0+0x481/0x840
schedule_tail+0xe/0xd0
ret_from_fork+0x23/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not
destroy the percpu counters which init in svc_rdma_proc_init().
If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the
'percpu_counters' list. The above issue may occur once the module is
removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory
leakage occurs.
To solve above issue just destroy all percpu counters when
register_sysctl() return NULL.
Fixes: 1e7e55731628 ("svcrdma: Restore read and write stats")
Fixes: 22df5a22462e ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter")
Fixes: df971cd853c0 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The function `c_show` was called with protection from RCU. This only
ensures that `cp` will not be freed. Therefore, the reference count for
`cp` can drop to zero, which will trigger a refcount use-after-free
warning when `cache_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `cp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 822 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
Call Trace:
<TASK>
c_show+0x2fc/0x380 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull 'struct fd' class updates from Al Viro:
"The bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same scope
where they'd been created, getting rid of reassignments and passing
them by reference, converting to CLASS(fd{,_pos,_raw}).
We are getting very close to having the memory safety of that stuff
trivial to verify"
* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
deal with the last remaing boolean uses of fd_file()
css_set_fork(): switch to CLASS(fd_raw, ...)
memcg_write_event_control(): switch to CLASS(fd)
assorted variants of irqfd setup: convert to CLASS(fd)
do_pollfd(): convert to CLASS(fd)
convert do_select()
convert vfs_dedupe_file_range().
convert cifs_ioctl_copychunk()
convert media_request_get_by_fd()
convert spu_run(2)
switch spufs_calls_{get,put}() to CLASS() use
convert cachestat(2)
convert do_preadv()/do_pwritev()
fdget(), more trivial conversions
fdget(), trivial conversions
privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
o2hb_region_dev_store(): avoid goto around fdget()/fdput()
introduce "fd_pos" class, convert fdget_pos() users to it.
fdget_raw() users: switch to CLASS(fd_raw)
convert vmsplice() to CLASS(fd)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This contains changes the changes for files for this cycle:
- Introduce a new reference counting mechanism for files.
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop
it has O(N^2) behaviour under contention with N concurrent
operations and it is in a hot path in __fget_files_rcu().
The rcuref infrastructures remedies this problem by using an
unconditional increment relying on safe- and dead zones to make
this work and requiring rcu protection for the data structure in
question. This not just scales better it also introduces overflow
protection.
However, in contrast to generic rcuref, files require a memory
barrier and thus cannot rely on *_relaxed() atomic operations and
also require to be built on atomic_long_t as having massive amounts
of reference isn't unheard of even if it is just an attack.
This adds a file specific variant instead of making this a generic
library.
This has been tested by various people and it gives consistent
improvement up to 3-5% on workloads with loads of threads.
- Add a fastpath for find_next_zero_bit(). Skip 2-levels searching
via find_next_zero_bit() when there is a free slot in the word that
contains the next fd. This improves pts/blogbench-1.1.0 read by 8%
and write by 4% on Intel ICX 160.
- Conditionally clear full_fds_bits since it's very likely that a bit
in full_fds_bits has been cleared during __clear_open_fds(). This
improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on
Intel ICX 160.
- Get rid of all lookup_*_fdget_rcu() variants. They were used to
lookup files without taking a reference count. That became invalid
once files were switched to SLAB_TYPESAFE_BY_RCU and now we're
always taking a reference count. Switch to an already existing
helper and remove the legacy variants.
- Remove pointless includes of <linux/fdtable.h>.
- Avoid cmpxchg() in close_files() as nobody else has a reference to
the files_struct at that point.
- Move close_range() into fs/file.c and fold __close_range() into it.
- Cleanup calling conventions of alloc_fdtable() and expand_files().
- Merge __{set,clear}_close_on_exec() into one.
- Make __set_open_fd() set cloexec as well instead of doing it in two
separate steps"
* tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor
fs: port files to file_ref
fs: add file_ref
expand_files(): simplify calling conventions
make __set_open_fd() set cloexec state as well
fs: protect backing files with rcu
file.c: merge __{set,clear}_close_on_exec()
alloc_fdtable(): change calling conventions.
fs/file.c: add fast path in find_next_fd()
fs/file.c: conditionally clear full_fds
fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
move close_range(2) into fs/file.c, fold __close_range() into it
close_files(): don't bother with xchg()
remove pointless includes of <linux/fdtable.h>
get rid of ...lookup...fdget_rcu() family
|
|
ceph_crypto_key_encode() was added in 2010's commit
8b6e4f2d8b21 ("ceph: aes crypto and base64 encode/decode helpers")
but has remained unused (the decode is used).
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_osdc_watch_check() has been unused since it was added in commit
b07d3c4bd727 ("libceph: support for checking on status of watch")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_copy_user_to_page_vector() has been unused since 2013's commit
e8344e668915 ("ceph: Implement writev/pwritev for sync operation.")
ceph_copy_to_page_vector() has been unused since 2012's commit
913d2fdcf605 ("rbd: always pass ops array to rbd_req_sync_op()")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
ceph_pagelist_truncate() and ceph_pagelist_set_cursor() have been unused
since commit
39be95e9c8c0 ("ceph: ceph_pagelist_append might sleep while atomic")
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Implement ipv6 udp hash4 like that in ipv4. The major difference is that
the hash value should be calculated with udp6_ehashfn(). Besides,
ipv4-mapped ipv6 address is handled before hash() and rehash(). Export
udp_ehashfn because now we use it in udpv6 rehash.
Core procedures of hash/unhash/rehash are same as ipv4, and udpv4 and
udpv6 share the same udptable, so some functions in ipv4 hash4 can also
be shared.
Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the udp_table has two hash table, the port hash and portaddr
hash. Usually for UDP servers, all sockets have the same local port and
addr, so they are all on the same hash slot within a reuseport group.
In some applications, UDP servers use connect() to manage clients. In
particular, when firstly receiving from an unseen 4 tuple, a new socket
is created and connect()ed to the remote addr:port, and then the fd is
used exclusively by the client.
Once there are connected sks in a reuseport group, udp has to score all
sks in the same hash2 slot to find the best match. This could be
inefficient with a large number of connections, resulting in high
softirq overhead.
To solve the problem, this patch implement 4-tuple hash for connected
udp sockets. During connect(), hash4 slot is updated, as well as a
corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(),
hslot4 will be searched firstly if the counter is non-zero. Otherwise,
hslot2 is used like before. Note that only connected sockets enter this
hash4 path, while un-connected ones are not affected.
hlist_nulls is used for hash4, because we probably move to another hslot
wrongly when lookup with concurrent rehash. Then we check nulls at the
list end to see if we should restart lookup. Because udp does not use
SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup.
Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
[a] w/o hash4: 1,825176
[b] w/ hash4: 1,831750 (+0.36%)
(2) 500 _connected_ sockets as server
[c] w/o hash4: 290860 (only 16% of [a])
[d] w/ hash4: 1,889658 (+3.1% compared with [b])
With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].
Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new hash list, hash4, in udp table. It will be used to implement
4-tuple hash for connected udp sockets. This patch adds the hlist to
table, and implements helpers and the initialization. 4-tuple hash is
implemented in the following patch.
hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to
concurrent rehash, because rehash() can happen with lookup().
Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com>
Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com>
Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Preparing for udp 4-tuple hash (uhash4 for short).
To implement uhash4 without cache line missing when lookup, hslot2 is
used to record the number of hashed sockets in hslot4. Thus adding a new
struct udp_hslot_main with field hash4_cnt, which is used by hash2. The
new struct is used to avoid doubling the size of udp_hslot.
Before uhash4 lookup, firstly checking hash4_cnt to see if there are
hashed sks in hslot4. Because hslot2 is always used in lookup, there is
no cache line miss.
Related helpers are updated, and use the helpers as possible.
uhash4 is implemented in following patches.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next-11-15
1) Add support for RFC 9611 per cpu xfrm state handling.
2) Add inbound and outbound xfrm state caches to speed up
state lookups.
3) Convert xfrm to dscp_t. From Guillaume Nault.
4) Fix error handling in build_aevent.
From Everest K.C.
5) Replace strncpy with strscpy_pad in copy_to_user_auth.
From Daniel Yang.
6) Fix an uninitialized symbol during acquire state insertion.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On the linux-next, next-20241108 vanilla kernel, the coccinelle tool gave the
following error report:
./net/9p/trans_usbg.c:912:5-11: ERROR: allocation function on line 911 returns
NULL not ERR_PTR on failure
kzalloc() failure is fixed to handle the NULL return case on the memory exhaustion.
Fixes: a3be076dc174d ("net/9p/usbg: Add new usb gadget function transport")
Cc: Michael Grzeschik <m.grzeschik@pengutronix.de>
Cc: Eric Van Hensbergen <ericvh@kernel.org>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Christian Schoenebeck <linux_oss@crudebyte.com>
Cc: v9fs@lists.linux.dev
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Message-ID: <20241109211840.721226-2-mtodorovac69@gmail.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
|
|
In a similar fashion to ndo_fdb_add, which was covered in the previous
patch, add the bool *notified argument to ndo_fdb_del. Callees that send a
notification on their own set the flag to true.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/06b1acf4953ef0a5ed153ef1f32d7292044f2be6.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently when FDB entries are added to or deleted from a VXLAN netdevice,
the VXLAN driver emits one notification, including the VXLAN-specific
attributes. The core however always sends a notification as well, a generic
one. Thus two notifications are unnecessarily sent for these operations. A
similar situation comes up with bridge driver, which also emits
notifications on its own:
# ip link add name vx type vxlan id 1000 dstport 4789
# bridge monitor fdb &
[1] 1981693
# bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1
de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent
de:ad:be:ef:13:37 dev vx self permanent
In order to prevent this duplicity, add a paremeter to ndo_fdb_add,
bool *notified. The flag is primed to false, and if the callee sends a
notification on its own, it sets it to true, thus informing the core that
it should not generate another notification.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The netpoll subsystem maintains a pool of 32 pre-allocated SKBs per
instance, but these SKBs are not freed when the netpoll user is brought
down. This leads to memory waste as these buffers remain allocated but
unused.
Add skb_pool_flush() to properly clean up these SKBs when netconsole is
terminated, improving memory efficiency.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241114-skb_buffers_v2-v3-2-9be9f52a8b69@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current implementation of the netpoll system uses a global skb
pool, which can lead to inefficient memory usage and
waste when targets are disabled or no longer in use.
This can result in a significant amount of memory being unnecessarily
allocated and retained, potentially causing performance issues and
limiting the availability of resources for other system components.
Modify the netpoll system to assign a skb pool to each target instead of
using a global one.
This approach allows for more fine-grained control over memory
allocation and deallocation, ensuring that resources are only allocated
and retained as needed.
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20241114-skb_buffers_v2-v3-1-9be9f52a8b69@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit d35c99ff77ec ("netlink: do not enter direct reclaim from
netlink_dump()") the cap is 32KiB.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20241113-tcp-md5-diag-prep-v2-5-00a2a7feb1fa@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a new skb is allocated for transmitting an xsk descriptor, i.e., for
every non-multibuf descriptor or the first frag of a multibuf descriptor,
but the descriptor is later found to have invalid options set for the TX
metadata, the new skb is never freed. This can leak skbs until the send
buffer is full which makes sending more packets impossible.
Fix this by freeing the skb in the error path if we are currently dealing
with the first frag, i.e., an skb allocated in this iteration of
xsk_build_skb.
Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/edb9b00fb19e680dff5a3350cd7581c5927975a8.1731581697.git.fmaurer@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Update .gitignore in selftest to skip conntrack_reverse_clash,
from Li Zhijian.
2) Fix conntrack_dump_flush return values, from Guan Jing.
3) syzbot found that ipset's bitmap type does not properly checks for
bitmap's first ip, from Jeongjun Park.
* tag 'nf-24-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: add missing range check in bitmap_ip_uadt
selftests: netfilter: Fix missing return values in conntrack_dump_flush
selftests: netfilter: Add missing gitignore file
====================
Link: https://patch.msgid.link/20241114125723.82229-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hold rcu_read_lock during netdev_nl_napi_set_doit, which calls
napi_by_id and requires rcu_read_lock to be held.
Closes: https://lore.kernel.org/netdev/719083c2-e277-447b-b6ea-ca3acb293a03@redhat.com/
Fixes: 1287c1ae0fc2 ("netdev-genl: Support setting per-NAPI config values")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241114175600.18882-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hold rcu_read_lock in netdev_nl_napi_get_doit, which calls napi_by_id
and is required to be called under rcu_read_lock.
Cc: stable@vger.kernel.org
Fixes: 27f91aaf49b3 ("netdev-genl: Add netlink framework functions for napi")
Signed-off-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20241114175157.16604-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- btusb: add Foxconn 0xe0fc for Qualcomm WCN785x
- btmtk: Fix ISO interface handling
- Add quirk for ATS2851
- btusb: Add RTL8852BE device 0489:e123
- ISO: Do not emit LE PA/BIG Create Sync if previous is pending
- btusb: Add USB HW IDs for MT7920/MT7925
- btintel_pcie: Add handshake between driver and firmware
- btintel_pcie: Add recovery mechanism
- hci_conn: Use disable_delayed_work_sync
- SCO: Use kref to track lifetime of sco_conn
- ISO: Use kref to track lifetime of iso_conn
- btnxpuart: Add GPIO support to power save feature
- btusb: Add 0x0489:0xe0f3 and 0x13d3:0x3623 for Qualcomm WCN785x
* tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (51 commits)
Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNC
Bluetooth: fix use-after-free in device_for_each_child()
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
Bluetooth: hci_bcm: Use the devm_clk_get_optional() helper
Bluetooth: ISO: Send BIG Create Sync via hci_sync
Bluetooth: hci_conn: Remove alloc from critical section
Bluetooth: ISO: Use kref to track lifetime of iso_conn
Bluetooth: SCO: Use kref to track lifetime of sco_conn
Bluetooth: HCI: Add IPC(11) bus type
Bluetooth: btusb: Add 3 HWIDs for MT7925
Bluetooth: btusb: Add new VID/PID 0489/e124 for MT7925
Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slave
Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending
Bluetooth: ISO: Fix matching parent socket for BIS slave
Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending
Bluetooth: btrtl: Decrease HCI_OP_RESET timeout from 10 s to 2 s
Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name()
Bluetooth: btusb: Add new VID/PID 0489/e111 for MT7925
Bluetooth: btmtk: adjust the position to init iso data anchor
...
====================
Link: https://patch.msgid.link/20241114214731.1994446-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Extended netlink error reporting if nfnetlink attribute parser fails,
from Donald Hunter.
2) Incorrect request_module() module, from Simon Horman.
3) A series of patches to reduce memory consumption for set element
transactions.
Florian Westphal says:
"When doing a flush on a set or mass adding/removing elements from a
set, each element needs to allocate 96 bytes to hold the transactional
state.
In such cases, virtually all the information in struct nft_trans_elem
is the same.
Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem
can hold multiple set element pointers.
The number of elements that can be stored in one nft_trans_elem is limited
by the slab allocator, this series limits the compaction to at most 62
elements as it caps the reallocation to 2048 bytes of memory."
4) A series of patches to prepare the transition to dscp_t in .flowi_tos.
From Guillaume Nault.
5) Support for bitwise operations with two source registers,
from Jeremy Sowden.
* tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: bitwise: add support for doing AND, OR and XOR directly
netfilter: bitwise: rename some boolean operation functions
netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t.
netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t.
netfilter: rpfilter: Convert rpfilter_mt() to dscp_t.
netfilter: flow_offload: Convert nft_flow_route() to dscp_t.
netfilter: ipv4: Convert ip_route_me_harder() to dscp_t.
netfilter: nf_tables: allocate element update information dynamically
netfilter: nf_tables: switch trans_elem to real flex array
netfilter: nf_tables: prepare nft audit for set element compaction
netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure
netfilter: nf_tables: add nft_trans_commit_list_add_elem helper
netfilter: bpf: Pass string literal as format argument of request_module()
netfilter: nfnetlink: Report extack policy errors for batched ops
====================
Link: https://patch.msgid.link/20241115133207.8907-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hitherto, these operations have been converted in user space to
mask-and-xor operations on one register and two immediate values, and it
is the latter which have been evaluated by the kernel. We add support
for evaluating these operations directly in kernel space on one register
and either an immediate value or a second register.
Pablo made a few changes to the original patch:
- EINVAL if NFTA_BITWISE_SREG2 is used with fast version.
- Allow _AND,_OR,_XOR with _DATA != sizeof(u32)
- Dump _SREG2 or _DATA with _AND,_OR,_XOR
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In the next patch we add support for doing AND, OR and XOR operations
directly in the kernel, so rename some functions and an enum constant
related to mask-and-xor boolean operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use ip4h_dscp() instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use ip4h_dscp()instead of reading ip_hdr()->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Also, remove the comment about the net/ip.h include file, since it's
now required for the ip4h_dscp() helper too.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Use ip4h_dscp()instead of reading iph->tos directly.
ip4h_dscp() returns a dscp_t value which is temporarily converted back
to __u8 with inet_dscp_to_dsfield(). When converting ->flowi4_tos to
dscp_t in the future, we'll only have to remove that
inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
A recent commit jumped over the dst hash computation and
left the symbol uninitialized. Fix this by explicitly
computing the dst hash before it is used.
Fixes: 0045e3d80613 ("xfrm: Cache used outbound xfrm states at the policy.")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
In ethtool_check_max_channel(), the new RX count must not only cover the
max queue indices in RSS indirection tables and RXNFC destinations
separately, but must also, for RXNFC rules with FLOW_RSS, cover the sum
of the destination queue and the maximum index in the associated RSS
context's indirection table, since that is the highest queue that the
rule can actually deliver traffic to.
It could be argued that the max queue across all custom RSS contexts
(ethtool_get_max_rss_ctx_channel()) need no longer be considered, since
any context to which packets can actually be delivered will be targeted
by some RXNFC rule and its max will thus be allowed for by
ethtool_get_max_rxnfc_channel(). For simplicity we keep both checks, so
even RSS contexts unused by any RXNFC rule must fit the channel count.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/43257d375434bef388e36181492aa4c458b88336.1731499022.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ethtool ntuple filters with FLOW_RSS were originally defined as adding
the base queue ID (ring_cookie) to the value from the indirection table,
so that the same table could distribute over more than one set of queues
when used by different filters.
However, some drivers / hardware ignore the ring_cookie, and simply use
the indirection table entries as queue IDs directly. Thus, for drivers
which have not opted in by setting ethtool_ops.cap_rss_rxnfc_adds to
declare that they support the original (addition) semantics, reject in
ethtool_set_rxnfc any filter which combines FLOW_RSS and a nonzero ring.
(For a ring_cookie of zero, both behaviours are equivalent.)
Set the cap bit in sfc, as it is known to support this feature.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/cc3da0844083b0e301a33092a6299e4042b65221.1731499022.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2024-11-14
We've added 9 non-merge commits during the last 4 day(s) which contain
a total of 3 files changed, 226 insertions(+), 84 deletions(-).
The main changes are:
1) Fixes to bpf_msg_push/pop_data and test_sockmap. The changes has
dependency on the other changes in the bpf-next/net branch,
from Zijian Zhang.
2) Drop netns codes from mptcp test. Reuse the common helpers in
test_progs, from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf, sockmap: Fix sk_msg_reset_curr
bpf, sockmap: Several fixes to bpf_msg_pop_data
bpf, sockmap: Several fixes to bpf_msg_push_data
selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
selftests/bpf: Fix SENDPAGE data logic in test_sockmap
selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
selftests/bpf: Drop netns helpers in mptcp
====================
Link: https://patch.msgid.link/20241114202832.3187927-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().
Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/8338a12377c44f698a651d1ce357dd92bdf18120.1731064982.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
dscp_t value to __u8 with inet_dscp_to_dsfield().
Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
the inet_dscp_to_dsfield() call.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/35eacc8955003e434afb1365d404193cc98a9579.1731064982.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds the initial implementation of MGMT_OP_HCI_CMD_SYNC as
documented in mgmt-api (BlueZ tree):
Send HCI command and wait for event Command
===========================================
Command Code: 0x005B
Controller Index: <controller id>
Command Parameters: Opcode (2 Octets)
Event (1 Octet)
Timeout (1 Octet)
Parameter Length (2 Octets)
Parameter (variable)
Return Parameters: Response (1-variable Octets)
This command may be used to send a HCI command and wait for an
(optional) event.
The HCI command is specified by the Opcode, any arbitrary is supported
including vendor commands, but contrary to the like of
Raw/User channel it is run as an HCI command send by the kernel
since it uses its command synchronization thus it is possible to wait
for a specific event as a response.
Setting event to 0x00 will cause the command to wait for either
HCI Command Status or HCI Command Complete.
Timeout is specified in seconds, setting it to 0 will cause the
default timeout to be used.
Possible errors: Failed
Invalid Parameters
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Syzbot has reported the following KASAN splat:
BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0
Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980
CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x100/0x190
? device_for_each_child+0x18f/0x1a0
print_report+0x13a/0x4cb
? __virt_addr_valid+0x5e/0x590
? __phys_addr+0xc6/0x150
? device_for_each_child+0x18f/0x1a0
kasan_report+0xda/0x110
? device_for_each_child+0x18f/0x1a0
? __pfx_dev_memalloc_noio+0x10/0x10
device_for_each_child+0x18f/0x1a0
? __pfx_device_for_each_child+0x10/0x10
pm_runtime_set_memalloc_noio+0xf2/0x180
netdev_unregister_kobject+0x1ed/0x270
unregister_netdevice_many_notify+0x123c/0x1d80
? __mutex_trylock_common+0xde/0x250
? __pfx_unregister_netdevice_many_notify+0x10/0x10
? trace_contention_end+0xe6/0x140
? __mutex_lock+0x4e7/0x8f0
? __pfx_lock_acquire.part.0+0x10/0x10
? rcu_is_watching+0x12/0xc0
? unregister_netdev+0x12/0x30
unregister_netdevice_queue+0x30d/0x3f0
? __pfx_unregister_netdevice_queue+0x10/0x10
? __pfx_down_write+0x10/0x10
unregister_netdev+0x1c/0x30
bnep_session+0x1fb3/0x2ab0
? __pfx_bnep_session+0x10/0x10
? __pfx_lock_release+0x10/0x10
? __pfx_woken_wake_function+0x10/0x10
? __kthread_parkme+0x132/0x200
? __pfx_bnep_session+0x10/0x10
? kthread+0x13a/0x370
? __pfx_bnep_session+0x10/0x10
kthread+0x2b7/0x370
? __pfx_kthread+0x10/0x10
ret_from_fork+0x48/0x80
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 4974:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
__kmalloc_noprof+0x1d1/0x440
hci_alloc_dev_priv+0x1d/0x2820
__vhci_create_device+0xef/0x7d0
vhci_write+0x2c7/0x480
vfs_write+0x6a0/0xfc0
ksys_write+0x12f/0x260
do_syscall_64+0xc7/0x250
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 4979:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x4f/0x70
kfree+0x141/0x490
hci_release_dev+0x4d9/0x600
bt_host_release+0x6a/0xb0
device_release+0xa4/0x240
kobject_put+0x1ec/0x5a0
put_device+0x1f/0x30
vhci_release+0x81/0xf0
__fput+0x3f6/0xb30
task_work_run+0x151/0x250
do_exit+0xa79/0x2c30
do_group_exit+0xd5/0x2a0
get_signal+0x1fcd/0x2210
arch_do_signal_or_restart+0x93/0x780
syscall_exit_to_user_mode+0x140/0x290
do_syscall_64+0xd4/0x250
entry_SYSCALL_64_after_hwframe+0x77/0x7f
In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when
an underlying (kobject) reference counter is greater than 1. This
means that reparenting (happened when the device is actually freed)
is delayed and, during that delay, parent controller device (hciX)
may be deleted. Since the latter may create a dangling pointer to
freed parent, avoid that scenario by reparenting to NULL explicitly.
Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
Tested-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f
Fixes: a85fb91e3d72 ("Bluetooth: Fix double free in hci_conn_cleanup")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|