|
In practice, userspace hasn't been able to set this for many
years, and mac80211 has already rejected it (which is now no
longer needed), so reject SMPS mode (other than "OFF" to be
a bit more compatible) in AP mode. Also remove the parameter
from the AP settings struct.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.fe1fc46484cf.I8676fb52b818a4bedeb9c25b901e1396277ffc0b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add support to indicate to the driver that an interface is about to be
added so that the driver can prepare its resources early if it needs
to.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241007144851.e0e8563e1c30.Ifccc96a46a347eb15752caefc9f4eff31f75ed47@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
conn->sk may have been unlinked/freed while waiting for iso_conn_lock,
so this checks that conn->sk is still valid by checking whether it is
part of iso_sk_list.
Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
conn->sk may have been unlinked/freed while waiting for sco_conn_lock,
so this checks that conn->sk is still valid by checking whether it is
part of sco_sk_list.
Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465
Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
This makes use of disable_work_* in hci_unregister_dev: since the hci_dev
is about to be freed, new submissions are not desirable.
Fixes: 0d151a103775 ("Bluetooth: hci_core: cancel all works upon hci_unregister_dev()")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
npinfo is not used in any of the ndo_netpoll_setup() methods.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241018052108.2610827-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix possible use-after-free in 'taprio_dump()' by adding RCU
read-side critical section there. Never seen on x86 but
found on a KASAN-enabled arm64 system when investigating
https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa:
[T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0
[T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862
[T15862]
[T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2
[T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024
[T15862] Call trace:
[T15862] dump_backtrace+0x20c/0x220
[T15862] show_stack+0x2c/0x40
[T15862] dump_stack_lvl+0xf8/0x174
[T15862] print_report+0x170/0x4d8
[T15862] kasan_report+0xb8/0x1d4
[T15862] __asan_report_load4_noabort+0x20/0x2c
[T15862] taprio_dump+0xa0c/0xbb0
[T15862] tc_fill_qdisc+0x540/0x1020
[T15862] qdisc_notify.isra.0+0x330/0x3a0
[T15862] tc_modify_qdisc+0x7b8/0x1838
[T15862] rtnetlink_rcv_msg+0x3c8/0xc20
[T15862] netlink_rcv_skb+0x1f8/0x3d4
[T15862] rtnetlink_rcv+0x28/0x40
[T15862] netlink_unicast+0x51c/0x790
[T15862] netlink_sendmsg+0x79c/0xc20
[T15862] __sock_sendmsg+0xe0/0x1a0
[T15862] ____sys_sendmsg+0x6c0/0x840
[T15862] ___sys_sendmsg+0x1ac/0x1f0
[T15862] __sys_sendmsg+0x110/0x1d0
[T15862] __arm64_sys_sendmsg+0x74/0xb0
[T15862] invoke_syscall+0x88/0x2e0
[T15862] el0_svc_common.constprop.0+0xe4/0x2a0
[T15862] do_el0_svc+0x44/0x60
[T15862] el0_svc+0x50/0x184
[T15862] el0t_64_sync_handler+0x120/0x12c
[T15862] el0t_64_sync+0x190/0x194
[T15862]
[T15862] Allocated by task 15857:
[T15862] kasan_save_stack+0x3c/0x70
[T15862] kasan_save_track+0x20/0x3c
[T15862] kasan_save_alloc_info+0x40/0x60
[T15862] __kasan_kmalloc+0xd4/0xe0
[T15862] __kmalloc_cache_noprof+0x194/0x334
[T15862] taprio_change+0x45c/0x2fe0
[T15862] tc_modify_qdisc+0x6a8/0x1838
[T15862] rtnetlink_rcv_msg+0x3c8/0xc20
[T15862] netlink_rcv_skb+0x1f8/0x3d4
[T15862] rtnetlink_rcv+0x28/0x40
[T15862] netlink_unicast+0x51c/0x790
[T15862] netlink_sendmsg+0x79c/0xc20
[T15862] __sock_sendmsg+0xe0/0x1a0
[T15862] ____sys_sendmsg+0x6c0/0x840
[T15862] ___sys_sendmsg+0x1ac/0x1f0
[T15862] __sys_sendmsg+0x110/0x1d0
[T15862] __arm64_sys_sendmsg+0x74/0xb0
[T15862] invoke_syscall+0x88/0x2e0
[T15862] el0_svc_common.constprop.0+0xe4/0x2a0
[T15862] do_el0_svc+0x44/0x60
[T15862] el0_svc+0x50/0x184
[T15862] el0t_64_sync_handler+0x120/0x12c
[T15862] el0t_64_sync+0x190/0x194
[T15862]
[T15862] Freed by task 6192:
[T15862] kasan_save_stack+0x3c/0x70
[T15862] kasan_save_track+0x20/0x3c
[T15862] kasan_save_free_info+0x4c/0x80
[T15862] poison_slab_object+0x110/0x160
[T15862] __kasan_slab_free+0x3c/0x74
[T15862] kfree+0x134/0x3c0
[T15862] taprio_free_sched_cb+0x18c/0x220
[T15862] rcu_core+0x920/0x1b7c
[T15862] rcu_core_si+0x10/0x1c
[T15862] handle_softirqs+0x2e8/0xd64
[T15862] __do_softirq+0x14/0x20
Fixes: 18cdd2f0998a ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20241018051339.418890-2-dmantipov@yandex.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In 'taprio_change()', 'admin' pointer may become dangling due to sched
switch / removal caused by 'advance_sched()', and critical section
protected by 'q->current_entry_lock' is too small to prevent from such
a scenario (which causes use-after-free detected by KASAN). Fix this
by preferring 'rcu_replace_pointer()' over 'rcu_assign_pointer()' to update
'admin' immediately before an attempt to schedule freeing.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Reported-by: syzbot+b65e0af58423fc8a73aa@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20241018051339.418890-1-dmantipov@yandex.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
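The replace-then-free pattern described above can be sketched in userspace C. This is an analogy only: C11 atomics stand in for the kernel's RCU primitives, the names sched_sketch, taprio_sketch and admin_replace() are hypothetical, and the caller is assumed to hold the update-side lock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a schedule object. */
struct sched_sketch {
	int id;
};

/* Hypothetical stand-in for the qdisc private data. */
struct taprio_sketch {
	_Atomic(struct sched_sketch *) admin;
};

/*
 * Publish new_admin and hand back the pointer that was installed at
 * that moment. Assumption: the caller holds the update-side lock, so
 * no other writer can change 'admin' between the load and the store.
 * The returned pointer is the one to hand to deferred freeing; a copy
 * read earlier, before taking the lock, may already be stale.
 */
static struct sched_sketch *admin_replace(struct taprio_sketch *q,
					  struct sched_sketch *new_admin)
{
	struct sched_sketch *old;

	old = atomic_load_explicit(&q->admin, memory_order_relaxed);
	atomic_store_explicit(&q->admin, new_admin, memory_order_release);
	return old;
}
```

The point of the sketch is that the pointer scheduled for freeing is read in the same step that installs the new one, rather than reused from an earlier read.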
|
|
Recently, commit 4a0ec2aa0704 ("ipv6: switch inet6_addr_hash()
to less predictable hash") and commit 4daf4dc275f1 ("ipv6: switch
inet6_acaddr_hash() to less predictable hash") hardened IPv6
address hash functions.
inet_addr_hash() is also highly predictable, and a malicious user
could abuse a specific bucket.
Let's follow the change on IPv4 by using jhash_1word().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241018014100.93776-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
kernel test robot reported a section mismatch in ip6_mr_cleanup().
WARNING: modpost: vmlinux: section mismatch in reference: ip6_mr_cleanup+0x0 (section: .text) -> 0xffffffff (section: .init.rodata)
WARNING: modpost: vmlinux: section mismatch in reference: ip6_mr_cleanup+0x14 (section: .text) -> ip6mr_rtnl_msg_handlers (section: .init.rodata)
ip6_mr_cleanup() uses ip6mr_rtnl_msg_handlers[] that has
__initconst_or_module qualifier.
ip6_mr_cleanup() is only called from inet6_init() but does
not have __init qualifier.
Let's add __init to ip6_mr_cleanup().
Fixes: 3ac84e31b33e ("ipmr: Use rtnl_register_many().")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410180139.B3HeemsC-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20241017174732.39487-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This isn't used outside act_api.c, but is called by tcf_dump_walker()
prior to its definition. So move it upwards and make it static.
Simultaneously, reorder the variable declarations so that they follow
the networking "reverse Christmas tree" coding style.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241017161934.3599046-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
created by classifiers
tcf_action_init() has logic for checking mismatches between action and
filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run
on the transition between the new tc_act_bind(flags) returning true (aka
now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning
false (aka action was not bound to classifier before). Otherwise, the
check is skipped.
For the case where an action is not standalone, but rather it was
created by a classifier and is bound to it, tcf_action_init() skips the
check entirely, and this means it allows mismatched flags to occur.
Taking the matchall classifier code path as an example (with mirred as
an action), the reason is the following:
1 | mall_change()
2 | -> mall_replace_hw_filter()
3 | -> tcf_exts_validate_ex()
4 | -> flags |= TCA_ACT_FLAGS_BIND;
5 | -> tcf_action_init()
6 | -> tcf_action_init_1()
7 | -> a_o->init()
8 | -> tcf_mirred_init()
9 | -> tcf_idr_create_from_flags()
10 | -> tcf_idr_create()
11 | -> p->tcfa_flags = flags;
12 | -> tc_act_bind(flags))
13 | -> tc_act_bind(act->tcfa_flags)
When invoked from tcf_exts_validate_ex() like matchall does (but other
classifiers validate their extensions as well), tcf_action_init() runs
in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by
line 4). So line 12 is always true, and line 13 is always true as well.
No transition ever takes place, and the check is skipped.
The code was added in this form in commit c86e0209dc77 ("flow_offload:
validate flags of filter and actions"), but I'm attributing the blame
even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and
TCA_ACT_FLAGS_SKIP_SW were added to the UAPI.
Following the development process of this change, the check did not
always exist in this form. A change took place between v3 [1] and v4 [2],
AFAIU due to review feedback that it doesn't make sense for action flags
to be different than classifier flags. I think I agree with that
feedback, but it was translated into code that omits enforcing this for
"classic" actions created at the same time with the filters themselves.
There are 3 more important cases to discuss. First there is this command:
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1
which should be allowed, because prior to the concept of dedicated
action flags, it used to work and it used to mean the action inherited
the skip_sw/skip_hw flags from the classifier. It's not a mismatch.
Then we have this command:
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1 skip_hw
where there is a mismatch and it should be rejected.
Finally, we have:
$ tc qdisc add dev eth0 clsact
$ tc filter add dev eth0 ingress matchall skip_sw \
action mirred ingress mirror dev eth1 skip_sw
where the offload flags coincide, and this should be treated the same as
the first command based on inheritance, and accepted.
[1]: https://lore.kernel.org/netdev/20211028110646.13791-9-simon.horman@corigine.com/
[2]: https://lore.kernel.org/netdev/20211118130805.23897-10-simon.horman@corigine.com/
Fixes: 7adc57651211 ("flow_offload: add skip_hw and skip_sw to control if offload the action")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20241017161049.3570037-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
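The three tc commands discussed above reduce to a small decision rule, sketched here in plain C. The flag values and the function name offload_flags_compatible() are hypothetical and do not correspond to the UAPI constants; the sketch only models the accept/reject outcome the message argues for.

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen for illustration only. */
#define SKETCH_SKIP_HW   0x1u
#define SKETCH_SKIP_SW   0x2u
#define SKETCH_SKIP_MASK (SKETCH_SKIP_HW | SKETCH_SKIP_SW)

/*
 * An action created by a classifier is compatible when it either
 * carries no offload flags of its own (it inherits the filter's), or
 * carries exactly the same offload flags as the filter.
 */
static bool offload_flags_compatible(unsigned int filter_flags,
				     unsigned int action_flags)
{
	unsigned int act = action_flags & SKETCH_SKIP_MASK;

	if (!act)
		return true;	/* inherited from the classifier */

	return act == (filter_flags & SKETCH_SKIP_MASK);
}
```

With a skip_sw filter, an action without flags and an action with skip_sw are both accepted, while an action with skip_hw is rejected.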
|
|
This fixes the output of rps_default_mask and flow_limit_cpu_bitmap when
the CPU count is > 448, as it was truncated.
The underlying values are actually stored correctly when writing to
these sysctl but displaying them uses a fixed length temporary buffer in
dump_cpumask. This buffer can be too small if the CPU count is > 448.
Fix this by dynamically allocating the buffer in dump_cpumask, using a
guesstimate of what we need.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
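One way to size such a buffer is sketched below. It assumes the mask is printed as hex digits in comma-separated groups of 32 bits, followed by a newline and a terminating null; the helper name cpumask_text_len() is hypothetical and this is not necessarily the estimate the patch uses.

```c
#include <stddef.h>

/*
 * Bytes needed to print 'nbits' bits as hex, with a comma between
 * each group of 32 bits, plus one byte for '\n' and one for '\0'.
 */
static size_t cpumask_text_len(unsigned int nbits)
{
	size_t digits = ((size_t)nbits + 3) / 4;	/* one hex digit per 4 bits */
	size_t groups = ((size_t)nbits + 31) / 32;	/* 32-bit groups */
	size_t commas = groups ? groups - 1 : 0;

	return digits + commas + 2;
}
```

Under these assumptions 448 bits need 127 bytes and 480 bits need 136, so a small fixed buffer stops being sufficient somewhere just above 448 CPUs.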
|
|
When computing the length we'll be able to use out of the buffers, one
char is removed from the temporary one to make room for a newline. It
should be removed from the output buffer length too, but in reality this
is not needed as the later call to scnprintf makes sure a null char is
written at the end of the buffer which we override with the newline.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Before adding a new line at the end of the temporary buffer in
dump_cpumask, a length check is performed to ensure there is space for
it.
    len = min(sizeof(kbuf) - 1, *lenp);
    len = scnprintf(kbuf, len, ...);
    if (len < *lenp)
        kbuf[len++] = '\n';
Note that the check is currently logically wrong: the written length is
compared against the output buffer, not the temporary one. However, this
has no consequence as the condition is always true, even if fixed:
scnprintf includes a null char at the end of the buffer but the returned
length does not include it, so there is always space for overriding it
with a newline.
Remove the condition.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
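The argument above can be checked with a userspace sketch. scnprintf_sketch() is a hypothetical imitation of the kernel helper built on vsnprintf(): it returns the number of characters actually stored, excluding the terminating null, so the returned length is always at most size - 1.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace imitation: returns characters stored, never counting '\0'. */
static int scnprintf_sketch(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/*
 * Format 'text' into kbuf (size must be at least 1) and append a
 * newline unconditionally. The newline lands on the byte that held
 * the null, which is always inside the buffer. Returns the number of
 * bytes to copy out; the result is not null-terminated.
 */
static size_t format_with_newline(char *kbuf, size_t size, const char *text)
{
	size_t len = (size_t)scnprintf_sketch(kbuf, size, "%s", text);

	kbuf[len++] = '\n';
	return len;
}
```

Even when the text is truncated, len is at most size - 1 before the newline is written, so no separate length check is needed.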
|
|
sock_{,re}set_flag() are contained in sock_valbool_flag(),
it would be cleaner to just use sock_valbool_flag().
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Link: https://patch.msgid.link/20241017133435.2552-1-yajun.deng@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
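The relationship between the helpers can be shown with a small userspace sketch. The struct and function names below are hypothetical stand-ins that operate on a plain bitmask, not the kernel's struct sock.

```c
#include <stdbool.h>

/* Hypothetical stand-in holding only a flag word. */
struct sock_sketch {
	unsigned long flags;
};

static void set_flag_sketch(struct sock_sketch *sk, unsigned int bit)
{
	sk->flags |= 1UL << bit;
}

static void reset_flag_sketch(struct sock_sketch *sk, unsigned int bit)
{
	sk->flags &= ~(1UL << bit);
}

/* Set or clear 'bit' depending on 'valbool', replacing an open-coded if/else. */
static void valbool_flag_sketch(struct sock_sketch *sk, unsigned int bit,
				bool valbool)
{
	if (valbool)
		set_flag_sketch(sk, bit);
	else
		reset_flag_sketch(sk, bit);
}
```

A caller that previously branched on a boolean to pick between the set and reset helpers can pass the boolean straight through.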
|
|
We can now undo parts of 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT}
args in case of error") as discussed in [0].
Given the BPF helpers now have MEM_WRITE tag, the MEM_UNINIT can be cleared.
The mtu_len is an input as well as output argument, meaning, the BPF program
has to set it to something. It cannot be uninitialized. Therefore, allowing
uninitialized memory and zeroing it on error would be odd. It was done as
an interim step in 4b3786a6c539 as the desired behavior could not have been
expressed before the introduction of MEM_WRITE tag.
Fixes: 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/a86eb76d-f52f-dee4-e5d2-87e45de3e16f@iogearbox.net [0]
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241021152809.33343-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a MEM_WRITE attribute for BPF helper functions which can be used in
bpf_func_proto to annotate an argument type in order to let the verifier
know that the helper writes into the memory passed as an argument. In
the past MEM_UNINIT has been (ab)used for this function, but the latter
merely tells the verifier that the passed memory can be uninitialized.
There have been bugs with overloading the latter but aside from that
there are also cases where the passed memory is read + written which
currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former
ARG_PTR_TO_{LONG,INT} args in case of error").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Change ynl-gen-c.py to use NLA_BE16 and NLA_BE32 types to represent
big-endian u16 and u32 ynl types.
Doing this enables those attributes to have range checks applied, as
the validator will then convert to host endianness prior to validation.
The autogenerated kernel/uapi code has been regenerated by running:
./tools/net/ynl/ynl-regen.sh -f
This changes the policy types of the following attributes:
FOU_ATTR_PORT (NLA_U16 -> NLA_BE16)
FOU_ATTR_PEER_PORT (NLA_U16 -> NLA_BE16)
These two are used with nla_get_be16/nla_put_be16().
MPTCP_PM_ADDR_ATTR_ADDR4 (NLA_U32 -> NLA_BE32)
This one is used with nla_get_in_addr/nla_put_in_addr(),
which uses nla_get_be32/nla_put_be32().
IOWs the generated changes are AFAICT aligned with their implementations.
The generated userspace code remains identical, and this has been verified
by comparing the output generated by the following command:
make -C tools/net/ynl/generated
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241017094704.3222173-1-ast@fiberby.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
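Why the type matters for range checks can be illustrated in plain C. The helpers be16_to_host() and be16_in_range() are hypothetical; they model a validator that converts a big-endian attribute payload to host order before comparing it with the policy bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a 16-bit big-endian value from its two wire bytes. */
static uint16_t be16_to_host(const unsigned char wire[2])
{
	return (uint16_t)((wire[0] << 8) | wire[1]);
}

/*
 * Range-check the decoded value. Comparing the raw wire
 * representation against host-order bounds would give wrong results
 * on little-endian machines.
 */
static bool be16_in_range(const unsigned char wire[2],
			  uint16_t min, uint16_t max)
{
	uint16_t v = be16_to_host(wire);

	return v >= min && v <= max;
}
```

For example, port 80 arrives on the wire as the bytes 0x00 0x50 and passes a 1..1023 check only after conversion on a little-endian host.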
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
This patchset contains Netfilter fixes for net:
1) syzkaller managed to trigger UaF due to missing reference on netns in
bpf infrastructure, from Florian Westphal.
2) Fix incorrect conversion from NFPROTO_UNSPEC to NFPROTO_{IPV4,IPV6}
in the following xtables targets: MARK and NFLOG. Moreover, add
missing
I have my half share in this mistake, I did not take the necessary time
to review this: For several years I have been struggling to keep working
on Netfilter, juggling a myriad of side consulting projects to stop
burning my own savings.
I have extended the iptables-tests.py test infrastructure to improve the
coverage of ip6tables and detect similar problems in the future.
This is a v2 including an extended PR with one more fix.
netfilter pull request 24-10-21
* tag 'nf-24-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: xtables: fix typo causing some targets not to load on IPv6
netfilter: bpf: must hold reference on net namespace
====================
Link: https://patch.msgid.link/20241021094536.81487-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK
even when its module is being unloaded.
Let's use SRCU to protect ops.
rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns
SRCU-protected ops pointer. The caller must call rtnl_af_put()
to release the pointer after the use.
Also, rtnl_af_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_SETLINK requests to
complete.
Note that rtnl_af_ops needs to be protected by its dedicated lock
when RTNL is removed.
Note also that BUG_ON() in do_setlink() is changed to the normal
error handling as a different af_ops might be found after
validate_linkmsg().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The next patch will add init_srcu_struct() in rtnl_af_register(),
then we need to handle its error.
Let's add the error handling in advance to make the following
patch cleaner.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_setlink().
RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink()
to move a dev to a new netns, but the netns needs to be fetched before
holding rtnl_net_lock().
Let's move it to rtnl_setlink() and pass the netns to do_setlink().
Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink())
can pass the prefetched netns to do_setlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_setlink().
Let's unify the error path to make it easy to place rtnl_net_lock().
While at it, keep the variables in reverse xmas order.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_dellink().
Let's unify the error path to make it easy to place rtnl_net_lock().
While at it, keep the variables in reverse xmas order.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and
is fetched in rtnl_newlink_create().
This must be done before holding rtnl_net_lock().
Let's move IFLA_LINK_NETNSID processing to rtnl_newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As a prerequisite of per-netns RTNL, we must fetch netns before
looking up dev or moving it to another netns.
rtnl_link_get_net_capable() is called in rtnl_newlink_create() and
do_setlink(), but both of them need to be moved to the RTNL-independent
region, which will be rtnl_newlink().
Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the
netns down to where needed.
Note that the latter two have not passed the netns to do_setlink() yet
but will do so after the remaining rtnl_link_get_net_capable() is moved
to rtnl_setlink() later.
While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to
align with rtnl_{del,set}link().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to
guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK
even when its module is being unloaded.
Let's use SRCU to protect ops.
rtnl_link_ops_get() now iterates link_ops under RCU and returns
SRCU-protected ops pointer. The caller must call rtnl_link_ops_put()
to release the pointer after the use.
Also, __rtnl_link_unregister() unlinks the ops first and calls
synchronize_srcu() to wait for inflight RTM_NEWLINK requests to
complete.
Note that link_ops needs to be protected by its dedicated lock
when RTNL is removed.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ops->validate() does not require RTNL.
Let's move it to rtnl_newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, if neither dev nor rtnl_link_ops is found in __rtnl_newlink(),
we release RTNL and redo the whole process after request_module(), which
complicates the logic.
The ops will be RTNL-independent later.
Let's move the ops lookup to rtnl_newlink() and do the retry earlier.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will push RTNL down to rtnl_newlink().
Let's move RTNL-independent validation to rtnl_newlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
__rtnl_newlink() got too long to maintain.
For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even
when IFLA_INFO_SLAVE_DATA is not specified.
Let's factorise the single dev do_setlink() path to a separate function.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There are 3 paths that finally call do_setlink(), and validate_linkmsg()
is called in each path.
1. RTM_NEWLINK
1-1. dev is found in __rtnl_newlink()
1-2. dev isn't found, but IFLA_GROUP is specified in
rtnl_group_changelink()
2. RTM_SETLINK
The next patch factorises 1-1 to a separate function.
As a preparation, let's move validate_linkmsg() calls to do_setlink().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We will move linkinfo to rtnl_newlink() and pass it down to other
functions.
Let's pack it into rtnl_newlink_tbs.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This was attempted by using the dev_name in the slab cache name, but as
Omar Sandoval pointed out, that can be an arbitrary string, eg something
like "/dev/root". Which in turn trips verify_dirent_name(), which fails
if a filename contains a slash.
So just make it use a sequence counter, and make it an atomic_t to avoid
any possible races or locking issues.
Reported-and-tested-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/all/ZxafcO8KWMlXaeWE@telecaster.dhcp.thefacebook.com/
Fixes: 79efebae4afc ("9p: Avoid creating multiple slab caches with the same name")
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
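The naming scheme described above can be sketched in userspace C with C11 atomics. The name prefix and the helper make_cache_name() are assumptions for illustration, not taken from the patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Global sequence counter; atomic so concurrent callers get distinct ids. */
static atomic_uint cache_seqno;

/*
 * Build a unique cache name from the counter instead of from a
 * user-controlled device name, so the name can never contain '/'.
 * Returns what snprintf() returns.
 */
static int make_cache_name(char *buf, size_t size)
{
	unsigned int id = atomic_fetch_add(&cache_seqno, 1) + 1;

	return snprintf(buf, size, "9p-fcall-cache-%u", id);
}
```

Each call yields a different name, and because the suffix is purely numeric the name stays valid as a file name.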
|
|
Some workloads hit the infamous dev_watchdog() message:
"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"
It seems possible to hit this even for perfectly normal
BQL enabled drivers:
1) Assume a TX queue was idle for more than dev->watchdog_timeo
(5 seconds unless changed by the driver)
2) Assume a big packet is sent, exceeding current BQL limit.
3) Driver ndo_start_xmit() puts the packet in TX ring,
and netdev_tx_sent_queue() is called.
4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
before txq->trans_start has been written.
5) txq->trans_start is written later, from netdev_start_xmit()
    if (rc == NETDEV_TX_OK)
        txq_trans_update(txq)
dev_watchdog() running on another cpu could read the old
txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet.
To solve the issue, write txq->trans_start right before one XOFF bit
is set:
- __QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()
From dev_watchdog(), we have to read txq->state before txq->trans_start.
Add memory barriers to enforce correct ordering.
In the future, we could avoid writing over txq->trans_start for normal
operations, and rename this field to txq->xoff_start_time.
Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
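The ordering requirement described above can be modelled in userspace with C11 atomics. This is a simplified analogy, not the kernel code: txq_sketch, txq_stop() and txq_timed_out() are hypothetical names, release/acquire ordering stands in for the kernel's memory barriers, and time is a plain counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_XOFF 0x1u

/* Hypothetical stand-in for a TX queue. */
struct txq_sketch {
	atomic_ulong trans_start;	/* time of the last XOFF transition */
	atomic_uint state;		/* SKETCH_XOFF when the queue is stopped */
};

/* Writer side: record the timestamp first, then publish the XOFF bit. */
static void txq_stop(struct txq_sketch *q, unsigned long now)
{
	atomic_store_explicit(&q->trans_start, now, memory_order_relaxed);
	atomic_fetch_or_explicit(&q->state, SKETCH_XOFF, memory_order_release);
}

/*
 * Watchdog side: read the state first, then the timestamp. If XOFF is
 * observed, the acquire load guarantees the timestamp written before
 * it is observed too, so a stale trans_start cannot be paired with a
 * fresh XOFF bit.
 */
static bool txq_timed_out(struct txq_sketch *q, unsigned long now,
			  unsigned long timeo)
{
	unsigned int state;
	unsigned long start;

	state = atomic_load_explicit(&q->state, memory_order_acquire);
	if (!(state & SKETCH_XOFF))
		return false;

	start = atomic_load_explicit(&q->trans_start, memory_order_relaxed);
	return now - start > timeo;
}
```

A queue that sat idle for a long time and is then stopped is not reported immediately, because the timestamp is refreshed before the XOFF bit becomes visible.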
|
|
- There is no NFPROTO_IPV6 family for mark and NFLOG.
- TRACE is also missing module autoload with NFPROTO_IPV6.
This results in ip6tables failing to restore a ruleset. This issue has been
reported by several users providing incomplete patches.
Very similar to Ilya Katsnelson's patch including a missing chunk in the
TRACE extension.
Fixes: 0bfcb7b71e73 ("netfilter: xtables: avoid NFPROTO_UNSPEC where needed")
Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Reported-by: Ilya Katsnelson <me@0upti.me>
Reported-by: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.12-rc4).
Conflicts:
107a034d5c1e ("net/mlx5: qos: Store rate groups in a qos domain")
1da9cfd6c41c ("net/mlx5: Unregister notifier on eswitch init failure")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Pull bluetooth fixes from Luiz Augusto Von Dentz:
- ISO: Fix multiple init when debugfs is disabled
- Call iso_exit() on module unload
- Remove debugfs directory on module init failure
- btusb: Fix not being able to reconnect after suspend
- btusb: Fix regression with fake CSR controllers 0a12:0001
- bnep: fix wild-memory-access in proto_unregister
Note: normally the bluetooth fixes go through the networking tree, but
this missed the weekly merge, and two of the commits fix regressions
that have caused a fair amount of noise and have now hit stable too:
https://lore.kernel.org/all/4e1977ca-6166-4891-965e-34a6f319035f@leemhuis.info/
So I'm pulling it directly just to expedite things and not miss yet
another -rc release. This is not meant to become a new pattern.
* tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
Bluetooth: bnep: fix wild-memory-access in proto_unregister
Bluetooth: btusb: Fix not being able to reconnect after suspend
Bluetooth: Remove debugfs directory on module init failure
Bluetooth: Call iso_exit() on module unload
Bluetooth: ISO: Fix multiple init when debugfs is disabled
|
|
Pull 9p fixes from Dominique Martinet:
"Mashed-up update that I sat on too long:
- fix for multiple slabs created with the same name
- enable multipage folios
- theoretical fix to also look for opened fids by inode if none was
found by dentry"
[ Enabling multi-page folios should have been done during the merge
window, but it's a one-liner, and the actual meat of the enablement
is in netfs and already in use for other filesystems... - Linus ]
* tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
9p: Avoid creating multiple slab caches with the same name
9p: Enable multipage folios
9p: v9fs_fid_find: also lookup by inode if not found dentry
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to not affect subreg_def marks in its range
propagation (Eduard Zingerman)
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx (Dimitar Kanaliev)
- Fix the BPF verifier's delta propagation between linked registers
under 32-bit addition (Daniel Borkmann)
- Fix a NULL pointer dereference in BPF devmap due to missing rxq
information (Florian Kauer)
- Fix a memory leak in bpf_core_apply (Jiri Olsa)
- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
arrays of nested structs (Hou Tao)
- Fix build ID fetching where memory areas backing the file were
created with memfd_secret (Andrii Nakryiko)
- Fix BPF task iterator tid filtering which was incorrectly using pid
instead of tid (Jordan Rome)
- Several fixes for BPF sockmap and BPF sockhash redirection in
combination with vsocks (Michal Luczaj)
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
of an infinite BPF tailcall (Pu Lehui)
- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
be resolved (Thomas Weißschuh)
- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
was returned (Toke Høiland-Jørgensen)
- Fix a BPF selftest compilation error in cgroup-related tests with
musl libc (Tony Ambardar)
- Several fixes to BPF link info dumps to fill missing fields (Tyrone
Wu)
- Add BPF selftests for kfuncs from multiple modules, checking that the
correct kfuncs are called (Simon Sundberg)
- Ensure that internal and user-facing bpf_redirect flags don't overlap
(Toke Høiland-Jørgensen)
- Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
Riel)
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
under RT (Wander Lairson Costa)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
lib/buildid: Handle memfd_secret() files in build_id_parse()
selftests/bpf: Add test case for delta propagation
bpf: Fix print_reg_state's constant scalar dump
bpf: Fix incorrect delta propagation between linked registers
bpf: Properly test iter/task tid filtering
bpf: Fix iter/task tid filtering
riscv, bpf: Make BPF_CMPXCHG fully ordered
bpf, vsock: Drop static vsock_bpf_prot initialization
vsock: Update msg_count on read_skb()
vsock: Update rx_bytes on read_skb()
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
selftests/bpf: Add asserts for netfilter link info
bpf: Fix link info netfilter flags to populate defrag flag
selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
bpf: Fix truncation bug in coerce_reg_to_size_sx()
selftests/bpf: Assert link info uprobe_multi count & path_size if unset
bpf: Fix unpopulated path_size when uprobe_multi fields unset
selftests/bpf: Fix cross-compiling urandom_read
selftests/bpf: Add test for kfunc module order
...
|
|
There is no longer any reason to implement the mac_select_pcs()
callback in DSA. Returning ERR_PTR(-EOPNOTSUPP) is functionally
equivalent to not providing the function.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - new code bugs:
- eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
Previous releases - regressions:
- ipv4: give an IPv4 dev to blackhole_netdev
- udp: compute L4 checksum as usual when not segmenting the skb
- tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
- eth: mlx5e: don't call cleanup on profile rollback failure
- eth: microchip: vcap api: fix memory leaks in
vcap_api_encode_rule_test()
- eth: enetc: disable Tx BD rings after they are empty
- eth: macb: avoid 20s boot delay by skipping MDIO bus registration
for fixed-link PHY
Previous releases - always broken:
- posix-clock: fix missing timespec64 check in pc_clock_settime()
- genetlink: hold RCU in genlmsg_mcast()
- mptcp: prevent MPC handshake on port-based signal endpoints
- eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
- eth: stmmac: dwmac-tegra: fix link bring-up sequence
- eth: bcmasp: fix potential memory leak in bcmasp_xmit()
Misc:
- add Andrew Lunn as a co-maintainer of all networking drivers"
* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5e: Don't call cleanup on profile rollback failure
net/mlx5: Unregister notifier on eswitch init failure
net/mlx5: Fix command bitmask initialization
net/mlx5: Check for invalid vector index on EQ creation
net/mlx5: HWS, use lock classes for bwc locks
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
net/mlx5: HWS, fixed double free in error flow of definer layout
net/mlx5: HWS, removed wrong access to a number of rules variable
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
net: phy: mdio-bcm-unimac: Add BCM6846 support
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
udp: Compute L4 checksum as usual when not segmenting the skb
genetlink: hold RCU in genlmsg_mcast()
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
...
|
|
BUG: KASAN: slab-use-after-free in __nf_unregister_net_hook+0x640/0x6b0
Read of size 8 at addr ffff8880106fe400 by task repro/72
bpf_nf_link_release+0xda/0x1e0
bpf_link_free+0x139/0x2d0
bpf_link_release+0x68/0x80
__fput+0x414/0xb60
Eric says:
It seems that bpf was able to defer the __nf_unregister_net_hook()
after exit()/close() time.
Perhaps a netns reference is missing, because the netns has been
dismantled/freed already.
bpf_nf_link_attach() does:
link->net = net;
But I do not see a reference being taken on net.
Add such a reference and release it after hook unreg.
Note that I was unable to get syzbot reproducer to work, so I
do not know if this resolves this splat.
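The intended lifetime rule can be modelled in userspace as a rough sketch.
Everything named model_* below is hypothetical and only stands in for the
kernel objects; this is not the actual patch. It shows a link pinning its
netns at attach time and dropping that reference only after the hook has
been unregistered:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: only the lifetime bits matter. */
struct model_net {
	int refcnt;           /* number of holders keeping the netns alive */
	int hooks_registered; /* hooks currently registered in this netns */
};

/* Hypothetical stand-in for the netfilter bpf_link. */
struct model_nf_link {
	struct model_net *net;
};

static void model_get_net(struct model_net *net)
{
	net->refcnt++;
}

static void model_put_net(struct model_net *net)
{
	assert(net->refcnt > 0);
	net->refcnt--;
}

/* Attach: take a reference before storing the pointer in the link. */
static void model_link_attach(struct model_nf_link *link, struct model_net *net)
{
	model_get_net(net);
	link->net = net;
	net->hooks_registered++;
}

/* Release: unregister the hook first, then drop the reference. */
static void model_link_release(struct model_nf_link *link)
{
	struct model_net *net = link->net;

	net->hooks_registered--; /* netns is still pinned here */
	link->net = NULL;
	model_put_net(net);
}
```

In this model the netns can only reach a zero refcount after
model_link_release() has run, so a deferred release never touches a freed
netns.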
Fixes: 84601d6ee68a ("bpf: add bpf_link support for BPF_NETFILTER programs")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
vsock_bpf_prot is set up at runtime. Remove the superfluous init.
No functional change intended.
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-4-d6577bbfe742@rbox.co
|
|
Dequeuing via vsock_transport::read_skb() left msg_count outdated, which
then confused SOCK_SEQPACKET recv(). Decrease the counter.
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-3-d6577bbfe742@rbox.co
|
|
Make sure virtio_transport_inc_rx_pkt() and virtio_transport_dec_rx_pkt()
calls are balanced (i.e. virtio_vsock_sock::rx_bytes doesn't lie) after
vsock_transport::read_skb().
While here, also inform the peer that we've freed up space and it has more
credit.
Failing to update rx_bytes after a packet is dequeued leads to a warning on
SOCK_STREAM recv():
[ 233.396654] rx_queue is empty, but rx_bytes is non-zero
[ 233.396702] WARNING: CPU: 11 PID: 40601 at net/vmw_vsock/virtio_transport_common.c:589
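The invariant behind that warning can be illustrated with a small userspace
model. The model_* names and the fixed-size queue are assumptions made for
the sketch, not the virtio transport code. Every enqueue adds to rx_bytes and
every dequeue, including the read_skb()-style path, subtracts the same
amount:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_CAP 8

/* Hypothetical stand-in for the per-socket receive state. */
struct model_vsock {
	size_t pkt_len[MODEL_QUEUE_CAP]; /* FIFO of queued packet lengths */
	size_t head;
	size_t count;
	size_t rx_bytes;                 /* bytes accounted as queued */
};

/* Queue a packet and account for its bytes. Returns -1 if the queue is full. */
static int model_enqueue(struct model_vsock *vs, size_t len)
{
	if (vs->count == MODEL_QUEUE_CAP)
		return -1;
	vs->pkt_len[(vs->head + vs->count) % MODEL_QUEUE_CAP] = len;
	vs->count++;
	vs->rx_bytes += len;
	return 0;
}

/* Dequeue one packet; returns its length, or 0 if the queue is empty. */
static size_t model_read_skb(struct model_vsock *vs)
{
	size_t len;

	if (vs->count == 0)
		return 0;
	len = vs->pkt_len[vs->head];
	vs->head = (vs->head + 1) % MODEL_QUEUE_CAP;
	vs->count--;
	assert(vs->rx_bytes >= len);
	vs->rx_bytes -= len; /* the balancing step: undo the enqueue accounting */
	return len;
}

/* The condition the warning checks: an empty queue implies rx_bytes == 0. */
static int model_accounting_ok(const struct model_vsock *vs)
{
	return vs->count != 0 || vs->rx_bytes == 0;
}
```

Dropping the subtraction in model_read_skb() makes model_accounting_ok()
fail once the queue drains, which mirrors the "rx_queue is empty, but
rx_bytes is non-zero" splat.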
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-2-d6577bbfe742@rbox.co
|
|
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure
to immediately and visibly fail the forwarding of unsupported af_vsock
packets.
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
|
|
Syzkaller reported this splat:
==================================================================
BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662
CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted 6.12.0-rc2-syzkaller-00307-g36c254515dc6 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
__sys_sendmsg+0x117/0x1f0 net/socket.c:2690
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
RIP: 0023:0xf7fe4579
Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
RSP: 002b:00000000f574556c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 0000000020000140
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Allocated by task 5387:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
__tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
__mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
process_scheduled_works kernel/workqueue.c:3310 [inline]
worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Freed by task 113:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:2342 [inline]
slab_free mm/slub.c:4579 [inline]
kfree+0x14f/0x4b0 mm/slub.c:4727
kvfree+0x47/0x50 mm/util.c:701
kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
shrink_many mm/vmscan.c:4879 [inline]
lru_gen_shrink_node mm/vmscan.c:4957 [inline]
shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
kswapd_shrink_node mm/vmscan.c:6765 [inline]
balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
kthread+0x2c1/0x3a0 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Last potentially related work creation:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
sk_backlog_rcv include/net/sock.h:1115 [inline]
__release_sock+0x31b/0x400 net/core/sock.c:3072
__tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
__mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg net/socket.c:744 [inline]
____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
__sys_sendmsg+0x117/0x1f0 net/socket.c:2690
do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
__do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
entry_SYSENTER_compat_after_hwframe+0x84/0x8e
The buggy address belongs to the object at ffff8880569ac800
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 88 bytes inside of
freed 512-byte region [ffff8880569ac800, ffff8880569aca00)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x569ac
head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
raw: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
head: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
head: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
head: 04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts 597403252405, free_ts 597177952947
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1545 [inline]
get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
__alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
alloc_slab_page mm/slub.c:2412 [inline]
allocate_slab mm/slub.c:2578 [inline]
new_slab+0x2ba/0x3f0 mm/slub.c:2631
___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
mld_add_delrec net/ipv6/mcast.c:743 [inline]
igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
__ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
__ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
call_netdevice_notifiers net/core/dev.c:2048 [inline]
dev_close_many+0x333/0x6a0 net/core/dev.c:1589
page last free pid 13136 tgid 13136 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1108 [inline]
free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
unpoison_slab_object mm/kasan/common.c:319 [inline]
__kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
kasan_slab_alloc include/linux/kasan.h:247 [inline]
slab_post_alloc_hook mm/slub.c:4085 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
device_del+0x623/0x9f0 drivers/base/core.c:3882
snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
device_remove drivers/base/dd.c:569 [inline]
device_remove+0x122/0x170 drivers/base/dd.c:561
That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which will initiate the release of its memory. Even if it is very likely
the release and the re-utilisation will be done later on, it is of
course better to avoid any issues and read the content of 'subflow'
before closing it.
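The ordering described above can be sketched in userspace as follows. This
is a simplified model, not the kernel change itself: the model_* names are
hypothetical, free() stands in for the deferred release started by
mptcp_close_ssk(), and the field read before the close is chosen only for
illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the subflow context. */
struct model_subflow {
	unsigned char local_id;
	int request_join;
};

/*
 * Stand-in for closing the subflow: once this returns, the object must be
 * treated as gone, even if the real release happens some time later.
 */
static void model_close_subflow(struct model_subflow *subflow)
{
	free(subflow);
}

/*
 * Remove a subflow and report whether it was a join request.
 * Everything needed afterwards is copied out before the close.
 */
static int model_remove_subflow(struct model_subflow *subflow)
{
	int was_join;

	assert(subflow != NULL);
	was_join = subflow->request_join; /* read first */

	model_close_subflow(subflow);     /* then give up the object */

	/* 'subflow' is dangling from here on; only the snapshot is used. */
	return was_join;
}
```

Reading subflow->request_join after model_close_subflow() would be the same
class of bug as the one KASAN reported.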
Fixes: 1c1f72137598 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
Cc: stable@vger.kernel.org
Reported-by: syzbot+3c8b7a8e7df6a2a226ca@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/670d7337.050a0220.4cbc0.004f.GAE@google.com
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20241015-net-mptcp-uaf-pm-rm-v1-1-c4ee5d987a64@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ntuple filters can specify an RSS context to use for packet hashing
and queue selection. While a filter references an RSS context, deleting
that context should be invalid. The list of active ntuple filters and
their associated RSS contexts can be compiled by querying a device's
ethtool_ops.get_rxnfc. Check whether any ntuple filter references an
RSS context during context deletion, and prevent the deletion if the
requested context is still in use.
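A minimal userspace sketch of that check is shown below. The model_* types
and the -EBUSY return value are assumptions made for illustration; the real
code gathers the filter list through ethtool_ops.get_rxnfc and may differ in
its details.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, flattened view of one installed ntuple filter. */
struct model_ntuple_filter {
	unsigned int location;    /* rule index on the device */
	int has_rss_context;      /* non-zero if the rule steers via an RSS context */
	unsigned int rss_context; /* context id, valid when has_rss_context is set */
};

/* Return 1 if any filter in the list references the given context. */
static int model_rss_context_in_use(const struct model_ntuple_filter *filters,
				    size_t n, unsigned int ctx)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (filters[i].has_rss_context && filters[i].rss_context == ctx)
			return 1;
	}
	return 0;
}

/*
 * Refuse to delete a context that is still referenced.
 * Returns 0 when deletion may proceed, -EBUSY (assumed) otherwise.
 */
static int model_delete_rss_context(const struct model_ntuple_filter *filters,
				    size_t n, unsigned int ctx)
{
	assert(filters != NULL || n == 0);
	if (model_rss_context_in_use(filters, n, ctx))
		return -EBUSY;
	return 0;
}
```

With one filter steering to context 3, deleting context 3 is refused in this
model until that filter is removed, while an unreferenced context can be
deleted right away.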
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|