Age | Commit message (Collapse) | Author |
|
Instead of storing type_dev as a void pointer, convert it to union and
use it to store either struct net_device or struct ib_device pointer.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hosts that support 802.1X authentication are able to authenticate
themselves by exchanging EAPOL frames with an authenticator (Ethernet
bridge, in this case) and an authentication server. Access to the
network is only granted by the authenticator to successfully
authenticated hosts.
The above is implemented in the bridge using the "locked" bridge port
option. When enabled, link-local frames (e.g., EAPOL) can be locally
received by the bridge, but all other frames are dropped unless the host
is authenticated. That is, unless the user space control plane installed
an FDB entry according to which the source address of the frame is
located behind the locked ingress port. The entry can be dynamic, in
which case learning needs to be enabled so that the entry will be
refreshed by incoming traffic.
There are deployments in which not all the devices connected to the
authenticator (the bridge) support 802.1X. Such devices can include
printers and cameras. One option to support such deployments is to
unlock the bridge ports connecting these devices, but a slightly more
secure option is to use MAB. When MAB is enabled, the MAC address of the
connected device is used as the user name and password for the
authentication.
For MAB to work, the user space control plane needs to be notified about
MAC addresses that are trying to gain access so that they will be
compared against an allow list. This can be implemented via the regular
learning process with the sole difference that learned FDB entries are
installed with a new "locked" flag indicating that the entry cannot be
used to authenticate the device. The flag cannot be set by user space,
but user space can clear the flag by replacing the entry, thereby
authenticating the device.
Locked FDB entries implement the following semantics with regards to
roaming, aging and forwarding:
1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports,
in which case the "locked" flag is cleared. FDB entries cannot roam
to locked ports regardless of MAB being enabled or not. Therefore,
locked FDB entries are only created if an FDB entry with the given {MAC,
VID} does not already exist. This behavior prevents unauthenticated
devices from disrupting traffic destined to already authenticated
devices.
2. Aging: Locked FDB entries age and refresh by incoming traffic like
regular entries.
3. Forwarding: Locked FDB entries forward traffic like regular entries.
If user space detects an unauthorized MAC behind a locked port and
wishes to prevent traffic with this MAC DA from reaching the host, it
can do so using tc or a different mechanism.
Enable the above behavior using a new bridge port option called "mab".
It can only be enabled on a bridge port that is both locked and has
learning enabled. Locked FDB entries are flushed from the port once MAB
is disabled. A new option is added because there are pure 802.1X
deployments that are not interested in notifications about locked FDB
entries.
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2022-11-04
We've added 8 non-merge commits during the last 3 day(s) which contain
a total of 10 files changed, 113 insertions(+), 16 deletions(-).
The main changes are:
1) Fix memory leak upon allocation failure in BPF verifier's stack state
tracking, from Kees Cook.
2) Fix address leakage when BPF progs release reference to an object,
from Youlin Li.
3) Fix BPF CI breakage from buggy in.h uapi header dependency,
from Andrii Nakryiko.
4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui.
5) Fix BPF sockmap lockdep warning by cancelling psock work outside
of socket lock, from Cong Wang.
6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting,
from Wang Yufen.
bpf-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add verifier test for release_reference()
bpf: Fix wrong reg type conversion in release_reference()
bpf, sock_map: Move cancel_work_sync() out of sock lock
tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI
net/ipv4: Fix linux/in.h header dependencies
bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE
bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues
bpf, verifier: Fix memory leak in array reallocation for stack state
====================
Link: https://lore.kernel.org/r/20221104000445.30761-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller managed to trigger another case where skb->len == 0
when we enter __dev_queue_xmit:
WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 skb_assert_len include/linux/skbuff.h:2576 [inline]
WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 __dev_queue_xmit+0x2069/0x35e0 net/core/dev.c:4295
Call Trace:
dev_queue_xmit+0x17/0x20 net/core/dev.c:4406
__bpf_tx_skb net/core/filter.c:2115 [inline]
__bpf_redirect_no_mac net/core/filter.c:2140 [inline]
__bpf_redirect+0x5fb/0xda0 net/core/filter.c:2163
____bpf_clone_redirect net/core/filter.c:2447 [inline]
bpf_clone_redirect+0x247/0x390 net/core/filter.c:2419
bpf_prog_48159a89cb4a9a16+0x59/0x5e
bpf_dispatcher_nop_func include/linux/bpf.h:897 [inline]
__bpf_prog_run include/linux/filter.h:596 [inline]
bpf_prog_run include/linux/filter.h:603 [inline]
bpf_test_run+0x46c/0x890 net/bpf/test_run.c:402
bpf_prog_test_run_skb+0xbdc/0x14c0 net/bpf/test_run.c:1170
bpf_prog_test_run+0x345/0x3c0 kernel/bpf/syscall.c:3648
__sys_bpf+0x43a/0x6c0 kernel/bpf/syscall.c:5005
__do_sys_bpf kernel/bpf/syscall.c:5091 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5089 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5089
do_syscall_64+0x54/0x70 arch/x86/entry/common.c:48
entry_SYSCALL_64_after_hwframe+0x61/0xc6
The reproducer doesn't really reproduce outside of syzkaller
environment, so I'm taking a guess here. It looks like we
do generate correct ETH_HLEN-sized packet, but we redirect
the packet to the tunneling device. Before we do so, we
__skb_pull l2 header and arrive again at skb->len == 0.
Doesn't seem like we can do anything better than having
an explicit check after __skb_pull?
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+f635e86ec3fa0a37e019@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221027225537.353077-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and netfilter.
Current release - regressions:
- net: several zerocopy flags fixes
- netfilter: fix possible memory leak in nf_nat_init()
- openvswitch: add missing .resv_start_op
Previous releases - regressions:
- neigh: fix null-ptr-deref in neigh_table_clear()
- sched: fix use after free in red_enqueue()
- dsa: fall back to default tagger if we can't load the one from DT
- bluetooth: fix use-after-free in l2cap_conn_del()
Previous releases - always broken:
- netfilter: netlink notifier might race to release objects
- nfc: fix potential memory leak of skb
- bluetooth: fix use-after-free caused by l2cap_reassemble_sdu
- bluetooth: use skb_put to set length
- eth: tun: fix bugs for oversize packet when napi frags enabled
- eth: lan966x: fixes for when MTU is changed
- eth: dwmac-loongson: fix invalid mdio_node"
* tag 'net-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
vsock: fix possible infinite sleep in vsock_connectible_wait_data()
vsock: remove the unused 'wait' in vsock_connectible_recvmsg()
ipv6: fix WARNING in ip6_route_net_exit_late()
bridge: Fix flushing of dynamic FDB entries
net, neigh: Fix null-ptr-deref in neigh_table_clear()
net/smc: Fix possible leaked pernet namespace in smc_init()
stmmac: dwmac-loongson: fix invalid mdio_node
ibmvnic: Free rwi on reset success
net: mdio: fix undefined behavior in bit shift for __mdiobus_register
Bluetooth: L2CAP: Fix attempting to access uninitialized memory
Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm
Bluetooth: L2CAP: Fix accepting connection request for invalid SPSM
Bluetooth: hci_conn: Fix not restoring ISO buffer count on disconnect
Bluetooth: L2CAP: Fix memory leak in vhci_write
Bluetooth: L2CAP: fix use-after-free in l2cap_conn_del()
Bluetooth: virtio_bt: Use skb_put to set length
Bluetooth: hci_conn: Fix CIS connection dst_type handling
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_reassemble_sdu
netfilter: ipset: enforce documented limit to prevent allocating huge memory
isdn: mISDN: netjet: fix wrong check of device registration
...
|
|
Add new apptrust extension attributes to the 8021Qaz APP managed object.
Two new attributes, DCB_ATTR_DCB_APP_TRUST_TABLE and
DCB_ATTR_DCB_APP_TRUST, has been added. Trusted selectors are passed in
the nested attribute DCB_ATTR_DCB_APP_TRUST, in order of precedence.
The new attributes are meant to allow drivers, whose hw supports the
notion of trust, to be able to set whether a particular app selector is
trusted - and in which order.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add new PCP selector for the 8021Qaz APP managed object.
As the PCP selector is not part of the 8021Qaz standard, a new non-std
extension attribute DCB_ATTR_DCB_APP has been introduced. Also two
helper functions to translate between selector and app attribute type
has been added. The new selector has been given a value of 255, to
minimize the risk of future overlap of std- and non-std attributes.
The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
app table. This means that the dcb_app struct can now both contain std-
and non-std app attributes. Currently there is no overlap between the
selector values of the two attributes.
The purpose of adding the PCP selector, is to be able to offload
PCP-based queue classification to the 8021Q Priority Code Point table,
see 6.9.3 of IEEE Std 802.1Q-2018.
PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a
mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Stanislav reported a lockdep warning, which is caused by the
cancel_work_sync() called inside sock_map_close(), as analyzed
below by Jakub:
psock->work.func = sk_psock_backlog()
ACQUIRE psock->work_mutex
sk_psock_handle_skb()
skb_send_sock()
__skb_send_sock()
sendpage_unlocked()
kernel_sendpage()
sock->ops->sendpage = inet_sendpage()
sk->sk_prot->sendpage = tcp_sendpage()
ACQUIRE sk->sk_lock
tcp_sendpage_locked()
RELEASE sk->sk_lock
RELEASE psock->work_mutex
sock_map_close()
ACQUIRE sk->sk_lock
sk_psock_stop()
sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)
cancel_work_sync()
__cancel_work_timer()
__flush_work()
// wait for psock->work to finish
RELEASE sk->sk_lock
We can move the cancel_work_sync() out of the sock lock protection,
but still before saved_close() was called.
Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com
|
|
Currently vsock_connectible_has_data() may miss a wakeup operation
between vsock_connectible_has_data() == 0 and the prepare_to_wait().
Fix the race by adding the process to the wait queue before checking
vsock_connectible_has_data().
Fixes: b3f7fd54881b ("af_vsock: separate wait data loop")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reported-by: Frédéric Dalleau <frederic.dalleau@docker.com>
Tested-by: Frédéric Dalleau <frederic.dalleau@docker.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Remove the unused variable introduced by 19c1b90e1979.
Fixes: 19c1b90e1979 ("af_vsock: separate receive data loop")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
During the initialization of ip6_route_net_init_late(), if file
ipv6_route or rt6_stats fails to be created, the initialization is
successful by default. Therefore, the ipv6_route or rt6_stats file
doesn't be found during the remove in ip6_route_net_exit_late(). It
will cause WRNING.
The following is the stack information:
name 'rt6_stats'
WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460
Modules linked in:
Workqueue: netns cleanup_net
RIP: 0010:remove_proc_entry+0x389/0x460
PKRU: 55555554
Call Trace:
<TASK>
ops_exit_list+0xb0/0x170
cleanup_net+0x4ea/0xb00
process_one_work+0x9bf/0x1710
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
</TASK>
Fixes: cdb1876192db ("[NETNS][IPV6] route6 - create route6 proc files for the namespace")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following commands should result in all the dynamic FDB entries
being flushed, but instead all the non-local (non-permanent) entries are
flushed:
# bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
# bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
# ip link set dev br0 type bridge fdb_flush
# bridge fdb show brport dummy1
00:00:00:00:00:01 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent
This is because br_fdb_flush() works with FDB flags and not the
corresponding enumerator values. Fix by passing the FDB flag instead.
After the fix:
# bridge fdb add 00:aa:bb:cc:dd:ee dev dummy1 master static
# bridge fdb add 00:11:22:33:44:55 dev dummy1 master dynamic
# ip link set dev br0 type bridge fdb_flush
# bridge fdb show brport dummy1
00:aa:bb:cc:dd:ee master br0 static
00:00:00:00:00:01 master br0 permanent
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent
Fixes: 1f78ee14eeac ("net: bridge: fdb: add support for fine-grained flushing")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20221101185753.2120691-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When IPv6 module gets initialized but hits an error in the middle,
kenel panic with:
KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f]
CPU: 1 PID: 361 Comm: insmod
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370
RSP: 0018:ffff888012677908 EFLAGS: 00000202
...
Call Trace:
<TASK>
neigh_table_clear+0x94/0x2d0
ndisc_cleanup+0x27/0x40 [ipv6]
inet6_init+0x21c/0x2cb [ipv6]
do_one_initcall+0xd3/0x4d0
do_init_module+0x1ae/0x670
...
Kernel panic - not syncing: Fatal exception
When ipv6 initialization fails, it will try to cleanup and calls:
neigh_table_clear()
neigh_ifdown(tbl, NULL)
pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL))
# dev_net(NULL) triggers null-ptr-deref.
Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev
is NULL, to make kernel not panic immediately.
Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In smc_init(), register_pernet_subsys(&smc_net_stat_ops) is called
without any error handling.
If it fails, registering of &smc_net_ops won't be reverted.
And if smc_nl_init() fails, &smc_net_stat_ops itself won't be reverted.
This leaves wild ops in subsystem linkedlist and when another module
tries to call register_pernet_operations() it triggers page fault:
BUG: unable to handle page fault for address: fffffbfff81b964c
RIP: 0010:register_pernet_operations+0x1b9/0x5f0
Call Trace:
<TASK>
register_pernet_subsys+0x29/0x40
ebtables_init+0x58/0x1000 [ebtables]
...
Fixes: 194730a9beb5 ("net/smc: Make SMC statistics network namespace aware")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20221101093722.127223-1-chenzhongjin@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
1) netlink socket notifier might win race to release objects that are
already pending to be released via commit release path, reported by
syzbot.
2) No need to postpone flow rule release to commit release path, this
triggered the syzbot report, complementary fix to previous patch.
3) Use explicit signed chars in IPVS to unbreak arm, from Jason A. Donenfeld.
4) Missing check for proc entry creation failure in IPVS, from Zhengchao Shao.
5) Incorrect error path handling when BPF NAT fails to register, from
Chen Zhongjin.
6) Prevent huge memory allocation in ipset hash types, from Jozsef Kadlecsik.
Except the incorrect BPF NAT error path which is broken in 6.1-rc, anything
else has been broken for several releases.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: enforce documented limit to prevent allocating huge memory
netfilter: nf_nat: Fix possible memory leak in nf_nat_init()
ipvs: fix WARNING in ip_vs_app_net_cleanup()
ipvs: fix WARNING in __ip_vs_cleanup_batch()
ipvs: use explicitly signed chars
netfilter: nf_tables: release flow rule object from commit path
netfilter: nf_tables: netlink notifier might race to release objects
====================
Link: https://lore.kernel.org/r/20221102184659.2502-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On l2cap_parse_conf_req the variable efs is only initialized if
remote_efs has been set.
CVE: CVE-2022-42895
CC: stable@vger.kernel.org
Reported-by: Tamás Koczka <poprdi@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an@intel.com>
|
|
l2cap_global_chan_by_psm shall not return fixed channels as they are not
meant to be connected by (S)PSM.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an@intel.com>
|
|
The Bluetooth spec states that the valid range for SPSM is from
0x0001-0x00ff so it is invalid to accept values outside of this range:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part A
page 1059:
Table 4.15: L2CAP_LE_CREDIT_BASED_CONNECTION_REQ SPSM ranges
CVE: CVE-2022-42896
CC: stable@vger.kernel.org
Reported-by: Tamás Koczka <poprdi@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Tedd Ho-Jeong An <tedd.an@intel.com>
|
|
When disconnecting an ISO link the controller may not generate
HCI_EV_NUM_COMP_PKTS for unacked packets which needs to be restored in
hci_conn_del otherwise the host would assume they are still in use and
would not be able to use all the buffers available.
Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Frédéric Danis <frederic.danis@collabora.com>
|
|
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810d81ac00 (size 240):
[...]
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff838733d9>] __alloc_skb+0x1f9/0x270 net/core/skbuff.c:418
[<ffffffff833f742f>] alloc_skb include/linux/skbuff.h:1257 [inline]
[<ffffffff833f742f>] bt_skb_alloc include/net/bluetooth/bluetooth.h:469 [inline]
[<ffffffff833f742f>] vhci_get_user drivers/bluetooth/hci_vhci.c:391 [inline]
[<ffffffff833f742f>] vhci_write+0x5f/0x230 drivers/bluetooth/hci_vhci.c:511
[<ffffffff815e398d>] call_write_iter include/linux/fs.h:2192 [inline]
[<ffffffff815e398d>] new_sync_write fs/read_write.c:491 [inline]
[<ffffffff815e398d>] vfs_write+0x42d/0x540 fs/read_write.c:578
[<ffffffff815e3cdd>] ksys_write+0x9d/0x160 fs/read_write.c:631
[<ffffffff845e0645>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff845e0645>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
[<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================
HCI core will uses hci_rx_work() to process frame, which is queued to
the hdev->rx_q tail in hci_recv_frame() by HCI driver.
Yet the problem is that, HCI core may not free the skb after handling
ACL data packets. To be more specific, when start fragment does not
contain the L2CAP length, HCI core just copies skb into conn->rx_skb and
finishes frame process in l2cap_recv_acldata(), without freeing the skb,
which triggers the above memory leak.
This patch solves it by releasing the relative skb, after processing
the above case in l2cap_recv_acldata().
Fixes: 4d7ea8ee90e4 ("Bluetooth: L2CAP: Fix handling fragmented length")
Link: https://lore.kernel.org/all/0000000000000d0b1905e6aaef64@google.com/
Reported-and-tested-by: syzbot+8f819e36e01022991cfa@syzkaller.appspotmail.com
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
When l2cap_recv_frame() is invoked to receive data, and the cid is
L2CAP_CID_A2MP, if the channel does not exist, it will create a channel.
However, after a channel is created, the hold operation of the channel
is not performed. In this case, the value of channel reference counting
is 1. As a result, after hci_error_reset() is triggered, l2cap_conn_del()
invokes the close hook function of A2MP to release the channel. Then
l2cap_chan_unlock(chan) will trigger UAF issue.
The process is as follows:
Receive data:
l2cap_data_channel()
a2mp_channel_create() --->channel ref is 2
l2cap_chan_put() --->channel ref is 1
Triger event:
hci_error_reset()
hci_dev_do_close()
...
l2cap_disconn_cfm()
l2cap_conn_del()
l2cap_chan_hold() --->channel ref is 2
l2cap_chan_del() --->channel ref is 1
a2mp_chan_close_cb() --->channel ref is 0, release channel
l2cap_chan_unlock() --->UAF of channel
The detailed Call Trace is as follows:
BUG: KASAN: use-after-free in __mutex_unlock_slowpath+0xa6/0x5e0
Read of size 8 at addr ffff8880160664b8 by task kworker/u11:1/7593
Workqueue: hci0 hci_error_reset
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
print_report.cold+0x2ba/0x719
kasan_report+0xb1/0x1e0
kasan_check_range+0x140/0x190
__mutex_unlock_slowpath+0xa6/0x5e0
l2cap_conn_del+0x404/0x7b0
l2cap_disconn_cfm+0x8c/0xc0
hci_conn_hash_flush+0x11f/0x260
hci_dev_close_sync+0x5f5/0x11f0
hci_dev_do_close+0x2d/0x70
hci_error_reset+0x9e/0x140
process_one_work+0x98a/0x1620
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
</TASK>
Allocated by task 7593:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0xa9/0xd0
l2cap_chan_create+0x40/0x930
amp_mgr_create+0x96/0x990
a2mp_channel_create+0x7d/0x150
l2cap_recv_frame+0x51b8/0x9a70
l2cap_recv_acldata+0xaa3/0xc00
hci_rx_work+0x702/0x1220
process_one_work+0x98a/0x1620
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
Freed by task 7593:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
____kasan_slab_free+0x167/0x1c0
slab_free_freelist_hook+0x89/0x1c0
kfree+0xe2/0x580
l2cap_chan_put+0x22a/0x2d0
l2cap_conn_del+0x3fc/0x7b0
l2cap_disconn_cfm+0x8c/0xc0
hci_conn_hash_flush+0x11f/0x260
hci_dev_close_sync+0x5f5/0x11f0
hci_dev_do_close+0x2d/0x70
hci_error_reset+0x9e/0x140
process_one_work+0x98a/0x1620
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
Last potentially related work creation:
kasan_save_stack+0x1e/0x40
__kasan_record_aux_stack+0xbe/0xd0
call_rcu+0x99/0x740
netlink_release+0xe6a/0x1cf0
__sock_release+0xcd/0x280
sock_close+0x18/0x20
__fput+0x27c/0xa90
task_work_run+0xdd/0x1a0
exit_to_user_mode_prepare+0x23c/0x250
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x42/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40
__kasan_record_aux_stack+0xbe/0xd0
call_rcu+0x99/0x740
netlink_release+0xe6a/0x1cf0
__sock_release+0xcd/0x280
sock_close+0x18/0x20
__fput+0x27c/0xa90
task_work_run+0xdd/0x1a0
exit_to_user_mode_prepare+0x23c/0x250
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x42/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
hci_connect_cis and iso_connect_cis call hci_bind_cis inconsistently
with dst_type being either ISO socket address type or the HCI type, but
these values cannot be mixed like this. Fix this by using only the HCI
type.
CIS connection dst_type was also not initialized in hci_bind_cis, even
though it is used in hci_conn_hash_lookup_cis to find existing
connections. Set the value in hci_bind_cis, so that existing CIS
connections are found e.g. when doing deferred socket connections, also
when dst_type is not 0 (ADDR_LE_DEV_PUBLIC).
Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Fix the race condition between the following two flows that run in
parallel:
1. l2cap_reassemble_sdu -> chan->ops->recv (l2cap_sock_recv_cb) ->
__sock_queue_rcv_skb.
2. bt_sock_recvmsg -> skb_recv_datagram, skb_free_datagram.
An SKB can be queued by the first flow and immediately dequeued and
freed by the second flow, therefore the callers of l2cap_reassemble_sdu
can't use the SKB after that function returns. However, some places
continue accessing struct l2cap_ctrl that resides in the SKB's CB for a
short time after l2cap_reassemble_sdu returns, leading to a
use-after-free condition (the stack trace is below, line numbers for
kernel 5.19.8).
Fix it by keeping a local copy of struct l2cap_ctrl.
BUG: KASAN: use-after-free in l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth
Read of size 1 at addr ffff88812025f2f0 by task kworker/u17:3/43169
Workqueue: hci0 hci_rx_work [bluetooth]
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
print_report.cold (mm/kasan/report.c:314 mm/kasan/report.c:429)
? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth
kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493)
? l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth
l2cap_rx_state_recv (net/bluetooth/l2cap_core.c:6906) bluetooth
l2cap_rx (net/bluetooth/l2cap_core.c:7236 net/bluetooth/l2cap_core.c:7271) bluetooth
ret_from_fork (arch/x86/entry/entry_64.S:306)
</TASK>
Allocated by task 43169:
kasan_save_stack (mm/kasan/common.c:39)
__kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469)
kmem_cache_alloc_node (mm/slab.h:750 mm/slub.c:3243 mm/slub.c:3293)
__alloc_skb (net/core/skbuff.c:414)
l2cap_recv_frag (./include/net/bluetooth/bluetooth.h:425 net/bluetooth/l2cap_core.c:8329) bluetooth
l2cap_recv_acldata (net/bluetooth/l2cap_core.c:8442) bluetooth
hci_rx_work (net/bluetooth/hci_core.c:3642 net/bluetooth/hci_core.c:3832) bluetooth
process_one_work (kernel/workqueue.c:2289)
worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2437)
kthread (kernel/kthread.c:376)
ret_from_fork (arch/x86/entry/entry_64.S:306)
Freed by task 27920:
kasan_save_stack (mm/kasan/common.c:39)
kasan_set_track (mm/kasan/common.c:45)
kasan_set_free_info (mm/kasan/generic.c:372)
____kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328)
slab_free_freelist_hook (mm/slub.c:1780)
kmem_cache_free (mm/slub.c:3536 mm/slub.c:3553)
skb_free_datagram (./include/net/sock.h:1578 ./include/net/sock.h:1639 net/core/datagram.c:323)
bt_sock_recvmsg (net/bluetooth/af_bluetooth.c:295) bluetooth
l2cap_sock_recvmsg (net/bluetooth/l2cap_sock.c:1212) bluetooth
sock_read_iter (net/socket.c:1087)
new_sync_read (./include/linux/fs.h:2052 fs/read_write.c:401)
vfs_read (fs/read_write.c:482)
ksys_read (fs/read_write.c:620)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
Link: https://lore.kernel.org/linux-bluetooth/CAKErNvoqga1WcmoR3-0875esY6TVWFQDandbVZncSiuGPBQXLA@mail.gmail.com/T/#u
Fixes: d2a7ac5d5d3a ("Bluetooth: Add the ERTM receive state machine")
Fixes: 4b51dae96731 ("Bluetooth: Add streaming mode receive and incoming packet classifier")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Daniel Xu reported that the hash:net,iface type of the ipset subsystem does
not limit adding the same network with different interfaces to a set, which
can lead to huge memory usage or allocation failure.
The quick reproducer is
$ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0
$ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done
The backtrace when vmalloc fails:
[Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages
<...>
[Tue Oct 25 00:13:08 2022] Call Trace:
[Tue Oct 25 00:13:08 2022] <TASK>
[Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60
[Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180
[Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760
[Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20
[Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90
[Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0
[Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710
<...>
The fix is to enforce the limit documented in the ipset(8) manpage:
> The internal restriction of the hash:net,iface set type is that the same
> network prefix cannot be stored with more than 64 different interfaces
> in a single set.
Fixes: ccf0a4b7fc68 ("netfilter: ipset: Add bucketsize parameter to all hash types")
Reported-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull NFS client bugfixes from Anna Schumaker:
- Fix some coccicheck warnings
- Avoid memcpy() run-time warning
- Fix up various state reclaim / RECLAIM_COMPLETE errors
- Fix a null pointer dereference in sysfs
- Fix LOCK races
- Fix gss_unwrap_resp_integ() crasher
- Fix zero length clones
- Fix memleak when allocate slot fails
* tag 'nfs-for-6.1-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs4: Fix kmemleak when allocate slot failed
NFSv4.2: Fixup CLONE dest file size for zero-length count
SUNRPC: Fix crasher in gss_unwrap_resp_integ()
NFSv4: Retry LOCK on OLD_STATEID during delegation return
SUNRPC: Fix null-ptr-deref when xps sysfs alloc failed
NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
NFSv4: Fix a potential state reclaim deadlock
NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
nfs: Remove redundant null checks before kfree
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
bpf-next 2022-11-02
We've added 70 non-merge commits during the last 14 day(s) which contain
a total of 96 files changed, 3203 insertions(+), 640 deletions(-).
The main changes are:
1) Make cgroup local storage available to non-cgroup attached BPF programs
such as tc BPF ones, from Yonghong Song.
2) Avoid unnecessary deadlock detection and failures wrt BPF task storage
helpers, from Martin KaFai Lau.
3) Add LLVM disassembler as default library for dumping JITed code
in bpftool, from Quentin Monnet.
4) Various kprobe_multi_link fixes related to kernel modules,
from Jiri Olsa.
5) Optimize x86-64 JIT with emitting BMI2-based shift instructions,
from Jie Meng.
6) Improve BPF verifier's memory type compatibility for map key/value
arguments, from Dave Marchevsky.
7) Only create mmap-able data section maps in libbpf when data is exposed
via skeletons, from Andrii Nakryiko.
8) Add an autoattach option for bpftool to load all object assets,
from Wang Yufen.
9) Various memory handling fixes for libbpf and BPF selftests,
from Xu Kuohai.
10) Initial support for BPF selftest's vmtest.sh on arm64,
from Manu Bretelle.
11) Improve libbpf's BTF handling to dedup identical structs,
from Alan Maguire.
12) Add BPF CI and denylist documentation for BPF selftests,
from Daniel Müller.
13) Check BPF cpumap max_entries before doing allocation work,
from Florian Lehner.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits)
samples/bpf: Fix typo in README
bpf: Remove the obsolte u64_stats_fetch_*_irq() users.
bpf: check max_entries before allocating memory
bpf: Fix a typo in comment for DFS algorithm
bpftool: Fix spelling mistake "disasembler" -> "disassembler"
selftests/bpf: Fix bpftool synctypes checking failure
selftests/bpf: Panic on hard/soft lockup
docs/bpf: Add documentation for new cgroup local storage
selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x
selftests/bpf: Add selftests for new cgroup local storage
selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str
bpftool: Support new cgroup local storage
libbpf: Support new cgroup local storage
bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
bpf: Refactor some inode/task/sk storage functions for reuse
bpf: Make struct cgroup btf id global
selftests/bpf: Tracing prog can still do lookup under busy lock
selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection
bpf: Add new bpf_task_storage_delete proto with no deadlock detection
bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check
...
====================
Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add remote KCOV annotations for NFC processing that is done
in background threads. This enables efficient coverage-guided
fuzzing of the NFC subsystem.
The intention is to add annotations to background threads that
process skb's that were allocated in syscall context
(thus have a KCOV handle associated with the current fuzz test).
This includes nci_recv_frame() that is called by the virtual nci
driver in the syscall context.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The syzkaller reported an issue:
KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387]
CPU: 0 PID: 4069 Comm: kworker/0:15 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: rcu_gp srcu_invoke_callbacks
RIP: 0010:rose_send_frame+0x1dd/0x2f0 net/rose/rose_link.c:101
Call Trace:
<IRQ>
rose_transmit_clear_request+0x1d5/0x290 net/rose/rose_link.c:255
rose_rx_call_request+0x4c0/0x1bc0 net/rose/af_rose.c:1009
rose_loopback_timer+0x19e/0x590 net/rose/rose_loopback.c:111
call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474
expire_timers kernel/time/timer.c:1519 [inline]
__run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790
__run_timers kernel/time/timer.c:1768 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
[...]
</IRQ>
It triggers NULL pointer dereference when 'neigh->dev->dev_addr' is
called in the rose_send_frame(). It's the first occurrence of the
`neigh` is in rose_loopback_timer() as `rose_loopback_neigh', and
the 'dev' in 'rose_loopback_neigh' is initialized sa nullptr.
It had been fixed by commit 3b3fd068c56e3fbea30090859216a368398e39bf
("rose: Fix Null pointer dereference in rose_send_frame()") ever.
But it's introduced by commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8
("rose: check NULL rose_loopback_neigh->loopback") again.
We fix it by add NULL check in rose_transmit_clear_request(). When
the 'dev' in 'neigh' is NULL, we don't reply the request and just
clear it.
syzkaller don't provide repro, and I provide a syz repro like:
r0 = syz_init_net_socket$bt_sco(0x1f, 0x5, 0x2)
ioctl$sock_inet_SIOCSIFFLAGS(r0, 0x8914, &(0x7f0000000180)={'rose0\x00', 0x201})
r1 = syz_init_net_socket$rose(0xb, 0x5, 0x0)
bind$rose(r1, &(0x7f00000000c0)=@full={0xb, @dev, @null, 0x0, [@null, @null, @netrom, @netrom, @default, @null]}, 0x40)
connect$rose(r1, &(0x7f0000000240)=@short={0xb, @dev={0xbb, 0xbb, 0xbb, 0x1, 0x0}, @remote={0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x1}, 0x1, @netrom={0xbb, 0xbb, 0xbb, 0xbb, 0xbb, 0x0, 0x0}}, 0x1c)
Fixes: 3c53cd65dece ("rose: check NULL rose_loopback_neigh->loopback")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In nf_nat_init(), register_nf_nat_bpf() can fail and return directly
without any error handling.
Then nf_nat_bysource will leak and registering of &nat_net_ops,
&follow_master_nat and nf_nat_hook won't be reverted.
This leaves wild ops in linkedlists and when another module tries to
call register_pernet_operations() or nf_ct_helper_expectfn_register()
it triggers page fault:
BUG: unable to handle page fault for address: fffffbfff81b964c
RIP: 0010:register_pernet_operations+0x1b9/0x5f0
Call Trace:
<TASK>
register_pernet_subsys+0x29/0x40
ebtables_init+0x58/0x1000 [ebtables]
...
Fixes: 820dc0523e05 ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The TWT Information Frame Disabled bit of control field of TWT Setup
frame shall be set to 1 since handling TWT Information frame is not
supported by current mac80211 implementation.
Fixes: f5a4c24e689f ("mac80211: introduce individual TWT support in AP mode")
Signed-off-by: Howard Hsu <howard-yh.hsu@mediatek.com>
Link: https://lore.kernel.org/r/20221027015653.1448-1-howard-yh.hsu@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When trying to transmit an data frame with tx_status to a destination
that have no route in the mesh, then it is dropped without recrediting
the ack_status_frames idr.
Once it is exhausted, wpa_supplicant starts failing to do SAE with
NL80211_CMD_FRAME and logs "nl80211: Frame command failed".
Use ieee80211_free_txskb() instead of kfree_skb() to fix it.
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Link: https://lore.kernel.org/r/20221027140133.1504-1-nicolas.cavallari@green-communications.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When device is running and the interface status is changed, the gpf issue
is triggered. The problem triggering process is as follows:
Thread A: Thread B
ieee80211_runtime_change_iftype() process_one_work()
... ...
ieee80211_do_stop() ...
... ...
sdata->bss = NULL ...
... ieee80211_subif_start_xmit()
ieee80211_multicast_to_unicast
//!sdata->bss->multicast_to_unicast
cause gpf issue
When the interface status is changed, the sending queue continues to send
packets. After the bss is set to NULL, the bss is accessed. As a result,
this causes a general-protection-fault issue.
The following is the stack information:
general protection fault, probably for non-canonical address
0xdffffc000000002f: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000178-0x000000000000017f]
Workqueue: mld mld_ifc_work
RIP: 0010:ieee80211_subif_start_xmit+0x25b/0x1310
Call Trace:
<TASK>
dev_hard_start_xmit+0x1be/0x990
__dev_queue_xmit+0x2c9a/0x3b60
ip6_finish_output2+0xf92/0x1520
ip6_finish_output+0x6af/0x11e0
ip6_output+0x1ed/0x540
mld_sendpack+0xa09/0xe70
mld_ifc_work+0x71c/0xdb0
process_one_work+0x9bf/0x1710
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
</TASK>
Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Reported-by: syzbot+c6e8fca81c294fd5620a@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20221026063959.177813-1-shaozhengchao@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
During the initialization of ip_vs_app_net_init(), if file ip_vs_app
fails to be created, the initialization is successful by default.
Therefore, the ip_vs_app file doesn't be found during the remove in
ip_vs_app_net_cleanup(). It will cause WRNING.
The following is the stack information:
name 'ip_vs_app'
WARNING: CPU: 1 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460
Modules linked in:
Workqueue: netns cleanup_net
RIP: 0010:remove_proc_entry+0x389/0x460
Call Trace:
<TASK>
ops_exit_list+0x125/0x170
cleanup_net+0x4ea/0xb00
process_one_work+0x9bf/0x1710
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 457c4cbc5a3d ("[NET]: Make /proc/net per network namespace")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
During the initialization of ip_vs_conn_net_init(), if file ip_vs_conn
or ip_vs_conn_sync fails to be created, the initialization is successful
by default. Therefore, the ip_vs_conn or ip_vs_conn_sync file doesn't
be found during the remove.
The following is the stack information:
name 'ip_vs_conn_sync'
WARNING: CPU: 3 PID: 9 at fs/proc/generic.c:712
remove_proc_entry+0x389/0x460
Modules linked in:
Workqueue: netns cleanup_net
RIP: 0010:remove_proc_entry+0x389/0x460
Call Trace:
<TASK>
__ip_vs_cleanup_batch+0x7d/0x120
ops_exit_list+0x125/0x170
cleanup_net+0x4ea/0xb00
process_one_work+0x9bf/0x1710
worker_thread+0x665/0x1080
kthread+0x2e4/0x3a0
ret_from_fork+0x1f/0x30
</TASK>
Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The `char` type with no explicit sign is sometimes signed and sometimes
unsigned. This code will break on platforms such as arm, where char is
unsigned. So mark it here as explicitly signed, so that the
todrop_counter decrement and subsequent comparison is correct.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Jakub reported that the addition of the "network_byte_order"
member in struct nla_policy increases size of 32bit platforms.
Instead of scraping the bit from elsewhere Johannes suggested
to add explicit NLA_BE types instead, so do this here.
NLA_POLICY_MAX_BE() macro is removed again, there is no need
for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing.
NLA_BE64 can be added later.
Fixes: 08724ef69907 ("netlink: introduce NLA_POLICY_MAX_BE")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commits 36a6503fedda ("tcp: refine tcp_prune_ofo_queue()
to not drop all packets") and 72cd43ba64fc1
("tcp: free batches of packets in tcp_prune_ofo_queue()")
tcp_prune_ofo_queue() drops a fraction of ooo queue,
to make room for incoming packet.
However it makes no sense to drop packets that are
before the incoming packet, in sequence space.
In order to recover from packet losses faster,
it makes more sense to only drop ooo packets
which are after the incoming packet.
Tested:
packetdrill test:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [3800], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 0>
+.1 < . 1:1(0) ack 1 win 1024
+0 accept(3, ..., ...) = 4
+.01 < . 200:300(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 200:300>
+.01 < . 400:500(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 400:500 200:300>
+.01 < . 600:700(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 600:700 400:500 200:300>
+.01 < . 800:900(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 800:900 600:700 400:500 200:300>
+.01 < . 1000:1100(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 1000:1100 800:900 600:700 400:500>
+.01 < . 1200:1300(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
// this packet is dropped because we have no room left.
+.01 < . 1400:1500(100) ack 1 win 1024
+.01 < . 1:200(199) ack 1 win 1024
// Make sure kernel did not drop 200:300 sequence
+0 > . 1:1(0) ack 300 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
// Make room, since our RCVBUF is very small
+0 read(4, ..., 299) = 299
+.01 < . 300:400(100) ack 1 win 1024
+0 > . 1:1(0) ack 500 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
+.01 < . 500:600(100) ack 1 win 1024
+0 > . 1:1(0) ack 700 <nop,nop, sack 1200:1300 1000:1100 800:900>
+0 read(4, ..., 400) = 400
+.01 < . 700:800(100) ack 1 win 1024
+0 > . 1:1(0) ack 900 <nop,nop, sack 1200:1300 1000:1100>
+.01 < . 900:1000(100) ack 1 win 1024
+0 > . 1:1(0) ack 1100 <nop,nop, sack 1200:1300>
+.01 < . 1100:1200(100) ack 1 win 1024
// This checks that 1200:1300 has not been removed from ooo queue
+0 > . 1:1(0) ack 1300
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221101035234.3910189-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet[46]_pton check the input length against
a sane length limit (INET[6]_ADDRSTRLEN), but
the strlen value gets truncated due to being stored in an int,
so there's a theoretical potential for a >4G string to pass
the limit test.
Use size_t since that's what strlen actually returns.
I've had a hunt for callers that could hit this, but
I've not managed to find anything that doesn't get checked with
some other limit first; but it's possible that I've missed
something in the depth of the storage target paths.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20221029014604.114024-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When running `test_sockmap` selftests, the following warning appears:
WARNING: CPU: 2 PID: 197 at net/core/stream.c:205 sk_stream_kill_queues+0xd3/0xf0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
tcp_rcv_state_process+0xd28/0x1380
? tcp_v4_do_rcv+0x77/0x2c0
tcp_v4_do_rcv+0x77/0x2c0
__release_sock+0x106/0x130
__tcp_close+0x1a7/0x4e0
tcp_close+0x20/0x70
inet_release+0x3c/0x80
__sock_release+0x3a/0xb0
sock_close+0x14/0x20
__fput+0xa3/0x260
task_work_run+0x59/0xb0
exit_to_user_mode_prepare+0x1b3/0x1c0
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The root case is in commit 84472b436e76 ("bpf, sockmap: Fix more uncharged
while msg has more_data"), where I used msg->sg.size to replace the tosend,
causing breakage:
if (msg->apply_bytes && msg->apply_bytes < tosend)
tosend = psock->apply_bytes;
Fixes: 84472b436e76 ("bpf, sockmap: Fix more uncharged while msg has more_data")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/1667266296-8794-1-git-send-email-wangyufen@huawei.com
|
|
A common exploit pattern for ROP attacks is to abuse prepare_kernel_cred()
in order to construct escalated privileges[1]. Instead of providing a
short-hand argument (NULL) to the "daemon" argument to indicate using
init_cred as the base cred, require that "daemon" is always set to
an actual task. Replace all existing callers that were passing NULL
with &init_task.
Future attacks will need to have sufficiently powerful read/write
primitives to have found an appropriately privileged task and written it
to the ROP stack as an argument to succeed, which is similarly difficult
to the prior effort needed to escalate privileges before struct cred
existed: locate the current cred and overwrite the uid member.
This has the added benefit of meaning that prepare_kernel_cred() can no
longer exceed the privileges of the init task, which may have changed from
the original init_cred (e.g. dropping capabilities from the bounding set).
[1] https://google.com/search?q=commit_creds(prepare_kernel_cred(0))
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steve French <sfrench@samba.org>
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Russ Weight <russell.h.weight@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Link: https://lore.kernel.org/r/20221026232943.never.775-kees@kernel.org
|
|
No need to postpone this to the commit release path, since no packets
are walking over this object, this is accessed from control plane only.
This helped uncovered UAF triggered by races with the netlink notifier.
Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
commit release path is invoked via call_rcu and it runs lockless to
release the objects after rcu grace period. The netlink notifier handler
might win race to remove objects that the transaction context is still
referencing from the commit release path.
Call rcu_barrier() to ensure pending rcu callbacks run to completion
if the list of transactions to be destroyed is not empty.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In nft_inner_parse_l2l3(), the return value of skb_header_pointer() is
'veth' instead of 'eth' when case 'htons(ETH_P_8021Q)' and fix it.
Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
GRE_VERSION and GRE_VERSION0 are expressed in network byte order,
use __be16. Uncovered by sparse:
net/netfilter/nft_payload.c:112:25: warning: incorrect type in assignment (different base types)
net/netfilter/nft_payload.c:112:25: expected unsigned int [usertype] version
net/netfilter/nft_payload.c:112:25: got restricted __be16
net/netfilter/nft_payload.c:114:22: warning: restricted __be16 degrades to integer
Fixes: c247897d7c19 ("netfilter: nft_payload: access GRE payload via inner offset")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As a first strep in introducing proper PAN management and association,
we need to be able to create coordinator interfaces which might act as
coordinator or PAN coordinator.
Hence, let's add the minimum support to allow the creation of these
interfaces.
Even though the necessary logic to handle several interfaces on the same
device is added to make this future move easier, in practice only
several interfaces of type MONITOR are allowed at the same time. The
other combinations are not allowed (interface creation is possible but
only one can be opened at a time) because, with a single PHY featuring a
single set of address filters, we cannot afford handling two distinct
interfaces (with different address filters or filtering requirements):
* Having 2 NODEs, 2 COORDs or 1 NODE + 1 COORD
-> cannot work because the address filters would be different
* Having 1 MONITOR + either 1 NODE or 1 COORD
-> cannot work because the filtering levels are incompatible
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20221026093502.602734-4-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
While going through the whole interface opening logic in my head I was
consistently bothered by the condition checking whether there was only
one interface of type NODE/COORD opened at the same time. What actually
bothered me was the fact that in one case we would use the wpan_dev
pointer directly while in the other case we would use the sdata pointer,
making it harder to differentiate both. In practice the condition should
be straightforward to read. IMHO dropping the wpan_dev indirection
allows to clarify the check.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20221026093502.602734-3-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
It may appear clearer to free the skb at the end of the path rather than
in the middle, within a helper.
Move kfree_skb() from the end of __ieee802154_rx_handle_packet() to
right after it in the calling function ieee802154_rx(). Doing so implies
reworking a little bit the exit path.
Suggested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20221026093502.602734-2-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
IPv4 reassembly unit can decide to drop frags based on
/proc/sys/net/ipv4/ipfrag_max_dist sysctl.
Add a specific drop reason to track this specific
and weird case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Used to track skbs freed after a timeout happened
in a reassmbly unit.
Passing a @reason argument to inet_frag_rbtree_purge()
allows to use correct consumed status for frags
that have been successfully re-assembled.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|