summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2022-11-21bpf: Move skb->len == 0 checks into __bpf_redirectStanislav Fomichev
To avoid potentially breaking existing users. Both mac/no-mac cases have to be amended; mac_header >= network_header is not enough (verified with a new test, see next patch). Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-11-21netfilter: ipset: regression in ip_set_hash_ip.cVishwanath Pai
This patch introduced a regression: commit 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") The variable e.ip is passed to adtfn() function which finally adds the ip address to the set. The patch above refactored the for loop and moved e.ip = htonl(ip) to the end of the for loop. What this means is that if the value of "ip" changes between the first assignement of e.ip and the forloop, then e.ip is pointing to a different ip address than "ip". Test case: $ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000 $ ipset add jdtest_tmp 10.0.1.1/31 ipset v6.21.1: Element cannot be added to the set: it's already added The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])" block but e.ip is still pointing to the old value. Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-21mptcp: more detailed error reporting on endpoint creationPaolo Abeni
Endpoint creation can fail for a number of reasons; in case of failure append the error number to the extended ack message, using a newly introduced generic helper. Additionally let mptcp_pm_nl_append_new_local_addr() report different error reasons. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21mptcp: deduplicate error paths on endpoint creationPaolo Abeni
When endpoint creation fails, we need to free the newly allocated entry and eventually destroy the paired mptcp listener socket. Consolidate such action in a single point let all the errors path reach it. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21net: Return errno in sk->sk_prot->get_port().Kuniyuki Iwashima
We assume the correct errno is -EADDRINUSE when sk->sk_prot->get_port() fails, so some ->get_port() functions return just 1 on failure and the callers return -EADDRINUSE instead. However, mptcp_get_port() can return -EINVAL. Let's not ignore the error. Note the only exception is inet_autobind(), all of whose callers return -EAGAIN instead. Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21ipv4/fib: Replace zero-length array with DECLARE_FLEX_ARRAY() helperKees Cook
Zero-length arrays are deprecated[1] and are being replaced with flexible array members in support of the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3. Replace zero-length array with flexible-array member in struct key_vector. This results in no differences in binary output. [1] https://github.com/KSPP/linux/issues/78 Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains late Netfilter fixes for net: 1) Use READ_ONCE()/WRITE_ONCE() to update ct->mark, from Daniel Xu. Not reported by syzbot, but I presume KASAN would trigger post a splat on this. This is a rather old issue, predating git history. 2) Do not set up extensions for set element with end interval flag set on. This leads to bogusly skipping this elements as expired when listing the set/map to userspace as well as increasing memory consumpton when stateful expressions are used. This issue has been present since 4.18, when timeout support for rbtree set was added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-21Merge 6.1-rc6 into driver-core-nextGreg Kroah-Hartman
We need the kernfs changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-20bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncsDavid Vernet
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal to the verifier that it should enforce that a BPF program passes it a "safe", trusted pointer. Currently, "safe" means that the pointer is either PTR_TO_CTX, or is refcounted. There may be cases, however, where the kernel passes a BPF program a safe / trusted pointer to an object that the BPF program wishes to use as a kptr, but because the object does not yet have a ref_obj_id from the perspective of the verifier, the program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS kfunc. The solution is to expand the set of pointers that are considered trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs with these pointers without getting rejected by the verifier. There is already a PTR_UNTRUSTED flag that is set in some scenarios, such as when a BPF program reads a kptr directly from a map without performing a bpf_kptr_xchg() call. These pointers of course can and should be rejected by the verifier. Unfortunately, however, PTR_UNTRUSTED does not cover all the cases for safety that need to be addressed to adequately protect kfuncs. Specifically, pointers obtained by a BPF program "walking" a struct are _not_ considered PTR_UNTRUSTED according to BPF. For example, say that we were to add a kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal that a task was unsafe to pass to a kfunc, the verifier would mistakenly allow the following unsafe BPF program to be loaded: SEC("tp_btf/task_newtask") int BPF_PROG(unsafe_acquire_task, struct task_struct *task, u64 clone_flags) { struct task_struct *acquired, *nested; nested = task->last_wakee; /* Would not be rejected by the verifier. */ acquired = bpf_task_acquire(nested); if (!acquired) return 0; bpf_task_release(acquired); return 0; } To address this, this patch defines a new type flag called PTR_TRUSTED which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are passed directly from the kernel as a tracepoint or struct_ops callback argument. Any nested pointer that is obtained from walking a PTR_TRUSTED pointer is no longer PTR_TRUSTED. From the example above, the struct task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer obtained from 'task->last_wakee' is not PTR_TRUSTED. A subsequent patch will add kfuncs for storing a task kfunc as a kptr, and then another patch will add selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-18net: dsa: tag_mtk: assign per-port queuesFelix Fietkau
Keeps traffic sent to the switch within link speed limits Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221116080734.44013-6-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18netlink: remove the flex array from struct nlmsghdrJakub Kicinski
I've added a flex array to struct nlmsghdr in commit 738136a0e375 ("netlink: split up copies in the ack construction") to allow accessing the data easily. It leads to warnings with clang, if user space wraps this structure into another struct and the flex array is not at the end of the container. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/all/20221114023927.GA685@u2004-local/ Link: https://lore.kernel.org/r/20221118033903.1651026-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18bpf, docs: DEVMAPs and XDP_REDIRECTMaryam Tahhan
Add documentation for BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH including kernel version introduced, usage and examples. Add documentation that describes XDP_REDIRECT. Signed-off-by: Maryam Tahhan <mtahhan@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221115144921.165483-1-mtahhan@redhat.com
2022-11-18netfilter: nf_tables: do not set up extensions for end intervalPablo Neira Ayuso
Elements with an end interval flag set on do not store extensions. The global set definition is currently setting on the timeout and stateful expression for end interval elements. This leads to skipping end interval elements from the set->ops->walk() path as the expired check bogusly reports true. Moreover, do not set up stateful expressions for elements with end interval flag set on since this is never used. Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-18netfilter: conntrack: Fix data-races around ct markDaniel Xu
nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-189p/fd: Use P9_HDRSZ for header sizeGUO Zihua
Cleanup hardcoded header sizes to use P9_HDRSZ instead of '7' Link: https://lkml.kernel.org/r/20221117091159.31533-4-guozihua@huawei.com Signed-off-by: GUO Zihua <guozihua@huawei.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: commit message adjusted to make sense after offset size adjustment got removed] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-11-189p/fd: Fix write overflow in p9_read_workGUO Zihua
This error was reported while fuzzing: BUG: KASAN: slab-out-of-bounds in _copy_to_iter+0xd35/0x1190 Write of size 4043 at addr ffff888008724eb1 by task kworker/1:1/24 CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 6.1.0-rc5-00002-g1adf73218daa-dirty #223 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Workqueue: events p9_read_work Call Trace: <TASK> dump_stack_lvl+0x4c/0x64 print_report+0x178/0x4b0 kasan_report+0xae/0x130 kasan_check_range+0x179/0x1e0 memcpy+0x38/0x60 _copy_to_iter+0xd35/0x1190 copy_page_to_iter+0x1d5/0xb00 pipe_read+0x3a1/0xd90 __kernel_read+0x2a5/0x760 kernel_read+0x47/0x60 p9_read_work+0x463/0x780 process_one_work+0x91d/0x1300 worker_thread+0x8c/0x1210 kthread+0x280/0x330 ret_from_fork+0x22/0x30 </TASK> Allocated by task 457: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7e/0x90 __kmalloc+0x59/0x140 p9_fcall_init.isra.11+0x5d/0x1c0 p9_tag_alloc+0x251/0x550 p9_client_prepare_req+0x162/0x350 p9_client_rpc+0x18d/0xa90 p9_client_create+0x670/0x14e0 v9fs_session_init+0x1fd/0x14f0 v9fs_mount+0xd7/0xaf0 legacy_get_tree+0xf3/0x1f0 vfs_get_tree+0x86/0x2c0 path_mount+0x885/0x1940 do_mount+0xec/0x100 __x64_sys_mount+0x1a0/0x1e0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd This BUG pops up when trying to reproduce https://syzkaller.appspot.com/bug?id=6c7cd46c7bdd0e86f95d26ec3153208ad186f9fa The callstack is different but the issue is valid and re-producable with the same re-producer in the link. The root cause of this issue is that we check the size of the message received against the msize of the client in p9_read_work. However, it turns out that capacity is no longer consistent with msize. Thus, the message size should be checked against sdata capacity. As the msize is non-consistant with the capacity of the tag and as we are now checking message size against capacity directly, there is no point checking message size against msize. So remove it. Link: https://lkml.kernel.org/r/20221117091159.31533-2-guozihua@huawei.com Link: https://lkml.kernel.org/r/20221117091159.31533-3-guozihua@huawei.com Reported-by: syzbot+0f89bd13eaceccc0e126@syzkaller.appspotmail.com Fixes: 60ece0833b6c ("net/9p: allocate appropriate reduced message buffers") Signed-off-by: GUO Zihua <guozihua@huawei.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: squash patches 1 & 2 and fix size including header part] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-11-189p/fd: fix issue of list_del corruption in p9_fd_cancel()Zhengchao Shao
Syz reported the following issue: kernel BUG at lib/list_debug.c:53! invalid opcode: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72 Call Trace: <TASK> p9_fd_cancel+0xb1/0x270 p9_client_rpc+0x8ea/0xba0 p9_client_create+0x9c0/0xed0 v9fs_session_init+0x1e0/0x1620 v9fs_mount+0xba/0xb80 legacy_get_tree+0x103/0x200 vfs_get_tree+0x89/0x2d0 path_mount+0x4c0/0x1ac0 __x64_sys_mount+0x33b/0x430 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> The process is as follows: Thread A: Thread B: p9_poll_workfn() p9_client_create() ... ... p9_conn_cancel() p9_fd_cancel() list_del() ... ... list_del() //list_del corruption There is no lock protection when deleting list in p9_conn_cancel(). After deleting list in Thread A, thread B will delete the same list again. It will cause issue of list_del corruption. Setting req->status to REQ_STATUS_ERROR under lock prevents other cleanup paths from trying to manipulate req_list. The other thread can safely check req->status because it still holds a reference to req at this point. Link: https://lkml.kernel.org/r/20221110122606.383352-1-shaozhengchao@huawei.com Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier") Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> [Dominique: add description of the fix in commit message] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-11-18nfc/nci: fix race with opening and closingLin Ma
Previously we leverage NCI_UNREG and the lock inside nci_close_device to prevent the race condition between opening a device and closing a device. However, it still has problem because a failed opening command will erase the NCI_UNREG flag and allow another opening command to bypass the status checking. This fix corrects that by making sure the NCI_UNREG is held. Reported-by: syzbot+43475bf3cfbd6e41f5b7@syzkaller.appspotmail.com Fixes: 48b71a9e66c2 ("NFC: add NCI_UNREG flag to eliminate the race") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18mrp: introduce active flags to prevent UAF when applicant uninitSchspa Shi
The caller of del_timer_sync must prevent restarting of the timer, If we have no this synchronization, there is a small probability that the cancellation will not be successful. And syzbot report the fellowing crash: ================================================================== BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:929 [inline] BUG: KASAN: use-after-free in enqueue_timer+0x18/0xa4 kernel/time/timer.c:605 Write at addr f9ff000024df6058 by task syz-fuzzer/2256 Pointer tag: [f9], memory tag: [fe] CPU: 1 PID: 2256 Comm: syz-fuzzer Not tainted 6.1.0-rc5-syzkaller-00008- ge01d50cbd6ee #0 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace.part.0+0xe0/0xf0 arch/arm64/kernel/stacktrace.c:156 dump_backtrace arch/arm64/kernel/stacktrace.c:162 [inline] show_stack+0x18/0x40 arch/arm64/kernel/stacktrace.c:163 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x68/0x84 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x1a8/0x4a0 mm/kasan/report.c:395 kasan_report+0x94/0xb4 mm/kasan/report.c:495 __do_kernel_fault+0x164/0x1e0 arch/arm64/mm/fault.c:320 do_bad_area arch/arm64/mm/fault.c:473 [inline] do_tag_check_fault+0x78/0x8c arch/arm64/mm/fault.c:749 do_mem_abort+0x44/0x94 arch/arm64/mm/fault.c:825 el1_abort+0x40/0x60 arch/arm64/kernel/entry-common.c:367 el1h_64_sync_handler+0xd8/0xe4 arch/arm64/kernel/entry-common.c:427 el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:576 hlist_add_head include/linux/list.h:929 [inline] enqueue_timer+0x18/0xa4 kernel/time/timer.c:605 mod_timer+0x14/0x20 kernel/time/timer.c:1161 mrp_periodic_timer_arm net/802/mrp.c:614 [inline] mrp_periodic_timer+0xa0/0xc0 net/802/mrp.c:627 call_timer_fn.constprop.0+0x24/0x80 kernel/time/timer.c:1474 expire_timers+0x98/0xc4 kernel/time/timer.c:1519 To fix it, we can introduce a new active flags to make sure the timer will not restart. Reported-by: syzbot+6fd64001c20aa99e34a4@syzkaller.appspotmail.com Signed-off-by: Schspa Shi <schspa@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18Merge tag 'rxrpc-next-20221116' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix oops and missing config conditionals The patches that were pulled into net-next previously[1] had some issues that this patchset fixes: (1) Fix missing IPV6 config conditionals. (2) Fix an oops caused by calling udpv6_sendmsg() directly on an AF_INET socket. (3) Fix the validation of network addresses on entry to socket functions so that we don't allow an AF_INET6 address if we've selected an AF_INET transport socket. Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/ [1] ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]David Howells
After rxrpc_unbundle_conn() has removed a connection from a bundle, it checks to see if there are any conns with available channels and, if not, removes and attempts to destroy the bundle. Whilst it does check after grabbing client_bundles_lock that there are no connections attached, this races with rxrpc_look_up_bundle() retrieving the bundle, but not attaching a connection for the connection to be attached later. There is therefore a window in which the bundle can get destroyed before we manage to attach a new connection to it. Fix this by adding an "active" counter to struct rxrpc_bundle: (1) rxrpc_connect_call() obtains an active count by prepping/looking up a bundle and ditches it before returning. (2) If, during rxrpc_connect_call(), a connection is added to the bundle, this obtains an active count, which is held until the connection is discarded. (3) rxrpc_deactivate_bundle() is created to drop an active count on a bundle and destroy it when the active count reaches 0. The active count is checked inside client_bundles_lock() to prevent a race with rxrpc_look_up_bundle(). (4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle(). Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: zdi-disclosures@trendmicro.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18net: fix napi_disable() logic errorEric Dumazet
Dan reported a new warning after my recent patch: New smatch warnings: net/core/dev.c:6409 napi_disable() error: uninitialized symbol 'new'. Indeed, we must first wait for STATE_SCHED and STATE_NPSVC to be cleared, to make sure @new variable has been initialized properly. Fixes: 4ffa1d1c6842 ("net: adopt try_cmpxchg() in napi_{enable|disable}()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18rxrpc: uninitialized variable in rxrpc_send_ack_packet()Dan Carpenter
The "pkt" was supposed to have been deleted in a previous patch. It leads to an uninitialized variable bug. Fixes: 72f0c6fb0579 ("rxrpc: Allocate ACK records at proposal and queue for transmission") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18rxrpc: fix rxkad_verify_response()Dan Carpenter
The error handling for if skb_copy_bits() fails was accidentally deleted so the rxkad_decrypt_ticket() function is not called. Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18Merge tag 'wireless-next-2022-11-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.2 Second set of patches for v6.2. Only driver patches this time, nothing really special. Unused platform data support was removed from wl1251 and rtw89 got WoWLAN support. Major changes: ath11k * support configuring channel dwell time during scan rtw89 * new dynamic header firmware format support * Wake-over-WLAN support rtl8xxxu * enable IEEE80211_HW_SUPPORT_FAST_XMIT ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18sctp: add sysctl net.sctp.l3mdev_acceptXin Long
This patch is to add sysctl net.sctp.l3mdev_accept to allow users to change the pernet global l3mdev_accept. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18sctp: add dif and sdif check in asoc and ep lookupXin Long
This patch at first adds a pernet global l3mdev_accept to decide if it accepts the packets from a l3mdev when a SCTP socket doesn't bind to any interface. It's set to 1 to avoid any possible incompatible issue, and in next patch, a sysctl will be introduced to allow to change it. Then similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is added to check either dif or sdif is equal to sk_bound_dev_if, and to check sid is 0 or l3mdev_accept is 1 if sk_bound_dev_if is not set. This function is used to match a association or a endpoint, namely called by sctp_addrs_lookup_transport() and sctp_endpoint_is_match(). All functions that needs updating are: sctp_rcv(): asoc: __sctp_rcv_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_lookup_harder() __sctp_rcv_init_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_walk_lookup() __sctp_rcv_asconf_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() ep: __sctp_rcv_lookup_endpoint() -> sctp_endpoint_is_match() sctp_connect(): sctp_endpoint_is_peeled_off() __sctp_lookup_association() sctp_has_association() sctp_lookup_association() __sctp_lookup_association() -> sctp_addrs_lookup_transport() sctp_diag_dump_one(): sctp_transport_lookup_process() -> sctp_addrs_lookup_transport() Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18sctp: add skb_sdif in struct sctp_afXin Long
Add skb_sdif function in struct sctp_af to get the enslaved device for both ipv4 and ipv6 when adding SCTP VRF support in sctp_rcv in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18sctp: check sk_bound_dev_if when matching ep in get_portXin Long
In sctp_get_port_local(), when binding to IP and PORT, it should also check sk_bound_dev_if to match listening sk if it's set by SO_BINDTOIFINDEX, so that multiple sockets with the same IP and PORT, but different sk_bound_dev_if can be listened at the same time. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18sctp: check ipv6 addr with sk_bound_dev if setXin Long
When binding to an ipv6 address, it calls ipv6_chk_addr() to check if this address is on any dev. If a socket binds to a l3mdev but no dev is passed to do this check, all l3mdev and slaves will be skipped and the check will fail. This patch is to pass the bound_dev to make sure the devices under the same l3mdev can be returned in ipv6_chk_addr(). When the bound_dev is not a l3mdev or l3slave, l3mdev_master_dev_rcu() will return NULL in __ipv6_chk_addr_and_flags(), it will keep compitable with before when NULL dev was passed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18sctp: verify the bind address with the tb_id from l3mdevXin Long
After binding to a l3mdev, it should use the route table from the corresponding VRF to verify the addr when binding to an address. Note ipv6 doesn't need it, as binding to ipv6 address does not verify the addr with route lookup. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-18net: neigh: decrement the family specific qlenThomas Zeitlhofer
Commit 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") introduced the length counter qlen in struct neigh_parms. There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and while the family specific qlen is incremented in pneigh_enqueue(), the mentioned commit decrements always the IPv4/ARP specific qlen, regardless of the currently processed family, in pneigh_queue_purge() and neigh_proxy_process(). As a result, with IPv6/ND, the family specific qlen is only incremented (and never decremented) until it exceeds PROXY_QLEN, and then, according to the check in pneigh_enqueue(), neighbor solicitations are not answered anymore. As an example, this is noted when using the subnet-router anycast address to access a Linux router. After a certain amount of time (in the observed case, qlen exceeded PROXY_QLEN after two days), the Linux router stops answering neighbor solicitations for its subnet-router anycast address and effectively becomes unreachable. Another result with IPv6/ND is that the IPv4/ARP specific qlen is decremented more often than incremented. This leads to negative qlen values, as a signed integer has been used for the length counter qlen, and potentially to an integer overflow. Fix this by introducing the helper function neigh_parms_qlen_dec(), which decrements the family specific qlen. Thereby, make use of the existing helper function neigh_get_dev_parms_rcu(), whose definition therefore needs to be placed earlier in neighbour.c. Take the family member from struct neigh_table to determine the currently processed family and appropriately call neigh_parms_qlen_dec() from pneigh_queue_purge() and neigh_proxy_process(). Additionally, use an unsigned integer for the length counter qlen. Fixes: 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-17sctp: move SCTP_PAD4 and SCTP_TRUNC4 to linux/sctp.hXin Long
Move these two macros from net/sctp/sctp.h to linux/sctp.h, so that it will be enough to include only linux/sctp.h in nft_exthdr.c and xt_sctp.c. It should not include "net/sctp/sctp.h" if a module does not have a dependence on SCTP module. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Link: https://lore.kernel.org/r/ef6468a687f36da06f575c2131cd4612f6b7be88.1668526821.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17devlink: Allow to set up parent in devl_rate_leaf_create()Michal Wilczynski
Currently the driver is able to create leaf nodes for the devlink-rate, but is unable to set parent for them. This wasn't as issue before the possibility to export hierarchy from the driver. After adding the export feature, in order for the driver to supply correct hierarchy, it's necessary for it to be able to supply a parent name to devl_rate_leaf_create(). Introduce a new parameter 'parent_name' in devl_rate_leaf_create(). Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17devlink: Allow for devlink-rate nodes parent reassignmentMichal Wilczynski
Currently it's not possible to reassign the parent of the node using one command. As the previous commit introduced a way to export entire hierarchy from the driver, being able to modify and reassign parents become important. This way user might easily change QoS settings without interrupting traffic. Example command: devlink port function rate set pci/0000:4b:00.0/1 parent node_custom_1 This reassigns leaf node parent to node_custom_1. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17devlink: Enable creation of the devlink-rate nodes from the driverMichal Wilczynski
Intel 100G card internal firmware hierarchy for Hierarchicial QoS is very rigid and can't be easily removed. This requires an ability to export default hierarchy to allow user to modify it. Currently the driver is only able to create the 'leaf' nodes, which usually represent the vport. This is not enough for HQoS implemented in Intel hardware. Introduce new function devl_rate_node_create() that allows for creation of the devlink-rate nodes from the driver. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17devlink: Introduce new attribute 'tx_weight' to devlink-rateMichal Wilczynski
To fully utilize offload capabilities of Intel 100G card QoS capabilities new attribute 'tx_weight' needs to be introduced. This attribute allows for usage of Weighted Fair Queuing arbitration scheme among siblings. This arbitration scheme can be used simultaneously with the strict priority. Introduce new attribute in devlink-rate that will allow for configuration of Weighted Fair Queueing. New attribute is optional. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17devlink: Introduce new attribute 'tx_priority' to devlink-rateMichal Wilczynski
To fully utilize offload capabilities of Intel 100G card QoS capabilities new attribute 'tx_priority' needs to be introduced. This attribute allows for usage of strict priority arbiter among siblings. This arbitration scheme attempts to schedule nodes based on their priority as long as the nodes remain within their bandwidth limit. Introduce new attribute in devlink-rate that will allow for configuration of strict priority. New attribute is optional. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: autoload tag driver module on tagging protocol changeVladimir Oltean
Issue a request_module() call when an attempt to change the tagging protocol is made, either by sysfs or by device tree. In the case of ocelot (the only driver for which the default and the alternative tagging protocol are compiled as different modules), the user is now no longer required to insert tag_ocelot_8021q.ko manually. In the particular case of ocelot, this solves a problem where tag_ocelot_8021q.ko is built as module, and this is present in the device tree: &mscc_felix_port4 { dsa-tag-protocol = "ocelot-8021q"; }; &mscc_felix_port5 { dsa-tag-protocol = "ocelot-8021q"; }; Because no one attempts to load the module into the kernel at boot time, the switch driver will fail to probe (actually forever defer) until someone manually inserts tag_ocelot_8021q.ko. This is now no longer necessary and happens automatically. Rename dsa_find_tagger_by_name() to denote the change in functionality: there is now feature parity with dsa_tag_driver_get_by_id(), i.o.w. we also load the module if it's missing. Link: https://lore.kernel.org/lkml/20221027113248.420216-1-michael@walle.cc/ Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: rename dsa_tag_driver_get() to dsa_tag_driver_get_by_id()Vladimir Oltean
A future patch will introduce one more way of getting a reference on a tagging protocl driver (by name). Rename the current method to "by_id". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: strip sysfs "tagging" string of trailing newlineVladimir Oltean
Currently, dsa_find_tagger_by_name() uses sysfs_streq() which works both with strings that contain \n at the end (echo ocelot > .../dsa/tagging) and with strings that don't (printf ocelot > .../dsa/tagging). There will be a problem once we'll want to construct the modalias string based on which we auto-load the protocol kernel module. If the sysfs buffer ends in a newline, we need to strip it first. This is a preparatory patch specifically for that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: provide a second modalias to tag proto drivers based on their nameVladimir Oltean
Currently, tagging protocol drivers have a modalias of "dsa_tag:id-<number>", where the number is one of DSA_TAG_PROTO_*_VALUE. This modalias makes it possible for the request_module() call in dsa_tag_driver_get() to work, given the input it has - an integer returned by ds->ops->get_tag_protocol(). It is also possible to change tagging protocols at (pseudo-)runtime, via sysfs or via device tree, and this works via the name string of the tagging protocol rather than via its id (DSA_TAG_PROTO_*_VALUE). In the latter case, there is no request_module() call, because there is no association that the DSA core has between the string name and the ID, to construct the modalias. The module is simply assumed to have been inserted. This is actually slightly problematic when the tagging protocol change should take place at probe time, since it's expected that the dependency module should get autoloaded. For this purpose, let's introduce a second modalias, so that the DSA core can call request_module() by name. There is no reason to make the modalias by name optional, so just modify the MODULE_ALIAS_DSA_TAG_DRIVER() macro to take both the ID and the name as arguments, and generate two modaliases behind the scenes. Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 w/ ocelot_8021q Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: rename tagging protocol driver modaliasVladimir Oltean
It's autumn cleanup time, and today's target are modaliases. Michael says that for users of modinfo, "dsa_tag-20" is not the most suggestive name, and recommends a change to "dsa_tag-id-20". Andrew points out that other modaliases have a prefix delimited by colons, so he recommends "dsa_tag:20" instead of "dsa_tag-20". To satisfy both proposals, Florian recommends "dsa_tag:id-20". The modaliases are not stable ABI, and the essential information (protocol ID) is still conveyed in the new string, which request_module() must be adapted to form. Link: 20221027210830.3577793-1-vladimir.oltean@nxp.com Suggested-by: Andrew Lunn <andrew@lunn.ch> Suggested-by: Michael Walle <michael@walle.cc> Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: stop exposing tag proto module helpers to the worldVladimir Oltean
The DSA tagging protocol driver macros are in the public include/net/dsa.h probably because that's also where the DSA_TAG_PROTO_*_VALUE macros are (MODULE_ALIAS_DSA_TAG_DRIVER hinges on those macro definitions). But there is no reason to expose these helpers to <net/dsa.h>. That header is shared between switch drivers (drivers/net/dsa/), tagging protocol drivers (net/dsa/tag_*.c), the DSA core (net/dsa/ sans tag_*.c), and the rest of the world (DSA master drivers, network stack, etc). Too much exposure. On the other hand, net/dsa/dsa_priv.h is included only by the DSA core and by DSA tagging protocol drivers (or IOW, "friend" modules). Also a bit too much exposure - I've contemplated creating a new header which is only included by tagging protocol drivers, but completely separating a new dsa_tag_proto.h from dsa_priv.h is not immediately trivial - for example dsa_slave_to_port() is used both from the fast path and from the control path. So for now, move these definitions to dsa_priv.h which at least hides them from the world. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: set name_assign_type to NET_NAME_ENUM for enumerated user portsRasmus Villemoes
When a user port does not have a label in device tree, and we thus fall back to the eth%d scheme, the proper constant to use is NET_NAME_ENUM. See also commit e9f656b7a214 ("net: ethernet: set default assignment identifier to NET_NAME_ENUM"), which in turn quoted commit 685343fc3ba6 ("net: add name_assign_type netdev attribute"): ... when the kernel has given the interface a name using global device enumeration based on order of discovery (ethX, wlanY, etc) ... are labelled NET_NAME_ENUM. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.faineli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: use NET_NAME_PREDICTABLE for user ports with name given in DTRasmus Villemoes
When a user port has a label in device tree, the corresponding netdevice is, to quote include/uapi/linux/netdevice.h, "predictably named by the kernel". This is also explicitly one of the intended use cases for NET_NAME_PREDICTABLE, quoting 685343fc3ba6 ("net: add name_assign_type netdev attribute"): NET_NAME_PREDICTABLE: The ifname has been assigned by the kernel in a predictable way [...] Examples include [...] and names deduced from hardware properties (including being given explicitly by the firmware). Expose that information properly for the benefit of userspace tools that make decisions based on the name_assign_type attribute, e.g. a systemd-udev rule with "kernel" in NamePolicy. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.faineli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17net: dsa: refactor name assignment for user portsRasmus Villemoes
The following two patches each have a (small) chance of causing regressions for userspace and will in that case of course need to be reverted. In order to prepare for that and make those two patches independent and individually revertable, refactor the code which sets the names for user ports by moving the "fall back to eth%d if no label is given in device tree" to dsa_slave_create(). No functional change (at least none intended). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.faineli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-18treewide: use get_random_u32_inclusive() when possibleJason A. Donenfeld
These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_{above,below}() instead of manual loopJason A. Donenfeld
These cases were done with this Coccinelle: @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I > E); + I = get_random_u32_below(E + 1); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I >= E); + I = get_random_u32_below(E); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I < E); + I = get_random_u32_above(E - 1); @@ expression E; identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I <= E); + I = get_random_u32_above(E); @@ identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (!I); + I = get_random_u32_above(0); @@ identifier I; @@ - do { ... when != I - I = get_random_u32(); ... when != I - } while (I == 0); + I = get_random_u32_above(0); @@ expression E; @@ - E + 1 + get_random_u32_below(U32_MAX - E) + get_random_u32_above(E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>