summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-07-13bpf: Tidy up verifier check_func_arg()Joanne Koong
This patch does two things: 1. For matching against the arg type, the match should be against the base type of the arg type, since the arg type can have different bpf_type_flags set on it. 2. Uses switch casing to improve readability + efficiency. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220712210603.123791-1-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-13libbpf: Error out when binary_path is NULL for uprobe and USDTHengqi Chen
binary_path is a required non-null parameter for bpf_program__attach_usdt and bpf_program__attach_uprobe_opts. Check it against NULL to prevent coredump on strchr. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220712025745.2703995-1-hengqi.chen@gmail.com
2022-07-12bpf: Make non-preallocated allocation low priorityYafang Shao
GFP_ATOMIC doesn't cooperate well with memcg pressure so far, especially if we allocate too much GFP_ATOMIC memory. For example, when we set the memcg limit to limit a non-preallocated bpf memory, the GFP_ATOMIC can easily break the memcg limit by force charge. So it is very dangerous to use GFP_ATOMIC in non-preallocated case. One way to make it safe is to remove __GFP_HIGH from GFP_ATOMIC, IOW, use (__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM) instead, then it will be limited if we allocate too much memory. There's a plan to completely remove __GFP_ATOMIC in the mm side[1], so let's use GFP_NOWAIT instead. We introduced BPF_F_NO_PREALLOC is because full map pre-allocation is too memory expensive for some cases. That means removing __GFP_HIGH doesn't break the rule of BPF_F_NO_PREALLOC, but has the same goal with it-avoiding issues caused by too much memory. So let's remove it. This fix can also apply to other run-time allocations, for example, the allocation in lpm trie, local storage and devmap. So let fix it consistently over the bpf code It also fixes a typo in the comment. [1]. https://lore.kernel.org/linux-mm/163712397076.13692.4727608274002939094@noble.neil.brown.name/ Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20220709154457.57379-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12bpf, x86: fix freeing of not-finalized bpf_prog_packSong Liu
syzbot reported a few issues with bpf_prog_pack [1], [2]. This only happens with multiple subprogs. In jit_subprogs(), we first call bpf_int_jit_compile() on each sub program. And then, we call it on each sub program again. jit_data is not freed in the first call of bpf_int_jit_compile(). Similarly we don't call bpf_jit_binary_pack_finalize() in the first call of bpf_int_jit_compile(). If bpf_int_jit_compile() failed for one sub program, we will call bpf_jit_binary_pack_finalize() for this sub program. However, we don't have a chance to call it for other sub programs. Then we will hit "goto out_free" in jit_subprogs(), and call bpf_jit_free on some subprograms that haven't got bpf_jit_binary_pack_finalize() yet. At this point, bpf_jit_binary_pack_free() is called and the whole 2MB page is freed erroneously. Fix this with a custom bpf_jit_free() for x86_64, which calls bpf_jit_binary_pack_finalize() if necessary. Also, with custom bpf_jit_free(), bpf_prog_aux->use_bpf_prog_pack is not needed any more, remove it. Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") [1] https://syzkaller.appspot.com/bug?extid=2f649ec6d2eea1495a8f [2] https://syzkaller.appspot.com/bug?extid=87f65c75f4a72db05445 Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220706002612.4013790-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12bpf: reparent bpf maps on memcg offliningRoman Gushchin
The memory consumed by a bpf map is always accounted to the memory cgroup of the process which created the map. The map can outlive the memory cgroup if it's used by processes in other cgroups or is pinned on bpffs. In this case the map pins the original cgroup in the dying state. For other types of objects (slab objects, non-slab kernel allocations, percpu objects and recently LRU pages) there is a reparenting process implemented: on cgroup offlining charged objects are getting reassigned to the parent cgroup. Because all charges and statistics are fully recursive it's a fairly cheap operation. For efficiency and consistency with other types of objects, let's do the same for bpf maps. Fortunately thanks to the objcg API, the required changes are minimal. Please, note that individual allocations (slabs, percpu and large kmallocs) already have the reparenting mechanism. This commit adds it to the saved map->memcg pointer by replacing it to map->objcg. Because dying cgroups are not visible for a user and all charges are recursive, this commit doesn't bring any behavior changes for a user. v2: added a missing const qualifier Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20220711162827.184743-1-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12Merge branch 'bpf: add a ksym BPF iterator'Alexei Starovoitov
Alan Maguire says: ==================== a ksym BPF iterator would be useful as it would allow more flexible interactions with kernel symbols than are currently supported; it could for example create more efficient map representations for lookup, speed up symbol resolution etc. The idea was initially discussed here [1]. Changes since v5 [2]: - no need to add kallsym_iter to bpf_iter.h as it has existed in kernels for a long time so will by in vmlinux.h for older kernels too, unlike struct bpf_iter__ksym (Yonghong, patch 2) Changes since v4 [3]: - add BPF_ITER_RESCHED to improve responsiveness (Hao, patch 1) - remove pr_warn to be consistent with other iterators (Andrii, patch 1) - add definitions to bpf_iter.h to ensure iter tests build on older kernels (Andrii, patch 2) Changes since v3 [4]: - use late_initcall() to register iter; means we are both consistent with other iters and can encapsulate all iter-specific code in kallsyms.c in CONFIG_BPF_SYSCALL (Alexei, Yonghong, patch 1). Changes since v2 [5]: - set iter->show_value on initialization based on current creds and use it in selftest to determine if we show values (Yonghong, patches 1/2) - inline iter registration into kallsyms_init (Yonghong, patch 1) Changes since RFC [6]: - change name of iterator (and associated structures/fields) to "ksym" (Andrii, patches 1, 2) - remove dependency on CONFIG_PROC_FS; it was used for other BPF iterators, and I assumed it was needed because of seq ops but I don't think it is required on digging futher (Andrii, patch 1) [1] https://lore.kernel.org/all/YjRPZj6Z8vuLeEZo@krava/ [2] https://lore.kernel.org/bpf/1657490998-31468-1-git-send-email-alan.maguire@oracle.com/ [3] https://lore.kernel.org/bpf/1657113391-5624-1-git-send-email-alan.maguire@oracle.com/ [4] https://lore.kernel.org/bpf/1656942916-13491-1-git-send-email-alan.maguire@oracle.com [5] https://lore.kernel.org/bpf/1656667620-18718-1-git-send-email-alan.maguire@oracle.com/ [6] https://lore.kernel.org/all/1656089118-577-1-git-send-email-alan.maguire@oracle.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12selftests/bpf: add a ksym iter subtestAlan Maguire
add subtest verifying BPF ksym iter behaviour. The BPF ksym iter program shows an example of dumping a format different to /proc/kallsyms. It adds KIND and MAX_SIZE fields which represent the kind of symbol (core kernel, module, ftrace, bpf, or kprobe) and the maximum size the symbol can be. The latter is calculated from the difference between current symbol value and the next symbol value. The key benefit for this iterator will likely be supporting in-kernel data-gathering rather than dumping symbol details to userspace and parsing the results. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/1657629105-7812-3-git-send-email-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-12bpf: add a ksym BPF iteratorAlan Maguire
add a "ksym" iterator which provides access to a "struct kallsym_iter" for each symbol. Intent is to support more flexible symbol parsing as discussed in [1]. [1] https://lore.kernel.org/all/YjRPZj6Z8vuLeEZo@krava/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/1657629105-7812-2-git-send-email-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-11bpf: Fix 'dubious one-bit signed bitfield' warningsMatthieu Baerts
Our CI[1] reported these warnings when using Sparse: $ touch net/mptcp/bpf.c $ make C=1 net/mptcp/bpf.o net/mptcp/bpf.c: note: in included file: include/linux/bpf_verifier.h:348:26: error: dubious one-bit signed bitfield include/linux/bpf_verifier.h:349:29: error: dubious one-bit signed bitfield Set them as 'unsigned' to avoid warnings. [1] https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2643588487 Fixes: 1ade23711971 ("bpf: Inline calls to bpf_loop when callback is known") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220711081200.2081262-1-matthieu.baerts@tessares.net
2022-07-11samples/bpf: Fix xdp_redirect_map egress devmap progJesper Dangaard Brouer
LLVM compiler optimized out the memcpy in xdp_redirect_map_egress, which caused the Ethernet source MAC-addr to always be zero when enabling the devmap egress prog via cmdline --load-egress. Issue observed with LLVM version 14.0.0 - Shipped with Fedora 36 on target: x86_64-redhat-linux-gnu. In verbose mode print the source MAC-addr in case xdp_devmap_attached mode is used. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/165754826292.575614.5636444052787717159.stgit@firesoul
2022-07-11bpf, arm64: Add bpf trampoline for arm64Xu Kuohai
This is arm64 version of commit fec56f5890d9 ("bpf: Introduce BPF trampoline"). A bpf trampoline converts native calling convention to bpf calling convention and is used to implement various bpf features, such as fentry, fexit, fmod_ret and struct_ops. This patch does essentially the same thing that bpf trampoline does on x86. Tested on Raspberry Pi 4B and qemu: #18 /1 bpf_tcp_ca/dctcp:OK #18 /2 bpf_tcp_ca/cubic:OK #18 /3 bpf_tcp_ca/invalid_license:OK #18 /4 bpf_tcp_ca/dctcp_fallback:OK #18 /5 bpf_tcp_ca/rel_setsockopt:OK #18 bpf_tcp_ca:OK #51 /1 dummy_st_ops/dummy_st_ops_attach:OK #51 /2 dummy_st_ops/dummy_init_ret_value:OK #51 /3 dummy_st_ops/dummy_init_ptr_arg:OK #51 /4 dummy_st_ops/dummy_multiple_args:OK #51 dummy_st_ops:OK #57 /1 fexit_bpf2bpf/target_no_callees:OK #57 /2 fexit_bpf2bpf/target_yes_callees:OK #57 /3 fexit_bpf2bpf/func_replace:OK #57 /4 fexit_bpf2bpf/func_replace_verify:OK #57 /5 fexit_bpf2bpf/func_sockmap_update:OK #57 /6 fexit_bpf2bpf/func_replace_return_code:OK #57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK #57 /8 fexit_bpf2bpf/func_replace_multi:OK #57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK #57 fexit_bpf2bpf:OK #237 xdp_bpf2bpf:OK Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-5-xukuohai@huawei.com
2022-07-11bpf, arm64: Implement bpf_arch_text_poke() for arm64Xu Kuohai
Implement bpf_arch_text_poke() for arm64, so bpf prog or bpf trampoline can be patched with it. When the target address is NULL, the original instruction is patched to a NOP. When the target address and the source address are within the branch range, the original instruction is patched to a bl instruction to the target address directly. To support attaching bpf trampoline to both regular kernel function and bpf prog, we follow the ftrace patchsite way for bpf prog. That is, two instructions are inserted at the beginning of bpf prog, the first one saves the return address to x9, and the second is a nop which will be patched to a bl instruction when a bpf trampoline is attached. However, when a bpf trampoline is attached to bpf prog, the distance between target address and source address may exceed 128MB, the maximum branch range, because bpf trampoline and bpf prog are allocated separately with vmalloc. So long jump should be handled. When a bpf prog is constructed, a plt pointing to empty trampoline dummy_tramp is placed at the end: bpf_prog: mov x9, lr nop // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target This is also the state when no trampoline is attached. When a short-jump bpf trampoline is attached, the patchsite is patched to a bl instruction to the trampoline directly: bpf_prog: mov x9, lr bl <short-jump bpf trampoline address> // patchsite ... ret plt: ldr x10, target br x10 target: .quad dummy_tramp // plt target When a long-jump bpf trampoline is attached, the plt target is filled with the trampoline address and the patchsite is patched to a bl instruction to the plt: bpf_prog: mov x9, lr bl plt // patchsite ... ret plt: ldr x10, target br x10 target: .quad <long-jump bpf trampoline address> dummy_tramp is used to prevent another CPU from jumping to an unknown location during the patching process, making the patching process easier. The patching process is as follows: 1. when neither the old address or the new address is a long jump, the patchsite is replaced with a bl to the new address, or nop if the new address is NULL; 2. when the old address is not long jump but the new one is, the branch target address is written to plt first, then the patchsite is replaced with a bl instruction to the plt; 3. when the old address is long jump but the new one is not, the address of dummy_tramp is written to plt first, then the patchsite is replaced with a bl to the new address, or a nop if the new address is NULL; 4. when both the old address and the new address are long jump, the new address is written to plt and the patchsite is not changed. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
2022-07-11arm64: Add LDR (literal) instructionXu Kuohai
Add LDR (literal) instruction to load data from address relative to PC. This instruction will be used to implement long jump from bpf prog to bpf trampoline in the follow-up patch. The instruction encoding: 3 2 2 2 0 0 0 7 6 4 5 0 +-----+-------+---+-----+-------------------------------------+--------+ | 0 x | 0 1 1 | 0 | 0 0 | imm19 | Rt | +-----+-------+---+-----+-------------------------------------+--------+ for 32-bit, variant x == 0; for 64-bit, x == 1. branch_imm_common() is used to check the distance between pc and target address, since it's reused by this patch and LDR (literal) is not a branch instruction, rename it to label_imm_common(). Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/bpf/20220711150823.2128542-3-xukuohai@huawei.com
2022-07-11bpf: Remove is_valid_bpf_tramp_flags()Xu Kuohai
Before generating bpf trampoline, x86 calls is_valid_bpf_tramp_flags() to check the input flags. This check is architecture independent. So, to be consistent with x86, arm64 should also do this check before generating bpf trampoline. However, the BPF_TRAMP_F_XXX flags are not used by user code and the flags argument is almost constant at compile time, so this run time check is a bit redundant. Remove is_valid_bpf_tramp_flags() and add some comments to the usage of BPF_TRAMP_F_XXX flags, as suggested by Alexei. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220711150823.2128542-2-xukuohai@huawei.com
2022-07-11skmsg: Fix invalid last sg check in sk_msg_recvmsg()Liu Jian
In sk_psock_skb_ingress_enqueue function, if the linear area + nr_frags + frag_list of the SKB has NR_MSG_FRAG_IDS blocks in total, skb_to_sgvec will return NR_MSG_FRAG_IDS, then msg->sg.end will be set to NR_MSG_FRAG_IDS, and in addition, (NR_MSG_FRAG_IDS - 1) is set to the last SG of msg. Recv the msg in sk_msg_recvmsg, when i is (NR_MSG_FRAG_IDS - 1), the sk_msg_iter_var_next(i) will change i to 0 (not NR_MSG_FRAG_IDS), the judgment condition "msg_rx->sg.start==msg_rx->sg.end" and "i != msg_rx->sg.end" can not work. As a result, the processed msg cannot be deleted from ingress_msg list. But the length of all the sge of the msg has changed to 0. Then the next recvmsg syscall will process the msg repeatedly, because the length of sge is 0, the -EFAULT error is always returned. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220628123616.186950-1-liujian56@huawei.com
2022-07-11fddi/skfp: fix repeated words in commentsJilin Yuan
Delete the redundant word 'test'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11ethernet/via: fix repeated words in commentsJilin Yuan
Delete the redundant word 'driver'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11net: Find dst with sk's xfrm policy not ctl_sksewookseo
If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09ixp4xx_eth: Set MAC address from device treeLinus Walleij
If there is a MAC address specified in the device tree, then use it. This is already perfectly legal to specify in accordance with the generic ethernet-controller.yaml schema. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09ixp4xx_eth: Fall back to random MAC addressLinus Walleij
If the firmware does not provide a MAC address to the driver, fall back to generating a random MAC address. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09af_unix: fix unix_sysctl_register() error pathEric Dumazet
We want to kfree(table) if @table has been kmalloced, ie for non initial network namespace. Fixes: 849d5aa3a1d8 ("af_unix: Do not call kmemdup() for init_net's sysctl table.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09Merge branch 'mptcp-selftest-improvements-and-header-tweak'David S. Miller
Mat Martineau says: ==================== mptcp: Self test improvements and a header tweak Patch 1 moves a definition to a header so it can be used in a struct declaration. Patch 2 adjusts a time threshold for a selftest that runs much slower on debug kernels (and even more on slow CI infrastructure), to reduce spurious failures. Patches 3 & 4 improve userspace PM test coverage. Patches 5 & 6 clean up output from a test script and selftest helper tool. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09selftests: mptcp: update pm_nl_ctl usage headerGeliang Tang
The usage header of pm_nl_ctl command doesn't match with the context. So this patch adds the missing userspace PM keywords 'ann', 'rem', 'csf', 'dsf', 'events' and 'listen' in it. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09selftests: mptcp: avoid Terminated messages in userspace_pmGeliang Tang
There're some 'Terminated' messages in the output of userspace pm tests script after killing './pm_nl_ctl events' processes: Created network namespaces ns1, ns2 [OK] ./userspace_pm.sh: line 166: 13735 Terminated ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1 ./userspace_pm.sh: line 172: 13737 Terminated ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1 Established IPv4 MPTCP Connection ns2 => ns1 [OK] ./userspace_pm.sh: line 166: 13753 Terminated ip netns exec "$ns2" ./pm_nl_ctl events >> "$client_evts" 2>&1 ./userspace_pm.sh: line 172: 13755 Terminated ip netns exec "$ns1" ./pm_nl_ctl events >> "$server_evts" 2>&1 Established IPv6 MPTCP Connection ns2 => ns1 [OK] ADD_ADDR 10.0.2.2 (ns2) => ns1, invalid token [OK] This patch adds a helper kill_wait(), in it using 'wait $pid 2>/dev/null' commands after 'kill $pid' to avoid printing out these Terminated messages. Use this helper instead of using 'kill $pid'. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09selftests: mptcp: userspace pm subflow testsGeliang Tang
This patch adds userspace pm subflow tests support for mptcp_join.sh script. Add userspace pm create subflow and destroy test cases in userspace_tests(). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09selftests: mptcp: userspace pm address testsGeliang Tang
This patch adds userspace pm tests support for mptcp_join.sh script. Add userspace pm add_addr and rm_addr test cases in userspace_tests(). Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09selftests: mptcp: tweak simult_flows for debug kernelsPaolo Abeni
The mentioned test measures the transfer run-time to verify that the user-space program is able to use the full aggregate B/W. Even on (virtual) link-speed-bound tests, debug kernel can slow down the transfer enough to cause sporadic test failures. Instead of unconditionally raising the maximum allowed run-time, tweak when the running kernel is a debug one, and use some simple/ rough heuristic to guess such scenarios. Note: this intentionally avoids looking for /boot/config-<version> as the latter file is not always available in our reference CI environments. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.hGeliang Tang
Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to include/net/mptcp.h. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08bcm63xx_enet: change the driver variables to staticYang Yingliang
bcm63xx_enetsw_driver and bcm63xx_enet_driver are only used in bcm63xx_enet.c now, change them to static. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220707135801.1483941-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08net: phylink: fix SGMII inband autoneg enableRussell King (Oracle)
When we are operating in SGMII inband mode, it implies that there is a PHY connected, and the ethtool advertisement for autoneg applies to the PHY, not the SGMII link. When in 1000base-X mode, then this applies to the 802.3z link and needs to be applied to the PCS. Fix this. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1o9Ng2-005Qbe-3H@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08Documentation: add a description for net.core.high_order_alloc_disableAntoine Tenart
A description is missing for the net.core.high_order_alloc_disable option in admin-guide/sysctl/net.rst ; add it. The above sysctl option was introduced by commit ce27ec60648d ("net: add high_order_alloc_disable sysctl/static key"). Thanks to Eric for running again the benchmark cited in the above commit, showing this knob is now mostly of historical importance. Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220707080245.180525-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08net: rxrpc: fix clang -Wformat warningJustin Stitt
When building with Clang we encounter this warning: | net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short' | but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat] | _leave(" = %d [set %hx]", ret, y); y is a u32 but the format specifier is `%hx`. Going from unsigned int to short int results in a loss of data. This is surely not intended behavior. If it is intended, the warning should be suppressed through other means. This patch should get us closer to the goal of enabling the -Wformat flag for Clang builds. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08Merge branch 'tls-pad-strparser-internal-header-decrypt_ctx-etc'Jakub Kicinski
Jakub Kicinski says: ==================== tls: pad strparser, internal header, decrypt_ctx etc. A grab bag of non-functional refactoring to make the series which will let us decrypt into a fresh skb smaller. Patches in this series are not strictly required to get the decryption into a fresh skb going, they are more in the "things which had been annoying me for a while" category. ==================== Link: https://lore.kernel.org/r/20220708010314.1451462-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08tls: rx: make tls_wait_data() return an recvmsg retcodeJakub Kicinski
tls_wait_data() sets the return code as an output parameter and always returns ctx->recv_pkt on success. Return the error code directly and let the caller read the skb from the context. Use positive return code to indicate ctx->recv_pkt is ready. While touching the definition of the function rename it. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08tls: create an internal headerJakub Kicinski
include/net/tls.h is getting a little long, and is probably hard for driver authors to navigate. Split out the internals into a header which will live under net/tls/. While at it move some static inlines with a single user into the source files, add a few tls_ prefixes and fix spelling of 'proccess'. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08tls: rx: coalesce exit paths in tls_decrypt_sg()Jakub Kicinski
Jump to the free() call, instead of having to remember to free the memory in multiple places. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08tls: rx: wrap decrypt params in a structJakub Kicinski
The max size of iv + aad + tail is 22B. That's smaller than a single sg entry (32B). Don't bother with the memory packing, just create a struct which holds the max size of those members. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08tls: rx: always allocate max possible aad size for decryptJakub Kicinski
AAD size is either 5 or 13. Really no point complicating the code for the 8B of difference. This will also let us turn the chunked up buffer into a sane struct. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08strparser: pad sk_skb_cb to avoid straddling cachelinesJakub Kicinski
sk_skb_cb lives within skb->cb[]. skb->cb[] straddles 2 cache lines, each containing 24B of data. The first cache line does not contain much interesting information for users of strparser, so pad things a little. Previously strp_msg->full_len would live in the first cache line and strp_msg->offset in the second. We need to reorder the 8 byte temp_reg with struct tls_msg to prevent a 4B hole which would push the struct over 48B. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/nMaxim Mikityanskiy
When CONFIG_NF_CONNTRACK=m, struct bpf_ct_opts and enum member BPF_F_CURRENT_NETNS are not exposed. This commit allows building the xdp_synproxy selftest in such cases. Note that nf_conntrack must be loaded before running the test if it's compiled as a module. This commit also allows this selftest to be successfully compiled when CONFIG_NF_CONNTRACK is disabled. One unused local variable of type struct bpf_ct_opts is also removed. Fixes: fb5cd0ce70d4 ("selftests/bpf: Add selftests for raw syncookie helpers") Reported-by: Yauheni Kaliuta <ykaliuta@redhat.com> Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220708130319.1016294-1-maximmi@nvidia.com
2022-07-08bpf: Correctly propagate errors up from bpf_core_composites_matchDaniel Müller
This change addresses a comment made earlier [0] about a missing return of an error when __bpf_core_types_match is invoked from bpf_core_composites_match, which could have let to us erroneously ignoring errors. Regarding the typedef name check pointed out in the same context, it is not actually an issue, because callers of the function perform a name check for the root type anyway. To make that more obvious, let's add comments to the function (similar to what we have for bpf_core_types_are_compat, which is called in pretty much the same context). [0]: https://lore.kernel.org/bpf/165708121449.4919.13204634393477172905.git-patchwork-notify@kernel.org/T/#m55141e8f8cfd2e8d97e65328fa04852870d01af6 Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220707211931.3415440-1-deso@posteo.net
2022-07-08libbpf: Disable SEC pragma macro on GCCJames Hilliard
It seems the gcc preprocessor breaks with pragmas when surrounding __attribute__. Disable these pragmas on GCC due to upstream bugs see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90400 Fixes errors like: error: expected identifier or '(' before '#pragma' 106 | SEC("cgroup/bind6") | ^~~ error: expected '=', ',', ';', 'asm' or '__attribute__' before '#pragma' 114 | char _license[] SEC("license") = "GPL"; | ^~~ Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220706111839.1247911-1-james.hilliard1@gmail.com
2022-07-08bpf: Check attach_func_proto more carefully in check_return_codeStanislav Fomichev
Syzkaller reports the following crash: RIP: 0010:check_return_code kernel/bpf/verifier.c:10575 [inline] RIP: 0010:do_check kernel/bpf/verifier.c:12346 [inline] RIP: 0010:do_check_common+0xb3d2/0xd250 kernel/bpf/verifier.c:14610 With the following reproducer: bpf$PROG_LOAD_XDP(0x5, &(0x7f00000004c0)={0xd, 0x3, &(0x7f0000000000)=ANY=[@ANYBLOB="1800000000000019000000000000000095"], &(0x7f0000000300)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x2b, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x80) Because we don't enforce expected_attach_type for XDP programs, we end up in hitting 'if (prog->expected_attach_type == BPF_LSM_CGROUP' part in check_return_code and follow up with testing `prog->aux->attach_func_proto->type`, but `prog->aux->attach_func_proto` is NULL. Add explicit prog_type check for the "Note, BPF_LSM_CGROUP that attach ..." condition. Also, don't skip return code check for LSM/STRUCT_OPS. The above actually brings an issue with existing selftest which tries to return EPERM from void inet_csk_clone. Fix the test (and move called_socket_clone to make sure it's not incremented in case of an error) and add a new one to explicitly verify this condition. Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor") Reported-by: syzbot+5cc0730bd4b4d2c5f152@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220708175000.2603078-1-sdf@google.com
2022-07-08net: ag71xx: switch to napi_build_skb() to reuse skbuff_headsSieng-Piaw Liew
napi_build_skb() reuses NAPI skbuff_head cache in order to save some cycles on freeing/allocating skbuff_heads on every new Rx or completed Tx. Use napi_consume_skb() to feed the cache with skbuff_heads of completed Tx, so it's never empty. The budget parameter is added to indicate NAPI context, as a value of zero can be passed in the case of netpoll. Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08net: minor optimization in __alloc_skb()Eric Dumazet
TCP allocates 'fast clones' skbs for packets in tx queues. Currently, __alloc_skb() initializes the companion fclone field to SKB_FCLONE_CLONE, and leaves other fields untouched. It makes sense to defer this init much later in skb_clone(), because all fclone fields are copied and hot in cpu caches at that time. This removes one cache line miss in __alloc_skb(), cost seen on an host with 256 cpus all competing on memory accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08octeontx2-af: Don't reset previous pfc configHariprasad Kelam
Current implementation is such that driver first resets the existing PFC config before applying new pfc configuration. This creates a problem like once PF or VFs requests PFC config previous pfc config by other PFVfs is getting reset. This patch fixes the problem by removing unnecessary resetting of PFC config. Also configure Pause quanta value to smaller as current value is too high. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08selftests/bpf: Add test involving restrict type qualifierDaniel Müller
This change adds a type based test involving the restrict type qualifier to the BPF selftests. On the btfgen path, this will verify that bpftool correctly handles the corresponding RESTRICT BTF kind. Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220706212855.1700615-3-deso@posteo.net
2022-07-08bpftool: Add support for KIND_RESTRICT to gen min_core_btf commandDaniel Müller
This change adjusts bpftool's type marking logic, as used in conjunction with TYPE_EXISTS relocations, to correctly recognize and handle the RESTRICT BTF kind. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220623212205.2805002-1-deso@posteo.net/T/#m4c75205145701762a4b398e0cdb911d5b5305ffc Link: https://lore.kernel.org/bpf/20220706212855.1700615-2-deso@posteo.net
2022-07-08MAINTAINERS: Add entry for AF_XDP selftests filesMaciej Fijalkowski
Lukas reported that after commit f36600634282 ("libbpf: move xsk.{c,h} into selftests/bpf") MAINTAINERS file needed an update. In the meantime, Magnus removed AF_XDP samples in commit cfb5a2dbf141 ("bpf, samples: Remove AF_XDP samples"), but selftests part still misses its entry in MAINTAINERS. Now that xdpxceiver became xskxceiver, tools/testing/selftests/bpf/*xsk* will match all of the files related to AF_XDP testing (test_xsk.sh, xskxceiver, xsk_prereqs.sh, xsk.{c,h}). Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707111613.49031-3-maciej.fijalkowski@intel.com