summaryrefslogtreecommitdiff
path: root/tools/lib/bpf/libbpf.c
AgeCommit message (Collapse)Author
2023-02-08libbpf: Add sample_period to creation optionsJon Doron
Add option to set when the perf buffer should wake up, by default the perf buffer becomes signaled for every event that is being pushed to it. In case of a high throughput of events it will be more efficient to wake up only once you have X events ready to be read. So your application can wakeup once and drain the entire perf buffer. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230207081916.3398417-1-arilou@gmail.com
2023-02-06libbpf: Correctly set the kernel code version in Debian kernel.Hao Xiang
In a previous commit, Ubuntu kernel code version is correctly set by retrieving the information from /proc/version_signature. commit<5b3d72987701d51bf31823b39db49d10970f5c2d> (libbpf: Improve LINUX_VERSION_CODE detection) The /proc/version_signature file doesn't present in at least the older versions of Debian distributions (eg, Debian 9, 10). The Debian kernel has a similar issue where the release information from uname() syscall doesn't give the kernel code version that matches what the kernel actually expects. Below is an example content from Debian 10. release: 4.19.0-23-amd64 version: #1 SMP Debian 4.19.269-1 (2022-12-20) x86_64 Debian reports incorrect kernel version in utsname::release returned by uname() syscall, which in older kernels (Debian 9, 10) leads to kprobe BPF programs failing to load due to the version check mismatch. Fortunately, the correct kernel code version presents in the utsname::version returned by uname() syscall in Debian kernels. This change adds another get kernel version function to handle Debian in addition to the previously added get kernel version function to handle Ubuntu. Some minor refactoring work is also done to make the code more readable. Signed-off-by: Hao Xiang <hao.xiang@bytedance.com> Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230203234842.2933903-1-hao.xiang@bytedance.com
2023-01-25libbpf: Support sleepable struct_ops.s sectionDavid Vernet
In a prior change, the verifier was updated to support sleepable BPF_PROG_TYPE_STRUCT_OPS programs. A caller could set the program as sleepable with bpf_program__set_flags(), but it would be more ergonomic and more in-line with other sleepable program types if we supported suffixing a struct_ops section name with .s to indicate that it's sleepable. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125164735.785732-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-13libbpf: Replace '.' with '_' in legacy kprobe event nameMenglong Dong
'.' is not allowed in the event name of kprobe. Therefore, we will get a EINVAL if the kernel function name has a '.' in legacy kprobe attach case, such as 'icmp_reply.constprop.0'. In order to adapt this case, we need to replace the '.' with other char in gen_kprobe_legacy_event_name(). And I use '_' for this propose. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com
2023-01-10libbpf: Fix map creation flags sanitizationLudovic L'Hours
As BPF_F_MMAPABLE flag is now conditionnaly set (by map_is_mmapable), it should not be toggled but disabled if not supported by kernel. Fixes: 4fcac46c7e10 ("libbpf: only add BPF_F_MMAPABLE flag for data maps with global vars") Signed-off-by: Ludovic L'Hours <ludovic.lhours@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230108182018.24433-1-ludovic.lhours@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-12-20libbpf: Fix build warning on ref_ctr_off for 32-bit architecturesKhem Raj
Clang warns on 32-bit ARM on this comparision: libbpf.c:10497:18: error: result of comparison of constant 4294967296 with expression of type 'size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS)) ~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Typecast ref_ctr_off to __u64 in the check conditional, it is false on 32bit anyways. Signed-off-by: Khem Raj <raj.khem@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221219191526.296264-1-raj.khem@gmail.com
2022-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-22selftests/bpf: Workaround for llvm nop-4 bugAlexei Starovoitov
Currently LLVM fails to recognize .data.* as data section and defaults to .text section. Later BPF backend tries to emit 4-byte NOP instruction which doesn't exist in BPF ISA and aborts. The fix for LLVM is pending: https://reviews.llvm.org/D138477 While waiting for the fix lets workaround the linked_list test case by using .bss.* prefix which is properly recognized by LLVM as BSS section. Fix libbpf to support .bss. prefix and adjust tests. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-14libbpf: Fixed various checkpatch issues in libbpf.cKang Minchul
Fixed following checkpatch issues: WARNING: Block comments use a trailing */ on a separate line + * other BPF program's BTF object */ WARNING: Possible repeated word: 'be' + * name. This is important to be be able to find corresponding BTF ERROR: switch and case should be at the same indent + switch (ext->kcfg.sz) { + case 1: *(__u8 *)ext_val = value; break; + case 2: *(__u16 *)ext_val = value; break; + case 4: *(__u32 *)ext_val = value; break; + case 8: *(__u64 *)ext_val = value; break; + default: ERROR: trailing statements should be on next line + case 1: *(__u8 *)ext_val = value; break; ERROR: trailing statements should be on next line + case 2: *(__u16 *)ext_val = value; break; ERROR: trailing statements should be on next line + case 4: *(__u32 *)ext_val = value; break; ERROR: trailing statements should be on next line + case 8: *(__u64 *)ext_val = value; break; ERROR: code indent should use tabs where possible + }$ WARNING: please, no spaces at the start of a line + }$ WARNING: Block comments use a trailing */ on a separate line + * for faster search */ ERROR: code indent should use tabs where possible +^I^I^I^I^I^I &ext->kcfg.is_signed);$ WARNING: braces {} are not necessary for single statement blocks + if (err) { + return err; + } ERROR: code indent should use tabs where possible +^I^I^I^I sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);$ Signed-off-by: Kang Minchul <tegongkang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221113190648.38556-3-tegongkang@gmail.com
2022-11-14libbpf: Use correct return pointer in attach_raw_tpJiri Olsa
We need to pass '*link' to final libbpf_get_error, because that one holds the return value, not 'link'. Fixes: 4fa5bcfe07f7 ("libbpf: Allow BPF program auto-attach handlers to bail out") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221114145257.882322-1-jolsa@kernel.org
2022-11-09libbpf: Hashmap interface update to allow both long and void* keys/valuesEduard Zingerman
An update for libbpf's hashmap interface from void* -> void* to a polymorphic one, allowing both long and void* keys and values. This simplifies many use cases in libbpf as hashmaps there are mostly integer to integer. Perf copies hashmap implementation from libbpf and has to be updated as well. Changes to libbpf, selftests/bpf and perf are packed as a single commit to avoid compilation issues with any future bisect. Polymorphic interface is acheived by hiding hashmap interface functions behind auxiliary macros that take care of necessary type casts, for example: #define hashmap_cast_ptr(p) \ ({ \ _Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\ #p " pointee should be a long-sized integer or a pointer"); \ (long *)(p); \ }) bool hashmap_find(const struct hashmap *map, long key, long *value); #define hashmap__find(map, key, value) \ hashmap_find((map), (long)(key), hashmap_cast_ptr(value)) - hashmap__find macro casts key and value parameters to long and long* respectively - hashmap_cast_ptr ensures that value pointer points to a memory of appropriate size. This hack was suggested by Andrii Nakryiko in [1]. This is a follow up for [2]. [1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/ [2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
2022-10-25libbpf: Support new cgroup local storageYonghong Song
Add support for new cgroup local storage. Acked-by: David Vernet <void@manifault.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042856.673989-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19libbpf: only add BPF_F_MMAPABLE flag for data maps with global varsAndrii Nakryiko
Teach libbpf to not add BPF_F_MMAPABLE flag unnecessarily for ARRAY maps that are backing data sections, if such data sections don't expose any variables to user-space. Exposed variables are those that have STB_GLOBAL or STB_WEAK ELF binding and correspond to BTF VAR's BTF_VAR_GLOBAL_ALLOCATED linkage. The overall idea is that if some data section doesn't have any variable that is exposed through BPF skeleton, then there is no reason to make such BPF array mmapable. Making BPF array mmapable is not a free no-op action, because BPF verifier doesn't allow users to put special objects (such as BPF spin locks, RB tree nodes, linked list nodes, kptrs, etc; anything that has a sensitive internal state that should not be modified arbitrarily from user space) into mmapable arrays, as there is no way to prevent user space from corrupting such sensitive state through direct memory access through memory-mapped region. By making sure that libbpf doesn't add BPF_F_MMAPABLE flag to BPF array maps corresponding to data sections that only have static variables (which are not supposed to be visible to user space according to libbpf and BPF skeleton rules), users now can have spinlocks, kptrs, etc in either default .bss/.data sections or custom .data.* sections (assuming there are no global variables in such sections). The only possible hiccup with this approach is the need to use global variables during BPF static linking, even if it's not intended to be shared with user space through BPF skeleton. To allow such scenarios, extend libbpf's STV_HIDDEN ELF visibility attribute handling to variables. Libbpf is already treating global hidden BPF subprograms as static subprograms and adjusts BTF accordingly to make BPF verifier verify such subprograms as static subprograms with preserving entire BPF verifier state between subprog calls. This patch teaches libbpf to treat global hidden variables as static ones and adjust BTF information accordingly as well. This allows to share variables between multiple object files during static linking, but still keep them internal to BPF program and not get them exposed through BPF skeleton. Note, that if the user has some advanced scenario where they absolutely need BPF_F_MMAPABLE flag on .data/.bss/.rodata BPF array map despite only having static variables, they still can achieve this by forcing it through explicit bpf_map__set_map_flags() API. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20221019002816.359650-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-19libbpf: clean up and refactor BTF fixup stepAndrii Nakryiko
Refactor libbpf's BTF fixup step during BPF object open phase. The only functional change is that we now ignore BTF_VAR_GLOBAL_EXTERN variables during fix up, not just BTF_VAR_STATIC ones, which shouldn't cause any change in behavior as there shouldn't be any extern variable in data sections for valid BPF object anyways. Otherwise it's just collapsing two functions that have no reason to be separate, and switching find_elf_var_offset() helper to return entire symbol pointer, not just its offset. This will be used by next patch to get ELF symbol visibility. While refactoring, also "normalize" debug messages inside btf_fixup_datasec() to follow general libbpf style and print out data section name consistently, where it's available. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221019002816.359650-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-13libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()Shung-Hsi Yu
When there are no program sections, obj->programs is left unallocated, and find_prog_by_sec_insn()'s search lands on &obj->programs[0] == NULL, and will cause null-pointer dereference in the following access to prog->sec_idx. Guard the search with obj->nr_programs similar to what's being done in __bpf_program__iter() to prevent null-pointer access from happening. Fixes: db2b8b06423c ("libbpf: Support CO-RE relocations for multi-prog sections") Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221012022353.7350-4-shung-hsi.yu@suse.com
2022-10-13libbpf: Deal with section with no data gracefullyShung-Hsi Yu
ELF section data pointer returned by libelf may be NULL (if section has SHT_NOBITS), so null check section data pointer before attempting to copy license and kversion section. Fixes: cb1e5e961991 ("bpf tools: Collect version and license from ELF sections") Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221012022353.7350-3-shung-hsi.yu@suse.com
2022-10-13libbpf: Use elf_getshdrnum() instead of e_shnumShung-Hsi Yu
This commit replace e_shnum with the elf_getshdrnum() helper to fix two oss-fuzz-reported heap-buffer overflow in __bpf_object__open. Both reports are incorrectly marked as fixed and while still being reproducible in the latest libbpf. # clusterfuzz-testcase-minimized-bpf-object-fuzzer-5747922482888704 libbpf: loading object 'fuzz-object' from buffer libbpf: sec_cnt is 0 libbpf: elf: section(1) .data, size 0, link 538976288, flags 2020202020202020, type=2 libbpf: elf: section(2) .data, size 32, link 538976288, flags 202020202020ff20, type=1 ================================================================= ==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000c0 at pc 0x0000005a7b46 bp 0x7ffd12214af0 sp 0x7ffd12214ae8 WRITE of size 4 at 0x6020000000c0 thread T0 SCARINESS: 46 (4-byte-write-heap-buffer-overflow-far-from-bounds) #0 0x5a7b45 in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3414:24 #1 0x5733c0 in bpf_object_open /src/libbpf/src/libbpf.c:7223:16 #2 0x5739fd in bpf_object__open_mem /src/libbpf/src/libbpf.c:7263:20 ... The issue lie in libbpf's direct use of e_shnum field in ELF header as the section header count. Where as libelf implemented an extra logic that, when e_shnum == 0 && e_shoff != 0, will use sh_size member of the initial section header as the real section header count (part of ELF spec to accommodate situation where section header counter is larger than SHN_LORESERVE). The above inconsistency lead to libbpf writing into a zero-entry calloc area. So intead of using e_shnum directly, use the elf_getshdrnum() helper provided by libelf to retrieve the section header counter into sec_cnt. Fixes: 0d6988e16a12 ("libbpf: Fix section counting logic") Fixes: 25bbbd7a444b ("libbpf: Remove assumptions about uniqueness of .rodata/.data/.bss maps") Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40868 Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=40957 Link: https://lore.kernel.org/bpf/20221012022353.7350-2-shung-hsi.yu@suse.com
2022-09-26libbpf: Fix the case of running as non-root with capabilitiesJon Doron
When running rootless with special capabilities like: FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH The "access" API will not make the proper check if there is really access to a file or not. >From the access man page: " The check is done using the calling process's real UID and GID, rather than the effective IDs as is done when actually attempting an operation (e.g., open(2)) on the file. Similarly, for the root user, the check uses the set of permitted capabilities rather than the set of effective capabilities; ***and for non-root users, the check uses an empty set of capabilities.*** " What that means is that for non-root user the access API will not do the proper validation if the process really has permission to a file or not. To resolve this this patch replaces all the access API calls with faccessat with AT_EACCESS flag. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
2022-09-23libbpf: Add pathname_concat() helperWang Yufen
Move snprintf and len check to common helper pathname_concat() to make the code simpler. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1663828124-10437-1-git-send-email-wangyufen@huawei.com
2022-09-21bpf: Add libbpf logic for user-space ring bufferDavid Vernet
Now that all of the logic is in place in the kernel to support user-space produced ring buffers, we can add the user-space logic to libbpf. This patch therefore adds the following public symbols to libbpf: struct user_ring_buffer * user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts); void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample); void user_ring_buffer__discard(struct user_ring_buffer *rb, void user_ring_buffer__free(struct user_ring_buffer *rb); A user-space producer must first create a struct user_ring_buffer * object with user_ring_buffer__new(), and can then reserve samples in the ring buffer using one of the following two symbols: void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size); void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms); With user_ring_buffer__reserve(), a pointer to a 'size' region of the ring buffer will be returned if sufficient space is available in the buffer. user_ring_buffer__reserve_blocking() provides similar semantics, but will block for up to 'timeout_ms' in epoll_wait if there is insufficient space in the buffer. This function has the guarantee from the kernel that it will receive at least one event-notification per invocation to bpf_ringbuf_drain(), provided that at least one sample is drained, and the BPF program did not pass the BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain(). Once a sample is reserved, it must either be committed to the ring buffer with user_ring_buffer__submit(), or discarded with user_ring_buffer__discard(). Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
2022-09-21bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map typeDavid Vernet
We want to support a ringbuf map type where samples are published from user-space, to be consumed by BPF programs. BPF currently supports a kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to define a new map type for user-space -> kernel, as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer ring buffer, and we'll want to add one or more helper functions that would not apply for a kernel-producer ring buffer. This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type definition. The map type is useless in its current form, as there is no way to access or use it for anything until we one or more BPF helpers. A follow-on patch will therefore add a new helper function that allows BPF programs to run callbacks on samples that are published to the ring buffer. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
2022-09-16libbpf: Fix crash if SEC("freplace") programs don't have attach_prog_fd setAndrii Nakryiko
Fix SIGSEGV caused by libbpf trying to find attach type in vmlinux BTF for freplace programs. It's wrong to search in vmlinux BTF and libbpf doesn't even mark vmlinux BTF as required for freplace programs. So trying to search anything in obj->vmlinux_btf might cause NULL dereference if nothing else in BPF object requires vmlinux BTF. Instead, error out if freplace (EXT) program doesn't specify attach_prog_fd during at the load time. Fixes: 91abb4a6d79d ("libbpf: Support attachment of BPF tracing programs to kernel modules") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220909193053.577111-3-andrii@kernel.org
2022-08-17libbpf: Clean up deprecated and legacy aliasesAndrii Nakryiko
Remove three missed deprecated APIs that were aliased to new APIs: bpf_object__unload, bpf_prog_attach_xattr and btf__load. Also move legacy API libbpf_find_kernel_btf (aliased to btf__load_vmlinux_btf) into libbpf_legacy.h. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-4-andrii@kernel.org
2022-08-17libbpf: Streamline bpf_attr and perf_event_attr initializationAndrii Nakryiko
Make sure that entire libbpf code base is initializing bpf_attr and perf_event_attr with memset(0). Also for bpf_attr make sure we clear and pass to kernel only relevant parts of bpf_attr. bpf_attr is a huge union of independent sub-command attributes, so there is no need to clear and pass entire union bpf_attr, which over time grows quite a lot and for most commands this growth is completely irrelevant. Few cases where we were relying on compiler initialization of BPF UAPI structs (like bpf_prog_info, bpf_map_info, etc) with `= {};` were switched to memset(0) pattern for future-proofing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-3-andrii@kernel.org
2022-08-17libbpf: Fix potential NULL dereference when parsing ELFAndrii Nakryiko
Fix if condition filtering empty ELF sections to prevent NULL dereference. Fixes: 47ea7417b074 ("libbpf: Skip empty sections in bpf_object__init_global_data_maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20220816001929.369487-2-andrii@kernel.org
2022-08-17libbpf: Allows disabling auto attachHao Luo
Adds libbpf APIs for disabling auto-attach for individual functions. This is motivated by the use case of cgroup iter [1]. Some iter types require their parameters to be non-zero, therefore applying auto-attach on them will fail. With these two new APIs, users who want to use auto-attach and these types of iters can disable auto-attach on the program and perform manual attach. [1] https://lore.kernel.org/bpf/CAEf4BzZ+a2uDo_t6kGBziqdz--m2gh2_EUwkGLDtMd65uwxUjA@mail.gmail.com/ Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220816234012.910255-1-haoluo@google.com
2022-08-15libbpf: Making bpf_prog_load() ignore name if kernel doesn't supportHangbin Liu
Similar with commit 10b62d6a38f7 ("libbpf: Add names for auxiliary maps"), let's make bpf_prog_load() also ignore name if kernel doesn't support program name. To achieve this, we need to call sys_bpf_prog_load() directly in probe_kern_prog_name() to avoid circular dependency. sys_bpf_prog_load() also need to be exported in the libbpf_internal.h file. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220813000936.6464-1-liuhangbin@gmail.com
2022-08-11libbpf: Add names for auxiliary mapsHangbin Liu
The bpftool self-created maps can appear in final map show output due to deferred removal in kernel. These maps don't have a name, which would make users confused about where it comes from. With a libbpf_ prefix name, users could know who created these maps. It also could make some tests (like test_offload.py, which skip base maps without names as a workaround) filter them out. Kernel adds bpf prog/map name support in the same merge commit fadad670a8ab ("Merge branch 'bpf-extend-info'"). So we can also use kernel_supports(NULL, FEAT_PROG_NAME) to check if kernel supports map name. As discussed [1], Let's make bpf_map_create accept non-null name string, and silently ignore the name if kernel doesn't support. [1] https://lore.kernel.org/bpf/CAEf4BzYL1TQwo1231s83pjTdFPk9XWWhfZC5=KzkU-VO0k=0Ug@mail.gmail.com/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220811034020.529685-1-liuhangbin@gmail.com
2022-08-10libbpf: preserve errno across pr_warn/pr_info/pr_debugAndrii Nakryiko
As suggested in [0], make sure that libbpf_print saves and restored errno and as such guaranteed that no matter what actual print callback user installs, macros like pr_warn/pr_info/pr_debug are completely transparent as far as errno goes. While libbpf code is pretty careful about not clobbering important errno values accidentally with pr_warn(), it's a trivial change to make sure that pr_warn can be used anywhere without a risk of clobbering errno. No functional changes, just future proofing. [0] https://github.com/libbpf/libbpf/pull/536 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/r/20220810183425.1998735-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-08libbpf: Do not require executable permission for shared librariesHengqi Chen
Currently, resolve_full_path() requires executable permission for both programs and shared libraries. This causes failures on distos like Debian since the shared libraries are not installed executable and Linux is not requiring shared libraries to have executable permissions. Let's remove executable permission check for shared libraries. Reported-by: Goro Fuji <goro@fastly.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220806102021.3867130-1-hengqi.chen@gmail.com
2022-08-08libbpf: Reject legacy 'maps' ELF sectionAndrii Nakryiko
Add explicit error message if BPF object file is still using legacy BPF map definitions in SEC("maps"). Before this change, if BPF object file is still using legacy map definition user will see a bit confusing: libbpf: elf: skipping unrecognized data section(4) maps libbpf: prog 'handler': bad map relo against 'server_map' in section 'maps' Now libbpf will be explicit about rejecting "maps" ELF section: libbpf: elf: legacy map definitions in 'maps' section are not supported by libbpf v1.0+ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220803214202.23750-1-andrii@kernel.org
2022-08-04libbpf: Skip empty sections in bpf_object__init_global_data_mapsJames Hilliard
The GNU assembler generates an empty .bss section. This is a well established behavior in GAS that happens in all supported targets. The LLVM assembler doesn't generate an empty .bss section. bpftool chokes on the empty .bss section. Additionally in bpf_object__elf_collect the sec_desc->data is not initialized when a section is not recognized. In this case, this happens with .comment. So we must check that sec_desc->data is initialized before checking if the size is 0. Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20220731232649.4668-1-james.hilliard1@gmail.com
2022-07-28libbpf: Support PPC in arch_specific_syscall_pfxDaniel Müller
Commit 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes") added the arch_specific_syscall_pfx() function, which returns a string representing the architecture in use. As it turns out this function is currently not aware of Power PC, where NULL is returned. That's being flagged by the libbpf CI system, which builds for ppc64le and the compiler sees a NULL pointer being passed in to a %s format string. With this change we add representations for two more architectures, for Power PC and Power PC 64, and also adjust the string format logic to handle NULL pointers gracefully, in an attempt to prevent similar issues with other architectures in the future. Fixes: 708ac5bea0ce ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes") Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net
2022-07-19libbpf: make RINGBUF map size adjustments more eagerlyAndrii Nakryiko
Make libbpf adjust RINGBUF map size (rounding it up to closest power-of-2 of page_size) more eagerly: during open phase when initializing the map and on explicit calls to bpf_map__set_max_entries(). Such approach allows user to check actual size of BPF ringbuf even before it's created in the kernel, but also it prevents various edge case scenarios where BPF ringbuf size can get out of sync with what it would be in kernel. One of them (reported in [0]) is during an attempt to pin/reuse BPF ringbuf. Move adjust_ringbuf_sz() helper closer to its first actual use. The implementation of the helper is unchanged. Also make detection of whether bpf_object is already loaded more robust by checking obj->loaded explicitly, given that map->fd can be < 0 even if bpf_object is already loaded due to ability to disable map creation with bpf_map__set_autocreate(map, false). [0] Closes: https://github.com/libbpf/libbpf/pull/530 Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19libbpf: fallback to tracefs mount point if debugfs is not mountedAndrii Nakryiko
Teach libbpf to fallback to tracefs mount point (/sys/kernel/tracing) if debugfs (/sys/kernel/debug/tracing) isn't mounted. Acked-by: Yonghong Song <yhs@fb.com> Suggested-by: Connor O'Brien <connoro@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220715185736.898848-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19libbpf: add ksyscall/kretsyscall sections support for syscall kprobesAndrii Nakryiko
Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding kretsyscall variants (for return kprobes) to allow users to kprobe syscall functions in kernel. These special sections allow to ignore complexities and differences between kernel versions and host architectures when it comes to syscall wrapper and corresponding __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf itself doesn't rely on /proc/config.gz for detecting this, see BPF_KSYSCALL patch for how it's done internally). Combined with the use of BPF_KSYSCALL() macro, this allows to just specify intended syscall name and expected input arguments and leave dealing with all the variations to libbpf. In addition to SEC("ksyscall+") and SEC("kretsyscall+") add bpf_program__attach_ksyscall() API which allows to specify syscall name at runtime and provide associated BPF cookie value. At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not handle all the calling convention quirks for mmap(), clone() and compat syscalls. It also only attaches to "native" syscall interfaces. If host system supports compat syscalls or defines 32-bit syscalls in 64-bit kernel, such syscall interfaces won't be attached to by libbpf. These limitations may or may not change in the future. Therefore it is recommended to use SEC("kprobe") for these syscalls or if working with compat and 32-bit interfaces is required. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALLAndrii Nakryiko
Improve BPF_KPROBE_SYSCALL (and rename it to shorter BPF_KSYSCALL to match libbpf's SEC("ksyscall") section name, added in next patch) to use __kconfig variable to determine how to properly fetch syscall arguments. Instead of relying on hard-coded knowledge of whether kernel's architecture uses syscall wrapper or not (which only reflects the latest kernel versions, but is not necessarily true for older kernels and won't necessarily hold for later kernel versions on some particular host architecture), determine this at runtime by attempting to create perf_event (with fallback to kprobe event creation through tracefs on legacy kernels, just like kprobe attachment code is doing) for kernel function that would correspond to bpf() syscall on a system that has CONFIG_ARCH_HAS_SYSCALL_WRAPPER set (e.g., for x86-64 it would try '__x64_sys_bpf'). If host kernel uses syscall wrapper, syscall kernel function's first argument is a pointer to struct pt_regs that then contains syscall arguments. In such case we need to use bpf_probe_read_kernel() to fetch actual arguments (which we do through BPF_CORE_READ() macro) from inner pt_regs. But if the kernel doesn't use syscall wrapper approach, input arguments can be read from struct pt_regs directly with no probe reading. All this feature detection is done without requiring /proc/config.gz existence and parsing, and BPF-side helper code uses newly added LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to keep in sync with user-side feature detection of libbpf. BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs that define syscall function explicitly (e.g., SEC("kprobe/__x64_sys_bpf")) and SEC("ksyscall") program added in the next patch (which are the same kprobe program with added benefit of libbpf determining correct kernel function name automatically). Kretprobe and kretsyscall (added in next patch) programs don't need BPF_KSYSCALL as they don't provide access to input arguments. Normal BPF_KRETPROBE is completely sufficient and is recommended. Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19libbpf: generalize virtual __kconfig externs and use it for USDTAndrii Nakryiko
Libbpf supports single virtual __kconfig extern currently: LINUX_KERNEL_VERSION. LINUX_KERNEL_VERSION isn't coming from /proc/kconfig.gz and is intead customly filled out by libbpf. This patch generalizes this approach to support more such virtual __kconfig externs. One such extern added in this patch is LINUX_HAS_BPF_COOKIE which is used for BPF-side USDT supporting code in usdt.bpf.h instead of using CO-RE-based enum detection approach for detecting bpf_get_attach_cookie() BPF helper. This allows to remove otherwise not needed CO-RE dependency and keeps user-space and BPF-side parts of libbpf's USDT support strictly in sync in terms of their feature detection. We'll use similar approach for syscall wrapper detection for BPF_KSYSCALL() BPF-side macro in follow up patch. Generally, currently libbpf reserves CONFIG_ prefix for Kconfig values and LINUX_ for virtual libbpf-backed externs. In the future we might extend the set of prefixes that are supported. This can be done without any breaking changes, as currently any __kconfig extern with unrecognized name is rejected. For LINUX_xxx externs we support the normal "weak rule": if libbpf doesn't recognize given LINUX_xxx extern but such extern is marked as __weak, it is not rejected and defaults to zero. This follows CONFIG_xxx handling logic and will allow BPF applications to opportunistically use newer libbpf virtual externs without breaking on older libbpf versions unnecessarily. Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220714070755.3235561-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-15libbpf: perfbuf: Add API to get the ring bufferJon Doron
Add support for writing a custom event reader, by exposing the ring buffer. With the new API perf_buffer__buffer() you will get access to the raw mmaped()'ed per-cpu underlying memory of the ring buffer. This region contains both the perf buffer data and header (struct perf_event_mmap_page), which manages the ring buffer state (head/tail positions, when accessing the head/tail position it's important to take into consideration SMP). With this type of low level access one can implement different types of consumers here are few simple examples where this API helps with: 1. perf_event_read_simple is allocating using malloc, perhaps you want to handle the wrap-around in some other way. 2. Since perf buf is per-cpu then the order of the events is not guarnteed, for example: Given 3 events where each event has a timestamp t0 < t1 < t2, and the events are spread on more than 1 CPU, then we can end up with the following state in the ring buf: CPU[0] => [t0, t2] CPU[1] => [t1] When you consume the events from CPU[0], you could know there is a t1 missing, (assuming there are no drops, and your event data contains a sequential index). So now one can simply do the following, for CPU[0], you can store the address of t0 and t2 in an array (without moving the tail, so there data is not perished) then move on the CPU[1] and set the address of t1 in the same array. So you end up with something like: void **arr[] = [&t0, &t1, &t2], now you can consume it orderely and move the tails as you process in order. 3. Assuming there are multiple CPUs and we want to start draining the messages from them, then we can "pick" with which one to start with according to the remaining free space in the ring buffer. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
2022-07-13libbpf: Fix the name of a reused mapAnquan Wu
BPF map name is limited to BPF_OBJ_NAME_LEN. A map name is defined as being longer than BPF_OBJ_NAME_LEN, it will be truncated to BPF_OBJ_NAME_LEN when a userspace program calls libbpf to create the map. A pinned map also generates a path in the /sys. If the previous program wanted to reuse the map, it can not get bpf_map by name, because the name of the map is only partially the same as the name which get from pinned path. The syscall information below show that map name "process_pinned_map" is truncated to "process_pinned_". bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/process_pinned_map", bpf_fd=0, file_flags=0}, 144) = -1 ENOENT (No such file or directory) bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=4, value_size=4,max_entries=1024, map_flags=0, inner_map_fd=0, map_name="process_pinned_",map_ifindex=0, btf_fd=3, btf_key_type_id=6, btf_value_type_id=10,btf_vmlinux_value_type_id=0}, 72) = 4 This patch check that if the name of pinned map are the same as the actual name for the first (BPF_OBJ_NAME_LEN - 1), bpf map still uses the name which is included in bpf object. Fixes: 26736eb9a483 ("tools: libbpf: allow map reuse") Signed-off-by: Anquan Wu <leiqi96@hotmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/OSZP286MB1725CEA1C95C5CB8E7CCC53FB8869@OSZP286MB1725.JPNP286.PROD.OUTLOOK.COM
2022-07-13libbpf: Error out when binary_path is NULL for uprobe and USDTHengqi Chen
binary_path is a required non-null parameter for bpf_program__attach_usdt and bpf_program__attach_uprobe_opts. Check it against NULL to prevent coredump on strchr. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220712025745.2703995-1-hengqi.chen@gmail.com
2022-07-05libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()Chuang Wang
A potential scenario, when an error is returned after add_uprobe_event_legacy() in perf_event_uprobe_open_legacy(), or bpf_program__attach_perf_event_opts() in bpf_program__attach_uprobe_opts() returns an error, the uprobe_event that was previously created is not cleaned. So, with this patch, when an error is returned, fix this by adding remove_uprobe_event_legacy() Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220629151848.65587-4-nashuiliang@gmail.com
2022-07-05libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()Chuang Wang
Use "type" as opposed to "err" in pr_warn() after determine_uprobe_perf_type_legacy() returns an error. Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220629151848.65587-3-nashuiliang@gmail.com
2022-07-05libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()Chuang Wang
Before the 0bc11ed5ab60 commit ("kprobes: Allow kprobes coexist with livepatch"), in a scenario where livepatch and kprobe coexist on the same function entry, the creation of kprobe_event using add_kprobe_event_legacy() will be successful, at the same time as a trace event (e.g. /debugfs/tracing/events/kprobe/XXX) will exist, but perf_event_open() will return an error because both livepatch and kprobe use FTRACE_OPS_FL_IPMODIFY. As follows: 1) add a livepatch $ insmod livepatch-XXX.ko 2) add a kprobe using tracefs API (i.e. add_kprobe_event_legacy) $ echo 'p:mykprobe XXX' > /sys/kernel/debug/tracing/kprobe_events 3) enable this kprobe (i.e. sys_perf_event_open) This will return an error, -EBUSY. On Andrii Nakryiko's comment, few error paths in bpf_program__attach_kprobe_opts() that should need to call remove_kprobe_event_legacy(). With this patch, whenever an error is returned after add_kprobe_event_legacy() or bpf_program__attach_perf_event_opts(), this ensures that the created kprobe_event is cleaned. Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Jingren Zhou <zhoujingren@didiglobal.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220629151848.65587-2-nashuiliang@gmail.com
2022-07-05bpf, libbpf: Add type match supportDaniel Müller
This patch adds support for the proposed type match relation to relo_core where it is shared between userspace and kernel. It plumbs through both kernel-side and libbpf-side support. The matching relation is defined as follows (copy from source): - modifiers and typedefs are stripped (and, hence, effectively ignored) - generally speaking types need to be of same kind (struct vs. struct, union vs. union, etc.) - exceptions are struct/union behind a pointer which could also match a forward declaration of a struct or union, respectively, and enum vs. enum64 (see below) Then, depending on type: - integers: - match if size and signedness match - arrays & pointers: - target types are recursively matched - structs & unions: - local members need to exist in target with the same name - for each member we recursively check match unless it is already behind a pointer, in which case we only check matching names and compatible kind - enums: - local variants have to have a match in target by symbolic name (but not numeric value) - size has to match (but enum may match enum64 and vice versa) - function pointers: - number and position of arguments in local type has to match target - for each argument and the return value we recursively check match Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220628160127.607834-5-deso@posteo.net
2022-06-29libbpf: add lsm_cgoup_sock typeStanislav Fomichev
lsm_cgroup/ is the prefix for BPF_LSM_CGROUP. Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-9-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-28libbpf: enforce strict libbpf 1.0 behaviorsAndrii Nakryiko
Remove support for legacy features and behaviors that previously had to be disabled by calling libbpf_set_strict_mode(): - legacy BPF map definitions are not supported now; - RLIMIT_MEMLOCK auto-setting, if necessary, is always on (but see libbpf_set_memlock_rlim()); - program name is used for program pinning (instead of section name); - cleaned up error returning logic; - entry BPF programs should have SEC() always. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220627211527.2245459-15-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-28libbpf: clean up SEC() handlingAndrii Nakryiko
Get rid of sloppy prefix logic and remove deprecated xdp_{devmap,cpumap} sections. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220627211527.2245459-13-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-28libbpf: remove internal multi-instance prog supportAndrii Nakryiko
Clean up internals that had to deal with the possibility of multi-instance bpf_programs. Libbpf 1.0 doesn't support this, so all this is not necessary now and can be simplified. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220627211527.2245459-12-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-28libbpf: remove multi-instance and custom private data APIsAndrii Nakryiko
Remove all the public APIs that are related to creating multi-instance bpf_programs through custom preprocessing callback and generally working with them. Also remove all the bpf_{object,map,program}__[set_]priv() APIs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220627211527.2245459-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>