path: root/kernel/bpf/helpers.c
Age  Commit message  (Author)
2022-11-22  bpf: Add bpf_cgroup_ancestor() kfunc  (David Vernet)
struct cgroup * objects have a variably sized struct cgroup *ancestors[] field which stores pointers to their ancestor cgroups. If using a cgroup as a kptr, it can be useful to access these ancestors, but doing so requires variable offset accesses for PTR_TO_BTF_ID, which is currently unsupported. This is a very useful field to access for cgroup kptrs, as programs may wish to walk their ancestor cgroups when determining e.g. their proportional cpu.weight. So as to enable this functionality with cgroup kptrs before var_off is supported for PTR_TO_BTF_ID, this patch adds a bpf_cgroup_ancestor() kfunc which accesses the cgroup node on behalf of the caller, and acquires a reference on it. Once var_off is supported for PTR_TO_BTF_ID, and fields inside a struct can be marked as trusted so they retain the PTR_TRUSTED modifier when walked, this can be removed. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122055458.173143-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
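For illustration, a minimal sketch of how a program might walk upward from a cgroup kptr. The two-argument form (cgroup plus ancestor level) and the use of bpf_cgroup_release() from the companion patch below are assumptions made for the example, not taken from this log:

    struct cgroup *ancestor;

    /* Assumed signature: struct cgroup *bpf_cgroup_ancestor(struct cgroup *cgrp, int level); */
    ancestor = bpf_cgroup_ancestor(cgrp, 1);
    if (!ancestor)
        return 0;
    /* ... inspect the ancestor, e.g. for a proportional cpu.weight walk ... */
    bpf_cgroup_release(ancestor);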
2022-11-22  bpf: Enable cgroups to be used as kptrs  (David Vernet)
Now that tasks can be used as kptrs, and the PTR_TRUSTED flag is available for us to easily add basic acquire / get / release kfuncs, we can do the same for cgroups. This patch set adds the following kfuncs which enable using cgroups as kptrs:

    struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp);
    struct cgroup *bpf_cgroup_kptr_get(struct cgroup **cgrpp);
    void bpf_cgroup_release(struct cgroup *cgrp);

A follow-on patch will add a selftest suite which validates these kfuncs. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221122055458.173143-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
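As a hedged sketch (the map layout and field name are illustrative), these kfuncs combine with bpf_kptr_xchg() from the 2022-04-25 referenced-kptr commit further down to stash a cgroup in a map value:

    struct map_value {
        struct cgroup __kptr_ref *cgrp;   /* referenced kptr slot */
    };

    struct cgroup *acquired, *old;

    acquired = bpf_cgroup_acquire(cgrp);      /* cgrp is a trusted cgroup pointer */
    if (!acquired)
        return 0;
    old = bpf_kptr_xchg(&v->cgrp, acquired);  /* v points to a map value */
    if (old)
        bpf_cgroup_release(old);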
2022-11-20  bpf: Add a kfunc for generic type cast  (Yonghong Song)
Implement bpf_rdonly_cast(), which tries to cast an object to a specified type. This supports use cases like the one below:

    #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB)))

where skb_end_pointer(SKB) is an 'unsigned char *' and needs to be cast to 'struct skb_shared_info *'. The signature of bpf_rdonly_cast() looks like

    void *bpf_rdonly_cast(void *obj, __u32 btf_id)

The function returns the same 'obj' but with PTR_TO_BTF_ID of the given btf_id. The verifier will ensure that btf_id is a struct type. Since the supported type cast may not reflect what 'obj' actually represents, the returned pointer is marked PTR_UNTRUSTED, so the return value and subsequent pointer chasing cannot be used as helper/kfunc arguments. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221120195437.3114585-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
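A minimal usage sketch; skb_end and nr are illustrative locals, and the BTF id is assumed to be obtained with libbpf's bpf_core_type_id_kernel() macro:

    struct skb_shared_info *shinfo;

    shinfo = bpf_rdonly_cast(skb_end, bpf_core_type_id_kernel(struct skb_shared_info));
    /* shinfo is PTR_UNTRUSTED: fields may be read, but the pointer cannot
     * be passed on to helpers/kfuncs. */
    nr = shinfo->nr_frags;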
2022-11-20  bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx  (Yonghong Song)
Implement the bpf_cast_to_kern_ctx() kfunc, which casts a uapi ctx object to the corresponding kernel ctx. Previously, if users wanted to access data available in the kctx but not in the uapi ctx, the bpf_probe_read_kernel() helper was needed. The introduction of bpf_cast_to_kern_ctx() allows direct memory access, which makes code simpler and easier to understand. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221120195432.3113982-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
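A hedged sketch for a tc program (section name and the accessed field are illustrative, kfunc declarations omitted):

    SEC("tc")
    int cast_ctx(struct __sk_buff *ctx)
    {
        struct sk_buff *kskb = bpf_cast_to_kern_ctx(ctx);

        /* Read kernel-side state directly instead of via bpf_probe_read_kernel(). */
        bpf_printk("skb len %u", kskb->len);
        return 0;
    }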
2022-11-20  bpf: Add support for kfunc set with common btf_ids  (Yonghong Song)
Later on, we will introduce the kfuncs bpf_cast_to_kern_ctx() and bpf_rdonly_cast(), which apply to all program types. Currently a kfunc set only supports individual prog types. This patch adds support for kfuncs that apply to all program types. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221120195426.3113828-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-20  bpf: Disallow bpf_obj_new_impl call when bpf_mem_alloc_init fails  (Kumar Kartikeya Dwivedi)
In the unlikely event that bpf_global_ma is not correctly initialized, instead of checking the boolean every time bpf_obj_new_impl is called, simply check it while loading the program and return an error if bpf_global_ma_set is false. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221120212610.2361700-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-20  bpf: Add kfuncs for storing struct task_struct * as a kptr  (David Vernet)
Now that BPF supports adding new kernel functions with kfuncs, and storing kernel objects in maps with kptrs, we can add a set of kfuncs which allow struct task_struct objects to be stored in maps as referenced kptrs. The possible use cases for doing this are plentiful. During tracing, for example, it would be useful to be able to collect some tasks that performed a certain operation, and then periodically summarize who they are, which cgroup they're in, how much CPU time they've utilized, etc. In order to enable this, this patch adds three new kfuncs: struct task_struct *bpf_task_acquire(struct task_struct *p); struct task_struct *bpf_task_kptr_get(struct task_struct **pp); void bpf_task_release(struct task_struct *p); A follow-on patch will add selftests validating these kfuncs. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221120051004.3605026-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
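A minimal sketch (map definition and value lookup omitted) of stashing the current task as a referenced kptr, assuming a map value with a struct task_struct __kptr_ref field named task:

    struct task_struct *cur, *acquired, *old;

    cur = bpf_get_current_task_btf();
    acquired = bpf_task_acquire(cur);
    if (!acquired)
        return 0;
    old = bpf_kptr_xchg(&v->task, acquired);   /* v: pointer to the map value */
    if (old)
        bpf_task_release(old);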
2022-11-17  bpf: Introduce single ownership BPF linked list API  (Kumar Kartikeya Dwivedi)
Add a linked list API for use in BPF programs, where it expects protection from the bpf_spin_lock in the same allocation as the bpf_list_head. For now, only one bpf_spin_lock can be present hence that is assumed to be the one protecting the bpf_list_head. The following functions are added to kick things off: // Add node to beginning of list void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node); // Add node to end of list void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node); // Remove node at beginning of list and return it struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head); // Remove node at end of list and return it struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head); The lock protecting the bpf_list_head needs to be taken for all operations. The verifier ensures that the lock that needs to be taken is always held, and only the correct lock is taken for these operations. These checks are made statically by relying on the reg->id preserved for registers pointing into regions having both bpf_spin_lock and the objects protected by it. The comment over check_reg_allocation_locked in this change describes the logic in detail. Note that bpf_list_push_front and bpf_list_push_back are meant to consume the object containing the node in the 1st argument, however that specific mechanism is intended to not release the ref_obj_id directly until the bpf_spin_unlock is called. In this commit, nothing is done, but the next commit will be introducing logic to handle this case, so it has been left as is for now. bpf_list_pop_front and bpf_list_pop_back delete the first or last item of the list respectively, and return pointer to the element at the list_node offset. The user can then use container_of style macro to get the actual entry type. The verifier however statically knows the actual type, so the safety properties are still preserved. With these additions, programs can now manage their own linked lists and store their objects in them. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-17-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
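A hedged sketch of the push/pop pattern, reusing the struct foo / struct bar layout from the bpf_obj_new commit further down; f is assumed to point at an allocation (or map value) holding the lock and list head:

    struct bpf_list_node *n;
    struct bar *b;

    b = bpf_obj_new(typeof(*b));
    if (!b)
        return 0;

    bpf_spin_lock(&f->lock);
    bpf_list_push_front(&f->head, &b->node);
    n = bpf_list_pop_back(&f->head);
    bpf_spin_unlock(&f->lock);

    if (n)
        bpf_obj_drop(container_of(n, struct bar, node));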
2022-11-17  bpf: Introduce bpf_obj_drop  (Kumar Kartikeya Dwivedi)
Introduce bpf_obj_drop, which is the kfunc used to free allocated objects (allocated using bpf_obj_new). Pairing with bpf_obj_new, it implicitly destructs the special fields of the object automatically without user intervention. Just like the previous patch, the btf_struct_meta that is needed to free up the special fields is passed as a hidden argument to the kfunc. For the user, a convenience macro hides the kernel-side kfunc, which is named bpf_obj_drop_impl. Continuing the previous example:

    void prog(void) {
        struct foo *f;

        f = bpf_obj_new(typeof(*f));
        if (!f)
            return;
        bpf_obj_drop(f);
    }

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-15-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17  bpf: Introduce bpf_obj_new  (Kumar Kartikeya Dwivedi)
Introduce a type-safe memory allocator, bpf_obj_new, for BPF programs. The kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments to kfuncs still requires having them in the prototype, unlike BPF helpers which always take 5 arguments and have them checked using bpf_func_proto in the verifier, ignoring unset argument types. Introduce the __ign suffix to ignore a specific kfunc argument during type checks, then use this to introduce support for passing type metadata to the bpf_obj_new_impl kfunc. The user passes the BTF ID of the type it wants to allocate in program BTF, the verifier then rewrites the first argument as the size of this type, after performing some sanity checks (to ensure it exists and is a struct type). The second argument is also fixed up and passed by the verifier. This is the btf_struct_meta for the type being allocated. It would be needed mostly for the offset array which is required for zero-initializing special fields while leaving the rest of the storage in an uninitialized state. It would also be needed in the next patch to perform proper destruction of the object's special fields. Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free, using the any-context BPF memory allocator introduced recently. To this end, a global instance of the BPF memory allocator is initialized on boot to be used for this purpose. This 'bpf_global_ma' serves all allocations for bpf_obj_new. In the future, bpf_obj_new variants will allow specifying a custom allocator. Note that now that bpf_obj_new can be used to allocate objects that can be linked to a BPF linked list (when future linked list helpers are available), we need to also free the elements using bpf_mem_free. However, since the draining of elements is done outside the bpf_spin_lock, we need to do migrate_disable around the call, since bpf_list_head_free can be called from the map free path where migration is enabled. Otherwise, when called from BPF programs, migration is already disabled. A convenience macro is included in the bpf_experimental.h header to hide the ugly details of the implementation, leading to user code looking similar to a language-level extension which allocates and constructs fields of a user type.

    struct bar {
        struct bpf_list_node node;
    };

    struct foo {
        struct bpf_spin_lock lock;
        struct bpf_list_head head __contains(bar, node);
    };

    void prog(void) {
        struct foo *f;

        f = bpf_obj_new(typeof(*f));
        if (!f)
            return;
        ...
    }

A key piece of this story is still missing, i.e. the free function, which will come in the next patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-14-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-17  bpf: Allow locking bpf_spin_lock in allocated objects  (Kumar Kartikeya Dwivedi)
Allow locking a bpf_spin_lock in an allocated object, in addition to already supported map value pointers. The handling is similar to that of map values, by just preserving the reg->id of PTR_TO_BTF_ID | MEM_ALLOC as well, and adjusting process_spin_lock to work with them and remember the id in verifier state. Refactor the existing process_spin_lock to work with PTR_TO_BTF_ID | MEM_ALLOC in addition to PTR_TO_MAP_VALUE. We need to update the reg_may_point_to_spin_lock which is used in mark_ptr_or_null_reg to preserve reg->id, that will be used in env->cur_state->active_spin_lock to remember the currently held spin lock. Also update the comment describing bpf_spin_lock implementation details to also talk about PTR_TO_BTF_ID | MEM_ALLOC type. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221118015614.2013203-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-14  bpf: Support bpf_list_head in map values  (Kumar Kartikeya Dwivedi)
Add the support on the map side to parse, recognize, verify, and build metadata table for a new special field of the type struct bpf_list_head. To parameterize the bpf_list_head for a certain value type and the list_node member it will accept in that value type, we use BTF declaration tags. The definition of bpf_list_head in a map value will be done as follows: struct foo { struct bpf_list_node node; int data; }; struct map_value { struct bpf_list_head head __contains(foo, node); }; Then, the bpf_list_head only allows adding to the list 'head' using the bpf_list_node 'node' for the type struct foo. The 'contains' annotation is a BTF declaration tag composed of four parts, "contains:name:node" where the name is then used to look up the type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the lookup. The node defines name of the member in this type that has the type struct bpf_list_node, which is actually used for linking into the linked list. For now, 'kind' part is hardcoded as struct. This allows building intrusive linked lists in BPF, using container_of to obtain pointer to entry, while being completely type safe from the perspective of the verifier. The verifier knows exactly the type of the nodes, and knows that list helpers return that type at some fixed offset where the bpf_list_node member used for this list exists. The verifier also uses this information to disallow adding types that are not accepted by a certain list. For now, no elements can be added to such lists. Support for that is coming in future patches, hence draining and freeing items is done with a TODO that will be resolved in a future patch. Note that the bpf_list_head_free function moves the list out to a local variable under the lock and releases it, doing the actual draining of the list items outside the lock. While this helps with not holding the lock for too long pessimizing other concurrent list operations, it is also necessary for deadlock prevention: unless every function called in the critical section would be notrace, a fentry/fexit program could attach and call bpf_map_update_elem again on the map, leading to the same lock being acquired if the key matches and lead to a deadlock. While this requires some special effort on part of the BPF programmer to trigger and is highly unlikely to occur in practice, it is always better if we can avoid such a condition. While notrace would prevent this, doing the draining outside the lock has advantages of its own, hence it is used to also fix the deadlock related problem. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-03  bpf: Consolidate spin_lock, timer management into btf_record  (Kumar Kartikeya Dwivedi)
Now that kptr_off_tab has been refactored into btf_record, and can hold more than one specific field type, accommodate bpf_spin_lock and bpf_timer as well. While they don't require any more metadata than offset, having all special fields in one place allows us to share the same code for allocated user defined types and handle both map values and these allocated objects in a similar fashion. As an optimization, we still keep spin_lock_off and timer_off offsets in the btf_record structure, just to avoid having to find the btf_field struct each time their offset is needed. This is mostly needed to manipulate such objects in a map value at runtime. It's ok to hardcode just one offset as more than one field is disallowed. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-10-25  bpf: Implement cgroup storage available to non-cgroup-attached bpf progs  (Yonghong Song)
Similar to sk/inode/task storage, implement cgroup local storage. There already exists a local storage implementation for cgroup-attached bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper bpf_get_local_storage(). But there are use cases where non-cgroup-attached bpf progs want to access cgroup local storage data. For example, a tc egress prog has access to the sk and the cgroup. It is possible to use sk local storage to emulate cgroup local storage by storing data in the socket, but this is wasteful, as there could be many sockets belonging to a particular cgroup. Alternatively, a separate map can be created with the cgroup id as the key, but this will introduce additional overhead to manipulate the new map. A cgroup local storage, similar to the existing sk/inode/task storage, should help for this use case. The life-cycle of the storage is managed with the life-cycle of the cgroup struct, i.e. the storage is destroyed along with the owning cgroup with a call to bpf_cgrp_storage_free() when the cgroup itself is deleted. The userspace map operations can be done by using a cgroup fd as the key passed to the lookup, update and delete operations. Typically, the following code is used to get the current cgroup:

    struct task_struct *task = bpf_get_current_task_btf();
    ... task->cgroups->dfl_cgrp ...

and in the task_struct definition:

    struct task_struct {
        ....
        struct css_set __rcu *cgroups;
        ....
    }

With sleepable programs, accessing task->cgroups is not protected by rcu_read_lock. So the current implementation only supports non-sleepable programs; supporting sleepable programs will be the next step, together with adding rcu_read_lock protection for rcu-tagged structures. Since the map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for the old cgroup local storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used for cgroup storage available to non-cgroup-attached bpf programs. The old cgroup storage supports the bpf_get_local_storage() helper to get the cgroup data. The new cgroup storage helper bpf_cgrp_storage_get() can provide similar functionality. While the old cgroup storage pre-allocates storage memory, the new mechanism can also pre-allocate with a user space bpf_map_update_elem() call to avoid potential run-time memory allocation failure. Therefore, the new cgroup storage can provide all functionality w.r.t. the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is aliased to BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate that the old cgroup storage can be deprecated since the new one can provide the same functionality. Acked-by: David Vernet <void@manifault.com> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
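A hedged usage sketch from a non-sleepable program (map name and value type are illustrative):

    struct {
        __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, long);
    } cgrp_counter SEC(".maps");

    struct task_struct *task = bpf_get_current_task_btf();
    long *val;

    val = bpf_cgrp_storage_get(&cgrp_counter, task->cgroups->dfl_cgrp, 0,
                               BPF_LOCAL_STORAGE_GET_F_CREATE);
    if (val)
        __sync_fetch_and_add(val, 1);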
2022-10-03  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Merge in the left-over fixes before the net-next pull-request. Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c ae3ed15da588 ("net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear") 9d8cb4c096ab ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") https://lore.kernel.org/all/6cb6893b-4921-a068-4c30-1109795110bb@tessares.net/ kernel/bpf/helpers.c 8addbfc7b308 ("bpf: Gate dynptr API behind CAP_BPF") 5679ff2f138f ("bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF") 8a67f2de9b1d ("bpf: expose bpf_strtol and bpf_strtoul to all program types") https://lore.kernel.org/all/20221003201957.13149-1-daniel@iogearbox.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21  bpf: Export bpf_dynptr_get_size()  (Roberto Sassu)
Export bpf_dynptr_get_size(), so that kernel code dealing with eBPF dynamic pointers can obtain the real size of data carried by this data structure. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220920075951.929132-6-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21  btf: Export bpf_dynptr definition  (Roberto Sassu)
eBPF dynamic pointers are a new feature recently added upstream. A dynptr binds together a pointer to a memory area and its size. The internal kernel structure bpf_dynptr_kern is not accessible by eBPF programs in user space. They instead see bpf_dynptr, which is then translated to the internal kernel structure by the eBPF verifier. The problem is that it is not possible to include both the uapi header linux/bpf.h and the vmlinux BTF header vmlinux.h at the same time, as they both contain definitions of some structures/enums. The compiler complains that the structures/enums are redefined. As bpf_dynptr is defined in the uapi header linux/bpf.h, this makes it impossible to include vmlinux.h. However, in some cases, e.g. when using kfuncs, vmlinux.h has to be included. The only option until now was to include vmlinux.h and add the definition of bpf_dynptr from linux/bpf.h directly in the eBPF program source code. Solve the problem by using the same approach as for bpf_timer (which also follows the same scheme with the _kern suffix for the internal kernel structure). Add the following line in one of the dynamic pointer helpers, bpf_dynptr_from_mem():

    BTF_TYPE_EMIT(struct bpf_dynptr);

Cc: stable@vger.kernel.org Cc: Joanne Koong <joannelkoong@gmail.com> Fixes: 97e03f521050c ("bpf: Add verifier support for dynptrs") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Tested-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20220920075951.929132-3-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21  bpf: Add bpf_user_ringbuf_drain() helper  (David Vernet)
In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which will allow user-space applications to publish messages to a ring buffer that is consumed by a BPF program in kernel-space. In order for this map-type to be useful, it will require a BPF helper function that BPF programs can invoke to drain samples from the ring buffer, and invoke callbacks on those samples. This change adds that capability via a new BPF helper function: bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) BPF programs may invoke this function to run callback_fn() on a series of samples in the ring buffer. callback_fn() has the following signature: long callback_fn(struct bpf_dynptr *dynptr, void *context); Samples are provided to the callback in the form of struct bpf_dynptr *'s, which the program can read using BPF helper functions for querying struct bpf_dynptr's. In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register type is added to the verifier to reflect a dynptr that was allocated by a helper function and passed to a BPF program. Unlike PTR_TO_STACK dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR dynptrs need not use reference tracking, as the BPF helper is trusted to properly free the dynptr before returning. The verifier currently only supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL. Note that while the corresponding user-space libbpf logic will be added in a subsequent patch, this patch does contain an implementation of the .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This .map_poll() callback guarantees that an epoll-waiting user-space producer will receive at least one event notification whenever at least one sample is drained in an invocation of bpf_user_ringbuf_drain(), provided that the function is not invoked with the BPF_RB_NO_WAKEUP flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup notification is sent even if no sample was drained. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com
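A hedged sketch of the drain pattern; the map sizing, the sample layout struct my_msg, and the stop-on-error policy are assumptions made for the example:

    struct {
        __uint(type, BPF_MAP_TYPE_USER_RINGBUF);
        __uint(max_entries, 4096);
    } user_rb SEC(".maps");

    static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
    {
        struct my_msg msg;

        if (bpf_dynptr_read(&msg, sizeof(msg), dynptr, 0, 0))
            return 1;          /* non-zero stops the drain */
        /* ... act on msg ... */
        return 0;              /* keep draining */
    }

    /* In the program body: */
    long n = bpf_user_ringbuf_drain(&user_rb, handle_sample, NULL, 0);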
2022-09-21  bpf: Gate dynptr API behind CAP_BPF  (Kumar Kartikeya Dwivedi)
This has been enabled for unprivileged programs for only one kernel release, hence the expected annoyances due to this move are low. Users using ringbuf can stick to non-dynptr APIs. The actual use cases dynptr is meant to serve may not make sense in unprivileged BPF programs. Hence, gate these helpers behind CAP_BPF and limit use to privileged BPF programs. Fixes: 263ae152e962 ("bpf: Add bpf_dynptr_from_mem for local dynptrs") Fixes: bc34dee65a65 ("bpf: Dynptr support for ring buffers") Fixes: 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") Fixes: 34d4ef5775f7 ("bpf: Add dynptr data slices") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220921143550.30247-1-memxor@gmail.com Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-15  bpf: Add verifier check for BPF_PTR_POISON retval and arg  (Dave Marchevsky)
BPF_PTR_POISON was added in commit c0a5a21c25f37 ("bpf: Allow storing referenced kptr in map") to denote a bpf_func_proto btf_id which the verifier will replace with a dynamically-determined btf_id at verification time. This patch adds verifier 'poison' functionality to BPF_PTR_POISON in order to prepare for expanded use of the value to poison ret- and arg-btf_id in ongoing work, namely the rbtree and linked list patchsets [0, 1]. Specifically, when the verifier checks helper calls, it assumes that a BPF_PTR_POISON'ed ret type will be replaced with a valid type before - or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id. If a poisoned btf_id reaches the default handling block for either, consider this a verifier internal error and fail verification. Otherwise a helper w/ a poisoned btf_id but no verifier logic replacing the type will cause a crash as the invalid pointer is dereferenced. Also move BPF_PTR_POISON to the existing include/linux/poison.h header and remove the unnecessary shift. [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23  bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF  (Kumar Kartikeya Dwivedi)
They would require func_info which needs prog BTF anyway. Loading BTF and setting the prog btf_fd while loading the prog indirectly requires CAP_BPF, so just to reduce confusion, move both these helpers taking callback under bpf_capable() protection as well, since they cannot be used without CAP_BPF. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220823013117.24916-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23  bpf: expose bpf_strtol and bpf_strtoul to all program types  (Stanislav Fomichev)
bpf_strncmp is already exposed everywhere. The motivation is to keep those helpers in kernel/bpf/helpers.c. Otherwise it's tempting to move them under kernel/bpf/cgroup.c because they are currently only used by sysctl prog types. Suggested-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23  bpf: Introduce cgroup_{common,current}_func_proto  (Stanislav Fomichev)
Split cgroup_base_func_proto into the following:

    * cgroup_common_func_proto - common helpers for all cgroup hooks
    * cgroup_current_func_proto - common helpers for all cgroup hooks running in the process context (== have a meaningful 'current')

Move bpf_{g,s}et_retval and other cgroup-related helpers into kernel/bpf/cgroup.c so they are closer to where they are used. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220823222555.523590-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10  bpf: export crash_kexec() as destructive kfunc  (Artem Savkov)
Allow properly marked bpf programs to call crash_kexec(). Signed-off-by: Artem Savkov <asavkov@redhat.com> Link: https://lore.kernel.org/r/20220810065905.475418-3-asavkov@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09  bpf: Add BPF-helper for accessing CLOCK_TAI  (Jesper Dangaard Brouer)
Commit 3dc6ffae2da2 ("timekeeping: Introduce fast accessor to clock tai") introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time sensitive networks (TSN), where all nodes are synchronized by Precision Time Protocol (PTP), it's helpful to have the possibility to generate timestamps based on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in place, it becomes very convenient to correlate activity across different machines in the network. Use cases for such a BPF helper include functionalities such as Tx launch time (e.g. ETF and TAPRIO Qdiscs) and timestamping. Note: CLOCK_TAI is nothing new per se, only the NMI-safe variant of it is. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> [Kurt: Wrote changelog and renamed helper] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20220809060803.5773-2-kurt@linutronix.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
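A one-line usage sketch, assuming the helper landed under the name bpf_ktime_get_tai_ns():

    __u64 tai_now = bpf_ktime_get_tai_ns();   /* e.g. to compute an ETF launch time */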
2022-07-14  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c 747c14307214 ("ip: fix dflt addr selection for connected nexthop") d62607c3fe45 ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h 3d8c51b25a23 ("net/tls: Check for errors in tls_device_init") 587903142308 ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08  bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs  (Joanne Koong)
Commit 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") added the bpf_dynptr_write() and bpf_dynptr_read() APIs. However, it will be needed for some dynptr types to pass in flags as well (e.g. when writing to a skb, the user may like to invalidate the hash or recompute the checksum). This patch adds a "u64 flags" arg to the bpf_dynptr_read() and bpf_dynptr_write() APIs before their UAPI signature freezes where we then cannot change them anymore with a 5.19.x released kernel. Fixes: 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20220706232547.4016651-1-joannelkoong@gmail.com
2022-06-17  bpf: Fix non-static bpf_func_proto struct definitions  (Joanne Koong)
This patch does two things: 1) Marks the dynptr bpf_func_proto structs that were added in [1] as static, as pointed out by the kernel test robot in [2]. 2) There are some bpf_func_proto structs marked as extern which can instead be statically defined. [1] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/ [2] https://lore.kernel.org/bpf/62ab89f2.Pko7sI08RAKdF8R6%25lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220616225407.1878436-1-joannelkoong@gmail.com
2022-05-23  bpf: Add dynptr data slices  (Joanne Koong)
This patch adds a new helper function void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len); which returns a pointer to the underlying data of a dynptr. *len* must be a statically known value. The bpf program may access the returned data slice as a normal buffer (eg can do direct reads and writes), since the verifier associates the length with the returned pointer, and enforces that no out of bounds accesses occur. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220523210712.3641569-6-joannelkoong@gmail.com
2022-05-23  bpf: Add bpf_dynptr_read and bpf_dynptr_write  (Joanne Koong)
This patch adds two helper functions, bpf_dynptr_read and bpf_dynptr_write: long bpf_dynptr_read(void *dst, u32 len, struct bpf_dynptr *src, u32 offset); long bpf_dynptr_write(struct bpf_dynptr *dst, u32 offset, void *src, u32 len); The dynptr passed into these functions must be valid dynptrs that have been initialized. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220523210712.3641569-5-joannelkoong@gmail.com
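A short sketch, with ptr assumed to be an already-initialized dynptr; the trailing 0 flags arguments come from the 2022-07-08 commit above:

    char buf[16];

    if (bpf_dynptr_read(buf, sizeof(buf), &ptr, 0, 0))
        return 0;                                   /* read failed, e.g. out of bounds */
    buf[0] = '!';
    bpf_dynptr_write(&ptr, 0, buf, sizeof(buf), 0); /* write the bytes back */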
2022-05-23  bpf: Dynptr support for ring buffers  (Joanne Koong)
Currently, our only way of writing dynamically-sized data into a ring buffer is through bpf_ringbuf_output but this incurs an extra memcpy cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra memcpy, but it can only safely support reservation sizes that are statically known since the verifier cannot guarantee that the bpf program won’t access memory outside the reserved space. The bpf_dynptr abstraction allows for dynamically-sized ring buffer reservations without the extra memcpy. There are 3 new APIs: long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); These closely follow the functionalities of the original ringbuf APIs. For example, all ringbuffer dynptrs that have been reserved must be either submitted or discarded before the program exits. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com
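A hedged sketch of the reserve/submit flow; the map sizing, the runtime-computed sz, and the data buffer are assumptions made for the example:

    struct {
        __uint(type, BPF_MAP_TYPE_RINGBUF);
        __uint(max_entries, 4096);
    } rb SEC(".maps");

    struct bpf_dynptr ptr;

    if (bpf_ringbuf_reserve_dynptr(&rb, sz, 0, &ptr)) {
        bpf_ringbuf_discard_dynptr(&ptr, 0);   /* must still be released on failure */
        return 0;
    }
    bpf_dynptr_write(&ptr, 0, data, sz, 0);
    bpf_ringbuf_submit_dynptr(&ptr, 0);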
2022-05-23  bpf: Add bpf_dynptr_from_mem for local dynptrs  (Joanne Koong)
This patch adds a new api bpf_dynptr_from_mem: long bpf_dynptr_from_mem(void *data, u32 size, u64 flags, struct bpf_dynptr *ptr); which initializes a dynptr to point to a bpf program's local memory. For now only local memory that is of reg type PTR_TO_MAP_VALUE is supported. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220523210712.3641569-3-joannelkoong@gmail.com
2022-05-13  bpf: Add MEM_UNINIT as a bpf_type_flag  (Joanne Koong)
Instead of having uninitialized versions of arguments as separate bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag modifier to denote that the argument is uninitialized. Doing so cleans up some of the logic in the verifier. We no longer need to do two checks against an argument type (eg "if (base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) == ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized versions of the same argument type will now share the same base type. In the near future, MEM_UNINIT will be used by dynptr helper functions as well. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20220509224257.3222614-2-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-11  bpf: add bpf_map_lookup_percpu_elem for percpu map  (Feng Zhou)
Add a new ebpf helper, bpf_map_lookup_percpu_elem. The implementation is relatively simple: it follows the implementation of map_lookup_elem for percpu maps, adds a cpu parameter, and looks up the value for the specified cpu. Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
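A short usage sketch, assuming a BPF_MAP_TYPE_PERCPU_ARRAY named pcpu_map with __u64 values and an illustrative accumulator total:

    __u32 key = 0, cpu = 1;
    __u64 *val;

    val = bpf_map_lookup_percpu_elem(&pcpu_map, &key, cpu);
    if (val)
        total += *val;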
2022-04-25  bpf: Allow storing referenced kptr in map  (Kumar Kartikeya Dwivedi)
Extending the code in previous commits, introduce referenced kptr support, which needs to be tagged using the 'kptr_ref' tag instead. Unlike unreferenced kptrs, referenced kptrs have a lot more restrictions. In addition to the type matching, only a newly introduced bpf_kptr_xchg helper is allowed to modify the map value at that offset. This transfers the referenced pointer being stored into the map, releasing the reference state for the program, returning the old value and creating new reference state for the returned pointer. Similar to the unreferenced pointer case, the return value for this case will also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer must eventually either be released by calling the corresponding release function, or be transferred into another map. It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear the value, and obtain the old value if any. BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptrs. A future commit will permit using BPF_LDX for such pointers, while attempting to make it safe, since the lifetime of the object won't be guaranteed. There are valid reasons to enforce the restriction of permitting only bpf_kptr_xchg to operate on referenced kptrs. The pointer value must be consistent in the face of concurrent modification, and any prior values contained in the map must also be released before a new one is moved into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg returns the old value, which the verifier requires the user to either free or move into another map, and releases the reference held for the pointer being moved in. In the future, a direct BPF_XCHG instruction may also be permitted to work like the bpf_kptr_xchg helper. Note that process_kptr_func doesn't have to call check_helper_mem_access, since we already disallow rdonly/wronly flags for maps, which is what check_map_access_type checks, and we already ensure the PTR_TO_MAP_VALUE refers to a kptr by obtaining its off_desc, so check_map_access is also not required. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
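For illustration, the clear-the-slot pattern described above, with a hypothetical kptr_ref field and a hypothetical matching release kfunc:

    /* struct map_value { struct foo __kptr_ref *ptr; };  -- hypothetical layout */
    struct foo *old;

    old = bpf_kptr_xchg(&v->ptr, NULL);   /* take ownership, leave the slot empty */
    if (old)
        foo_release(old);                 /* hypothetical release kfunc for struct foo */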
2022-03-07  bpf: Replace strncpy() with strscpy()  (Yuntao Wang)
Using strncpy() on NUL-terminated strings is considered deprecated[1]. Moreover, if the length of 'task->comm' is less than the destination buffer size, strncpy() will NUL-pad the destination buffer, which is a needless performance penalty. Replacing strncpy() with strscpy() fixes all these issues. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220304070408.233658-1-ytcoode@gmail.com
2022-02-23  bpf: Cleanup comments  (Tom Rix)
Add leading space to spdx tag. Use // for spdx c file comment. Replacements:

    resereved to reserved
    inbetween to in between
    everytime to every time
    intutivie to intuitive
    currenct to current
    encontered to encountered
    referenceing to referencing
    upto to up to
    exectuted to executed

Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
2022-02-17  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
Fast path bpf merge for some -next work. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-11  bpf: Emit bpf_timer in vmlinux BTF  (Yonghong Song)
Currently the following code in check_and_init_map_value()

    *(struct bpf_timer *)(dst + map->timer_off) = (struct bpf_timer){};

can help generate the bpf_timer definition in vmlinux BTF. But the code above may not zero the whole structure due to anonymous members, and that code will be replaced by memset in the subsequent patch, so the bpf_timer definition will disappear from vmlinux BTF. Let us emit the type explicitly so bpf programs can continue to use it from vmlinux.h. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220211194948.3141529-1-yhs@fb.com
2022-01-31  bpf: make bpf_copy_from_user_task() gpl only  (Kenta Tada)
access_process_vm() is exported by EXPORT_SYMBOL_GPL(). Signed-off-by: Kenta Tada <Kenta.Tada@sony.com> Link: https://lore.kernel.org/r/20220128170906.21154-1-Kenta.Tada@sony.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-24  bpf: Add bpf_copy_from_user_task() helper  (Kenny Yu)
This adds a helper for bpf programs to read the memory of other tasks. As an example use case at Meta, we are using a bpf task iterator program and this new helper to print C++ async stack traces for all threads of a given process. Signed-off-by: Kenny Yu <kennyyu@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220124185403.468466-3-kennyyu@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
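A hedged sketch from a sleepable task-iterator program; task and user_ptr are assumed to come from the iterator context, and the signature shown in the comment is an assumption, not quoted from this log:

    __u64 frame;

    /* Assumed signature: long bpf_copy_from_user_task(void *dst, u32 size,
     *                                                 const void *user_ptr,
     *                                                 struct task_struct *tsk, u64 flags); */
    if (bpf_copy_from_user_task(&frame, sizeof(frame), user_ptr, task, 0))
        return 0;   /* the remote read failed */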
2021-12-18  bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.  (Hao Luo)
Some helper functions may modify their arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with the MEM_RDONLY flag. The arguments that don't have this flag will only be compatible with mutable memory types, preventing the helper from modifying read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
2021-12-18  bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.  (Hao Luo)
Tag the return type of {per, this}_cpu_ptr with RDONLY_MEM. The returned value of this pair of helpers is a kernel object, which cannot be updated by bpf programs. Previously these two helpers returned PTR_TO_MEM for kernel objects of scalar type, which allowed one to directly modify the memory. Now with RDONLY_MEM tagging, the verifier will reject programs that write into RDONLY_MEM. Fixes: 63d9b80dcf2c ("bpf: Introducte bpf_this_cpu_ptr()") Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Fixes: 4976b718c355 ("bpf: Introduce pseudo_btf_id") Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-8-haoluo@google.com
2021-12-18  bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL  (Hao Luo)
We have introduced a new type to make bpf_ret composable, by reserving high bits to represent flags. One of the flags is PTR_MAYBE_NULL, which indicates a pointer may be NULL. When applying this flag to ret_types, it means the returned value could be a NULL pointer. This patch switches the qualified ret_types to use this flag. The ret_types changed in this patch include: 1. RET_PTR_TO_MAP_VALUE_OR_NULL 2. RET_PTR_TO_SOCKET_OR_NULL 3. RET_PTR_TO_TCP_SOCK_OR_NULL 4. RET_PTR_TO_SOCK_COMMON_OR_NULL 5. RET_PTR_TO_ALLOC_MEM_OR_NULL 6. RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL 7. RET_PTR_TO_BTF_ID_OR_NULL This patch doesn't eliminate the use of these names; instead it makes them aliases of 'RET_PTR_TO_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-4-haoluo@google.com
2021-12-16  add missing bpf-cgroup.h includes  (Jakub Kicinski)
We're about to break the cgroup-defs.h -> bpf-cgroup.h dependency, make sure those who actually need more than the definition of struct cgroup_bpf include bpf-cgroup.h explicitly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20211216025538.1649516-3-kuba@kernel.org
2021-12-11  bpf: Add bpf_strncmp helper  (Hou Tao)
The helper compares two strings: one string is a null-terminated read-only string, and another string has const max storage size but doesn't need to be null-terminated. It can be used to compare file name in tracing or LSM program. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211210141652.877186-2-houtao1@huawei.com
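A short sketch comparing a file name captured into a fixed-size buffer against a constant string; filling the buffer is assumed to happen elsewhere, e.g. via bpf_probe_read_kernel_str():

    char name[16];

    /* First arg: sized buffer, not necessarily null-terminated.
     * Last arg: read-only, null-terminated constant string. */
    if (bpf_strncmp(name, sizeof(name), "passwd") == 0)
        bpf_printk("passwd touched");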
2021-12-10  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  (Jakub Kicinski)
Andrii Nakryiko says: ==================== bpf-next 2021-12-10 v2 We've added 115 non-merge commits during the last 26 day(s) which contain a total of 182 files changed, 5747 insertions(+), 2564 deletions(-). The main changes are: 1) Various samples fixes, from Alexander Lobakin. 2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov. 3) A batch of new unified APIs for libbpf, logging improvements, version querying, etc. Also a batch of old deprecations for old APIs and various bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko. 4) BPF documentation reorganization and improvements, from Christoph Hellwig and Dave Tucker. 5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in libbpf, from Hengqi Chen. 6) Verifier log fixes, from Hou Tao. 7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong. 8) Extend branch record capturing to all platforms that support it, from Kajol Jain. 9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi. 10) bpftool doc-generating script improvements, from Quentin Monnet. 11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet. 12) Deprecation warning fix for perf/bpf_counter, from Song Liu. 13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf, from Tiezhu Yang. 14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song. 15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun, and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits) libbpf: Add "bool skipped" to struct bpf_map libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition bpftool: Switch bpf_object__load_xattr() to bpf_object__load() selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() selftests/bpf: Add test for libbpf's custom log_buf behavior selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() libbpf: Deprecate bpf_object__load_xattr() libbpf: Add per-program log buffer setter and getter libbpf: Preserve kernel error code and remove kprobe prog type guessing libbpf: Improve logging around BPF program loading libbpf: Allow passing user log setting through bpf_object_open_opts libbpf: Allow passing preallocated log_buf when loading BTF into kernel libbpf: Add OPTS-based bpf_btf_load() API libbpf: Fix bpf_prog_load() log_buf logic for log_level 0 samples/bpf: Remove unneeded variable bpf: Remove redundant assignment to pointer t selftests/bpf: Fix a compilation warning perf/bpf_counter: Use bpf_map_create instead of bpf_create_map samples: bpf: Fix 'unknown warning group' build warning on Clang samples: bpf: Fix xdp_sample_user.o linking with Clang ... ==================== Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-30  bpf: Add bpf_loop helper  (Joanne Koong)
This patch adds the kernel-side and API changes for a new helper function, bpf_loop: long bpf_loop(u32 nr_loops, void *callback_fn, void *callback_ctx, u64 flags); where long (*callback_fn)(u32 index, void *ctx); bpf_loop invokes the "callback_fn" **nr_loops** times or until the callback_fn returns 1. The callback_fn can only return 0 or 1, and this is enforced by the verifier. The callback_fn index is zero-indexed. A few things to please note: ~ The "u64 flags" parameter is currently unused but is included in case a future use case for it arises. ~ In the kernel-side implementation of bpf_loop (kernel/bpf/bpf_iter.c), bpf_callback_t is used as the callback function cast. ~ A program can have nested bpf_loop calls but the program must still adhere to the verifier constraint of its stack depth (the stack depth cannot exceed MAX_BPF_STACK)) ~ Recursive callback_fns do not pass the verifier, due to the call stack for these being too deep. ~ The next patch will include the tests and benchmark Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211130030622.4131246-2-joannekoong@fb.com
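A minimal sketch of the callback pattern described above (the summation is purely illustrative):

    static long add_index(u32 index, void *ctx)
    {
        long *sum = ctx;

        *sum += index;
        return 0;          /* return 1 to stop early */
    }

    /* In the program body: */
    long sum = 0;
    bpf_loop(100, add_index, &sum, 0);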
2021-11-15  bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs  (Dmitrii Banshchikov)
Use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in tracing progs may result in locking issues. bpf_ktime_get_coarse_ns() uses ktime_get_coarse_ns() time accessor that isn't safe for any context: ====================================================== WARNING: possible circular locking dependency detected 5.15.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.4/14877 is trying to acquire lock: ffffffff8cb30008 (tk_core.seq.seqcount){----}-{0:0}, at: ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255 but task is already holding lock: ffffffff90dbf200 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_deactivate+0x61/0x400 lib/debugobjects.c:735 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&obj_hash[i].lock){-.-.}-{2:2}: lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd1/0x120 kernel/locking/spinlock.c:162 __debug_object_init+0xd9/0x1860 lib/debugobjects.c:569 debug_hrtimer_init kernel/time/hrtimer.c:414 [inline] debug_init kernel/time/hrtimer.c:468 [inline] hrtimer_init+0x20/0x40 kernel/time/hrtimer.c:1592 ntp_init_cmos_sync kernel/time/ntp.c:676 [inline] ntp_init+0xa1/0xad kernel/time/ntp.c:1095 timekeeping_init+0x512/0x6bf kernel/time/timekeeping.c:1639 start_kernel+0x267/0x56e init/main.c:1030 secondary_startup_64_no_verify+0xb1/0xbb -> #0 (tk_core.seq.seqcount){----}-{0:0}: check_prev_add kernel/locking/lockdep.c:3051 [inline] check_prevs_add kernel/locking/lockdep.c:3174 [inline] validate_chain+0x1dfb/0x8240 kernel/locking/lockdep.c:3789 __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5015 lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625 seqcount_lockdep_reader_access+0xfe/0x230 include/linux/seqlock.h:103 ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255 ktime_get_coarse include/linux/timekeeping.h:120 [inline] ktime_get_coarse_ns include/linux/timekeeping.h:126 [inline] ____bpf_ktime_get_coarse_ns kernel/bpf/helpers.c:173 [inline] bpf_ktime_get_coarse_ns+0x7e/0x130 kernel/bpf/helpers.c:171 bpf_prog_a99735ebafdda2f1+0x10/0xb50 bpf_dispatcher_nop_func include/linux/bpf.h:721 [inline] __bpf_prog_run include/linux/filter.h:626 [inline] bpf_prog_run include/linux/filter.h:633 [inline] BPF_PROG_RUN_ARRAY include/linux/bpf.h:1294 [inline] trace_call_bpf+0x2cf/0x5d0 kernel/trace/bpf_trace.c:127 perf_trace_run_bpf_submit+0x7b/0x1d0 kernel/events/core.c:9708 perf_trace_lock+0x37c/0x440 include/trace/events/lock.h:39 trace_lock_release+0x128/0x150 include/trace/events/lock.h:58 lock_release+0x82/0x810 kernel/locking/lockdep.c:5636 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:149 [inline] _raw_spin_unlock_irqrestore+0x75/0x130 kernel/locking/spinlock.c:194 debug_hrtimer_deactivate kernel/time/hrtimer.c:425 [inline] debug_deactivate kernel/time/hrtimer.c:481 [inline] __run_hrtimer kernel/time/hrtimer.c:1653 [inline] __hrtimer_run_queues+0x2f9/0xa60 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline] _raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194 
try_to_wake_up+0x702/0xd20 kernel/sched/core.c:4118 wake_up_process kernel/sched/core.c:4200 [inline] wake_up_q+0x9a/0xf0 kernel/sched/core.c:953 futex_wake+0x50f/0x5b0 kernel/futex/waitwake.c:184 do_futex+0x367/0x560 kernel/futex/syscalls.c:127 __do_sys_futex kernel/futex/syscalls.c:199 [inline] __se_sys_futex+0x401/0x4b0 kernel/futex/syscalls.c:180 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae There is a possible deadlock with bpf_timer_* set of helpers: hrtimer_start() lock_base(); trace_hrtimer...() perf_event() bpf_run() bpf_timer_start() hrtimer_start() lock_base() <- DEADLOCK Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT and BPF_PROG_TYPE_RAW_TRACEPOINT prog types. Fixes: d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper") Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Reported-by: syzbot+43fd005b5a1b4d10781e@syzkaller.appspotmail.com Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211113142227.566439-2-me@ubique.spb.ru
2021-09-28  bpf: Replace callers of BPF_CAST_CALL with proper function typedef  (Kees Cook)
In order to keep ahead of cases in the kernel where Control Flow Integrity (CFI) may trip over function call casts, enabling -Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes various warnings and is one of the last places in the kernel triggering this warning. For actual function calls, replace BPF_CAST_CALL() with a typedef, which captures the same details about the given function pointers. This change results in no object code difference. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://github.com/KSPP/linux/issues/20 Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com Link: https://lore.kernel.org/bpf/20210928230946.4062144-3-keescook@chromium.org