summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2020-08-25selftests/bpf: Add set test to resolve_btfidsJiri Olsa
Adding test to for sets resolve_btfids. We're checking that testing set gets properly resolved and sorted. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-15-jolsa@kernel.org
2020-08-25selftests/bpf: Add test for d_path helperJiri Olsa
Adding test for d_path helper which is pretty much copied from Wenbo Zhang's test for bpf_get_fd_path, which never made it in. The test is doing fstat/close on several fd types, and verifies we got the d_path helper working on kernel probes for vfs_getattr/filp_close functions. Original-patch-by: Wenbo Zhang <ethercflow@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-14-jolsa@kernel.org
2020-08-25selftests/bpf: Add verifier test for d_path helperJiri Olsa
Adding verifier test for attaching tracing program and calling d_path helper from within and testing that it's allowed for dentry_open function and denied for 'd_path' function with appropriate error. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-13-jolsa@kernel.org
2020-08-25bpf: Add d_path helperJiri Olsa
Adding d_path helper function that returns full path for given 'struct path' object, which needs to be the kernel BTF 'path' object. The path is returned in buffer provided 'buf' of size 'sz' and is zero terminated. bpf_d_path(&file->f_path, buf, size); The helper calls directly d_path function, so there's only limited set of function it can be called from. Adding just very modest set for the start. Updating also bpf.h tools uapi header and adding 'path' to bpf_helpers_doc.py script. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-11-jolsa@kernel.org
2020-08-25bpf: Add BTF_SET_START/END macrosJiri Olsa
Adding support to define sorted set of BTF ID values. Following defines sorted set of BTF ID values: BTF_SET_START(btf_allowlist_d_path) BTF_ID(func, vfs_truncate) BTF_ID(func, vfs_fallocate) BTF_ID(func, dentry_open) BTF_ID(func, vfs_getattr) BTF_ID(func, filp_close) BTF_SET_END(btf_allowlist_d_path) It defines following 'struct btf_id_set' variable to access values and count: struct btf_id_set btf_allowlist_d_path; Adding 'allowed' callback to struct bpf_func_proto, to allow verifier the check on allowed callers. Adding btf_id_set_contains function, which will be used by allowed callbacks to verify the caller's BTF ID value is within allowed set. Also removing extra '\' in __BTF_ID_LIST macro. Added BTF_SET_START_GLOBAL macro for global sets. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-10-jolsa@kernel.org
2020-08-25tools resolve_btfids: Add support for set symbolsJiri Olsa
The set symbol does not have the unique number suffix, so we need to give it a special parsing function. This was omitted in the first batch, because there was no set support yet, so it slipped in the testing. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-3-jolsa@kernel.org
2020-08-25tools resolve_btfids: Add size check to get_id functionJiri Olsa
To make sure we don't crash on malformed symbols. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825192124.710397-2-jolsa@kernel.org
2020-08-25bpf: Add selftests for local_storageKP Singh
inode_local_storage: * Hook to the file_open and inode_unlink LSM hooks. * Create and unlink a temporary file. * Store some information in the inode's bpf_local_storage during file_open. * Verify that this information exists when the file is unlinked. sk_local_storage: * Hook to the socket_post_create and socket_bind LSM hooks. * Open and bind a socket and set the sk_storage in the socket_post_create hook using the start_server helper. * Verify if the information is set in the socket_bind hook. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-8-kpsingh@chromium.org
2020-08-25bpf: Allow local storage to be used from LSM programsKP Singh
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used in LSM programs. These helpers are not used for tracing programs (currently) as their usage is tied to the life-cycle of the object and should only be used where the owning object won't be freed (when the owning object is passed as an argument to the LSM hook). Thus, they are safer to use in LSM hooks than tracing. Usage of local storage in tracing programs will probably follow a per function based whitelist approach. Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock, it, leads to a compilation warning for LSM programs, it's also updated to accept a void * pointer instead. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org
2020-08-25bpf: Implement bpf_local_storage for inodesKP Singh
Similar to bpf_local_storage for sockets, add local storage for inodes. The life-cycle of storage is managed with the life-cycle of the inode. i.e. the storage is destroyed along with the owning inode. The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the security blob which are now stackable and can co-exist with other LSMs. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org
2020-08-25bpf: Generalize bpf_sk_storageKP Singh
Refactor the functionality in bpf_sk_storage.c so that concept of storage linked to kernel objects can be extended to other objects like inode, task_struct etc. Each new local storage will still be a separate map and provide its own set of helpers. This allows for future object specific extensions and still share a lot of the underlying implementation. This includes the changes suggested by Martin in: https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/ adding new map operations to support bpf_local_storage maps: * storages for different kernel objects to optionally have different memory charging strategy (map_local_storage_charge, map_local_storage_uncharge) * Functionality to extract the storage pointer from a pointer to the owning object (map_owner_storage_ptr) Co-developed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-4-kpsingh@chromium.org
2020-08-25bpf: Renames in preparation for bpf_local_storageKP Singh
A purely mechanical change to split the renaming from the actual generalization. Flags/consts: SK_STORAGE_CREATE_FLAG_MASK BPF_LOCAL_STORAGE_CREATE_FLAG_MASK BPF_SK_STORAGE_CACHE_SIZE BPF_LOCAL_STORAGE_CACHE_SIZE MAX_VALUE_SIZE BPF_LOCAL_STORAGE_MAX_VALUE_SIZE Structs: bucket bpf_local_storage_map_bucket bpf_sk_storage_map bpf_local_storage_map bpf_sk_storage_data bpf_local_storage_data bpf_sk_storage_elem bpf_local_storage_elem bpf_sk_storage bpf_local_storage The "sk" member in bpf_local_storage is also updated to "owner" in preparation for changing the type to void * in a subsequent patch. Functions: selem_linked_to_sk selem_linked_to_storage selem_alloc bpf_selem_alloc __selem_unlink_sk bpf_selem_unlink_storage_nolock __selem_link_sk bpf_selem_link_storage_nolock selem_unlink_sk __bpf_selem_unlink_storage sk_storage_update bpf_local_storage_update __sk_storage_lookup bpf_local_storage_lookup bpf_sk_storage_map_free bpf_local_storage_map_free bpf_sk_storage_map_alloc bpf_local_storage_map_alloc bpf_sk_storage_map_alloc_check bpf_local_storage_map_alloc_check bpf_sk_storage_map_check_btf bpf_local_storage_map_check_btf Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-2-kpsingh@chromium.org
2020-08-24selftests/bpf: Enable tc verbose mode for test_sk_assignYonghong Song
Currently test_sk_assign failed verifier with llvm11/llvm12. During debugging, I found the default verifier output is truncated like below Verifier analysis: Skipped 2200 bytes, use 'verb' option for the full verbose log. [...] off=23,r=34,imm=0) R5=inv0 R6=ctx(id=0,off=0,imm=0) R7=pkt(id=0,off=0,r=34,imm=0) R10=fp0 80: (0f) r7 += r2 last_idx 80 first_idx 21 regs=4 stack=0 before 78: (16) if w3 == 0x11 goto pc+1 when I am using "./test_progs -vv -t assign". The reason is tc verbose mode is not enabled. This patched enabled tc verbose mode and the output looks like below Verifier analysis: 0: (bf) r6 = r1 1: (b4) w0 = 2 2: (61) r1 = *(u32 *)(r6 +80) 3: (61) r7 = *(u32 *)(r6 +76) 4: (bf) r2 = r7 5: (07) r2 += 14 6: (2d) if r2 > r1 goto pc+61 R0_w=inv2 R1_w=pkt_end(id=0,off=0,imm=0) R2_w=pkt(id=0,off=14,r=14,imm=0) ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200824222807.100200-1-yhs@fb.com
2020-08-24selftests/bpf: Fix test_progs-flavor run getting number of testsJesper Dangaard Brouer
Commit 643e7233aa94 ("selftests/bpf: Test_progs option for getting number of tests") introduced ability to getting number of tests, which is targeted towards scripting. As demonstrate in the commit the number can be use as a shell variable for further scripting. The test_progs program support "flavor", which is detected by the binary have a "-flavor" in the executable name. One example is test_progs-no_alu32, which load bpf-progs compiled with disabled alu32, located in dir 'no_alu32/'. The problem is that invoking a "flavor" binary prints to stdout e.g.: "Switching to flavor 'no_alu32' subdirectory..." Thus, intermixing with the number of tests, making it unusable for scripting. Fix the issue by only printing "flavor" info when verbose -v option is used. Fixes: 643e7233aa94 ("selftests/bpf: Test_progs option for getting number of tests") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/159827024012.923543.7104106594870150597.stgit@firesoul
2020-08-24selftests: mlxsw: Reduce runtime of tc-police scale testIdo Schimmel
Currently, the test takes about 626 seconds to complete because of an inefficient use of the device's TCAM. Reduce the runtime to 202 seconds by inserting all the flower filters with the same preference and mask, but with a different key. In particular, this reduces the deletion of the qdisc (which triggers the deletion of all the filters) from 66 seconds to 0.2 seconds. This prevents various netlink requests from user space applications (e.g., systemd-networkd) from timing-out because RTNL is not held for too long anymore. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24selftests: forwarding: Fix mausezahn delay parameter in mirror_test()Danielle Ratson
Currently, mausezahn delay parameter in mirror_test() is specified with 'ms' units. mausezahn versions before 0.6.5 interpret 'ms' as seconds and therefore the tests that use mirror_test() take a very long time to complete. Resolve this by specifying 'msec' units. Signed-off-by: Danielle Ratson <danieller@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24selftests: mlxsw: Increase burst size for burst testIdo Schimmel
The current combination of rate and burst size does not adhere to Spectrum-{2,3} limitation which states that the minimum burst size should be 40% of the rate. Increase the burst size in order to honor above mentioned limitation and avoid intermittent failures of this test case on Spectrum-{2,3}. Remove the first sub-test case as the variation in number of received packets is simply too large to reliably test it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24selftests: mlxsw: Increase burst size for rate testIdo Schimmel
The current combination of rate and burst size does not adhere to Spectrum-{2,3} limitation which states that the minimum burst size should be 40% of the rate. Increase the burst size in order to honor above mentioned limitation and avoid intermittent failures of this test case on Spectrum-{2,3}. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24selftests: mlxsw: Decrease required rate accuracyIdo Schimmel
On Spectrum-{2,3} the required accuracy is +/-10%. Align the test to this requirement so that it can reliably pass on these platforms. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24selftests: bpf: Fix sockmap update nitsLorenz Bauer
Address review by Yonghong, to bring the new tests in line with the usual code style. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200824084523.13104-1-lmb@cloudflare.com
2020-08-24libbpf: Fix type compatibility check copy-paste errorAndrii Nakryiko
Fix copy-paste error in types compatibility check. Local type is accidentally used instead of target type for the very first type check strictness check. This can result in potentially less strict candidate comparison. Fix the error. Fixes: 3fc32f40c402 ("libbpf: Implement type-based CO-RE relocations support") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821225653.2180782-1-andriin@fb.com
2020-08-24libbpf: Avoid false unuinitialized variable warning in bpf_core_apply_reloAndrii Nakryiko
Some versions of GCC report uninitialized targ_spec usage. GCC is wrong, but let's avoid unnecessary warnings. Fixes: ddc7c3042614 ("libbpf: implement BPF CO-RE offset relocation algorithm") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821225556.2178419-1-andriin@fb.com
2020-08-24tcp: bpf: Optionally store mac header in TCP_SAVE_SYNMartin KaFai Lau
This patch is adapted from Eric's patch in an earlier discussion [1]. The TCP_SAVE_SYN currently only stores the network header and tcp header. This patch allows it to optionally store the mac header also if the setsockopt's optval is 2. It requires one more bit for the "save_syn" bit field in tcp_sock. This patch achieves this by moving the syn_smc bit next to the is_mptcp. The syn_smc is currently used with the TCP experimental option. Since syn_smc is only used when CONFIG_SMC is enabled, this patch also puts the "IS_ENABLED(CONFIG_SMC)" around it like the is_mptcp did with "IS_ENABLED(CONFIG_MPTCP)". The mac_hdrlen is also stored in the "struct saved_syn" to allow a quick offset from the bpf prog if it chooses to start getting from the network header or the tcp header. [1]: https://lore.kernel.org/netdev/CANn89iLJNWh6bkH7DNhy_kmcAexuUCccqERqe7z2QsvPhGrYPQ@mail.gmail.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190123.2886935-1-kafai@fb.com
2020-08-24bpf: selftests: Tcp header optionsMartin KaFai Lau
This patch adds tests for the new bpf tcp header option feature. test_tcp_hdr_options.c: - It tests header option writing and parsing in 3WHS: regular connection establishment, fastopen, and syncookie. - In syncookie, the passive side's bpf prog is asking the active side to resend its bpf header option by specifying a RESEND bit in the outgoing SYNACK. handle_active_estab() and write_nodata_opt() has some details. - handle_passive_estab() has comments on fastopen. - It also has test for header writing and parsing in FIN packet. - Most of the tests is writing an experimental option 254 with magic 0xeB9F. - The no_exprm_estab() also tests writing a regular TCP option without any magic. test_misc_tcp_options.c: - It is an one directional test. Active side writes option and passive side parses option. The focus is to exercise the new helpers and API. - Testing the new helper: bpf_load_hdr_opt() and bpf_store_hdr_opt(). - Testing the bpf_getsockopt(TCP_BPF_SYN). - Negative tests for the above helpers. - Testing the sock_ops->skb_data. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190117.2886749-1-kafai@fb.com
2020-08-24bpf: selftests: Add fastopen_connect to network_helpersMartin KaFai Lau
This patch adds a fastopen_connect() helper which will be used in a later test. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190111.2886196-1-kafai@fb.com
2020-08-24bpf: tcp: Allow bpf prog to write and parse TCP header optionMartin KaFai Lau
[ Note: The TCP changes here is mainly to implement the bpf pieces into the bpf_skops_*() functions introduced in the earlier patches. ] The earlier effort in BPF-TCP-CC allows the TCP Congestion Control algorithm to be written in BPF. It opens up opportunities to allow a faster turnaround time in testing/releasing new congestion control ideas to production environment. The same flexibility can be extended to writing TCP header option. It is not uncommon that people want to test new TCP header option to improve the TCP performance. Another use case is for data-center that has a more controlled environment and has more flexibility in putting header options for internal only use. For example, we want to test the idea in putting maximum delay ACK in TCP header option which is similar to a draft RFC proposal [1]. This patch introduces the necessary BPF API and use them in the TCP stack to allow BPF_PROG_TYPE_SOCK_OPS program to parse and write TCP header options. It currently supports most of the TCP packet except RST. Supported TCP header option: ─────────────────────────── This patch allows the bpf-prog to write any option kind. Different bpf-progs can write its own option by calling the new helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated option in the header. By allowing bpf-prog to write any option kind, this gives a lot of flexibility to the bpf-prog. Different bpf-prog can write its own option kind. It could also allow the bpf-prog to support a recently standardized option on an older kernel. Sockops Callback Flags: ────────────────────── The bpf program will only be called to parse/write tcp header option if the following newly added callback flags are enabled in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG A few words on the PARSE CB flags. When the above PARSE CB flags are turned on, the bpf-prog will be called on packets received at a sk that has at least reached the ESTABLISHED state. The parsing of the SYN-SYNACK-ACK will be discussed in the "3 Way HandShake" section. The default is off for all of the above new CB flags, i.e. the bpf prog will not be called to parse or write bpf hdr option. There are details comment on these new cb flags in the UAPI bpf.h. sock_ops->skb_data and bpf_load_hdr_opt() ───────────────────────────────────────── sock_ops->skb_data and sock_ops->skb_data_end covers the whole TCP header and its options. They are read only. The new bpf_load_hdr_opt() helps to read a particular option "kind" from the skb_data. Please refer to the comment in UAPI bpf.h. It has details on what skb_data contains under different sock_ops->op. 3 Way HandShake ─────────────── The bpf-prog can learn if it is sending SYN or SYNACK by reading the sock_ops->skb_tcp_flags. * Passive side When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB), the received SYN skb will be available to the bpf prog. The bpf prog can use the SYN skb (which may carry the header option sent from the remote bpf prog) to decide what bpf header option should be written to the outgoing SYNACK skb. The SYN packet can be obtained by getsockopt(TCP_BPF_SYN*). More on this later. Also, the bpf prog can learn if it is in syncookie mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE). The bpf prog can store the received SYN pkt by using the existing bpf_setsockopt(TCP_SAVE_SYN). The example in a later patch does it. [ Note that the fullsock here is a listen sk, bpf_sk_storage is not very useful here since the listen sk will be shared by many concurrent connection requests. Extending bpf_sk_storage support to request_sock will add weight to the minisock and it is not necessary better than storing the whole ~100 bytes SYN pkt. ] When the connection is established, the bpf prog will be called in the existing PASSIVE_ESTABLISHED_CB callback. At that time, the bpf prog can get the header option from the saved syn and then apply the needed operation to the newly established socket. The later patch will use the max delay ack specified in the SYN header and set the RTO of this newly established connection as an example. The received ACK (that concludes the 3WHS) will also be available to the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data. It could be useful in syncookie scenario. More on this later. There is an existing getsockopt "TCP_SAVED_SYN" to return the whole saved syn pkt which includes the IP[46] header and the TCP header. A few "TCP_BPF_SYN*" getsockopt has been added to allow specifying where to start getting from, e.g. starting from TCP header, or from IP[46] header. The new getsockopt(TCP_BPF_SYN*) will also know where it can get the SYN's packet from: - (a) the just received syn (available when the bpf prog is writing SYNACK) and it is the only way to get SYN during syncookie mode. or - (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other existing CB). The bpf prog does not need to know where the SYN pkt is coming from. The getsockopt(TCP_BPF_SYN*) will hide this details. Similarly, a flags "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to bpf_load_hdr_opt() to read a particular header option from the SYN packet. * Fastopen Fastopen should work the same as the regular non fastopen case. This is a test in a later patch. * Syncookie For syncookie, the later example patch asks the active side's bpf prog to resend the header options in ACK. The server can use bpf_load_hdr_opt() to look at the options in this received ACK during PASSIVE_ESTABLISHED_CB. * Active side The bpf prog will get a chance to write the bpf header option in the SYN packet during WRITE_HDR_OPT_CB. The received SYNACK pkt will also be available to the bpf prog during the existing ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data and bpf_load_hdr_opt(). * Turn off header CB flags after 3WHS If the bpf prog does not need to write/parse header options beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags to avoid being called for header options. Or the bpf-prog can select to leave the UNKNOWN_HDR_OPT_CB_FLAG on so that the kernel will only call it when there is option that the kernel cannot handle. [1]: draft-wang-tcpm-low-latency-opt-00 https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com
2020-08-24bpf: tcp: Add bpf_skops_hdr_opt_len() and bpf_skops_write_hdr_opt()Martin KaFai Lau
The bpf prog needs to parse the SYN header to learn what options have been sent by the peer's bpf-prog before writing its options into SYNACK. This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack(). This syn_skb will eventually be made available (as read-only) to the bpf prog. This will be the only SYN packet available to the bpf prog during syncookie. For other regular cases, the bpf prog can also use the saved_syn. When writing options, the bpf prog will first be called to tell the kernel its required number of bytes. It is done by the new bpf_skops_hdr_opt_len(). The bpf prog will only be called when the new BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags. When the bpf prog returns, the kernel will know how many bytes are needed and then update the "*remaining" arg accordingly. 4 byte alignment will be included in the "*remaining" before this function returns. The 4 byte aligned number of bytes will also be stored into the opts->bpf_opt_len. "bpf_opt_len" is a newly added member to the struct tcp_out_options. Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the header options. The bpf prog is only called if it has reserved spaces before (opts->bpf_opt_len > 0). The bpf prog is the last one getting a chance to reserve header space and writing the header option. These two functions are half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190052.2885316-1-kafai@fb.com
2020-08-24bpf: tcp: Add bpf_skops_parse_hdr()Martin KaFai Lau
The patch adds a function bpf_skops_parse_hdr(). It will call the bpf prog to parse the TCP header received at a tcp_sock that has at least reached the ESTABLISHED state. For the packets received during the 3WHS (SYN, SYNACK and ACK), the received skb will be available to the bpf prog during the callback in bpf_skops_established() introduced in the previous patch and in the bpf_skops_write_hdr_opt() that will be added in the next patch. Calling bpf prog to parse header is controlled by two new flags in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG and BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG. When BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG is set, the bpf prog will only be called when there is unknown option in the TCP header. When BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG is set, the bpf prog will be called on all received TCP header. This function is half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190046.2885054-1-kafai@fb.com
2020-08-24tcp: bpf: Add TCP_BPF_RTO_MIN for bpf_setsockoptMartin KaFai Lau
This patch adds bpf_setsockopt(TCP_BPF_RTO_MIN) to allow bpf prog to set the min rto of a connection. It could be used together with the earlier patch which has added bpf_setsockopt(TCP_BPF_DELACK_MAX). A later selftest patch will communicate the max delay ack in a bpf tcp header option and then the receiving side can use bpf_setsockopt(TCP_BPF_RTO_MIN) to set a shorter rto. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190027.2884170-1-kafai@fb.com
2020-08-24tcp: bpf: Add TCP_BPF_DELACK_MAX setsockoptMartin KaFai Lau
This change is mostly from an internal patch and adapts it from sysctl config to the bpf_setsockopt setup. The bpf_prog can set the max delay ack by using bpf_setsockopt(TCP_BPF_DELACK_MAX). This max delay ack can be communicated to its peer through bpf header option. The receiving peer can then use this max delay ack and set a potentially lower rto by using bpf_setsockopt(TCP_BPF_RTO_MIN) which will be introduced in the next patch. Another later selftest patch will also use it like the above to show how to write and parse bpf tcp header option. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190021.2884000-1-kafai@fb.com
2020-08-24selftests/powerpc: Update PROT_SAO test to skip ISA 3.1Shawn Anastasio
Since SAO support was removed from ISA 3.1, skip the prot_sao test if PPC_FEATURE2_ARCH_3_1 is set. Signed-off-by: Shawn Anastasio <shawn@anastas.io> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821185558.35561-4-shawn@anastas.io
2020-08-24Revert "powerpc/64s: Remove PROT_SAO support"Shawn Anastasio
This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05. Since PROT_SAO can still be useful for certain classes of software, reintroduce it. Concerns about guest migration for LPARs using SAO will be addressed next. Signed-off-by: Shawn Anastasio <shawn@anastas.io> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821185558.35561-2-shawn@anastas.io
2020-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2020-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "Nothing earth shattering here, lots of small fixes (f.e. missing RCU protection, bad ref counting, missing memset(), etc.) all over the place: 1) Use get_file_rcu() in task_file iterator, from Yonghong Song. 2) There are two ways to set remote source MAC addresses in macvlan driver, but only one of which validates things properly. Fix this. From Alvin Šipraga. 3) Missing of_node_put() in gianfar probing, from Sumera Priyadarsini. 4) Preserve device wanted feature bits across multiple netlink ethtool requests, from Maxim Mikityanskiy. 5) Fix rcu_sched stall in task and task_file bpf iterators, from Yonghong Song. 6) Avoid reset after device destroy in ena driver, from Shay Agroskin. 7) Missing memset() in netlink policy export reallocation path, from Johannes Berg. 8) Fix info leak in __smc_diag_dump(), from Peilin Ye. 9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark Tomlinson. 10) Fix number of data stream negotiation in SCTP, from David Laight. 11) Fix double free in connection tracker action module, from Alaa Hleihel. 12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits) net: nexthop: don't allow empty NHA_GROUP bpf: Fix two typos in uapi/linux/bpf.h net: dsa: b53: check for timeout tipc: call rcu_read_lock() in tipc_aead_encrypt_done() net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow net: sctp: Fix negotiation of the number of data streams. dt-bindings: net: renesas, ether: Improve schema validation gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit() hv_netvsc: Remove "unlikely" from netvsc_select_queue bpf: selftests: global_funcs: Check err_str before strstr bpf: xdp: Fix XDP mode when no mode flags specified selftests/bpf: Remove test_align leftovers tools/resolve_btfids: Fix sections with wrong alignment net/smc: Prevent kernel-infoleak in __smc_diag_dump() sfc: fix build warnings on 32-bit net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified" libbpf: Fix map index used in error message net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe() net: atlantic: Use readx_poll_timeout() for large timeout ...
2020-08-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: - PAE and PKU bugfixes for x86 - selftests fix for new binutils - MMU notifier fix for arm64 * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode KVM: x86: fix access code passed to gva_to_gpa selftests: kvm: Use a shorter encoding to clear RAX
2020-08-21libbpf: Normalize and improve logging across few functionsAndrii Nakryiko
Make libbpf logs follow similar pattern and provide more context like section name or program name, where appropriate. Also, add BPF_INSN_SZ constant and use it throughout to clean up code a little bit. This commit doesn't have any functional changes and just removes some code changes out of the way before bigger refactoring in libbpf internals. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-6-andriin@fb.com
2020-08-21libbpf: Skip well-known ELF sections when iterating ELFAndrii Nakryiko
Skip and don't log ELF sections that libbpf knows about and ignores during ELF processing. This allows to not unnecessarily log details about those ELF sections and cleans up libbpf debug log. Ignored sections include DWARF data, string table, empty .text section and few special (e.g., .llvm_addrsig) useless sections. With such ELF sections out of the way, log unrecognized ELF sections at pr_info level to increase visibility. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-5-andriin@fb.com
2020-08-21libbpf: Add __noinline macro to bpf_helpers.hAndrii Nakryiko
__noinline is pretty frequently used, especially with BPF subprograms, so add them along the __always_inline, for user convenience and completeness. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-4-andriin@fb.com
2020-08-21libbpf: Factor out common ELF operations and improve loggingAndrii Nakryiko
Factor out common ELF operations done throughout the libbpf. This simplifies usage across multiple places in libbpf, as well as hide error reporting from higher-level functions and make error logging more consistent. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-3-andriin@fb.com
2020-08-21selftests/bpf: BPF object files should depend only on libbpf headersAndrii Nakryiko
There is no need to re-build BPF object files if any of the sources of libbpf change. So record more precise dependency only on libbpf/bpf_*.h headers. This eliminates unnecessary re-builds. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820231250.1293069-2-andriin@fb.com
2020-08-21selftests: bpf: Test sockmap update from BPFLorenz Bauer
Add a test which copies a socket from a sockmap into another sockmap or sockhash. This excercises bpf_map_update_elem support from BPF context. Compare the socket cookies from source and destination to ensure that the copy succeeded. Also check that the verifier rejects map_update from unsafe contexts. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200821102948.21918-7-lmb@cloudflare.com
2020-08-21libbpf: Add perf_buffer APIs for better integration with outside epoll loopAndrii Nakryiko
Add a set of APIs to perf_buffer manage to allow applications to integrate perf buffer polling into existing epoll-based infrastructure. One example is applications using libevent already and wanting to plug perf_buffer polling, instead of relying on perf_buffer__poll() and waste an extra thread to do it. But perf_buffer is still extremely useful to set up and consume perf buffer rings even for such use cases. So to accomodate such new use cases, add three new APIs: - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by given instance of perf_buffer manager; - perf_buffer__buffer_fd() returns FD of perf_event corresponding to a specified per-CPU buffer; this FD is then polled independently; - perf_buffer__consume_buffer() consumes data from single per-CPU buffer, identified by its slot index. To support a simpler, but less efficient, way to integrate perf_buffer into external polling logic, also expose underlying epoll FD through perf_buffer__epoll_fd() API. It will need to be followed by perf_buffer__poll(), wasting extra syscall, or perf_buffer__consume(), wasting CPU to iterate buffers with no data. But could be simpler and more convenient for some cases. These APIs allow for great flexiblity, but do not sacrifice general usability of perf_buffer. Also exercise and check new APIs in perf_buffer selftest. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20200821165927.849538-1-andriin@fb.com
2020-08-21bpftool: Implement link_query for bpf iteratorsYonghong Song
The link query for bpf iterators is implemented. Besides being shown to the user what bpf iterator the link represents, the target_name is also used to filter out what additional information should be printed out, e.g., whether map_id should be shown or not. The following is an example of bpf_iter link dump, plain output or pretty output. $ bpftool link show 11: iter prog 59 target_name task pids test_progs(1749) 34: iter prog 173 target_name bpf_map_elem map_id 127 pids test_progs_1(1753) $ bpftool -p link show [{ "id": 11, "type": "iter", "prog_id": 59, "target_name": "task", "pids": [{ "pid": 1749, "comm": "test_progs" } ] },{ "id": 34, "type": "iter", "prog_id": 173, "target_name": "bpf_map_elem", "map_id": 127, "pids": [{ "pid": 1753, "comm": "test_progs_1" } ] } ] Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200821184420.574430-1-yhs@fb.com
2020-08-21bpf: Implement link_query for bpf iteratorsYonghong Song
This patch implemented bpf_link callback functions show_fdinfo and fill_link_info to support link_query interface. The general interface for show_fdinfo and fill_link_info will print/fill the target_name. Each targets can register show_fdinfo and fill_link_info callbacks to print/fill more target specific information. For example, the below is a fdinfo result for a bpf task iterator. $ cat /proc/1749/fdinfo/7 pos: 0 flags: 02000000 mnt_id: 14 link_type: iter link_id: 11 prog_tag: 990e1f8152f7e54f prog_id: 59 target_name: task Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821184418.574122-1-yhs@fb.com
2020-08-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2020-08-21 The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 5 day(s) which contain a total of 12 files changed, 78 insertions(+), 24 deletions(-). The main changes are: 1) three fixes in BPF task iterator logic, from Yonghong. 2) fix for compressed dwarf sections in vmlinux, from Jiri. 3) fix xdp attach regression, from Andrii. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-21bpf: Fix two typos in uapi/linux/bpf.hTobias Klauser
Also remove trailing whitespaces in bpf_skb_get_tunnel_key example code. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200821133642.18870-1-tklauser@distanz.ch
2020-08-21perf: arm-spe: Fix check error when synthesizing eventsWei Li
In arm_spe_read_record(), when we are processing an events packet, 'decoder->packet.index' is the length of payload, which has been transformed in payloadlen(). So correct the check of 'idx'. Signed-off-by: Wei Li <liwei391@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200724072628.35904-1-liwei391@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf symbols: Add mwait_idle_with_hints.constprop.0 to the list of idle symbolsArnaldo Carvalho de Melo
The "mwait_idle_with_hints" one was already there, some compiler artifact now adds this ".constprop.0" suffix, cover that one too. At some point we need to put these in a special bucket and show it somewhere on the screen. Noticed building the kernel on a fedora:32 system using: gcc version 10.2.1 20200723 (Red Hat 10.2.1-1) (GCC) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf top: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not setTiezhu Yang
When I execute 'perf top' without HAVE_LIBBPF_SUPPORT, there exists the following segmentation fault, skip the side-band event setup to fix it, this is similar with commit 1101c872c8c7 ("perf record: Skip side-band event setup if HAVE_LIBBPF_SUPPORT is not set"). [yangtiezhu@linux perf]$ ./perf top <SNIP> perf: Segmentation fault Obtained 6 stack frames. ./perf(sighandler_dump_stack+0x5c) [0x12011b604] [0xffffffc010] ./perf(perf_mmap__read_init+0x3e) [0x1201feeae] ./perf() [0x1200d715c] /lib64/libpthread.so.0(+0xab9c) [0xffee10ab9c] /lib64/libc.so.6(+0x128f4c) [0xffedc08f4c] Segmentation fault [yangtiezhu@linux perf]$ I use git bisect to find commit b38d85ef49cf ("perf bpf: Decouple creating the evlist from adding the SB event") is the first bad commit, so also add the Fixes tag. Committer testing: First build perf explicitely disabling libbpf: $ make NO_LIBBPF=1 O=/tmp/build/perf -C tools/perf install-bin && perf test python Now make sure it isn't linked: $ perf -vv | grep -w bpf bpf: [ OFF ] # HAVE_LIBBPF_SUPPORT $ $ nm ~/bin/perf | grep libbpf $ And now try to run 'perf top': # perf top perf: Segmentation fault -------- backtrace -------- perf[0x5bcd6d] /lib64/libc.so.6(+0x3ca6f)[0x7fd0f5a66a6f] perf(perf_mmap__read_init+0x1e)[0x5e1afe] perf[0x4cc468] /lib64/libpthread.so.0(+0x9431)[0x7fd0f645a431] /lib64/libc.so.6(clone+0x42)[0x7fd0f5b2b912] # Applying this patch fixes the issue. Fixes: b38d85ef49cf ("perf bpf: Decouple creating the evlist from adding the SB event") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Link: http://lore.kernel.org/lkml/1597753837-16222-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-21perf sched timehist: Fix use of CPU list with summary optionDavid Ahern
Do not update thread stats or show idle summary unless CPU is in the list of interest. Fixes: c30d630d1bcfad8d ("perf sched timehist: Add support for filtering on CPU") Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200817170943.1486-1-dsahern@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>