linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-04-11	bpf: Switch BPF verifier log to be a rotating log by default	Andrii Nakryiko
	Currently, if user-supplied log buffer to collect BPF verifier log turns out to be too small to contain full log, bpf() syscall returns -ENOSPC, fails BPF program verification/load, and preserves first N-1 bytes of the verifier log (where N is the size of user-supplied buffer). This is problematic in a bunch of common scenarios, especially when working with real-world BPF programs that tend to be pretty complex as far as verification goes and require big log buffers. Typically, it's when debugging tricky cases at log level 2 (verbose). Also, when BPF program is successfully validated, log level 2 is the only way to actually see verifier state progression and all the important details. Even with log level 1, it's possible to get -ENOSPC even if the final verifier log fits in log buffer, if there is a code path that's deep enough to fill up entire log, even if normally it would be reset later on (there is a logic to chop off successfully validated portions of BPF verifier log). In short, it's not always possible to pre-size log buffer. Also, what's worse, in practice, the end of the log most often is way more important than the beginning, but verifier stops emitting log as soon as initial log buffer is filled up. This patch switches BPF verifier log behavior to effectively behave as rotating log. That is, if user-supplied log buffer turns out to be too short, verifier will keep overwriting previously written log, effectively treating user's log buffer as a ring buffer. -ENOSPC is still going to be returned at the end, to notify user that log contents was truncated, but the important last N bytes of the log would be returned, which might be all that user really needs. This consistent -ENOSPC behavior, regardless of rotating or fixed log behavior, allows to prevent backwards compatibility breakage. The only user-visible change is which portion of verifier log user ends up seeing if buffer is too small. Given contents of verifier log itself is not an ABI, there is no breakage due to this behavior change. Specialized tools that rely on specific contents of verifier log in -ENOSPC scenario are expected to be easily adapted to accommodate old and new behaviors. Importantly, though, to preserve good user experience and not require every user-space application to adopt to this new behavior, before exiting to user-space verifier will rotate log (in place) to make it start at the very beginning of user buffer as a continuous zero-terminated string. The contents will be a chopped off N-1 last bytes of full verifier log, of course. Given beginning of log is sometimes important as well, we add BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows tools like veristat to request first part of verifier log, if necessary. BPF_LOG_FIXED flag is also a simple and straightforward way to check if BPF verifier supports rotating behavior. On the implementation side, conceptually, it's all simple. We maintain 64-bit logical start and end positions. If we need to truncate the log, start position will be adjusted accordingly to lag end position by N bytes. We then use those logical positions to calculate their matching actual positions in user buffer and handle wrap around the end of the buffer properly. Finally, right before returning from bpf_check(), we rotate user log buffer contents in-place as necessary, to make log contents contiguous. See comments in relevant functions for details. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/bpf/20230406234205.323208-4-andrii@kernel.org
2023-04-11	tools: ynl: throw a more meaningful exception if family not supported	Jakub Kicinski
	cli.py currently throws a pure KeyError if kernel doesn't support a netlink family. Users who did not write ynl (hah) may waste their time investigating what's wrong with the Python code. Improve the error message: Traceback (most recent call last): File "/home/kicinski/devel/linux/tools/net/ynl/lib/ynl.py", line 362, in __init__ self.family = GenlFamily(self.yaml['name']) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kicinski/devel/linux/tools/net/ynl/lib/ynl.py", line 331, in __init__ self.genl_family = genl_family_name_to_id[family_name] ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^ KeyError: 'netdev' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/kicinski/devel/linux/./tools/net/ynl/cli.py", line 52, in <module> main() File "/home/kicinski/devel/linux/./tools/net/ynl/cli.py", line 31, in main ynl = YnlFamily(args.spec, args.schema) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/kicinski/devel/linux/tools/net/ynl/lib/ynl.py", line 364, in __init__ raise Exception(f"Family '{self.yaml['name']}' not supported by the kernel") Exception: Family 'netdev' not supported by the kernel Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230407145609.297525-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-10	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio fixes from Michael Tsirkin: "Some last minute fixes - most of them for regressions" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa_sim_net: complete the initialization before register the device vdpa/mlx5: Add and remove debugfs in setup/teardown driver tools/virtio: fix typo in README instructions vhost-scsi: Fix crash during LUN unmapping vhost-scsi: Fix vhost_scsi struct use after free virtio-blk: fix ZBD probe in kernels without ZBD support virtio-blk: fix to match virtio spec
2023-04-10	selftests/bpf: Reset err when symbol name already exist in kprobe_multi_test	Manu Bretelle
	When trying to add a name to the hashmap, an error code of EEXIST is returned and we continue as names are possibly duplicated in the sys file. If the last name in the file is a duplicate, we will continue to the next iteration of the while loop, and exit the loop with a value of err set to EEXIST and enter the error label with err set, which causes the test to fail when it should not. This change reset err to 0 before continue-ing into the next iteration, this way, if there is no more data to read from the file we iterate through, err will be set to 0. Behaviour prior to this change: ``` test_kprobe_multi_bench_attach:FAIL:get_syms unexpected error: -17 (errno 2) All error logs: test_kprobe_multi_bench_attach:FAIL:get_syms unexpected error: -17 (errno 2) Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED ``` After this change: ``` Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED ``` Signed-off-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230408022919.54601-1-chantr4@gmail.com
2023-04-08	Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "28 hotfixes. 23 are cc:stable and the other five address issues which were introduced during this merge cycle. 20 are for MM and the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits) maple_tree: fix a potential concurrency bug in RCU mode maple_tree: fix get wrong data_end in mtree_lookup_walk() mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() nilfs2: fix sysfs interface lifetime mm: take a page reference when removing device exclusive entries mm: vmalloc: avoid warn_alloc noise caused by fatal signal nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread() zsmalloc: document freeable stats zsmalloc: document new fullness grouping fsdax: force clear dirty mark if CoW mm/hugetlb: fix uffd wr-protection for CoW optimization path mm: enable maple tree RCU mode by default maple_tree: add RCU lock checking to rcu callback functions maple_tree: add smp_rmb() to dead node detection maple_tree: fix write memory barrier of nodes once dead for RCU mode maple_tree: remove extra smp_wmb() from mas_dead_leaves() maple_tree: fix freeing of nodes in rcu mode maple_tree: detect dead nodes in mas_start() maple_tree: be more cautious about dead nodes ...
2023-04-07	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-04-08 We've added 4 non-merge commits during the last 11 day(s) which contain a total of 5 files changed, 39 insertions(+), 6 deletions(-). The main changes are: 1) Fix BPF TCP socket iterator to use correct helper for dropping socket's refcount, that is, sock_gen_put instead of sock_put, from Martin KaFai Lau. 2) Fix a BTI exception splat in BPF trampoline-generated code on arm64, from Xu Kuohai. 3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo. 4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest, from Lorenzo Bianconi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver bpf, arm64: Fixed a BTI error on returning to patched function LoongArch, bpf: Fix jit to skip speculation barrier opcode bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp ==================== Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07	selftests/bpf: Prevent infinite loop in veristat when base file is too short	Eduard Zingerman
	The following example forces veristat to loop indefinitely: $ cat two-ok file_name,prog_name,verdict,total_states file-a,a,success,12 file-b,b,success,67 $ cat add-failure file_name,prog_name,verdict,total_states file-a,a,success,12 file-b,b,success,67 file-b,c,failure,32 $ veristat -C two-ok add-failure <does not return> The loop is caused by handle_comparison_mode() not checking if `base` variable points to `fallback_stats` prior advancing joined results using `base`. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230407154125.896927-1-eddyz87@gmail.com
2023-04-07	bpftool: Set program type only if it differs from the desired one	Wei Yongjun
	After commit d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call"), bpf_program__set_type() will force cleanup the program's SEC() definition, this commit fixed the test helper but missed the bpftool, which leads to bpftool prog autoattach broken as follows: $ bpftool prog load spi-xfer-r1v1.o /sys/fs/bpf/test autoattach Program spi_xfer_r1v1 does not support autoattach, falling back to pinning This patch fix bpftool to set program type only if it differs. Fixes: d6e6286a12e7 ("libbpf: disassociate section handler on explicit bpf_program__set_type() call") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230407081427.2621590-1-weiyongjun@huaweicloud.com
2023-04-07	selftests/bpf: Use PERF_COUNT_HW_CPU_CYCLES event for get_branch_snapshot	Song Liu
	perf_event with type=PERF_TYPE_RAW and config=0x1b00 turned out to be not reliable in ensuring LBR is active. Thus, test_progs:get_branch_snapshot is not reliable in some systems. Replace it with PERF_COUNT_HW_CPU_CYCLES event, which gives more consistent results. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230407190130.2093736-1-song@kernel.org
2023-04-07	selftests: bonding: add arp validate test	Hangbin Liu
	This patch add bonding arp validate tests with mode active backup, monitor arp_ip_target and ns_ip6_target. It also checks mii_status to make sure all slaves are UP. Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07	selftests: bonding: re-format bond option tests	Hangbin Liu
	To improve the testing process for bond options, A new bond topology lib is added to our testing setup. The current option_prio.sh file will be renamed to bond_options.sh so that all bonding options can be tested here. Specifically, for priority testing, we will run all tests using modes 1, 5, and 6. These changes will help us streamline the testing process and ensure that our bond options are rigorously evaluated. Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-06	selftests: forwarding: hw_stats_l3: Detect failure to install counters	Petr Machata
	Running this test makes little sense if the enabled l3_stats are not actually reported as "used". This can signify a failure of a driver to install the necessary counters, or simply lack of support for enabling in-HW counters on a given netdevice. It is generally impossible to tell from the outside which it is. But more likely than not, if somebody is running this on veth pairs, they do not intend to actually test that a certain piece of HW can install in-HW counters for the veth. It is more likely they are e.g. running the test by mistake. Therefore detect that the counter has not been actually installed. In that case, if the netdevice is one end of a veth pair, SKIP. Otherwise FAIL. Suggested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Tested-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/a86817961903cca5cb0aebf2b2a06294b8aa7dea.1680704172.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06	selftests/bpf: Add verifier tests for code pattern '<const> <cond_op> ↵	Yonghong Song
	<non_const>' Add various tests for code pattern '<const> <cond_op> <non_const>' to exercise the previous verifier patch. The following are veristat changed number of processed insns stat comparing the previous patch vs. this patch: File Program Insns (A) Insns (B) Insns (DIFF) ----------------------------------------------------- ---------------------------------------------------- --------- --------- ------------- test_seg6_loop.bpf.linked3.o __add_egr_x 12423 12314 -109 (-0.88%) Only one program is affected with minor change. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164510.1047757-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06	bpf: Improve handling of pattern '<const> <cond_op> <non_const>' in verifier	Yonghong Song
	Currently, the verifier does not handle '<const> <cond_op> <non_const>' well. For example, ... 10: (79) r1 = (u64 )(r10 -16) ; R1_w=scalar() R10=fp0 11: (b7) r2 = 0 ; R2_w=0 12: (2d) if r2 > r1 goto pc+2 13: (b7) r0 = 0 14: (95) exit 15: (65) if r1 s> 0x1 goto pc+3 16: (0f) r0 += r1 ... At insn 12, verifier decides both true and false branch are possible, but actually only false branch is possible. Currently, the verifier already supports patterns '<non_const> <cond_op> <const>. Add support for patterns '<const> <cond_op> <non_const>' in a similar way. Also fix selftest 'verifier_bounds_mix_sign_unsign/bounds checks mixing signed and unsigned, variant 10' due to this change. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164505.1046801-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06	selftests/bpf: Add tests for non-constant cond_op NE/EQ bound deduction	Yonghong Song
	Add various tests for code pattern '<non-const> NE/EQ <const>' implemented in the previous verifier patch. Without the verifier patch, these new tests will fail. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230406164500.1045715-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Conflicts: drivers/net/ethernet/google/gve/gve.h 3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts") 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/ https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/ Adjacent changes: net/can/isotp.c 051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") 96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-06	Merge tag 'net-6.3-rc6-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and can. Current release - regressions: - wifi: mac80211: - fix potential null pointer dereference - fix receiving mesh packets in forwarding=0 networks - fix mesh forwarding Current release - new code bugs: - virtio/vsock: fix leaks due to missing skb owner Previous releases - regressions: - raw: fix NULL deref in raw_get_next(). - sctp: check send stream number after wait_for_sndbuf - qrtr: - fix a refcount bug in qrtr_recvmsg() - do not do DEL_SERVER broadcast after DEL_CLIENT - wifi: brcmfmac: fix SDIO suspend/resume regression - wifi: mt76: fix use-after-free in fw features query. - can: fix race between isotp_sendsmg() and isotp_release() - eth: mtk_eth_soc: fix remaining throughput regression - eth: ice: reset FDIR counter in FDIR init stage Previous releases - always broken: - core: don't let netpoll invoke NAPI if in xmit context - icmp: guard against too small mtu - ipv6: fix an uninit variable access bug in __ip6_make_skb() - wifi: mac80211: fix the size calculation of ieee80211_ie_len_eht_cap() - can: fix poll() to not report false EPOLLOUT events - eth: gve: secure enough bytes in the first TX desc for all TCP pkts" * tag 'net-6.3-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net: stmmac: check fwnode for phy device before scanning for phy net: stmmac: Add queue reset into stmmac_xdp_open() function selftests: net: rps_default_mask.sh: delete veth link specifically net: fec: make use of MDIO C45 quirk can: isotp: fix race between isotp_sendsmg() and isotp_release() can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access gve: Secure enough bytes in the first TX desc for all TCP pkts netlink: annotate lockless accesses to nlk->max_recvmsg_len ethtool: reset #lanes when lanes is omitted ping: Fix potentail NULL deref for /proc/net/icmp. raw: Fix NULL deref in raw_get_next(). ice: Reset FDIR counter in FDIR init stage ice: fix wrong fallback logic for FDIR net: stmmac: fix up RX flow hash indirection table when setting channels net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe wifi: mt76: ignore key disable commands wifi: ath11k: reduce the MHI timeout to 20s ipv6: Fix an uninit variable access bug in __ip6_make_skb() ...
2023-04-06	Merge tag 'linux-kselftest-fixes-6.3-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "One single fix to mount_setattr_test build failure" * tag 'linux-kselftest-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests mount: Fix mount_setattr_test builds failed
2023-04-06	selftests: xsk: Add test UNALIGNED_INV_DESC_4K1_FRAME_SIZE	Kal Conley
	Add unaligned descriptor test for frame size of 4001. Using an odd frame size ensures that the end of the UMEM is not near a page boundary. This allows testing descriptors that staddle the end of the UMEM but not a page. This test used to fail without the previous commit ("xsk: Fix unaligned descriptor validation"). Signed-off-by: Kal Conley <kal.conley@dectris.com> Link: https://lore.kernel.org/r/20230405235920.7305-3-kal.conley@dectris.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-06	selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver	Lorenzo Bianconi
	xdp-features supported by veth driver are no more static, but they depends on veth configuration (e.g. if GRO is enabled/disabled or TX/RX queue configuration). Take it into account in xdp_redirect xdp-features selftest for veth driver. Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/bc35455cfbb1d4f7f52536955ded81ad47d8dc54.1680777371.git.lorenzo@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-06	selftests/net: fix typo in tcp_mmap	Eric Dumazet
	kernel test robot reported the following warning: All warnings (new ones prefixed by >>): tcp_mmap.c: In function 'child_thread': >> tcp_mmap.c:211:61: warning: 'lu' may be used uninitialized in this function [-Wmaybe-uninitialized] 211 \| zc.length = min(chunk_size, FILE_SZ - lu); We want to read FILE_SZ bytes, so the correct expression should be (FILE_SZ - total) Fixes: 5c5945dc695c ("selftests/net: Add SHA256 computation over data sent in tcp_mmap") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304042104.UFIuevBp-lkp@intel.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xiaoyan Li <lixiaoyan@google.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230405071556.1019623-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-05	bpftool: Clean up _bpftool_once_attr() calls in bash completion	Quentin Monnet
	In bpftool's bash completion file, function _bpftool_once_attr() is able to process multiple arguments. There are a few locations where this function is called multiple times in a row, each time for a single argument; let's pass all arguments instead to minimize the number of function calls required for the completion. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-8-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	bpftool: Support printing opcodes and source file references in CFG	Quentin Monnet
	Add support for displaying opcodes or/and file references (filepath, line and column numbers) when dumping the control flow graphs of loaded BPF programs with bpftool. The filepaths in the records are absolute. To avoid blocks on the graph to get too wide, we truncate them when they get too long (but we always keep the entire file name). In the unlikely case where the resulting file name is ambiguous, it remains possible to get the full path with a regular dump (no CFG). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-7-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	bpftool: Support "opcodes", "linum", "visual" simultaneously	Quentin Monnet
	When dumping a program, the keywords "opcodes" (for printing the raw opcodes), "linum" (for displaying the filename, line number, column number along with the source code), and "visual" (for generating the control flow graph for translated programs) are mutually exclusive. But there's no reason why they should be. Let's make it possible to pass several of them at once. The "file FILE" option, which makes bpftool output a binary image to a file, remains incompatible with the others. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-6-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	bpftool: Return an error on prog dumps if both CFG and JSON are required	Quentin Monnet
	We do not support JSON output for control flow graphs of programs with bpftool. So far, requiring both the CFG and JSON output would result in producing a null JSON object. It makes more sense to raise an error directly when parsing command line arguments and options, so that users know they won't get any output they might expect. If JSON is required for the graph, we leave it to Graphviz instead: # bpftool prog dump xlated <REF> visual \| dot -Tjson Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-5-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	bpftool: Support inline annotations when dumping the CFG of a program	Quentin Monnet
	We support dumping the control flow graph of loaded programs to the DOT format with bpftool, but so far this feature wouldn't display the source code lines available through BTF along with the eBPF bytecode. Let's add support for these annotations, to make it easier to read the graph. In prog.c, we move the call to dump_xlated_cfg() in order to pass and use the full struct dump_data, instead of creating a minimal one in draw_bb_node(). We pass the pointer to this struct down to dump_xlated_for_graph() in xlated_dumper.c, where most of the logics is added. We deal with BTF mostly like we do for plain or JSON output, except that we cannot use a "nr_skip" value to skip a given number of linfo records (we don't process the BPF instructions linearly, and apart from the root of the graph we don't know how many records we should skip, so we just store the last linfo and make sure the new one we find is different before printing it). When printing the source instructions to the label of a DOT graph node, there are a few subtleties to address. We want some special newline markers, and there are some characters that we must escape. To deal with them, we introduce a new dedicated function btf_dump_linfo_dotlabel() in btf_dumper.c. We'll reuse this function in a later commit to format the filepath, line, and column references as well. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-4-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	bpftool: Fix bug for long instructions in program CFG dumps	Quentin Monnet
	When dumping the control flow graphs for programs using the 16-byte long load instruction, we need to skip the second part of this instruction when looking for the next instruction to process. Otherwise, we end up printing "BUG_ld_00" from the kernel disassembler in the CFG. Fixes: efcef17a6d65 ("tools: bpftool: generate .dot graph from CFG information") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-3-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	bpftool: Fix documentation about line info display for prog dumps	Quentin Monnet
	The documentation states that when line_info is available when dumping a program, the source line will be displayed "by default". There is no notion of "default" here: the line is always displayed if available, there is no way currently to turn it off. In the next sentence, the documentation states that if "linum" is used on the command line, the relevant filename, line, and column will be displayed "on top of the source line". This is incorrect, as they are currently displayed on the right side of the source line (or on top of the eBPF instruction, not the source). This commit fixes the documentation to address these points. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230405132120.59886-2-quentin@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-05	selftests: net: rps_default_mask.sh: delete veth link specifically	Hangbin Liu
	When deleting the netns and recreating a new one while re-adding the veth interface, there is a small window of time during which the old veth interface has not yet been removed. This can cause the new addition to fail. To resolve this issue, we can either wait for a short while to ensure that the old veth interface is deleted, or we can specifically remove the veth interface. Before this patch: # ./rps_default_mask.sh empty rps_default_mask [ ok ] changing rps_default_mask dont affect existing devices [ ok ] changing rps_default_mask dont affect existing netns [ ok ] changing rps_default_mask affect newly created devices [ ok ] changing rps_default_mask don't affect newly child netns[II][ ok ] rps_default_mask is 0 by default in child netns [ ok ] RTNETLINK answers: File exists changing rps_default_mask in child ns don't affect the main one[ ok ] cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected [fail] expected 1 found changing rps_default_mask in child ns don't affect existing devices[ ok ] After this patch: # ./rps_default_mask.sh empty rps_default_mask [ ok ] changing rps_default_mask dont affect existing devices [ ok ] changing rps_default_mask dont affect existing netns [ ok ] changing rps_default_mask affect newly created devices [ ok ] changing rps_default_mask don't affect newly child netns[II][ ok ] rps_default_mask is 0 by default in child netns [ ok ] changing rps_default_mask in child ns don't affect the main one[ ok ] changing rps_default_mask in child ns affects new childns devices[ ok ] changing rps_default_mask in child ns don't affect existing devices[ ok ] Fixes: 3a7d84eae03b ("self-tests: more rps self tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-05	maple_tree: fix write memory barrier of nodes once dead for RCU mode	Liam R. Howlett
	During the development of the maple tree, the strategy of freeing multiple nodes changed and, in the process, the pivots were reused to store pointers to dead nodes. To ensure the readers see accurate pivots, the writers need to mark the nodes as dead and call smp_wmb() to ensure any readers can identify the node as dead before using the pivot values. There were two places where the old method of marking the node as dead without smp_wmb() were being used, which resulted in RCU readers seeing the wrong pivot value before seeing the node was dead. Fix this race condition by using mte_set_node_dead() which has the smp_wmb() call to ensure the race is closed. Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed are marked as dead to ensure there are no other call paths besides the two updated paths. This is necessary for the RCU mode of the maple tree. Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-05	selftests/bpf: Wait for receive in cg_storage_multi test	YiFei Zhu
	In some cases the loopback latency might be large enough, causing the assertion on invocations to be run before ingress prog getting executed. The assertion would fail and the test would flake. This can be reliably reproduced by arbitrarily increasing the loopback latency (thanks to [1]): tc qdisc add dev lo root handle 1: htb default 12 tc class add dev lo parent 1:1 classid 1:12 htb rate 20kbps ceil 20kbps tc qdisc add dev lo parent 1:12 netem delay 100ms Fix this by waiting on the receive end, instead of instantly returning to the assert. The call to read() will wait for the default SO_RCVTIMEO timeout of 3 seconds provided by start_server(). [1] https://gist.github.com/kstevens715/4598301 Reported-by: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/bpf/9c5c8b7e-1d89-a3af-5400-14fde81f4429@linux.dev/ Fixes: 3573f384014f ("selftests/bpf: Test CGROUP_STORAGE behavior on shared egress + ingress") Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: YiFei Zhu <zhuyifei@google.com> Link: https://lore.kernel.org/r/20230405193354.1956209-1-zhuyifei@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05	selftests: xsk: Deflakify STATS_RX_DROPPED test	Kal Conley
	Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after receiving the last (valid) packet which is not the final packet sent in the test (valid and invalid packets are sent in alternating fashion with the final packet being invalid). Since the last packet may or may not have been dropped already, both outcomes must be allowed. This issue could also be fixed by making sure the last packet sent is valid. This alternative is left as an exercise to the reader (or the benevolent maintainers of this file). This problem was quite visible on certain setups. On one machine this failure was observed 50% of the time. Also, remove a redundant assignment of pkt_stream->nb_pkts. This field is already initialized by __pkt_stream_alloc. Fixes: 27e934bec35b ("selftests: xsk: make stat tests not spin on getsockopt") Signed-off-by: Kal Conley <kal.conley@dectris.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230403120400.31018-1-kal.conley@dectris.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05	selftests: xsk: Disable IPv6 on VETH1	Kal Conley
	This change fixes flakiness in the BIDIRECTIONAL test: # [is_pkt_valid] expected length [60], got length [90] not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL When IPv6 is enabled, the interface will periodically send MLDv1 and MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail since it uses VETH0 for RX. For other tests, this was not a problem since they only receive on VETH1 and IPv6 was already disabled on VETH0. Fixes: a89052572ebb ("selftests/bpf: Xsk selftests framework") Signed-off-by: Kal Conley <kal.conley@dectris.com> Link: https://lore.kernel.org/r/20230405082905.6303-1-kal.conley@dectris.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05	selftests: xsk: Add test case for packets at end of UMEM	Kal Conley
	Add test case to testapp_invalid_desc for valid packets at the end of the UMEM. Signed-off-by: Kal Conley <kal.conley@dectris.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230403145047.33065-3-kal.conley@dectris.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05	selftests: xsk: Use correct UMEM size in testapp_invalid_desc	Kal Conley
	Avoid UMEM_SIZE macro in testapp_invalid_desc which is incorrect when the frame size is not XSK_UMEM__DEFAULT_FRAME_SIZE. Also remove the macro since it's no longer being used. Fixes: 909f0e28207c ("selftests: xsk: Add tests for 2K frame size") Signed-off-by: Kal Conley <kal.conley@dectris.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230403145047.33065-2-kal.conley@dectris.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-05	selftests: xsk: Add xskxceiver.h dependency to Makefile	Kal Conley
	xskxceiver depends on xskxceiver.h so tell make about it. Signed-off-by: Kal Conley <kal.conley@dectris.com> Link: https://lore.kernel.org/r/20230403130151.31195-1-kal.conley@dectris.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-04-04	selftests/bpf: Add tracing tests for walking skb and req.	Alexei Starovoitov
	Add tracing tests for walking skb->sk and req->sk. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230404045029.82870-9-alexei.starovoitov@gmail.com
2023-04-04	selftests/bpf: Add RESOLVE_BTFIDS dependency to bpf_testmod.ko	Ilya Leoshkevich
	bpf_testmod.ko sometimes fails to build from a clean checkout: BTF [M] linux/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.ko /bin/sh: 1: linux-build//tools/build/resolve_btfids/resolve_btfids: not found The reason is that RESOLVE_BTFIDS may not yet be built. Fix by adding a dependency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230403172935.1553022-1-iii@linux.ibm.com
2023-04-04	tools/virtio: fix typo in README instructions	Ross Zwisler
	We need to have a unique chardev for each data path, else the chardevs will collide and qemu will die with this message: qemu-system-x86_64: -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0, id=channel1,name=trace-path-cpu0: Property 'virtserialport.chardev' can't take value 'charchannel0': Device 'charchannel0' is in use Signed-off-by: Ross Zwisler <zwisler@google.com> Message-Id: <20230215223350.2658616-7-zwisler@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-04	vsock/test: update expected return values	Arseniy Krasnov
	This updates expected return values for invalid buffer test. Now such values are returned from transport, not from af_vsock.c. Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-01	bpf: Remove now-defunct task kfuncs	David Vernet
	In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the 'refcount_t rcu_users' field was extracted out of a union with the 'struct rcu_head rcu' field. This allows us to safely perform a refcount_inc_not_zero() on task->rcu_users when acquiring a reference on a task struct. A prior patch leveraged this by making struct task_struct an RCU-protected object in the verifier, and by bpf_task_acquire() to use the task->rcu_users field for synchronization. Now that we can use RCU to protect tasks, we no longer need bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get() is truly completely unnecessary, as we can just use RCU to get the object. bpf_task_acquire_not_zero() is now equivalent to bpf_task_acquire(). In addition to these changes, this patch also updates the associated selftests to no longer use these kfuncs. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01	bpf: Make struct task_struct an RCU-safe type	David Vernet
	struct task_struct objects are a bit interesting in terms of how their lifetime is protected by refcounts. task structs have two refcount fields: 1. refcount_t usage: Protects the memory backing the task struct. When this refcount drops to 0, the task is immediately freed, without waiting for an RCU grace period to elapse. This is the field that most callers in the kernel currently use to ensure that a task remains valid while it's being referenced, and is what's currently tracked with bpf_task_acquire() and bpf_task_release(). 2. refcount_t rcu_users: A refcount field which, when it drops to 0, schedules an RCU callback that drops a reference held on the 'usage' field above (which is acquired when the task is first created). This field therefore provides a form of RCU protection on the task by ensuring that at least one 'usage' refcount will be held until an RCU grace period has elapsed. The qualifier "a form of" is important here, as a task can remain valid after task->rcu_users has dropped to 0 and the subsequent RCU gp has elapsed. In terms of BPF, we want to use task->rcu_users to protect tasks that function as referenced kptrs, and to allow tasks stored as referenced kptrs in maps to be accessed with RCU protection. Let's first determine whether we can safely use task->rcu_users to protect tasks stored in maps. All of the bpf_task* kfuncs can only be called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS, program types. For tracepoint and struct_ops programs, the struct task_struct passed to a program handler will always be trusted, so it will always be safe to call bpf_task_acquire() with any task passed to a program. Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL, as it is possible that the task has exited by the time the program is invoked, even if the pointer is still currently valid because the main kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks should never be passed as an argument to the any program handlers, so it should not be relevant. The second question is whether it's safe to use RCU to access a task that was acquired with bpf_task_acquire(), and stored in a map. Because bpf_task_acquire() now uses task->rcu_users, it follows that if the task is present in the map, that it must have had at least one task->rcu_users refcount by the time the current RCU cs was started. Therefore, it's safe to access that task until the end of the current RCU cs. With all that said, this patch makes struct task_struct is an RCU-protected object. In doing so, we also change bpf_task_acquire() to be KF_ACQUIRE \| KF_RCU \| KF_RET_NULL, and adjust any selftests as necessary. A subsequent patch will remove bpf_task_kptr_get(), and bpf_task_acquire_not_zero() respectively. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01	veristat: small fixed found in -O2 mode	Andrii Nakryiko
	Fix few potentially unitialized variables uses, found while building veristat.c in release (-O2) mode. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230331222405.3468634-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01	veristat: avoid using kernel-internal headers	Andrii Nakryiko
	Drop linux/compiler.h include, which seems to be needed for ARRAY_SIZE macro only. Redefine own version of ARRAY_SIZE instead. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230331222405.3468634-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01	veristat: improve version reporting	Andrii Nakryiko
	For packaging version of the tool is important, so add a simple way to specify veristat version for upstream mirror at Github. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230331222405.3468634-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-01	veristat: relicense veristat.c as dual GPL-2.0-only or BSD-2-Clause licensed	Andrii Nakryiko
	Dual-license veristat.c to dual GPL-2.0-only or BSD-2-Clause license. This is needed to mirror it to Github to make it convenient for distro packagers to package veristat as a separate package. Veristat grew into a useful tool by itself, and there are already a bunch of users relying on veristat as generic BPF loading and verification helper tool. So making it easy to packagers by providing Github mirror just like we do for bpftool and libbpf is the next step to get veristat into the hands of users. Apart from few typo fixes, I'm the sole contributor to veristat.c so far, so no extra Acks should be needed for relicensing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230331222405.3468634-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-31	selftests/bpf: Fix conflicts with built-in functions in ↵	James Hilliard
	bench_local_storage_create The fork function in gcc is considered a built in function due to being used by libgcov when building with gnu extensions. Rename fork to sched_process_fork to prevent this conflict. See details: https://github.com/gcc-mirror/gcc/commit/d1c38823924506d389ca58d02926ace21bdf82fa https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82457 Fixes the following error: In file included from progs/bench_local_storage_create.c:6: progs/bench_local_storage_create.c:43:14: error: conflicting types for built-in function 'fork'; expected 'int(void)' [-Werror=builtin-declaration-mismatch] 43 \| int BPF_PROG(fork, struct task_struct parent, struct task_struct child) \| ^~~~ Fixes: cbe9d93d58b1 ("selftests/bpf: Add bench for task storage creation") Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230331075848.1642814-1-james.hilliard1@gmail.com
2023-03-31	selftests/bpf: Replace extract_build_id with read_build_id	Jiri Olsa
	Replacing extract_build_id with read_build_id that parses out build id directly from elf without using readelf tool. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230331093157.1749137-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-31	selftests/bpf: Add read_build_id function	Jiri Olsa
	Adding read_build_id function that parses out build id from specified binary. It will replace extract_build_id and also be used in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230331093157.1749137-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-31	selftests/bpf: Add err.h header	Jiri Olsa
	Moving error macros from profiler.inc.h to new err.h header. It will be used in following changes. Also adding PTR_ERR macro that will be used in following changes. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230331093157.1749137-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>