linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-03-18	bpftool: Add `gen object` command to perform BPF static linking	Andrii Nakryiko
	Add `bpftool gen object <output-file> <input_file>...` command to statically link multiple BPF ELF object files into a single output BPF ELF object file. This patch also updates bash completions and man page. Man page gets a short section on `gen object` command, but also updates the skeleton example to show off workflow for BPF application with two .bpf.c files, compiled individually with Clang, then resulting object files are linked together with `gen object`, and then final object file is used to generate usable BPF skeleton. This should help new users understand realistic workflow w.r.t. compiling mutli-file BPF application. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20210318194036.3521577-10-andrii@kernel.org
2021-03-18	bpftool: Add ability to specify custom skeleton object name	Andrii Nakryiko
	Add optional name OBJECT_NAME parameter to `gen skeleton` command to override default object name, normally derived from input file name. This allows much more flexibility during build time. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-9-andrii@kernel.org
2021-03-18	libbpf: Add BPF static linker BTF and BTF.ext support	Andrii Nakryiko
	Add .BTF and .BTF.ext static linking logic. When multiple BPF object files are linked together, their respective .BTF and .BTF.ext sections are merged together. BTF types are not just concatenated, but also deduplicated. .BTF.ext data is grouped by type (func info, line info, core_relos) and target section names, and then all the records are concatenated together, preserving their relative order. All the BTF type ID references and string offsets are updated as necessary, to take into account possibly deduplicated strings and types. BTF DATASEC types are handled specially. Their respective var_secinfos are accumulated separately in special per-section data and then final DATASEC types are emitted at the very end during bpf_linker__finalize() operation, just before emitting final ELF output file. BTF data can also provide "section annotations" for some extern variables. Such concept is missing in ELF, but BTF will have DATASEC types for such special extern datasections (e.g., .kconfig, .ksyms). Such sections are called "ephemeral" internally. Internally linker will keep metadata for each such section, collecting variables information, but those sections won't be emitted into the final ELF file. Also, given LLVM/Clang during compilation emits BTF DATASECS that are incomplete, missing section size and variable offsets for static variables, BPF static linker will initially fix up such DATASECs, using ELF symbols data. The final DATASECs will preserve section sizes and all variable offsets. This is handled correctly by libbpf already, so won't cause any new issues. On the other hand, it's actually a nice property to have a complete BTF data without runtime adjustments done during bpf_object__open() by libbpf. In that sense, BPF static linker is also a BTF normalizer. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-8-andrii@kernel.org
2021-03-18	libbpf: Add BPF static linker APIs	Andrii Nakryiko
	Introduce BPF static linker APIs to libbpf. BPF static linker allows to perform static linking of multiple BPF object files into a single combined resulting object file, preserving all the BPF programs, maps, global variables, etc. Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are concatenated together. Similarly, code sections are also concatenated. All the symbols and ELF relocations are also concatenated in their respective ELF sections and are adjusted accordingly to the new object file layout. Static variables and functions are handled correctly as well, adjusting BPF instructions offsets to reflect new variable/function offset within the combined ELF section. Such relocations are referencing STT_SECTION symbols and that stays intact. Data sections in different files can have different alignment requirements, so that is taken care of as well, adjusting sizes and offsets as necessary to satisfy both old and new alignment requirements. DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG section, which is ignored by libbpf in bpf_object__open() anyways. So, in a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty nice property, especially if resulting .o file is then used to generate BPF skeleton. Original string sections are ignored and instead we construct our own set of unique strings using libbpf-internal `struct strset` API. To reduce the size of the patch, all the .BTF and .BTF.ext processing was moved into a separate patch. The high-level API consists of just 4 functions: - bpf_linker__new() creates an instance of BPF static linker. It accepts output filename and (currently empty) options struct; - bpf_linker__add_file() takes input filename and appends it to the already processed ELF data; it can be called multiple times, one for each BPF ELF object file that needs to be linked in; - bpf_linker__finalize() needs to be called to dump final ELF contents into the output file, specified when bpf_linker was created; after bpf_linker__finalize() is called, no more bpf_linker__add_file() and bpf_linker__finalize() calls are allowed, they will return error; - regardless of whether bpf_linker__finalize() was called or not, bpf_linker__free() will free up all the used resources. Currently, BPF static linker doesn't resolve cross-object file references (extern variables and/or functions). This will be added in the follow up patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
2021-03-18	libbpf: Add generic BTF type shallow copy API	Andrii Nakryiko
	Add btf__add_type() API that performs shallow copy of a given BTF type from the source BTF into the destination BTF. All the information and type IDs are preserved, but all the strings encountered are added into the destination BTF and corresponding offsets are rewritten. BTF type IDs are assumed to be correct or such that will be (somehow) modified afterwards. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-6-andrii@kernel.org
2021-03-18	libbpf: Extract internal set-of-strings datastructure APIs	Andrii Nakryiko
	Extract BTF logic for maintaining a set of strings data structure, used for BTF strings section construction in writable mode, into separate re-usable API. This data structure is going to be used by bpf_linker to maintains ELF STRTAB section, which has the same layout as BTF strings section. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-5-andrii@kernel.org
2021-03-18	libbpf: Rename internal memory-management helpers	Andrii Nakryiko
	Rename btf_add_mem() and btf_ensure_mem() helpers that abstract away details of dynamically resizable memory to use libbpf_ prefix, as they are not BTF-specific. No functional changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-4-andrii@kernel.org
2021-03-18	libbpf: Generalize BTF and BTF.ext type ID and strings iteration	Andrii Nakryiko
	Extract and generalize the logic to iterate BTF type ID and string offset fields within BTF types and .BTF.ext data. Expose this internally in libbpf for re-use by bpf_linker. Additionally, complete strings deduplication handling for BTF.ext (e.g., CO-RE access strings), which was previously missing. There previously was no case of deduplicating .BTF.ext data, but bpf_linker is going to use it. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-3-andrii@kernel.org
2021-03-18	libbpf: Expose btf_type_by_id() internally	Andrii Nakryiko
	btf_type_by_id() is internal-only convenience API returning non-const pointer to struct btf_type. Expose it outside of btf.c for re-use. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210318194036.3521577-2-andrii@kernel.org
2021-03-18	selftests: kvm: add set_boot_cpu_id test	Emanuele Giuseppe Esposito
	Test for the KVM_SET_BOOT_CPU_ID ioctl. Check that it correctly allows to change the BSP vcpu. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210318151624.490861-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18	selftests: kvm: add _vm_ioctl	Emanuele Giuseppe Esposito
	As in kvm_ioctl and _kvm_ioctl, add the respective _vm_ioctl for vm_ioctl. _vm_ioctl invokes an ioctl using the vm fd, leaving the caller to test the result. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210318151624.490861-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18	selftests: kvm: add get_msr_index_features	Emanuele Giuseppe Esposito
	Test the KVM_GET_MSR_FEATURE_INDEX_LIST and KVM_GET_MSR_INDEX_LIST ioctls. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210318145629.486450-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-18	selftest/bpf: Add a test to check trampoline freeing logic.	Alexei Starovoitov
	Add a selftest for commit e21aa341785c ("bpf: Fix fexit trampoline.") to make sure that attaching fexit prog to a sleeping kernel function will trigger appropriate trampoline and program destruction. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210318004523.55908-1-alexei.starovoitov@gmail.com
2021-03-18	selftests: kvm: Add basic Hyper-V clocksources tests	Vitaly Kuznetsov
	Introduce a new selftest for Hyper-V clocksources (MSR-based reference TSC and TSC page). As a starting point, test the following: 1) Reference TSC is 1Ghz clock. 2) Reference TSC and TSC page give the same reading. 3) TSC page gets updated upon KVM_SET_CLOCK call. 4) TSC page does not get updated when guest opted for reenlightenment. 5) Disabled TSC page doesn't get updated. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210318140949.1065740-1-vkuznets@redhat.com> [Add a host-side test using TSC + KVM_GET_MSR too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-17	selftests/bpf: drop custom NULL #define in skb_pkt_end selftest	Andrii Nakryiko
	Now that bpftool generates NULL definition as part of vmlinux.h, drop custom NULL definition in skb_pkt_end.c. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20210317200510.1354627-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17	libbpf: provide NULL and KERNEL_VERSION macros in bpf_helpers.h	Andrii Nakryiko
	Given that vmlinux.h is not compatible with headers like stddef.h, NULL poses an annoying problem: it is defined as #define, so is not captured in BTF, so is not emitted into vmlinux.h. This leads to users either sticking to explicit 0, or defining their own NULL (as progs/skb_pkt_end.c does). But it's easy for bpf_helpers.h to provide (conditionally) NULL definition. Similarly, KERNEL_VERSION is another commonly missed macro that came up multiple times. So this patch adds both of them, along with offsetof(), that also is typically defined in stddef.h, just like NULL. This might cause compilation warning for existing BPF applications defining their own NULL and/or KERNEL_VERSION already: progs/skb_pkt_end.c:7:9: warning: 'NULL' macro redefined [-Wmacro-redefined] #define NULL 0 ^ /tmp/linux/tools/testing/selftests/bpf/tools/include/vmlinux.h:4:9: note: previous definition is here #define NULL ((void *)0) ^ It is trivial to fix, though, so long-term benefits outweight temporary inconveniences. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20210317200510.1354627-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller
	Daniel Borkmann says: ==================== pull-request: bpf 2021-03-18 The following pull-request contains BPF updates for your net tree. We've added 10 non-merge commits during the last 4 day(s) which contain a total of 14 files changed, 336 insertions(+), 94 deletions(-). The main changes are: 1) Fix fexit/fmod_ret trampoline for sleepable programs, and also fix a ftrace splat in modify_ftrace_direct() on address change, from Alexei Starovoitov. 2) Fix two oob speculation possibilities that allows unprivileged to leak mem via side-channel, from Piotr Krysiuk and Daniel Borkmann. 3) Fix libbpf's netlink handling wrt SOCK_CLOEXEC, from Kumar Kartikeya Dwivedi. 4) Fix libbpf's error handling on failure in getting section names, from Namhyung Kim. 5) Fix tunnel collect_md BPF selftest wrt Geneve option handling, from Hangbin Liu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-18	libbpf: Use SOCK_CLOEXEC when opening the netlink socket	Kumar Kartikeya Dwivedi
	Otherwise, there exists a small window between the opening and closing of the socket fd where it may leak into processes launched by some other thread. Fixes: 949abbe88436 ("libbpf: add function to setup XDP") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210317115857.6536-1-memxor@gmail.com
2021-03-18	libbpf: Fix error path in bpf_object__elf_init()	Namhyung Kim
	When it failed to get section names, it should call into bpf_object__elf_finish() like others. Fixes: 88a82120282b ("libbpf: Factor out common ELF operations and improve logging") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210317145414.884817-1-namhyung@kernel.org
2021-03-17	bpf, selftests: Fix up some test_verifier cases for unprivileged	Piotr Krysiuk
	Fix up test_verifier error messages for the case where the original error message changed, or for the case where pointer alu errors differ between privileged and unprivileged tests. Also, add alternative tests for keeping coverage of the original verifier rejection error message (fp alu), and newly reject map_ptr += rX where rX == 0 given we now forbid alu on these types for unprivileged. All test_verifier cases pass after the change. The test case fixups were kept separate to ease backporting of core changes. Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17	selftests: mlxsw: spectrum-2: Remove q_in_vni_veto test	Amit Cohen
	q_in_vni_veto.sh is not needed anymore because VxLAN with an 802.1ad bridge and VxLAN with an 802.1d bridge can coexist. Remove the test. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17	selftests: forwarding: Add test for dual VxLAN bridge	Amit Cohen
	Configure VxLAN with an 802.1ad bridge and VxLAN with an 802.1d bridge at the same time in same switch, verify that traffic passed as expected. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-17	selftests/bpf: Use nanosleep() syscall instead of sleep() in get_cgroup_id	Ravi Bangoria
	Glibc's sleep() switched to clock_nanosleep() from nanosleep(), and thus syscalls:sys_enter_nanosleep tracepoint is not hitting which is causing testcase failure. Instead of depending on glibc sleep(), call nanosleep() systemcall directly. Before: # ./get_cgroup_id_user ... main:FAIL:compare_cgroup_id kern cgid 0 user cgid 483 After: # ./get_cgroup_id_user ... main:PASS:compare_cgroup_id Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210316153048.136447-1-ravi.bangoria@linux.ibm.com
2021-03-16	selftests/bpf: Fix warning comparing pointer to 0	Jiapeng Chong
	Fix the following coccicheck warnings: ./tools/testing/selftests/bpf/progs/fexit_test.c:77:15-16: WARNING comparing pointer to 0. ./tools/testing/selftests/bpf/progs/fexit_test.c:68:12-13: WARNING comparing pointer to 0. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1615881577-3493-1-git-send-email-jiapeng.chong@linux.alibaba.com
2021-03-16	selftests: mlxsw: Test egress sampling limitation on Spectrum-1 only	Ido Schimmel
	Make sure egress sampling configuration only fails on Spectrum-1, given that mlxsw now supports it on Spectrum-{2,3}. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16	selftests: mlxsw: Add tc sample tests for new triggers	Ido Schimmel
	Test that packets are sampled when tc-sample is used with matchall egress binding and flower classifier. Verify that when performing sampling on egress the end-to-end latency is reported as metadata. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16	selftests/net: fix warnings on reuseaddr_ports_exhausted	Carlos Llamas
	Fix multiple warnings seen with gcc 10.2.1: reuseaddr_ports_exhausted.c:32:41: warning: missing braces around initializer [-Wmissing-braces] 32 \| struct reuse_opts unreusable_opts[12] = { \| ^ 33 \| {0, 0, 0, 0}, \| { } { } Fixes: 7f204a7de8b0 ("selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized.") Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-16	selftests/bpf: Build everything in debug mode	Andrii Nakryiko
	Build selftests, bpftool, and libbpf in debug mode with DWARF data to facilitate easier debugging. In terms of impact on building and running selftests. Build is actually faster now: BEFORE: make -j60 380.21s user 37.87s system 1466% cpu 28.503 total AFTER: make -j60 345.47s user 37.37s system 1599% cpu 23.939 total test_progs runtime seems to be the same: BEFORE: real 1m5.139s user 0m1.600s sys 0m43.977s AFTER: real 1m3.799s user 0m1.721s sys 0m42.420s Huge difference is being able to debug issues throughout test_progs, bpftool, and libbpf without constantly updating 3 Makefiles by hand (including GDB seeing the source code without any extra incantations). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210313210920.1959628-5-andrii@kernel.org
2021-03-16	selftests/bpf: Fix maybe-uninitialized warning in xdpxceiver test	Andrii Nakryiko
	xsk_ring_prod__reserve() doesn't necessarily set idx in some conditions, so from static analysis point of view compiler is right about the problems like: In file included from xdpxceiver.c:92: xdpxceiver.c: In function ‘xsk_populate_fill_ring’: /data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/xsk.h:119:20: warning: ‘idx’ may be used uninitialized in this function [-Wmaybe-uninitialized] return &addrs[idx & fill->mask]; ~~~~^~~~~~~~~~~~ xdpxceiver.c:300:6: note: ‘idx’ was declared here u32 idx; ^~~ xdpxceiver.c: In function ‘tx_only’: xdpxceiver.c:596:30: warning: ‘idx’ may be used uninitialized in this function [-Wmaybe-uninitialized] struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx + i); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix two warnings reported by compiler by pre-initializing variable. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210313210920.1959628-4-andrii@kernel.org
2021-03-16	bpftool: Fix maybe-uninitialized warnings	Andrii Nakryiko
	Somehow when bpftool is compiled in -Og mode, compiler produces new warnings about possibly uninitialized variables. Fix all the reported problems. Fixes: 2119f2189df1 ("bpftool: add C output format option to btf dump subcommand") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210313210920.1959628-3-andrii@kernel.org
2021-03-16	libbpf: Add explicit padding to bpf_xdp_set_link_opts	Andrii Nakryiko
	Adding such anonymous padding fixes the issue with uninitialized portions of bpf_xdp_set_link_opts when using LIBBPF_DECLARE_OPTS macro with inline field initialization: DECLARE_LIBBPF_OPTS(bpf_xdp_set_link_opts, opts, .old_fd = -1); When such code is compiled in debug mode, compiler is generating code that leaves padding bytes uninitialized, which triggers error inside libbpf APIs that do strict zero initialization checks for OPTS structs. Adding anonymous padding field fixes the issue. Fixes: bd5ca3ef93cd ("libbpf: Add function to set link XDP fd while specifying old program") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210313210920.1959628-2-andrii@kernel.org
2021-03-15	bpf: selftests: Remove unused 'nospace_err' in tests for batched ops in ↵	Pedro Tammela
	array maps This seems to be a reminiscent from the hashmap tests. Signed-off-by: Pedro Tammela <pctammela@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210315132954.603108-1-pctammela@gmail.com
2021-03-15	libbpf: Avoid inline hint definition from 'linux/stddef.h'	Pedro Tammela
	Linux headers might pull 'linux/stddef.h' which defines '__always_inline' as the following: #ifndef __always_inline #define __always_inline inline #endif This becomes an issue if the program picks up the 'linux/stddef.h' definition as the macro now just hints inline to clang. This change now enforces the proper definition for BPF programs regardless of the include order. Signed-off-by: Pedro Tammela <pctammela@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210314173839.457768-1-pctammela@gmail.com
2021-03-15	selftests/bpf: Set gopt opt_class to 0 if get tunnel opt failed	Hangbin Liu
	When fixing the bpf test_tunnel.sh geneve failure. I only fixed the IPv4 part but forgot the IPv6 issue. Similar with the IPv4 fixes 557c223b643a ("selftests/bpf: No need to drop the packet when there is no geneve opt"), when there is no tunnel option and bpf_skb_get_tunnel_opt() returns error, there is no need to drop the packets and break all geneve rx traffic. Just set opt_class to 0 and keep returning TC_ACT_OK at the end. Fixes: 557c223b643a ("selftests/bpf: No need to drop the packet when there is no geneve opt") Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: William Tu <u9012063@gmail.com> Link: https://lore.kernel.org/bpf/20210309032214.2112438-1-liuhangbin@gmail.com
2021-03-15	bpf: Add getter and setter for SO_REUSEPORT through bpf_{g,s}etsockopt	Manu Bretelle
	Augment the current set of options that are accessible via bpf_{g,s}etsockopt to also support SO_REUSEPORT. Signed-off-by: Manu Bretelle <chantra@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210310182305.1910312-1-chantra@fb.com
2021-03-14	selftests: mlxsw: Add tc sample tests	Ido Schimmel
	Test that packets are sampled when tc-sample is used and that reported metadata is correct. Two sets of hosts (with and without LAG) are used, since metadata extraction in mlxsw is a bit different when LAG is involved. # ./tc_sample.sh TEST: tc sample rate (forward) [ OK ] TEST: tc sample rate (local receive) [ OK ] TEST: tc sample maximum rate [ OK ] TEST: tc sample group conflict test [ OK ] TEST: tc sample iif [ OK ] TEST: tc sample lag iif [ OK ] TEST: tc sample oif [ OK ] TEST: tc sample lag oif [ OK ] TEST: tc sample out-tc [ OK ] TEST: tc sample out-tc-occ [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14	selftests: netdevsim: Test psample functionality	Ido Schimmel
	Test various aspects of psample functionality over netdevsim and in particular test that the psample module correctly reports the provided metadata. Example: # ./psample.sh TEST: psample enable / disable [ OK ] TEST: psample group number [ OK ] TEST: psample metadata [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14	Merge tag 'objtool-urgent-2021-03-14' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool fix from Thomas Gleixner: "A single objtool fix to handle the PUSHF/POPF validation correctly for the paravirt changes which modified arch_local_irq_restore not to use popf" * tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool,x86: Fix uaccess PUSHF/POPF validation
2021-03-12	selftests: mptcp: Restore packet capture option in join tests	Mat Martineau
	The join self tests previously used the '-c' command line option to enable creation of pcap files for the tests that run, but the change to allow running a subset of the join tests made overlapping use of that option. Restore the capture functionality with '-c' and move the syncookie test option to '-k'. Fixes: 1002b89f23ea ("selftests: mptcp: add command line arguments for mptcp_join.sh") Acked-and-tested-by: Geliang Tang <geliangtang@gmail.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: mptcp: add testcases for removing addrs	Geliang Tang
	This patch added the testcases for removing a list of addresses. Used the netlink to flush the addresses in the testcases. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: mptcp: set addr id for removing testcases	Geliang Tang
	The removing testcases can only delete the addresses from id 1, this patch added the support for deleting the addresses from any id that user set. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: mptcp: add invert argument for chk_rm_nr	Geliang Tang
	Some of the removing testcases used two zeros as arguments for chk_rm_nr like this: chk_rm_nr 0 0. This doesn't mean that no RM_ADDR has been sent. It only means that RM_ADDR had been sent in the opposite direction that chk_rm_nr is checking. This patch added a new argument invert for chk_rm_nr to allow it can check the RM_ADDR from the opposite direction. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: netdevsim: Add test for resilient nexthop groups offload API	Ido Schimmel
	Test various aspects of the resilient nexthop group offload API on top of the netdevsim implementation. Both good and bad flows are tested. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: forwarding: Add resilient multipath tunneling nexthop test	Ido Schimmel
	Add a resilient nexthop objects version of gre_multipath_nh.sh. Test that both IPv4 and IPv6 overlays work with resilient nexthop groups where the nexthops are two GRE tunnels. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: forwarding: Add resilient hashing test	Ido Schimmel
	Verify that IPv4 and IPv6 multipath forwarding works correctly with resilient nexthop groups and with different weights. Test that when the idle timer is not zero, the resilient groups are not rebalanced - because the nexthop buckets are considered active - and the initial weights (1:1) are used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: fib_nexthops: Test resilient nexthop groups	Ido Schimmel
	Add test cases for resilient nexthop groups. Exhaustive forwarding tests are added separately under net/forwarding/. Examples: # ./fib_nexthops.sh -t basic_res Basic resilient nexthop group functional tests ---------------------------------------------- TEST: Add a nexthop group with default parameters [ OK ] TEST: Get a nexthop group with default parameters [ OK ] TEST: Get a nexthop group with non-default parameters [ OK ] TEST: Add a nexthop group with 0 buckets [ OK ] TEST: Replace nexthop group parameters [ OK ] TEST: Get a nexthop group after replacing parameters [ OK ] TEST: Replace idle timer [ OK ] TEST: Get a nexthop group after replacing idle timer [ OK ] TEST: Replace unbalanced timer [ OK ] TEST: Get a nexthop group after replacing unbalanced timer [ OK ] TEST: Replace with no parameters [ OK ] TEST: Get a nexthop group after replacing no parameters [ OK ] TEST: Replace nexthop group type - implicit [ OK ] TEST: Replace nexthop group type - explicit [ OK ] TEST: Replace number of nexthop buckets [ OK ] TEST: Get a nexthop group after replacing with invalid parameters [ OK ] TEST: Dump all nexthop buckets [ OK ] TEST: Dump all nexthop buckets in a group [ OK ] TEST: Dump all nexthop buckets with a specific nexthop device [ OK ] TEST: Dump all nexthop buckets with a specific nexthop identifier [ OK ] TEST: Dump all nexthop buckets in a non-existent group [ OK ] TEST: Dump all nexthop buckets in a non-resilient group [ OK ] TEST: Dump all nexthop buckets using a non-existent device [ OK ] TEST: Dump all nexthop buckets with invalid 'groups' keyword [ OK ] TEST: Dump all nexthop buckets with invalid 'fdb' keyword [ OK ] TEST: Get a valid nexthop bucket [ OK ] TEST: Get a nexthop bucket with valid group, but invalid index [ OK ] TEST: Get a nexthop bucket from a non-resilient group [ OK ] TEST: Get a nexthop bucket from a non-existent group [ OK ] Tests passed: 29 Tests failed: 0 # ./fib_nexthops.sh -t ipv4_large_res_grp IPv4 large resilient group (128k buckets) ----------------------------------------- TEST: Dump large (x131072) nexthop buckets [ OK ] Tests passed: 1 Tests failed: 0 # ./fib_nexthops.sh -t ipv6_large_res_grp IPv6 large resilient group (128k buckets) ----------------------------------------- TEST: Dump large (x131072) nexthop buckets [ OK ] Tests passed: 1 Tests failed: 0 # ./fib_nexthops.sh -t ipv4_res_torture IPv4 runtime resilient nexthop group torture -------------------------------------------- TEST: IPv4 resilient nexthop group torture test [ OK ] Tests passed: 1 Tests failed: 0 # ./fib_nexthops.sh -t ipv6_res_torture IPv6 runtime resilient nexthop group torture -------------------------------------------- TEST: IPv6 resilient nexthop group torture test [ OK ] Tests passed: 1 Tests failed: 0 # ./fib_nexthops.sh -t ipv4_res_grp_fcnal IPv4 resilient groups functional -------------------------------- TEST: Nexthop group updated when entry is deleted [ OK ] TEST: Nexthop buckets updated when entry is deleted [ OK ] TEST: Nexthop group updated after replace [ OK ] TEST: Nexthop buckets updated after replace [ OK ] TEST: Nexthop group updated when entry is deleted - nECMP [ OK ] TEST: Nexthop buckets updated when entry is deleted - nECMP [ OK ] TEST: Nexthop group updated after replace - nECMP [ OK ] TEST: Nexthop buckets updated after replace - nECMP [ OK ] Tests passed: 8 Tests failed: 0 # ./fib_nexthops.sh -t ipv6_res_grp_fcnal IPv6 resilient groups functional -------------------------------- TEST: Nexthop group updated when entry is deleted [ OK ] TEST: Nexthop buckets updated when entry is deleted [ OK ] TEST: Nexthop group updated after replace [ OK ] TEST: Nexthop buckets updated after replace [ OK ] TEST: Nexthop group updated when entry is deleted - nECMP [ OK ] TEST: Nexthop buckets updated when entry is deleted - nECMP [ OK ] TEST: Nexthop group updated after replace - nECMP [ OK ] TEST: Nexthop buckets updated after replace - nECMP [ OK ] Tests passed: 8 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: fib_nexthops: List each test case in a different line	Ido Schimmel
	The lines with the IPv4 and IPv6 test cases are already very long and more test cases will be added in subsequent patches. List each test case in a different line to make it easier to extend the test with more test cases. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	selftests: fib_nexthops: Declutter test output	Ido Schimmel
	Before: # ./fib_nexthops.sh -t ipv4_torture IPv4 runtime torture -------------------- TEST: IPv4 torture test [ OK ] ./fib_nexthops.sh: line 213: 19376 Killed ipv4_del_add_loop1 ./fib_nexthops.sh: line 213: 19377 Killed ipv4_grp_replace_loop ./fib_nexthops.sh: line 213: 19378 Killed ip netns exec me ping -f 172.16.101.1 > /dev/null 2>&1 ./fib_nexthops.sh: line 213: 19380 Killed ip netns exec me ping -f 172.16.101.2 > /dev/null 2>&1 ./fib_nexthops.sh: line 213: 19381 Killed ip netns exec me mausezahn veth1 -B 172.16.101.2 -A 172.16.1.1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1 Tests passed: 1 Tests failed: 0 # ./fib_nexthops.sh -t ipv6_torture IPv6 runtime torture -------------------- TEST: IPv6 torture test [ OK ] ./fib_nexthops.sh: line 213: 24453 Killed ipv6_del_add_loop1 ./fib_nexthops.sh: line 213: 24454 Killed ipv6_grp_replace_loop ./fib_nexthops.sh: line 213: 24456 Killed ip netns exec me ping -f 2001:db8:101::1 > /dev/null 2>&1 ./fib_nexthops.sh: line 213: 24457 Killed ip netns exec me ping -f 2001:db8:101::2 > /dev/null 2>&1 ./fib_nexthops.sh: line 213: 24458 Killed ip netns exec me mausezahn -6 veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn" > /dev/null 2>&1 Tests passed: 1 Tests failed: 0 After: # ./fib_nexthops.sh -t ipv4_torture IPv4 runtime torture -------------------- TEST: IPv4 torture test [ OK ] Tests passed: 1 Tests failed: 0 # ./fib_nexthops.sh -t ipv6_torture IPv6 runtime torture -------------------- TEST: IPv6 torture test [ OK ] Tests passed: 1 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "We've got a smattering of changes all over the place which we've acrued since -rc1. To my knowledge, there aren't any pending issues at the moment, but there's still plenty of time for something else to crop up... Summary: - Fix booting a 52-bit-VA-aware kernel on Qualcomm Amberwing - Fix pfn_valid() not to reject all ZONE_DEVICE memory - Fix memory tagging setup for hotplugged memory regions - Fix KASAN tagging in page_alloc() when DEBUG_VIRTUAL is enabled - Fix accidental truncation of CPU PMU event counters - Fix error code initialisation when failing probe of DMC620 PMU - Fix return value initialisation for sve-ptrace selftest - Drop broken support for CMDLINE_EXTEND" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: perf/arm_dmc620_pmu: Fix error return code in dmc620_pmu_device_probe() arm64: mm: remove unused __cpu_uses_extended_idmap[_level()] arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds arm64: perf: Fix 64-bit event counter read truncation arm64/mm: Fix __enable_mmu() for new TGRAN range values kselftest: arm64: Fix exit code of sve-ptrace arm64: mte: Map hotplugged memory as Normal Tagged arm64: kasan: fix page_alloc tagging with DEBUG_VIRTUAL arm64/mm: Reorganize pfn_valid() arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory arm64/mm: Drop THP conditionality from FORCE_MAX_ZONEORDER arm64/mm: Drop redundant ARCH_WANT_HUGE_PMD_SHARE arm64: Drop support for CMDLINE_EXTEND arm64: cpufeatures: Fix handling of CONFIG_CMDLINE for idreg overrides
2021-03-12	objtool,x86: Fix uaccess PUSHF/POPF validation	Peter Zijlstra
	Commit ab234a260b1f ("x86/pv: Rework arch_local_irq_restore() to not use popf") replaced "push %reg; popf" with something like: "test $0x200, %reg; jz 1f; sti; 1:", which breaks the pushf/popf symmetry that commit ea24213d8088 ("objtool: Add UACCESS validation") relies on. The result is: drivers/gpu/drm/amd/amdgpu/si.o: warning: objtool: si_common_hw_init()+0xf36: PUSHF stack exhausted Meanwhile, commit c9c324dc22aa ("objtool: Support stack layout changes in alternatives") makes that we can actually use stack-ops in alternatives, which means we can revert 1ff865e343c2 ("x86,smap: Fix smap_{save,restore}() alternatives"). That in turn means we can limit the PUSHF/POPF handling of ea24213d8088 to those instructions that are in alternatives. Fixes: ab234a260b1f ("x86/pv: Rework arch_local_irq_restore() to not use popf") Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/YEY4rIbQYa5fnnEp@hirez.programming.kicks-ass.net