|
Daniel Borkmann says:
====================
pull-request: bpf 2022-06-17
We've added 12 non-merge commits during the last 4 day(s) which contain
a total of 14 files changed, 305 insertions(+), 107 deletions(-).
The main changes are:
1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki.
2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa.
3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi.
4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu.
5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell.
6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
rethook: Reject getting a rethook if RCU is not watching
fprobe, samples: Add use_trace option and show hit/missed counter
bpf, docs: Update some of the JIT/maintenance entries
selftest/bpf: Fix kprobe_multi bench test
bpf: Force cookies array to follow symbols sorting
ftrace: Keep address offset in ftrace_lookup_symbols
selftests/bpf: Shuffle cookies symbols in kprobe multi test
selftests/bpf: Test tail call counting with bpf2bpf and data on stack
bpf, x86: Fix tail call count offset calculation on bpf2bpf call
bpf: Limit maximum modifier chain length in btf_check_type_tags
bpf: Fix request_sock leak in sk lookup helpers
xsk: Fix generic transmit when completion queue reservation fails
====================
Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
introduced a helper function to fold duplicated validity checks of bind
addresses into inet_addr_valid_or_nonlocal(). However, this caused an
unintended regression in ping_check_bind_addr(), which previously would
reject binding to multicast and broadcast addresses, but now these are
both incorrectly allowed as reported in [1].
This patch restores the original check. A simple reordering is done to
improve readability and make it evident that multicast and broadcast
addresses should not be allowed. Also, add an early exit for INADDR_ANY
which replaces lost behavior added by commit 0ce779a9f501 ("net: Avoid
unnecessary inet_addr_type() call when addr is INADDR_ANY").
Furthermore, this patch introduces regression selftests to catch these
specific cases.
[1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/
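Roughly, the restored flow looks like this (a sketch based on the description
above, not the verbatim diff; helper and variable names are assumptions
following net/ipv4/ping.c):
	if (addr->sin_addr.s_addr == htonl(INADDR_ANY))
		return 0;	/* early exit restored for the wildcard address */

	tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : RT_TABLE_LOCAL;
	chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id);

	/* multicast/broadcast are rejected again, independent of nonlocal binds */
	if (chk_addr_ret == RTN_MULTICAST || chk_addr_ret == RTN_BROADCAST)
		return -EADDRNOTAVAIL;
	if (chk_addr_ret != RTN_LOCAL && !inet_can_nonlocal_bind(net, isk))
		return -EADDRNOTAVAIL;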
Fixes: 8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
Cc: Miaohe Lin <linmiaohe@huawei.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of hard coding the scale target in the test, dynamically set it
based on the maximum number of flow counters and their current
occupancy.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test creates as many RIFs as possible, ideally more than there can be
RIF counters (though that is currently only possible on Spectrum-1). It
then tries to enable L3 HW stats on each of the RIFs. It also contains a
traffic test, which tries to run traffic through a log2 subset of those counters
and checks that the traffic is reflected in the counter values.
As with the tc_flower traffic test, take a log2 subset of rules. The logic
behind picking log2 rules is that every bit of the instantiated item's
number is then exercised. This should catch issues whether they happen at the
high end, the low end, or somewhere in between.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a test that checks that the created filters do actually trigger on
matching traffic.
Exercising all the rules would be a very lengthy process. Instead, take a
log2 subset of rules. The logic behind picking log2 rules is that then
every bit of the instantiated item's number is exercised. This should catch
issues whether they happen at the high end, low end, or somewhere in
between.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The scale tests verify the behavior of mlxsw when the number of instances of
some resource reaches the ASIC capacity. The number of instances is
referred to as the "target" number.
No scale test so far has needed to know this target number in order to clean
up. E.g. the tc_flower test simply removes the clsact qdisc that all the tested
filters are hooked onto, and that takes care of collecting all the filters.
However, for the RIF counter test, which is being added in a future patch,
VLAN netdevices are created. These are created as part of the test, but of
course the cleanup needs to undo them again. For that it needs to know how
many there were. To support this usage, pass the target number to the
cleanup callback.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.
Sometimes the scale test depends on more than one resource. In particular,
a following patch will add a RIF counter scale test, which depends on the
number of RIF counters that can be bound, and also on the number of RIFs
that can be created.
When the test is limited by the auxiliary resource and not by the primary
one, there's no point trying to run the overflow test, because it would be
testing exhaustion of the wrong resource.
To support this use case, skip the test when $test_get_target yields 0.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The scale tests are currently testing two things: that some number of
instances of a given resource can actually be created; and that when an
attempt is made to create more than the supported amount, the failures are
noted and handled gracefully.
However, the ability to allocate a resource does not mean that the
resource actually works when passing traffic. For that, make it possible
for a given scale test to also test traffic.
The traffic test is only run on the positive leg of the scale test (there is
no point trying to pass traffic when the expected outcome is that the resource
will not be allocated). Traffic tests are opt-in; if a given test does not
expose one, it is not run.
To this end, delay the test cleanup until after the traffic test is run.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The scale of each resource is tested in the following manner:
1. The scale target is queried.
2. The test setup is prepared.
3. The test is invoked.
In some cases, the occupancy of a resource changes as part of the second
step, requiring the test to return a scale target that takes this change
into account.
Make this more robust by re-querying the scale target after the second
step.
Another possible solution is to swap the first and second steps, but
when a test needs to be skipped (i.e., scale target is zero), the setup
would have been in vain.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the mlxsw driver, configurations are offloaded only when there is a
physical port enslaved to the virtual device
(e.g., to a bridge). In the 'mirror_gre_bridge_1q_lag' test, the bridge gets an
address and a route before there are any ports in the bridge, which means that
these configurations are not offloaded.
Until now, the test passed with the mlxsw driver even though the RIF of the
bridge was not in the hardware, because the ARP packets were trapped at
layer 2 and also mirrored, so there was no real need for the RIF in hardware.
The previous patch changed the 'ARP_REQUEST' and 'ARP_RESPONSE' traps to
be done at layer 3 instead of layer 2. With this change the ARP packets are
no longer trapped during the test, as the RIF is not in the hardware because of
the order of configurations.
Reorder the configurations so that they are offloaded; the test will then
pass with the changed traps.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
LLVM's lld linker doesn't have universal architecture support (e.g.,
it definitely doesn't work on s390x), so play it safe and force lld for
urandom_read and liburandom_read.so only on x86 architectures.
This should fix s390x CI runs.
Fixes: 3e6fe5ce4d48 ("libbpf: Fix internal USDT address translation logic for shared libraries")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220617045512.1339795-1-andrii@kernel.org
|
|
This commit extends selftests for the new BPF helpers
bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} to also test the TC BPF
functionality added in the previous commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-7-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This commit adds selftests for the new BPF helpers:
bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6}.
xdp_synproxy_kern.c is a BPF program that generates SYN cookies on
allowed TCP ports and sends SYNACKs to clients, accelerating the synproxy
iptables module.
xdp_synproxy.c is a userspace control application that allows configuring
the following options at runtime: the list of allowed ports, MSS,
window scale, and TTL.
A selftest is added to prog_tests that leverages the above programs to
test the functionality of the new helpers.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-5-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The new helpers bpf_tcp_raw_{gen,check}_syncookie_ipv{4,6} allow an XDP
program to generate SYN cookies in response to TCP SYN packets and to
check those cookies upon receiving the first ACK packet (the final
packet of the TCP handshake).
Unlike bpf_tcp_{gen,check}_syncookie, these new helpers don't need a
listening socket on the local machine, which allows them to be used together
with synproxy to accelerate SYN cookie generation.
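As a rough usage sketch (not taken from the patch; packet parsing, bounds
checks and the SYNACK construction are elided, and iph/th/th_len are assumed
to have been validated by the caller):
	/* SYN on a protected port: derive a cookie without any listening socket */
	s64 cookie = bpf_tcp_raw_gen_syncookie_ipv4(iph, th, th_len);
	if (cookie < 0)
		return XDP_DROP;
	/* ...build and send the SYNACK carrying (__u32)cookie... */

	/* first ACK of the handshake: validate the echoed cookie */
	if (bpf_tcp_raw_check_syncookie_ipv4(iph, th))
		return XDP_DROP;	/* invalid cookie */
	return XDP_PASS;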
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-4-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_tcp_gen_syncookie expects the full length of the TCP header (with
all options), and bpf_tcp_check_syncookie accepts lengths bigger than
sizeof(struct tcphdr). Fix the documentation that says these lengths
should be exactly sizeof(struct tcphdr).
While at it, fix a typo in the name of struct ipv6hdr.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220615134847.3753567-2-maximmi@nvidia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With [1] the available_filter_functions file contains records
starting with __ftrace_invalid_address___ that mark disabled
entries.
We need to filter them out so that the bench test passes only
resolvable symbols to the kernel.
[1] commit b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
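The filtering itself is just a prefix check while reading
available_filter_functions; a sketch (variable names are illustrative):
	char buf[256];

	while (fscanf(f, "%255s%*[^\n]\n", buf) == 1) {
		if (!strncmp(buf, "__ftrace_invalid_address___",
			     sizeof("__ftrace_invalid_address___") - 1))
			continue;	/* disabled entry, not resolvable by the kernel */
		/* ...add buf to the symbol list handed to kprobe_multi... */
	}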
Fixes: b39181f7c690 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There's a kernel bug that causes cookies to be misplaced, and
the reason we did not catch it with this test is that we
provide the bpf_fentry_test* functions already sorted by name.
Shuffling the bpf_fentry_test2 function deeper in the list while
keeping the current cookie values as before triggers
the bug.
The kernel fix is coming in the following changes.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add tests that ensure sleepable uprobe programs work correctly.
Add tests that ensure sleepable kprobe programs cannot attach.
Also add tests that attach both sleepable and non-sleepable
uprobe programs to the same location (i.e. same bpf_prog_array).
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Link: https://lore.kernel.org/r/c744e5bb7a5c0703f05444dc41f2522ba3579a48.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add section mappings for u(ret)probe.s programs.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Link: https://lore.kernel.org/r/aedbc3b74f3523f00010a7b0df8f3388cca59f16.1655248076.git.delyank@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Perform the same virtual address to file offset translation that libbpf
is doing for executable ELF binaries also for shared libraries.
Currently libbpf is making a simplifying and sometimes wrong assumption
that for shared libraries relative virtual addresses inside ELF are
always equal to file offsets.
Unfortunately, this is not always the case with LLVM's lld linker, which
now by default generates a considerably more complicated ELF segment layout.
E.g., for liburandom_read.so from selftests/bpf, here's an excerpt from
readelf output listing ELF segments (a.k.a. program headers):
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x0001f8 0x0001f8 R 0x8
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x0005e4 0x0005e4 R 0x1000
LOAD 0x0005f0 0x00000000000015f0 0x00000000000015f0 0x000160 0x000160 R E 0x1000
LOAD 0x000750 0x0000000000002750 0x0000000000002750 0x000210 0x000210 RW 0x1000
LOAD 0x000960 0x0000000000003960 0x0000000000003960 0x000028 0x000029 RW 0x1000
Compare that to what is generated by GNU ld (or LLVM lld's with extra
-znoseparate-code argument which disables this cleverness in the name of
file size reduction):
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000550 0x000550 R 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x000131 0x000131 R E 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x0000ac 0x0000ac R 0x1000
LOAD 0x002dc0 0x0000000000003dc0 0x0000000000003dc0 0x000262 0x000268 RW 0x1000
You can see from the first example above that for the executable (Flg == "R E")
PT_LOAD segment (LOAD #2), the Offset and VirtAddr columns don't match,
while they do match in the second case (GNU ld output).
This is important because all the addresses, including USDT specs,
operate in a virtual address space, while the kernel expects file
offsets when performing uprobe attach. So such mismatches have to be
properly taken care of and compensated by libbpf, which is what this
patch is fixing.
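The translation itself amounts to finding the PT_LOAD segment that covers the
virtual address and compensating by that segment's vaddr/offset delta; a
sketch (not libbpf's exact code):
	/* phdr is the PT_LOAD segment with p_vaddr <= addr < p_vaddr + p_memsz */
	if (addr >= phdr->p_vaddr && addr < phdr->p_vaddr + phdr->p_memsz)
		/* file offset is what the kernel expects for uprobe attach */
		file_offset = addr - phdr->p_vaddr + phdr->p_offset;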
The patch also clarifies a few function and variable names and updates
comments to reflect this important distinction (virtual address vs file offset)
and to emphasize that shared libraries are not all that different from
executables in this regard.
This patch also changes the selftests/bpf Makefile to force urandom_read and
liburandom_read.so to be built with Clang and LLVM's lld (and explicitly
request this ELF file size optimization through -znoseparate-code linker
parameter) to validate libbpf logic and ensure regressions don't happen
in the future. I've bundled these selftests changes together with libbpf
changes to keep the above description tied with both libbpf and
selftests changes.
Fixes: 74cc6311cec9 ("libbpf: Add USDT notes parsing and resolution logic")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616055543.3285835-1-andrii@kernel.org
|
|
Commit 17de1e559cf1 ("selftests: clarify common error when running
gup_test") had most of its hunks dropped due to a conflict with another
patch accepted into Linux around the same time that implemented the same
behavior as a subset of other changes.
However, the remaining hunk defines the GUP_TEST_FILE macro without
making use of it. This patch makes use of the macro in the two relevant
places.
Furthermore, the above mentioned commit's log message erroneously describes
the changes that were dropped from the patch.
This patch corrects the record.
Fixes: 17de1e559cf1 ("selftests: clarify common error when running gup_test")
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Nico Pache <npache@redhat.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
When returning on an error path, the file handle needs to be closed
to prevent a resource leak.
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
When building selftests/dma:
$ make -C tools/testing/selftests TARGETS=dma
I hit the following compilation error:
dma_map_benchmark.c:13:10: fatal error: linux/map_benchmark.h: No such file or directory
#include <linux/map_benchmark.h>
^~~~~~~~~~~~~~~~~~~~~~~
dma/Makefile does not include the path to map_benchmark.h, so add the
missing include path and fix the include order in dma_map_benchmark.c.
Fixes: 8ddde07a3d28 ("dma-mapping: benchmark: extract a common header file for map_benchmark definition")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Cover the case where the tail call count needs to be passed from one BPF
function to another and the caller has data on the stack, specifically when
the size of the data allocated on the BPF stack is not a multiple of 8.
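A sketch of the shape being covered (assuming a BPF_MAP_TYPE_PROG_ARRAY named
jmp_table declared elsewhere; illustrative, not the verbatim selftest):
	__noinline
	static int subprog_tail(struct __sk_buff *skb)
	{
		bpf_tail_call_static(skb, &jmp_table, 0);
		return 0;
	}

	SEC("tc")
	int entry(struct __sk_buff *skb)
	{
		/* 5 bytes on the caller's stack: not a multiple of 8 */
		volatile char stack_data[5] = "abcd";

		return subprog_tail(skb) + stack_data[0];
	}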
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-3-jakub@cloudflare.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls"
* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
net: axienet: add missing error return code in axienet_probe()
Revert "net: Add a second bind table hashed by port and address"
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
net: usb: ax88179_178a needs FLAG_SEND_ZLP
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
ARM: dts: at91: ksz9477_evb: fix port/phy validation
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
ice: Fix memory corruption in VF driver
ice: Fix queue config fail handling
ice: Sync VLAN filtering features for DVM
ice: Fix PTP TX timestamp offset calculation
mlxsw: spectrum_cnt: Reorder counter pools
docs: networking: phy: Fix a typo
amd-xgbe: Use platform_irq_count()
octeontx2-vf: Add support for adaptive interrupt coalescing
xilinx: Fix build on x86.
net: axienet: Use iowrite64 to write all 64b descriptor pointers
net: axienet: make the 64b addresable DMA depends on 64b archectures
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
net: hns3: fix PF rss size initialization bug
...
|
|
This reverts:
commit d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
commit 538aaf9b2383 ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks
Links to syzbot reports:
https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/
https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/
Fixes: d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of printing an error message, the kvm_stat script fails when we
restrict statistics to a guest by its name and there are multiple guests
with that name:
# kvm_stat -g my_vm
Traceback (most recent call last):
File "/usr/bin/kvm_stat", line 1819, in <module>
main()
File "/usr/bin/kvm_stat", line 1779, in main
options = get_options()
File "/usr/bin/kvm_stat", line 1718, in get_options
options = argparser.parse_args()
File "/usr/lib64/python3.10/argparse.py", line 1825, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/usr/lib64/python3.10/argparse.py", line 1858, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/usr/lib64/python3.10/argparse.py", line 2067, in _parse_known_args
start_index = consume_optional(start_index)
File "/usr/lib64/python3.10/argparse.py", line 2007, in consume_optional
take_action(action, args, option_string)
File "/usr/lib64/python3.10/argparse.py", line 1935, in take_action
action(self, namespace, argument_values, option_string)
File "/usr/bin/kvm_stat", line 1649, in __call__
' to specify the desired pid'.format(" ".join(pids)))
TypeError: sequence item 0: expected str instance, int found
To avoid this, the integer pid values need to be converted to strings
before being passed to join().
Signed-off-by: Dmitry Klochkov <kdmitry556@gmail.com>
Message-Id: <20220614121141.160689-1-kdmitry556@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 704c91e59fe0 ('selftests/bpf: Test "bpftool gen min_core_btf"')
added a test, test_core_btfgen, to test CO-RE relocation with BTF
generated with 'bpftool gen min_core_btf'. Currently,
among 76 subtests, 25 are skipped.
...
#46/69 core_reloc_btfgen/enumval:OK
#46/70 core_reloc_btfgen/enumval___diff:OK
#46/71 core_reloc_btfgen/enumval___val3_missing:OK
#46/72 core_reloc_btfgen/enumval___err_missing:SKIP
#46/73 core_reloc_btfgen/enum64val:OK
#46/74 core_reloc_btfgen/enum64val___diff:OK
#46/75 core_reloc_btfgen/enum64val___val3_missing:OK
#46/76 core_reloc_btfgen/enum64val___err_missing:SKIP
...
#46 core_reloc_btfgen:SKIP
Summary: 1/51 PASSED, 25 SKIPPED, 0 FAILED
Alexei found that, in the above, core_reloc_btfgen/enum64val___err_missing
should not be skipped.
Currently, the core_reloc tests have some negative tests.
In Commit 704c91e59fe0, for core_reloc_btfgen, all negative tests
are skipped with the following condition
if (!test_case->btf_src_file || test_case->fails) {
test__skip();
continue;
}
This is too conservative. Negative tests that do not fail
mkstemp() and run_btfgen() should not be skipped.
There are a few negative tests that do indeed fail run_btfgen(),
and this patch adds 'run_btfgen_fails' to mark these tests
so that they can be skipped for btfgen tests. With this,
we have
...
#46/69 core_reloc_btfgen/enumval:OK
#46/70 core_reloc_btfgen/enumval___diff:OK
#46/71 core_reloc_btfgen/enumval___val3_missing:OK
#46/72 core_reloc_btfgen/enumval___err_missing:OK
#46/73 core_reloc_btfgen/enum64val:OK
#46/74 core_reloc_btfgen/enum64val___diff:OK
#46/75 core_reloc_btfgen/enum64val___val3_missing:OK
#46/76 core_reloc_btfgen/enum64val___err_missing:OK
...
Summary: 1/62 PASSED, 14 SKIPPED, 0 FAILED
In total, 14 subtests are skipped instead of 25.
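The relaxed skip condition then presumably looks along these lines (a sketch;
only the new flag name is taken from the description above):
	if (!test_case->btf_src_file || test_case->run_btfgen_fails) {
		test__skip();
		continue;
	}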
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220614055526.628299-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With latest llvm15, test_varlen failed with the following verifier log:
17: (85) call bpf_probe_read_kernel_str#115 ; R0_w=scalar(smin=-4095,smax=256)
18: (bf) r1 = r0 ; R0_w=scalar(id=1,smin=-4095,smax=256) R1_w=scalar(id=1,smin=-4095,smax=256)
19: (67) r1 <<= 32 ; R1_w=scalar(smax=1099511627776,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32_max=)
20: (bf) r2 = r1 ; R1_w=scalar(id=2,smax=1099511627776,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32)
21: (c7) r2 s>>= 32 ; R2=scalar(smin=-2147483648,smax=256)
; if (len >= 0) {
22: (c5) if r2 s< 0x0 goto pc+7 ; R2=scalar(umax=256,var_off=(0x0; 0x1ff))
; payload4_len1 = len;
23: (18) r2 = 0xffffc90000167418 ; R2_w=map_value(off=1048,ks=4,vs=1572,imm=0)
25: (63) *(u32 *)(r2 +0) = r0 ; R0=scalar(id=1,smin=-4095,smax=256) R2_w=map_value(off=1048,ks=4,vs=1572,imm=0)
26: (77) r1 >>= 32 ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; payload += len;
27: (18) r6 = 0xffffc90000167424 ; R6_w=map_value(off=1060,ks=4,vs=1572,imm=0)
29: (0f) r6 += r1 ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=map_value(off=1060,ks=4,vs=1572,umax=4294967295,var_off=(0)
; len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in2[0]);
30: (bf) r1 = r6 ; R1_w=map_value(off=1060,ks=4,vs=1572,umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=map_value(off=1060,ks=4,vs=1572,um)
31: (b7) r2 = 256 ; R2_w=256
32: (18) r3 = 0xffffc90000164100 ; R3_w=map_value(off=256,ks=4,vs=1056,imm=0)
34: (85) call bpf_probe_read_kernel_str#115
R1 unbounded memory access, make sure to bounds check any such access
processed 27 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
-- END PROG LOAD LOG --
libbpf: failed to load program 'handler32_signed'
The failure is due to
20: (bf) r2 = r1 ; R1_w=scalar(id=2,smax=1099511627776,umax=18446744069414584320,var_off=(0x0; 0xffffffff00000000),s32_min=0,s32_max=0,u32)
21: (c7) r2 s>>= 32 ; R2=scalar(smin=-2147483648,smax=256)
22: (c5) if r2 s< 0x0 goto pc+7 ; R2=scalar(umax=256,var_off=(0x0; 0x1ff))
26: (77) r1 >>= 32 ; R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
29: (0f) r6 += r1 ; R1_w=Pscalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=map_value(off=1060,ks=4,vs=1572,umax=4294967295,var_off=(0)
where r1 has a more conservative value range than r2, yet r1 is the one used later.
In llvm, commit [1] triggered the above code generation and caused the
verification failure.
It may take a while for llvm to address this issue. In the meantime,
let us change the type of the variable 'len' to 'long' and adjust the condition properly.
Tested with llvm14 and the latest llvm15; both worked fine.
[1] https://reviews.llvm.org/D126647
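A sketch of the adjusted pattern in the test (illustrative; names loosely
follow the verifier log above, not the exact diff):
	long len;	/* was a 32-bit type */

	len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in1[0]);
	if (len >= 0 && len <= MAX_LEN) {	/* bound both sides for the verifier */
		payload4_len1 = len;
		payload += len;
	}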
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220613233449.2860753-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The function always returns 0, so we don't need to check whether the
return value is 0 or not.
This change was first introduced in commit a777e18f1bcd ("bpftool: Use
libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"), but later reverted to
restore the unconditional rlimit bump in bpftool. Let's re-add it.
Co-developed-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220610112648.29695-3-quentin@isovalent.com
|
|
This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
kernel has switched to memcg-based memory accounting. Thanks to the
LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
with other systems and ask libbpf to raise the limit for us if
necessary.
How do we know if memcg-based accounting is supported? There is a probe
in libbpf to check this. But this probe currently relies on the
availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
landed in the same kernel version as the memory accounting change. This
works in the generic case, but it may fail, for example, if the helper
function has been backported to an older kernel. This has been observed
for Google Cloud's Container-Optimized OS (COS), where the helper is
available but rlimit is still in use. The probe succeeds, the rlimit is
not raised, and probing features with bpftool, for example, fails.
A patch was submitted [0] to update this probe in libbpf, based on what
the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
0, attempt to load a BPF object, and reset the rlimit. But it may induce
some hard-to-debug flakiness if another process starts, or the current
application is killed, while the rlimit is reduced, and the approach was
discarded.
As a workaround to ensure that the rlimit bump does not depend on the
availability of a given helper, we restore the unconditional rlimit bump
in bpftool for now.
[0] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
[1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
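The unconditional bump being restored is just the classic setrlimit call; a
sketch (assuming the usual bpftool helper shape):
	static void set_max_rlimit(void)
	{
		struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };

		setrlimit(RLIMIT_MEMLOCK, &rinf);
	}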
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220610112648.29695-2-quentin@isovalent.com
|
|
Unlike GCC, clang uses a single compiler image to support multiple target
architectures, meaning that we can't simply rely on CROSS_COMPILE to select
the output architecture. Instead we must pass --target to the compiler to
tell it what to output. kselftest was not doing this, so cross compilation
of kselftest using clang resulted in kselftest being built for the host
architecture.
More work is required to fix tests using custom rules but this gets the
bulk of things building.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Andrii reported a bug with the following information:
2859 if (enum64_placeholder_id == 0) {
2860 enum64_placeholder_id = btf__add_int(btf, "enum64_placeholder", 1, 0);
>>> CID 394804: Control flow issues (NO_EFFECT)
>>> This less-than-zero comparison of an unsigned value is never true. "enum64_placeholder_id < 0U".
2861 if (enum64_placeholder_id < 0)
2862 return enum64_placeholder_id;
2863 ...
Here enum64_placeholder_id is declared as '__u32', so enum64_placeholder_id < 0
is always false. Declare enum64_placeholder_id as 'int' in order to capture
the potential error properly.
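In other words, the fix is roughly:
	int enum64_placeholder_id = 0;	/* was __u32; must be signed to hold an error */
	...
	enum64_placeholder_id = btf__add_int(btf, "enum64_placeholder", 1, 0);
	if (enum64_placeholder_id < 0)	/* now a meaningful check */
		return enum64_placeholder_id;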
Fixes: f2a625889bb8 ("libbpf: Add enum64 sanitization")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220613054314.1251905-1-yhs@fb.com
|
|
Pull kvm fixes from Paolo Bonzini:
"While last week's pull request contained miscellaneous fixes for x86,
this one covers other architectures, selftests changes, and a bigger
series for APIC virtualization bugs that were discovered during 5.20
development. The idea is to base 5.20 development for KVM on top of
this tag.
ARM64:
- Properly reset the SVE/SME flags on vcpu load
- Fix a vgic-v2 regression regarding accessing the pending state of a
HW interrupt from userspace (and make the code common with vgic-v3)
- Fix access to the idreg range for protected guests
- Ignore 'kvm-arm.mode=protected' when using VHE
- Return an error from kvm_arch_init_vm() on allocation failure
- A bunch of small cleanups (comments, annotations, indentation)
RISC-V:
- Typo fix in arch/riscv/kvm/vmid.c
- Remove broken reference pattern from MAINTAINERS entry
x86-64:
- Fix error in page tables with MKTME enabled
- Dirty page tracking performance test extended to running a nested
guest
- Disable APICv/AVIC in cases that it cannot implement correctly"
[ This merge also fixes a misplaced end parenthesis bug introduced in
commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
ID or APIC base") pointed out by Sean Christopherson ]
Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
KVM: selftests: Clean up LIBKVM files in Makefile
KVM: selftests: Link selftests directly with lib object files
KVM: selftests: Drop unnecessary rule for STATIC_LIBS
KVM: selftests: Add a helper to check EPT/VPID capabilities
KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
KVM: selftests: Refactor nested_map() to specify target level
KVM: selftests: Drop stale function parameter comment for nested_map()
KVM: selftests: Add option to create 2M and 1G EPT mappings
KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
KVM: x86: disable preemption while updating apicv inhibition
KVM: x86: SVM: fix avic_kick_target_vcpus_fast
KVM: x86: SVM: remove avic's broken code that updated APIC ID
KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
KVM: x86: document AVIC/APICv inhibit reasons
KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MMIO stale data fixes from Thomas Gleixner:
"Yet another hw vulnerability with a software mitigation: Processor
MMIO Stale Data.
They are a class of MMIO-related weaknesses which can expose stale
data by propagating it into core fill buffers. Data which can then be
leaked using the usual speculative execution methods.
Mitigations include this set along with microcode updates and are
similar to MDS and TAA vulnerabilities: VERW now clears those buffers
too"
* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/mmio: Print SMT warning
KVM: x86/speculation: Disable Fill buffer clear within guests
x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
x86/speculation/srbds: Update SRBDS mitigation selection
x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
x86/speculation: Add a common function for MD_CLEAR mitigation update
x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
Documentation: Add documentation for Processor MMIO Stale Data
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- A fix for a 5.19 regression for a case in which early device tree
initializes the RNG, which flips a static branch.
On most platforms, jump labels aren't initialized until much later, so
this caused splats. On a few mailing list threads, we cooked up easy
fixes for arm64, arm32, and risc-v. But then things looked slightly
more involved for xtensa, powerpc, arc, and mips. And at that point,
when we're patching 7 architectures in a place before the console is
even available, it seems like the cost/risk just wasn't worth it.
So random.c works around it now by checking the already exported
`static_key_initialized` boolean, as though somebody already ran into
this issue in the past. I'm not super jazzed about that; it'd be
prettier to not have to complicate downstream code. But I suppose
it's practical.
- A few small code nits and adding a missing __init annotation.
- A change to the default config values to use the cpu and bootloader's
seeds for initializing the RNG earlier.
This brings them into line with what all the distros do (Fedora/RHEL,
Debian, Ubuntu, Gentoo, Arch, NixOS, Alpine, SUSE, and Void... at
least), and moreover will now give us test coverage in various test
beds that might have caught the above device tree bug earlier.
- A change to WireGuard CI's configuration to increase test coverage
around the RNG.
- A documentation comment fix to unrelated maintainerless CRC code that
I was asked to take, I guess because it has to do with polynomials
(which the RNG thankfully no longer uses).
* tag 'random-5.19-rc2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
wireguard: selftests: use maximum cpu features and allow rng seeding
random: remove rng_has_arch_random()
random: credit cpu and bootloader seeds by default
random: do not use jump labels before they are initialized
random: account for arch randomness in bits
random: mark bootloader randomness code as __init
random: avoid checking crng_ready() twice in random_init()
crc-itu-t: fix typo in CRC ITU-T polynomial comment
|
|
Add a benchmark for hash_map to reproduce the worst case of non-stop
updates when the map has no free elements.
Just like this:
./run_bench_bpf_hashmap_full_update.sh
Setting up benchmark 'bpf-hashmap-ful-update'...
Benchmark 'bpf-hashmap-ful-update' started.
1:hash_map_full_perf 555830 events per sec
...
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20220610023308.93798-3-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
By forcing the maximum CPU that QEMU has available, we expose additional
capabilities, such as the RNDR instruction, which increases test
coverage. This then allows the CI to skip the fake seeding step in some
cases. Also enable STRICT_KERNEL_RWX to catch issues related to early
jump labels when the RNG is initialized at boot.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- eth: amt: fix possible null-ptr-deref in amt_rcv()
Previous releases - regressions:
- tcp: use alloc_large_system_hash() to allocate table_perturb
- af_unix: fix a data-race in unix_dgram_peer_wake_me()
- nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
- eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF
Previous releases - always broken:
- ipv6: fix signed integer overflow in __ip6_append_data
- netfilter:
- nat: really support inet nat without l3 address
- nf_tables: memleak flow rule from commit path
- bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs
- openvswitch: fix misuse of the cached connection on tuple changes
- nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred
- eth: altera: fix refcount leak in altera_tse_mdio_create
Misc:
- add Quentin Monnet to bpftool maintainers"
* tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
net: amd-xgbe: fix clang -Wformat warning
tcp: use alloc_large_system_hash() to allocate table_perturb
net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
net: dsa: mv88e6xxx: correctly report serdes link failure
net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
net: altera: Fix refcount leak in altera_tse_mdio_create
net: openvswitch: fix misuse of the cached connection on tuple changes
net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
ip_gre: test csum_start instead of transport header
au1000_eth: stop using virt_to_bus()
ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
ipv6: Fix signed integer overflow in __ip6_append_data
nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
net: ipv6: unexport __init-annotated seg6_hmac_init()
net: xfrm: unexport __init-annotated xfrm4_protocol_init()
net: mdio: unexport __init-annotated mdio_bus_init()
...
|
|
The selftests nested code only supports 4-level paging at the moment.
This means it cannot map nested guest physical addresses with more than
48 bits. Allow perf_test_util nested mode to work on hosts with more
than 48 physical address bits by restricting the guest test region to
48 bits.
While here, opportunistically fix an off-by-one error when dealing with
vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
number of GFNs, rather than the maximum allowed GFN. This didn't result
in any correctness issues, but it did end up shifting the test region
down slightly when using huge pages.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.
For now this support only works on x86_64 CPUs with VMX. Otherwise
passing -n results in the test being skipped.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Break up the long lines for LIBKVM and alphabetize each architecture.
This makes reading the Makefile easier, and will make reading diffs to
LIBKVM easier.
No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The linker does obey strong/weak symbols when linking static libraries;
it simply resolves an undefined symbol to the first-encountered symbol.
This means that defining __weak arch-generic functions and then defining
arch-specific strong functions to override them in libkvm will not
always work.
More specifically, if we have:
lib/generic.c:
void __weak foo(void)
{
pr_info("weak\n");
}
void bar(void)
{
foo();
}
lib/x86_64/arch.c:
void foo(void)
{
pr_info("strong\n");
}
A selftest that calls bar() will then print "weak". Now if you make
generic.o explicitly depend on arch.o (e.g. add a function to arch.c that
is called directly from generic.c), it will print "strong". In other
words, it seems that the linker is free to throw out arch.o when linking
because generic.o does not explicitly depend on it, which causes the
linker to lose the strong symbol.
One solution is to link libkvm.a with --whole-archive so that the linker
doesn't throw away object files it thinks are unnecessary. However that
is a bit difficult to plumb since we are using the common selftests
makefile rules. An easier solution is to drop libkvm.a just link
selftests with all the .o files that were originally in libkvm.a.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend
on $(STATIC_LIBS), so there is no reason to have an extra "all" rule.
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Create a small helper function to check if a given EPT/VPID capability
is supported. This will be re-used in a follow-up commit to check for 1G
page support.
No functional change intended.
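Presumably something along these lines (a sketch; the helper name is an
assumption):
	static inline bool ept_vpid_cap_supported(uint64_t mask)
	{
		return rdmsr(MSR_IA32_VMX_EPT_VPID_CAP) & mask;
	}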
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is a VMX-related macro so move it to vmx.h. While here, open code
the mask like the rest of the VMX bitmask macros.
No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Refactor nested_map() to specify that it explicitly wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.
No functional change intended.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood, its values just encode the
desired level at which to map the page. This ends up being clunky in a
few ways:
- The name suggests it encodes the size of the page rather than the
level.
- In other places in x86_64/processor.c we just use a raw int to encode
the level.
Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
around raw ints when referring to the level. This makes the code easier
to understand since these macros are very common in KVM MMU code.
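For reference, the kernel-style levels being adopted look roughly like this
(a sketch, not the exact selftests header):
	enum pg_level {
		PG_LEVEL_NONE,
		PG_LEVEL_4K,	/* level 1 */
		PG_LEVEL_2M,	/* level 2 */
		PG_LEVEL_1G,	/* level 3 */
		PG_LEVEL_512G,	/* level 4 */
	};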
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|