summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-24Merge branch 'selftest-netfilter-additional-cleanups'Jakub Kicinski
Florian Westphal says: ==================== selftest: netfilter: additional cleanups This is the last planned series of the netfilter-selftest-move. It contains cleanups (and speedups) and a few small updates to scripts to improve error/skip reporting. I intend to route future changes, if any, via nf(-next) trees now that the 'massive code churn' phase is over. ==================== Link: https://lore.kernel.org/r/20240423130604.7013-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: conntrack_vrf.sh: prefer socat, not iperf3Florian Westphal
Use socat, like most of the other scripts already do. This also makes the script complete slightly faster (3s -> 1s). iperf3 establishes two connections (1 control connection, and 1+x depending on test), so adjust expected counter values as well. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-8-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: skip tests on early errorsFlorian Westphal
br_netfilter: If we can't add the needed initial nftables ruleset skip the test, kernel doesn't support a required feature. rpath: run a subset of the tests if possible, but make sure we return the skip return value so they are marked appropriately by the kselftest framework. nft_audit.sh: provide version information when skipping, this should help catching kernel problem (feature not available in kernel) vs. userspace issue (parser doesn't support keyword). Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-7-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: nft_flowtable.sh: shellcheck cleanupsFlorian Westphal
no functional changes intended except that test will now SKIP in case kernel lacks bridge support and initial rule load failure provides nft version information. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-6-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: nft_flowtable.sh: re-run with random mtu sizesFlorian Westphal
Now that the test runs much faster, also re-run it with random MTU sizes for the different link legs. flowtable should pass ip fragments, if any, up to the normal forwarding path. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-5-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: nft_concat_range.sh: shellcheck cleanupsFlorian Westphal
no functional changes intended. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-4-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: nft_concat_range.sh: drop netcat supportFlorian Westphal
Tests fail on my workstation with netcat 110, instead of debugging+more workarounds just remove this. Tests will fall back to bash or socat. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-3-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24selftests: netfilter: nft_concat_range.sh: move to lib.sh infraFlorian Westphal
Use busywait helper instead of unconditional sleep, reduces run time from 6m to 2:30 on my system. The busywait helper calls the function passed to it as argument; disable the shellcheck test for unreachable code, it generates many (false) warnings here. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240423130604.7013-2-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24net: openvswitch: Release reference to netdevJun Gu
dev_get_by_name will provide a reference on the netdev. So ensure that the reference of netdev is released after completed. Fixes: 2540088b836f ("net: openvswitch: Check vport netdev name") Signed-off-by: Jun Gu <jun.gu@easystack.cn> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20240423073751.52706-1-jun.gu@easystack.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24Merge branch 'BPF crypto API framework'Martin KaFai Lau
Vadim Fedorenko says: ==================== This series introduces crypto kfuncs to make BPF programs able to utilize kernel crypto subsystem. Crypto operations made pluggable to avoid extensive growth of kernel when it's not needed. Only skcipher is added within this series, but it can be easily extended to other types of operations. No hardware offload supported as it needs sleepable context which is not available for TX or XDP programs. At the same time crypto context initialization kfunc can only run in sleepable context, that's why it should be run separately and store the result in the map. Selftests show the common way to implement crypto actions in BPF programs. Benchmark is also added to have a baseline. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests: bpf: crypto: add benchmark for crypto functionsVadim Fedorenko
Some simple benchmarks are added to understand the baseline of performance. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://lore.kernel.org/r/20240422225024.2847039-5-vadfed@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests: bpf: crypto skcipher algo selftestsVadim Fedorenko
Add simple tc hook selftests to show the way to work with new crypto BPF API. Some tricky dynptr initialization is used to provide empty iv dynptr. Simple AES-ECB algo is used to demonstrate encryption and decryption of fixed size buffers. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://lore.kernel.org/r/20240422225024.2847039-4-vadfed@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24bpf: crypto: add skcipher to bpf cryptoVadim Fedorenko
Implement skcipher crypto in BPF crypto framework. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20240422225024.2847039-3-vadfed@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24bpf: make common crypto API for TC/XDP programsVadim Fedorenko
Add crypto API support to BPF to be able to decrypt or encrypt packets in TC/XDP BPF programs. Special care should be taken for initialization part of crypto algo because crypto alloc) doesn't work with preemtion disabled, it can be run only in sleepable BPF program. Also async crypto is not supported because of the very same issue - TC/XDP BPF programs are not sleepable. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Link: https://lore.kernel.org/r/20240422225024.2847039-2-vadfed@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24bpf: update the comment for BTF_FIELDS_MAXHaiyue Wang
The commit d56b63cf0c0f ("bpf: add support for bpf_wq user type") changes the fields support number to 11, just sync the comment. Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240424054526.8031-1-haiyue.wang@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-25ipvs: Fix checksumming on GSO of SCTP packetsIsmael Luceno
It was observed in the wild that pairs of consecutive packets would leave the IPVS with the same wrong checksum, and the issue only went away when disabling GSO. IPVS needs to avoid computing the SCTP checksum when using GSO. Fixes: 90017accff61 ("sctp: Add GSO support") Co-developed-by: Firo Yang <firo.yang@suse.com> Signed-off-by: Ismael Luceno <iluceno@suse.de> Tested-by: Andreas Taschner <andreas.taschner@suse.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-24selftests/bpf: Fix wq test.Alexei Starovoitov
The wq test was missing destroy(skel) part which was causing bpf progs to stay loaded. That was causing test_progs to complain with "Failed to unload bpf_testmod.ko from kernel: -11" message, but adding destroy() wasn't enough, since wq callback may be delayed, so loop on unload of bpf_testmod if errno is EAGAIN. Acked-by: Andrii Nakryiko <andrii@kernel.org> Fixes: 8290dba51910 ("selftests/bpf: wq: add bpf_wq_start() checks") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24Bluetooth: qca: set power_ctrl_enabled on NULL returned by gpiod_get_optional()Bartosz Golaszewski
Any return value from gpiod_get_optional() other than a pointer to a GPIO descriptor or a NULL-pointer is an error and the driver should abort probing. That being said: commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL() with gpiod_get_optional()") no longer sets power_ctrl_enabled on NULL-pointer returned by devm_gpiod_get_optional(). Restore this behavior but bail-out on errors. While at it: also bail-out on error returned when trying to get the "swctrl" GPIO. Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Zijun Hu <quic_zijuhu@quicinc.com> Closes: https://lore.kernel.org/linux-bluetooth/1713449192-25926-2-git-send-email-quic_zijuhu@quicinc.com/ Fixes: 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL() with gpiod_get_optional()") Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Tested-by: Wren Turkal" <wt@penguintechs.org> Reported-by: Wren Turkal <wt@penguintechs.org> Reported-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Krzysztof Kozlowski<krzysztof.kozlowski@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv MonitorChun-Yi Lee
Since the d883a4669a1de be introduced in v6.4, bluetooth daemon got the following failed message of MGMT_OP_REMOVE_ADV_MONITOR command when controller is power-off: bluetoothd[20976]: src/adapter.c:reset_adv_monitors_complete() Failed to reset Adv Monitors: Failed> Normally this situation is happened when the bluetoothd deamon be started manually after system booting. Which means that bluetoothd received MGMT_EV_INDEX_ADDED event after kernel runs hci_power_off(). Base on doc/mgmt-api.txt, the MGMT_OP_REMOVE_ADV_MONITOR command can be used when the controller is not powered. This patch changes the code in remove_adv_monitor() to use hci_cmd_sync_submit() instead of hci_cmd_sync_queue(). Fixes: d883a4669a1de ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running") Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Manish Mandlik <mmandlik@google.com> Cc: Archie Pusaka <apusaka@chromium.org> Cc: Miao-chen Chou <mcchou@chromium.org> Signed-off-by: Chun-Yi Lee <jlee@suse.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: qca: fix NULL-deref on non-serdev setupJohan Hovold
Qualcomm ROME controllers can be registered from the Bluetooth line discipline and in this case the HCI UART serdev pointer is NULL. Add the missing sanity check to prevent a NULL-pointer dereference when setup() is called for a non-serdev controller. Fixes: e9b3e5b8c657 ("Bluetooth: hci_qca: only assign wakeup with serial port support") Cc: stable@vger.kernel.org # 6.2 Cc: Zhengping Jiang <jiangzp@google.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: qca: fix NULL-deref on non-serdev suspendJohan Hovold
Qualcomm ROME controllers can be registered from the Bluetooth line discipline and in this case the HCI UART serdev pointer is NULL. Add the missing sanity check to prevent a NULL-pointer dereference when wakeup() is called for a non-serdev controller during suspend. Just return true for now to restore the original behaviour and address the crash with pre-6.2 kernels, which do not have commit e9b3e5b8c657 ("Bluetooth: hci_qca: only assign wakeup with serial port support") that causes the crash to happen already at setup() time. Fixes: c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support") Cc: stable@vger.kernel.org # 5.13 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: btusb: mediatek: Fix double free of skb in coredumpSean Wang
hci_devcd_append() would free the skb on error so the caller don't have to free it again otherwise it would cause the double free of skb. Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Reported-by : Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUIDLuiz Augusto von Dentz
These commands don't require the adapter to be up and running so don't use hci_cmd_sync_queue which would check that flag, instead use hci_cmd_sync_submit which would ensure mgmt_class_complete is set properly regardless if any command was actually run or not. Link: https://github.com/bluez/bluez/issues/809 Fixes: d883a4669a1d ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: qca: fix invalid device address checkJohan Hovold
Qualcomm Bluetooth controllers may not have been provisioned with a valid device address and instead end up using the default address 00:00:00:00:5a:ad. This was previously believed to be due to lack of persistent storage for the address but it may also be due to integrators opting to not use the on-chip OTP memory and instead store the address elsewhere (e.g. in storage managed by secure world firmware). According to Qualcomm, at least WCN6750, WCN6855 and WCN7850 have on-chip OTP storage for the address. As the device type alone cannot be used to determine when the address is valid, instead read back the address during setup() and only set the HCI_QUIRK_USE_BDADDR_PROPERTY flag when needed. This specifically makes sure that controllers that have been provisioned with an address do not start as unconfigured. Reported-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com> Link: https://lore.kernel.org/r/124a7d54-5a18-4be7-9a76-a12017f6cce5@quicinc.com/ Fixes: 5971752de44c ("Bluetooth: hci_qca: Set HCI_QUIRK_USE_BDADDR_PROPERTY for wcn3990") Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts") Fixes: 6945795bc81a ("Bluetooth: fix use-bdaddr-property quirk") Cc: stable@vger.kernel.org # 6.5 Cc: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reported-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: hci_event: Fix sending HCI_OP_READ_ENC_KEY_SIZELuiz Augusto von Dentz
The code shall always check if HCI_QUIRK_BROKEN_READ_ENC_KEY_SIZE has been set before attempting to use HCI_OP_READ_ENC_KEY_SIZE. Fixes: c569242cd492 ("Bluetooth: hci_event: set the conn encrypted before conn establishes") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: btusb: Fix triggering coredump implementation for QCAZijun Hu
btusb_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific command to trigger firmware coredump, but the command does not have any event as its sync response, so it is not suitable to use __hci_cmd_sync(), fixed by using __hci_cmd_send(). Fixes: 20981ce2d5a5 ("Bluetooth: btusb: Add WCN6855 devcoredump support") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853WangYuli
Add the support ID(0x0bda, 0x4853) to usb_device_id table for Realtek RTL8852BE. Without this change the device utilizes an obsolete version of the firmware that is encoded in it rather than the updated Realtek firmware and config files from the firmware directory. The latter files implement many new features. The device table is as follows: T: Bus=03 Lev=01 Prnt=01 Port=09 Cnt=03 Dev#= 4 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0bda ProdID=4853 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms Cc: stable@vger.kernel.org Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_syncLuiz Augusto von Dentz
The extended advertising reports do report the PHYs so this store then in hci_conn so it can be later used in hci_le_ext_create_conn_sync to narrow the PHYs to be scanned since the controller will also perform a scan having a smaller set of PHYs shall reduce the time it takes to find and connect peers. Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()Nathan Chancellor
After an innocuous optimization change in LLVM main (19.0.0), x86_64 allmodconfig (which enables CONFIG_KCSAN / -fsanitize=thread) fails to build due to the checks in check_copy_size(): In file included from net/bluetooth/sco.c:27: In file included from include/linux/module.h:13: In file included from include/linux/stat.h:19: In file included from include/linux/time.h:60: In file included from include/linux/time32.h:13: In file included from include/linux/timex.h:67: In file included from arch/x86/include/asm/timex.h:6: In file included from arch/x86/include/asm/tsc.h:10: In file included from arch/x86/include/asm/msr.h:15: In file included from include/linux/percpu.h:7: In file included from include/linux/smp.h:118: include/linux/thread_info.h:244:4: error: call to '__bad_copy_from' declared with 'error' attribute: copy source size is too small 244 | __bad_copy_from(); | ^ The same exact error occurs in l2cap_sock.c. The copy_to_user() statements that are failing come from l2cap_sock_getsockopt_old() and sco_sock_getsockopt_old(). This does not occur with GCC with or without KCSAN or Clang without KCSAN enabled. len is defined as an 'int' because it is assigned from '__user int *optlen'. However, it is clamped against the result of sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit platforms). This is done with min_t() because min() requires compatible types, which results in both len and the result of sizeof() being casted to 'unsigned int', meaning len changes signs and the result of sizeof() is truncated. From there, len is passed to copy_to_user(), which has a third parameter type of 'unsigned long', so it is widened and changes signs again. This excessive casting in combination with the KCSAN instrumentation causes LLVM to fail to eliminate the __bad_copy_from() call, failing the build. The official recommendation from LLVM developers is to consistently use long types for all size variables to avoid the unnecessary casting in the first place. Change the type of len to size_t in both l2cap_sock_getsockopt_old() and sco_sock_getsockopt_old(). This clears up the error while allowing min_t() to be replaced with min(), resulting in simpler code with no casts and fewer implicit conversions. While len is a different type than optlen now, it should result in no functional change because the result of sizeof() will clamp all values of optlen in the same manner as before. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2007 Link: https://github.com/llvm/llvm-project/issues/85647 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-04-24Merge branch 'use network helpers, part 2'Martin KaFai Lau
Geliang Tang says: ==================== This patchset uses more network helpers in test_sock_addr.c, but first of all, patch 2 is needed to make network_helpers.c independent of test_progs.c. Then network_helpers.h can be included into test_sock_addr.c without compile errors. Patch 1 and patch 2 address Martin's comments for the previous series too. v2: - Only a few minor cleanups to patch 5. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests/bpf: Use make_sockaddr in test_sock_addrGeliang Tang
This patch uses public helper make_sockaddr() exported in network_helpers.h instead of the local defined function mk_sockaddr() in test_sock_addr.c. This can avoid duplicate code. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/1473e189d6ca1a3925de4c5354d191a14eca0f3f.1713868264.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests/bpf: Use connect_to_addr in test_sock_addrGeliang Tang
This patch uses public network helper connect_to_addr() exported in network_helpers.h instead of the local defined function connect_to_server() in test_sock_addr.c. This can avoid duplicate code. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/f263797712d93fdfaf2943585c5dfae56714a00b.1713868264.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests/bpf: Use start_server_addr in test_sock_addrGeliang Tang
Include network_helpers.h in test_sock_addr.c, use the newly added public helper start_server_addr() instead of the local defined function start_server(). This can avoid duplicate code. In order to use functions defined in network_helpers.c in test_sock_addr.c, Makefile needs to be updated and <Linux/err.h> needs to be included in network_helpers.h to avoid compilation errors. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/3101f57bde5502383eb41723c8956cc26be06893.1713868264.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests/bpf: Use log_err in open_netns/close_netnsGeliang Tang
ASSERT helpers defined in test_progs.h shouldn't be used in public functions like open_netns() and close_netns(). Since they depend on test__fail() which defined in test_progs.c. Public functions may be used not only in test_progs.c, but in other tests like test_sock_addr.c in the next commit. This patch uses log_err() to replace ASSERT helpers in open_netns() and close_netns() in network_helpers.c to decouple dependencies, then uses ASSERT_OK_PTR() to check the return values of all open_netns(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/d1dad22b2ff4909af3f8bfd0667d046e235303cb.1713868264.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24selftests/bpf: Fix a fd leak in error paths in open_netnsGeliang Tang
As Martin mentioned in review comment, there is an existing bug that orig_netns_fd will be leaked in the later "goto fail;" case after open("/proc/self/ns/net") in open_netns() in network_helpers.c. This patch adds "close(token->orig_netns_fd);" before "free(token);" to fix it. Fixes: a30338840fa5 ("selftests/bpf: Move open_netns() and close_netns() into network_helpers.c") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/a104040b47c3c34c67f3f125cdfdde244a870d3c.1713868264.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-04-24MAINTAINERS: add entry for libeth and libieAlexander Lobakin
Add myself as a maintainer/supporter for libeth and libie. Let they have separate entries from the Intel ethernet code as it's a bit different case and all patches will go through me rather than Tony. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24iavf: switch to Page PoolAlexander Lobakin
Now that the IAVF driver simply uses dev_alloc_page() + free_page() with no custom recycling logics, it can easily be switched to using Page Pool / libeth API instead. This allows to removing the whole dancing around headroom, HW buffer size, and page order. All DMA-for-device is now done in the PP core, for-CPU -- in the libeth helper. Use skb_mark_for_recycle() to bring back the recycling and restore the performance. Speaking of performance: on par with the baseline and faster with the PP optimization series applied. But the memory usage for 1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page every second descriptor. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24iavf: pack iavf_ring more efficientlyAlexander Lobakin
Before replacing the Rx buffer management with libie, clean up &iavf_ring a bit. There are several fields not used anywhere in the code -- simply remove them. Move ::tail up to remove a hole. Replace ::arm_wb boolean with 1-bit flag in ::flags to free 1 more byte. Finally, move ::prev_pkt_ctr out of &iavf_tx_queue_stats -- it doesn't belong there (used for Tx stall detection). Place it next to the stats on the ring itself to fill the 4-byte slot. The result: no holes and all the hot fields fit into the first 64-byte cacheline. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24libeth: add Rx buffer managementAlexander Lobakin
Add a couple intuitive helpers to hide Rx buffer implementation details in the library and not multiplicate it between drivers. The settings are sorta optimized for 100G+ NICs, but nothing really HW-specific here. Use the new page_pool_dev_alloc() to dynamically switch between split-page and full-page modes depending on MTU, page size, required headroom etc. For example, on x86_64 with the default driver settings each page is shared between 2 buffers. Turning on XDP (not in this series) -> increasing headroom requirement pushes truesize out of 2048 boundary, leading to that each buffer starts getting a full page. The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to avoid compound overhead. For the above architecture, this means maximum linear frame size of 3712 w/o XDP. Not that &libeth_buf_queue is not a complete queue/ring structure for now, rather a shim, but eventually the libeth-enabled drivers will move to it, with iavf being the first one. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24page_pool: add DMA-sync-for-CPU inline helperAlexander Lobakin
Each driver is responsible for syncing buffers written by HW for CPU before accessing them. Almost each PP-enabled driver uses the same pattern, which could be shorthanded into a static inline to make driver code a little bit more compact. Introduce a simple helper which performs DMA synchronization for the size passed from the driver. It can be used even when the pool doesn't manage DMA-syncs-for-device, just make sure the page has a correct DMA address set via page_pool_set_dma_addr(). Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24page_pool: constify some read-only function argumentsAlexander Lobakin
There are several functions taking pointers to data they don't modify. This includes statistics fetching, page and page_pool parameters, etc. Constify the pointers, so that call sites will be able to pass const pointers as well. No functional changes, no visible changes in functions sizes. Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24slab: introduce kvmalloc_array_node() and kvcalloc_node()Alexander Lobakin
Add NUMA-aware counterparts for kvmalloc_array() and kvcalloc() to be able to flexibly allocate arrays for a particular node. Rewrite kvmalloc_array() to kvmalloc_array_node(NUMA_NO_NODE) call. Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24iavf: drop page splitting and recyclingAlexander Lobakin
As an intermediate step, remove all page splitting/recycling code. Just always allocate a new page and don't touch its refcount, so that it gets freed by the core stack later. Same for the "in-place" recycling, i.e. when an unused buffer gets assigned to a first needs-refilling descriptor. In some cases, this was leading to moving up to 63 &iavf_rx_buf structures around the ring on a per-field basis -- not something wanted on hotpath. The change allows to greatly simplify certain parts of the code: Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-744 (-744) Although the array of &iavf_rx_buf is barely used now and could be replaced with just page pointer array, don't touch it now to not complicate replacing it with libie Rx buffer struct later on. No surprise perf loses up to 30% here, but that regression will go away once PP lands. Note that iavf_rx_pg_*() definitions are left to reduce diffstat. They will be removed with the conversion to Page Pool. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24iavf: kill "legacy-rx" for goodAlexander Lobakin
Ever since build_skb() became stable, the old way with allocating an skb for storing the headers separately, which will be then copied manually, was slower, less flexible, and thus obsolete. * It had higher pressure on MM since it actually allocates new pages, which then get split and refcount-biased (NAPI page cache); * It implies memcpy() of packet headers (40+ bytes per each frame); * the actual header length was calculated via eth_get_headlen(), which invokes Flow Dissector and thus wastes a bunch of CPU cycles; * XDP makes it even more weird since it requires headroom for long and also tailroom for some time (since mbuf landed). Take a look at the ice driver, which is built around work-arounds to make XDP work with it. Even on some quite low-end hardware (not a common case for 100G NICs) it was performing worse. The only advantage "legacy-rx" had is that it didn't require any reserved headroom and tailroom. But iavf didn't use this, as it always splits pages into two halves of 2k, while that save would only be useful when striding. And again, XDP effectively removes that sole pro. There's a train of features to land in IAVF soon: Page Pool, XDP, XSk, multi-buffer etc. Each new would require adding more and more Danse Macabre for absolutely no reason, besides making hotpath less and less effective. Remove the "feature" with all the related code. This includes at least one very hot branch (typically hit on each new frame), which was either always-true or always-false at least for a complete NAPI bulk of 64 frames, the whole private flags cruft, and so on. Some stats: Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-721 (-721) RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40) Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24net: intel: introduce {, Intel} Ethernet common libraryAlexander Lobakin
Not a secret there's a ton of code duplication between two and more Intel ethernet modules. Before introducing new changes, which would need to be copied over again, start decoupling the already existing duplicate functionality into a new module, which will be shared between several Intel Ethernet drivers. Add the lookup table which converts 8/10-bit hardware packet type into a parsed bitfield structure for easy checking packet format parameters, such as payload level, IP version, etc. This is currently used by i40e, ice and iavf and it's all the same in all three drivers. The only difference introduced in this implementation is that instead of defining a 256 (or 1024 in case of ice) element array, add unlikely() condition to limit the input to 154 (current maximum non-reserved packet type). There's no reason to waste 600 (or even 3600) bytes only to not hurt very unlikely exception packets. The hash computation function now takes payload level directly as a pkt_hash_type. There's a couple cases when non-IP ptypes are marked as L3 payload and in the previous versions their hash level would be 2, not 3. But skb_set_hash() only sees difference between L4 and non-L4, thus this won't change anything at all. The module is behind the hidden Kconfig symbol, which the drivers will select when needed. The exports are behind 'LIBIE' namespace to limit the scope of the functions. Not that non-HW-specific symbols will live in yet another module, libeth. This is done to easily distinguish pretty generic code ready for reusing by any other vendor and/or for moving the layer up from the code useful in Intel's 1-100G drivers only. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-24Merge branch 'introduce-bpf_preempt_-disable-enable'Alexei Starovoitov
Kumar Kartikeya Dwivedi says: ==================== Introduce bpf_preempt_{disable,enable} This set introduces two kfuncs, bpf_preempt_disable and bpf_preempt_enable, which are wrappers around preempt_disable and preempt_enable in the kernel. These functions allow a BPF program to have code sections where preemption is disabled. There are multiple use cases that are served by such a feature, a few are listed below: 1. Writing safe per-CPU alogrithms/data structures that work correctly across different contexts. 2. Writing safe per-CPU allocators similar to bpf_memalloc on top of array/arena memory blobs. 3. Writing locking algorithms in BPF programs natively. Note that local_irq_disable/enable equivalent is also needed for proper IRQ context protection, but that is a more involved change and will be sent later. While bpf_preempt_{disable,enable} is not sufficient for all of these usage scenarios on its own, it is still necessary. The same effect as these kfuncs can in some sense be already achieved using the bpf_spin_lock or rcu_read_lock APIs, therefore from the standpoint of kernel functionality exposure in the verifier, this is well understood territory. Note that these helpers do allow calling kernel helpers and kfuncs from within the non-preemptible region (unless sleepable). Otherwise, any locks built using the preemption helpers will be as limited as existing bpf_spin_lock. Nesting is allowed by keeping a counter for tracking remaining enables required to be performed. Similar approach can be applied to rcu_read_locks in a follow up. Changelog ========= v1: https://lore.kernel.org/bpf/20240423061922.2295517-1-memxor@gmail.com * Move kfunc BTF ID declerations above css task kfunc for !CONFIG_CGROUPS config (Alexei) * Add test case for global function call in non-preemptible region (Jiri) ==================== Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20240424031315.2757363-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24selftests/bpf: Add tests for preempt kfuncsKumar Kartikeya Dwivedi
Add tests for nested cases, nested count preservation upon different subprog calls that disable/enable preemption, and test sleepable helper call in non-preemptible regions. 182/1 preempt_lock/preempt_lock_missing_1:OK 182/2 preempt_lock/preempt_lock_missing_2:OK 182/3 preempt_lock/preempt_lock_missing_3:OK 182/4 preempt_lock/preempt_lock_missing_3_minus_2:OK 182/5 preempt_lock/preempt_lock_missing_1_subprog:OK 182/6 preempt_lock/preempt_lock_missing_2_subprog:OK 182/7 preempt_lock/preempt_lock_missing_2_minus_1_subprog:OK 182/8 preempt_lock/preempt_balance:OK 182/9 preempt_lock/preempt_balance_subprog_test:OK 182/10 preempt_lock/preempt_global_subprog_test:OK 182/11 preempt_lock/preempt_sleepable_helper:OK 182 preempt_lock:OK Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240424031315.2757363-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24bpf: Introduce bpf_preempt_[disable,enable] kfuncsKumar Kartikeya Dwivedi
Introduce two new BPF kfuncs, bpf_preempt_disable and bpf_preempt_enable. These kfuncs allow disabling preemption in BPF programs. Nesting is allowed, since the intended use cases includes building native BPF spin locks without kernel helper involvement. Apart from that, this can be used to per-CPU data structures for cases where programs (or userspace) may preempt one or the other. Currently, while per-CPU access is stable, whether it will be consistent is not guaranteed, as only migration is disabled for BPF programs. Global functions are disallowed from being called, but support for them will be added as a follow up not just preempt kfuncs, but rcu_read_lock kfuncs as well. Static subprog calls are permitted. Sleepable helpers and kfuncs are disallowed in non-preemptible regions. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240424031315.2757363-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-04-24Merge tag 'for-6.9-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix information leak by the buffer returned from LOGICAL_INO ioctl - fix flipped condition in scrub when tracking sectors in zoned mode - fix calculation when dropping extent range - reinstate fallback to write uncompressed data in case of fragmented space that could not store the entire compressed chunk - minor fix to message formatting style to make it conforming to the commonly used style * tag 'for-6.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range() btrfs: fix information leak in btrfs_ioctl_logical_to_ino() btrfs: fallback if compressed IO fails for ENOSPC btrfs: scrub: run relocation repair when/only needed btrfs: remove colon from messages with state
2024-04-24bpf: Don't check for recursion in bpf_wq_work.Alexei Starovoitov
__bpf_prog_enter_sleepable_recur does recursion check which is not applicable to wq callback. The callback function is part of bpf program and bpf prog might be running on the same cpu. So recursion check would incorrectly prevent callback from running. The code can call __bpf_prog_enter_sleepable(), but run_ctx would be fake, hence use explicit rcu_read_lock_trace(); migrate_disable(); to address this problem. Another reason to open code is __bpf_prog_enter* are not available in !JIT configs. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202404241719.IIGdpAku-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202404241811.FFV4Bku3-lkp@intel.com/ Fixes: eb48f6cd41a0 ("bpf: wq: add bpf_wq_init") Signed-off-by: Alexei Starovoitov <ast@kernel.org>