Age | Commit message (Collapse) | Author |
|
Florian Westphal says:
====================
selftest: netfilter: additional cleanups
This is the last planned series of the netfilter-selftest-move.
It contains cleanups (and speedups) and a few small updates to
scripts to improve error/skip reporting.
I intend to route future changes, if any, via nf(-next) trees
now that the 'massive code churn' phase is over.
====================
Link: https://lore.kernel.org/r/20240423130604.7013-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use socat, like most of the other scripts already do. This also makes
the script complete slightly faster (3s -> 1s).
iperf3 establishes two connections (1 control connection, and 1+x
depending on test), so adjust expected counter values as well.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-8-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
br_netfilter: If we can't add the needed initial nftables ruleset skip the
test, kernel doesn't support a required feature.
rpath: run a subset of the tests if possible, but make sure we return
the skip return value so they are marked appropriately by the kselftest
framework.
nft_audit.sh: provide version information when skipping, this should
help catching kernel problem (feature not available in kernel) vs.
userspace issue (parser doesn't support keyword).
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-7-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
no functional changes intended except that test will now SKIP in
case kernel lacks bridge support and initial rule load failure provides
nft version information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-6-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the test runs much faster, also re-run it with random MTU sizes
for the different link legs. flowtable should pass ip fragments, if
any, up to the normal forwarding path.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-5-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
no functional changes intended.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-4-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tests fail on my workstation with netcat 110, instead of debugging+more
workarounds just remove this.
Tests will fall back to bash or socat.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use busywait helper instead of unconditional sleep, reduces run time
from 6m to 2:30 on my system.
The busywait helper calls the function passed to it as argument; disable
the shellcheck test for unreachable code, it generates many (false)
warnings here.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240423130604.7013-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
dev_get_by_name will provide a reference on the netdev. So ensure that
the reference of netdev is released after completed.
Fixes: 2540088b836f ("net: openvswitch: Check vport netdev name")
Signed-off-by: Jun Gu <jun.gu@easystack.cn>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20240423073751.52706-1-jun.gu@easystack.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vadim Fedorenko says:
====================
This series introduces crypto kfuncs to make BPF programs able to
utilize kernel crypto subsystem. Crypto operations made pluggable to
avoid extensive growth of kernel when it's not needed. Only skcipher is
added within this series, but it can be easily extended to other types
of operations. No hardware offload supported as it needs sleepable
context which is not available for TX or XDP programs. At the same time
crypto context initialization kfunc can only run in sleepable context,
that's why it should be run separately and store the result in the map.
Selftests show the common way to implement crypto actions in BPF
programs. Benchmark is also added to have a baseline.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Some simple benchmarks are added to understand the baseline of
performance.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://lore.kernel.org/r/20240422225024.2847039-5-vadfed@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add simple tc hook selftests to show the way to work with new crypto
BPF API. Some tricky dynptr initialization is used to provide empty iv
dynptr. Simple AES-ECB algo is used to demonstrate encryption and
decryption of fixed size buffers.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://lore.kernel.org/r/20240422225024.2847039-4-vadfed@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Implement skcipher crypto in BPF crypto framework.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20240422225024.2847039-3-vadfed@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add crypto API support to BPF to be able to decrypt or encrypt packets
in TC/XDP BPF programs. Special care should be taken for initialization
part of crypto algo because crypto alloc) doesn't work with preemtion
disabled, it can be run only in sleepable BPF program. Also async crypto
is not supported because of the very same issue - TC/XDP BPF programs
are not sleepable.
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Link: https://lore.kernel.org/r/20240422225024.2847039-2-vadfed@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
The commit d56b63cf0c0f ("bpf: add support for bpf_wq user type")
changes the fields support number to 11, just sync the comment.
Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240424054526.8031-1-haiyue.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
It was observed in the wild that pairs of consecutive packets would leave
the IPVS with the same wrong checksum, and the issue only went away when
disabling GSO.
IPVS needs to avoid computing the SCTP checksum when using GSO.
Fixes: 90017accff61 ("sctp: Add GSO support")
Co-developed-by: Firo Yang <firo.yang@suse.com>
Signed-off-by: Ismael Luceno <iluceno@suse.de>
Tested-by: Andreas Taschner <andreas.taschner@suse.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The wq test was missing destroy(skel) part which was causing bpf progs to stay
loaded. That was causing test_progs to complain with
"Failed to unload bpf_testmod.ko from kernel: -11" message, but adding
destroy() wasn't enough, since wq callback may be delayed, so loop on unload of
bpf_testmod if errno is EAGAIN.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Fixes: 8290dba51910 ("selftests/bpf: wq: add bpf_wq_start() checks")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Any return value from gpiod_get_optional() other than a pointer to a
GPIO descriptor or a NULL-pointer is an error and the driver should
abort probing. That being said: commit 56d074d26c58 ("Bluetooth: hci_qca:
don't use IS_ERR_OR_NULL() with gpiod_get_optional()") no longer sets
power_ctrl_enabled on NULL-pointer returned by
devm_gpiod_get_optional(). Restore this behavior but bail-out on errors.
While at it: also bail-out on error returned when trying to get the
"swctrl" GPIO.
Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Zijun Hu <quic_zijuhu@quicinc.com>
Closes: https://lore.kernel.org/linux-bluetooth/1713449192-25926-2-git-send-email-quic_zijuhu@quicinc.com/
Fixes: 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL() with gpiod_get_optional()")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Wren Turkal" <wt@penguintechs.org>
Reported-by: Wren Turkal <wt@penguintechs.org>
Reported-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Krzysztof Kozlowski<krzysztof.kozlowski@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Since the d883a4669a1de be introduced in v6.4, bluetooth daemon
got the following failed message of MGMT_OP_REMOVE_ADV_MONITOR
command when controller is power-off:
bluetoothd[20976]:
src/adapter.c:reset_adv_monitors_complete() Failed to reset Adv
Monitors: Failed>
Normally this situation is happened when the bluetoothd deamon
be started manually after system booting. Which means that
bluetoothd received MGMT_EV_INDEX_ADDED event after kernel
runs hci_power_off().
Base on doc/mgmt-api.txt, the MGMT_OP_REMOVE_ADV_MONITOR command
can be used when the controller is not powered. This patch changes
the code in remove_adv_monitor() to use hci_cmd_sync_submit()
instead of hci_cmd_sync_queue().
Fixes: d883a4669a1de ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running")
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Manish Mandlik <mmandlik@google.com>
Cc: Archie Pusaka <apusaka@chromium.org>
Cc: Miao-chen Chou <mcchou@chromium.org>
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Qualcomm ROME controllers can be registered from the Bluetooth line
discipline and in this case the HCI UART serdev pointer is NULL.
Add the missing sanity check to prevent a NULL-pointer dereference when
setup() is called for a non-serdev controller.
Fixes: e9b3e5b8c657 ("Bluetooth: hci_qca: only assign wakeup with serial port support")
Cc: stable@vger.kernel.org # 6.2
Cc: Zhengping Jiang <jiangzp@google.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Qualcomm ROME controllers can be registered from the Bluetooth line
discipline and in this case the HCI UART serdev pointer is NULL.
Add the missing sanity check to prevent a NULL-pointer dereference when
wakeup() is called for a non-serdev controller during suspend.
Just return true for now to restore the original behaviour and address
the crash with pre-6.2 kernels, which do not have commit e9b3e5b8c657
("Bluetooth: hci_qca: only assign wakeup with serial port support") that
causes the crash to happen already at setup() time.
Fixes: c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
Cc: stable@vger.kernel.org # 5.13
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
hci_devcd_append() would free the skb on error so the caller don't
have to free it again otherwise it would cause the double free of skb.
Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support")
Reported-by : Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
These commands don't require the adapter to be up and running so don't
use hci_cmd_sync_queue which would check that flag, instead use
hci_cmd_sync_submit which would ensure mgmt_class_complete is set
properly regardless if any command was actually run or not.
Link: https://github.com/bluez/bluez/issues/809
Fixes: d883a4669a1d ("Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Qualcomm Bluetooth controllers may not have been provisioned with a
valid device address and instead end up using the default address
00:00:00:00:5a:ad.
This was previously believed to be due to lack of persistent storage for
the address but it may also be due to integrators opting to not use the
on-chip OTP memory and instead store the address elsewhere (e.g. in
storage managed by secure world firmware).
According to Qualcomm, at least WCN6750, WCN6855 and WCN7850 have
on-chip OTP storage for the address.
As the device type alone cannot be used to determine when the address is
valid, instead read back the address during setup() and only set the
HCI_QUIRK_USE_BDADDR_PROPERTY flag when needed.
This specifically makes sure that controllers that have been provisioned
with an address do not start as unconfigured.
Reported-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com>
Link: https://lore.kernel.org/r/124a7d54-5a18-4be7-9a76-a12017f6cce5@quicinc.com/
Fixes: 5971752de44c ("Bluetooth: hci_qca: Set HCI_QUIRK_USE_BDADDR_PROPERTY for wcn3990")
Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts")
Fixes: 6945795bc81a ("Bluetooth: fix use-bdaddr-property quirk")
Cc: stable@vger.kernel.org # 6.5
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reported-by: Janaki Ramaiah Thota <quic_janathot@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The code shall always check if HCI_QUIRK_BROKEN_READ_ENC_KEY_SIZE has
been set before attempting to use HCI_OP_READ_ENC_KEY_SIZE.
Fixes: c569242cd492 ("Bluetooth: hci_event: set the conn encrypted before conn establishes")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
btusb_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific
command to trigger firmware coredump, but the command does not
have any event as its sync response, so it is not suitable to use
__hci_cmd_sync(), fixed by using __hci_cmd_send().
Fixes: 20981ce2d5a5 ("Bluetooth: btusb: Add WCN6855 devcoredump support")
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Add the support ID(0x0bda, 0x4853) to usb_device_id table for
Realtek RTL8852BE.
Without this change the device utilizes an obsolete version of
the firmware that is encoded in it rather than the updated Realtek
firmware and config files from the firmware directory. The latter
files implement many new features.
The device table is as follows:
T: Bus=03 Lev=01 Prnt=01 Port=09 Cnt=03 Dev#= 4 Spd=12 MxCh= 0
D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0bda ProdID=4853 Rev= 0.00
S: Manufacturer=Realtek
S: Product=Bluetooth Radio
S: SerialNumber=00e04c000001
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms
I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms
I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms
I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms
I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms
I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms
E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Cc: stable@vger.kernel.org
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
The extended advertising reports do report the PHYs so this store then
in hci_conn so it can be later used in hci_le_ext_create_conn_sync to
narrow the PHYs to be scanned since the controller will also perform a
scan having a smaller set of PHYs shall reduce the time it takes to
find and connect peers.
Fixes: 288c90224eec ("Bluetooth: Enable all supported LE PHY by default")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
After an innocuous optimization change in LLVM main (19.0.0), x86_64
allmodconfig (which enables CONFIG_KCSAN / -fsanitize=thread) fails to
build due to the checks in check_copy_size():
In file included from net/bluetooth/sco.c:27:
In file included from include/linux/module.h:13:
In file included from include/linux/stat.h:19:
In file included from include/linux/time.h:60:
In file included from include/linux/time32.h:13:
In file included from include/linux/timex.h:67:
In file included from arch/x86/include/asm/timex.h:6:
In file included from arch/x86/include/asm/tsc.h:10:
In file included from arch/x86/include/asm/msr.h:15:
In file included from include/linux/percpu.h:7:
In file included from include/linux/smp.h:118:
include/linux/thread_info.h:244:4: error: call to '__bad_copy_from'
declared with 'error' attribute: copy source size is too small
244 | __bad_copy_from();
| ^
The same exact error occurs in l2cap_sock.c. The copy_to_user()
statements that are failing come from l2cap_sock_getsockopt_old() and
sco_sock_getsockopt_old(). This does not occur with GCC with or without
KCSAN or Clang without KCSAN enabled.
len is defined as an 'int' because it is assigned from
'__user int *optlen'. However, it is clamped against the result of
sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit
platforms). This is done with min_t() because min() requires compatible
types, which results in both len and the result of sizeof() being casted
to 'unsigned int', meaning len changes signs and the result of sizeof()
is truncated. From there, len is passed to copy_to_user(), which has a
third parameter type of 'unsigned long', so it is widened and changes
signs again. This excessive casting in combination with the KCSAN
instrumentation causes LLVM to fail to eliminate the __bad_copy_from()
call, failing the build.
The official recommendation from LLVM developers is to consistently use
long types for all size variables to avoid the unnecessary casting in
the first place. Change the type of len to size_t in both
l2cap_sock_getsockopt_old() and sco_sock_getsockopt_old(). This clears
up the error while allowing min_t() to be replaced with min(), resulting
in simpler code with no casts and fewer implicit conversions. While len
is a different type than optlen now, it should result in no functional
change because the result of sizeof() will clamp all values of optlen in
the same manner as before.
Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2007
Link: https://github.com/llvm/llvm-project/issues/85647
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Geliang Tang says:
====================
This patchset uses more network helpers in test_sock_addr.c, but
first of all, patch 2 is needed to make network_helpers.c independent
of test_progs.c. Then network_helpers.h can be included into
test_sock_addr.c without compile errors.
Patch 1 and patch 2 address Martin's comments for the previous series
too.
v2:
- Only a few minor cleanups to patch 5.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
This patch uses public helper make_sockaddr() exported in network_helpers.h
instead of the local defined function mk_sockaddr() in test_sock_addr.c.
This can avoid duplicate code.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/1473e189d6ca1a3925de4c5354d191a14eca0f3f.1713868264.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
This patch uses public network helper connect_to_addr() exported in
network_helpers.h instead of the local defined function connect_to_server()
in test_sock_addr.c. This can avoid duplicate code.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f263797712d93fdfaf2943585c5dfae56714a00b.1713868264.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Include network_helpers.h in test_sock_addr.c, use the newly added public
helper start_server_addr() instead of the local defined function
start_server(). This can avoid duplicate code.
In order to use functions defined in network_helpers.c in test_sock_addr.c,
Makefile needs to be updated and <Linux/err.h> needs to be included in
network_helpers.h to avoid compilation errors.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/3101f57bde5502383eb41723c8956cc26be06893.1713868264.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
ASSERT helpers defined in test_progs.h shouldn't be used in public
functions like open_netns() and close_netns(). Since they depend on
test__fail() which defined in test_progs.c. Public functions may be
used not only in test_progs.c, but in other tests like test_sock_addr.c
in the next commit.
This patch uses log_err() to replace ASSERT helpers in open_netns()
and close_netns() in network_helpers.c to decouple dependencies, then
uses ASSERT_OK_PTR() to check the return values of all open_netns().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/d1dad22b2ff4909af3f8bfd0667d046e235303cb.1713868264.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
As Martin mentioned in review comment, there is an existing bug that
orig_netns_fd will be leaked in the later "goto fail;" case after
open("/proc/self/ns/net") in open_netns() in network_helpers.c. This
patch adds "close(token->orig_netns_fd);" before "free(token);" to
fix it.
Fixes: a30338840fa5 ("selftests/bpf: Move open_netns() and close_netns() into network_helpers.c")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/a104040b47c3c34c67f3f125cdfdde244a870d3c.1713868264.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add myself as a maintainer/supporter for libeth and libie. Let they have
separate entries from the Intel ethernet code as it's a bit different
case and all patches will go through me rather than Tony.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
no custom recycling logics, it can easily be switched to using Page
Pool / libeth API instead.
This allows to removing the whole dancing around headroom, HW buffer
size, and page order. All DMA-for-device is now done in the PP core,
for-CPU -- in the libeth helper.
Use skb_mark_for_recycle() to bring back the recycling and restore the
performance. Speaking of performance: on par with the baseline and
faster with the PP optimization series applied. But the memory usage for
1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page
every second descriptor.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Before replacing the Rx buffer management with libie, clean up
&iavf_ring a bit.
There are several fields not used anywhere in the code -- simply remove
them. Move ::tail up to remove a hole. Replace ::arm_wb boolean with
1-bit flag in ::flags to free 1 more byte. Finally, move ::prev_pkt_ctr
out of &iavf_tx_queue_stats -- it doesn't belong there (used for Tx
stall detection). Place it next to the stats on the ring itself to fill
the 4-byte slot.
The result: no holes and all the hot fields fit into the first 64-byte
cacheline.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add a couple intuitive helpers to hide Rx buffer implementation details
in the library and not multiplicate it between drivers. The settings are
sorta optimized for 100G+ NICs, but nothing really HW-specific here.
Use the new page_pool_dev_alloc() to dynamically switch between
split-page and full-page modes depending on MTU, page size, required
headroom etc. For example, on x86_64 with the default driver settings
each page is shared between 2 buffers. Turning on XDP (not in this
series) -> increasing headroom requirement pushes truesize out of 2048
boundary, leading to that each buffer starts getting a full page.
The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to
avoid compound overhead. For the above architecture, this means maximum
linear frame size of 3712 w/o XDP.
Not that &libeth_buf_queue is not a complete queue/ring structure for
now, rather a shim, but eventually the libeth-enabled drivers will move
to it, with iavf being the first one.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Each driver is responsible for syncing buffers written by HW for CPU
before accessing them. Almost each PP-enabled driver uses the same
pattern, which could be shorthanded into a static inline to make driver
code a little bit more compact.
Introduce a simple helper which performs DMA synchronization for the
size passed from the driver. It can be used even when the pool doesn't
manage DMA-syncs-for-device, just make sure the page has a correct DMA
address set via page_pool_set_dma_addr().
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There are several functions taking pointers to data they don't modify.
This includes statistics fetching, page and page_pool parameters, etc.
Constify the pointers, so that call sites will be able to pass const
pointers as well.
No functional changes, no visible changes in functions sizes.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add NUMA-aware counterparts for kvmalloc_array() and kvcalloc() to be
able to flexibly allocate arrays for a particular node.
Rewrite kvmalloc_array() to kvmalloc_array_node(NUMA_NO_NODE) call.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
As an intermediate step, remove all page splitting/recycling code. Just
always allocate a new page and don't touch its refcount, so that it gets
freed by the core stack later.
Same for the "in-place" recycling, i.e. when an unused buffer gets
assigned to a first needs-refilling descriptor. In some cases, this
was leading to moving up to 63 &iavf_rx_buf structures around the ring
on a per-field basis -- not something wanted on hotpath.
The change allows to greatly simplify certain parts of the code:
Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-744 (-744)
Although the array of &iavf_rx_buf is barely used now and could be
replaced with just page pointer array, don't touch it now to not
complicate replacing it with libie Rx buffer struct later on.
No surprise perf loses up to 30% here, but that regression will
go away once PP lands.
Note that iavf_rx_pg_*() definitions are left to reduce diffstat.
They will be removed with the conversion to Page Pool.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Ever since build_skb() became stable, the old way with allocating an skb
for storing the headers separately, which will be then copied manually,
was slower, less flexible, and thus obsolete.
* It had higher pressure on MM since it actually allocates new pages,
which then get split and refcount-biased (NAPI page cache);
* It implies memcpy() of packet headers (40+ bytes per each frame);
* the actual header length was calculated via eth_get_headlen(), which
invokes Flow Dissector and thus wastes a bunch of CPU cycles;
* XDP makes it even more weird since it requires headroom for long and
also tailroom for some time (since mbuf landed). Take a look at the
ice driver, which is built around work-arounds to make XDP work with
it.
Even on some quite low-end hardware (not a common case for 100G NICs) it
was performing worse.
The only advantage "legacy-rx" had is that it didn't require any
reserved headroom and tailroom. But iavf didn't use this, as it always
splits pages into two halves of 2k, while that save would only be useful
when striding. And again, XDP effectively removes that sole pro.
There's a train of features to land in IAVF soon: Page Pool, XDP, XSk,
multi-buffer etc. Each new would require adding more and more Danse
Macabre for absolutely no reason, besides making hotpath less and less
effective.
Remove the "feature" with all the related code. This includes at least
one very hot branch (typically hit on each new frame), which was either
always-true or always-false at least for a complete NAPI bulk of 64
frames, the whole private flags cruft, and so on. Some stats:
Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-721 (-721)
RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Not a secret there's a ton of code duplication between two and more Intel
ethernet modules.
Before introducing new changes, which would need to be copied over again,
start decoupling the already existing duplicate functionality into a new
module, which will be shared between several Intel Ethernet drivers.
Add the lookup table which converts 8/10-bit hardware packet type into
a parsed bitfield structure for easy checking packet format parameters,
such as payload level, IP version, etc. This is currently used by i40e,
ice and iavf and it's all the same in all three drivers.
The only difference introduced in this implementation is that instead of
defining a 256 (or 1024 in case of ice) element array, add unlikely()
condition to limit the input to 154 (current maximum non-reserved packet
type). There's no reason to waste 600 (or even 3600) bytes only to not
hurt very unlikely exception packets.
The hash computation function now takes payload level directly as a
pkt_hash_type. There's a couple cases when non-IP ptypes are marked as
L3 payload and in the previous versions their hash level would be 2, not
3. But skb_set_hash() only sees difference between L4 and non-L4, thus
this won't change anything at all.
The module is behind the hidden Kconfig symbol, which the drivers will
select when needed. The exports are behind 'LIBIE' namespace to limit
the scope of the functions.
Not that non-HW-specific symbols will live in yet another module,
libeth. This is done to easily distinguish pretty generic code ready
for reusing by any other vendor and/or for moving the layer up from
the code useful in Intel's 1-100G drivers only.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Kumar Kartikeya Dwivedi says:
====================
Introduce bpf_preempt_{disable,enable}
This set introduces two kfuncs, bpf_preempt_disable and
bpf_preempt_enable, which are wrappers around preempt_disable and
preempt_enable in the kernel. These functions allow a BPF program to
have code sections where preemption is disabled. There are multiple use
cases that are served by such a feature, a few are listed below:
1. Writing safe per-CPU alogrithms/data structures that work correctly
across different contexts.
2. Writing safe per-CPU allocators similar to bpf_memalloc on top of
array/arena memory blobs.
3. Writing locking algorithms in BPF programs natively.
Note that local_irq_disable/enable equivalent is also needed for proper
IRQ context protection, but that is a more involved change and will be
sent later.
While bpf_preempt_{disable,enable} is not sufficient for all of these
usage scenarios on its own, it is still necessary.
The same effect as these kfuncs can in some sense be already achieved
using the bpf_spin_lock or rcu_read_lock APIs, therefore from the
standpoint of kernel functionality exposure in the verifier, this is
well understood territory.
Note that these helpers do allow calling kernel helpers and kfuncs from
within the non-preemptible region (unless sleepable). Otherwise, any
locks built using the preemption helpers will be as limited as
existing bpf_spin_lock.
Nesting is allowed by keeping a counter for tracking remaining enables
required to be performed. Similar approach can be applied to
rcu_read_locks in a follow up.
Changelog
=========
v1: https://lore.kernel.org/bpf/20240423061922.2295517-1-memxor@gmail.com
* Move kfunc BTF ID declerations above css task kfunc for
!CONFIG_CGROUPS config (Alexei)
* Add test case for global function call in non-preemptible region
(Jiri)
====================
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240424031315.2757363-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add tests for nested cases, nested count preservation upon different
subprog calls that disable/enable preemption, and test sleepable helper
call in non-preemptible regions.
182/1 preempt_lock/preempt_lock_missing_1:OK
182/2 preempt_lock/preempt_lock_missing_2:OK
182/3 preempt_lock/preempt_lock_missing_3:OK
182/4 preempt_lock/preempt_lock_missing_3_minus_2:OK
182/5 preempt_lock/preempt_lock_missing_1_subprog:OK
182/6 preempt_lock/preempt_lock_missing_2_subprog:OK
182/7 preempt_lock/preempt_lock_missing_2_minus_1_subprog:OK
182/8 preempt_lock/preempt_balance:OK
182/9 preempt_lock/preempt_balance_subprog_test:OK
182/10 preempt_lock/preempt_global_subprog_test:OK
182/11 preempt_lock/preempt_sleepable_helper:OK
182 preempt_lock:OK
Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240424031315.2757363-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce two new BPF kfuncs, bpf_preempt_disable and
bpf_preempt_enable. These kfuncs allow disabling preemption in BPF
programs. Nesting is allowed, since the intended use cases includes
building native BPF spin locks without kernel helper involvement. Apart
from that, this can be used to per-CPU data structures for cases where
programs (or userspace) may preempt one or the other. Currently, while
per-CPU access is stable, whether it will be consistent is not
guaranteed, as only migration is disabled for BPF programs.
Global functions are disallowed from being called, but support for them
will be added as a follow up not just preempt kfuncs, but rcu_read_lock
kfuncs as well. Static subprog calls are permitted. Sleepable helpers
and kfuncs are disallowed in non-preemptible regions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240424031315.2757363-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix information leak by the buffer returned from LOGICAL_INO ioctl
- fix flipped condition in scrub when tracking sectors in zoned mode
- fix calculation when dropping extent range
- reinstate fallback to write uncompressed data in case of fragmented
space that could not store the entire compressed chunk
- minor fix to message formatting style to make it conforming to the
commonly used style
* tag 'for-6.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()
btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
btrfs: fallback if compressed IO fails for ENOSPC
btrfs: scrub: run relocation repair when/only needed
btrfs: remove colon from messages with state
|
|
__bpf_prog_enter_sleepable_recur does recursion check which is not applicable
to wq callback. The callback function is part of bpf program and bpf prog might
be running on the same cpu. So recursion check would incorrectly prevent
callback from running. The code can call __bpf_prog_enter_sleepable(), but
run_ctx would be fake, hence use explicit rcu_read_lock_trace();
migrate_disable(); to address this problem. Another reason to open code is
__bpf_prog_enter* are not available in !JIT configs.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404241719.IIGdpAku-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404241811.FFV4Bku3-lkp@intel.com/
Fixes: eb48f6cd41a0 ("bpf: wq: add bpf_wq_init")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|