summaryrefslogtreecommitdiff
path: root/Documentation/networking
AgeCommit message (Collapse)Author
2023-10-19docs: networking: document multi-RSS contextJakub Kicinski
There seems to be no docs for the concept of multiple RSS contexts and how to configure it. I had to explain it three times recently, the last one being the charm, document it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20231018010758.2382742-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-18Documentation: devlink: add a note about RTNL lock into locking sectionJiri Pirko
Add a note describing the locking order of taking RTNL lock with devlink instance lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18Documentation: devlink: add nested instance sectionJiri Pirko
Add a part talking about nested devlink instances describing the helpers and locking ordering. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-17Merge tag 'mlx5-updates-2023-10-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-10-10 1) Adham Faris, Increase max supported channels number to 256 2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes 3) Shay Drory, Replace global mlx5_intf_lock with HCA devcom component lock 4) Wei Zhang, Optimize SF creation flow During SF creation, HCA state gets changed from INVALID to IN_USE step by step. Accordingly, FW sends vhca event to driver to inform about this state change asynchronously. Each vhca event is critical because all related SW/FW operations are triggered by it. Currently there is only a single mlx5 general event handler which not only handles vhca event but many other events. This incurs huge bottleneck because all events are forced to be handled in serial manner. Moreover, all SFs share same table_lock which inevitably impacts each other when they are created in parallel. This series will solve this issue by: 1. A dedicated vhca event handler is introduced to eliminate the mutual impact with other mlx5 events. 2. Max FW threads work queues are employed in the vhca event handler to fully utilize FW capability. 3. Redesign SF active work logic to completely remove table_lock. With above optimization, SF creation time is reduced by 25%, i.e. from 80s to 60s when creating 100 SFs. Patches summary: Patch 1 - implement dedicated vhca event handler with max FW cmd threads of work queues. Patch 2 - remove table_lock by redesigning SF active work logic. * tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Allow IPsec soft/hard limits in bytes net/mlx5e: Increase max supported channels number to 256 net/mlx5e: Preparations for supporting larger number of channels net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh() net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code net/mlx5: fix config name in Kconfig parameter documentation net/mlx5: Remove unused declaration net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom net/mlx5: Avoid false positive lockdep warning by adding lock_class_key net/mlx5: Redesign SF active work to remove table_lock net/mlx5: Parallelize vhca event handling ==================== Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: phylink: remove .validate() methodRussell King (Oracle)
The MAC .validate() method is no longer used, so remove it from the phylink_mac_ops structure, and remove the callsite in phylink_validate_mac_and_pcs(). Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPkF-009wij-QM@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17net: phylink: provide mac_get_caps() methodRussell King (Oracle)
Provide a new method, mac_get_caps() to get the MAC capabilities for the specified interface mode. This is for MACs which have special requirements, such as not supporting half-duplex in certain interface modes, and will replace the validate() method. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1qsPk5-009wiX-G5@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-16 We've added 90 non-merge commits during the last 25 day(s) which contain a total of 120 files changed, 3519 insertions(+), 895 deletions(-). The main changes are: 1) Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs, from Jiri Olsa. 2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is for systemd to reimplement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services, from Daan De Meyer. 3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich. 4) Improve BPF verifier log output for scalar registers to better disambiguate their internal state wrt defaults vs min/max values matching, from Andrii Nakryiko. 5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving the source IP address with a new BPF_FIB_LOOKUP_SRC flag, from Martynas Pumputis. 6) Add support for open-coded task_vma iterator to help with symbolization for BPF-collected user stacks, from Dave Marchevsky. 7) Add libbpf getters for accessing individual BPF ring buffers which is useful for polling them individually, for example, from Martin Kelly. 8) Extend AF_XDP selftests to validate the SHARED_UMEM feature, from Tushar Vyavahare. 9) Improve BPF selftests cross-building support for riscv arch, from Björn Töpel. 10) Add the ability to pin a BPF timer to the same calling CPU, from David Vernet. 11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments, from Alexandre Ghiti. 12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen. 13) Fix bpftool's skeleton code generation to guarantee that ELF data is 8 byte aligned, from Ian Rogers. 14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4 security mitigations in BPF verifier, from Yafang Shao. 15) Annotate struct bpf_stack_map with __counted_by attribute to prepare BPF side for upcoming __counted_by compiler support, from Kees Cook. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits) bpf: Ensure proper register state printing for cond jumps bpf: Disambiguate SCALAR register state output in verifier logs selftests/bpf: Make align selftests more robust selftests/bpf: Improve missed_kprobe_recursion test robustness selftests/bpf: Improve percpu_alloc test robustness selftests/bpf: Add tests for open-coded task_vma iter bpf: Introduce task_vma open-coded iterator kfuncs selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c bpf: Don't explicitly emit BTF for struct btf_iter_num bpf: Change syscall_nr type to int in struct syscall_tp_t net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set bpf: Avoid unnecessary audit log for CPU security mitigations selftests/bpf: Add tests for cgroup unix socket address hooks selftests/bpf: Make sure mount directory exists documentation/bpf: Document cgroup unix socket address hooks bpftool: Add support for cgroup unix socket address hooks libbpf: Add support for cgroup unix socket address hooks bpf: Implement cgroup sockaddr hooks for unix sockets bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf bpf: Propagate modified uaddrlen from cgroup sockaddr programs ... ==================== Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16tcp: Set pingpong threshold via sysctlHaiyang Zhang
TCP pingpong threshold is 1 by default. But some applications, like SQL DB may prefer a higher pingpong threshold to activate delayed acks in quick ack mode for better performance. The pingpong threshold and related code were changed to 3 in the year 2019 in: commit 4a41f453bedf ("tcp: change pingpong threshold to 3") And reverted to 1 in the year 2022 in: commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"") There is no single value that fits all applications. Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for optimal performance based on the application needs. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/1697056244-21888-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-15docs: net: description of MSG_ZEROCOPY for AF_VSOCKArseniy Krasnov
This adds description of MSG_ZEROCOPY flag support for AF_VSOCK type of socket. Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-14net/mlx5: fix config name in Kconfig parameter documentationLukas Bulwahn
Commit a12ba19269d7 ("net/mlx5: Update Kconfig parameter documentation") adds documentation on Kconfig options for the mlx5 driver. It refers to the config MLX5_EN_MACSEC for MACSec offloading, but the config is actually called MLX5_MACSEC. Fix the reference to the right config name in the documentation. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-13docs: fix info about representor identificationMateusz Polchlopek
Update the "How are representors identified?" documentation subchapter. For newer kernels driver should use SET_NETDEV_DEVLINK_PORT instead of ndo_get_devlink_port() callback. Fixes: 7712b3e966ea ("Merge branch 'net-fix-netdev-to-devlink_port-linkage-and-expose-to-user'") Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20231012123144.15768-1-mateusz.polchlopek@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13Documentation: netconsole: add support for cmdline targetsBreno Leitao
With the previous patches, there is no more limitation at modifying the targets created at boot time (or module load time). Document the way on how to create the configfs directories to be able to modify these netconsole targets. The design discussion about this topic could be found at: https://lore.kernel.org/all/ZRWRal5bW93px4km@gmail.com/ Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231012111401.333798-5-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10appletalk: remove ipddp driverArnd Bergmann
After the cops driver is removed, ipddp is now the only CONFIG_DEV_APPLETALK but as far as I can tell, this also has no users and can be removed, making appletalk support purely based on ethertalk, using ethernet hardware. Link: https://lore.kernel.org/netdev/e490dd0c-a65d-4acf-89c6-c06cb48ec880@app.fastmail.com/ Link: https://lore.kernel.org/netdev/9cac4fbd-9557-b0b8-54fa-93f0290a6fb8@schmorgal.com/ Cc: Doug Brown <doug@schmorgal.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20231009141139.1766345-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-04net: appletalk: remove cops supportGreg Kroah-Hartman
The COPS Appletalk support is very old, never said to actually work properly, and the firmware code for the devices are under a very suspect license. Remove it all to clear up the license issue, if it is still needed and actually used by anyone, we can add it back later once the license is cleared up. Reported-by: Prarit Bhargava <prarit@redhat.com> Cc: jschlst@samba.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230927090029.44704-2-gregkh@linuxfoundation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-03net: add sysctl to disable rfc4862 5.5.3e lifetime handlingPatrick Rohr
This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid lifetime derivation mechanism. RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router Advertisement PIO shall be ignored if it less than 2 hours and to reset the lifetime of the corresponding address to 2 hours. An in-progress 6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently looking to remove this mechanism. While this draft has not been moving particularly quickly for other reasons, there is widespread consensus on section 4.2 which updates RFC4862 section 5.5.3e. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Jen Linkova <furry@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230925214711.959704-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-28pktgen: Introducing 'SHARED' flag for testing with non-shared skbLiang Chen
Currently, skbs generated by pktgen always have their reference count incremented before transmission, causing their reference count to be always greater than 1, leading to two issues: 1. Only the code paths for shared skbs can be tested. 2. In certain situations, skbs can only be released by pktgen. To enhance testing comprehensiveness, we are introducing the "SHARED" flag to indicate whether an SKB is shared. This flag is enabled by default, aligning with the current behavior. However, disabling this flag allows skbs with a reference count of 1 to be transmitted. So we can test non-shared skbs and code paths where skbs are released within the stack. Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20230920125658.46978-2-liangchen.linux@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21bpf, docs: Add loongarch64 as arch supporting BPF JITTiezhu Yang
As BPF JIT support for loongarch64 was added about one year ago with commit 5dc615520c4d ("LoongArch: Add BPF JIT support"), it is appropriate to add loongarch64 as arch supporting BPF JIT in bpf and sysctl docs as well. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1695111937-19697-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-18Documentation: netdev: fix dead link in ax25.rstPeter Lafreniere
http://linux-ax25.org has been down for nearly a year. Its official replacement is https://linux-ax25.in-berlin.de. Update the documentation to point there instead. And acknowledge that while the linux-hams list isn't entirely dead, it isn't what most would call 'active'. Remove that word. Link: https://marc.info/?m=166792551600315 Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== The following pull-request contains BPF updates for your *net-next* tree. We've added 73 non-merge commits during the last 9 day(s) which contain a total of 79 files changed, 5275 insertions(+), 600 deletions(-). The main changes are: 1) Basic BTF validation in libbpf, from Andrii Nakryiko. 2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi. 3) next_thread cleanups, from Oleg Nesterov. 4) Add mcpu=v4 support to arm32, from Puranjay Mohan. 5) Add support for __percpu pointers in bpf progs, from Yonghong Song. 6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang. 7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot! Also thanks to reporters, reviewers and testers of commits in this pull-request: Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman", Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle), Song Liu, Stanislav Fomichev, Yonghong Song ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15bpf: expose information about supported xdp metadata kfuncStanislav Fomichev
Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-13idpf: add SRIOV support and other ndo_opsJoshua Hay
Add support for SRIOV: send the requested number of VFs to the device Control Plane, via the virtchnl message and then enable the VFs using 'pci_enable_sriov'. Add other ndo ops supported by the driver such as features_check, set_rx_mode, validate_addr, set_mac_address, change_mtu, get_stats64, set_features, and tx_timeout. Initialize the statistics task which requests the queue related statistics to the CP. Add loopback and promiscuous mode support and the respective virtchnl messages. Finally, add documentation and build support for the driver. Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Co-developed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Alan Brady <alan.brady@intel.com> Co-developed-by: Madhu Chittim <madhu.chittim@intel.com> Signed-off-by: Madhu Chittim <madhu.chittim@intel.com> Co-developed-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-09-12tcp: defer regular ACK while processing socket backlogEric Dumazet
This idea came after a particular workload requested the quickack attribute set on routes, and a performance drop was noticed for large bulk transfers. For high throughput flows, it is best to use one cpu running the user thread issuing socket system calls, and a separate cpu to process incoming packets from BH context. (With TSO/GRO, bottleneck is usually the 'user' cpu) Problem is the user thread can spend a lot of time while holding the socket lock, forcing BH handler to queue most of incoming packets in the socket backlog. Whenever the user thread releases the socket lock, it must first process all accumulated packets in the backlog, potentially adding latency spikes. Due to flood mitigation, having too many packets in the backlog increases chance of unexpected drops. Backlog processing unfortunately shifts a fair amount of cpu cycles from the BH cpu to the 'user' cpu, thus reducing max throughput. This patch takes advantage of the backlog processing, and the fact that ACK are mostly cumulative. The idea is to detect we are in the backlog processing and defer all eligible ACK into a single one, sent from tcp_release_cb(). This saves cpu cycles on both sides, and network resources. Performance of a single TCP flow on a 200Gbit NIC: - Throughput is increased by 20% (100Gbit -> 120Gbit). - Number of generated ACK per second shrinks from 240,000 to 40,000. - Number of backlog drops per second shrinks from 230 to 0. Benchmark context: - Regular netperf TCP_STREAM (no zerocopy) - Intel(R) Xeon(R) Platinum 8481C (Saphire Rapids) - MAX_SKB_FRAGS = 17 (~60KB per GRO packet) This feature is guarded by a new sysctl, and enabled by default: /proc/sys/net/ipv4/tcp_backlog_ack_defer Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-11Documentation: Drop or replace remaining mentions of IA64Ard Biesheuvel
Drop or update mentions of IA64, as appropriate. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-08-30Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - VFIO direct character device (cdev) interface support. This extracts the vfio device fd from the container and group model, and is intended to be the native uAPI for use with IOMMUFD (Yi Liu) - Enhancements to the PCI hot reset interface in support of cdev usage (Yi Liu) - Fix a potential race between registering and unregistering vfio files in the kvm-vfio interface and extend use of a lock to avoid extra drop and acquires (Dmitry Torokhov) - A new vfio-pci variant driver for the AMD/Pensando Distributed Services Card (PDS) Ethernet device, supporting live migration (Brett Creeley) - Cleanups to remove redundant owner setup in cdx and fsl bus drivers, and simplify driver init/exit in fsl code (Li Zetao) - Fix uninitialized hole in data structure and pad capability structures for alignment (Stefan Hajnoczi) * tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits) vfio/pds: Send type for SUSPEND_STATUS command vfio/pds: fix return value in pds_vfio_get_lm_file() pds_core: Fix function header descriptions vfio: align capability structures vfio/type1: fix cap_migration information leak vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver vfio/pds: Add Kconfig and documentation vfio/pds: Add support for firmware recovery vfio/pds: Add support for dirty page tracking vfio/pds: Add VFIO live migration support vfio/pds: register with the pds_core PF pds_core: Require callers of register/unregister to pass PF drvdata vfio/pds: Initial support for pds VFIO driver vfio: Commonize combine_ranges for use in other VFIO drivers kvm/vfio: avoid bouncing the mutex when adding and deleting groups kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add() docs: vfio: Add vfio device cdev description vfio: Compile vfio_group infrastructure optionally vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev() ...
2023-08-30Merge tag 'docs-6.6' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ...
2023-08-27net/mlx5: Implement devlink port function cmds to control ipsec_packetDima Chumak
Implement devlink port function commands to enable / disable IPsec packet offloads. This is used to control the IPsec capability of the device. When ipsec_offload is enabled for a VF, it prevents adding IPsec packet offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec packet offloads on the PF, it's not allowed to enable ipsec_packet on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27net/mlx5: Implement devlink port function cmds to control ipsec_cryptoDima Chumak
Implement devlink port function commands to enable / disable IPsec crypto offloads. This is used to control the IPsec capability of the device. When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec crypto offloads on the PF, it's not allowed to enable ipsec_crypto on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27devlink: Expose port function commands to control IPsec packet offloadsDima Chumak
Expose port function commands to enable / disable IPsec packet offloads, this is used to control the port IPsec capabilities. When IPsec packet is disabled for a function of the port (default), function cannot offload IPsec packet operations (encapsulation and XFRM policy offload). When enabled, IPsec packet operations can be offloaded by the function of the port, which includes crypto operation (Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy offload. Example of a PCI VF port which supports IPsec packet offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-27devlink: Expose port function commands to control IPsec crypto offloadsDima Chumak
Expose port function commands to enable / disable IPsec crypto offloads, this is used to control the port IPsec capabilities. When IPsec crypto is disabled for a function of the port (default), function cannot offload any IPsec crypto operations (Encrypt/Decrypt and XFRM state offloading). When enabled, IPsec crypto operations can be offloaded by the function of the port. Example of a PCI VF port which supports IPsec crypto offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable $ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-22mptcp: add a new sysctl schedulerGeliang Tang
This patch adds a new sysctl, named scheduler, to support for selection of different schedulers. Export mptcp_get_scheduler helper to get this sysctl. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-4-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-21net/mlx5: Update dead links in Kconfig documentationRahul Rameshbabu
Point to NVIDIA documentation for device specific information now that the Mellanox documentation site is deprecated. Refer to kernel documentation sources for generic information not specific to mlx5 devices. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-21net/mlx5e: aRFS, Introduce ethtool statsAdham Faris
Improve aRFS observability by adding new set of counters. Each Rx ring will have this set of counters listed below. These counters are exposed through ethtool -S. 1) arfs_add: number of times a new rule has been created. 2) arfs_request_in: number of times a rule was requested to move from its current Rx ring to a new Rx ring (incremented on the destination Rx ring). 3) arfs_request_out: number of times a rule was requested to move out from its current Rx ring (incremented on source/current Rx ring). 4) arfs_expired: number of times a rule has been expired by the kernel and removed from HW. 5) arfs_err: number of times a rule creation or modification has failed. This patch removes rx[i]_xsk_arfs_err counter and its documentation in mlx5/counters.rst since aRFS activity does not occur in XSK RQ's. Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com>
2023-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-18Documentation: Fix typosBjorn Helgaas
Fix typos in Documentation. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20230814212822.193684-4-helgaas@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-08-16vfio/pds: Add Kconfig and documentationBrett Creeley
Add Kconfig entries and pds-vfio-pci.rst. Also, add an entry in the MAINTAINERS file for this new driver. It's not clear where documentation for vendor specific VFIO drivers should live, so just re-use the current amd ethernet location. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230807205755.29579-9-brett.creeley@amd.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-08-16netfilter: set default timeout to 3 secs for sctp shutdown send and recv stateXin Long
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300 msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state. As Paolo Valerio noticed, this might cause unwanted expiration of the ct entry. In my test, with 1s tc netem delay set on the NAT path, after the SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is sent back from the peer, the sctp ct entry has expired and been deleted, and then the SHUTDOWN_ACK has to be dropped. Also, it is confusing these two sysctl options always show 0 due to all timeout values using sec as unit: net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0 net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0 This patch fixes it by also using 3 secs for sctp shutdown send and recv state in sctp conntrack, which is also RTO.initial value in SCTP protocol. Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV was probably used for a rare scenario where SHUTDOWN is sent on 1st path but SHUTDOWN_ACK is replied on 2nd path, then a new connection started immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV to CLOSE when receiving INIT in the ORIGINAL direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Reported-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-08-14net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQRahul Rameshbabu
A new check for the tx devlink health reporter is introduced for determining when the PTP port timestamping SQ is considered unhealthy. If there are enough CQEs considered never to be delivered, the space that can be utilized on the SQ decreases significantly, impacting performance and usability of the SQ. The health reporter is triggered when the number of likely never delivered port timestamping CQEs that utilize the space of the PTP SQ is greater than 93.75% of the total capacity of the SQ. A devlink health reporter recover method is also provided for this specific TX error context that restarts the PTP SQ. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEsRahul Rameshbabu
Use a map structure for associating CQEs containing port timestamping information with the appropriate skb. Track order of WQEs submitted using a FIFO. Check if the corresponding port timestamping CQEs from the lookup values in the FIFO are considered dropped due to time elapsed. Return the lookup value to a freelist after consuming the skb. Reuse the freed lookup in future WQE submission iterations. The map structure uses an integer identifier for the key and returns an skb corresponding to that identifier. Embed the integer identifier in the WQE submitted to the WQ for the transmit path when the SQ is a PTP (port timestamping) SQ. The embedded identifier can then be queried using a field in the CQE of the corresponding port timestamping CQ. In the port timestamping napi_poll context, the identifier is queried from the CQE polled from CQ and used to lookup the corresponding skb from the WQE submit path. The skb reference is removed from map and then embedded with the port HW timestamp information from the CQE and eventually consumed. The metadata freelist FIFO is an array containing integer identifiers that can be pushed and popped in the FIFO. The purpose of this structure is bookkeeping what identifier values can safely be used in a subsequent WQE submission and should not contain identifiers that have still not been reaped by processing a corresponding CQE completion on the port timestamping CQ. The ts_cqe_pending_list structure is a combination of an array and linked list. The array is pre-populated with the nodes that will be added and removed from the head of the linked list. Each node contains the unique identifier value associated with the values submitted in the WQEs and retrieved in the port timestamping CQEs. When a WQE is submitted, the node in the array corresponding to the identifier popped from the metadata freelist is added to the end of the CQE pending list and is marked as "in-use". The node is removed from the linked list under two conditions. The first condition is that the corresponding port timestamping CQE is polled in the PTP napi_poll context. The second condition is that more than a second has elapsed since the DMA timestamp value corresponding to the WQE submission. When the first condition occurs, the "in-use" bit in the linked list node is cleared, and the resources corresponding to the WQE submission are then released. The second condition, however, indicates that the port timestamping CQE will likely never be delivered. It's not impossible for the device to post a CQE after an infinite amount of time though highly improbable. In order to be resilient to this improbable case, resources related to the corresponding WQE submission are still kept, the identifier value is not returned to the freelist, and the "in-use" bit is cleared on the node to indicate that it's no longer part of the linked list of "likely to be delivered" port timestamping CQE identifiers. A count for the number of port timestamping CQEs considered highly likely to never be delivered by the device is maintained. This count gets decremented in the unlikely event a port timestamping CQE considered unlikely to ever be delivered is polled in the PTP napi_poll context. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14net/mlx5: Consolidate devlink documentation in devlink/mlx5.rstRahul Rameshbabu
De-duplicate documentation by removing mellanox/mlx5/devlink.rst. Instead, only use the generic devlink documentation directory to document mlx5 devlink parameters. Avoid providing general devlink tool usage information in mlx5-specific documentation. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14net: phy: Introduce PSGMII PHY interface modeGabor Juhos
The PSGMII interface is similar to QSGMII. The main difference is that the PSGMII interface combines five SGMII lines into a single link while in QSGMII only four lines are combined. Similarly to the QSGMII, this interface mode might also needs special handling within the MAC driver. It is commonly used by Qualcomm with their QCA807x PHY series and modern WiSoC-s. Add definitions for the PHY layer to allow to express this type of connection between the MAC and PHY. Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-08docs: net: page_pool: de-duplicate the intro commentJakub Kicinski
In commit 82e896d992fa ("docs: net: page_pool: use kdoc to avoid duplicating the information") I shied away from using the DOC: comments when moving to kdoc for documenting page_pool API, because I wasn't sure how familiar people are with it. Turns out there is already a DOC: comment for the intro, which is the same in both places, modulo what looks like minor rewording. Use the version from Documentation/ but keep the contents with the code. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230807210051.1014580-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07page_pool: split types and declarations from page_pool.hYunsheng Lin
Split types and pure function declarations from page_pool.h and add them in page_page/types.h, so that C sources can include page_pool.h and headers should generally only include page_pool/types.h as suggested by jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com [Jakub: change microsoft/mana, fix kdoc paths in Documentation] Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-06gve: update gve.rstRushil Gupta
Add a note about QPL and RDA mode Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-03docs: net: page_pool: use kdoc to avoid duplicating the informationJakub Kicinski
All struct members of the driver-facing APIs are documented twice, in the code and under Documentation. This is a bit tedious. I also get the feeling that a lot of developers will read the header when coding, rather than the doc. Bring the two a little closer together by using kdoc for structs and functions. Using kdoc also gives us links (mentioning a function or struct in the text gets replaced by a link to its doc). Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/r/20230802161821.3621985-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-03docs: net: page_pool: document PP_FLAG_DMA_SYNC_DEV parametersJakub Kicinski
Using PP_FLAG_DMA_SYNC_DEV is a bit confusing. It was perhaps more obvious when it was introduced but the page pool use has grown beyond XDP and beyond packet-per-page so now making the heads and tails out of this feature is not trivial. Obviously making the API more user friendly would be a better fix, but until someone steps up to do that let's at least document what the parameters are. Relevant discussion in the first Link. Link: https://lore.kernel.org/all/20230731114427.0da1f73b@kernel.org/ Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://lore.kernel.org/r/20230802161821.3621985-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-28net: change accept_ra_min_rtr_lft to affect all RA lifetimesPatrick Rohr
accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-23net: add sysctl accept_ra_min_rtr_lftPatrick Rohr
This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr <prohr@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-21docs: net: clarify the NAPI rules around XDP TxJakub Kicinski
page pool and XDP should not be accessed from IRQ context which may happen if drivers try to clean up XDP TX with NAPI budget of 0. Link: https://lore.kernel.org/r/20230720161323.2025379-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>