summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-12MAINTAINERS: Add sctp headers to the general netdev entryMarcelo Ricardo Leitner
All SCTP patches are picked up by netdev maintainers. Two headers were missing to be listed there. Reported-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/b3c2dc3a102eb89bd155abca2503ebd015f50ee0.1739193671.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Merge branch '200GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-02-11 (idpf, ixgbe, igc) For idpf: Sridhar fixes a couple issues in handling of RSC packets. Josh adds a call to set_real_num_queues() to keep queue count in sync. For ixgbe: Piotr removes missed IS_ERR() removal when ERR_PTR usage was removed. For igc: Zdenek Bouska fixes reporting of Rx timestamp with AF_XDP. Siang sets buffer type on empty frame to ensure proper handling. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Set buffer type for empty frames in igc_init_empty_frame igc: Fix HW RX timestamp when passed by ZC XDP ixgbe: Fix possible skb NULL pointer dereference idpf: call set_real_num_queues in idpf_open idpf: record rx queue in skb for RSC packets idpf: fix handling rsc packet with a single segment ==================== Link: https://patch.msgid.link/20250211214343.4092496-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12netlink: specs: add conntrack dump and stats dump supportFlorian Westphal
This adds support to dump the connection tracking table ("conntrack -L") and the conntrack statistics, ("conntrack -S"). Example conntrack dump: tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/conntrack.yaml --dump get [{'id': 59489769, 'mark': 0, 'nfgen-family': 2, 'protoinfo': {'protoinfo-tcp': {'tcp-flags-original': {'flags': {'maxack', 'sack-perm', 'window-scale'}, 'mask': set()}, 'tcp-flags-reply': {'flags': {'maxack', 'sack-perm', 'window-scale'}, 'mask': set()}, 'tcp-state': 'established', 'tcp-wscale-original': 7, 'tcp-wscale-reply': 8}}, 'res-id': 0, 'secctx': {'secctx-name': 'system_u:object_r:unlabeled_t:s0'}, 'status': {'assured', 'confirmed', 'dst-nat-done', 'seen-reply', 'src-nat-done'}, 'timeout': 431949, 'tuple-orig': {'tuple-ip': {'ip-v4-dst': '34.107.243.93', 'ip-v4-src': '192.168.0.114'}, 'tuple-proto': {'proto-dst-port': 443, 'proto-num': 6, 'proto-src-port': 37104}}, 'tuple-reply': {'tuple-ip': {'ip-v4-dst': '192.168.0.114', 'ip-v4-src': '34.107.243.93'}, 'tuple-proto': {'proto-dst-port': 37104, 'proto-num': 6, 'proto-src-port': 443}}, 'use': 1, 'version': 0}, {'id': 3402229480, Example stats dump: tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/conntrack.yaml --dump get-stats [{'chain-toolong': 0, 'clash-resolve': 3, 'drop': 0, .... Changes since last iteration: - Address comments from Donald Hunter, in particular, fixup "get" and "get-stats" descriptions, the former operation supports both dump and normal request (returns a single entry, if found), the latter only supports dumps. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250210152159.41077-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: avoid unconditionally touching sk_tsflags on RXPaolo Abeni
After commit 5d4cc87414c5 ("net: reorganize "struct sock" fields"), the sk_tsflags field shares the same cacheline with sk_forward_alloc. The UDP protocol does not acquire the sock lock in the RX path; forward allocations are protected via the receive queue spinlock; additionally udp_recvmsg() calls sock_recv_cmsgs() unconditionally touching sk_tsflags on each packet reception. Due to the above, under high packet rate traffic, when the BH and the user-space process run on different CPUs, UDP packet reception experiences a cache miss while accessing sk_tsflags. The receive path doesn't strictly need to access the problematic field; change sock_set_timestamping() to maintain the relevant information in a newly allocated sk_flags bit, so that sock_recv_cmsgs() can take decisions accessing the latter field only. With this patch applied, on an AMD epic server with i40e NICs, I measured a 10% performance improvement for small packets UDP flood performance tests - possibly a larger delta could be observed with more recent H/W. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/dbd18c8a1171549f8249ac5a8b30b1b5ec88a425.1739294057.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Merge branch 'netlink-specs-add-a-spec-for-nl80211-wiphy'Jakub Kicinski
Donald Hunter says: ==================== netlink: specs: add a spec for nl80211 wiphy Add a rudimentary YNL spec for nl80211 that includes get-wiphy and get-interface, along with some required enhancements to YNL and the netlink schemas. Patch 1 is a minor cleanup to prepare for patch 2 Patches 2-4 are new features for YNL Patches 5-7 are updates to ynl_gen_c Patches 8-9 are schema updates for feature parity Patch 10 is the new nl80211 spec ==================== Link: https://patch.msgid.link/20250211120127.84858-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12netlink: specs: wireless: add a spec for nl80211Donald Hunter
Add a rudimentary YNL spec for nl80211 that covers get-wiphy, get-interface and get-protocol-features. ./tools/net/ynl/pyynl/cli.py --family nl80211 \ --do get-protocol-features {'protocol-features': {'split-wiphy-dump'}} ./tools/net/ynl/pyynl/cli.py --family nl80211 \ --dump get-wiphy --json '{ "split-wiphy-dump": true }' ./tools/net/ynl/pyynl/cli.py --family nl80211 \ --dump get-interface Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Link: https://patch.msgid.link/20250211120127.84858-11-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12netlink: specs: add s8, s16 to genetlink schemasDonald Hunter
Add s8 and s16 types to the genetlink schemas to align scalar types across all schemas. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-10-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12netlink: specs: support nested structs in genetlink legacyDonald Hunter
Nested structs are already supported in netlink-raw. Add the same capability to the genetlink legacy schema. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-9-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: add indexed-array scalar support to ynl-gen-cDonald Hunter
Extend ynl-gen-c.py with support for indexed-array that has a scalar sub-type. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-8-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: sanitise enums with leading digits in ynl-gen-cDonald Hunter
Turn attribute names with leading digits into valid C names by prepending an underscore, e.g. 5ghz -> _5ghz Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-7-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: add s8, s16 to valid scalars in ynl-gen-cDonald Hunter
Add the missing s8 and s16 scalar types to the list of recognised scalars in ynl-gen-c. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-6-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: accept IP string inputsDonald Hunter
The ynl tool uses display-hint to know when to format IP addresses in printed output, but not to parse IP addresses from --json input. Add support for parsing ipv4 and ipv6 strings. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: support rendering C array members to stringsDonald Hunter
The nl80211 family encodes the list of supported ciphers as a C array of u32 values. Add support for translating arrays of scalars into strings for enum names and display hints. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: support decoding indexed arrays as enumsDonald Hunter
When decoding an indexed-array with a scalar subtype, it is currently only possible to add a display-hint. Add support for decoding each value as an enum. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12tools/net/ynl: remove extraneous plural from variable namesDonald Hunter
_decode_array_attr() uses variable subattrs in every branch when only one branch decodes more than a single attribute. Change the variable name to subattr in the branches that only decode a single attribute so that the intent is more obvious. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250211120127.84858-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Merge branch 'net: dsa: add support for phylink managed EEE'Jakub Kicinski
Russell King says: ==================== net: dsa: add support for phylink managed EEE This series adds support for phylink managed EEE to DSA, and converts mt753x to make use of this feature. Patch 1 implements a helper to indicate whether the MAC LPI operations are populated (suggested by Vladimir) Patch 2 makes the necessary changes to the core code - we retain calling set_mac_eee(), but this method now becomes a way to merely validate the arguments when using phylink managed EEE rather than performing any configuration. Patch 3 converts the mt7530 driver to use phylink managed EEE. ==================== Link: https://patch.msgid.link/Z6nWujbjxlkzK_3P@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: dsa: mt7530: convert to phylink managed EEERussell King (Oracle)
Convert mt7530 to use phylink managed EEE. When enabling EEE, we set both PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 irrespective of the speed, and clear them both when disabling. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1thR9q-003vXI-Cp@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: dsa: allow use of phylink managed EEE supportRussell King (Oracle)
In order to allow DSA drivers to use phylink managed EEE, we need to change the behaviour of the DSA's .set_eee() ethtool method. Implementation of the DSA .set_mac_eee() method becomes optional with phylink managed EEE as it is only used to validate the EEE parameters supplied from userspace. The rest of the EEE state management should be left to phylink. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1thR9l-003vXC-9F@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: phylink: provide phylink_mac_implements_lpi()Russell King (Oracle)
Provide a helper to determine whether the MAC operations structure implements the LPI operations, which will be used by both phylink and DSA. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1thR9g-003vX6-4s@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Merge branch 'eth-fbnic-report-software-queue-stats'Jakub Kicinski
Jakub Kicinski says: ==================== eth: fbnic: report software queue stats Fill in typical software queue stats. # ./pyynl/cli.py --spec netlink/specs/netdev.yaml --dump qstats-get [{'ifindex': 2, 'rx-alloc-fail': 0, 'rx-bytes': 398064076, 'rx-csum-complete': 271, 'rx-csum-none': 0, 'rx-packets': 276044, 'tx-bytes': 7223770, 'tx-needs-csum': 28148, 'tx-packets': 28449, 'tx-stop': 0, 'tx-wake': 0}] Note that we don't collect csum-unnecessary, just the uncommon cases (and unnecessary is all the rest of the packets). There is no programatic use for these stats AFAIK, just manual debug. ==================== Link: https://patch.msgid.link/20250211181356.580800-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12eth: fbnic: re-sort the objects in the MakefileJakub Kicinski
Looks like recent commit broke the sort order, fix it. Acked-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250211181356.580800-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12eth: fbnic: report software Tx queue statsJakub Kicinski
Gather and report software Tx queue stats - checksum stats and queue stop / start. Acked-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250211181356.580800-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12eth: fbnic: report software Rx queue statsJakub Kicinski
Gather and report software Rx queue stats - checksum stats and allocation failures. Acked-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250211181356.580800-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12eth: fbnic: wrap tx queue stats in a structJakub Kicinski
The queue stats struct is used for Rx and Tx queues. Wrap the Tx stats in a struct and a union, so that we can reuse the same space for Rx stats on Rx queues. This also makes it easy to add an assert to the stat handling code to catch new stats not being aggregated on shutdown. Acked-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250211181356.580800-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: report csum_complete via qstatsJakub Kicinski
Commit 13c7c941e729 ("netdev: add qstat for csum complete") reserved the entry for csum complete in the qstats uAPI. Start reporting this value now that we have a driver which needs it. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20250211181356.580800-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12bcachefs: Reuse transactionAlan Huang
bch2_nocow_write_convert_unwritten is already in transaction context: 00191 ========= TEST generic/648 00242 kernel BUG at fs/bcachefs/btree_iter.c:3332! 00242 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP 00242 Modules linked in: 00242 CPU: 4 UID: 0 PID: 2593 Comm: fsstress Not tainted 6.13.0-rc3-ktest-g345af8f855b7 #14403 00242 Hardware name: linux,dummy-virt (DT) 00242 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 00242 pc : __bch2_trans_get+0x120/0x410 00242 lr : __bch2_trans_get+0xcc/0x410 00242 sp : ffffff80d89af600 00242 x29: ffffff80d89af600 x28: ffffff80ddb23000 x27: 00000000fffff705 00242 x26: ffffff80ddb23028 x25: ffffff80d8903fe0 x24: ffffff80ebb30168 00242 x23: ffffff80c8aeb500 x22: 000000000000005d x21: ffffff80d8904078 00242 x20: ffffff80d8900000 x19: ffffff80da9e8000 x18: 0000000000000000 00242 x17: 64747568735f6c61 x16: 6e72756f6a20726f x15: 0000000000000028 00242 x14: 0000000000000004 x13: 000000000000f787 x12: ffffffc081bbcdc8 00242 x11: 0000000000000000 x10: 0000000000000003 x9 : ffffffc08094efbc 00242 x8 : 000000001092c111 x7 : 000000000000000c x6 : ffffffc083c31fc4 00242 x5 : ffffffc083c31f28 x4 : ffffff80c8aeb500 x3 : ffffff80ebb30000 00242 x2 : 0000000000000001 x1 : 0000000000000a21 x0 : 000000000000028e 00242 Call trace: 00242 __bch2_trans_get+0x120/0x410 (P) 00242 bch2_inum_offset_err_msg+0x48/0xb0 00242 bch2_nocow_write_convert_unwritten+0x3d0/0x530 00242 bch2_nocow_write+0xeb0/0x1000 00242 __bch2_write+0x330/0x4e8 00242 bch2_write+0x1f0/0x530 00242 bch2_direct_write+0x530/0xc00 00242 bch2_write_iter+0x160/0xbe0 00242 vfs_write+0x1cc/0x360 00242 ksys_write+0x5c/0xf0 00242 __arm64_sys_write+0x20/0x30 00242 invoke_syscall.constprop.0+0x54/0xe8 00242 do_el0_svc+0x44/0xc0 00242 el0_svc+0x34/0xa0 00242 el0t_64_sync_handler+0x104/0x130 00242 el0t_64_sync+0x154/0x158 00242 Code: 6b01001f 54ffff01 79408460 3617fec0 (d4210000) 00242 ---[ end trace 0000000000000000 ]--- 00242 Kernel panic - not syncing: Oops - BUG: Fatal exception Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-12bcachefs: Pass _orig_restart_count to trans_was_restartedAlan Huang
_orig_restart_count is unused now, according to the logic, trans_was_restarted should be using _orig_restart_count. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-12bcachefs: CONFIG_BCACHEFS_INJECT_TRANSACTION_RESTARTSKent Overstreet
Incorrectly handled transaction restarts can be a source of heisenbugs; add a mode where we randomly inject them to shake them out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-12Merge tag 'mfd-fixes-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD fix from Lee Jones: - Fix syscon users not specifying the "syscon" compatible * tag 'mfd-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: syscon: Restore device_node_to_regmap() for non-syscon nodes
2025-02-12Merge branch 'use-phylib-for-reset-randomization-and-adjustable-polling'Jakub Kicinski
Oleksij Rempel says: ==================== Use PHYlib for reset randomization and adjustable polling This patch set tackles a DP83TG720 reset lock issue and improves PHY polling. Rather than adding a separate polling worker to randomize PHY resets, I chose to extend the PHYlib framework - which already handles most of the needed functionality - with adjustable polling. This approach not only addresses the DP83TG720-specific problem (where synchronized resets can lock the link) but also lays the groundwork for optimizing PHY stats polling across all PHY drivers. With generic PHY stats coming in, we can adjust the polling interval based on hardware characteristics, such as using longer intervals for PHYs with stable HW counters or shorter ones for high-speed links prone to counter overflows. Patch version changes are tracked in separate patches. ==================== Link: https://patch.msgid.link/20250210082358.200751-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: phy: dp83tg720: Add randomized polling intervals for link detectionOleksij Rempel
Address the limitations of the DP83TG720 PHY, which cannot reliably detect or report a stable link state. To handle this, the PHY must be periodically reset when the link is down. However, synchronized reset intervals between the PHY and its link partner can result in a deadlock, preventing the link from re-establishing. This change introduces a randomized polling interval when the link is down to desynchronize resets between link partners. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250210082358.200751-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net: phy: Add support for driver-specific next update timeOleksij Rempel
Introduce the `phy_get_next_update_time` function to allow PHY drivers to dynamically determine the time (in jiffies) until the next state update event. This enables more flexible and adaptive polling intervals based on the link state or other conditions. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250210082358.200751-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Merge branch 'rate-management-on-traffic-classes-misc'Jakub Kicinski
Tariq Toukan says: ==================== mlx5 misc Patches 1-3 by William reduce the memory consumption for representors to achieve better scalability. Patches 4-5 by Akiva expose ICM memory consumption per function. Patches 6-8 expose helpful information on RSS resources in devlink RX reporter diagnose. Patches 9-10 are simple enhancements by Alex Lazar. ==================== Link: https://patch.msgid.link/20250209101716.112774-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5: XDP, Enable TX side XDP multi-buffer supportAlexei Lazar
In XDP scenarios, fragmented packets can occur if the MTU is larger than the page size, even when the packet size fits within the linear part. If XDP multi-buffer support is disabled, the fragmented part won't be handled in the TX flow, leading to packet drops. Since XDP multi-buffer support is always available, this commit removes the conditional check for enabling it. This ensures that XDP multi-buffer support is always enabled, regardless of the `is_xdp_mb` parameter, and guarantees the handling of fragmented packets in such scenarios. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-16-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5: Extend Ethtool loopback selftest to support non-linear SKBAlexei Lazar
Current loopback test validation ignores non-linear SKB case in the SKB access, which can lead to failures in scenarios such as when HW GRO is enabled. Linearize the SKB so both cases will be handled. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5e: Expose RSS via devlink rx reporter diagnoseAmir Tzin
Underneath "rx resources" tag expose RSS diagnostic information. For each RSS expose its rqtn, TIRs and inner TIRs. $ devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx ....... RSS: Index: 0 rqtn: 0 TIRs Numbers: tt: TT_IPV4_TCP tirn: 0 tt: TT_IPV6_TCP tirn: 1 tt: TT_IPV4_UDP tirn: 2 tt: TT_IPV6_UDP tirn: 3 tt: TT_IPV4_IPSEC_AH tirn: 4 tt: TT_IPV6_IPSEC_AH tirn: 5 tt: TT_IPV4_IPSEC_ESP tirn: 6 tt: TT_IPV6_IPSEC_ESP tirn: 7 tt: TT_IPV4 tirn: 8 tt: TT_IPV6 tirn: 9 Inner TIRs Numbers: tt: TT_IPV4_TCP tirn: 10 tt: TT_IPV6_TCP tirn: 11 tt: TT_IPV4_UDP tirn: 12 tt: TT_IPV6_UDP tirn: 13 tt: TT_IPV4_IPSEC_AH tirn: 14 tt: TT_IPV6_IPSEC_AH tirn: 15 tt: TT_IPV4_IPSEC_ESP tirn: 16 tt: TT_IPV6_IPSEC_ESP tirn: 17 tt: TT_IPV4 tirn: 18 tt: TT_IPV6 tirn: 19 Index: 2 rqtn: 27 TIRs Numbers: tt: TT_IPV6_TCP tirn: 46 Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-14-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5e: Add direct TIRs to devlink rx reporter diagnoseAmir Tzin
Add "RX resources" tag to the output of rx reporter diagnose callback. Underneath add tag for direct TIRs, for each TIR expose its tirn and the corresponding rqtn. $ sudo devlink health diagnose auxiliary/mlx5_core.eth.0/65535 reporter rx .... rx resources: Direct TIRs: ix: 0 tirn: 20 rqtn: 1 ix: 1 tirn: 21 rqtn: 2 ix: 2 tirn: 22 rqtn: 3 ix: 3 tirn: 23 rqtn: 4 ix: 4 tirn: 24 rqtn: 5 ix: 5 tirn: 25 rqtn: 6 ix: 6 tirn: 26 rqtn: 7 ix: 7 tirn: 27 rqtn: 8 ix: 8 tirn: 28 rqtn: 9 ix: 9 tirn: 29 rqtn: 10 ix: 10 tirn: 30 rqtn: 11 ix: 11 tirn: 31 rqtn: 12 ix: 12 tirn: 32 rqtn: 13 ix: 13 tirn: 33 rqtn: 14 ix: 14 tirn: 34 rqtn: 15 ix: 15 tirn: 35 rqtn: 16 ix: 16 tirn: 36 rqtn: 17 ix: 17 tirn: 37 rqtn: 18 ix: 18 tirn: 38 rqtn: 19 ix: 19 tirn: 39 rqtn: 20 ix: 20 tirn: 40 rqtn: 21 ix: 21 tirn: 41 rqtn: 22 ix: 22 tirn: 42 rqtn: 23 ix: 23 tirn: 43 rqtn: 24 Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5e: Move RQs diagnose to a dedicated functionAmir Tzin
Move rx reporter RQs diagnose from mlx5e_rx_reporter_diagnose() to a dedicated function. This change is a preparation for the following series which extends diagnose output for the rx reporter. While at it, also pass a mlx5e_priv pointer to mlx5e_rx_reporter_diagnose_common_config() as this is the argument the latter actually needs. Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5: Expose ICM consumption per functionAkiva Goldberger
ICM is a portion of the host's memory assigned to a function by the OS through requests made by the NIC's firmware. PF ICM consumption can be accessed directly, while VF/SF ICM consumption can be accessed through their representors in switchdev mode. The value is exposed to the user in granularity of 4KB through the vnic health reporter as follows: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 icm_consumption: 1032 Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5: Rename and move mlx5_esw_query_vport_vhca_idAkiva Goldberger
Rename mlx5_esw_query_vport_vhca_id to mlx5_vport_get_vhca_id and move it to vport file. Also, add function declaration to mlx5_core header file. This better represents the function's usage and allows for it to be called from other parts of the mlx5_core driver. Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5e: set the tx_queue_len for pfifo_fastWilliam Tu
By default, the mq netdev creates a pfifo_fast qdisc. On a system with 16 core, the pfifo_fast with 3 bands consumes 16 * 3 * 8 (size of pointer) * 1024 (default tx queue len) = 393KB. The patch sets the tx qlen to representor default value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each representor at ECPF. Signed-off-by: William Tu <witu@nvidia.com> Reviewed-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5e: reduce rep rxq depth to 256 for ECPFWilliam Tu
By experiments, a single queue representor netdev consumes kernel memory around 2.8MB, and 1.8MB out of the 2.8MB is due to page pool for the RXQ. Scaling to a thousand representors consumes 2.8GB, which becomes a memory pressure issue for embedded devices such as BlueField-2 16GB / BlueField-3 32GB memory. Since representor netdevs mostly handles miss traffic, and ideally, most of the traffic will be offloaded, reduce the default non-uplink rep netdev's RXQ default depth from 1024 to 256 if mdev is ecpf eswitch manager. This saves around 1MB of memory per regular RQ, (1024 - 256) * 2KB, allocated from page pool. With rxq depth of 256, the netlink page pool tool reports $./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump page-pool-get {'id': 277, 'ifindex': 9, 'inflight': 128, 'inflight-mem': 786432, 'napi-id': 775}] This is due to mtu 1500 + headroom consumes half pages, so 256 rxq entries consumes around 128 pages (thus create a page pool with size 128), shown above at inflight. Note that each netdev has multiple types of RQs, including Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representor only supports regular rq, this patch only changes the regular RQ's default depth. Signed-off-by: William Tu <witu@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12net/mlx5e: reduce the max log mpwrq sz for ECPF and repsWilliam Tu
For the ECPF and representors, reduce the max MPWRQ size from 256KB (18) to 128KB (17). This prepares the later patch for saving representor memory. With Striding RQ, there is a minimum of 4 MPWQEs. So with 128KB of max MPWRQ size, the minimal memory is 4 * 128KB = 512KB. When creating page pool, consider 1500 mtu, the minimal page pool size will be 512KB/4KB = 128 pages = 256 rx ring entries (2 entries per page). Before this patch, setting RX ringsize (ethtool -G rx) to 256 causes driver to allocate page pool size more than it needs due to max MPWRQ is 256KB (18). Ex: 4 * 256KB = 1MB, 1MB/4KB = 256 pages, but actually 128 pages is good enough. Reducing the max MPWRQ to 128KB fixes the limitation. Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250209101716.112774-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12platform/x86: thinkpad_acpi: Fix registration of tpacpi platform driverMark Pearson
The recent platform profile changes prevent the tpacpi platform driver from registering. This error is seen in the kernel logs, and the various tpacpi entries are not created: [ 7550.642171] platform thinkpad_acpi: Resources present before probing This happens because devm_platform_profile_register() is called before tpacpi_pdev probes (thanks to Kurt Borja for identifying the root cause). For now revert back to the old platform_profile_register to fix the issue. This is quick fix and will be re-implemented later as more testing is needed for full solution. Tested on X1 Carbon G12. Fixes: 31658c916fa6 ("platform/x86: thinkpad_acpi: Use devm_platform_profile_register()") Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Kurt Borja <kuurtb@gmail.com> Link: https://lore.kernel.org/r/20250211173620.16522-1-mpearson-lenovo@squebb.ca Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-02-12Revert "netfilter: flowtable: teardown flow if cached mtu is stale"Pablo Neira Ayuso
This reverts commit b8baac3b9c5cc4b261454ff87d75ae8306016ffd. IPv4 packets with no DF flag set on result in frequent flow entry teardown cycles, this is visible in the network topology that is used in the nft_flowtable.sh test. nft_flowtable.sh test ocassionally fails reporting that the dscp_fwd test sees no packets going through the flowtable path. Fixes: b8baac3b9c5c ("netfilter: flowtable: teardown flow if cached mtu is stale") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-02-11Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-02-10 (ice, igc, e1000e) For ice: Karol, Jake, and Michal add PTP support for E830 devices. Karol refactors and cleans up PTP code. Jake allows for a common cross-timestamp implementation to be shared for all devices and Michal adds E830 support. Mateusz cleans up initial Flow Director rule creation to loop rather than duplicate repeated similar calls. For igc: Siang adjust calls to remove need for close and open calls on loading XDP program. For e1000e: Gerhard Engleder batches register writes for writing multicast table on real-time kernels. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: e1000e: Fix real-time violations on link up igc: Avoid unnecessary link down event in XDP_SETUP_PROG process ice: refactor ice_fdir_create_dflt_rules() function ice: Implement PTP support for E830 devices ice: Refactor ice_ptp_init_tx_* ice: Add unified ice_capture_crosststamp ice: Process TSYN IRQ in a separate function ice: Use FIELD_PREP for timestamp values ice: Remove unnecessary ice_is_e8xx() functions ice: Don't check device type when checking GNSS presence ==================== Link: https://patch.msgid.link/20250210192352.3799673-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11iavf: Fix a locking bug in an error pathBart Van Assche
If the netdev lock has been obtained, unlock it before returning. This bug has been detected by the Clang thread-safety analyzer. Fixes: afc664987ab3 ("eth: iavf: extend the netdev_lock usage") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20250206175114.1974171-28-bvanassche@acm.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11Merge branch 'sfc-support-devlink-flash'Jakub Kicinski
Edward Cree says: ==================== sfc: support devlink flash Allow upgrading device firmware on Solarflare NICs through standard tools. ==================== Link: https://patch.msgid.link/cover.1739186252.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11sfc: document devlink flash supportEdward Cree
Update the information in sfc's devlink documentation including support for firmware update with devlink flash. Also update the help text for CONFIG_SFC_MTD, as it is no longer strictly required for firmware updates. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11sfc: deploy devlink flash images to NIC over MCDIEdward Cree
Use MC_CMD_NVRAM_* wrappers to write the firmware to the partition identified from the image header. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/a746335876b621b3e54cf4e49948148e349a1745.1739186253.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>