summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-14net: qrtr: ns: Ignore ENODEV failures in nsChris Lew
Ignore the ENODEV failures returned by kernel_sendmsg(). These errors indicate that either the local port has been closed or the remote has gone down. Neither of these scenarios are fatal and will eventually be handled through packets that are later queued on the control port. Signed-off-by: Chris Lew <quic_clew@quicinc.com> Signed-off-by: Sarannya Sasikumar <quic_sarannya@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240612063156.1377210-1-quic_sarannya@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14Merge branch 'series-to-deliver-ethernet-for-stm32mp13'Paolo Abeni
Christophe Roullier says: ==================== Series to deliver Ethernet for STM32MP13 STM32MP13 is STM32 SOC with 2 GMACs instances GMAC IP version is SNPS 4.20. GMAC IP configure with 1 RX and 1 TX queue. DMA HW capability register supported RX Checksum Offload Engine supported TX Checksum insertion supported Wake-Up On Lan supported TSO supported Rework dwmac glue to simplify management for next stm32 (integrate RFC from Marek) V2: - Remark from Rob Herring (add Krzysztof's ack in patch 02/11, update in yaml) Remark from Serge Semin (upate commits msg) V3: - Remove PHY regulator patch and Ethernet2 DT because need to clarify how to manage PHY regulator (in glue or PHY side) - Integrate RFC from Marek - Remark from Rob Herring in YAML documentation V4: - Remark from Marek (remove max-speed, extra space in DT, update commit msg) - Remark from Rasmus (add sign-off, add base-commit) - Remark from Sai Krishna Gajula V5: - Fix warning during build CHECK_DTBS - Remark from Marek (glue + DT update) - Remark from Krzysztof about YAML (Make it symmetric) V6: - Replace pr_debug by dev_dbg - Split serie driver/DTs separately V7: - Remark from Marek (update sysconfig register mask) ==================== Link: https://lore.kernel.org/r/20240611083606.733453-1-christophe.roullier@foss.st.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: add management of stm32mp13 for stm32Christophe Roullier
Add Ethernet support for STM32MP13. STM32MP13 is STM32 SOC with 2 GMACs instances. GMAC IP version is SNPS 4.20. GMAC IP configure with 1 RX and 1 TX queue. DMA HW capability register supported RX Checksum Offload Engine supported TX Checksum insertion supported Wake-Up On Lan supported TSO supported Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: Mask support for PMCR configurationChristophe Roullier
Add possibility to have second argument in syscon property to manage mask. This mask will be used to address right BITFIELDS of PMCR register. Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: Fix Mhz to MHzMarek Vasut
Trivial, fix up the comments using 'Mhz' to 'MHz'. No functional change. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: Clean up the debug printsMarek Vasut
Use dev_err()/dev_dbg() and phy_modes() to print PHY mode instead of pr_debug() and hand-written PHY mode decoding. This way, each debug print has associated device with it and duplicated mode decoding is removed. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: Extract PMCR configurationMarek Vasut
Pull the PMCR clock mux configuration into a separate function. This is the final change of three, which moves external clock rate validation, external clock selector decoding, and clock mux configuration into separate functions. This should make the code easier to understand. No functional change intended. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: Separate out external clock selectorMarek Vasut
Pull the external clock selector into a separate function, to avoid conflating it with external clock rate validation and clock mux register configuration. This should make the code easier to read and understand. The dwmac->enable_eth_ck variable in the end indicates whether the MAC clock are supplied by external oscillator (true) or internal RCC clock IP (false). The dwmac->enable_eth_ck value is set based on multiple DT properties, some of them deprecated, some of them specific to bus mode. The following DT properties and variables are taken into account. In each case, if the property is present or true, MAC clock is supplied by external oscillator. - "st,ext-phyclk", assigned to variable dwmac->ext_phyclk - Used in any mode (MII/RMII/GMII/RGMII) - The only non-deprecated DT property of the three - "st,eth-clk-sel", assigned to variable dwmac->eth_clk_sel_reg - Valid only in GMII/RGMII mode - Deprecated property, backward compatibility only - "st,eth-ref-clk-sel", assigned to variable dwmac->eth_ref_clk_sel_reg - Valid only in RMII mode - Deprecated property, backward compatibility only The stm32mp1_select_ethck_external() function handles the aforementioned DT properties and sets dwmac->enable_eth_ck accordingly. The stm32mp1_set_mode() is adjusted to call stm32mp1_select_ethck_external() first and then only use dwmac->enable_eth_ck to determine hardware clock mux settings. No functional change intended. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: stmmac: dwmac-stm32: Separate out external clock rate validationMarek Vasut
Pull the external clock frequency validation into a separate function, to avoid conflating it with external clock DT property decoding and clock mux register configuration. This should make the code easier to read and understand. This does change the code behavior slightly. The clock mux PMCR register setting now depends solely on the DT properties which configure the clock mux between external clock and internal RCC generated clock. The mux PMCR register settings no longer depend on the supplied clock frequency, that supplied clock frequency is now only validated, and if the clock frequency is invalid for a mode, it is rejected. Previously, the code would switch the PMCR register clock mux to internal RCC generated clock if external clock couldn't provide suitable frequency, without checking whether the RCC generated clock frequency is correct. Such behavior is risky at best, user should have configured their clock correctly in the first place, so this behavior is removed here. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14dt-bindings: net: add STM32MP13 compatible in documentation for stm32Christophe Roullier
New STM32 SOC have 2 GMACs instances. GMAC IP version is SNPS 4.20. Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-14net: hsr: Send supervisory frames to HSR network with ProxyNodeTable dataLukasz Majewski
This patch provides support for sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled. Supervision frames with RedBox MAC address (appended as second TLV) are only send for ProxyNodeTable nodes. This patch series shall be tested with hsr_redbox.sh script. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-13Merge branch 'net-dsa-lantiq_gswip-code-improvements'Jakub Kicinski
Martin Schiller says: ==================== net: dsa: lantiq_gswip: code improvements This patchset for the lantiq_gswip driver is a collection of minor fixes and coding improvements by Martin Blumenstingl without any real changes in the actual functionality. ==================== Link: https://lore.kernel.org/r/20240611135434.3180973-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Improve error message in gswip_port_fdb()Martin Blumenstingl
Print that no FID is found for bridge %s instead of the incorrect message that the port is not part of a bridge. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-13-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Update comments in gswip_port_vlan_filtering()Martin Blumenstingl
Update the comments in gswip_port_vlan_filtering() so it's clear that there are two separate cases, one for "tag based VLAN" and another one for "port based VLAN". Suggested-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-12-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Remove dead code from gswip_add_single_port_br()Martin Schiller
The port validation in gswip_add_single_port_br() is superfluous and can be omitted. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240611135434.3180973-11-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Consistently use macros for the mac bridge tableMartin Blumenstingl
Only bits [5:0] in mac_bridge.key[3] are reserved for the FID. Also, for dynamic (learned) entries, bits [7:4] in mac_bridge.val[0] represents the port. Introduce new macros GSWIP_TABLE_MAC_BRIDGE_KEY3_FID and GSWIP_TABLE_MAC_BRIDGE_VAL0_PORT macro and use it throughout the driver. Also rename and update GSWIP_TABLE_MAC_BRIDGE_VAL1_STATIC to use the BIT() macro. This makes the driver code easier to understand. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240611135434.3180973-10-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Change literal 6 to ETH_ALENMartin Blumenstingl
The addr variable in gswip_port_fdb_dump() stores a mac address. Use ETH_ALEN to make this consistent across other drivers. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-9-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Use dsa_is_cpu_port() in gswip_port_change_mtu()Martin Blumenstingl
Make the check for the CPU port in gswip_port_change_mtu() consistent with other areas of the driver by using dsa_is_cpu_port(). Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-8-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: do also enable or disable cpu portMartin Schiller
Before commit 74be4babe72f ("net: dsa: do not enable or disable non user ports"), gswip_port_enable/disable() were also executed for the cpu port in gswip_setup() which disabled the cpu port during initialization. Let's restore this by removing the dsa_is_user_port checks. Also, let's clean up the gswip_port_enable() function so that we only have to check for the cpu port once. The operation reordering done here is safe. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240611135434.3180973-7-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Don't manually call gswip_port_enable()Martin Blumenstingl
We don't need to manually call gswip_port_enable() from within gswip_setup() for the CPU port. DSA does this automatically for us. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-6-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Use dev_err_probe where appropriateMartin Blumenstingl
dev_err_probe() can be used to simplify the existing code. Also it means we get rid of the following warning which is seen whenever the PMAC (Ethernet controller which connects to GSWIP's CPU port) has not been probed yet: gswip 1e108000.switch: dsa switch register failed: -517 Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-5-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: add terminating \n where missingMartin Schiller
Some dev_err are missing the terminating \n. Let's add that. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240611135434.3180973-4-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13net: dsa: lantiq_gswip: Only allow phy-mode = "internal" on the CPU portMartin Blumenstingl
Add the CPU port to gswip_xrx200_phylink_get_caps() and gswip_xrx300_phylink_get_caps(). It connects through a SoC-internal bus, so the only allowed phy-mode is PHY_INTERFACE_MODE_INTERNAL. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20240611135434.3180973-3-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13dt-bindings: net: dsa: lantiq,gswip: convert to YAML schemaMartin Schiller
Convert the lantiq,gswip bindings to YAML format. Also add this new file to the MAINTAINERS file. Furthermore, the CPU port has to specify a phy-mode and either a phy or a fixed-link. Since GSWIP is connected using a SoC internal protocol there's no PHY involved. Add phy-mode = "internal" and a fixed-link to the example code to describe the communication between the PMAC (Ethernet controller) and GSWIP switch. Signed-off-by: Martin Schiller <ms@dev.tdt.de> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20240611135434.3180973-2-ms@dev.tdt.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13Merge branch 'mana-shared' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Leon Romanovsky says: ==================== net: mana: Allow variable size indirection table Like we talked, I created new shared branch for this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared * 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: net: mana: Allow variable size indirection table ==================== Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts, no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13Merge tag 'net-6.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth and netfilter. Slim pickings this time, probably a combination of summer, DevConf.cz, and the end of first half of the year at corporations. Current release - regressions: - Revert "igc: fix a log entry using uninitialized netdev", it traded lack of netdev name in a printk() for a crash Previous releases - regressions: - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ - geneve: fix incorrectly setting lengths of inner headers in the skb, confusing the drivers and causing mangled packets - sched: initialize noop_qdisc owner to avoid false-positive recursion detection (recursing on CPU 0), which bubbles up to user space as a sendmsg() error, while noop_qdisc should silently drop - netdevsim: fix backwards compatibility in nsim_get_iflink() Previous releases - always broken: - netfilter: ipset: fix race between namespace cleanup and gc in the list:set type" * tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send() af_unix: Read with MSG_PEEK loops if the first unread byte is OOB bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response gve: Clear napi->skb before dev_kfree_skb_any() ionic: fix use after netif_napi_del() Revert "igc: fix a log entry using uninitialized netdev" net: bridge: mst: fix suspicious rcu usage in br_mst_set_state net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state net/ipv6: Fix the RT cache flush via sysctl using a previous delay net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters gve: ignore nonrelevant GSO type bits when processing TSO headers net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP netfilter: Use flowlabel flow key when re-routing mangled packets netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type netfilter: nft_inner: validate mandatory meta and payload tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() mailmap: map Geliang's new email address mptcp: pm: update add_addr counters after connect mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID mptcp: ensure snd_una is properly initialized on connect ...
2024-06-13Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: "Bugfixes: - NFSv4.2: Fix a memory leak in nfs4_set_security_label - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long. - NFS: Add appropriate memory barriers to the sillyrename code - Propagate readlink errors in nfs_symlink_filler - NFS: don't invalidate dentries on transient errors - NFS: fix unnecessary synchronous writes in random write workloads - NFSv4.1: enforce rootpath check when deciding whether or not to trunk Other: - Change email address for Trond Myklebust due to email server concerns" * tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: add barriers when testing for NFS_FSDATA_BLOCKED SUNRPC: return proper error from gss_wrap_req_priv NFSv4.1 enforce rootpath check in fs_location query NFS: abort nfs_atomic_open_v23 if name is too long. nfs: don't invalidate dentries on transient errors nfs: Avoid flushing many pages with NFS_FILE_SYNC nfs: propagate readlink errors in nfs_symlink_filler MAINTAINERS: Change email address for Trond Myklebust NFSv4: Fix memory leak in nfs4_set_security_label
2024-06-13Merge tag 'fixes-2024-06-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fixes from Mike Rapoport: "Fix validation of NUMA coverage. memblock_validate_numa_coverage() was checking for a unset node ID using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was specified by buggy firmware. Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()" * tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node() memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
2024-06-13bnxt_en: Adjust logging of firmware messages in case of released token in ↵Aleksandr Mishin
__hwrm_send() In case of token is released due to token->state == BNXT_HWRM_DEFERRED, released token (set to NULL) is used in log messages. This issue is expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But this error code is returned by recent firmware. So some firmware may not return it. This may lead to NULL pointer dereference. Adjust this issue by adding token pointer check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 8fa4219dba8e ("bnxt_en: add dynamic debug support for HWRM messages") Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13af_unix: Read with MSG_PEEK loops if the first unread byte is OOBRao Shoaib
Read with MSG_PEEK flag loops if the first byte to read is an OOB byte. commit 22dd70eb2c3d ("af_unix: Don't peek OOB data without MSG_OOB.") addresses the loop issue but does not address the issue that no data beyond OOB byte can be read. >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) >>> c1.send(b'a', MSG_OOB) 1 >>> c1.send(b'b') 1 >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'b' >>> from socket import * >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) >>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1) >>> c1.send(b'a', MSG_OOB) 1 >>> c1.send(b'b') 1 >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'a' >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'a' >>> c2.recv(1, MSG_DONTWAIT) b'a' >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT) b'b' >>> Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded responseMichael Chan
Firmware interface 1.10.2.118 has increased the size of HWRM_PORT_PHY_QCFG response beyond the maximum size that can be forwarded. When the VF's link state is not the default auto state, the PF will need to forward the response back to the VF to indicate the forced state. This regression may cause the VF to fail to initialize. Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum 96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the new speeds2 fields are beyond the legacy structure. Also modify bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96 bytes to make this failure more obvious. Fixes: 84a911db8305 ("bnxt_en: Update firmware interface to 1.10.2.118") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13gve: Clear napi->skb before dev_kfree_skb_any()Ziwei Xiao
gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it is freed with dev_kfree_skb_any(). This can result in a subsequent call to napi_get_frags returning a dangling pointer. Fix this by clearing napi->skb before the skb is freed. Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Cc: stable@vger.kernel.org Reported-by: Shailend Chand <shailend@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Shailend Chand <shailend@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13ionic: fix use after netif_napi_del()Taehee Yoo
When queues are started, netif_napi_add() and napi_enable() are called. If there are 4 queues and only 3 queues are used for the current configuration, only 3 queues' napi should be registered and enabled. The ionic_qcq_enable() checks whether the .poll pointer is not NULL for enabling only the using queue' napi. Unused queues' napi will not be registered by netif_napi_add(), so the .poll pointer indicates NULL. But it couldn't distinguish whether the napi was unregistered or not because netif_napi_del() doesn't reset the .poll pointer to NULL. So, ionic_qcq_enable() calls napi_enable() for the queue, which was unregistered by netif_napi_del(). Reproducer: ethtool -L <interface name> rx 1 tx 1 combined 0 ethtool -L <interface name> rx 0 tx 0 combined 1 ethtool -L <interface name> rx 0 tx 0 combined 4 Splat looks like: kernel BUG at net/core/dev.c:6666! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16 Workqueue: events ionic_lif_deferred_work [ionic] RIP: 0010:napi_enable+0x3b/0x40 Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28 RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20 FS: 0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? napi_enable+0x3b/0x40 ? do_error_trap+0x83/0xb0 ? napi_enable+0x3b/0x40 ? napi_enable+0x3b/0x40 ? exc_invalid_op+0x4e/0x70 ? napi_enable+0x3b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? napi_enable+0x3b/0x40 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] process_one_work+0x145/0x360 worker_thread+0x2bb/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-13Revert "igc: fix a log entry using uninitialized netdev"Sasha Neftin
This reverts commit 86167183a17e03ec77198897975e9fdfbd53cb0b. igc_ptp_init() needs to be called before igc_reset(), otherwise kernel crash could be observed. Following the corresponding discussion [1] and [2] revert this commit. Link: https://lore.kernel.org/all/8fb634f8-7330-4cf4-a8ce-485af9c0a61a@intel.com/ [1] Link: https://lore.kernel.org/all/87o78rmkhu.fsf@intel.com/ [2] Fixes: 86167183a17e ("igc: fix a log entry using uninitialized netdev") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240611162456.961631-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12CDC-NCM: add support for Apple's private interfaceOle André Vadla Ravnås
Available on iOS/iPadOS >= 17, where this new interface is used by developer tools using the new RemoteXPC protocol. This private interface lacks a status endpoint, presumably because there isn't a physical cable that can be unplugged, nor any speed changes to be notified about. Note that NCM interfaces are not exposed until a mode switch is requested, which macOS does automatically. The mode switch can be performed like this: uint8_t status; libusb_control_transfer(device_handle, LIBUSB_RECIPIENT_DEVICE | LIBUSB_REQUEST_TYPE_VENDOR | LIBUSB_ENDPOINT_IN, 82, /* bRequest */ 0, /* wValue */ 3, /* wIndex */ &status, sizeof(status), 0); Newer versions of usbmuxd do this automatically. Co-developed-by: Håvard Sørbø <havard@hsorbo.no> Signed-off-by: Håvard Sørbø <havard@hsorbo.no> Signed-off-by: Ole André Vadla Ravnås <oleavr@frida.re> Link: https://lore.kernel.org/r/20240607074117.31322-1-oleavr@frida.re Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12Merge branch 'net-bridge-mst-fix-suspicious-rcu-usage-warning'Jakub Kicinski
Nikolay Aleksandrov says: ==================== net: bridge: mst: fix suspicious rcu usage warning This set fixes a suspicious RCU usage warning triggered by syzbot[1] in the bridge's MST code. After I converted br_mst_set_state to RCU, I forgot to update the vlan group dereference helper. Fix it by using the proper helper, in order to do that we need to pass the vlan group which is already obtained correctly by the callers for their respective context. Patch 01 is a requirement for the fix in patch 02. Note I did consider rcu_dereference_rtnl() but the churn is much bigger and in every part of the bridge. We can do that as a cleanup in net-next. [1] https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe ============================= WARNING: suspicious RCU usage 6.10.0-rc2-syzkaller-00235-g8a92980606e3 #0 Not tainted ----------------------------- net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by syz-executor.1/5374: #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline] #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: __mm_populate+0x1b0/0x460 mm/gup.c:2111 #1: ffffc90000a18c00 ((&p->forward_delay_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x650 kernel/time/timer.c:1789 #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline] #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: br_mst_set_state+0x171/0x7a0 net/bridge/br_mst.c:105 stack backtrace: CPU: 1 PID: 5374 Comm: syz-executor.1 Not tainted 6.10.0-rc2-syzkaller-00235-g8a92980606e3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 nbp_vlan_group net/bridge/br_private.h:1599 [inline] br_mst_set_state+0x29e/0x7a0 net/bridge/br_mst.c:106 br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47 br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88 call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792 expire_timers kernel/time/timer.c:1843 [inline] __run_timers kernel/time/timer.c:2417 [inline] __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428 run_timer_base kernel/time/timer.c:2437 [inline] run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 </IRQ> <TASK> ==================== Link: https://lore.kernel.org/r/20240609103654.914987-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net: bridge: mst: fix suspicious rcu usage in br_mst_set_stateNikolay Aleksandrov
I converted br_mst_set_state to RCU to avoid a vlan use-after-free but forgot to change the vlan group dereference helper. Switch to vlan group RCU deref helper to fix the suspicious rcu usage warning. Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net: bridge: mst: pass vlan group directly to br_mst_vlan_set_stateNikolay Aleksandrov
Pass the already obtained vlan group pointer to br_mst_vlan_set_state() instead of dereferencing it again. Each caller has already correctly dereferenced it for their context. This change is required for the following suspicious RCU dereference fix. No functional changes intended. Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12Merge branch 'net-flower-validate-encapsulation-control-flags'Jakub Kicinski
Asbjørn Sloth Tønnesen says: ==================== net: flower: validate encapsulation control flags Now that all drivers properly rejects unsupported flower control flags used with FLOW_DISSECTOR_KEY_CONTROL, then time has come to add similar checks to the drivers supporting FLOW_DISSECTOR_KEY_ENC_CONTROL. There are currently just 4 drivers supporting this key, and 3 of those currently doesn't validate encapsulated control flags. Encapsulation control flags may currently be unused, but they should still be validated by the drivers, so that drivers will properly reject any new flags when they are introduced. This series adds some helper functions, and implements them in all 4 drivers. NB: It is currently discussed[1] to use encapsulation control flags for tunnel flags instead of the new FLOW_DISSECTOR_KEY_ENC_FLAGS. [1] https://lore.kernel.org/netdev/ZmFuxElwZiYJzBkh@dcaratti.users.ipa.redhat.com/ ==================== Link: https://lore.kernel.org/r/20240609173358.193178-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12ice: flower: validate encapsulation control flagsAsbjørn Sloth Tønnesen
Encapsulation control flags are currently not used anywhere, so all flags are currently unsupported by all drivers. This patch adds validation of this assumption, so that encapsulation flags may be used in the future. In case any encapsulation control flags are masked, flow_rule_match_has_enc_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-6-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12nfp: flower: validate encapsulation control flagsAsbjørn Sloth Tønnesen
Encapsulation control flags are currently not used anywhere, so all flags are currently unsupported by all drivers. This patch adds validation of this assumption, so that encapsulation flags may be used in the future. In case any encapsulation control flags are masked, flow_rule_match_has_enc_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-5-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net/mlx5e: flower: validate encapsulation control flagsAsbjørn Sloth Tønnesen
Encapsulation control flags are currently not used anywhere, so all flags are currently unsupported by all drivers. This patch adds validation of this assumption, so that encapsulation flags may be used in the future. In case any encapsulation control flags are masked, flow_rule_match_has_enc_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-4-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12sfc: use flow_rule_is_supp_enc_control_flags()Asbjørn Sloth Tønnesen
Change the existing check for unsupported encapsulation control flags, to use the new helper flow_rule_is_supp_enc_control_flags(). No functional change, only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12flow_offload: add encapsulation control flag helpersAsbjørn Sloth Tønnesen
This patch adds two new helper functions: flow_rule_is_supp_enc_control_flags() flow_rule_has_enc_control_flags() They are intended to be used for validating encapsulation control flags, and compliment the similar helpers without "enc_" in the name. The only difference is that they have their own error message, to make it obvious if an unsupported flag error is related to FLOW_DISSECTOR_KEY_CONTROL or FLOW_DISSECTOR_KEY_ENC_CONTROL. flow_rule_has_enc_control_flags() is for drivers supporting FLOW_DISSECTOR_KEY_ENC_CONTROL, but not supporting any encapsulation control flags. (Currently all 4 drivers fits this category) flow_rule_is_supp_enc_control_flags() is currently only used for the above helper, but should also be used by drivers once they implement at least one encapsulation control flag. There is AFAICT currently no need for an "enc_" variant of flow_rule_match_has_control_flags(), as all drivers currently supporting FLOW_DISSECTOR_KEY_ENC_CONTROL, are already calling flow_rule_match_enc_control() directly. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-2-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net/ipv6: Fix the RT cache flush via sysctl using a previous delayPetr Pavlu
The net.ipv6.route.flush system parameter takes a value which specifies a delay used during the flush operation for aging exception routes. The written value is however not used in the currently requested flush and instead utilized only in the next one. A problem is that ipv6_sysctl_rtcache_flush() first reads the old value of net->ipv6.sysctl.flush_delay into a local delay variable and then calls proc_dointvec() which actually updates the sysctl based on the provided input. Fix the problem by switching the order of the two operations. Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEsElad Yifee
Add the missing pieces to allow multiple PPEs units, one for each GMAC. mtk_gdm_config has been modified to work on targted mac ID, the inner loop moved outside of the function to allow unrelated operations like setting the MAC's PPE index. Introduce a sanity check in flow_offload_replace to account for non-MTK ingress devices. Additional field 'ppe_idx' was added to struct mtk_mac in order to keep track on the assigned PPE unit. Signed-off-by: Elad Yifee <eladwf@gmail.com> Link: https://lore.kernel.org/r/20240607082155.20021-1-eladwf@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linuxLinus Torvalds
Pull ARM and clkdev fixes from Russell King: - Fix clkdev - erroring out on long strings causes boot failures, so don't do this. Still warn about the over-sized strings (which will never match and thus their registration with clkdev is useless) - Fix for ftrace with frame pointer unwinder with recent GCC changing the way frames are stacked. * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: ARM: 9405/1: ftrace: Don't assume stack frames are contiguous in memory clkdev: don't fail clkdev_alloc() if over-sized
2024-06-12Merge branch 'allow-configuration-of-multipath-hash-seed'Jakub Kicinski
Petr Machata says: ==================== Allow configuration of multipath hash seed Let me just quote the commit message of patch #2 here to inform the motivation and some of the implementation: When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patchset, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. Like those, the multipath hash seed is a per-netns variable. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. One example of use of this interface is avoiding hash polarization, where two ECMP routers, one behind the other, happen to make consistent hashing decisions, and as a result, part of the ECMP space of the latter router is never used. Another is a load balancer where several machines forward traffic to one of a number of leaves, and the forwarding decisions need to be made consistently. (This is a case of a desired hash polarization, mentioned e.g. in chapter 6.3 of [0].) There has already been a proposal to include a hash seed control interface in the past[1]. - Patches #1-#2 contain the substance of the work - Patch #3 is an mlxsw offload - Patches #4 and #5 are a selftest [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/ ==================== Link: https://lore.kernel.org/r/20240607151357.421181-1-petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12selftests: forwarding: router_mpath_hash: Add a new selftestPetr Machata
Add a selftest that exercises the sysctl added in the previous patches. Test that set/get works as expected; that across seeds we eventually hit all NHs (test_mpath_seed_*); and that a given seed keeps hitting the same NHs even across seed changes (test_mpath_seed_stability_*). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-6-petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>