summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-19mlxsw: pci: Optimize data buffer accessAmit Cohen
Before accessing data buffer, call net_prefetch() to load it into the cache. This change improves driver performance, CPU can handle about 7.1% more packets per second. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1fa07c510890866a6f201163ab7e78890ba28b3b.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Use page pool for Rx buffers allocationAmit Cohen
As part of driver init, all Rx queues are filled with buffers for hardware usage. Later, when a packet is received, a new buffer should be allocated to be used by hardware instead of the received buffer. Packet's processing time includes allocation time, which can be improved using page pool. Using page pool, DMA mapping is done only for first allocation of buffers. As subsequent buffers allocation avoid DMA mapping, it results in performance improvement. The purpose of page pool is to allocate pages fast from cache without locking. This lockless guarantee naturally comes from running under a NAPI. Use page pool to allocate the data buffer only, so hardware will use it to fill the packet. At completion time, attach the data buffer (now filled with packet payload) to new SKB which is allocated around the received buffer. SKB building at completion time prevents cache miss for each packet, as now the SKB is allocated right before packets will be handled by networking stack. Page pool for each Rx queue enhances Rx side performance by reclaiming buffers back to each queue specific pool. This change significantly improves driver performance, CPU can handle about 345% of the packets per second it previously handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1cf788a8f43c70aae6d526018ef77becb27ad6d3.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Initialize page pool per CQAmit Cohen
Next patch will use page pool to allocate buffers for RDQ. Initialize page pool for each CQ, which is mapped 1:1 to RDQ. Page pool for each Rx queue enhances Rx side performance by reclaiming buffers back to each queue specific pool. When only one NAPI instance is the consumer of pages from page pool, it is recommended to pass it as part of 'page_pool_params', then page pool APIs will be done without special locks. mlxsw driver holds NAPI instance per CQ, so add page pool per CQ and use the existing NAPI instance. For now, pages are not allocated from the pool, next patch will use it. Some notes regarding 'page_pool_params': * Use PP_FLAG_DMA_MAP to allow page pool handles DMA mapping, for now do not use sync flag, as only the device writes to this memory and we read it only when it finishes writing there. This will probably be changed when we will support XDP. * Define 'order' according to maximum MTU and take into account software overhead. Some round up are done, which means that we allocate more pages than we really need. This can be improved later by using fragmented buffers. * Use pool_size = MLXSW_PCI_WQE_COUNT. This will be the size of 'ptr_ring', and should be the maximum amount of packets that page pool will allocate memory for. In our case, this is the queue size, defined as MLXSW_PCI_WQE_COUNT. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/02e5856ae7c572d4293ce6bb92c286ee6cfec800.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Store CQ pointer as part of RDQ structureAmit Cohen
Next patches will add support for page pool in mlxsw driver. Page pool will be used to allocate buffers for RDQ and will use NAPI instance of the appropriate CQ (RDQ is mapped 1:1 to CQ). To allow pool initialization as part of CQ init, when NAPI is initialized, page_pool structure will be as part of CQ structure. Later, the allocations for RDQ will be done from the pool in the appropriate CQ. To allow access to the appropriate pool, set CQ pointer as part of RDQ initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/d60918ca1e142a554af1df9c1152cdac83854a3b.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19mlxsw: pci: Split NAPI setup/teardown into two stepsAmit Cohen
mlxsw_pci_cq_napi_setup() includes both NAPI initialization and enablement, similar to teardown function. Next patches will add support for page pool in mlxsw driver, then we use NAPI instance for page pool. Page pool initialization should be done before NAPI enablement, same for page pool destruction which should be done after NAPI disablement. As preparation, split NAPI setup/teardown into two steps, then page pool setup will be done between the phases. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/8dbf37e859f07247498fca17109b8858ff2b0498.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19net: hsr: cosmetic: Remove extra white spaceLukasz Majewski
This change just removes extra (i.e. not needed) white space in prp_drop_frame() function. No functional changes. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240618125817.1111070-1-lukma@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19net/tcp_ao: Don't leak ao_info on error-pathDmitry Safonov
It seems I introduced it together with TCP_AO_CMDF_AO_REQUIRED, on version 5 [1] of TCP-AO patches. Quite frustrative that having all these selftests that I've written, running kmemtest & kcov was always in todo. [1]: https://lore.kernel.org/netdev/20230215183335.800122-5-dima@arista.com/ Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240617072451.1403e1d2@kernel.org/ Fixes: 0aadc73995d0 ("net/tcp: Prevent TCP-MD5 with TCP-AO being set") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240619-tcp-ao-required-leak-v1-1-6408f3c94247@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19virtio_net: add support for Byte Queue LimitsJiri Pirko
Add support for Byte Queue Limits (BQL). Tested on qemu emulated virtio_net device with 1, 2 and 4 queues. Tested with fq_codel and pfifo_fast. Super netperf with 50 threads is running in background. Netperf TCP_RR results: NOBQL FQC 1q: 159.56 159.33 158.50 154.31 agv: 157.925 NOBQL FQC 2q: 184.64 184.96 174.73 174.15 agv: 179.62 NOBQL FQC 4q: 994.46 441.96 416.50 499.56 agv: 588.12 NOBQL PFF 1q: 148.68 148.92 145.95 149.48 agv: 148.2575 NOBQL PFF 2q: 171.86 171.20 170.42 169.42 agv: 170.725 NOBQL PFF 4q: 1505.23 1137.23 2488.70 3507.99 agv: 2159.7875 BQL FQC 1q: 1332.80 1297.97 1351.41 1147.57 agv: 1282.4375 BQL FQC 2q: 768.30 817.72 864.43 974.40 agv: 856.2125 BQL FQC 4q: 945.66 942.68 878.51 822.82 agv: 897.4175 BQL PFF 1q: 149.69 151.49 149.40 147.47 agv: 149.5125 BQL PFF 2q: 2059.32 798.74 1844.12 381.80 agv: 1270.995 BQL PFF 4q: 1871.98 4420.02 4916.59 13268.16 agv: 6119.1875 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240618144456.1688998-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19net: smc9194: add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/smsc/smc9194.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-smsc-v1-1-ad3d7200421e@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19net: ethernet: mac89x0: add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/cirrus/mac89x0.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-cirrus-v1-1-07f5bd0b64cb@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19net: amd: add missing MODULE_DESCRIPTION() macrosJeff Johnson
With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/a2065.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/ariadne.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/atarilance.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/hplance.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/7990.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/mvme147.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/sun3lance.o Add the missing invocation of the MODULE_DESCRIPTION() macro to all files which have a MODULE_LICENSE(). This includes drivers/net/ethernet/amd/lance.c which, although it did not produce a warning with the m68k allmodconfig configuration, may cause this warning with other configurations. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-amd-v1-1-50ee7a9ad50e@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19net: arcnet: com20020-isa: add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020-isa.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-arcnet-v1-1-90e42bc58102@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19ice: Fix VSI list rule with ICE_SW_LKUP_LAST typeMarcin Szycik
Adding/updating VSI list rule, as well as allocating/freeing VSI list resource are called several times with type ICE_SW_LKUP_LAST, which fails because ice_update_vsi_list_rule() and ice_aq_alloc_free_vsi_list() consider it invalid. Allow calling these functions with ICE_SW_LKUP_LAST. This fixes at least one issue in switchdev mode, where the same rule with different action cannot be added, e.g.: tc filter add dev $PF1 ingress protocol arp prio 0 flower skip_sw \ dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev $VF1_PR tc filter add dev $PF1 ingress protocol arp prio 0 flower skip_sw \ dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev $VF2_PR Fixes: 0f94570d0cae ("ice: allow adding advanced rules") Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240618210206.981885-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19ipv6: bring NLM_DONE out to a separate recv() againJakub Kicinski
Commit under Fixes optimized the number of recv() calls needed during RTM_GETROUTE dumps, but we got multiple reports of applications hanging on recv() calls. Applications expect that a route dump will be terminated with a recv() reading an individual NLM_DONE message. Coalescing NLM_DONE is perfectly legal in netlink, but even tho reporters fixed the code in respective projects, chances are it will take time for those applications to get updated. So revert to old behavior (for now)? This is an IPv6 version of commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again"). Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Link: https://lore.kernel.org/all/CANP3RGc1RG71oPEBXNx_WZFP9AyphJefdO4paczN92n__ds4ow@mail.gmail.com Reported-by: Stefano Brivio <sbrivio@redhat.com> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth Reported-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org Fixes: 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Tested-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/r/20240618193914.561782-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-19Merge tag 'probes-fixes-v6.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - Restrict gen-API tests for synthetic and kprobe events to only be built as modules, as they generate dynamic events that cannot be removed, causing ftracetest and startup selftests to fail * tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Build event generation tests only as modules
2024-06-19Merge tag 'mips-fixes_6.10_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fix for BCM6538 boards - fix RB532 PCI workaround * tag 'mips-fixes_6.10_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: Revert "MIPS: pci: lantiq: restore reset gpio polarity" mips: bmips: BCM6358: make sure CBR is correctly set MIPS: pci: lantiq: restore reset gpio polarity MIPS: Routerboard 532: Fix vendor retry check code
2024-06-19selftests: add selftest for the SRv6 End.DX6 behavior with netfilterJianguo Wu
this selftest is designed for evaluating the SRv6 End.DX6 behavior used with netfilter(rpfilter), in this example, for implementing IPv6 L3 VPN use cases. Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19selftests: add selftest for the SRv6 End.DX4 behavior with netfilterJianguo Wu
this selftest is designed for evaluating the SRv6 End.DX4 behavior used with netfilter(rpfilter), in this example, for implementing IPv4 L3 VPN use cases. Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter coreJianguo Wu
Currently, the sysctl net.netfilter.nf_hooks_lwtunnel depends on the nf_conntrack module, but the nf_conntrack module is not always loaded. Therefore, accessing net.netfilter.nf_hooks_lwtunnel may have an error. Move sysctl nf_hooks_lwtunnel into the netfilter core. Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 ↵Jianguo Wu
behaviors input_action_end_dx4() and input_action_end_dx6() are called NF_HOOK() for PREROUTING hook, in PREROUTING hook, we should passing a valid indev, and a NULL outdev to NF_HOOK(), otherwise may trigger a NULL pointer dereference, as below: [74830.647293] BUG: kernel NULL pointer dereference, address: 0000000000000090 [74830.655633] #PF: supervisor read access in kernel mode [74830.657888] #PF: error_code(0x0000) - not-present page [74830.659500] PGD 0 P4D 0 [74830.660450] Oops: 0000 [#1] PREEMPT SMP PTI ... [74830.664953] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [74830.666569] RIP: 0010:rpfilter_mt+0x44/0x15e [ipt_rpfilter] ... [74830.689725] Call Trace: [74830.690402] <IRQ> [74830.690953] ? show_trace_log_lvl+0x1c4/0x2df [74830.692020] ? show_trace_log_lvl+0x1c4/0x2df [74830.693095] ? ipt_do_table+0x286/0x710 [ip_tables] [74830.694275] ? __die_body.cold+0x8/0xd [74830.695205] ? page_fault_oops+0xac/0x140 [74830.696244] ? exc_page_fault+0x62/0x150 [74830.697225] ? asm_exc_page_fault+0x22/0x30 [74830.698344] ? rpfilter_mt+0x44/0x15e [ipt_rpfilter] [74830.699540] ipt_do_table+0x286/0x710 [ip_tables] [74830.700758] ? ip6_route_input+0x19d/0x240 [74830.701752] nf_hook_slow+0x3f/0xb0 [74830.702678] input_action_end_dx4+0x19b/0x1e0 [74830.703735] ? input_action_end_t+0xe0/0xe0 [74830.704734] seg6_local_input_core+0x2d/0x60 [74830.705782] lwtunnel_input+0x5b/0xb0 [74830.706690] __netif_receive_skb_one_core+0x63/0xa0 [74830.707825] process_backlog+0x99/0x140 [74830.709538] __napi_poll+0x2c/0x160 [74830.710673] net_rx_action+0x296/0x350 [74830.711860] __do_softirq+0xcb/0x2ac [74830.713049] do_softirq+0x63/0x90 input_action_end_dx4() passing a NULL indev to NF_HOOK(), and finally trigger a NULL dereference in rpfilter_mt()->rpfilter_is_loopback(): static bool rpfilter_is_loopback(const struct sk_buff *skb, const struct net_device *in) { // in is NULL return skb->pkt_type == PACKET_LOOPBACK || in->flags & IFF_LOOPBACK; } Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19netfilter: ipset: Fix suspicious rcu_dereference_protected()Jozsef Kadlecsik
When destroying all sets, we are either in pernet exit phase or are executing a "destroy all sets command" from userspace. The latter was taken into account in ip_set_dereference() (nfnetlink mutex is held), but the former was not. The patch adds the required check to rcu_dereference_protected() in ip_set_dereference(). Fixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type") Reported-by: syzbot+b62c37cdd58103293a5a@syzkaller.appspotmail.com Reported-by: syzbot+cfbe1da5fdfc39efc293@syzkaller.appspotmail.com Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-19selftests: openvswitch: Set value to nla flags.Adrian Moreno
Netlink flags, although they don't have payload at the netlink level, are represented as having "True" as value in pyroute2. Without it, trying to add a flow with a flag-type action (e.g: pop_vlan) fails with the following traceback: Traceback (most recent call last): File "[...]/ovs-dpctl.py", line 2498, in <module> sys.exit(main(sys.argv)) ^^^^^^^^^^^^^^ File "[...]/ovs-dpctl.py", line 2487, in main ovsflow.add_flow(rep["dpifindex"], flow) File "[...]/ovs-dpctl.py", line 2136, in add_flow reply = self.nlm_request( ^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request return tuple(self._genlm_request(*argv, **kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in nlm_request return tuple(super().nlm_request(*argv, **kwarg)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request self.put(msg, msg_type, msg_flags, msg_seq=msg_seq) File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put self.sendto_gate(msg, addr) File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate msg.encode() File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode offset = self.encode_nlas(offset) ^^^^^^^^^^^^^^^^^^^^^^^^ File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas nla_instance.setvalue(cell[1]) File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue nlv.setvalue(nla_tuple[1]) ~~~~~~~~~^^^ IndexError: list index out of range Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: dsa: mt7530: add support for bridge port isolationMatthias Schiffer
Remove a pair of ports from the port matrix when both ports have the isolated flag set. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: dsa: mt7530: factor out bridge join/leave logicMatthias Schiffer
As preparation for implementing bridge port isolation, move the logic to add and remove bits in the port matrix into a new helper mt7530_update_port_member(), which is called from mt7530_port_bridge_join() and mt7530_port_bridge_leave(). Another part of the preparation is using dsa_port_offloads_bridge_dev() instead of dsa_port_offloads_bridge() to check for bridge membership, as we don't have a struct dsa_bridge in mt7530_port_bridge_flags(). The port matrix setting is slightly streamlined, now always first setting the mt7530_port's pm field and then writing the port matrix from that field into the hardware register, instead of duplicating the bit manipulation for both the struct field and the register. mt7530_port_bridge_join() was previously using |= to update the port matrix with the port bitmap, which was unnecessary, as pm would only have the CPU port set before joining a bridge; a simple assignment can be used for both joining and leaving (and will also work when individual bits are added/removed in port_bitmap with regard to the previous port matrix, which is what happens with port isolation). No functional change intended. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19octeontx2-pf: Fix linking objects into multiple modulesGeetha sowjanya
This patch fixes the below build warning messages that are caused due to linking same files to multiple modules by exporting the required symbols. "scripts/Makefile.build:244: drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_devlink.o is added to multiple modules: rvu_nicpf rvu_nicvf scripts/Makefile.build:244: drivers/net/ethernet/marvell/octeontx2/nic/Makefile: otx2_dcbnl.o is added to multiple modules: rvu_nicpf rvu_nicvf" Fixes: 8e67558177f8 ("octeontx2-pf: PFC config support with DCBx"). Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19Merge branch 'net-drop-rx-socket-tracepoint'David S. Miller
Yan Zhai says: ==================== net: pass receive socket to drop tracepoint We set up our production packet drop monitoring around the kfree_skb tracepoint. While this tracepoint is extremely valuable for diagnosing critical problems, it also has some limitation with drops on the local receive path: this tracepoint can only inspect the dropped skb itself, but such skb might not carry enough information to: 1. determine in which netns/container this skb gets dropped 2. determine by which socket/service this skb oughts to be received The 1st issue is because skb->dev is the only member field with valid netns reference. But skb->dev can get cleared or reused. For example, tcp_v4_rcv will clear skb->dev and in later processing it might be reused for OFO tree. The 2nd issue is because there is no reference on an skb that reliably points to a receiving socket. skb->sk usually points to the local sending socket, and it only points to a receive socket briefly after early demux stage, yet the socket can get stolen later. For certain drop reason like TCP OFO_MERGE, Zerowindow, UDP at PROTO_MEM error, etc, it is hard to infer which receiving socket is impacted. This cannot be overcome by simply looking at the packet header, because of complications like sk lookup programs. In the past, single purpose tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full, etc are added as needed to provide more visibility. This could be handled in a more generic way. In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in replacement for kfree_skb_reason at various local input path. It accepts an extra receiving socket argument. Both issues above can be resolved via this new argument. V4->V5: rename rx_skaddr to rx_sk to be more clear visually, suggested by Jesper Dangaard Brouer. V3->V4: adjusted the TP_STRUCT field order to align better, suggested by Steven Rostedt. V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks; Added a few missing report tags from test bots (also noticed by Dan Carpenter and Simon Horman). V1->V2: instead of using skb->cb, directly add the needed argument to trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet suggested. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19af_packet: use sk_skb_reason_drop to free rx packetsYan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011859.Aacus8GV-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19udp: use sk_skb_reason_drop to free rx packetsYan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011751.NpVN0sSk-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19tcp: use sk_skb_reason_drop to free rx packetsYan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: raw: use sk_skb_reason_drop to free rx packetsYan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19ping: use sk_skb_reason_drop to free rx packetsYan Zhai
Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: introduce sk_skb_reason_drop functionYan Zhai
Long used destructors kfree_skb and kfree_skb_reason do not pass receiving socket to packet drop tracepoints trace_kfree_skb. This makes it hard to track packet drops of a certain netns (container) or a socket (user application). The naming of these destructors are also not consistent with most sk/skb operating functions, i.e. functions named "sk_xxx" or "skb_xxx". Introduce a new functions sk_skb_reason_drop as drop-in replacement for kfree_skb_reason on local receiving path. Callers can now pass receiving sockets to the tracepoints. kfree_skb and kfree_skb_reason are still usable but they are now just inline helpers that call sk_skb_reason_drop. Note it is not feasible to do the same to consume_skb. Packets not dropped can flow through multiple receive handlers, and have multiple receiving sockets. Leave it untouched for now. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: add rx_sk to trace_kfree_skbYan Zhai
skb does not include enough information to find out receiving sockets/services and netns/containers on packet drops. In theory skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP stack for OOO packet lookup. Similarly, skb->sk often identifies a local sender, and tells nothing about a receiver. Allow passing an extra receiving socket to the tracepoint to improve the visibility on receiving drops. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19octeontx2-pf: Add error handling to VLAN unoffload handlingSimon Horman
otx2_sq_append_skb makes used of __vlan_hwaccel_push_inside() to unoffload VLANs - push them from skb meta data into skb data. However, it omitts a check for __vlan_hwaccel_push_inside() returning NULL. Found by inspection based on [1] and [2]. Compile tested only. [1] Re: [PATCH net-next v1] net: stmmac: Enable TSO on VLANs https://lore.kernel.org/all/ZmrN2W8Fye450TKs@shell.armlinux.org.uk/ [2] Re: [PATCH net-next v2] net: stmmac: Enable TSO on VLANs https://lore.kernel.org/all/CANn89i+11L5=tKsa7V7Aeyxaj6nYGRwy35PAbCRYJ73G+b25sg@mail.gmail.com/ Fixes: fd9d7859db6c ("octeontx2-pf: Implement ingress/egress VLAN offload") Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19Merge branch 'am65x-ptp'David S. Miller
Diogo Ivo says ==================== Enable PTP timestamping/PPS for AM65x SR1.0 devices This patch series enables support for PTP in AM65x SR1.0 devices. This feature relies heavily on the Industrial Ethernet Peripheral (IEP) hardware module, which implements a hardware counter through which time is kept. This hardware block is the basis for exposing a PTP hardware clock to userspace and for issuing timestamps for incoming/outgoing packets, allowing for time synchronization. The IEP also has compare registers that fire an interrupt when the counter reaches the value stored in a compare register. This feature allows us to support PPS events in the kernel. The changes are separated into five patches: - PATCH 01/05: Register SR1.0 devices with the IEP infrastructure to expose a PHC clock to userspace, allowing time to be adjusted using standard PTP tools. The code for issuing/ collecting packet timestamps is already present in the current state of the driver, so only this needs to be done. - PATCH 02/05: Remove unnecessary spinlock synchronization. - PATCH 03/05: Document IEP interrupt in DT binding. - PATCH 04/05: Add support for IEP compare event/interrupt handling to enable PPS events. - PATCH 05/05: Add the interrupts to the IOT2050 device tree. Currently every compare event generates two interrupts, the first corresponding to the actual event and the second being a spurious but otherwise harmless interrupt. The root cause of this has been identified and has been solved in the platform's SDK. A forward port of the SDK's patches also fixes the problem in upstream but is not included here since it's upstreaming is out of the scope of this series. If someone from TI would be willing to chime in and help get the interrupt changes upstream that would be great! Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> --- Changes in v4: - Remove unused 'flags' variables in patch 02/05 - Add patch 03/05 describing IEP interrupt in DT binding - Link to v3: https://lore.kernel.org/r/20240607-iep-v3-0-4824224105bc@siemens.com Changes in v3: - Collect Reviewed-by tags - Add patch 02/04 removing spinlocks from IEP driver - Use mutex-based synchronization when accessing HW registers - Link to v2: https://lore.kernel.org/r/20240604-iep-v2-0-ea8e1c0a5686@siemens.com Changes in v2: - Collect Reviewed-by tags - PATCH 01/03: Limit line length to 80 characters - PATCH 02/03: Proceed with limited functionality if getting IRQ fails, limit line length to 80 characters - Link to v1: https://lore.kernel.org/r/20240529-iep-v1-0-7273c07592d3@siemens.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19arm64: dts: ti: iot2050: Add IEP interrupts for SR1.0 devicesDiogo Ivo
Add the interrupts needed for PTP Hardware Clock support via IEP in SR1.0 devices. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: ti: icss-iep: Enable compare eventsDiogo Ivo
The IEP module supports compare events, in which a value is written to a hardware register and when the IEP counter reaches the written value an interrupt is generated. Add handling for this interrupt in order to support PPS events. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19dt-bindings: net: Add IEP interruptDiogo Ivo
The IEP interrupt is used in order to support both capture events, where an incoming external signal gets timestamped on arrival, and compare events, where an interrupt is generated internally when the IEP counter reaches a programmed value. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: ti: icss-iep: Remove spinlock-based synchronizationDiogo Ivo
As all sources of concurrency in hardware register access occur in non-interrupt context eliminate spinlock-based synchronization and rely on the mutex-based synchronization that is already present. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: ti: icssg-prueth: Enable PTP timestamping support for SR1.0 devicesDiogo Ivo
Enable PTP support for AM65x SR1.0 devices by registering with the IEP infrastructure in order to expose a PTP clock to userspace. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19Merge branch 'virtio_net-csum-xdp-fixes'David S. Miller
Heng Qi says: ==================== virtio_net: fixes for checksum offloading and XDP handling This series of patches aim to address two specific issues identified in the virtio_net driver related to checksum offloading and XDP processing of fully checksummed packets. The first patch corrects the handling of checksum offloading in the driver. The second patch addresses an issue where the XDP program had no trouble with fully checksummed packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19virtio_net: fixing XDP for fully checksummed packets handlingHeng Qi
The XDP program can't correctly handle partially checksummed packets, but works fine with fully checksummed packets. If the device has already validated fully checksummed packets, then the driver doesn't need to re-validate them, saving CPU resources. Additionally, the driver does not drop all partially checksummed packets when VIRTIO_NET_F_GUEST_CSUM is not negotiated. This is not a bug, as the driver has always done this. Fixes: 436c9453a1ac ("virtio-net: keep vnet header zeroed after processing XDP") Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19virtio_net: checksum offloading handling fixHeng Qi
In virtio spec 0.95, VIRTIO_NET_F_GUEST_CSUM was designed to handle partially checksummed packets, and the validation of fully checksummed packets by the device is independent of VIRTIO_NET_F_GUEST_CSUM negotiation. However, the specification erroneously stated: "If VIRTIO_NET_F_GUEST_CSUM is not negotiated, the device MUST set flags to zero and SHOULD supply a fully checksummed packet to the driver." This statement is inaccurate because even without VIRTIO_NET_F_GUEST_CSUM negotiation, the device can still set the VIRTIO_NET_HDR_F_DATA_VALID flag. Essentially, the device can facilitate the validation of these packets' checksums - a process known as RX checksum offloading - removing the need for the driver to do so. This scenario is currently not implemented in the driver and requires correction. The necessary specification correction[1] has been made and approved in the virtio TC vote. [1] https://lists.oasis-open.org/archives/virtio-comment/202401/msg00011.html Fixes: 4f49129be6fa ("virtio-net: Set RXCSUM feature if GUEST_CSUM is available") Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19net: usb: ax88179_178a: improve reset checkJose Ignacio Tornos Martinez
After ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is set to down/up") to not reset from usbnet_open after the reset from usbnet_probe at initialization stage to speed up this, some issues have been reported. It seems to happen that if the initialization is slower, and some time passes between the probe operation and the open operation, the second reset from open is necessary too to have the device working. The reason is that if there is no activity with the phy, this is "disconnected". In order to improve this, the solution is to detect when the phy is "disconnected", and we can use the phy status register for this. So we will only reset the device from reset operation in this situation, that is, only if necessary. The same bahavior is happening when the device is stopped (link set to down) and later is restarted (link set to up), so if the phy keeps working we only need to enable the mac again, but if enough time passes between the device stop and restart, reset is necessary, and we can detect the situation checking the phy status register too. cc: stable@vger.kernel.org # 6.6+ Fixes: ecf848eb934b ("net: usb: ax88179_178a: fix link status when link is set to down/up") Reported-by: Yongqin Liu <yongqin.liu@linaro.org> Reported-by: Antje Miederhöfer <a.miederhoefer@gmx.de> Reported-by: Arne Fitzenreiter <arne_f@ipfire.org> Tested-by: Yongqin Liu <yongqin.liu@linaro.org> Tested-by: Antje Miederhöfer <a.miederhoefer@gmx.de> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-19rds:Simplify the allocation of slab cachesHongfu Li
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-18net: mana: Add support for page sizes other than 4KB on ARM64Haiyang Zhang
As defined by the MANA Hardware spec, the queue size for DMA is 4KB minimal, and power of 2. And, the HWC queue size has to be exactly 4KB. To support page sizes other than 4KB on ARM64, define the minimal queue size as a macro separately from the PAGE_SIZE, which we always assumed it to be 4KB before supporting ARM64. Also, add MANA specific macros and update code related to size alignment, DMA region calculations, etc. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1718655446-6576-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18Merge branch 'net-mlx4_en-use-ethtool_puts-sprintf'Jakub Kicinski
Kamal Heib says: ==================== net/mlx4_en: Use ethtool_puts/sprintf This patchset updates the mlx4_en driver to use the ethtool_puts and ethtool_sprintf helper functions. Signed-off-by: Kamal Heib <kheib@redhat.com> ==================== Link: https://lore.kernel.org/r/20240617172329.239819-1-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net/mlx4_en: Use ethtool_puts/sprintf to fill stats stringsKamal Heib
Use the ethtool_puts/ethtool_sprintf helper to print the stats strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-4-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net/mlx4_en: Use ethtool_puts to fill selftest stringsKamal Heib
Use the ethtool_puts helper to print the selftest strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-3-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net/mlx4_en: Use ethtool_puts to fill priv flags stringsKamal Heib
Use the ethtool_puts helper to print the priv flags strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-2-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>