summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-02net: core: annotate socks of struct sock_reuseport with __counted_byDmitry Antipov
According to '__reuseport_alloc()', annotate flexible array member 'sock' of 'struct sock_reuseport' with '__counted_by()' and use convenient 'struct_size()' to simplify the math used in 'kzalloc()'. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240801142311.42837-1-dmantipov@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02tipc: guard against string buffer overrunSimon Horman
Smatch reports that copying media_name and if_name to name_parts may overwrite the destination. .../bearer.c:166 bearer_name_validate() error: strcpy() 'media_name' too large for 'name_parts->media_name' (32 vs 16) .../bearer.c:167 bearer_name_validate() error: strcpy() 'if_name' too large for 'name_parts->if_name' (1010102 vs 16) This does seem to be the case so guard against this possibility by using strscpy() and failing if truncation occurs. Introduced by commit b97bf3fd8f6a ("[TIPC] Initial merge") Compile tested only. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240801-tipic-overrun-v2-1-c5b869d1f074@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02Merge branch 'ibmveth-rr-performance'Jakub Kicinski
Nick Child says: ==================== ibmveth RR performance This patchset aims to increase the ibmveth drivers small packet request response rate. These 2 patches address: 1. NAPI rescheduling technique 2. Driver-side processing of small packets Testing over several netperf tcp_rr connections, we saw a 30% increase in transactions per second. No regressions were observed in other workloads. ==================== Link: https://patch.msgid.link/20240801211215.128101-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02ibmveth: Recycle buffers during replenish phaseNick Child
When the length of a packet is under the rx_copybreak threshold, the buffer is copied into a new skb and sent up the stack. This allows the dma mapped memory to be recycled back to FW. Previously, the reuse of the DMA space was handled immediately. This means that further packet processing has to wait until h_add_logical_lan finishes for this packet. Therefore, when reusing a packet, offload the hcall to the replenish function. As a result, much of the shared logic between the recycle and replenish functions can be removed. This change increases TCP_RR packet rate by another 15% (370k to 430k txns). We can see the ftrace data supports this: PREV: ibmveth_poll = 8078553.0 us / 190999.0 hits = AVG 42.3 us NEW: ibmveth_poll = 7632787.0 us / 224060.0 hits = AVG 34.07 us Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20240801211215.128101-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02ibmveth: Optimize poll rescheduling processNick Child
When the ibmveth driver processes less than the budget, it must call napi_complete_done() to release the instance. This function will return false if the driver should avoid rearming interrupts. Previously, the driver was ignoring the return code of napi_complete_done(). As a result, there were unnecessary calls to enable the veth irq. Therefore, use the return code napi_complete_done() to determine if irq rearm is necessary. Additionally, in the event that new data is received immediately after rearming interrupts, rather than just rescheduling napi, also jump back to the poll processing loop since we are already in the poll function (and know that we did not expense all of budget). This slight tweak results in a 15% increase in TCP_RR transaction rate (320k to 370k txns). We can see the ftrace data supports this: PREV: ibmveth_poll = 8818014.0 us / 182802.0 hits = AVG 48.24 NEW: ibmveth_poll = 8082398.0 us / 191413.0 hits = AVG 42.22 Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20240801211215.128101-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02linkmode: Change return type of linkmode_andnot to boolSimon Horman
linkmode_andnot() simply returns the result of bitmap_andnot(). And the return type of bitmap_andnot() is bool. So it makes sense for the return type of linkmode_andnot() to also be bool. I checked all call-sites and they either ignore the return value or treat it as a bool. Compile tested only. Link: https://lore.kernel.org/netdev/68088998-4486-4930-90a4-96a32f08c490@lunn.ch/ Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240801-linkfield-bowl-v1-1-d58f68967802@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02Merge branch 'add-second-qdma-support-for-en7581-eth-controller'Jakub Kicinski
Lorenzo Bianconi says: ==================== Add second QDMA support for EN7581 eth controller EN7581 SoC supports two independent QDMA controllers to connect the Ethernet Frame Engine (FE) to the CPU. Introduce support for the second QDMA controller. This is a preliminary series to support multiple FE ports (e.g. connected to a second PHY controller). ==================== Link: https://patch.msgid.link/cover.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Link the gdm port to the selected qdma controllerLorenzo Bianconi
Link the running gdm port to the qdma controller used to connect with the CPU. Moreover, load all QDMA controllers available on EN7581 SoC. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/95b515df34ba4727f7ae5b14a1d0462cceec84ff.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Start all qdma NAPIs in airoha_probe()Lorenzo Bianconi
This is a preliminary patch to support multi-QDMA controllers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/b51cf69c94d8cbc81e0a0b35587f024d01e6d9c0.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Allow mapping IO region for multiple qdma controllersLorenzo Bianconi
Map MMIO regions of both qdma controllers available on EN7581 SoC. Run airoha_hw_cleanup routine for both QDMA controllers available on EN7581 SoC removing airoha_eth module or in airoha_probe error path. This is a preliminary patch to support multi-QDMA controllers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/a734ae608da14b67ae749b375d880dbbc70868ea.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Use qdma pointer as private structure in airoha_irq_handler routineLorenzo Bianconi
This is a preliminary patch to support multi-QDMA controllers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/1e40c3cb973881c0eb3c3c247c78550da62054ab.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Add airoha_qdma pointer in airoha_tx_irq_queue/airoha_queue ↵Lorenzo Bianconi
structures Move airoha_eth pointer in airoha_qdma structure from airoha_tx_irq_queue/airoha_queue ones. This is a preliminary patch to introduce support for multi-QDMA controllers available on EN7581. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/074565b82fd0ceefe66e186f21133d825dbd48eb.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Move irq_mask in airoha_qdma structureLorenzo Bianconi
QDMA controllers have independent irq lines, so move irqmask in airoha_qdma structure. This is a preliminary patch to support multiple QDMA controllers. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/1c8a06e8be605278a7b2f3cd8ac06e74bf5ebf2b.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Move airoha_queues in airoha_qdmaLorenzo Bianconi
QDMA controllers available in EN7581 SoC have independent tx/rx hw queues so move them in airoha_queues structure. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/795fc4797bffbf7f0a1351308aa9bf0e65b5126e.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: airoha: Introduce airoha_qdma structLorenzo Bianconi
Introduce airoha_qdma struct and move qdma IO register mapping in airoha_qdma. This is a preliminary patch to enable both QDMA controllers available on EN7581 SoC. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/7df163bdc72ee29c3d27a0cbf54522ffeeafe53c.1722522582.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02eth: fbnic: select DEVLINK and PAGE_POOLSimon Horman
Build bot reports undefined references to devlink functions. And local testing revealed undefined references to page_pool functions. Based on a patch by Jakub Kicinski <kuba@kernel.org> Fixes: 1a9d48892ea5 ("eth: fbnic: Allocate core device specific structures and devlink interface") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408011219.hiPmwwAs-lkp@intel.com/ Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240802-fbnic-select-v2-1-41f82a3e0178@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02selftests: net: ksft: print more of the stack for checksJakub Kicinski
Print more stack frames and the failing line when check fails. This helps when tests use helpers to do the checks. Before: # At ./ksft/drivers/net/hw/rss_ctx.py line 92: # Check failed 1037698 >= 396893.0 traffic on other queues:[344612, 462380, 233020, 449174, 342298] not ok 8 rss_ctx.test_rss_context_queue_reconfigure After: # Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 387, in test_rss_context_queue_reconfigure: # Check| test_rss_queue_reconfigure(cfg, main_ctx=False) # Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 230, in test_rss_queue_reconfigure: # Check| _send_traffic_check(cfg, port, ctx_ref, { 'target': (0, 3), # Check| At ./ksft/drivers/net/hw/rss_ctx.py, line 92, in _send_traffic_check: # Check| ksft_lt(sum(cnts[i] for i in params['noise']), directed / 2, # Check failed 1045235 >= 405823.5 traffic on other queues (context 1)':[460068, 351995, 565970, 351579, 127270] not ok 8 rss_ctx.test_rss_context_queue_reconfigure Link: https://patch.msgid.link/20240801232317.545577-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02selftests: net: ksft: replace 95 with errno.EOPNOTSUPPStanislav Fomichev
Petr suggested to use errno.EOPNOTSUPP instead of hard-coded 95 in the new test case. Adjust existing ones to match this style. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20240802000309.2368-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02selftests: net: ksft: support marking tests as disruptiveStanislav Fomichev
Add new @ksft_disruptive decorator to mark the tests that might be disruptive to the system. Depending on how well the previous test works in the CI we might want to disable disruptive tests by default and only let the developers run them manually. KSFT framework runs disruptive tests by default. DISRUPTIVE=False environment (or config file) can be used to disable these tests. ksft_setup should be called by the test cases that want to use new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now). In the future we can add similar decorators to, for example, avoid running slow tests all the time. And/or have some option to run only 'fast' tests for some sort of smoke test scenario. $ DISRUPTIVE=False ./stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # SKIP marked as disruptive # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:1 error:0 v3: - parse yes and properly treat non-zero nums as true (Petr) v2: - convert from cli argument to env variable (Jakub) Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20240802000309.2368-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02selftests: net-drv: exercise queue stats when the device is downStanislav Fomichev
Verify that total device stats don't decrease after it has been turned down. Also make sure the device doesn't crash when we access per-queue stats when it's down (in case it tries to access some pointers that are NULL). KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0 v3: - use errno.EOPNOTSUPP (Petr) - move qstat[0] under try (Petr) v2: - KTAP output formatting (Jakub) - defer instead of try/finally (Jakub) - disappearing stats is an error (Jakub) - ksft_ge instead of open coding (Jakub) Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20240802000309.2368-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02net: remove IFF_* re-definitionJakub Kicinski
We re-define values of enum netdev_priv_flags as preprocessor macros with the same name. I guess this was done to avoid breaking out of tree modules which may use #ifdef X for kernel compatibility? Commit 7aa98047df95 ("net: move net_device priv_flags out from UAPI") which added the enum doesn't say. In any case, the flags with defines are quite old now, and defines for new flags don't get added. OOT drivers have to resort to code greps for compat detection, anyway. Let's delete these defines, save LoC, help LXR link to the right place. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20240801163401.378723-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02Merge branch 'axienet-coding-style' into mainDavid S. Miller
Radhey Shyam Pandey says: ==================== net: axienet: Fix coding style issues This patchset replace all occurences of (1<<x) by BIT(x) to get rid of checkpatch.pl "CHECK" output "Prefer using the BIT macro". It also removes unnecessary ftrace-like logging, add missing blank line after declaration and remove unnecessary parentheses around 'ndev->mtu <= XAE_JUMBO_MTU' and 'ndev->mtu > XAE_MTU'. Changes for v2: - Split each coding style change into separate patch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02net: axienet: remove unnecessary parenthesesRadhey Shyam Pandey
Remove unnecessary parentheses around 'ndev->mtu <= XAE_JUMBO_MTU' and 'ndev->mtu > XAE_MTU'. Reported by checkpatch. CHECK: Unnecessary parentheses around 'ndev->mtu > XAE_MTU' + if ((ndev->mtu > XAE_MTU) && + (ndev->mtu <= XAE_JUMBO_MTU)) { CHECK: Unnecessary parentheses around 'ndev->mtu <= XAE_JUMBO_MTU' + if ((ndev->mtu > XAE_MTU) && + (ndev->mtu <= XAE_JUMBO_MTU)) { Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02net: axienet: remove unnecessary ftrace-like loggingRadhey Shyam Pandey
remove unnecessary ftrace-like logging. Also fixes below checkpatch WARNING. WARNING: Unnecessary ftrace-like logging - prefer using ftrace + dev_dbg(&ndev->dev, "%s\n", __func__); Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02net: axienet: add missing blank line after declarationRadhey Shyam Pandey
Add missing blank line after declaration. Fixes below checkpatch warnings. WARNING: Missing a blank line after declarations + struct sockaddr *addr = p; + axienet_set_mac_address(ndev, addr->sa_data); WARNING: Missing a blank line after declarations + struct axienet_local *lp = netdev_priv(ndev); + disable_irq(lp->tx_irq); Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02net: axienet: Replace the occurrences of (1<<x) by BIT(x)Appana Durga Kedareswara Rao
Replace all occurences of (1<<x) by BIT(x) to get rid of checkpatch.pl "CHECK" output "Prefer using the BIT macro". Signed-off-by: Appana Durga Kedareswara Rao <appana.durga.rao@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02Merge branch 'vsock-virtio' into mainDavid S. Miller
Luigi Leonardi says: ==================== vsock: avoid queuing on intermediate queue if possible This series introduces an optimization for vsock/virtio to reduce latency and increase the throughput: When the guest sends a packet to the host, and the intermediate queue (send_pkt_queue) is empty, if there is enough space, the packet is put directly in the virtqueue. v3->v4 While running experiments on fio with 64B payload, I realized that there was a mistake in my fio configuration, so I re-ran all the experiments and now the latency numbers are indeed lower with the patch applied. I also noticed that I was kicking the host without the lock. - Fixed a configuration mistake on fio and re-ran all experiments. - Fio latency measurement using 64B payload. - virtio_transport_send_skb_fast_path sends kick with the tx_lock acquired - Addressed all minor style changes requested by maintainer. - Rebased on latest net-next - Link to v3: https://lore.kernel.org/r/20240711-pinna-v3-0-697d4164fe80@outlook.com v2->v3 - Performed more experiments using iperf3 using multiple streams - Handling of reply packets removed from virtio_transport_send_skb, as is needed just by the worker. - Removed atomic_inc/atomic_sub when queuing directly to the vq. - Introduced virtio_transport_send_skb_fast_path that handles the steps for sending on the vq. - Fixed a missing mutex_unlock in error path. - Changed authorship of the second commit - Rebased on latest net-next v1->v2 In this v2 I replaced a mutex_lock with a mutex_trylock because it was insidea RCU critical section. I also added a check on tx_run, so if the module is being removed the packet is not queued. I'd like to thank Stefano for reporting the tx_run issue. Applied all Stefano's suggestions: - Minor code style changes - Minor commit text rewrite Performed more experiments: - Check if all the packets go directly to the vq (Matias' suggestion) - Used iperf3 to see if there is any improvement in overall throughput from guest to host - Pinned the vhost process to a pCPU. - Run fio using 512B payload Rebased on latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02test/vsock: add ioctl unsent bytes testLuigi Leonardi
Introduce two tests, one for SOCK_STREAM and one for SOCK_SEQPACKET, which use SIOCOUTQ ioctl to check that the number of unsent bytes is zero after delivering a packet. vsock_connect and vsock_accept are no longer static: this is to create more generic tests, allowing code to be reused for SEQPACKET and STREAM. Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02vsock/virtio: add SIOCOUTQ support for all virtio based transportsLuigi Leonardi
Introduce support for virtio_transport_unsent_bytes ioctl for virtio_transport, vhost_vsock and vsock_loopback. For all transports the unsent bytes counter is incremented in virtio_transport_get_credit. In virtio_transport (G2H) and in vhost-vsock (H2G) the counter is decremented when the skbuff is consumed. In vsock_loopback the same skbuff is passed from the transmitter to the receiver, so the counter is decremented before queuing the skbuff to the receiver. Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-02vsock: add support for SIOCOUTQ ioctlLuigi Leonardi
Add support for ioctl(s) in AF_VSOCK. The only ioctl available is SIOCOUTQ/TIOCOUTQ, which returns the number of unsent bytes in the socket. This information is transport-specific and is delegated to them using a callback. Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-01net: Use of_property_read_bool()Rob Herring (Arm)
Use of_property_read_bool() to read boolean properties rather than of_find_property(). This is part of a larger effort to remove callers of of_find_property() and similar functions. of_find_property() leaks the DT struct property and data pointers which is a problem for dynamically allocated nodes which may be freed. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Link: https://patch.msgid.link/20240731191601.1714639-2-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01net: mdio: Use of_property_count_u32_elems() to get property lengthRob Herring (Arm)
Replace of_get_property() with the type specific of_property_count_u32_elems() to get the property length. This is part of a larger effort to remove callers of of_get_property() and similar functions. of_get_property() leaks the DT property data pointer which is a problem for dynamically allocated nodes which may be freed. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240731201514.1839974-2-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01net: phy: qca807x: Drop unnecessary and broken DT validationRob Herring (Arm)
The check for "leds" and "gpio-controller" both being present is never true because "leds" is a node, not a property. This could be fixed with a check for child node, but there's really no need for the kernel to validate a DT. Just continue ignoring the LEDs if GPIOs are present. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240731201703.1842022-2-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01net: mctp: Consistent peer address handling in ioctl tag allocationJohn Wang
When executing ioctl to allocate tags, if the peer address is 0, mctp_alloc_local_tag now replaces it with 0xff. However, during tag dropping, this replacement is not performed, potentially causing the key not to be dropped as expected. Signed-off-by: John Wang <wangzhiqiang02@ieisystem.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20240730084636.184140-1-wangzhiqiang02@ieisystem.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240801131917.34494-1-pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01Merge tag 'net-6.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from wireless, bleutooth, BPF and netfilter. Current release - regressions: - core: drop bad gso csum_start and offset in virtio_net_hdr - wifi: mt76: fix null pointer access in mt792x_mac_link_bss_remove - eth: tun: add missing bpf_net_ctx_clear() in do_xdp_generic() - phy: aquantia: only poll GLOBAL_CFG regs on aqr113, aqr113c and aqr115c Current release - new code bugs: - smc: prevent UAF in inet_create() - bluetooth: btmtk: fix kernel crash when entering btmtk_usb_suspend - eth: bnxt: reject unsupported hash functions Previous releases - regressions: - sched: act_ct: take care of padding in struct zones_ht_key - netfilter: fix null-ptr-deref in iptable_nat_table_init(). - tcp: adjust clamping window for applications specifying SO_RCVBUF Previous releases - always broken: - ethtool: rss: small fixes to spec and GET - mptcp: - fix signal endpoint re-add - pm: fix backup support in signal endpoints - wifi: ath12k: fix soft lockup on suspend - eth: bnxt_en: fix RSS logic in __bnxt_reserve_rings() - eth: ice: fix AF_XDP ZC timeout and concurrency issues - eth: mlx5: - fix missing lock on sync reset reload - fix error handling in irq_pool_request_irq" * tag 'net-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits) mptcp: fix duplicate data handling mptcp: fix bad RCVPRUNED mib accounting ipv6: fix ndisc_is_useropt() handling for PIO igc: Fix double reset adapter triggered from a single taprio cmd net: MAINTAINERS: Demote Qualcomm IPA to "maintained" net: wan: fsl_qmc_hdlc: Discard received CRC net: wan: fsl_qmc_hdlc: Convert carrier_lock spinlock to a mutex net/mlx5e: Add a check for the return value from mlx5_port_set_eth_ptys net/mlx5e: Fix CT entry update leaks of modify header context net/mlx5e: Require mlx5 tc classifier action support for IPsec prio capability net/mlx5: Fix missing lock on sync reset reload net/mlx5: Lag, don't use the hardcoded value of the first port net/mlx5: DR, Fix 'stack guard page was hit' error in dr_rule net/mlx5: Fix error handling in irq_pool_request_irq net/mlx5: Always drain health in shutdown callback net: Add skbuff.h to MAINTAINERS r8169: don't increment tx_dropped in case of NETDEV_TX_BUSY netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init(). netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init(). net: drop bad gso csum_start and offset in virtio_net_hdr ...
2024-08-01ethtool: Don't check for NULL info in prepare_data callbacksSimon Horman
Since commit f946270d05c2 ("ethtool: netlink: always pass genl_info to .prepare_data") the info argument of prepare_data callbacks is never NULL. Remove checks present in callback implementations. Link: https://lore.kernel.org/netdev/20240703121237.3f8b9125@kernel.org/ Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240731-prepare_data-null-check-v1-1-627f2320678f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01RDS: IB: Remove unused declarationsYue Haibing
Commit f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance") removed rds_ib_recv_tasklet_fn() implementation but not the declaration. And commit ec16227e1414 ("RDS/IB: Infiniband transport") declared but never implemented other functions. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Link: https://patch.msgid.link/20240731063630.3592046-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01net: ethernet: mtk_eth_soc: drop clocks unused by Ethernet driverDaniel Golle
Clocks for SerDes and PHY are going to be handled by standalone drivers for each of those hardware components. Drop them from the Ethernet driver. The clocks which are being removed for this patch are responsible for the for the SerDes PCS and PHYs used for the 2nd and 3rd MAC which are anyway not yet supported. Hence backwards compatibility is not an issue. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/b5faaf69b5c6e3e155c64af03706c3c423c6a1c9.1722335682.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01net/mlx5: Reclaim max 50K pages at onceAnand Khoje
In non FLR context, at times CX-5 requests release of ~8 million FW pages. This needs humongous number of cmd mailboxes, which to be released once the pages are reclaimed. Release of humongous number of cmd mailboxes is consuming cpu time running into many seconds. Which with non preemptible kernels is leading to critical process starving on that cpu’s RQ. On top of it, the FW does not use all the mailbox messages as it has a limit of releasing 50K pages at once per MLX5_CMD_OP_MANAGE_PAGES + MLX5_PAGES_TAKE device command. Hence, the allocation of these many mailboxes is extra and adds unnecessary overhead. To alleviate this, this change restricts the total number of pages a worker will try to reclaim to maximum 50K pages in one go. Our tests have shown significant benefit of this change in terms of time consumed by dma_pool_free(). During a test where an event was raised by HCA to release 1.3 Million pages, following observations were made: - Without this change: Number of mailbox messages allocated was around 20K, to accommodate the DMA addresses of 1.3 million pages. The average time spent by dma_pool_free() to free the DMA pool is between 16 usec to 32 usec. value ------------- Distribution ------------- count 256 | 0 512 |@ 287 1024 |@@@ 1332 2048 |@ 656 4096 |@@@@@ 2599 8192 |@@@@@@@@@@ 4755 16384 |@@@@@@@@@@@@@@@ 7545 32768 |@@@@@ 2501 65536 | 0 - With this change: Number of mailbox messages allocated was around 800; this was to accommodate DMA addresses of only 50K pages. The average time spent by dma_pool_free() to free the DMA pool in this case lies between 1 usec to 2 usec. value ------------- Distribution ------------- count 256 | 0 512 |@@@@@@@@@@@@@@@@@@ 346 1024 |@@@@@@@@@@@@@@@@@@@@@@ 435 2048 | 0 4096 | 0 8192 | 1 16384 | 0 Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240730073634.114407-1-anand.a.khoje@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-01Merge branch 'mptcp-fix-duplicate-data-handling'Paolo Abeni
Matthieu Baerts says: ==================== mptcp: fix duplicate data handling In some cases, the subflow-level's copied_seq counter was incorrectly increased, leading to an unexpected subflow reset. Patch 1/2 fixes the RCVPRUNED MIB counter that was attached to the wrong event since its introduction in v5.14, backported to v5.11. Patch 2/2 fixes the copied_seq counter issues, is present since v5.10. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> ==================== Link: https://patch.msgid.link/20240731-upstream-net-20240731-mptcp-dup-data-v1-0-bde833fa628a@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01mptcp: fix duplicate data handlingPaolo Abeni
When a subflow receives and discards duplicate data, the mptcp stack assumes that the consumed offset inside the current skb is zero. With multiple subflows receiving data simultaneously such assertion does not held true. As a result the subflow-level copied_seq will be incorrectly increased and later on the same subflow will observe a bad mapping, leading to subflow reset. Address the issue taking into account the skb consumed offset in mptcp_subflow_discard_data(). Fixes: 04e4cd4f7ca4 ("mptcp: cleanup mptcp_subflow_discard_data()") Cc: stable@vger.kernel.org Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01mptcp: fix bad RCVPRUNED mib accountingPaolo Abeni
Since its introduction, the mentioned MIB accounted for the wrong event: wake-up being skipped as not-needed on some edge condition instead of incoming skb being dropped after landing in the (subflow) receive queue. Move the increment in the correct location. Fixes: ce599c516386 ("mptcp: properly account bulk freed memory") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01Merge tag 'nf-24-07-31' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Fix a possible null-ptr-deref sometimes triggered by iptables-restore at boot time. Register iptables {ipv4,ipv6} nat table pernet in first place to fix this issue. Patch #1 and #2 from Kuniyuki Iwashima. netfilter pull request 24-07-31 * tag 'nf-24-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init(). netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init(). ==================== Link: https://patch.msgid.link/20240731213046.6194-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01ipv6: fix ndisc_is_useropt() handling for PIOMaciej Żenczykowski
The current logic only works if the PIO is between two other ND user options. This fixes it so that the PIO can also be either before or after other ND user options (for example the first or last option in the RA). side note: there's actually Android tests verifying a portion of the old broken behaviour, so: https://android-review.googlesource.com/c/kernel/tests/+/3196704 fixes those up. Cc: Jen Linkova <furry@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Patrick Rohr <prohr@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Maciej Żenczykowski <maze@google.com> Fixes: 048c796beb6e ("ipv6: adjust ndisc_is_useropt() to also return true for PIO") Link: https://patch.msgid.link/20240730001748.147636-1-maze@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-01net: skbuff: Skip early return in skb_unref when debuggingBreno Leitao
This patch modifies the skb_unref function to skip the early return optimization when CONFIG_DEBUG_NET is enabled. The change ensures that the reference count decrement always occurs in debug builds, allowing for more thorough checking of SKB reference counting. Previously, when the SKB's reference count was 1 and CONFIG_DEBUG_NET was not set, the function would return early after a memory barrier (smp_rmb()) without decrementing the reference count. This optimization assumes it's safe to proceed with freeing the SKB without the overhead of an atomic decrement from 1 to 0. With this change: - In non-debug builds (CONFIG_DEBUG_NET not set), behavior remains unchanged, preserving the performance optimization. - In debug builds (CONFIG_DEBUG_NET set), the reference count is always decremented, even when it's 1, allowing for consistent behavior and potentially catching subtle SKB management bugs. This modification enhances debugging capabilities for networking code without impacting performance in production kernels. It helps kernel developers identify and diagnose issues related to SKB management and reference counting in the network stack. Cc: Chris Mason <clm@fb.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240729104741.370327-1-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-31Merge branch 'ethernet-convert-from-tasklet-to-bh-workqueue'Jakub Kicinski
Allen Pais says: ==================== ethernet: Convert from tasklet to BH workqueue [part] The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts a few drivers in drivers/ethernet/* from tasklet to BH workqueue. The next set will be sent out after the next -rc is out. v2: https://lore.kernel.org/20240621183947.4105278-1-allen.lkml@gmail.com v1: https://lore.kernel.org/20240507190111.16710-2-apais@linux.microsoft.com ==================== Link: https://patch.msgid.link/20240730183403.4176544-1-allen.lkml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net: macb: Convert tasklet API to new bottom half workqueue mechanismAllen Pais
Migrate tasklet APIs to the new bottom half workqueue mechanism. It replaces all occurrences of tasklet usage with the appropriate workqueue APIs throughout the macb driver. This transition ensures compatibility with the latest design and enhances performance. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://patch.msgid.link/20240730183403.4176544-5-allen.lkml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net: cnic: Convert tasklet API to new bottom half workqueue mechanismAllen Pais
Migrate tasklet APIs to the new bottom half workqueue mechanism. It replaces all occurrences of tasklet usage with the appropriate workqueue APIs throughout the cnic driver. This transition ensures compatibility with the latest design and enhances performance. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://patch.msgid.link/20240730183403.4176544-4-allen.lkml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-31net: xgbe: Convert tasklet API to new bottom half workqueue mechanismAllen Pais
Migrate tasklet APIs to the new bottom half workqueue mechanism. It replaces all occurrences of tasklet usage with the appropriate workqueue APIs throughout the xgbe driver. This transition ensures compatibility with the latest design and enhances performance. Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://patch.msgid.link/20240730183403.4176544-3-allen.lkml@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>