summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-09-29selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pktKuniyuki Iwashima
This also does not have the non-experimental version, so converted to FO. The comment in .pkt explains the detailed scenario. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Import sockopt-fastopen-key.pktKuniyuki Iwashima
sockopt-fastopen-key.pkt does not have the non-experimental version, so the Experimental version is converted, FOEXP -> FO. The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead sets another key via setsockopt(TCP_FASTOPEN_KEY). The first listener generates a valid cookie in response to TFO option without cookie, and the second listner creates a TFO socket using the valid cookie. TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh so that we can use TFO_COOKIE and support dualstack. Similarly, TFO_COOKIE_ZERO for the 0-0-0-0 key is defined. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt.Kuniyuki Iwashima
These changes are applied to follow the imported packetdrill tests. * Call setsockopt(TCP_FASTOPEN) * Remove unnecessary accept() delay * Add assertion for TCP states * Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Import opt34/*-trigger-rst.pkt.Kuniyuki Iwashima
This imports the non-experimental version of opt34/*-trigger-rst.pkt. | accept() | SYN data | -----------------------------------+----------+----------+ listener-closed-trigger-rst.pkt | no | unread | unread-data-closed-trigger-rst.pkt | yes | unread | Both files test that close()ing a SYN_RECV socket with unread SYN data triggers RST. The files are renamed to have the common prefix, trigger-rst. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Import opt34/reset-* tests.Kuniyuki Iwashima
This imports the non-experimental version of opt34/reset-*.pkt. | Child | RST | sk_err | ---------------------------------+---------+-------------------------------+---------+ reset-after-accept.pkt | TFO | after accept(), SYN_RECV | read() | reset-close-with-unread-data.pkt | TFO | after accept(), SYN_RECV | write() | reset-before-accept.pkt | TFO | before accept(), SYN_RECV | read() | reset-non-tfo-socket.pkt | non-TFO | before accept(), ESTABLISHED | write() | The first 3 files test scenarios where a SYN_RECV socket receives RST before/after accept() and data in SYN must be read() without error, but the following read() or fist write() will return ECONNRESET. The last test is similar but with non-TFO socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Import opt34/icmp-before-accept.pkt.Kuniyuki Iwashima
This imports the non-experimental version of icmp-before-accept.pkt. This file tests the scenario where an ICMP unreachable packet for a not-yet-accept()ed socket changes its state to TCP_CLOSE, but the SYN data must be read without error, and the following read() returns EHOSTUNREACH. Note that this test support only IPv4 as icmp is used. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Import opt34/fin-close-socket.pkt.Kuniyuki Iwashima
This imports the non-experimental version of fin-close-socket.pkt. This file tests the scenario where a TFO child socket's state transitions from SYN_RECV to CLOSE_WAIT before accept()ed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Add test for experimental option.Kuniyuki Iwashima
The only difference between non-experimental vs experimental TFO option handling is SYN+ACK generation. When tcp_parse_fastopen_option() parses a TFO option, it sets tcp_fastopen_cookie.exp to false if the option number is 34, and true if 255. The value is carried to tcp_options_write() to generate a TFO option with the same option number. Other than that, all the TFO handling is the same and the kernel must generate the same cookie regardless of the option number. Let's add a test for the handling so that we can consolidate fastopen/server/ tests and fastopen/server/opt34 tests. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1.Kuniyuki Iwashima
TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and each server test requires setsockopt(TCP_FASTOPEN). Let's add a basic test for TFO_SERVER_WO_SOCKOPT1. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Import TFO server basic tests.Kuniyuki Iwashima
This imports basic TFO server tests from google/packetdrill. The repository has two versions of tests for most scenarios; one uses the non-experimental option (34), and the other uses the experimental option (255) with 0xF989. This only imports the following tests of the non-experimental version placed in [0]. I will add a specific test for the experimental option handling later. | TFO | Cookie | Payload | ---------------------------+-----+--------+---------+ basic-rw.pkt | yes | yes | yes | basic-zero-payload.pkt | yes | yes | no | basic-cookie-not-reqd.pkt | yes | no | yes | basic-non-tfo-listener.pkt | no | yes | yes | pure-syn-data.pkt | yes | no | yes | The original pure-syn-data.pkt missed setsockopt(TCP_FASTOPEN) and did not test TFO server in some scenarios unintentionally, so setsockopt() is added where needed. In addition, non-TFO scenario is stripped as it is covered by basic-non-tfo-listener.pkt. Also, I added basic- prefix. Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Define common TCP Fast Open cookie.Kuniyuki Iwashima
TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher(). The cookie value is generated from src/dst IPs and a key configured by setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key. The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill defines the corresponding cookie as TFO_COOKIE in run_all.py. [0] Then, each test does not need to care about the value, and we can easily update TFO_COOKIE in case __tcp_fastopen_cookie_gen_cipher() changes the algorithm. However, some tests use the bare hex value for specific IPv4 addresses and do not support IPv6. Let's define the same TFO_COOKIE in ksft_runner.sh. We will replace such bare hex values with TFO_COOKIE except for a single test for setsockopt(TCP_FASTOPEN_KEY). Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN).Kuniyuki Iwashima
To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must have 0x2 (TFO_SERVER_ENABLE), and we need to do either 1. Call setsockopt(TCP_FASTOPEN) for the socket 2. Set 0x400 (TFO_SERVER_WO_SOCKOPT1) additionally to net.ipv4.tcp_fastopen The default.sh sets 0x70403 so that each test does not need setsockopt(). (0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???) However, some tests overwrite net.ipv4.tcp_fastopen without TFO_SERVER_WO_SOCKOPT1 and forgot setsockopt(TCP_FASTOPEN). For example, pure-syn-data.pkt [0] tests non-TFO servers unintentionally, except in the first scenario. To prevent such an accident, let's require explicit setsockopt(). TFO_CLIENT_ENABLE is necessary for tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt. Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftest: packetdrill: Set ktap_set_plan properly for single protocol test.Kuniyuki Iwashima
The cited commit forgot to update the ktap_set_plan call. ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP + KTAP_CNT_XFAIL) in ktap_finished. Otherwise, the selftest exit()s with 1. Let's adjust KSFT_NUM_TESTS based on supported protocols. While at it, misalignment is fixed up. Fixes: a5c10aa3d1ba ("selftests/net: packetdrill: Support single protocol test.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'nios2_update_for_v6.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull NIOS2 updates from Dinh Nguyen: - Replace __ASSEMBLY__ with __ASSEMBLER__ in headers - Set memblock.current_limit when setting pfn limits * tag 'nios2_update_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: nios2: ensure that memblock.current_limit is set when setting pfn limits nios2: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers nios2: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
2025-09-29net: ena: return 0 in ena_get_rxfh_key_size() when RSS hash key is not ↵Kohei Enju
configurable In EC2 instances where the RSS hash key is not configurable, ethtool shows bogus RSS hash key since ena_get_rxfh_key_size() unconditionally returns ENA_HASH_KEY_SIZE. Commit 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported") added proper handling for devices that don't support RSS hash key configuration, but ena_get_rxfh_key_size() has been unchanged. When the RSS hash key is not configurable, return 0 instead of ENA_HASH_KEY_SIZE to clarify getting the value is not supported. Tested on m5 instance families. Without patch: # ethtool -x ens5 | grep -A 1 "RSS hash key" RSS hash key: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 With patch: # ethtool -x ens5 | grep -A 1 "RSS hash key" RSS hash key: Operation not supported Fixes: 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported") Signed-off-by: Kohei Enju <enjuk@amazon.com> Link: https://patch.msgid.link/20250929050247.51680-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29nfp: fix RSS hash key size when RSS is not supportedKohei Enju
The nfp_net_get_rxfh_key_size() function returns -EOPNOTSUPP when devices don't support RSS, and callers treat the negative value as a large positive value since the return type is u32. Return 0 when devices don't support RSS, aligning with the ethtool interface .get_rxfh_key_size() that requires returning 0 in such cases. Fixes: 9ff304bfaf58 ("nfp: add support for reporting CRC32 hash function") Signed-off-by: Kohei Enju <enjuk@amazon.com> Link: https://patch.msgid.link/20250929054230.68120-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: rtnetlink: fix typo in rtnl_unregister_all() commentAlok Tiwari
Corrected "rtnl_unregster()" -> "rtnl_unregister()" in the documentation comment of "rtnl_unregister_all()" Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250929085418.49200-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Revert "net: group sk_backlog and sk_receive_queue"Eric Dumazet
This reverts commit 4effb335b5dab08cb6e2c38d038910f8b527cfc9. This was a benefit for UDP flood case, which was later greatly improved with commits 6471658dc66c ("udp: use skb_attempt_defer_free()") and b650bf0977d3 ("udp: remove busylock and add per NUMA queues"). Apparently blamed commit added a regression for RAW sockets, possibly because they do not use the dual RX queue strategy that UDP has. sock_queue_rcv_skb_reason() and RAW recvmsg() compete for sk_receive_buf and sk_rmem_alloc changes, and them being in the same cache line reduce performance. Fixes: 4effb335b5da ("net: group sk_backlog and sk_receive_queue") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509281326.f605b4eb-lkp@intel.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: David Ahern <dsahern@kernel.org> Cc: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250929182112.824154-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: stmmac: Convert open-coded register polling to helper macroFurong Xu
Drop the open-coded register polling routines. Use readl_poll_timeout_atomic() in atomic state. Also adjust the delay time to 10us which seems more reasonable. Tested on NXP i.MX8MP and ROCKCHIP RK3588 boards, the break condition was met right after the first polling, no delay involved at all. So the 10us delay should be long enough for most cases. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250927081036.10611-1-0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge branch 'mptcp-receive-path-improvement'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: receive path improvement This series includes several changes to the MPTCP RX path. The main goals are improving the RX performances, and increase the long term maintainability. Some changes reflects recent(ish) improvements introduced in the TCP stack: patch 1, 2 and 3 are the MPTCP counter part of SKB deferral free and auto-tuning improvements. Note that patch 3 could possibly fix additional issues, and overall such patch should protect from similar issues to arise in the future. Patches 4-7 are aimed at introducing the socket backlog usage which will be done in a later series to process the packets received by the different subflows while the msk socket is owned. Patch 8 is not related to the RX path, but it contains additional tests for new features recently introduced in net-next. ==================== Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-0-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29selftests: mptcp: join: validate new laminar endpMatthieu Baerts (NGI0)
Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar' endpoint type. In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, some cases are verified: - Without any 'laminar' endpoints: no new subflows are created. - With one 'laminar' endpoint: a second subflow is created. - With multiple 'laminar' endpoints: 2 IPv4 subflows are created. - With one 'laminar' endpoint, but the server announcing a second IP address, only one subflow is created. - With one 'laminar' + 'subflow' endpoint, the same endpoint is only used once. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29mptcp: minor move_skbs_to_msk() cleanupPaolo Abeni
Such function is called only by __mptcp_data_ready(), which in turn is always invoked when msk is not owned by the user: we can drop the redundant, related check. Additionally mptcp needs to propagate the socket error only for current subflow. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-7-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29mptcp: factor out a basic skb coalesce helperPaolo Abeni
The upcoming patch will introduced backlog processing for MPTCP socket, and we want to leverage coalescing in such data path. Factor out the relevant bits not touching memory accounting to deal with such use-case. Co-developed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-6-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29mptcp: remove unneeded mptcp_move_skb()Paolo Abeni
Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"), sk_shutdown can't change during the main recvmsg loop, we can drop the related race breaker. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-5-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29mptcp: introduce the mptcp_init_skb helperPaolo Abeni
Factor out all the skb initialization step in a new helper and use it. Note that this change moves the MPTCP CB initialization earlier: we can do such step as soon as the skb leaves the subflow socket receive queues. Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-4-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29mptcp: rcvbuf auto-tuning improvementPaolo Abeni
Apply to the MPTCP auto-tuning the same improvements introduced for the TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch 'tcp-receive-side-improvements'"). The main difference is that TCP subflow and the main MPTCP socket need to account separately for OoO: MPTCP does not care for TCP-level OoO and vice versa, as a consequence do not reflect MPTCP-level rcvbuf increase due to OoO packets at the subflow level. This refeactor additionally allow dropping the msk receive buffer update at receive time, as the latter only intended to cope with subflow receive buffer increase due to OoO packets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559 Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-3-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29tcp: make tcp_rcvbuf_grow() accessible to mptcp codePaolo Abeni
To leverage the auto-tuning improvements brought by commit 2da35e4b4df9 ("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack need to access the mentioned helper. Acked-by: Geliang Tang <geliang@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-2-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29mptcp: leverage skb deferral freePaolo Abeni
Usage of the skb deferral API is straight-forward; with multiple subflows actives this allow moving part of the received application load into multiple CPUs. Also fix a typo in the related comment. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-1-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29tcp: use skb->len instead of skb->truesize in tcp_can_ingest()Eric Dumazet
Some applications are stuck to the 20th century and still use small SO_RCVBUF values. After the blamed commit, we can drop packets especially when using LRO/hw-gro enabled NIC and small MSS (1500) values. LRO/hw-gro NIC pack multiple segments into pages, allowing tp->scaling_ratio to be set to a high value. Whenever the receive queue gets full, we can receive a small packet filling RWIN, but with a high skb->truesize, because most NIC use 4K page plus sk_buff metadata even when receiving less than 1500 bytes of payload. Even if we refine how tp->scaling_ratio is estimated, we could have an issue at the start of the flow, because the first round of packets (IW10) will be sent based on the initial tp->scaling_ratio (1/2) Relax tcp_can_ingest() to use skb->len instead of skb->truesize, allowing the peer to use final RWIN, assuming a 'perfect' scaling_ratio of 1. Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250927092827.2707901-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'for-net-next-2025-09-27' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver - Avoid a couple dozen -Wflex-array-member-not-at-end warnings - bcsp: receive data only if registered - HCI: Fix using LE/ACL buffers for ISO packets - hci_core: Detect if an ISO link has stalled - ISO: Don't initiate CIS connections if there are no buffers - ISO: Use sk_sndtimeo as conn_timeout drivers: - btusb: Check for unexpected bytes when defragmenting HCI frames - btusb: Add new VID/PID 13d3/3627 for MT7925 - btusb: Add new VID/PID 13d3/3633 for MT7922 - btusb: Add USB ID 2001:332a for D-Link AX9U rev. A1 - btintel: Add support for BlazarIW core - btintel_pcie: Add support for _suspend() / _resume() - btintel_pcie: Define hdev->wakeup() callback - btintel_pcie: Add Bluetooth core/platform as comments - btintel_pcie: Add id of Scorpious, Panther Lake-H484 - btintel_pcie: Refactor Device Coredump * tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (30 commits) Bluetooth: Avoid a couple dozen -Wflex-array-member-not-at-end warnings Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements Bluetooth: ISO: don't leak skb in ISO_CONT RX Bluetooth: ISO: free rx_skb if not consumed Bluetooth: ISO: Fix possible UAF on iso_conn_free Bluetooth: SCO: Fix UAF on sco_conn_free Bluetooth: bcsp: receive data only if registered Bluetooth: btusb: Add new VID/PID 13d3/3633 for MT7922 Bluetooth: btusb: Add new VID/PID 13d3/3627 for MT7925 Bluetooth: remove duplicate h4_recv_buf() in header Bluetooth: btusb: Check for unexpected bytes when defragmenting HCI frames Bluetooth: hci_core: Print information of hcon on hci_low_sent Bluetooth: hci_core: Print number of packets in conn->data_q Bluetooth: Add function and line information to bt_dbg Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO Bluetooth: hci_core: Detect if an ISO link has stalled Bluetooth: ISO: Use sk_sndtimeo as conn_timeout Bluetooth: HCI: Fix using LE/ACL buffers for ISO packets Bluetooth: ISO: Don't initiate CIS connections if there are no buffers MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver ... ==================== Link: https://patch.msgid.link/20250927154616.1032839-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29ptr_ring: __ptr_ring_zero_tail micro optimizationMichael S. Tsirkin
__ptr_ring_zero_tail currently does the - 1 operation twice: - during initialization of head - at each loop iteration Let's just do it in one place, all we need to do is adjust the loop condition. this is better: - a slightly clearer logic with less duplication - uses prefix -- we don't need to save the old value - one less - 1 operation - for example, when ring is empty we now don't do - 1 at all, existing code does it once Text size shrinks from 15081 to 15050 bytes. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/bcd630c7edc628e20d4f8e037341f26c90ab4365.1758976026.git.mst@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge branch 'net-wangxun-support-to-configure-rss'Jakub Kicinski
Jiawen Wu says: ==================== net: wangxun: support to configure RSS Implement ethtool ops for RSS configuration, and support multiple RSS for multiple pools. ==================== Link: https://patch.msgid.link/20250926023843.34340-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: libwx: restrict change user-set RSS configurationJiawen Wu
Enable/disable SR-IOV will change the number of rings, thereby changing the RSS configuration that the user has set. So reject these attempts if netif_is_rxfh_configured() returns true. And remind the user to reset the RSS configuration. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-5-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: wangxun: add RSS reta and rxfh fields supportJiawen Wu
Add ethtool ops for Rx flow hashing, query and set RSS indirection table and hash key. Disable UDP RSS by default, and support to configure L4 header fields with TCP/UDP/SCTP for flow hasing. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: libwx: move rss_field to struct wxJiawen Wu
For global RSS and multiple RSS scheme, the RSS type fields are defined identically in the registers. So they can be defined as the macros WX_RSS_FIELD_* to cleanup the codes. And to prepare for the RXFH support in the next patch, move the rss_field to struct wx. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: libwx: support separate RSS configuration for every poolJiawen Wu
For those devices which support 64 pools, they also support PF and VF (i.e. different pools) to configure different RSS key and hash table. Enable multiple RSS, use up to 64 RSS configurations and each pool has a specific configuration. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250926023843.34340-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'pstore-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull tiny pstore update from Kees Cook: - Clarify various comments for better understanding (Eugen Hristev) * tag 'pstore-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/zone: rewrite some comments for better understanding
2025-09-29idpf: fix mismatched free function for dma_alloc_coherentAlok Tiwari
The mailbox receive path allocates coherent DMA memory with dma_alloc_coherent(), but frees it with dmam_free_coherent(). This is incorrect since dmam_free_coherent() is only valid for buffers allocated with dmam_alloc_coherent(). Fix the mismatch by using dma_free_coherent() instead of dmam_free_coherent Fixes: e54232da1238 ("idpf: refactor idpf_recv_mb_msg") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Link: https://patch.msgid.link/20250925180212.415093-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: remove one stac/clac pair from move_addr_to_user()Eric Dumazet
Convert the get_user() and __put_user() code to the fast masked_user_access_begin()/unsafe_{get|put}_user() variant. This patch increases the performance of an UDP recvfrom() receiver (netserver) on 120 bytes messages by 7 % on an AMD EPYC 7B12 64-Core Processor platform. Presence of audit_sockaddr() makes difficult to avoid the stac/clac pair in the copy_to_user() call, this is left for a future patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250925230929.3727873-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29scm: use masked_user_access_begin() in put_cmsg()Eric Dumazet
Use the greatest and latest uaccess construct to get an optimal code. Before : lea (%r9,%rcx,1),%r10 movabs $<USER_PTR_MAX>,%r11 mov $0xfffffff2,%eax cmp %rcx,%r10 jb ffffffff81cdc312 <put_cmsg+0x152> cmp %r11,%r10 ja ffffffff81cdc312 <put_cmsg+0x152> stac lfence mov %r9,(%rcx) After: movabs $<USER_PTR_MAX>,%r9 cmp %r9,%rax cmova %r9,%rax stac mov %rcx,(%rax) Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250925224914.3590290-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge branch 'net-stmmac-drop-frames-causing-hlbs-error'Jakub Kicinski
Rohan G Thomas says: ==================== net: stmmac: Drop frames causing HLBS error This patchset consists of following patchset to avoid netdev watchdog reset due to Head-of-Line Blocking due to EST scheduling error. 1. Drop those frames causing HLBS error 2. Add HLBS frame drops to taprio stats v2: https://lore.kernel.org/r/20250915-hlbs_2-v2-1-27266b2afdd9@altera.com ==================== Link: https://patch.msgid.link/20250925-hlbs_2-v3-0-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: stmmac: tc: Add HLBS drop count to taprio statsRohan G Thomas
Add the count of the frames dropped by Head-Of-Line Blocking due to Scheduling(HLBS) error to taprio window drop count stats. Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250925-hlbs_2-v3-2-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29net: stmmac: est: Drop frames causing HLBS errorRohan G Thomas
Drop those frames causing Head-of-Line Blocking due to Scheduling (HLBS) error to avoid HLBS interrupt flooding and netdev watchdog timeouts due to blocked packets. Tx queues can be configured to drop those blocked packets by setting Drop Frames causing Scheduling Error (DFBS) bit of EST_CONTROL register. Also, add per queue HLBS drop count. Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250925-hlbs_2-v3-1-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'hardening-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One notable addition is the creation of the 'transitional' keyword for kconfig so CONFIG renaming can go more smoothly. This has been a long-standing deficiency, and with the renaming of CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI support), this came up again. The breadth of the diffstat is mainly this renaming. - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva) - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure (Junjie Cao) - Add str_assert_deassert() helper (Lad Prabhakar) - gcc-plugins: Remove TODO_verify_il for GCC >= 16 - kconfig: Fix BrokenPipeError warnings in selftests - kconfig: Add transitional symbol attribute for migration support - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI" * tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/string_choices: Add str_assert_deassert() helper kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI kconfig: Add transitional symbol attribute for migration support kconfig: Fix BrokenPipeError warnings in selftests gcc-plugins: Remove TODO_verify_il for GCC >= 16 stddef: Introduce __TRAILING_OVERLAP() stddef: Remove token-pasting in TRAILING_OVERLAP() lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29ixgbe: fix typos and docstring inconsistenciesAlok Tiwari
Corrected function and variable name typos in comments and docstrings: ixgbe_write_ee_hostif_X550 -> ixgbe_write_ee_hostif_data_X550 ixgbe_get_lcd_x550em -> ixgbe_get_lcd_t_x550em "Determime" -> "Determine" "point to hardware structure" -> "pointer to hardware structure" "To turn on the LED" -> "To turn off the LED" These changes improve readability, consistency. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250929124427.79219-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29Merge tag 'seccomp-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp update from Kees Cook: - Fix race with WAIT_KILLABLE_RECV (Johannes Nixdorf) * tag 'seccomp-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
2025-09-29Merge tag 'execve-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - binfmt_elf: preserve original ELF e_flags for core dumps (Svetlana Parfenova) - exec: Fix incorrect type for ret (Xichao Zhao) - binfmt_elf: Replace offsetof() with struct_size() in fill_note_info() (Xichao Zhao) * tag 'execve-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_elf: preserve original ELF e_flags for core dumps binfmt_elf: Replace offsetof() with struct_size() in fill_note_info() exec: Fix incorrect type for ret
2025-09-29Merge tag 'ffs-const-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull ffs const-attribute cleanups from Kees Cook: "While working on various hardening refactoring a while back we encountered inconsistencies in the application of __attribute_const__ on the ffs() family of functions. This series fixes this across all archs and adds KUnit tests. Notably, this found a theoretical underflow in PCI (also fixed here) and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I kept the series separate from the general hardening PR since it is a stand-alone "topic". - PCI: Fix theoretical underflow in use of ffs(). - Universally apply __attribute_const__ to all architecture's ffs()-family of functions. - Add KUnit tests for ffs() behavior and const-ness" * tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: KUnit: ffs: Validate all the __attribute_const__ annotations sparc: Add __attribute_const__ to ffs()-family implementations xtensa: Add __attribute_const__ to ffs()-family implementations s390: Add __attribute_const__ to ffs()-family implementations parisc: Add __attribute_const__ to ffs()-family implementations mips: Add __attribute_const__ to ffs()-family implementations m68k: Add __attribute_const__ to ffs()-family implementations openrisc: Add __attribute_const__ to ffs()-family implementations riscv: Add __attribute_const__ to ffs()-family implementations hexagon: Add __attribute_const__ to ffs()-family implementations alpha: Add __attribute_const__ to ffs()-family implementations sh: Add __attribute_const__ to ffs()-family implementations powerpc: Add __attribute_const__ to ffs()-family implementations x86: Add __attribute_const__ to ffs()-family implementations csky: Add __attribute_const__ to ffs()-family implementations bitops: Add __attribute_const__ to generic ffs()-family implementations KUnit: Introduce ffs()-family tests PCI: Test for bit underflow in pcie_set_readrq()
2025-09-30Merge tag 'amd-drm-next-6.18-2025-09-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-09-26: amdgpu: - Misc fixes - Misc cleanups - SMU 13.x fixes - MES fix - VCN 5.0.1 reset fixes - DCN 3.2 watermark fixes - AVI infoframe fixes - PSR fix - Brightness fixes - DCN 3.1.4 fixes - DCN 3.1+ DTM fixes - DCN powergating fixes - DMUB fixes - DCN/SMU cleanup - DCN stutter fixes - DCN 3.5 fixes - GAMMA_LUT fixes - Add UserQ documentation - GC 9.4 reset fixes - Enforce isolation cleanups - UserQ fixes - DC/non-DC common modes cleanup - DCE6-10 fixes amdkfd: - Fix a race in sw_fini - Switch partition fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250926143918.2030854-1-alexander.deucher@amd.com
2025-09-29Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds
Pull interleaved SHA-256 hashing support from Eric Biggers: "Optimize fsverity with 2-way interleaved hashing Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and make fsverity use it for faster file data verification. This improves fsverity performance on many x86_64 and arm64 processors. Later, I plan to make dm-verity use this too" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: Use 2-way interleaved SHA-256 hashing when supported fsverity: Remove inode parameter from fsverity_hash_block() lib/crypto: tests: Add tests and benchmark for sha256_finup_2x() lib/crypto: x86/sha256: Add support for 2-way interleaved hashing lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing lib/crypto: sha256: Add support for 2-way interleaved hashing