summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-12bnxt_en: Skip nic close/open when configuring tstamp filtersPavan Chebbi
We don't have to close and open the nic to make sure we have valid rx timestamps. Once we have the timestamp filter applied to the HW and the timestamp_fld_format bit is cleared in the rx completion and the timestamp is non-zero, we can be sure that rx timestamp is valid data. Skip close/open when we set any timestamp filter. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-13-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Add support for UDP GSO on 5760X chipsMichael Chan
The new 5760X chips supports UDP GSO. Tested using udpgso_bench_tx. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: add rx_filter_miss extended statsDamodharam Ammepalli
rx_filter_miss counter is newly added to the rx_port_stats_ext stats structure for newer chips. Newer firmware will return the structure size that includes this counter. Add this entry to the bnxt_port_stats_ext_arr array and the ethtool -S code will pick up this counter if it is supported. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Configure UDP tunnel TPAMichael Chan
On the new P7 chips, TPA for tunnel packets can be independently enabled for each VNIC. The default TPA configuration should not include UDP tunnels because the UDP ports for these tunnels are not known yet. The chip should not aggregate these UDP tunneled packets using default UDP ports until the ports are known. Add a new function bnxt_hwrm_vnic_update_tunl_tpa() to enable VXLAN and Geneve TPA if the corresponding UDP ports are known. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-10-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Add support for VXLAN GPEMichael Chan
Add a new bnxt_udp_tunnels_p7 struct to support the new P7 chips that can parse VXLAN GPE packets. Add VXLAN GPE tunnel type handling to the .set_port() and .unset_port() functions. .ndo_features_check() is also enhanced to support VXLAN GPE which may encapsulate inner IP packets instead of ethernet packets. Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Use proper TUNNEL_DST_PORT_ALLOC* commandsMichael Chan
In bnxt_udp_tunnel_set_port(), use the proper ALLOC commands instead of the FREE commands for correctness. The ALLOC and FREE commands happen to be identical so this is just a cosmetic fix for correctness. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Allocate extra QP backing store memory when RoCE FW reports itSelvin Xavier
The Fast QP modify destroy RoCE feature requires additional QP entries in QP context backing store. FW reports the extra count to be allocated during backing store query. Use this value and allocate extra memory. Note that this works for both the V1 and V1 backing store FW APIs. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Support TX coalesced completion on 5760X chipsMichael Chan
TX coalesced completions are supported on newer chips to provide one TX completion record for multiple TX packets up to the sq_cons_idx in the completion record. This method saves PCIe bandwidth by reducing the number of TX completions. Only very minor changes are now required to support this mode with the new framework that handles TX completions based on the consumer indices. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Prevent TX timeout with a very small TX ringMichael Chan
If xmit_more condition is true, the driver may set the TX_BD_FLAGS_NO_CMPL flag. If after this packet, the TX ring can no longer hold a packet with maximum fragments, we will stop the TX queue. When this happens, we must clear the TX_BD_FLAGS_NO_CMPL flag on the last packet or there will be no completion and cause TX timeout. Fixes: c1056a59aee1 ("bnxt_en: Optimize xmit_more TX path") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Fix TX ring indexing logicMichael Chan
Two spots were missed when modifying the TX ring indexing logic. The use of unmasked TX index in bnxt_tx_int() will cause unnecessary __bnxt_tx_int() calls. The same issue in bnxt_tx_int_xdp() can result in illegal array index. Fixes: 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Fix AGG ring check logic in bnxt_check_rings()Somnath Kotur
_bnxt_get_max_rings() that is invoked in bnxt_check_rings() already accounts for the AGG ring(s) and gives a max value based on that. Increasing for AGG rings before calling _bnxt_get_max_rings() will result in checking for twice the number of rings than required and it can fail. Fix it by adjusting for AGG rings after calling _bnxt_get_max_rings(). Fixes: f5b29c6afe36 ("bnxt_en: Add helper to get the number of CP rings required for TX rings") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12bnxt_en: Fix trimming of P5 RX and TX ringsMichael Chan
The recent commit to trim the RX and TX rings on P5 chips by assigning each with max CP rings divided by 2 is not correct. Max CP rings divided by 2 may be bigger than the original RX or TX and would lead to failure. In other words, we may be checking for increased RX/TX rings than required and it may fail. Fix it by calling __bnxt_trim_rings() instead that would properly trim RX and TX without the possibility of increasing their values. Fixes: f5b29c6afe36 ("bnxt_en: Add helper to get the number of CP rings required for TX rings") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231212005122.2401-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: mdio-gpio: replace deprecated strncpy with strscpyJustin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect new_bus->id to be NUL-terminated but not NUL-padded based on its prior assignment through snprintf: | snprintf(new_bus->id, MII_BUS_ID_SIZE, "gpio-%x", bus_id); Due to this, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. We can also use sizeof() instead of a length macro as this more closely ties the maximum buffer size to the destination buffer. Do this for two instances. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231211-strncpy-drivers-net-mdio-mdio-gpio-c-v3-1-76dea53a1a52@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: Remove acked SYN flag from packet in the transmit queue correctlyDong Chenchen
syzkaller report: kernel BUG at net/core/skbuff.c:3452! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452) Call Trace: icmp_glue_bits (net/ipv4/icmp.c:357) __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165) ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341) icmp_push_reply (net/ipv4/icmp.c:370) __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772) ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295) ip_output (net/ipv4/ip_output.c:427) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387) tcp_retransmit_skb (net/ipv4/tcp_output.c:3404) tcp_retransmit_timer (net/ipv4/tcp_timer.c:604) tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716) The panic issue was trigered by tcp simultaneous initiation. The initiation process is as follows: TCP A TCP B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... // TCP B: not send challenge ack for ack limit or packet loss // TCP A: close tcp_close tcp_send_fin if (!tskb && tcp_under_memory_pressure(sk)) tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag 6. FIN_WAIT_1 --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // TCP B: send challenge ack to SYN_FIN_ACK 7. ... <SEQ=301><ACK=101><CTL=ACK> <-- SYN-RECEIVED //challenge ack // TCP A: <SND.UNA=101> 8. FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic __tcp_retransmit_skb //skb->len=0 tcp_trim_head len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100 __pskb_trim_head skb->data_len -= len // skb->len=-1, wrap around ... ... ip_fragment icmp_glue_bits //BUG_ON If we use tcp_trim_head() to remove acked SYN from packet that contains data or other flags, skb->len will be incorrectly decremented. We can remove SYN flag that has been acked from rtx_queue earlier than tcp_trim_head(), which can fix the problem mentioned above. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12Merge tag 'pef2256-framer' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Linus Walleij says: ==================== Immutable tag for the PEF2256 framer * tag 'pef2256-framer' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: MAINTAINERS: Add the Lantiq PEF2256 driver entry pinctrl: Add support for the Lantic PEF2256 pinmux net: wan: framer: Add support for the Lantiq PEF2256 framer dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framer net: wan: Add framer framework support ==================== Link: https://lore.kernel.org/all/CACRpkdYT1J7noFUhObFgfA60XQAfL4rb=knEmWS__TKKtCMh7Q@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12MAINTAINERS: Add the Lantiq PEF2256 driver entryHerve Codina
After contributing the driver, add myself as the maintainer for the Lantiq PEF2256 driver. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/20231128132534.258459-6-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-12pinctrl: Add support for the Lantic PEF2256 pinmuxHerve Codina
The Lantiq PEF2256 is a framer and line interface component designed to fulfill all required interfacing between an analog E1/T1/J1 line and the digital PCM system highway/H.100 bus. This kind of component can be found in old telecommunication system. It was used to digital transmission of many simultaneous telephone calls by time-division multiplexing. Also using HDLC protocol, WAN networks can be reached through the framer. This pinmux support handles the pin muxing part (pins RP(A..D) and pins XP(A..D)) of the PEF2256. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231128132534.258459-5-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-12net: wan: framer: Add support for the Lantiq PEF2256 framerHerve Codina
The Lantiq PEF2256 is a framer and line interface component designed to fulfill all required interfacing between an analog E1/T1/J1 line and the digital PCM system highway/H.100 bus. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231128132534.258459-4-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-12dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framerHerve Codina
The Lantiq PEF2256 is a framer and line interface component designed to fulfill all required interfacing between an analog E1/T1/J1 line and the digital PCM system highway/H.100 bus. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20231128132534.258459-3-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-12net: wan: Add framer framework supportHerve Codina
A framer is a component in charge of an E1/T1 line interface. Connected usually to a TDM bus, it converts TDM frames to/from E1/T1 frames. It also provides information related to the E1/T1 line. The framer framework provides a set of APIs for the framer drivers (framer provider) to create/destroy a framer and APIs for the framer users (framer consumer) to obtain a reference to the framer, and use the framer. This basic implementation provides a framer abstraction for: - power on/off the framer - get the framer status (line state) - be notified on framer status changes - get/set the framer configuration Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20231128132534.258459-2-herve.codina@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-12-12qed: Fix a potential use-after-free in qed_cxt_tables_allocDinghao Liu
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to free p_hwfn->p_cxt_mngr->ilt_shadow on error. However, qed_cxt_tables_alloc() accesses the freed pointer on failure of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(), which may lead to use-after-free. Fix this issue by setting p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free(). Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: asix: fix fortify warningDmitry Antipov
When compiling with gcc version 14.0.0 20231129 (experimental) and CONFIG_FORTIFY_SOURCE=y, I've noticed the following warning: ... In function 'fortify_memcpy_chk', inlined from 'ax88796c_tx_fixup' at drivers/net/ethernet/asix/ax88796c_main.c:287:2: ./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning] 588 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... This call to 'memcpy()' is interpreted as an attempt to copy TX_OVERHEAD (which is 8) bytes from 4-byte 'sop' field of 'struct tx_pkt_info' and thus overread warning is issued. Since we actually want to copy both 'sop' and 'seg' fields at once, use the convenient 'struct_group()' here. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Łukasz Stelmach <l.stelmach@samsung.com> Link: https://lore.kernel.org/r/20231211090535.9730-1-dmantipov@yandex.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12e1000e: Use pcie_capability_read_word() for reading LNKSTAIlpo Järvinen
Use pcie_capability_read_word() for reading LNKSTA and remove the custom define that matches to PCI_EXP_LNKSTA. As only single user for cap_offset remains, replace it with a call to pci_pcie_cap(). Instead of e1000_adapter, make local variable out of pci_dev because both users are interested in it. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12Merge tag 'ext4_for_linus-6.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix various bugs / regressions for ext4, including a soft lockup, a WARN_ON, and a BUG" * tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix soft lockup in journal_finish_inode_data_buffers() ext4: fix warning in ext4_dio_write_end_io() jbd2: increase the journal IO's priority jbd2: correct the printing of write_flags in jbd2_write_superblock() ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12iavf: Fix iavf_shutdown to call iavf_remove instead iavf_closeSlawomir Laba
Make the flow for pci shutdown be the same to the pci remove. iavf_shutdown was implementing an incomplete version of iavf_remove. It misses several calls to the kernel like iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap that might break the system on reboot or hibernation. Implement the call of iavf_remove directly in iavf_shutdown to close this gap. Fixes below error messages (dmesg) during shutdown stress tests - [685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for VF 0 [685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for VF 0 Reproduction: 1. Create one VF interface: echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs 2. Run live dmesg on the host: dmesg -wH 3. On SUT, script below steps into vf_namespace_assignment.sh <#!/bin/sh> // Remove <>. Git removes # line if=<VF name> (edit this per VF name) loop=0 while true; do echo test round $loop let loop++ ip netns add ns$loop ip link set dev $if up ip link set dev $if netns ns$loop ip netns exec ns$loop ip link set dev $if up ip netns exec ns$loop ip link set dev $if netns 1 ip netns delete ns$loop done 4. Run the script for at least 1000 iterations on SUT: ./vf_namespace_assignment.sh Expected result: No errors in dmesg. Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12iavf: Handle ntuple on/off based on new state machines for flow directorPiotr Gardocki
ntuple-filter feature on/off: Default is on. If turned off, the filters will be removed from both PF and iavf list. The removal is irrespective of current filter state. Steps to reproduce: ------------------- 1. Ensure ntuple is on. ethtool -K enp8s0 ntuple-filters on 2. Create a filter to receive the traffic into non-default rx-queue like 15 and ensure traffic is flowing into queue into 15. Now, turn off ntuple. Traffic should not flow to configured queue 15. It should flow to default RX queue. Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12iavf: Introduce new state machines for flow directorPiotr Gardocki
New states introduced: IAVF_FDIR_FLTR_DIS_REQUEST IAVF_FDIR_FLTR_DIS_PENDING IAVF_FDIR_FLTR_INACTIVE Current FDIR state machines (SM) are not adequate to handle a few scenarios in the link DOWN/UP event, reset event and ntuple-feature. For example, when VF link goes DOWN and comes back UP administratively, the expectation is that previously installed filters should also be restored. But with current SM, filters are not restored. So with new SM, during link DOWN filters are marked as INACTIVE in the iavf list but removed from PF. After link UP, SM will transition from INACTIVE to ADD_REQUEST to restore the filter. Similarly, with VF reset, filters will be removed from the PF, but marked as INACTIVE in the iavf list. Filters will be restored after reset completion. Steps to reproduce: ------------------- 1. Create a VF. Here VF is enp8s0. 2. Assign IP addresses to VF and link partner and ping continuously from remote. Here remote IP is 1.1.1.1. 3. Check default RX Queue of traffic. ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+\.packets" 4. Add filter - change default RX Queue (to 15 here) ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5 5. Ensure filter gets added and traffic is received on RX queue 15 now. Link event testing: ------------------- 6. Bring VF link down and up. If traffic flows to configured queue 15, test is success, otherwise it is a failure. Reset event testing: -------------------- 7. Reset the VF. If traffic flows to configured queue 15, test is success, otherwise it is a failure. Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12Merge tag 'fuse-fixes-6.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix a couple of potential crashes, one introduced in 6.6 and one in 5.10 - Fix misbehavior of virtiofs submounts on memory pressure - Clarify naming in the uAPI for a recent feature * tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP fuse: dax: set fc->dax to NULL in fuse_dax_conn_free() fuse: share lookup state between submount and its parent docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
2023-12-12e1000e: Use PCI_EXP_LNKSTA_NLW & FIELD_GET() instead of custom defines/codeIlpo Järvinen
e1000e has own copy of PCI Negotiated Link Width field defines. Use the ones from include/uapi/linux/pci_regs.h instead of the custom ones and remove the custom ones and convert to FIELD_GET(). Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12igb: Use FIELD_GET() to extract Link WidthIlpo Järvinen
Use FIELD_GET() to extract PCIe Negotiated Link Width field instead of custom masking and shifting. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Memory leak fix (in lock error path) - Two fixes for create with allocation size - FIx for potential UAF in lease break error path - Five directory lease (caching) fixes found during additional recent testing * tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE ksmbd: fix wrong allocation size update in smb2_open() ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack() ksmbd: lazy v2 lease break on smb2_write() ksmbd: send v2 lease break notification for directory ksmbd: downgrade RWH lease caching state to RH for directory ksmbd: set v2 lease capability ksmbd: set epoch in create context v2 lease ksmbd: fix memory leak in smb2_lock()
2023-12-12jbd2: fix soft lockup in journal_finish_inode_data_buffers()Ye Bin
There's issue when do io test: WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170] CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G OE Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x24/0x30 dump_stack+0xb0/0x100 watchdog_timer_fn+0x254/0x3f8 __hrtimer_run_queues+0x11c/0x380 hrtimer_interrupt+0xfc/0x2f8 arch_timer_handler_phys+0x38/0x58 handle_percpu_devid_irq+0x90/0x248 generic_handle_irq+0x3c/0x58 __handle_domain_irq+0x68/0xc0 gic_handle_irq+0x90/0x320 el1_irq+0xcc/0x180 queued_spin_lock_slowpath+0x1d8/0x320 jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2] kjournald2+0xec/0x2f0 [jbd2] kthread+0x134/0x138 ret_from_fork+0x10/0x18 Analyzed informations from vmcore as follows: (1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list'; (2) Now is processing the 855th jbd2_inode; (3) JBD2 task has TIF_NEED_RESCHED flag; (4) There's no pags in address_space around the 855th jbd2_inode; (5) There are some process is doing drop caches; (6) Mounted with 'nodioread_nolock' option; (7) 128 CPUs; According to informations from vmcore we know 'journal->j_list_lock' spin lock competition is fierce. So journal_finish_inode_data_buffers() maybe process slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors(). However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK, will not call cond_resched(). So may lead to soft lockup. journal_finish_inode_data_buffers filemap_fdatawait_range_keep_errors __filemap_fdatawait_range while (index <= end) nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK); if (!nr_pages) break; --> If 'nr_pages' is equal zero will break, then will not call cond_resched() for (i = 0; i < nr_pages; i++) wait_on_page_writeback(page); cond_resched(); To solve above issue, add scheduling point in the journal_finish_inode_data_buffers(); Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-12-12HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboardsYan Jun
JingZao(京造) WKB603 keyboard is a rebranded product of Jamesdonkey RS2 keyboard, identified as "hfd.cn WKB603" in wired mode, "WKB603" in bluetooth mode. Adding them to the list of non-apple keyboards fixes function key. Signed-off-by: Yan Jun <jerrysteve1101@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-12HID: lenovo: Restrict detection of patched firmware only to USB cptkbdMikhail Khvainitski
Commit 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround") introduced a regression for ThinkPad TrackPoint Keyboard II which has similar quirks to cptkbd (so it uses the same workarounds) but slightly different so that there are false-positives during detecting well-behaving firmware. This commit restricts detecting well-behaving firmware to the only model which known to have one and have stable enough quirks to not cause false-positives. Fixes: 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround") Link: https://lore.kernel.org/linux-input/ZXRiiPsBKNasioqH@jekhomev/ Link: https://bbs.archlinux.org/viewtopic.php?pid=2135468#p2135468 Signed-off-by: Mikhail Khvainitski <me@khvoinitsky.org> Tested-by: Yauhen Kharuzhy <jekhor@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-12Merge branch 'net-dsa-realtek-two-rtl8366rb-fixes'Paolo Abeni
Linus Walleij says: ==================== net: dsa: realtek: Two RTL8366RB fixes These minor fixes were found while digging into other issues: a weirdly named variable and bogus MTU handling. Fix it up. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> ==================== Link: https://lore.kernel.org/r/20231209-rtl8366rb-mtu-fix-v1-0-df863e2b2b2a@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12net: dsa: realtek: Rewrite RTL8366RB MTU handlingLinus Walleij
The MTU callbacks are in layer 1 size, so for example 1500 bytes is a normal setting. Cache this size, and only add the layer 2 framing right before choosing the setting. On the CPU port this will however include the DSA tag since this is transmitted from the parent ethernet interface! Add the layer 2 overhead such as ethernet and VLAN framing and FCS before selecting the size in the register. This will make the code easier to understand. The rtl8366rb_max_mtu() callback returns a bogus MTU just subtracting the CPU tag, which is the only thing we should NOT subtract. Return the correct layer 1 max MTU after removing headers and checksum. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12net: dsa: realtek: Rename bogus RTL8368S variableLinus Walleij
Rename the register name to RTL8366RB instead of the bogus RTL8368S (internal product name?) Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12net/rose: Fix Use-After-Free in rose_ioctlHyunwoo Kim
Because rose_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with rose_accept(). A use-after-free for skb occurs with the following flow. ``` rose_ioctl() -> skb_peek() rose_accept() -> skb_dequeue() -> kfree_skb() ``` Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Link: https://lore.kernel.org/r/20231209100538.GA407321@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12atm: Fix Use-After-Free in do_vcc_ioctlHyunwoo Kim
Because do_vcc_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with vcc_recvmsg(). A use-after-free for skb occurs with the following flow. ``` do_vcc_ioctl() -> skb_peek() vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram() ``` Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Link: https://lore.kernel.org/r/20231209094210.GA403126@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12net: dns_resolver: the module is called dns_resolver, not dnsresolverAhelenia Ziemiańska
$ modinfo dnsresolver dns_resolver | grep name modinfo: ERROR: Module dnsresolver not found. filename: /lib/modules/6.1.0-9-amd64/kernel/net/dns_resolver/dns_resolver.ko name: dns_resolver Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Link: https://lore.kernel.org/r/gh4sxphjxbo56n2spgmc66vtazyxgiehpmv5f2gkvgicy6f4rs@tarta.nabijaczleweli.xyz Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12net: dl2k: Use proper conversion of dev_addr before IO to deviceAndy Shevchenko
The driver is using iowriteXX()/ioreadXX() APIs which are LE IO accessors simplified as 1. Convert given value _from_ CPU _to_ LE 2. Write it to the device as is The dev_addr is a byte stream, but because the driver uses 16-bit IO accessors, it wants to perform double conversion on BE CPUs, but it took it wrong, as it effectivelly does two times _from_ CPU _to_ LE. What it has to do is to consider dev_addr as an array of LE16 and hence do _from_ LE _to_ CPU conversion, followed by implied _from_ CPU _to_ LE in the iowrite16(). To achieve that, use get_unaligned_le16(). This will make it correct and allows to avoid sparse warning as reported by LKP. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312030058.hfZPTXd7-lkp@intel.com/ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231208153327.3306798-1-andriy.shevchenko@linux.intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-11netlink: specs: devlink: add some(not all) missing attributes in devlink.yamlSwarup Laxman Kotiaklapudi
Add some missing(not all) attributes in devlink.yaml. Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com> Suggested-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208182515.1206616-1-swarupkotikalapudi@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11Merge branch 'net-sched-conditional-notification-of-events-for-cls-and-act'Jakub Kicinski
Pedro Tammela says: ==================== net/sched: conditional notification of events for cls and act This is an optimization we have been leveraging on P4TC but we believe it will benefit rtnl users in general. It's common to allocate an skb, build a notification message and then broadcast an event. In the absence of any user space listeners, these resources (cpu and memory operations) are wasted. In cases where the subsystem is lockless (such as in tc-flower) this waste is more prominent. For the scenarios where the rtnl_lock is held it is not as prominent. The idea is simple. Build and send the notification iif: - The user requests via NLM_F_ECHO or - Someone is listening to the rtnl group (tc mon) On a simple test with tc-flower adding 1M entries, using just a single core, there's already a noticeable difference in the cycles spent in tc_new_tfilter with this patchset. before: - 43.68% tc_new_tfilter + 31.73% fl_change + 6.35% tfilter_notify + 1.62% nlmsg_notify 0.66% __tcf_qdisc_find.part.0 0.64% __tcf_chain_get 0.54% fl_get + 0.53% tcf_proto_lookup_ops after: - 39.20% tc_new_tfilter + 34.58% fl_change 0.69% __tcf_qdisc_find.part.0 0.67% __tcf_chain_get + 0.61% tcf_proto_lookup_ops Note, the above test is using iproute2:tc which execs a shell. We expect people using netlink directly to observe even greater reductions. The qdisc side needs some refactoring of the notification routines to fit in this new model, so they will be sent in a later patchset. ==================== Link: https://lore.kernel.org/r/20231208192847.714940-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: cls_api: conditional notification of eventsPedro Tammela
As of today tc-filter/chain events are unconditionally built and sent to RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check before-hand if they are really needed. This will help to alleviate system pressure when filters are concurrently added without the rtnl lock as in tc-flower. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231208192847.714940-8-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: cls_api: remove 'unicast' argument from delete notificationPedro Tammela
This argument is never called while set to true, so remove it as there's no need for it. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208192847.714940-7-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: act_api: conditional notification of eventsPedro Tammela
As of today tc-action events are unconditionally built and sent to RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check before-hand if they are really needed. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208192847.714940-6-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11net/sched: act_api: don't open code max()Pedro Tammela
Use max() in a couple of places that are open coding it with the ternary operator. Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231208192847.714940-5-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11rtnl: add helper to send if skb is not nullPedro Tammela
This is a convenience helper for routines handling conditional rtnl events, that is code that might send a notification depending on rtnl_has_listeners/rtnl_notify_needed. Instead of: if (skb) rtnetlink_send(...) Use: rtnetlink_maybe_send(...) Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231208192847.714940-4-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11rtnl: add helper to check if a notification is neededVictor Nogueira
Building on the rtnl_has_listeners helper, add the rtnl_notify_needed helper to check if we can bail out early in the notification routines. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231208192847.714940-3-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11rtnl: add helper to check if rtnl group has listenersJamal Hadi Salim
As of today, rtnl code creates a new skb and unconditionally fills and broadcasts it to the relevant group. For most operations this is okay and doesn't waste resources in general. When operations are done without the rtnl_lock, as in tc-flower, such skb allocation, message fill and no-op broadcasting can happen in all cores of the system, which contributes to system pressure and wastes precious cpu cycles when no one will receive the built message. Introduce this helper so rtnetlink operations can simply check if someone is listening and then proceed if necessary. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20231208192847.714940-2-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>