linux.git - Linus' kernel tree

Age	Commit message	Author
2023-01-16	unix: Improve locking scheme in unix_show_fdinfo()	Kirill Tkhai
	After switching to TCP_ESTABLISHED or TCP_LISTEN sk_state, alive SOCK_STREAM and SOCK_SEQPACKET sockets can't change it anymore (since commit 3ff8bff704f4 "unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()"). Thus, we do not need to take lock here. Signed-off-by: Kirill Tkhai <tkhai@ya.ru> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	Merge branch 'virtio-net-xdp-multi-buffer'	David S. Miller
	Heng Qi says: ==================== virtio-net: support multi buffer xdp Changes since PATCH v4: - Make netdev_warn() in [PATCH 2/10] independent from [PATCH 3/10]. Changes since PATCH v3: - Separate fix patch [2/10] for MTU calculation of single buffer xdp. Note that this patch needs to be backported to the stable branch. Changes since PATCH v2: - Even if single buffer xdp has a hole mechanism, there will be no problem (limiting mtu and turning off GUEST GSO), so there is no need to backport "[PATCH 1/9]"; - Modify calculation of MTU for single buffer xdp in virtnet_xdp_set(); - Make truesize in mergeable mode return to literal meaning; - Add some comments for legibility; Changes since RFC: - Using headroom instead of vi->xdp_enabled to avoid re-reading in add_recvbuf_mergeable(); - Disable GRO_HW and keep linearization for single buffer xdp; - Renamed to virtnet_build_xdp_buff_mrg(); - pr_debug() to netdev_dbg(); - Adjusted the order of the patch series. Currently, virtio net only supports xdp for single-buffer packets or linearized multi-buffer packets. This patchset supports xdp for multi-buffer packets, then larger MTU can be used if xdp sets the xdp.frags. This does not affect single buffer handling. In order to build multi-buffer xdp neatly, we integrated the code into virtnet_build_xdp_buff_mrg() for xdp. The first buffer is used for prepared xdp buff, and the rest of the buffers are added to its skb_shared_info structure. This structure can also be conveniently converted during XDP_PASS to get the corresponding skb. Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail() is based on the assumption of the page pool, (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag)) is negative in most cases. So we didn't set xdp_rxq->frag_size in virtnet_open() to disable the tail increase. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: support multi-buffer xdp	Heng Qi
	Driver can pass the skb to stack by build_skb_from_xdp_buff(). Driver forwards multi-buffer packets using the send queue when XDP_TX and XDP_REDIRECT, and clears the reference of multi pages when XDP_DROP. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: remove xdp related info from page_to_skb()	Heng Qi
	For the clear construction of xdp_buff, we remove the xdp processing interleaved with page_to_skb(). Now, the logic of xdp and building skb from xdp are separate and independent. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: build skb from multi-buffer xdp	Heng Qi
	This converts the xdp_buff directly to a skb, including multi-buffer and single buffer xdp. We'll isolate the construction of skb based on xdp from page_to_skb(). Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: transmit the multi-buffer xdp	Heng Qi
	This serves as the basis for XDP_TX and XDP_REDIRECT to send a multi-buffer xdp_frame. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: construct multi-buffer xdp in mergeable	Heng Qi
	Build multi-buffer xdp using virtnet_build_xdp_buff_mrg(). For the prefilled buffer before xdp is set, we will probably use vq reset in the future. At the same time, virtio net currently uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate the tailroom of the last frag, which will involve the offset of the corresponding page and cause a negative value, so we disable tail increase by not setting xdp_rxq->frag_size. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
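The tailroom expression quoted in this series, (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag)), can be illustrated with a small userspace sketch. The function name and the sample numbers below are hypothetical and only show how a large page offset can push the result below zero; this is not the driver's code.

```c
/*
 * Userspace model of the tailroom expression from the commit message:
 *   tailroom = frag_size - frag_len - frag_off
 * All three inputs are plain numbers here; in the kernel they come from
 * rxq->frag_size, skb_frag_size(frag) and skb_frag_off(frag).
 */
long sketch_frag_tailroom(long frag_size, long frag_len, long frag_off)
{
    return frag_size - frag_len - frag_off;
}
```

With an assumed frag_size of 4096, a 1500-byte fragment at offset 0 leaves 2596 bytes, while the same fragment at offset 8192 inside a larger page yields -5596, which is the kind of negative value the message describes.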
2023-01-16	virtio-net: build xdp_buff with multi buffers	Heng Qi
	Support xdp for multi buffer packets in mergeable mode. Putting the first buffer as the linear part for xdp_buff, and the rest of the buffers as non-linear fragments to struct skb_shared_info in the tailroom belonging to xdp_buff. Let 'truesize' return to its literal meaning, that is, when xdp is set, it includes the length of headroom and tailroom. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: update bytes calculation for xdp_frame	Heng Qi
	Update relative record value for xdp_frame as basis for multi-buffer xdp transmission. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: set up xdp for multi buffer packets	Heng Qi
	When the xdp program sets xdp.frags, which means it can process multi-buffer packets over larger MTU, so we continue to support xdp. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	virtio-net: fix calculation of MTU for single-buffer xdp	Heng Qi
	When single-buffer xdp is loaded, the size of the buffer filled each time is 'sz = (PAGE_SIZE - headroom - tailroom)', which is the maximum packet length that the driver allows the device to pass in. Otherwise, the packet with a length greater than sz will come in, so num_buf will be greater than or equal to 2, and xdp_linearize_page() will be performed and the packet will be dropped because the total length is greater than PAGE_SIZE. So the maximum value of MTU for single-buffer xdp is 'max_sz = sz - ETH_HLEN'. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
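The arithmetic in this message can be written out as a minimal sketch. The function name, the SKETCH_ETH_HLEN macro, and the headroom and tailroom values used in the note below are assumptions for illustration; only the formula 'max_sz = PAGE_SIZE - headroom - tailroom - ETH_HLEN' comes from the commit text.

```c
#include <stddef.h>

/* Ethernet header length in bytes (the kernel's ETH_HLEN has the same value). */
#define SKETCH_ETH_HLEN 14

/*
 * Largest MTU that still fits in one receive buffer when single-buffer XDP
 * is loaded: sz = page_size - headroom - tailroom, max_sz = sz - ETH_HLEN.
 * Returns 0 if the reserved space leaves no room for a payload.
 */
size_t single_buf_xdp_max_mtu(size_t page_size, size_t headroom,
                              size_t tailroom)
{
    size_t reserved = headroom + tailroom + SKETCH_ETH_HLEN;

    if (reserved >= page_size)
        return 0;
    return page_size - reserved;
}
```

For example, with a 4096-byte page and hypothetical values of 256 bytes of headroom and 320 bytes of tailroom, the result is 3506.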
2023-01-16	virtio-net: disable the hole mechanism for xdp	Heng Qi
	XDP core assumes that the frame_size of xdp_buff and the length of the frag are PAGE_SIZE. The hole may cause the processing of xdp to fail, so we disable the hole mechanism when xdp is set. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16	octeontx2-af: update CPT inbound inline IPsec config mailbox	Srujana Challa
	Updates CPT inbound inline IPsec configure mailbox to take CPT credit, opcode, credit_th and bpid from VF. This patch also adds a mailbox to read inbound IPsec configuration. Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	Merge branch 'mlxbf_gige-add-bluefield-3-support'	Jakub Kicinski
	David Thompson says: ==================== mlxbf_gige: add BlueField-3 support This patch series adds driver logic to the "mlxbf_gige" Ethernet driver in order to support the third generation BlueField SoC (BF3). The existing "mlxbf_gige" driver is extended with BF3-specific logic and run-time decisions are made by the driver depending on the SoC generation (BF2 vs. BF3). The BF3 SoC is similar to BF2 SoC with regards to transmit and receive packet processing: * Driver rings usage; consumer & producer indices * Single queue for receive and transmit * DMA operation The differences between BF3 and BF2 SoC are: * In addition to supporting 1Gbps interface speed, the BF3 SoC adds support for 10Mbps and 100Mbps interface speeds * BF3 requires SerDes config logic to support its SGMII interface * BF3 adds support for "ethtool -s" for interface speed config * BF3 utilizes different MDIO logic for accessing the board-level PHY device Testing - Successful build of kernel for ARM64, ARM32, X86_64 - Tested ARM64 build on FastModels, Palladium, SoC ==================== Link: https://lore.kernel.org/r/20230112202609.21331-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	mlxbf_gige: fix white space in mlxbf_gige_eth_ioctl	David Thompson
	This patch fixes the white space issue raised by checkpatch: CHECK: Alignment should match open parenthesis +static int mlxbf_gige_eth_ioctl(struct net_device *netdev, + struct ifreq *ifr, int cmd) Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	mlxbf_gige: add "set_link_ksettings" ethtool callback	David Thompson
	This patch extends the "ethtool_ops" data structure to include the "set_link_ksettings" callback. This change enables configuration of the various interface speeds that the BlueField-3 supports (10Mbps, 100Mbps, and 1Gbps). Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	mlxbf_gige: support 10M/100M/1G speeds on BlueField-3	David Thompson
	The BlueField-3 OOB interface supports 10Mbps, 100Mbps, and 1Gbps speeds. The external PHY is responsible for autonegotiating the speed with the link partner. Once the autonegotiation is done, the BlueField PLU needs to be configured accordingly. This patch does two things: 1) Initialize the advertised control flow/duplex/speed in the probe based on the BlueField SoC generation (2 or 3) 2) Adjust the PLU speed config in the PHY interrupt handler Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	mlxbf_gige: add MDIO support for BlueField-3	David Thompson
	This patch adds initial MDIO support for the BlueField-3 SoC. Separate header files for the BlueField-2 and the BlueField-3 SoCs have been created. These header files hold the SoC-specific MDIO macros since the register offsets and bit fields have changed. Also, in BlueField-3 there is a separate register for writing and reading the MDIO data. Finally, instead of having "if" statements everywhere to differentiate between SoC-specific logic, a mlxbf_gige_mdio_gw_t struct was created for this purpose. Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: pcs: pcs-lynx: use phylink_get_link_timer_ns() helper	Russell King (Oracle)
	Use the phylink_get_link_timer_ns() helper to get the period for the link timer. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1pFyhW-0067jq-Fh@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	plca.c: fix obvious mistake in checking retval	Piergiorgio Beruto
	Revert a wrong fix that was done during the review process. The intention was to substitute "if(ret < 0)" with "if(ret)". Unfortunately, the intended fix did not meet the code. Besides, after additional review, it was decided that "if(ret < 0)" was actually the right thing to do. Fixes: 8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS") Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/f2277af8951a51cfee2fb905af8d7a812b7beaf4.1673616357.git.piergiorgio.beruto@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
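The difference between the two checks matters whenever a callee can return a positive value on success. The sketch below uses a hypothetical helper with that convention; whether the PLCA code path returns positive values is not stated in the commit message, so this only illustrates the general pattern.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns a negative errno on failure, otherwise the
 * non-negative number of items it processed. */
int sketch_process(int items)
{
    if (items < 0)
        return -EINVAL;
    return items;
}

/* Matches this convention: only negative values are treated as errors. */
bool failed_lt0(int ret)
{
    return ret < 0;
}

/* Too strict for this convention: positive success values look like errors. */
bool failed_nonzero(int ret)
{
    return ret != 0;
}
```

Here sketch_process(3) succeeds, yet failed_nonzero() would report a failure, while failed_lt0() flags only the negative errno case.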
2023-01-13	Merge branch 'net-mdio-continue-separating-c22-and-c45'	Jakub Kicinski
	Michael Walle says: ==================== net: mdio: Continue separating C22 and C45 I've picked this older series from Andrew up and rebased it onto the latest net-next. This is the second patch set in the series which separates the C22 and C45 MDIO bus transactions at the API level to the MDIO bus drivers. ==================== Link: https://lore.kernel.org/r/20230112-net-next-c45-seperation-part-2-v1-0-5eeaae931526@walle.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>
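As a rough picture of what separating C22 and C45 at the API level means, the sketch below models a bus with one callback per transaction type. The struct, the function names, and the returned values are all hypothetical stand-ins and do not reproduce the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical bus with one callback per transaction type. */
struct sketch_bus {
    int (*read)(struct sketch_bus *bus, int addr, int regnum);
    int (*read_c45)(struct sketch_bus *bus, int addr, int devad, int regnum);
};

/* Fake C22 read: 5-bit register number, no device address. */
int sketch_read_c22(struct sketch_bus *bus, int addr, int regnum)
{
    (void)bus;
    (void)addr;
    return 0x1000 | (regnum & 0x1f);
}

/* Fake C45 read: the device address arrives as its own parameter. */
int sketch_read_c45(struct sketch_bus *bus, int addr, int devad, int regnum)
{
    (void)bus;
    (void)addr;
    (void)regnum;
    return 0x2000 | (devad & 0x1f);
}

/* Callers pick the transaction type explicitly; a bus that lacks the
 * requested callback reports that the operation is unsupported. */
int sketch_bus_read_c22(struct sketch_bus *bus, int addr, int regnum)
{
    if (!bus->read)
        return -EOPNOTSUPP;
    return bus->read(bus, addr, regnum);
}

int sketch_bus_read_c45(struct sketch_bus *bus, int addr, int devad,
                        int regnum)
{
    if (!bus->read_c45)
        return -EOPNOTSUPP;
    return bus->read_c45(bus, addr, devad, regnum);
}
```

A driver that supports only C22 would leave read_c45 unset in this model, so a C45 request fails cleanly instead of being decoded from a flag inside the register number.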
2023-01-13	enetc: Separate C22 and C45 transactions	Andrew Lunn
	The enetc MDIO bus driver can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new API calls where appropriate. This driver is shared with the Felix DSA switch, so update that at the same time. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: stmmac: Separate C22 and C45 transactions for xgmac	Andrew Lunn
	The stmmac MDIO bus driver in variant gmac4 can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new API calls where appropriate. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: stmmac: Separate C22 and C45 transactions for xgmac2	Andrew Lunn
	The stmicro stmmac xgmac2 MDIO bus driver can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new API calls where appropriate. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: lan743x: Separate C22 and C45 transactions	Andrew Lunn
	The microchip lan743x MDIO bus driver can perform both C22 and C45 transfers in some variants. Create separate functions for each and register the C45 versions using the new API calls where appropriate. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: ethernet: mtk_eth_soc: Separate C22 and C45 transactions	Andrew Lunn
	The mediatek bus driver can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: mdio: ipq4019: Separate C22 and C45 transactions	Andrew Lunn
	The ipq4019 driver can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new driver API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: mdio: aspeed: Separate C22 and C45 transactions	Andrew Lunn
	The aspeed MDIO bus driver can perform both C22 and C45 transfers. Modify the existing C45 functions to take the devad as a parameter, and remove the wrappers so there are individual C22 and C45 functions. Add the C45 functions to the new API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: mdio: mux-bcm-iproc: Separate C22 and C45 transactions	Andrew Lunn
	The MDIO mux broadcom iproc can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: mdio: i2c: Separate C22 and C45 transactions	Andrew Lunn
	The MDIO over I2C bus driver can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions using the new API calls. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: mdio: cavium: Separate C22 and C45 transactions	Andrew Lunn
	The cavium IP can perform both C22 and C45 transfers. Create separate functions for each and register the C45 versions in both the octeon and thunder bus driver. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	nfp: add DCB IEEE support	Bin Chen
	Add basic DCB IEEE support. This includes support for ETS, max-rate, and DSCP to user priority mapping. DCB may be configured using iproute2's dcb command. Example usage: dcb ets set dev $dev tc-tsa 0:ets 1:ets 2:ets 3:ets 4:ets 5:ets \ 6:ets 7:ets tc-bw 0:0 1:80 2:0 3:0 4:0 5:0 6:20 7:0 dcb maxrate set dev $dev tc-maxrate 1:1000bit And DCB configuration can be shown using: dcb ets show dev $dev dcb maxrate show dev $dev Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230112121102.469739-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	net: ethernet: mtk_wed: get rid of queue lock for tx queue	Lorenzo Bianconi
	Similar to MTK Wireless Ethernet Dispatcher (WED) MCU rx queue, we do not need to protect WED MCU tx queue with a spin lock since the tx queue is accessed in the two following routines: - mtk_wed_wo_queue_tx_skb(): it is run at initialization and during mt7915 normal operation. Moreover MCU messages are serialized through MCU mutex. - mtk_wed_wo_queue_tx_clean(): it runs just at mt7915 driver module unload when no more messages are sent to the MCU. Remove tx queue spinlock. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/7bd0337b2a13ab1a63673b7c03fd35206b3b284e.1673515140.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	ipv6: remove max_size check inline with ipv4	Jon Maxwell
	In ip6_dst_gc() replace: if (entries > gc_thresh) With: if (entries > ops->gc_thresh) Sending Ipv6 packets in a loop via a raw socket triggers an issue where a route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly consumes the Ipv6 max_size threshold which defaults to 4096 resulting in these warnings: [1] 99.187805] dst_alloc: 7728 callbacks suppressed [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. . . [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size. When this happens the packet is dropped and sendto() gets a network is unreachable error: remaining pkt 200557 errno 101 remaining pkt 196462 errno 101 . . remaining pkt 126821 errno 101 Implement David Aherns suggestion to remove max_size check seeing that Ipv6 has a GC to manage memory usage. Ipv4 already does not check max_size. Here are some memory comparisons for Ipv4 vs Ipv6 with the patch: Test by running 5 instances of a program that sends UDP packets to a raw socket 5000000 times. Compare Ipv4 and Ipv6 performance with a similar program. Ipv4: Before test: MemFree: 29427108 kB Slab: 237612 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 2881 3990 192 42 2 : tunables 0 0 0 During test: MemFree: 29417608 kB Slab: 247712 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 44394 44394 192 42 2 : tunables 0 0 0 After test: MemFree: 29422308 kB Slab: 238104 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 Ipv6 with patch: Errno 101 errors are not observed anymore with the patch. 
Before test: MemFree: 29422308 kB Slab: 238104 kB ip6_dst_cache 1912 2528 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 During Test: MemFree: 29431516 kB Slab: 240940 kB ip6_dst_cache 11980 12064 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 After Test: MemFree: 29441816 kB Slab: 238132 kB ip6_dst_cache 1902 2432 256 32 2 : tunables 0 0 0 xfrm_dst_cache 0 0 320 25 2 : tunables 0 0 0 ip_dst_cache 3048 4116 192 42 2 : tunables 0 0 0 Tested-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230112012532.311021-1-jmaxwell37@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	caif: don't assume iov_iter type	Keith Busch
	The details of the iov_iter types are appropriately abstracted, so there's no need to check for specific type fields. Just let the abstractions handle it. This is preparing for io_uring/net's io_send to utilize the more efficient ITER_UBUF. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230111184245.3784393-1-kbusch@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	dt-bindings: net: rockchip-dwmac: fix rv1126 compatible warning	Anand Moon
	Fix compatible string for RV1126 gmac, and constrain it to be compatible with Synopsys dwmac 4.20a. fix below warning $ make CHECK_DTBS=y rv1126-edgeble-neu2-io.dtb arch/arm/boot/dts/rv1126-edgeble-neu2-io.dtb: ethernet@ffc40000: compatible: 'oneOf' conditional failed, one must be fixed: ['rockchip,rv1126-gmac', 'snps,dwmac-4.20a'] is too long 'rockchip,rv1126-gmac' is not one of ['rockchip,rk3568-gmac', 'rockchip,rk3588-gmac'] Fixes: b36fe2f43662 ("dt-bindings: net: rockchip-dwmac: add rv1126 compatible") Reviewed-by: Jagan Teki <jagan@edgeble.ai> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Anand Moon <anand@edgeble.ai> Link: https://lore.kernel.org/r/20230111172437.5295-1-anand@edgeble.ai Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-13	sock: add tracepoint for send recv length	Yunhui Cui
	Add 2 tracepoints to monitor the tcp/udp traffic per process and per cgroup. Regarding monitoring the tcp/udp traffic of each process, there are two existing solutions: the first one is https://www.atoptool.nl/netatop.php, the second is via kprobe/kretprobe. The netatop solution is implemented by registering hook functions at the hook points provided by the netfilter framework. These hook functions may run in soft interrupt context and cannot directly obtain the pid. Some data structures are added to bind packets and processes, for example struct taskinfobucket, struct taskinfo ... Every time the process sends and receives packets it needs multiple hashmaps, resulting in low performance, and it has the problem of inaccurate tcp/udp traffic statistics (for example: multiple threads share sockets). We can obtain the information with kretprobe, but as we know, kprobe gets the result by trapping in an exception, which loses performance compared to tracepoint. We compared the performance of tracepoints with the above two methods, and the results are as follows: ab -n 1000000 -c 1000 -r http://127.0.0.1/index.html without trace: Time per request: 39.660 [ms] (mean) Time per request: 0.040 [ms] (mean, across all concurrent requests) netatop: Time per request: 50.717 [ms] (mean) Time per request: 0.051 [ms] (mean, across all concurrent requests) kr: Time per request: 43.168 [ms] (mean) Time per request: 0.043 [ms] (mean, across all concurrent requests) tracepoint: Time per request: 41.004 [ms] (mean) Time per request: 0.041 [ms] (mean, across all concurrent requests) It can be seen that tracepoint has better performance. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Xiongchun Duan <duanxiongchun@bytedance.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	Merge branch 'rmnet-tx-pkt-aggregation'	David S. Miller
	Daniele Palmas says: ==================== net: add tx packets aggregation to ethtool and rmnet Hello maintainers and all, this patchset implements tx qmap packets aggregation in rmnet and generic ethtool support for that. Some low-cat Thread-x based modems are not capable of properly reaching the maximum allowed throughput both in tx and rx during a bidirectional test if tx packets aggregation is not enabled. I verified this problem with rmnet + qmi_wwan by using a MDM9207 Cat. 4 based modem (50Mbps/150Mbps max throughput). What is actually happening is pictured at https://drive.google.com/file/d/1gSbozrtd9h0X63i6vdkNpN68d-9sg8f9/view Testing with iperf TCP, when rx and tx flows are tested singularly there's no issue in tx and minor issues in rx (not able to reach max throughput). When there are concurrent tx and rx flows, tx throughput has a huge drop. rx a minor one, but still present. The same scenario with tx aggregation enabled is pictured at https://drive.google.com/file/d/1jcVIKNZD7K3lHtwKE5W02mpaloudYYih/view showing a regular graph. This issue does not happen with high-cat modems (e.g. SDX20), or at least it does not happen at the throughputs I'm able to test currently: maybe the same could happen when moving close to the maximum rates supported by those modems. Anyway, having the tx aggregation enabled should not hurt. The first attempt to solve this issue was in qmi_wwan qmap implementation, see the discussion at https://lore.kernel.org/netdev/20221019132503.6783-1-dnlplm@gmail.com/ However, it turned out that rmnet was a better candidate for the implementation. Moreover, Greg and Jakub suggested also to use ethtool for the configuration: not sure if I got their advice right, but this patchset also adds generic ethtool support for tx aggregation. The patches have been tested mainly against an MDM9207 based modem through USB and SDX55 through PCI (MHI). 
v2 should address the comments highlighted in the review: the implementation is still in rmnet, due to Subash's request of keeping tx aggregation there. v3 fixes ethtool-netlink.rst content out of table bounds and a W=1 build warning for patch 2. v4 solves a race related to egress_agg_params. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: qualcomm: rmnet: add ethtool support for configuring tx aggregation	Daniele Palmas
	Add support for ETHTOOL_COALESCE_TX_AGGR for configuring the tx aggregation settings. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: qualcomm: rmnet: add tx packets aggregation	Daniele Palmas
	Add tx packets aggregation. Bidirectional TCP throughput tests through iperf with low-cat Thread-x based modems revealed performance issues both in tx and rx. The Windows driver does not show this issue: inspecting USB packets revealed that the only notable change is the driver enabling tx packets aggregation. Tx packets aggregation is by default disabled and can be enabled by increasing the value of ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES. The maximum aggregated size is by default set to a reasonably low value in order to support the majority of modems. This implementation is based on patches available in Code Aurora repositories (msm kernel) whose main authors are Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	ethtool: add tx aggregation parameters	Daniele Palmas
	Add the following ethtool tx aggregation parameters: ETHTOOL_A_COALESCE_TX_AGGR_MAX_BYTES Maximum size in bytes of a tx aggregated block of frames. ETHTOOL_A_COALESCE_TX_AGGR_MAX_FRAMES Maximum number of frames that can be aggregated into a block. ETHTOOL_A_COALESCE_TX_AGGR_TIME_USECS Time in usecs after the first packet arrival in an aggregated block for the block to be sent. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
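One way to read the three parameters together is as limits that decide when a pending block has to be sent. The sketch below is a simplified userspace model under that reading; the struct and function names are hypothetical and this is not the rmnet implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three tunables described above. */
struct sketch_aggr_params {
    uint32_t max_bytes;   /* max size of an aggregated block */
    uint32_t max_frames;  /* max frames per aggregated block */
    uint32_t time_usecs;  /* max wait after the first frame  */
};

/* Hypothetical state of the block currently being built. */
struct sketch_aggr_state {
    uint32_t bytes;           /* bytes queued so far               */
    uint32_t frames;          /* frames queued so far              */
    uint64_t first_pkt_usecs; /* arrival time of the first frame   */
};

/*
 * Returns true if the pending block should be sent before a frame of
 * next_len bytes, arriving at now_usecs, is added to it.
 */
bool sketch_aggr_should_flush(const struct sketch_aggr_params *p,
                              const struct sketch_aggr_state *st,
                              uint32_t next_len, uint64_t now_usecs)
{
    if (st->frames == 0)
        return false; /* nothing pending */
    if (st->frames >= p->max_frames)
        return true;  /* frame limit reached */
    if ((uint64_t)st->bytes + next_len > p->max_bytes)
        return true;  /* next frame would exceed the byte limit */
    if (now_usecs - st->first_pkt_usecs >= p->time_usecs)
        return true;  /* block has waited long enough */
    return false;
}
```

In this model a block is flushed by whichever limit is hit first, so raising the frame limit alone still leaves the byte and time limits as bounds on latency and block size.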
2023-01-13	Merge branch 'dsa-microchip-ptp'	David S. Miller
	Arun Ramadoss says: ==================== net: dsa: microchip: add PTP support for KSZ9563/KSZ8563 and LAN937x KSZ9563/KSZ8563 and LAN937x switch are capable of supporting IEEE 1588 PTP protocol. LAN937x has a PTP register set similar to KSZ9563, hence the implementation has been made common for the KSZ switches. KSZ9563 does not support two step timestamping but LAN937x supports both. Tested the 1step & 2step p2p timestamping in LAN937x and p2p1step timestamping in KSZ9563. This patch series is based on the Christian Eggers PTP support for KSZ9563. Applied the Christian patch and updated as per the latest refactoring of KSZ series code. The features added on top are PTP packet Interrupt implementation based on nested handler, LAN937x two step timestamping and programmable per_out pins. Link: https://www.spinics.net/lists/netdev/msg705531.html Patch v7 -> v8 - set skb->ip_summed = CHECKSUM_NONE after updating the checksum Patch v6 -> v7 - Corrected the misplaced spaces and tabs - Added mutex lock in do_aux_work - Replaced 0/1 with false/true for ts_en - SKB_TX_INPROGRESS flag is set before dsa_enqueue_skb - Removed the fallthrough keyword - pdelay_resp header correction is performed based on KSZ_SKB_CB(skb)->update_correction instead of clone Patch v5 -> v6 - Rebased to latest net-next and renamed from RFC to patch net-next. Patch v4 -> v5 - Replaced irq_domain_add_simple with irq_domain_add_linear - Used the helper diff_by_scaled_ppm() for adjfine. Patch v3 -> v4 - removed IRQF_TRIGGER_FALLING from the request_threaded_irq of ptp msg - addressed review comments on patch 10 periodic output - added sign off in patch 6 & 9 - reverted to set PTP_1STEP bit for lan937x which is missed during v3 regression Patch v2-> v3 - used port_rxtstamp for reconstructing the absolute timestamp instead of tagger function pointer. - Reverted to setting of 802.1As bit. Patch v1 -> v2 - GPIO perout enable bit is different for LAN937x and KSZ9x. 
Added new patch for configuring LAN937x programmable pins. - PTP enabled in hardware based on both tx and rx timestamping of all the user ports. - Replaced setting of 802.1AS bit with P2P bit in PTP_MSG_CONF1 register. RFC v2 -> Patch v1 - Changed the patch author based on past patch submission - Changed the commit message prefix as net: dsa: microchip: ptp Individual patch changes are listed in corresponding commits. RFC v1 -> v2 - Added the p2p1step timestamping and conditional execution of 2 step for LAN937x only. - Added the periodic output support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: dsa: microchip: ptp: lan937x: Enable periodic output in LED pins	Arun Ramadoss
	The implementation of per_out pins differs between KSZ9563 and LAN937x. In KSZ9563, bit 6 of the Timestamping control register (0x052C) selects timestamp input when 1 and trigger output when 0. For LAN937x it is the opposite: 1 selects trigger output and 0 selects timestamp input. As for the per_out GPIO pins, KSZ9563 has four LED pins and two dedicated GPIO pins. In LAN937x the dedicated GPIO pins are removed; instead there are up to 10 LED pins, of which LED_0 and LED_1 can be mapped to PTP TOU 0, 1 or 2. This patch sets bit 6 in the 0x052C register and configures the LED override and source registers for the LAN937x series of switches only. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: dsa: microchip: ptp: lan937x: add 2 step timestamping	Arun Ramadoss
	The LAN937x series of switches supports the 2 step timestamping mechanism. The timestamp correction calculations performed in ksz_rcv_timestamp and ksz_xmit_timestamp are applicable only to p2p1step. To check in tag_ksz.c whether 2 step is enabled, a helper function is introduced in tagger_data to query it from ksz_ptp.c. The timestamp calculations are performed depending on whether 2 step is enabled. Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: dsa: microchip: ptp: add support for perout programmable pins	Arun Ramadoss
	There are two programmable pins available for the Trigger output unit to generate periodic pulses. This patch adds verify_pin for the 2 available pins and configures them with respect to the GPIO index for the TOU unit. Tested using testptp ./testptp -i 0 -L 0,2 ./testptp -i 0 -d /dev/ptp0 -p 1000000000 ./testptp -i 1 -L 1,2 ./testptp -i 1 -d /dev/ptp0 -p 100000000 Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: dsa: microchip: ptp: add periodic output signal	Christian Eggers
	LAN937x and KSZ switches with PTP support have three Trigger output units (TOU). A TOU can be used to generate the periodic signal for PTP. The TOU has a cycle width register of 32 bit and a pulse width register of 24 bit; each count is 8ns, so the pulse width can be at most 125ms. Tested using ./testptp -d /dev/ptp0 -p 1000000000 -w 100000000 for generating the 10ms pulse width Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
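The unit conversion implied above (a 24-bit register counting in 8ns steps) can be sketched as follows. The function and macro names are hypothetical and this is not the driver's code. It checks only the raw 24-bit range, which works out to a little over 134ms, and does not enforce the 125ms limit quoted in the commit message.

```c
#include <stdint.h>

#define TOU_TICK_NS          8u			/* one register count = 8ns */
#define TOU_PULSE_WIDTH_MAX  ((1u << 24) - 1)	/* largest 24-bit count */

/*
 * Convert a pulse width in nanoseconds into a register count.
 * Returns 0 on success, -1 if the width is not a multiple of 8ns
 * or does not fit into 24 bits.
 */
static int tou_pulse_ns_to_reg(uint64_t pulse_ns, uint32_t *reg)
{
	uint64_t ticks;

	if (pulse_ns % TOU_TICK_NS)
		return -1;

	ticks = pulse_ns / TOU_TICK_NS;
	if (ticks > TOU_PULSE_WIDTH_MAX)
		return -1;

	*reg = (uint32_t)ticks;
	return 0;
}
```

For the value passed with -w in the test command, 100000000 ns, this yields a count of 12500000.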
2023-01-13	net: dsa: microchip: ptp: move pdelay_rsp correction field to tail tag	Christian Eggers
	For PDelay_Resp messages we will likely have a negative value in the correction field. The switch hardware cannot correctly update such values (produces an off by one error in the UDP checksum), so it must be moved to the time stamp field in the tail tag. Format of the correction field is 48 bit ns + 16 bit fractional ns. After updating the correction field, clone is no longer required hence it is freed. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
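The correction field format mentioned above (48 bit ns + 16 bit fractional ns) amounts to a signed 64-bit value scaled by 2^16. A minimal sketch of the conversion follows; the helper names are hypothetical. Serialising the value as 8 big-endian bytes follows the PTP header layout and is not spelled out in the commit message.

```c
#include <stdint.h>

/* Nanoseconds -> correction field (ns scaled by 2^16, may be negative). */
static int64_t ns_to_correction(int64_t ns)
{
	return ns * 65536;
}

/* Correction field -> whole nanoseconds, truncating toward zero. */
static int64_t correction_to_ns(int64_t corr)
{
	return corr / 65536;
}

/* Store the correction field as 8 bytes in network (big-endian) order. */
static void correction_to_be64(int64_t corr, uint8_t out[8])
{
	uint64_t u = (uint64_t)corr;
	int i;

	for (i = 7; i >= 0; i--) {
		out[i] = (uint8_t)(u & 0xFF);
		u >>= 8;
	}
}
```

A correction of -1 ns serialises to ff ff ff ff ff ff 00 00, which shows why a negative value occupies all 48 integer bits.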
2023-01-13	net: dsa: microchip: ptp: add packet transmission timestamping	Christian Eggers
	This patch adds the routines for transmission of PTP packets. When a PTP pdelay_req packet is to be transmitted, the deferred xmit worker is used to schedule the packet. During irq_setup, the interrupts for Sync, Pdelay_req and Pdelay_rsp are enabled, so an interrupt is triggered for all three packet types. But for p2p1step, only the time stamp of the Pdelay_req packet is required. Hence, to avoid posting the completion from the ISR for Sync and Pdelay_resp packets, a ts_en flag is introduced. It controls which packets need to be processed for timestamping. After the packet is transmitted, the ISR is triggered. The time at which the packet was transmitted is recorded in a separate register. This value is reconstructed to absolute time and posted to the user application through the socket error queue. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-13	net: dsa: microchip: ptp: add packet reception timestamping	Christian Eggers
	Rx timestamping is done through 4 additional bytes in the tail tag. Whenever a PTP packet is received, the 4 byte hardware time stamp value is added before the 1 byte tail tag. Also, bit 7 in the tail tag marks the frame as a PTP frame. This 4 byte value is extracted from the tail tag, reconstructed to absolute time and assigned to the skb hwtstamp. If the packet received is a PDelay_Resp, the partial ingress timestamp is subtracted from the correction field, since user space tools expect this to be done in hardware. Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
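The frame layout described above (4 timestamp bytes placed before the 1 byte tail tag, with bit 7 of the tag marking a PTP frame) could be parsed roughly as follows. All names are hypothetical, and the big-endian byte order of the timestamp is an assumption not stated in the commit message. Reconstructing the absolute time from the partial value is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAIL_TAG_LEN      1	/* tail tag is the last byte of the frame */
#define PTP_TS_LEN        4	/* partial timestamp sits just before it */
#define TAIL_TAG_PTP_BIT  0x80	/* bit 7: frame carries a PTP timestamp */

/*
 * Extract the 4 byte partial timestamp from the end of a received frame.
 * Returns false if the frame is not flagged as PTP or is too short.
 */
static bool tail_get_rx_tstamp(const uint8_t *frame, size_t len, uint32_t *ts)
{
	const uint8_t *p;

	if (len < TAIL_TAG_LEN + PTP_TS_LEN)
		return false;

	if (!(frame[len - 1] & TAIL_TAG_PTP_BIT))
		return false;

	/* Assumed big-endian (network order) encoding of the timestamp. */
	p = frame + len - TAIL_TAG_LEN - PTP_TS_LEN;
	*ts = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	      ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return true;
}
```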
2023-01-13	net: ptp: add helper for one-step P2P clocks	Christian Eggers
	For P2P delay measurement, the ingress time stamp of the PDelay_Req is required for the correction field of the PDelay_Resp. The application echoes back the correction field of the PDelay_Req when sending the PDelay_Resp. Some hardware (like the ZHAW InES PTP time stamping IP core) subtracts the ingress timestamp autonomously from the correction field, so that the hardware only needs to add the egress timestamp on tx. Other hardware (like the Microchip KSZ9563) reports the ingress time stamp via an interrupt and requires that the software provides this time stamp via tail-tag on tx. In order to avoid introducing a further application interface for this, the driver can simply emulate the behavior of the InES device and subtract the ingress time stamp in software from the correction field. On egress, the correction field can either be kept as it is (and the time stamp field in the tail-tag is set to zero) or move the value from the correction field back to the tail-tag. Changing the correction field requires updating the UDP checksum (if UDP is used as transport). Signed-off-by: Christian Eggers <ceggers@arri.de> Co-developed-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
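Since changing the correction field requires updating the UDP checksum, one common way to do that without re-summing the whole packet is the incremental update from RFC 1624. The sketch below is a generic illustration with hypothetical helper names, not the helper this commit adds. It omits the UDP pseudo-header and the rule that a computed checksum of zero is transmitted as 0xFFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Full ones' complement checksum over big-endian 16-bit words (even len). */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}

/*
 * Incremental update per RFC 1624: HC' = ~(~HC + ~m + m'), applied when
 * one 16-bit word changes from old_word to new_word.
 */
static uint16_t csum_update16(uint16_t csum, uint16_t old_word,
			      uint16_t new_word)
{
	uint32_t sum = (uint16_t)~csum;

	sum += (uint16_t)~old_word;
	sum += new_word;
	sum = (sum & 0xFFFF) + (sum >> 16);
	sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}
```

An 8 byte correction field spans four such words, so the update would be applied once per changed word.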