Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
|
|
Add two helper functions - rtnl_newlink_link_net() and
rtnl_newlink_peer_net() for netns fallback logic. Peer netns falls back
to link netns, and link netns falls back to source netns.
Convert the use of params->net in netdevice drivers to one of the helper
functions for clarity.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-4-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are 4 net namespaces involved when creating links:
- source netns - where the netlink socket resides,
- target netns - where to put the device being created,
- link netns - netns associated with the device (backend),
- peer netns - netns of peer device.
Currently, two nets are passed to newlink() callback - "src_net"
parameter and "dev_net" (implicitly in net_device). They are set as
follows, depending on netlink attributes in the request.
+------------+-------------------+---------+---------+
| peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
+------------+-------------------+---------+---------+
| | absent | source | target |
| absent +-------------------+---------+---------+
| | present | link | link |
+------------+-------------------+---------+---------+
| | absent | peer | target |
| present +-------------------+---------+---------+
| | present | peer | link |
+------------+-------------------+---------+---------+
When IFLA_LINK_NETNSID is present, the device is created in link netns
first and then moved to target netns. This has some side effects,
including extra ifindex allocation, ifname validation and link events.
These could be avoided if we create it in target netns from
the beginning.
On the other hand, the meaning of src_net parameter is ambiguous. It
varies depending on how parameters are passed. It is the effective
link (or peer netns) by design, but some drivers ignore it and use
dev_net instead.
To provide more netns context for drivers, this patch packs existing
newlink() parameters, along with the source netns, link netns and peer
netns, into a struct. The old "src_net" is renamed to "net" to avoid
confusion with real source netns, and will be deprecated later. The use
of src_net are converted to params->net trivially.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.
Patch was created by using Coccinelle.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/4312885d95fc99053b2c285f74cc83a852c03f4a.1738746872.git.namcao@linutronix.de
|
|
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Simon reported that ndo_change_mtu() methods were never
updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted
in commit 501a90c94510 ("inet: protect against too small
mtu values.")
We read dev->mtu without holding RTNL in many places,
with READ_ONCE() annotations.
It is time to take care of ndo_change_mtu() methods
to use corresponding WRITE_ONCE()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.
This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Qualcom rmnet and emac drivers.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The variable rmnet_link_ops assign a *bigger* maxtype which leads to a
global out-of-bounds read when parsing the netlink attributes. See bug
trace below:
==================================================================
BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:386 [inline]
BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600
Read of size 1 at addr ffffffff92c438d0 by task syz-executor.6/84207
CPU: 0 PID: 84207 Comm: syz-executor.6 Tainted: G N 6.1.0 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x8b/0xb3 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x172/0x475 mm/kasan/report.c:395
kasan_report+0xbb/0x1c0 mm/kasan/report.c:495
validate_nla lib/nlattr.c:386 [inline]
__nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600
__nla_parse+0x3e/0x50 lib/nlattr.c:697
nla_parse_nested_deprecated include/net/netlink.h:1248 [inline]
__rtnl_newlink+0x50a/0x1880 net/core/rtnetlink.c:3485
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3594
rtnetlink_rcv_msg+0x43c/0xd70 net/core/rtnetlink.c:6091
netlink_rcv_skb+0x14f/0x410 net/netlink/af_netlink.c:2540
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x930/0xe50 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0x154/0x190 net/socket.c:734
____sys_sendmsg+0x6df/0x840 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fdcf2072359
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdcf13e3168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fdcf219ff80 RCX: 00007fdcf2072359
RDX: 0000000000000000 RSI: 0000000020000200 RDI: 0000000000000003
RBP: 00007fdcf20bd493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffbb8d7bdf R14: 00007fdcf13e3300 R15: 0000000000022000
</TASK>
The buggy address belongs to the variable:
rmnet_policy+0x30/0xe0
The buggy address belongs to the physical page:
page:0000000065bdeb3c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x155243
flags: 0x200000000001000(reserved|node=0|zone=2)
raw: 0200000000001000 ffffea00055490c8 ffffea00055490c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffffff92c43780: f9 f9 f9 f9 00 00 00 02 f9 f9 f9 f9 00 00 00 07
ffffffff92c43800: f9 f9 f9 f9 00 00 00 05 f9 f9 f9 f9 06 f9 f9 f9
>ffffffff92c43880: f9 f9 f9 f9 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9
^
ffffffff92c43900: 00 00 00 00 00 00 00 00 07 f9 f9 f9 f9 f9 f9 f9
ffffffff92c43980: 00 00 00 07 f9 f9 f9 f9 00 00 00 05 f9 f9 f9 f9
According to the comment of `nla_parse_nested_deprecated`, the maxtype
should be len(destination array) - 1. Hence use `IFLA_RMNET_MAX` here.
Fixes: 14452ca3b5ce ("net: qualcomm: rmnet: Export mux_id and flags to netlink")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240110061400.3356108-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for ETHTOOL_COALESCE_TX_AGGR for configuring the tx
aggregation settings.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add tx packets aggregation.
Bidirectional TCP throughput tests through iperf with low-cat
Thread-x based modems revelead performance issues both in tx
and rx.
The Windows driver does not show this issue: inspecting USB
packets revealed that the only notable change is the driver
enabling tx packets aggregation.
Tx packets aggregation is by default disabled and can be enabled
by increasing the value of ETHTOOL_A_COALESCE_TX_MAX_AGGR_FRAMES.
The maximum aggregated size is by default set to a reasonably low
value in order to support the majority of modems.
This implementation is based on patches available in Code Aurora
repositories (msm kernel) whose main authors are
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members, instead. So, replace zero-length arrays
declarations in anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.
This helper allows for flexible-array members in unions.
Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/221
Link: https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before adding yet another possibly contended atomic_long_t,
it is time to add per-cpu storage for existing ones:
dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler
Because many devices do not have to increment such counters,
allocate the per-cpu storage on demand, so that dev_get_stats()
does not have to spend considerable time folding zero counters.
Note that some drivers have abused these counters which
were supposed to be only used by core networking stack.
v4: should use per_cpu_ptr() in dev_get_stats() (Jakub)
v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo)
v2: add a missing include (reported by kernel test robot <lkp@intel.com>)
Change in netdev_core_stats_alloc() (Jakub)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use skb_put_zero() instead of hand-writing it. This saves a few lines of
code and is more readable.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Number of drivers use eth_random_addr(netdev->dev_addr)
thus writing to netdev->dev_addr directly, and not setting
the address type. Switch them to eth_hw_addr_random().
@@
expression netdev;
@@
- eth_random_addr(netdev->dev_addr);
+ eth_hw_addr_random(netdev);
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We recently changed these two pointers from void pointers to struct
pointers and it breaks the pointer math so now the "txphdr" points
beyond the end of the buffer.
Fixes: 56a967c4f7e5 ("net: qualcomm: rmnet: Remove some unneeded casts")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the explicit casts in the checksum complement functions
and pass the actual protocol specific headers instead.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The idiomatic way to handle the changelink flags/mask pair seems to be
allow partial updates of the driver's link flags. In contrast the rmnet
driver masks the incoming flags and then use that as the new flags.
Change the rmnet driver to follow the common scheme, before the
introduction of IFLA_RMNET_FLAGS handling in iproute2 et al.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A recent change tidied up some conditional code, avoiding the use of
some #ifdefs. Unfortunately, if CONFIG_IPV6 was not enabled, it
meant that two functions were referenced but never defined.
The easiest fix is to just define stubs for these functions if
CONFIG_IPV6 is not defined. This will soon be simplified further
by some other development in the works...
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 75db5b07f8c39 ("net: qualcomm: rmnet: eliminate some ifdefs")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The purpose of the loop using u64_stats_fetch_*_irq() is to ensure
statistics on a given CPU are collected atomically. If one of the
statistics values gets updated within the begin/retry window, the
loop will run again.
Currently the statistics totals are updated inside that window.
This means that if the loop ever retries, the statistics for the
CPU will be counted more than once.
Fix this by taking a snapshot of a CPU's statistics inside the
protected window, and then updating the counters with the snapshot
values after exiting the loop.
(Also add a newline at the end of this file...)
Fixes: 192c4b5d48f2a ("net: qualcomm: rmnet: Add support for 64 bit stats")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We don't support any extension headers for IPv6 packets. Extension
headers therefore contribute 0 bytes to the payload length. As a
result we can just use the IPv6 payload length as the length used to
compute the pseudo header checksum for both UDP and TCP messages.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We compare a payload checksum with a pseudo checksum value for
equality in rmnet_map_ipv4_dl_csum_trailer(). Both of those values
are computed with a unary NOT (~) operation. The result of the
comparison is the same if we omit that NOT for both values.
Remove these operations in rmnet_map_ipv6_dl_csum_trailer() also.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The csum_value field in the rmnet_map_dl_csum_trailer structure is a
"real" Internet checksum. It is a 16 bit value, in big endian format,
which represents an inverted ones' complement sum over pairs of bytes.
Make that clear by changing its type to __sum16.
This makes a typecast in rmnet_map_ipv4_dl_csum_trailer() and
another in rmnet_map_ipv6_dl_csum_trailer() unnecessary.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous patch makes rmnet_map_ipv4_dl_csum_trailer() return
early with an error if it is determined that the computed checksum
for the IP payload does not match what was expected.
If the computed checksum *does* match the expected value, the IP
payload (i.e., the transport message), can be considered good.
There is no need to do any further processing of the message.
This means a big block of code is unnecessary for validating the
transport checksum value, and can be removed.
Make comparable changes in rmnet_map_ipv6_dl_csum_trailer().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rmnet_map_ipv4_dl_csum_trailer(), if the sum of the trailer
checksum and the pseudo checksum is non-zero, checksum validation
has failed. We can return an error as soon as we know that.
We can do the same thing in rmnet_map_ipv6_dl_csum_trailer().
Add some comments that explain where we're headed.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch simply demonstrates that a checksum value computed when
verifying an offloaded transport checksum value for both IPv4 and
IPv6 is (normally) 0. It can be squashed into the next patch.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the ones' complement arithmetic, the sum of two negated values
is equal to the negation of the sum of the two original values [1].
Rearrange the calculation ip6_payload_sum using this property.
[1] https://tools.ietf.org/html/rfc1071
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rmnet_map_ipv4_dl_csum_trailer(), remove the "csum_temp" and
"addend" local variables, and simplify a few lines of code.
Remove the "csum_temp", "csum_value", "ip6_hdr_csum", and "addend"
local variables in rmnet_map_ipv6_dl_csum_trailer(), and simplify a
few lines of code there as well.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the previous patch IPv4 download checksum offload code was
updated to avoid unnecessary byte swapping, based on properties of
the Internet checksum algorithm. This patch makes comparable
changes to the IPv6 download checksum offload handling.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Internet checksums are used for IPv4 header checksum, as well as TCP
segment and UDP datagram checksums. Such a checksum represents the
negated sum of adjacent pairs of bytes, using ones' complement
arithmetic.
One property of the Internet checkum is byte order independence [1].
Specifically, the sum of byte-swapped pairs is equal to the result
of byte swapping the sum of those same pairs when not byte-swapped.
So for example if a, b, c, d, y, and z are hexadecimal digits, and
PLUS represents ones' complement addition:
If: ab PLUS cd = yz
Then: ba PLUS dc = zy
For this reason, there is no need to swap the order of bytes in the
checksum value held in a message header, nor the one in the QMAPv4
trailer, in order to operate on them.
In other words, we can determine whether the hardware-computed
checksum matches the one in the message header without any byte
swaps.
(This patch leaves in place all existing type casts.)
[1] https://tools.ietf.org/html/rfc1071
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rmnet_map_ipv6_dl_csum_trailer() there is an especially involved
line of code that determines the ones' complement sum of the IPv6
packet header (in host byte order). Simplify that by storing the
result of computing just the header checksum in a local variable,
then using that in the original assignment.
Use the size of the IPv6 header structure as the number of bytes to
checksum, rather than computing the offset to the transport header.
And use ip_fast_csum() rather than ipa_compute_csum(), knowing that
the size of an IPv6 header (40 bytes) is a multiple of 4 bytes
greater than 16.
Add some comments to match rmnet_map_ipv4_dl_csum_trailer().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rmnet_map_ipv4_dl_csum_trailer(), an illegal checksum subtraction
is done, subtracting hdr_csum (in host byte order) from csum_value (in
network byte order). Despite being illegal, it generally works,
because it turns out the value subtracted is (or should be) always 0,
which has the same representation in either byte order.
Doing illegal operations is not good form though, so fix this by
verifying the IP header checksum early in that function. If its
checksum is non-zero, the packet will be bad, so just return an
error. This will cause the packet to passed to the IP layer where
it can be dropped.
Thereafter, there is no need subtract the IP header checksum from
the checksum value in the trailer because we know it is zero.
Add a comment explaining this.
This type of packet error is different from other types, so add a
new statistics counter to track this condition.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The checksum fields of the TCP and UDP header structures already
have type __sum16. We don't support any other protocol headers, so
we can simplify rmnet_map_get_csum_field(), getting rid of the local
variable entirely and just returning the appropriate address.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The value passed as an argument to rmnet_map_ipv4_ul_csum_header()
is always an IPv4 header. Rather than using a local variable, just
have the type of the argument reflect the proper type.
In rmnet_map_ipv6_ul_csum_header() things are defined a little
differently, but make the same basic change there.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If IPV6 is not enabled in the kernel configuration, the RMNet
checksum code indicates a buffer containing an IPv6 packet is not
supported. The same thing happens if a buffer contains something
other than an IPv4 or IPv6 packet.
We can rearrange things a bit in two functions so that some #ifdef
calls can simply be eliminated.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rmnet_map_ipv4_dl_csum_trailer() use ip_is_fragment() to
determine whether a socket buffer contains a packet fragment.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e1d9a90a9bfd ("net: ethernet: rmnet: Support for ingress MAPv5
checksum offload") broke ingress handling for devices where
RMNET_FLAGS_INGRESS_MAP_CKSUMV5 or RMNET_FLAGS_INGRESS_MAP_CKSUMV4 are
not set. Unless either of these flags are set, the MAP header is not
removed. This commit restores the original logic by ensuring that the
MAP header is removed for all MAP packets.
Fixes: e1d9a90a9bfd ("net: ethernet: rmnet: Support for ingress MAPv5 checksum offload")
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clang warns that proto in rmnet_map_v5_checksum_uplink_packet() might be
used uninitialized:
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:283:14: warning:
variable 'proto' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
} else if (skb->protocol == htons(ETH_P_IPV6)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:295:36: note:
uninitialized use occurs here
check = rmnet_map_get_csum_field(proto, trans);
^~~~~
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:283:10: note:
remove the 'if' if its condition is always true
} else if (skb->protocol == htons(ETH_P_IPV6)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:270:11: note:
initialize the variable 'proto' to silence this warning
u8 proto;
^
= '\0'
1 warning generated.
This is technically a false positive because there is an if statement
above this one that checks skb->protocol for not being either
ETH_P_IP{,V6}. However, it is more obvious to sink that into the if
statement as an else branch, which makes the code clearer and fixes the
warning.
At the same time, move the "IS_ENABLED(CONFIG_IPV6)" into the else if
condition so that the else branch of the preprocessor conditional can
be shared, since there is no build failure with CONFIG_IPV6 disabled.
Fixes: b6e5d27e32ef ("net: ethernet: rmnet: Add support for MAPv5 egress packets")
Link: https://github.com/ClangBuiltLinux/linux/issues/1390
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding support for MAPv5 egress packets.
This involves adding the MAPv5 header and setting the csum_valid_required
in the checksum header to request HW compute the checksum.
Corresponding stats are incremented based on whether the checksum is
computed in software or HW.
New stat has been added which represents the count of packets whose
checksum is calculated by the HW.
Signed-off-by: Sharath Chandra Vurukala <sharathv@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adding support for processing of MAPv5 downlink packets.
It involves parsing the Mapv5 packet and checking the csum header
to know whether the hardware has validated the checksum and is
valid or not.
Based on the checksum valid bit the corresponding stats are
incremented and skb->ip_summed is marked either CHECKSUM_UNNECESSARY
or left as CHEKSUM_NONE to let network stack revalidate the checksum
and update the respective snmp stats.
Current MAPV1 header has been modified, the reserved field in the
Mapv1 header is now used for next header indication.
Signed-off-by: Sharath Chandra Vurukala <sharathv@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace the use of C bit-fields in the rmnet_map_ul_csum_header
structure with a single two-byte (big endian) structure member,
and use masks to encode or get values within it. The content of
these fields can be accessed using simple bitwise AND and OR
operations on the (host byte order) value of the new structure
member.
Previously rmnet_map_ipv4_ul_csum_header() would update C bit-field
values in host byte order, then forcibly fix their byte order using
a combination of byte swap operations and types.
Instead, just compute the value that needs to go into the new
structure member and save it with a simple byte-order conversion.
Make similar simplifications in rmnet_map_ipv6_ul_csum_header().
Finally, in rmnet_map_checksum_uplink_packet() a set of assignments
zeroes every field in the upload checksum header. Replace that with
a single memset() operation.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace the use of C bit-fields in the rmnet_map_dl_csum_trailer
structure with a single one-byte field, using constant field masks
to encode or get at embedded values.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The actual layout of bits defined in C bit-fields (e.g. int foo : 3)
is implementation-defined. Structures defined in <linux/if_rmnet.h>
address this by specifying all bit-fields twice, to cover two
possible layouts.
I think this pattern is repetitive and noisy, and I find the whole
notion of compiler "bitfield endianness" to be non-intuitive.
Stop using C bit-fields for the command/data flag and the pad length
fields in the rmnet_map structure, and define a single-byte flags
field instead. Define a mask for the single-bit "command" flag,
and another mask for the encoded pad length. The content of both
fields can be accessed using a simple bitwise AND operation.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following macros, defined in "rmnet_map.h", assume a socket
buffer is provided as an argument without any real indication this
is the case.
RMNET_MAP_GET_MUX_ID()
RMNET_MAP_GET_CD_BIT()
RMNET_MAP_GET_PAD()
RMNET_MAP_GET_CMD_START()
RMNET_MAP_GET_LENGTH()
What they hide is pretty trivial accessing of fields in a structure,
and it's much clearer to see this if we do these accesses directly.
So rather than using these accessor macros, assign a local
variable of the map header pointer type to the socket buffer data
pointer, and derereference that pointer variable.
In "rmnet_map_data.c", use sizeof(object) rather than sizeof(type)
in one spot. Also, there's no need to byte swap 0; it's all zeros
irrespective of endianness.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rmnet_map_ipv4_ul_csum_header() and rmnet_map_ipv6_ul_csum_header()
the offset within a packet at which checksumming should commence is
calculated. This calculation involves byte swapping and a forced type
conversion that makes it hard to understand.
Simplify this by computing the offset in host byte order, then
converting the result when assigning it into the header field.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no guarantee that rmnet rx_handler is only fed with linear
skbs, but current rmnet implementation does not check that, leading
to crash in case of non linear skbs processed as linear ones.
Fix that by ensuring skb linearization before processing.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Link: https://lore.kernel.org/r/1612428002-12333-2-git-send-email-loic.poulain@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Packets sent by rmnet to the real device have variable MAP header
lengths based on the data format configured. This patch adds checks
to ensure that the real device MTU is sufficient to transmit the MAP
packet comprising of the MAP header and the IP packet. This check
is enforced when rmnet devices are created and updated and during
MTU updates of both the rmnet and real device.
Additionally, rmnet devices now have a default MTU configured which
accounts for the real device MTU and the headroom based on the data
format.
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Tested-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
linux/netdevice.h is included in very many places, touching any
of its dependecies causes large incremental builds.
Drop the linux/ethtool.h include, linux/netdevice.h just needs
a forward declaration of struct ethtool_ops.
Fix all the places which made use of this implicit include.
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|