Age | Commit message (Collapse) | Author |
|
Dumping all conntrack entries via proc interface can take hours due to
linear search to skip entries dumped so far in each cycle.
Apply same strategy used to speed up ipvs proc reading done in
commit 178883fd039d ("ipvs: speed up reads from ip_vs_conn proc file")
to nf_conntrack.
Note that the ctnetlink interface doesn't suffer from this problem, but
many scripts depend on the nf_conntrack proc interface.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The xt_quota compares skb length with remaining quota, but the nft_quota
compares it with consumed bytes.
The xt_quota can match consumed bytes up to quota at maximum. But the
nft_quota break match when consumed bytes equal to quota.
i.e., nft_quota match consumed bytes in [0, quota - 1], not [0, quota].
Fixes: 795595f68d6c ("netfilter: nft_quota: dump consumed quota")
Signed-off-by: Zhongqiu Duan <dzq.aishenghu0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add a new test case to check:
- conntrack_max limit is effective
- conntrack_max limit cannot be exceeded from within a netns
- resizing the hash table while packets are inflight works
- removal of all conntrack rules disables conntrack in netns
- conntrack tool dump (conntrack -L) returns expected number
of (unique) entries
- procfs interface - if available - has same number of entries
as conntrack -L dump
Expected output with selftest framework:
selftests: net/netfilter: conntrack_resize.sh
PASS: got 1 connections: netns conntrack_max is pernet bound
PASS: got 100 connections: netns conntrack_max is init_net bound
PASS: dump in netns had same entry count (-C 1778, -L 1778, -p 1778, /proc 0)
PASS: dump in netns had same entry count (-C 2000, -L 2000, -p 2000, /proc 0)
PASS: test parallel conntrack dumps
PASS: resize+flood
PASS: got 0 connections: conntrack disabled
PASS: got 1 connections: conntrack enabled
ok 1 selftests: net/netfilter: conntrack_resize.sh
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
dropping it
The config NF_CONNTRACK_BRIDGE will change the bridge forwarding for
fragmented packets.
The original bridge does not know that it is a fragmented packet and
forwards it directly, after NF_CONNTRACK_BRIDGE is enabled, function
nf_br_ip_fragment and br_ip6_fragment will check the headroom.
In original br_forward, insufficient headroom of skb may indeed exist,
but there's still a way to save the skb in the device driver after
dev_queue_xmit.So droping the skb will change the original bridge
forwarding in some cases.
Fixes: 3c171f496ef5 ("netfilter: bridge: add connection tracking system")
Signed-off-by: Huajian Yang <huajianyang@asrmicro.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit 32607a332cfe ("ipv4: prefer multipath nexthop that matches source
address") changed IPv4 nexthop selection to prefer a nexthop whose
nexthop device is assigned the specified source address for locally
generated traffic.
While the selection honors the "fib_multipath_use_neigh" sysctl and will
not choose a nexthop with an invalid neighbour, it does not honor the
"ignore_routes_with_linkdown" sysctl and can choose a nexthop without a
carrier:
$ sysctl net.ipv4.conf.all.ignore_routes_with_linkdown
net.ipv4.conf.all.ignore_routes_with_linkdown = 1
$ ip route show 198.51.100.0/24
198.51.100.0/24
nexthop via 192.0.2.2 dev dummy1 weight 1
nexthop via 192.0.2.18 dev dummy2 weight 1 dead linkdown
$ ip route get 198.51.100.1 from 192.0.2.17
198.51.100.1 from 192.0.2.17 via 192.0.2.18 dev dummy2 uid 0
Solve this by skipping over nexthops whose assigned hash upper bound is
minus one, which is the value assigned to nexthops that do not have a
carrier when the "ignore_routes_with_linkdown" sysctl is set.
In practice, this probably does not matter a lot as the initial route
lookup for the source address would not choose a nexthop that does not
have a carrier in the first place, but the change does make the code
clearer.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller reported out-of-bounds read in ipv6_addr_prefix(),
where the prefix length was over 128.
The cited commit accidentally removed some fib6_config
validation from the ioctl path.
Let's restore the validation.
[0]:
BUG: KASAN: slab-out-of-bounds in ip6_route_info_create (./include/net/ipv6.h:616 net/ipv6/route.c:3814)
Read of size 1 at addr ff11000138020ad4 by task repro/261
CPU: 3 UID: 0 PID: 261 Comm: repro Not tainted 6.15.0-rc3-00614-g0d15a26b247d #87 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:123)
print_report (mm/kasan/report.c:409 mm/kasan/report.c:521)
kasan_report (mm/kasan/report.c:636)
ip6_route_info_create (./include/net/ipv6.h:616 net/ipv6/route.c:3814)
ip6_route_add (net/ipv6/route.c:3902)
ipv6_route_ioctl (net/ipv6/route.c:4523)
inet6_ioctl (net/ipv6/af_inet6.c:577)
sock_do_ioctl (net/socket.c:1190)
sock_ioctl (net/socket.c:1314)
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:906 fs/ioctl.c:892 fs/ioctl.c:892)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f518fb2de5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007fff14f38d18 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f518fb2de5d
RDX: 00000000200015c0 RSI: 000000000000890b RDI: 0000000000000003
RBP: 00007fff14f38d30 R08: 0000000000000800 R09: 0000000000000800
R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff14f38e48
R13: 0000000000401136 R14: 0000000000403df0 R15: 00007f518fd3c000
</TASK>
Fixes: fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: Yi Lai <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/netdev/aBAcKDEFoN%2FLntBF@ly-workstation/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250501005335.53683-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ever since commit f5f80e32de12 ("ipv6: remove hard coded limitation on
ipv6_pinfo") that protocols stopped using the old "obj_size -
sizeof(struct ipv6_pinfo)" way of grabbing ipv6_pinfo, that severely
restricted struct layout and caused fun, hard to see issues.
However, mptcp_inet6_sk wasn't fixed (unlike tcp_inet6_sk). Do so.
The non-cloned sockets already do the right thing using
ipv6_pinfo_offset + the generic IPv6 code.
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250430154541.1038561-1-pfalcato@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
net: stmmac: replace speed_mode_2500() method
This series replaces the speed_mode_2500() method with a new method
that is more flexible, allowing the platform glue driver to populate
phylink's supported_interfaces and set the PHY-side interface mode.
The only user of this method is currently dwmac-intel, which we
update to use this new method.
====================
Link: https://patch.msgid.link/aBNe0Vt81vmqVCma@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the speed_mode_2500() platform method which is no longer used
or necessary, being superseded by the more flexible get_interfaces()
method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uASM3-0021R3-2B@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TGL platforms support either SGMII or 2500BASE-X, which is determined
by reading a SERDES register.
Thus, plat->phy_interface (and phylink's supported_interfaces) depend
on this. Use the new .get_interfaces() method to set both
plat->phy_interface and the supported_interfaces bitmap.
This removes the only user of the .speed_mode_2500() method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uASLx-0021Qs-Uz@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the initialisation of plat->phy_interface to tgl_common_data()
as all callers set this same interface mode. This moves it to a
single location to make the change to get_interfaces() more obvious.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uASLs-0021Qk-Qt@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a get_interfaces() platform method to allow platforms to indicate
to phylink which interface modes they support - which then allows
phylink to validate on initialisation that the configured PHY interface
mode is actually supported.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1uASLn-0021Qd-Mi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Avoid using a local variable for priv->plat->phy_interface as this
may be modified in the .get_interfaces() method added in a future
commit.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uASLi-0021QX-HG@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use a local variable for priv->phylink_config in stmmac_phy_setup()
which makes the code a bit easier to read, allowing some lines to be
merged.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1uASLd-0021QR-Cu@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
tools: ynl-gen: additional C types and classic netlink handling
This series is a bit of a random grab bag adding things we need
to generate code for rt-link.
First two patches are pretty random code cleanups.
Patch 3 adds default values if the spec is missing them.
Patch 4 adds support for setting Netlink request flags
(NLM_F_CREATE, NLM_F_REPLACE etc.). Classic netlink uses those
quite a bit.
Patches 5 and 6 extend the notification handling for variations
used in classic netlink. Patch 6 adds support for when notification
ID is the same as the ID of the response message to GET.
Next 4 patches add support for handling a couple of complex types.
These are supported by the schema and Python but C code gen wasn't
there.
Patch 11 is a bit of a hack, it skips code related to kernel
policy generation, since we don't need it for classic netlink.
Patch 12 adds support for having different fixed headers per op.
Something we could avoid in previous rtnetlink specs but some
specs do mix.
v2: https://lore.kernel.org/20250425024311.1589323-1-kuba@kernel.org
v1: https://lore.kernel.org/20250424021207.1167791-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250429154704.2613851-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
rtnetlink has variety of ops with different fixed headers.
Detect that op fixed header is not the same as family one,
and use sizeof() directly. For reverse parsing we need to
pass the fixed header len along the policy (in the socket
state).
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-13-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
rt-link has a vlan-protocols enum with:
name: 8021q value: 33024
name: 8021ad value: 34984
It's nice to have, since it converts the values to strings in Python.
For C, however, the codegen is trying to use enums to generate strict
policy checks. Parsing such sparse enums is not possible via policies.
Since for classic netlink we don't support kernel codegen and policy
generation - skip the auto-generation of checks from enums.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-12-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
IPv6 addresses are expressed as binary arrays since we don't have u128.
Since they are not variable length, however, they are relatively
easy to represent as an array of known size.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-11-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
C codegen supports ArrayNest AKA indexed-array carrying scalars,
but only for the netlink -> struct parsing. Support rendering
from struct to netlink.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-10-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Binary types with struct are fixed size, relatively easy to
handle for multi attr. Declare the member as a pointer.
Count the members, allocate an array, copy in the data.
Allow the netlink attr to be smaller or larger than our view
of the struct in case the build headers are newer or older
than the running kernel.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-9-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for multi attr strings (needed for link alt_names).
We record the length individual strings in a len member, to do
the same for multi-attr create a struct ynl_string in ynl.h
and use it as a layer holding both the string and its length.
Since strings may be arbitrary length dynamically allocate each
individual one.
Adjust arg_member and struct member to avoid spacing the double
pointers to get "type **name;" rather than "type * *name;"
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-8-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Allow CRUD-style notification where the notification is more
like the response to the request, which can optionally be
looped back onto the requesting socket. Since the notification
and request are different ops in the spec, for example:
-
name: delrule
doc: Remove an existing FIB rule
attribute-set: fib-rule-attrs
do:
request:
value: 33
attributes: *fib-rule-all
-
name: delrule-ntf
doc: Notify a rule deletion
value: 33
notify: getrule
We need to find the request by ID. Ideally we'd detect this model
from the spec properties, rather than assume that its what all
classic netlink families do. But maybe that'd cause this model
to spread and its easy to get wrong. For now assume CRUD == classic.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-7-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Classic Netlink has GET callbacks with no doit support, just dumps.
Support using their responses in notifications. If notification points
at a type which only has a dump - use the dump's type.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-6-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Classic netlink makes extensive use of flags. Support specifying
them the same way as attributes are specified (using a helper),
for example:
rt_link_newlink_req_set_nlflags(req, NLM_F_CREATE | NLM_F_ECHO);
Wrap the code up in a RenderInfo predicate. I think that some
genetlink families may want this, too. It should be easy to
add a spec property later.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-5-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The C codegen refers to op attribute lists all over the place,
without checking if they are present, even tho attribute list
is technically an optional property. Add them automatically
at init if missing so that we don't have to make specs longer.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-4-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Instead of walking the entries in the code gen add a method
for the struct class to return if any of the members need
an iterator.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-3-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The dict stores struct objects (of class Struct), not just
a trivial set with directions.
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250429154704.2613851-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Rewrite the textual description for the VIA Rhine platform Ethernet
controller as YAML schema, and switch the filename to follow the
compatible string. These are used in several VIA/WonderMedia SoCs
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250430-rhine-binding-v2-1-4290156c0f57@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
setup
Recent updates to the locking mechanism that protects IPv6 routing tables
[1] have affected the SRv6 networking subsystem. Such changes cause
problems with some SRv6 Endpoints behaviors, like End.B6.Encaps and also
impact SRv6 counters.
Starting from commit 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and
RTM_NEWROUTE."), the inet6_rtm_newroute() function no longer needs to
acquire the RTNL lock for creating and configuring IPv6 routes and set up
lwtunnels.
The RTNL lock can be avoided because the ip6_route_add() function
finishes setting up a new route in a section protected by RCU.
This makes sure that no dev/nexthops can disappear during the operation.
Because of this, the steps for setting up lwtunnels - i.e., calling
lwtunnel_build_state() - are now done in a RCU lock section and not
under the RTNL lock anymore.
However, creating and configuring a lwtunnel instance in an
RCU-protected section can be problematic when that tunnel needs to
allocate memory using the GFP_KERNEL flag.
For example, the following trace shows what happens when an SRv6
End.B6.Encaps behavior is instantiated after commit 169fd62799e8 ("ipv6:
Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE."):
[ 3061.219696] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[ 3061.226136] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 445, name: ip
[ 3061.232101] preempt_count: 0, expected: 0
[ 3061.235414] RCU nest depth: 1, expected: 0
[ 3061.238622] 1 lock held by ip/445:
[ 3061.241458] #0: ffffffff83ec64a0 (rcu_read_lock){....}-{1:3}, at: ip6_route_add+0x41/0x1e0
[ 3061.248520] CPU: 1 UID: 0 PID: 445 Comm: ip Not tainted 6.15.0-rc3-micro-vm-dev-00590-ge527e891492d #2058 PREEMPT(full)
[ 3061.248532] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 3061.248549] Call Trace:
[ 3061.248620] <TASK>
[ 3061.248633] dump_stack_lvl+0xa9/0xc0
[ 3061.248846] __might_resched+0x218/0x360
[ 3061.248871] __kmalloc_node_track_caller_noprof+0x332/0x4e0
[ 3061.248889] ? rcu_is_watching+0x3a/0x70
[ 3061.248902] ? parse_nla_srh+0x56/0xa0
[ 3061.248938] kmemdup_noprof+0x1c/0x40
[ 3061.248952] parse_nla_srh+0x56/0xa0
[ 3061.248969] seg6_local_build_state+0x2e0/0x580
[ 3061.248992] ? __lock_acquire+0xaff/0x1cd0
[ 3061.249013] ? do_raw_spin_lock+0x111/0x1d0
[ 3061.249027] ? __pfx_seg6_local_build_state+0x10/0x10
[ 3061.249068] ? lwtunnel_build_state+0xe1/0x3a0
[ 3061.249274] lwtunnel_build_state+0x10d/0x3a0
[ 3061.249303] fib_nh_common_init+0xce/0x1e0
[ 3061.249337] ? __pfx_fib_nh_common_init+0x10/0x10
[ 3061.249352] ? in6_dev_get+0xaf/0x1f0
[ 3061.249369] ? __rcu_read_unlock+0x64/0x2e0
[ 3061.249392] fib6_nh_init+0x290/0xc30
[ 3061.249422] ? __pfx_fib6_nh_init+0x10/0x10
[ 3061.249447] ? __lock_acquire+0xaff/0x1cd0
[ 3061.249459] ? _raw_spin_unlock_irqrestore+0x22/0x70
[ 3061.249624] ? ip6_route_info_create+0x423/0x520
[ 3061.249641] ? rcu_is_watching+0x3a/0x70
[ 3061.249683] ip6_route_info_create_nh+0x190/0x390
[ 3061.249715] ip6_route_add+0x71/0x1e0
[ 3061.249730] ? __pfx_inet6_rtm_newroute+0x10/0x10
[ 3061.249743] inet6_rtm_newroute+0x426/0xc50
[ 3061.249764] ? avc_has_perm_noaudit+0x13d/0x360
[ 3061.249853] ? __pfx_inet6_rtm_newroute+0x10/0x10
[ 3061.249905] ? __lock_acquire+0xaff/0x1cd0
[ 3061.249962] ? rtnetlink_rcv_msg+0x52f/0x890
[ 3061.249996] ? __pfx_inet6_rtm_newroute+0x10/0x10
[ 3061.250012] rtnetlink_rcv_msg+0x551/0x890
[ 3061.250040] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 3061.250065] ? __lock_acquire+0xaff/0x1cd0
[ 3061.250092] netlink_rcv_skb+0xbd/0x1f0
[ 3061.250108] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 3061.250124] ? __pfx_netlink_rcv_skb+0x10/0x10
[ 3061.250179] ? netlink_deliver_tap+0x10b/0x700
[ 3061.250210] netlink_unicast+0x2e7/0x410
[ 3061.250232] ? __pfx_netlink_unicast+0x10/0x10
[ 3061.250241] ? __lock_acquire+0xaff/0x1cd0
[ 3061.250280] netlink_sendmsg+0x366/0x670
[ 3061.250306] ? __pfx_netlink_sendmsg+0x10/0x10
[ 3061.250313] ? find_held_lock+0x2d/0xa0
[ 3061.250344] ? import_ubuf+0xbc/0xf0
[ 3061.250370] ? __pfx_netlink_sendmsg+0x10/0x10
[ 3061.250381] __sock_sendmsg+0x13e/0x150
[ 3061.250420] ____sys_sendmsg+0x33d/0x450
[ 3061.250442] ? __pfx_____sys_sendmsg+0x10/0x10
[ 3061.250453] ? __pfx_copy_msghdr_from_user+0x10/0x10
[ 3061.250489] ? __pfx_slab_free_after_rcu_debug+0x10/0x10
[ 3061.250514] ___sys_sendmsg+0xe5/0x160
[ 3061.250530] ? __pfx____sys_sendmsg+0x10/0x10
[ 3061.250568] ? __lock_acquire+0xaff/0x1cd0
[ 3061.250617] ? find_held_lock+0x2d/0xa0
[ 3061.250678] ? __virt_addr_valid+0x199/0x340
[ 3061.250704] ? preempt_count_sub+0xf/0xc0
[ 3061.250736] __sys_sendmsg+0xca/0x140
[ 3061.250750] ? __pfx___sys_sendmsg+0x10/0x10
[ 3061.250786] ? syscall_exit_to_user_mode+0xa2/0x1e0
[ 3061.250825] do_syscall_64+0x62/0x140
[ 3061.250844] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 3061.250855] RIP: 0033:0x7f0b042ef914
[ 3061.250868] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 e9 5d 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53
[ 3061.250876] RSP: 002b:00007ffc2d113ef8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 3061.250885] RAX: ffffffffffffffda RBX: 00000000680f93fa RCX: 00007f0b042ef914
[ 3061.250891] RDX: 0000000000000000 RSI: 00007ffc2d113f60 RDI: 0000000000000003
[ 3061.250897] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000008
[ 3061.250902] R10: fffffffffffff26d R11: 0000000000000246 R12: 0000000000000001
[ 3061.250907] R13: 000055a961f8a520 R14: 000055a961f63eae R15: 00007ffc2d115270
[ 3061.250952] </TASK>
To solve this issue, we replace the GFP_KERNEL flag with the GFP_ATOMIC
one in those SRv6 Endpoints that need to allocate memory during the
setup phase. This change makes sure that memory allocations are handled
in a way that works with RCU critical sections.
[1] - https://lore.kernel.org/all/20250418000443.43734-1-kuniyu@amazon.com/
Fixes: 169fd62799e8 ("ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250429132453.31605-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After 52358dd63e34 ("net: phy: remove function stubs") there's a
problem if CONFIG_MDIO_BUS is set, but CONFIG_PHYLIB is not.
mdiobus_scan() uses phylib functions like get_phy_device().
Bringing back the stub wouldn't make much sense, because it would
allow to compile mdiobus_scan(), but the function would be unusable.
The stub returned NULL, and we have the following in mdiobus_scan():
phydev = get_phy_device(bus, addr, c45);
if (IS_ERR(phydev))
return phydev;
So calling mdiobus_scan() w/o CONFIG_PHYLIB would cause a crash later in
mdiobus_scan(). In general the PHYLIB functionality isn't optional here.
Consequently, MDIO bus providers depend on PHYLIB.
Therefore factor it out and build it together with the libphy core
modules. In addition make all MDIO bus providers under /drivers/net/mdio
depend on PHYLIB. Same applies to enetc MDIO bus provider. Note that
PHYLIB selects MDIO_DEVRES, therefore we can omit this here.
Fixes: 52358dd63e34 ("net: phy: remove function stubs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504270639.mT0lh2o1-lkp@intel.com/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/c74772a9-dab6-44bf-a657-389df89d85c2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MediaTek MT7988 SoC comes with an single built-in Ethernet PHY for
2500Base-T/1000Base-T/100Base-TX/10Base-T link partners in addition to
the built-in 1GE switch. The built-in PHY only supports full duplex.
Add muxes allowing to select GMAC2->2.5G PHY path and add basic support
for XGMAC as the built-in 2.5G PHY is internally connected via XGMII.
The XGMAC features will also be used by 5GBase-R, 10GBase-R and USXGMII
SerDes modes which are going to be added once support for standalone PCS
drivers is in place.
In order to make use of the built-in 2.5G PHY the appropriate PHY driver
as well as (proprietary) PHY firmware has to be present as well.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/9072cefbff6db969720672ec98ed5cef65e8218c.1745715380.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value. Just use the SHA-256
library API instead, which is much simpler and easier to use.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20250428191606.856198-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RTL8211F supports multiple WOL modes. This patch adds support for
magic packets.
The PHY notifies the system via the INTB/PMEB pin when a WOL event
occurs.
Signed-off-by: Daniel Braunwarth <daniel.braunwarth@kuka.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250429-realtek_wol-v2-1-8f84def1ef2c@kuka.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ensure the following prerequisites before executing the test:
1. 'socat' is installed on the remote host.
2. Python version supports socket.SO_INCOMING_CPU (available since v3.11).
Skip the test if either prerequisite is not met.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250430054801.750646-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allwinner A523 SoC variant (A527/T527) contains an "EMAC0" Ethernet
MAC compatible to the A64 version.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Link: https://patch.msgid.link/20250430-01-sun55i-emac0-v3-2-6fc000bbccbd@gentoo.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf)
For igb:
Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and
adds persistent NAPI config. Lastly, he removes undesired IRQs that
occur while busy polling.
For igc:
Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the
current implementation with TAPRIO.
For ixgbe:
Jedrzej adds separate ethtool ops for E610 devices to account for device
differences.
Slawomir adds devlink region support for E610 devices.
For idpf:
Mateusz assigns and utilizes the ptype field out of libeth_rqe_info.
Michal removes unreachable code.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: remove unreachable code from setting mailbox
idpf: assign extracted ptype to struct libeth_rqe_info field
ixgbe: devlink: add devlink region support for E610
ixgbe: add E610 .set_phys_id() callback implementation
ixgbe: apply different rules for setting FC on E610
ixgbe: add support for ACPI WOL for E610
ixgbe: create E610 specific ethtool_ops structure
igc: Change Tx mode for MQPRIO offloading
igc: Limit netdev_tc calls to MQPRIO
igb: Get rid of spurious interrupts
igb: Add support for persistent NAPI config
igb: Link queues to NAPI instances
igb: Link IRQs to NAPI instances
====================
Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc5).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Happy May Day.
Things have calmed down on our end (knock on wood), no outstanding
investigations. Including fixes from Bluetooth and WiFi.
Current release - fix to a fix:
- igc: fix lock order in igc_ptp_reset
Current release - new code bugs:
- Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
to Killer line of devices reported by a number of people
- Revert "wifi: iwlwifi: add support for BE213", initial FW is too
buggy
- number of fixes for mld, the new Intel WiFi subdriver
Previous releases - regressions:
- wifi: mac80211: restore monitor for outgoing frames
- drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp
- eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
delivering stale timestamps
- use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
every socket is a full socket
Previous releases - always broken:
- sched: adapt qdiscs for reentrant enqueue cases, fix list
corruptions
- xsk: fix race condition in AF_XDP generic RX path, shared UMEM
can't be protected by a per-socket lock
- eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
- btusb: avoid NULL pointer dereference in skb_dequeue()
- dsa: felix: fix broken taprio gate states after clock jump"
* tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
net: vertexcom: mse102x: Fix RX error handling
net: vertexcom: mse102x: Add range check for CMD_RTS
net: vertexcom: mse102x: Fix LEN_MASK
net: vertexcom: mse102x: Fix possible stuck of SPI interrupt
net: hns3: defer calling ptp_clock_register()
net: hns3: fixed debugfs tm_qset size
net: hns3: fix an interrupt residual problem
net: hns3: store rx VLAN tag offload state for VF
octeon_ep: Fix host hang issue during device reboot
net: fec: ERR007885 Workaround for conventional TX
net: lan743x: Fix memleak issue when GSO enabled
ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
net: use sock_gen_put() when sk_state is TCP_TIME_WAIT
bnxt_en: fix module unload sequence
bnxt_en: Fix ethtool -d byte order for 32-bit values
bnxt_en: Fix out-of-bound memcpy() during ethtool -w
bnxt_en: Fix coredump logic to free allocated buffer
bnxt_en: delay pci_alloc_irq_vectors() in the AER path
bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
...
|
|
Stefan Wahren says:
====================
net: vertexcom: mse102x: Fix RX handling
This series is the first part of two series for the Vertexcom driver.
It contains substantial fixes for the RX handling of the Vertexcom MSE102x.
====================
Link: https://patch.msgid.link/20250430133043.7722-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In case the CMD_RTS got corrupted by interferences, the MSE102x
doesn't allow a retransmission of the command. Instead the Ethernet
frame must be shifted out of the SPI FIFO. Since the actual length is
unknown, assume the maximum possible value.
Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-5-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since there is no protection in the SPI protocol against electrical
interferences, the driver shouldn't blindly trust the length payload
of CMD_RTS. So introduce a bounds check for incoming frames.
Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-4-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The LEN_MASK for CMD_RTS doesn't cover the whole parameter mask.
The Bit 11 is reserved, so adjust LEN_MASK accordingly.
Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MSE102x doesn't provide any SPI commands for interrupt handling.
So in case the interrupt fired before the driver requests the IRQ,
the interrupt will never fire again. In order to fix this always poll
for pending packets after opening the interface.
Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jijie Shao says:
====================
There are some bugfix for the HNS3 ethernet driver
====================
Link: https://patch.msgid.link/20250430093052.2400464-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently the ptp_clock_register() is called before relative
ptp resource ready. It may cause unexpected result when upper
layer called the ptp API during the timewindow. Fix it by
moving the ptp_clock_register() to the function end.
Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250430093052.2400464-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The size of the tm_qset file of debugfs is limited to 64 KB,
which is too small in the scenario with 1280 qsets.
The size needs to be expanded to 1 MB.
Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250430093052.2400464-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a VF is passthrough to a VM, and the VM is killed, the reported
interrupt may not been handled, it will remain, and won't be clear by
the nic engine even with a flr or tqp reset. When the VM restart, the
interrupt of the first vector may be dropped by the second enable_irq
in vfio, see the issue below:
https://gitlab.com/qemu-project/qemu/-/issues/2884#note_2423361621
We notice that the vfio has always behaved this way, and the interrupt
is a residue of the nic engine, so we fix the problem by moving the
vector enable process out of the enable_irq loop.
Fixes: 08a100689d4b ("net: hns3: re-organize vector handle")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250430093052.2400464-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The VF driver missed to store the rx VLAN tag strip state when
user change the rx VLAN tag offload state. And it will default
to enable the rx vlan tag strip when re-init VF device after
reset. So if user disable rx VLAN tag offload, and trig reset,
then the HW will still strip the VLAN tag from packet nad fill
into RX BD, but the VF driver will ignore it for rx VLAN tag
offload disabled. It may cause the rx VLAN tag dropped.
Fixes: b2641e2ad456 ("net: hns3: Add support of hardware rx-vlan-offload to HNS3 VF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250430093052.2400464-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2025-04-29 (idpf, igc)
For idpf:
Michal fixes error path handling to remove memory leak.
Larysa prevents reset from being called during shutdown.
For igc:
Jake adjusts locking order to resolve sleeping in atomic context.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igc: fix lock order in igc_ptp_reset
idpf: protect shutdown from reset
idpf: fix potential memory leak on kcalloc() failure
====================
Link: https://patch.msgid.link/20250429221034.3909139-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the host loses heartbeat messages from the device,
the driver calls the device-specific ndo_stop function,
which frees the resources. If the driver is unloaded in
this scenario, it calls ndo_stop again, attempting to free
resources that have already been freed, leading to a host
hang issue. To resolve this, dev_close should be called
instead of the device-specific stop function.dev_close
internally calls ndo_stop to stop the network interface
and performs additional cleanup tasks. During the driver
unload process, if the device is already down, ndo_stop
is not called.
Fixes: 5cb96c29aa0e ("octeon_ep: add heartbeat monitor")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250429114624.19104-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|