path: root/net
diff options
authorDavid Ahern <>2021-06-12 18:24:59 -0600
committerDavid S. Miller <>2021-06-14 12:30:53 -0700
commitb87b04f5019e821c8c6c7761f258402e43500a1f (patch)
tree2ed3760e3200588d63e10f077eb4ea6db37a4bf5 /net
parent58af3d3d54e87bfc1f936e16c04ade3369d34011 (diff)
ipv4: Fix device used for dst_alloc with local routes
Oliver reported a use case where deleting a VRF device can hang waiting for the refcnt to drop to 0. The root cause is that the dst is allocated against the VRF device but cached on the loopback device. The use case (added to the selftests) has an implicit VRF crossing due to the ordering of the FIB rules (lookup local is before the l3mdev rule, but the problem occurs even if the FIB rules are re-ordered with local after l3mdev because the VRF table does not have a default route to terminate the lookup). The end result is is that the FIB lookup returns the loopback device as the nexthop, but the ingress device is in a VRF. The mismatch causes the dst alloc against the VRF device but then cached on the loopback. The fix is to bring the trick used for IPv6 (see ip6_rt_get_dev_rcu): pick the dst alloc device based the fib lookup result but with checks that the result has a nexthop device (e.g., not an unreachable or prohibit entry). Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant") Reported-by: Oliver Herms <> Signed-off-by: David Ahern <> Signed-off-by: David S. Miller <>
Diffstat (limited to 'net')
1 files changed, 14 insertions, 1 deletions
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f6787c55f6ab..6a36ac98476f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2056,6 +2056,19 @@ martian_source:
return err;
+/* get device for dst_alloc with local routes */
+static struct net_device *ip_rt_get_dev(struct net *net,
+ const struct fib_result *res)
+ struct fib_nh_common *nhc = res->fi ? res->nhc : NULL;
+ struct net_device *dev = NULL;
+ if (nhc)
+ dev = l3mdev_master_dev_rcu(nhc->nhc_dev);
+ return dev ? : net->loopback_dev;
* NOTE. We drop all the packets that has local source
* addresses, because every properly looped back packet
@@ -2212,7 +2225,7 @@ local_input:
- rth = rt_dst_alloc(l3mdev_master_dev_rcu(dev) ? : net->loopback_dev,
+ rth = rt_dst_alloc(ip_rt_get_dev(net, res),
flags | RTCF_LOCAL, res->type,
IN_DEV_ORCONF(in_dev, NOPOLICY), false);
if (!rth)