diff options
author | Paul Chaignon <paul.chaignon@gmail.com> | 2025-05-05 21:58:04 +0200 |
---|---|---|
committer | Jakub Kicinski <kuba@kernel.org> | 2025-05-07 18:16:33 -0700 |
commit | c4327229948879814229b46aa26a750718888503 (patch) | |
tree | 2377823f49796be448f7378d11c6c8bf4fef4719 | |
parent | 4a7843cc8a41b9612becccc07715ed017770eb89 (diff) |
bpf: Scrub packet on bpf_redirect_peer
When bpf_redirect_peer is used to redirect packets to a device in
another network namespace, the skb isn't scrubbed. That can lead skb
information from one namespace to be "misused" in another namespace.
As one example, this is causing Cilium to drop traffic when using
bpf_redirect_peer to redirect packets that just went through IPsec
decryption to a container namespace. The following pwru trace shows (1)
the packet path from the host's XFRM layer to the container's XFRM
layer where it's dropped and (2) the number of active skb extensions at
each function.
NETNS MARK IFACE TUPLE FUNC
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm_rcv_cb
.active_extensions = (__u8)2,
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm4_rcv_cb
.active_extensions = (__u8)2,
4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 gro_cells_receive
.active_extensions = (__u8)2,
[...]
4026533547 0 eth0 10.244.3.124:35473->10.244.2.158:53 skb_do_redirect
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv_core
.active_extensions = (__u8)2,
[...]
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 udp_queue_rcv_one_skb
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_policy_check
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_decode_session
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 security_xfrm_decode_session
.active_extensions = (__u8)2,
4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
.active_extensions = (__u8)2,
In this case, there are no XFRM policies in the container's network
namespace so the drop is unexpected. When we decrypt the IPsec packet,
the XFRM state used for decryption is set in the skb extensions. This
information is preserved across the netns switch. When we reach the
XFRM policy check in the container's netns, __xfrm_policy_check drops
the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
policy can't be found that matches the (host-side) XFRM state used for
decryption.
This patch fixes this by scrubbing the packet when using
bpf_redirect_peer, as is done on typical netns switches via veth
devices except skb->mark and skb->tstamp are not zeroed.
Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/1728ead5e0fe45e7a6542c36bd4e3ca07a73b7d6.1746460653.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-rw-r--r-- | net/core/filter.c | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/net/core/filter.c b/net/core/filter.c index 79cab4d78dc3..577a4504e26f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2509,6 +2509,7 @@ int skb_do_redirect(struct sk_buff *skb) goto out_drop; skb->dev = dev; dev_sw_netstats_rx_add(dev, skb->len); + skb_scrub_packet(skb, false); return -EAGAIN; } return flags & BPF_F_NEIGH ? |