summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)Author
2024-12-05xfrm: iptfs: share page fragments of inner packetsChristian Hopps
When possible rather than appending secondary (aggregated) inner packets to the fragment list, share their page fragments with the outer IPTFS packet. This allows for more efficient packet transmission. Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: iptfs: add user packet (tunnel ingress) handlingChristian Hopps
Add tunnel packet output functionality. This is code handles the ingress to the tunnel. Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: iptfs: add new iptfs xfrm mode implChristian Hopps
Add a new xfrm mode implementing AggFrag/IP-TFS from RFC9347. This utilizes the new xfrm_mode_cbs to implement demand-driven IP-TFS functionality. This functionality can be used to increase bandwidth utilization through small packet aggregation, as well as help solve PMTU issues through it's efficient use of fragmentation. Link: https://www.rfc-editor.org/rfc/rfc9347.txt Multiple commits follow to build the functionality into xfrm_iptfs.c Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: add generic iptfs defines and functionalityChristian Hopps
Define `XFRM_MODE_IPTFS` and `IPSEC_MODE_IPTFS` constants, and add these to switch case and conditionals adjacent with the existing TUNNEL modes. Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: add mode_cbs module functionalityChristian Hopps
Add a set of callbacks xfrm_mode_cbs to xfrm_state. These callbacks enable the addition of new xfrm modes, such as IP-TFS to be defined in modules. Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: netlink: add config (netlink) optionsChristian Hopps
Add netlink options for configuring IP-TFS SAs. Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: config: add CONFIG_XFRM_IPTFSChristian Hopps
Add new Kconfig option to enable IP-TFS (RFC9347) functionality. Signed-off-by: Christian Hopps <chopps@labn.net> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-12-05xfrm: state: fix out-of-bounds read during lookupFlorian Westphal
lookup and resize can run in parallel. The xfrm_state_hash_generation seqlock ensures a retry, but the hash functions can observe a hmask value that is too large for the new hlist array. rehash does: rcu_assign_pointer(net->xfrm.state_bydst, ndst) [..] net->xfrm.state_hmask = nhashmask; While state lookup does: h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { This is only safe in case the update to state_bydst is larger than net->xfrm.xfrm_state_hmask (or if the lookup function gets serialized via state spinlock again). Fix this by prefetching state_hmask and the associated pointers. The xfrm_state_hash_generation seqlock retry will ensure that the pointer and the hmask will be consistent. The existing helpers, like xfrm_dst_hash(), are now unsafe for RCU side, add lockdep assertions to document that they are only safe for insert side. xfrm_state_lookup_byaddr() uses the spinlock rather than RCU. AFAICS this is an oversight from back when state lookup was converted to RCU, this lock should be replaced with RCU in a future patch. Reported-by: syzbot+5f9f31cb7d985f584d8e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/CACT4Y+azwfrE3uz6A5ZErov5YN2LYBN5KrsymBerT36VU8qzBA@mail.gmail.com/ Diagnosed-by: Dmitry Vyukov <dvyukov@google.com> Fixes: c2f672fc9464 ("xfrm: state lookup can be lockless") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-25xfrm: replay: Fix the update of replay_esn->oseq_hi for GSOJianbo Liu
When skb needs GSO and wrap around happens, if xo->seq.low (seqno of the first skb segment) is before the last seq number but oseq (seqno of the last segment) is after it, xo->seq.low is still bigger than replay_esn->oseq while oseq is smaller than it, so the update of replay_esn->oseq_hi is missed for this case wrap around because of the change in the cited commit. For example, if sending a packet with gso_segs=3 while old replay_esn->oseq=0xfffffffe, we calculate: xo->seq.low = 0xfffffffe + 1 = 0x0xffffffff oseq = 0xfffffffe + 3 = 0x1 (oseq < replay_esn->oseq) is true, but (xo->seq.low < replay_esn->oseq) is false, so replay_esn->oseq_hi is not incremented. To fix this issue, change the outer checking back for the update of replay_esn->oseq_hi. And add new checking inside for the update of packet's oseq_hi. Fixes: 4b549ccce941 ("xfrm: replay: Fix ESN wrap around for GSO") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-18Merge tag 'ipsec-next-2024-11-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-11-15 1) Add support for RFC 9611 per cpu xfrm state handling. 2) Add inbound and outbound xfrm state caches to speed up state lookups. 3) Convert xfrm to dscp_t. From Guillaume Nault. 4) Fix error handling in build_aevent. From Everest K.C. 5) Replace strncpy with strscpy_pad in copy_to_user_auth. From Daniel Yang. 6) Fix an uninitialized symbol during acquire state insertion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-15xfrm: Fix acquire state insertion.Steffen Klassert
A recent commit jumped over the dst hash computation and left the symbol uninitialized. Fix this by explicitly computing the dst hash before it is used. Fixes: 0045e3d80613 ("xfrm: Cache used outbound xfrm states at the policy.") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-14xfrm: replace deprecated strncpy with strscpy_padDaniel Yang
The function strncpy is deprecated since it does not guarantee the destination buffer is NULL terminated. Recommended replacement is strscpy. The padded version was used to remain consistent with the other strscpy_pad usage in the modified function. Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-14xfrm: Add error handling when nla_put_u32() returns an errorEverest K.C
Error handling is missing when call to nla_put_u32() fails. Handle the error when the call to nla_put_u32() returns an error. The error was reported by Coverity Scan. Report: CID 1601525: (#1 of 1): Unused value (UNUSED_VALUE) returned_value: Assigning value from nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num) to err here, but that stored value is overwritten before it can be used Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Everest K.C. <everestkc@everestkc.com.np> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-11net: convert to nla_get_*_default()Johannes Berg
Most of the original conversion is from the spatch below, but I edited some and left out other instances that were either buggy after conversion (where default values don't fit into the type) or just looked strange. @@ expression attr, def; expression val; identifier fn =~ "^nla_get_.*"; fresh identifier dfn = fn ## "_default"; @@ ( -if (attr) - val = fn(attr); -else - val = def; +val = dfn(attr, def); | -if (!attr) - val = def; -else - val = fn(attr); +val = dfn(attr, def); | -if (!attr) - return def; -return fn(attr); +return dfn(attr, def); | -attr ? fn(attr) : def +dfn(attr, def) | -!attr ? def : fn(attr) +dfn(attr, def) ) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06xfrm: Convert struct xfrm_dst_lookup_params -> tos to dscp_t.Guillaume Nault
Add type annotation to the "tos" field of struct xfrm_dst_lookup_params, to ensure that the ECN bits aren't mistakenly taken into account when doing route lookups. Rename that field (tos -> dscp) to make that change explicit. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-06xfrm: Convert xfrm_dst_lookup() to dscp_t.Guillaume Nault
Pass a dscp_t variable to xfrm_dst_lookup(), instead of an int, to prevent accidental setting of ECN bits in ->flowi4_tos. Only xfrm_bundle_create() actually calls xfrm_dst_lookup(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-06xfrm: Convert xfrm_bundle_create() to dscp_t.Guillaume Nault
Use a dscp_t variable to store the result of xfrm_get_dscp(). This prepares for the future conversion of xfrm_dst_lookup(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-06xfrm: Convert xfrm_get_tos() to dscp_t.Guillaume Nault
Return a dscp_t variable to prepare for the future conversion of xfrm_bundle_create() to dscp_t. While there, rename the function "xfrm_get_dscp", to align its name with the new return type. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-29xfrm: Restrict percpu SA attribute to specific netlink message typesSteffen Klassert
Reject the usage of XFRMA_SA_PCPU in xfrm netlink messages when it's not applicable. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29xfrm: Add an inbound percpu state cache.Steffen Klassert
Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we add a percpu cache to cache the used inbound xfrm states. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29xfrm: Cache used outbound xfrm states at the policy.Steffen Klassert
Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we cache the used xfrm states at the policy for outbound IPsec traffic. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29xfrm: Add support for per cpu xfrm state handling.Steffen Klassert
Currently all flows for a certain SA must be processed by the same cpu to avoid packet reordering and lock contention of the xfrm state lock. To get rid of this limitation, the IETF standardized per cpu SAs in RFC 9611. This patch implements the xfrm part of it. We add the cpu as a lookup key for xfrm states and a config option to generate acquire messages for each cpu. With that, we can have on each cpu a SA with identical traffic selector so that flows can be processed in parallel on all cpus. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-24Merge tag 'ipsec-2024-10-22' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-10-22 1) Fix routing behavior that relies on L4 information for xfrm encapsulated packets. From Eyal Birger. 2) Remove leftovers of pernet policy_inexact lists. From Florian Westphal. 3) Validate new SA's prefixlen when the selector family is not set from userspace. From Sabrina Dubroca. 4) Fix a kernel-infoleak when dumping an auth algorithm. From Petr Vaganov. Please pull or let me know if there are problems. ipsec-2024-10-22 * tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix one more kernel-infoleak in algo dumping xfrm: validate new SA's prefixlen using SA family when sel.family is unset xfrm: policy: remove last remnants of pernet inexact list xfrm: respect ip protocols rules criteria when performing dst lookups xfrm: extract dst lookup parameters into a struct ==================== Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-11xfrm: fix one more kernel-infoleak in algo dumpingPetr Vaganov
During fuzz testing, the following issue was discovered: BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x598/0x2a30 _copy_to_iter+0x598/0x2a30 __skb_datagram_iter+0x168/0x1060 skb_copy_datagram_iter+0x5b/0x220 netlink_recvmsg+0x362/0x1700 sock_recvmsg+0x2dc/0x390 __sys_recvfrom+0x381/0x6d0 __x64_sys_recvfrom+0x130/0x200 x64_sys_call+0x32c8/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Uninit was stored to memory at: copy_to_user_state_extra+0xcc1/0x1e00 dump_one_state+0x28c/0x5f0 xfrm_state_walk+0x548/0x11e0 xfrm_dump_sa+0x1e0/0x840 netlink_dump+0x943/0x1c40 __netlink_dump_start+0x746/0xdb0 xfrm_user_rcv_msg+0x429/0xc00 netlink_rcv_skb+0x613/0x780 xfrm_netlink_rcv+0x77/0xc0 netlink_unicast+0xe90/0x1280 netlink_sendmsg+0x126d/0x1490 __sock_sendmsg+0x332/0x3d0 ____sys_sendmsg+0x863/0xc30 ___sys_sendmsg+0x285/0x3e0 __x64_sys_sendmsg+0x2d6/0x560 x64_sys_call+0x1316/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Uninit was created at: __kmalloc+0x571/0xd30 attach_auth+0x106/0x3e0 xfrm_add_sa+0x2aa0/0x4230 xfrm_user_rcv_msg+0x832/0xc00 netlink_rcv_skb+0x613/0x780 xfrm_netlink_rcv+0x77/0xc0 netlink_unicast+0xe90/0x1280 netlink_sendmsg+0x126d/0x1490 __sock_sendmsg+0x332/0x3d0 ____sys_sendmsg+0x863/0xc30 ___sys_sendmsg+0x285/0x3e0 __x64_sys_sendmsg+0x2d6/0x560 x64_sys_call+0x1316/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Bytes 328-379 of 732 are uninitialized Memory access of size 732 starts at ffff88800e18e000 Data copied to user address 00007ff30f48aff0 CPU: 2 PID: 18167 Comm: syz-executor.0 Not tainted 6.8.11 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Fixes copying of xfrm algorithms where some random data of the structure fields can end up in userspace. Padding in structures may be filled with random (possibly sensitve) data and should never be given directly to user-space. A similar issue was resolved in the commit 8222d5910dae ("xfrm: Zero padding when dumping algos and encap") Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: c7a5899eb26e ("xfrm: redact SA secret with lockdown confidentiality") Cc: stable@vger.kernel.org Co-developed-by: Boris Tonofa <b.tonofa@ideco.ru> Signed-off-by: Boris Tonofa <b.tonofa@ideco.ru> Signed-off-by: Petr Vaganov <p.vaganov@ideco.ru> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-07xfrm: validate new SA's prefixlen using SA family when sel.family is unsetSabrina Dubroca
This expands the validation introduced in commit 07bf7908950a ("xfrm: Validate address prefix lengths in the xfrm selector.") syzbot created an SA with usersa.sel.family = AF_UNSPEC usersa.sel.prefixlen_s = 128 usersa.family = AF_INET Because of the AF_UNSPEC selector, verify_newsa_info doesn't put limits on prefixlen_{s,d}. But then copy_from_user_state sets x->sel.family to usersa.family (AF_INET). Do the same conversion in verify_newsa_info before validating prefixlen_{s,d}, since that's how prefixlen is going to be used later on. Reported-by: syzbot+cc39f136925517aed571@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-24xfrm: policy: remove last remnants of pernet inexact listFlorian Westphal
xfrm_net still contained the no-longer-used inexact policy list heads, remove them. Fixes: a54ad727f745 ("xfrm: policy: remove remaining use of inexact list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-23xfrm: respect ip protocols rules criteria when performing dst lookupsEyal Birger
The series in the "fixes" tag added the ability to consider L4 attributes in routing rules. The dst lookup on the outer packet of encapsulated traffic in the xfrm code was not adapted to this change, thus routing behavior that relies on L4 information is not respected. Pass the ip protocol information when performing dst lookups. Fixes: a25724b05af0 ("Merge branch 'fib_rules-support-sport-dport-and-proto-match'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-23xfrm: extract dst lookup parameters into a structEyal Birger
Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-10Merge tag 'ipsec-next-2024-09-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()Nathan Chancellor
Clang warns (or errors with CONFIG_WERROR): net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized] 1286 | if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { | ^~~ net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning 1257 | int dir; | ^ | = 0 1 error generated. A recent refactoring removed some assignments to dir because xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is used elsewhere in xfrm_hash_rebuild(), including within loops where it needs to be reloaded for each policy. Restore the assignments before the first use of dir to fix the warning and ensure dir is properly initialized throughout the function. Fixes: 08c2182cf0b4 ("xfrm: policy: use recently added helper in more places") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09xfrm: policy: fix null dereferenceFlorian Westphal
Julian Wiedmann says: > + if (!xfrm_pol_hold_rcu(ret)) Coverity spotted that ^^^ needs a s/ret/pol fix-up: > CID 1599386: Null pointer dereferences (FORWARD_NULL) > Passing null pointer "ret" to "xfrm_pol_hold_rcu", which dereferences it. Ditch the bogus 'ret' variable. Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Closes: https://lore.kernel.org/netdev/06dc2499-c095-4bd4-aee3-a1d0e3ec87c4@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09Revert "xfrm: add SA information to the offloaded packet"Steffen Klassert
This reverts commit e7cd191f83fd899c233dfbe7dc6d96ef703dcbbd. While supporting xfrm interfaces in the packet offload API is needed, this patch does not do the right thing. There are more things to do to really support xfrm interfaces, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-31xfrm: Unmask upper DSCP bits in xfrm_get_tos()Ido Schimmel
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-28xfrm: minor update to sdb and xfrm_policy commentsFlorian Westphal
The spd is no longer maintained as a linear list. We also haven't been caching bundles in the xfrm_policy struct since 2010. While at it, add kdoc style comments for the xfrm_policy structure and extend the description of the current rbtree based search to mention why it needs to search the candidate set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-28xfrm: policy: use recently added helper in more placesFlorian Westphal
No logical change intended. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-27xfrm: add SA information to the offloaded packetwangfe
In packet offload mode, append Security Association (SA) information to each packet, replicating the crypto offload implementation. The XFRM_XMIT flag is set to enable packet to be returned immediately from the validate_xmit_xfrm function, thus aligning with the existing code path for packet offload mode. This SA info helps HW offload match packets to their correct security policies. The XFRM interface ID is included, which is crucial in setups with multiple XFRM interfaces where source/destination addresses alone can't pinpoint the right policy. Signed-off-by: wangfe <wangfe@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: policy: remove remaining use of inexact listFlorian Westphal
No consumers anymore, remove it. After this, insertion of policies no longer require list walk of all inexact policies but only those that are reachable via the candidate sets. This gives almost linear insertion speeds provided the inserted policies are for non-overlapping networks. Before: Inserted 1000 policies in 70 ms Inserted 10000 policies in 1155 ms Inserted 100000 policies in 216848 ms After: Inserted 1000 policies in 56 ms Inserted 10000 policies in 478 ms Inserted 100000 policies in 4580 ms Insertion of 1m entries takes about ~40s after this change on my test vm. Cc: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: switch migrate to xfrm_policy_lookup_bytypeFlorian Westphal
XFRM_MIGRATE still uses the old lookup method: first check the bydst hash table, then search the list of all the other policies. Switch MIGRATE to use the same lookup function as the packetpath. This is done to remove the last remaining users of the pernet xfrm.policy_inexact lists with the intent of removing this list. After this patch, policies are still added to the list on insertion and they are rehashed as-needed but no single API makes use of these anymore. This change is compile tested only. Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: policy: don't iterate inexact policies twice at insert timeFlorian Westphal
Since commit 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure") policy lookup no longer walks a list but has a set of candidate lists. This set has to be searched for the best match. In case there are several matches, the priority wins. If the priority is also the same, then the historic behaviour with a single list was to return the first match (first-in-list). With introduction of serval lists, this doesn't work and a new 'pos' member was added that reflects the xfrm_policy structs position in the list. This value is not exported to userspace and it does not need to be the 'position in the list', it just needs to make sure that a->pos < b->pos means that a was added to the lists more recently than b. This re-walk is expensive when many inexact policies are in use. Speed this up: when appending the policy to the end of the walker list, then just take the ->pos value of the last entry made and add 1. Add a slowpath version to prevent overflow, if we'd assign UINT_MAX then iterate the entire list and fix the ordering. While this speeds up insertion considerably finding the insertion spot in the inexact list still requires a partial list walk. This is addressed in followup patches. Before: ./xfrm_policy_add_speed.sh Inserted 1000 policies in 72 ms Inserted 10000 policies in 1540 ms Inserted 100000 policies in 334780 ms After: Inserted 1000 policies in 68 ms Inserted 10000 policies in 1137 ms Inserted 100000 policies in 157307 ms Reported-by: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-16xfrm: Remove documentation WARN_ON to limit return values for offloaded SAPatrisious Haddad
The original idea to put WARN_ON() on return value from driver code was to make sure that packet offload doesn't have silent fallback to SW implementation, like crypto offload has. In reality, this is not needed as all *swan implementations followed this request and used explicit configuration style to make sure that "users will get what they ask". So instead of forcing drivers to make sure that even their internal flows don't return -EOPNOTSUPP, let's remove this WARN_ON. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.11 net-next PR. Conflicts: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null") 4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref") 30d7b6727724 ("net: ethtool: Add new power limit get and set features") https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-next-2024-07-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-07-13 1) Support sending NAT keepalives in ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it. From Eyal Birger. 2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths. Currently, IPsec crypto offload is enabled for GRO code path only. This patchset support UDP encapsulation for the non GRO path. From Mike Yu. * tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet xfrm: Allow UDP encapsulation in crypto offload control path xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path xfrm: support sending NAT keepalives in ESP in UDP states ==================== Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-2024-07-11' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-07-11 1) Fix esp_output_tail_tcp() on unsupported ESPINTCP. From Hagar Hemdan. 2) Fix two bugs in the recently introduced SA direction separation. From Antony Antony. 3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. 4) Fix netdev reference count imbalance in xfrm_state_find. From Jianbo Liu. 5) Call xfrm_dev_policy_delete when killingi them on offloaded policies. Jianbo Liu. * tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: call xfrm_dev_policy_delete when kill policy xfrm: fix netdev reference count imbalance xfrm: Export symbol xfrm_dev_state_delete. xfrm: Fix unregister netdevice hang on hardware offload. xfrm: Log input direction mismatch error in one place xfrm: Fix input error path memory access net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP ==================== Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packetMike Yu
If xfrm_input() is called with UDP_ENCAP_ESPINUDP, the packet is already processed in UDP layer that removes the UDP header. Therefore, there should be no much difference to treat it as an ESP packet in the XFRM stack. Test: Enabled dir=in IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Allow UDP encapsulation in crypto offload control pathMike Yu
Unblock this limitation so that SAs with encapsulation specified can be passed to HW drivers. HW drivers can still reject the SA in their implementation of xdo_dev_state_add if the encapsulation is not supported. Test: Verified on Android device Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO pathMike Yu
IPsec crypt offload supports outbound IPv6 ESP packets, but it doesn't support inbound IPv6 ESP packets. This change enables the crypto offload for inbound IPv6 ESP packets that are not handled through GRO code path. If HW drivers add the offload information to the skb, the packet will be handled in the crypto offload rx code path. Apart from the change in crypto offload rx code path, the change in xfrm_policy_check is also needed. Exampe of RX data path: +-----------+ +-------+ | HW Driver |-->| wlan0 |--------+ +-----------+ +-------+ | v +---------------+ +------+ +------>| Network Stack |-->| Apps | | +---------------+ +------+ | | | v +--------+ +------------+ | ipsec1 |<--| XFRM Stack | +--------+ +------------+ Test: Enabled both in/out IPsec crypto offload, and verified IPv6 ESP packets on Android device on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-08xfrm: call xfrm_dev_policy_delete when kill policyJianbo Liu
xfrm_policy_kill() is called at different places to delete xfrm policy. It will call xfrm_pol_put(). But xfrm_dev_policy_delete() is not called to free the policy offloaded to hardware. The three commits cited here are to handle this issue by calling xfrm_dev_policy_delete() outside xfrm_get_policy(). But they didn't cover all the cases. An example, which is not handled for now, is xfrm_policy_insert(). It is called when XFRM_MSG_UPDPOLICY request is received. Old policy is replaced by new one, but the offloaded policy is not deleted, so driver doesn't have the chance to release hardware resources. To resolve this issue for all cases, move xfrm_dev_policy_delete() into xfrm_policy_kill(), so the offloaded policy can be deleted from hardware when it is called, which avoids hardware resources leakage. Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Fixes: bf06fcf4be0f ("xfrm: add missed call to delete offloaded policies") Fixes: 982c3aca8bac ("xfrm: delete offloaded policy") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-08xfrm: fix netdev reference count imbalanceJianbo Liu
In cited commit, netdev_tracker_alloc() is called for the newly allocated xfrm state, but dev_hold() is missed, which causes netdev reference count imbalance, because netdev_put() is called when the state is freed in xfrm_dev_state_free(). Fix the issue by replacing netdev_tracker_alloc() with netdev_hold(). Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>