summaryrefslogtreecommitdiff
path: root/drivers/net/vxlan
AgeCommit message (Collapse)Author
2024-10-18net: vxlan: update the document for vxlan_snoop()Menglong Dong
The function vxlan_snoop() returns drop reasons now, so update the document of it too. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-18net: vxlan: replace VXLAN_INVALID_HDR with VNI_NOT_FOUNDMenglong Dong
Replace the drop reason "SKB_DROP_REASON_VXLAN_INVALID_HDR" with "SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND" in encap_bypass_if_local(), as the latter is more accurate. Fixes: 790961d88b0e ("net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()") Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in encap_bypass_if_local, and no new skb drop reason is added in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()Menglong Dong
Replace kfree_skb with kfree_skb_reason in vxlan_encap_bypass, and no new skb drop reason is added in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: use kfree_skb_reason() in vxlan_mdb_xmit()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in vxlan_mdb_xmit. No drop reasons are introduced in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: add drop reasons support to vxlan_xmit_one()Menglong Dong
Replace kfree_skb/dev_kfree_skb with kfree_skb_reason in vxlan_xmit_one. No drop reasons are introduced in this commit. The only concern of mine is replacing dev_kfree_skb with kfree_skb_reason. The dev_kfree_skb is equal to consume_skb, and I'm not sure if we can change it to kfree_skb here. In my option, the skb is "dropped" here, isn't it? Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: use kfree_skb_reason() in vxlan_xmit()Menglong Dong
Replace kfree_skb() with kfree_skb_reason() in vxlan_xmit(). Following new skb drop reasons are introduced for vxlan: /* no remote found for xmit */ SKB_DROP_REASON_VXLAN_NO_REMOTE /* packet without necessary metadata reached a device which is * in "external" mode */ SKB_DROP_REASON_TUNNEL_TXINFO Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: make vxlan_set_mac() return drop reasonsMenglong Dong
Change the return type of vxlan_set_mac() from bool to enum skb_drop_reason. In this commit, the drop reason "SKB_DROP_REASON_LOCAL_MAC" is introduced for the case that the source mac of the packet is a local mac. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: make vxlan_snoop() return drop reasonsMenglong Dong
Change the return type of vxlan_snoop() from bool to enum skb_drop_reason. In this commit, two drop reasons are introduced: SKB_DROP_REASON_MAC_INVALID_SOURCE SKB_DROP_REASON_VXLAN_ENTRY_EXISTS Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: make vxlan_remcsum() return drop reasonsMenglong Dong
Make vxlan_remcsum() support skb drop reasons by changing the return value type of it from bool to enum skb_drop_reason. The only drop reason in vxlan_remcsum() comes from pskb_may_pull_reason(), so we just return it. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: vxlan: add skb drop reasons to vxlan_rcv()Menglong Dong
Introduce skb drop reasons to the function vxlan_rcv(). Following new drop reasons are added: SKB_DROP_REASON_VXLAN_INVALID_HDR SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND SKB_DROP_REASON_IP_TUNNEL_ECN Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-13net: tunnel: make skb_vlan_inet_prepare() return drop reasonsMenglong Dong
Make skb_vlan_inet_prepare return the skb drop reasons, which is just what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of it. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10genetlink: extend info user-storage to match NL cb ctxPaolo Abeni
This allows a more uniform implementation of non-dump and dump operations, and will be used later in the series to avoid some per-operation allocation. Additionally rename the NL_ASSERT_DUMP_CTX_FITS macro, to fit a more extended usage. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/1130cc2896626b84587a2a5f96a5c6829638f4da.1728460186.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-10vxlan: Handle error of rtnl_register_module().Kuniyuki Iwashima
Since introduced, vxlan_vnifilter_init() has been ignoring the returned value of rtnl_register_module(), which could fail silently. Handling the error allows users to view a module as an all-or-nothing thing in terms of the rtnetlink functionality. This prevents syzkaller from reporting spurious errors from its tests, where OOM often occurs and module is automatically loaded. Let's handle the errors by rtnl_register_many(). Fixes: f9c4bb0b245c ("vxlan: vni filtering support on collect metadata device") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-03netdevice: convert private flags > BIT(31) to bitfieldsAlexander Lobakin
Make dev->priv_flags `u32` back and define bits higher than 31 as bitfield booleans as per Jakub's suggestion. This simplifies code which accesses these bits with no optimization loss (testb both before/after), allows to not extend &netdev_priv_flags each time, but also scales better as bits > 63 in the future would only add a new u64 to the structure with no complications, comparing to that extending ::priv_flags would require converting it to a bitmap. Note that I picked `unsigned long :1` to not lose any potential optimizations comparing to `bool :1` etc. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-23net: vxlan: delete redundant judgment statementsLi Zetao
The initial value of err is -ENOBUFS, and err is guaranteed to be less than 0 before all goto errout. Therefore, on the error path of errout, there is no need to repeatedly judge that err is less than 0, and delete redundant judgments to make the code more concise. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12net: vxlan: remove duplicated initialization in vxlan_xmitMenglong Dong
The variable "did_rsc" is initialized twice, which is unnecessary. Just remove one of them. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21vxlan: Pull inner IP header in vxlan_xmit_one().Guillaume Nault
Ensure the inner IP header is part of the skb's linear data before setting old_iph. Otherwise, on a non-linear skb, old_iph could point outside of the packet data. Unlike classical VXLAN, which always encapsulates Ethernet packets, VXLAN-GPE can transport IP packets directly. In that case, we need to look at skb->protocol to figure out if an Ethernet header is present. Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/2aa75f6fa62ac9dbe4f16ad5ba75dd04a51d4b99.1718804000.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05vxlan: Fix regression when dropping packets due to invalid src addressesDaniel Borkmann
Commit f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") has recently been added to vxlan mainly in the context of source address snooping/learning so that when it is enabled, an entry in the FDB is not being created for an invalid address for the corresponding tunnel endpoint. Before commit f58f45c1e5b9 vxlan was similarly behaving as geneve in that it passed through whichever macs were set in the L2 header. It turns out that this change in behavior breaks setups, for example, Cilium with netkit in L3 mode for Pods as well as tunnel mode has been passing before the change in f58f45c1e5b9 for both vxlan and geneve. After mentioned change it is only passing for geneve as in case of vxlan packets are dropped due to vxlan_set_mac() returning false as source and destination macs are zero which for E/W traffic via tunnel is totally fine. Fix it by only opting into the is_valid_ether_addr() check in vxlan_set_mac() when in fact source address snooping/learning is actually enabled in vxlan. This is done by moving the check into vxlan_snoop(). With this change, the Cilium connectivity test suite passes again for both tunnel flavors. Fixes: f58f45c1e5b9 ("vxlan: drop packets from invalid src-address") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: David Bauer <mail@david-bauer.net> Cc: Ido Schimmel <idosch@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Bauer <mail@david-bauer.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-07net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protectionEric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. All rtnl_link_ops->get_link_net() methods already using dev_net() are ready. I added READ_ONCE() annotations on others. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01vxlan: Pull inner IP header in vxlan_rcv().Guillaume Nault
Ensure the inner IP header is part of skb's linear data before reading its ECN bits. Otherwise we might read garbage. One symptom is the system erroneously logging errors like "vxlan: non-ECT from xxx.xxx.xxx.xxx with TOS=xxxx". Similar bugs have been fixed in geneve, ip_tunnel and ip6_tunnel (see commit 1ca1ba465e55 ("geneve: make sure to pull inner header in geneve_rx()") for example). So let's reuse the same code structure for consistency. Maybe we'll can add a common helper in the future. Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/1239c8db54efec341dd6455c77e0380f58923a3c.1714495737.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29vxlan: Add missing VNI filter counter update in arp_reduce().Guillaume Nault
VXLAN stores per-VNI statistics using vxlan_vnifilter_count(). These statistics were not updated when arp_reduce() failed its pskb_may_pull() call. Use vxlan_vnifilter_count() to update the VNI counter when that happens. Fixes: 4095e0e1328a ("drivers: vxlan: vnifilter: per vni stats") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29vxlan: Fix racy device stats updates.Guillaume Nault
VXLAN devices update their stats locklessly. Therefore these counters should either be stored in per-cpu data structures or the updates should be done using atomic increments. Since the net_device_core_stats infrastructure is already used in vxlan_rcv(), use it for the other rx_dropped and tx_dropped counter updates. Update the other counters atomically using DEV_STATS_INC(). Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-19vxlan: drop packets from invalid src-addressDavid Bauer
The VXLAN driver currently does not check if the inner layer2 source-address is valid. In case source-address snooping/learning is enabled, a entry in the FDB for the invalid address is created with the layer3 address of the tunnel endpoint. If the frame happens to have a non-unicast address set, all this non-unicast traffic is subsequently not flooded to the tunnel network but sent to the learnt host in the FDB. To make matters worse, this FDB entry does not expire. Apply the same filtering for packets as it is done for bridges. This not only drops these invalid packets but avoids them from being learnt into the FDB. Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Suggested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David Bauer <mail@david-bauer.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108) [*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11vxlan: Remove generic .ndo_get_stats64Breno Leitao
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20240311112437.3813987-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11vxlan: Do not alloc tstats manuallyBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the vxlan driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20240311112437.3813987-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21net: vxlan: constify the struct device_type usageRicardo B. Marliere
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the vxlan_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-13net: add netdev_lockdep_set_classes() to virtual driversEric Dumazet
Based on a syzbot report, it appears many virtual drivers do not yet use netdev_lockdep_set_classes(), triggerring lockdep false positives. WARNING: possible recursive locking detected 6.8.0-rc4-next-20240212-syzkaller #0 Not tainted syz-executor.0/19016 is trying to acquire lock: ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880162cb298 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 but task is already holding lock: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(_xmit_ETHER#2); lock(_xmit_ETHER#2); *** DEADLOCK *** May be due to missing lock nesting notation 9 locks held by syz-executor.0/19016: #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline] #0: ffffffff8f385208 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x82c/0x1040 net/core/rtnetlink.c:6603 #1: ffffc90000a08c00 ((&in_dev->mr_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x600 kernel/time/timer.c:1697 #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #2: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #3: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #4: ffff8880416e3258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: __netif_tx_lock include/linux/netdevice.h:4452 [inline] #5: ffff8880223db4d8 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #6: ffffffff8e131520 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1360 net/ipv4/ip_output.c:228 #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline] #7: ffffffff8e131580 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4284 #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: spin_trylock include/linux/spinlock.h:361 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: qdisc_run_begin include/net/sch_generic.h:195 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_xmit_skb net/core/dev.c:3771 [inline] #8: ffff888014d9d258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+...}-{2:2}, at: __dev_queue_xmit+0x1262/0x3b10 net/core/dev.c:4325 stack backtrace: CPU: 1 PID: 19016 Comm: syz-executor.0 Not tainted 6.8.0-rc4-next-20240212-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 check_deadlock kernel/locking/lockdep.c:3062 [inline] validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137 lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] __netif_tx_lock include/linux/netdevice.h:4452 [inline] sch_direct_xmit+0x1c4/0x5f0 net/sched/sch_generic.c:340 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 iptunnel_xmit+0x540/0x9b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x20ee/0x2960 net/ipv4/ip_tunnel.c:831 erspan_xmit+0x9de/0x1460 net/ipv4/ip_gre.c:720 __netdev_start_xmit include/linux/netdevice.h:4989 [inline] netdev_start_xmit include/linux/netdevice.h:5003 [inline] xmit_one net/core/dev.c:3555 [inline] dev_hard_start_xmit+0x242/0x770 net/core/dev.c:3571 sch_direct_xmit+0x2b6/0x5f0 net/sched/sch_generic.c:342 __dev_xmit_skb net/core/dev.c:3784 [inline] __dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4325 neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0xe66/0x1360 net/ipv4/ip_output.c:235 igmpv3_send_cr net/ipv4/igmp.c:723 [inline] igmp_ifc_timer_expire+0xb71/0xd90 net/ipv4/igmp.c:813 call_timer_fn+0x17e/0x600 kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1751 [inline] __run_timers+0x621/0x830 kernel/time/timer.c:2038 run_timer_softirq+0x67/0xf0 kernel/time/timer.c:2051 __do_softirq+0x2bc/0x943 kernel/softirq.c:554 invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1076 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1076 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0010:resched_offsets_ok kernel/sched/core.c:10127 [inline] RIP: 0010:__might_resched+0x16f/0x780 kernel/sched/core.c:10142 Code: 00 4c 89 e8 48 c1 e8 03 48 ba 00 00 00 00 00 fc ff df 48 89 44 24 38 0f b6 04 10 84 c0 0f 85 87 04 00 00 41 8b 45 00 c1 e0 08 <01> d8 44 39 e0 0f 85 d6 00 00 00 44 89 64 24 1c 48 8d bc 24 a0 00 RSP: 0018:ffffc9000ee069e0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8880296a9e00 RDX: dffffc0000000000 RSI: ffff8880296a9e00 RDI: ffffffff8bfe8fa0 RBP: ffffc9000ee06b00 R08: ffffffff82326877 R09: 1ffff11002b5ad1b R10: dffffc0000000000 R11: ffffed1002b5ad1c R12: 0000000000000000 R13: ffff8880296aa23c R14: 000000000000062a R15: 1ffff92001dc0d44 down_write+0x19/0x50 kernel/locking/rwsem.c:1578 kernfs_activate fs/kernfs/dir.c:1403 [inline] kernfs_add_one+0x4af/0x8b0 fs/kernfs/dir.c:819 __kernfs_create_file+0x22e/0x2e0 fs/kernfs/file.c:1056 sysfs_add_file_mode_ns+0x24a/0x310 fs/sysfs/file.c:307 create_files fs/sysfs/group.c:64 [inline] internal_create_group+0x4f4/0xf20 fs/sysfs/group.c:152 internal_create_groups fs/sysfs/group.c:192 [inline] sysfs_create_groups+0x56/0x120 fs/sysfs/group.c:218 create_dir lib/kobject.c:78 [inline] kobject_add_internal+0x472/0x8d0 lib/kobject.c:240 kobject_add_varg lib/kobject.c:374 [inline] kobject_init_and_add+0x124/0x190 lib/kobject.c:457 netdev_queue_add_kobject net/core/net-sysfs.c:1706 [inline] netdev_queue_update_kobjects+0x1f3/0x480 net/core/net-sysfs.c:1758 register_queue_kobjects net/core/net-sysfs.c:1819 [inline] netdev_register_kobject+0x265/0x310 net/core/net-sysfs.c:2059 register_netdevice+0x1191/0x19c0 net/core/dev.c:10298 bond_newlink+0x3b/0x90 drivers/net/bonding/bond_netlink.c:576 rtnl_newlink_create net/core/rtnetlink.c:3506 [inline] __rtnl_newlink net/core/rtnetlink.c:3726 [inline] rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3739 rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6606 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367 netlink_sendmsg+0xa3c/0xd70 net/netlink/af_netlink.c:1908 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 __sys_sendto+0x3a4/0x4f0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2199 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7fc3fa87fa9c Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240212140700.2795436-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-07vxlan: use exit_batch_rtnl() methodEric Dumazet
exit_batch_rtnl() is called while RTNL is held, and devices to be unregistered can be queued in the dev_kill_list. This saves one rtnl_lock()/rtnl_unlock() pair per netns and one unregister_netdevice_many() call. v4: (Paolo feedback : https://netdev-3.bots.linux.dev/vmksft-net/results/453141/17-udpgro-fwd-sh/stdout ) - Changed vxlan_destroy_tunnels() to use vxlan_dellink() instead of unregister_netdevice_queue to propely remove devices from vn->vxlan_list. - vxlan_destroy_tunnels() can simply iterate one list (vn->vxlan_list) to find all devices in the most efficient way. - Moved sanity checks in a separate vxlan_exit_net() method. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20240206144313.2050392-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-20vxlan: mdb: Add MDB bulk deletion supportIdo Schimmel
Implement MDB bulk deletion support in the VXLAN driver, allowing MDB entries to be deleted in bulk according to provided parameters. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16vxlan: add support for flowlabel inheritAlce Lafranque
By default, VXLAN encapsulation over IPv6 sets the flow label to 0, with an option for a fixed value. This commits add the ability to inherit the flow label from the inner packet, like for other tunnel implementations. This enables devices using only L3 headers for ECMP to correctly balance VXLAN-encapsulated IPv6 packets. ``` $ ./ip/ip link add dummy1 type dummy $ ./ip/ip addr add 2001:db8::2/64 dev dummy1 $ ./ip/ip link set up dev dummy1 $ ./ip/ip link add vxlan1 type vxlan id 100 flowlabel inherit remote 2001:db8::1 local 2001:db8::2 $ ./ip/ip link set up dev vxlan1 $ ./ip/ip addr add 2001:db8:1::2/64 dev vxlan1 $ ./ip/ip link set arp off dev vxlan1 $ ping -q 2001:db8:1::1 & $ tshark -d udp.port==8472,vxlan -Vpni dummy1 -c1 [...] Internet Protocol Version 6, Src: 2001:db8::2, Dst: 2001:db8::1 0110 .... = Version: 6 .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0) .... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb [...] Virtual eXtensible Local Area Network Flags: 0x0800, VXLAN Network ID (VNI) Group Policy ID: 0 VXLAN Network Identifier (VNI): 100 [...] Internet Protocol Version 6, Src: 2001:db8:1::2, Dst: 2001:db8:1::1 0110 .... = Version: 6 .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT) .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0) .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0) .... 1011 0001 1010 1111 1011 = Flow Label: 0xb1afb ``` Signed-off-by: Alce Lafranque <alce@lafranque.net> Co-developed-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: Vincent Bernat <vincent@bernat.ch> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()Benjamin Poirier
This patch is basically a followup to commit 4e4b1798cc90 ("vxlan: Add missing entries to vxlan_get_size()"). All of the attributes in vxlan_get_size() appear in the same order that they are filled in vxlan_fill_info() except for IFLA_VXLAN_PORT_RANGE. For consistency, move that entry to match its order and add a comment, like for all other entries. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20231027184410.236671-1-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27vxlan: mdb: Add MDB get supportIdo Schimmel
Implement support for MDB get operation by looking up a matching MDB entry, allocating the skb according to the entry's size and then filling in the response. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27vxlan: mdb: Factor out a helper for remote entry size calculationIdo Schimmel
Currently, netlink notifications are sent for individual remote entries and not for the entire MDB entry itself. Subsequent patches are going to add MDB get support which will require the VXLAN driver to reply with an entire MDB entry. Therefore, as a preparation, factor out a helper to calculate the size of an individual remote entry. When determining the size of the reply this helper will be invoked for each remote entry in the MDB entry. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27vxlan: mdb: Adjust function argumentsIdo Schimmel
Adjust the function's arguments and rename it to allow it to be reused by future call sites that only have access to 'struct vxlan_mdb_entry_key', but not to 'struct vxlan_mdb_config'. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-26netlink: make range pointers in policies constJakub Kicinski
struct nla_policy is usually constant itself, but unless we make the ranges inside constant we won't be able to make range structs const. The ranges are not modified by the core. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231025162204.132528-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-23vxlan: use generic function for tunnel IPv6 route lookupBeniamino Galvani
The route lookup can be done now via generic function udp_tunnel6_dst_lookup() to replace the custom implementation in vxlan6_get_route(). This is similar to what already done for IPv4 in commit 6f19b2c136d9 ("vxlan: use generic function for tunnel IPv4 route lookup"). Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-16vxlan: use generic function for tunnel IPv4 route lookupBeniamino Galvani
The route lookup can be done now via generic function udp_tunnel_dst_lookup() to replace the custom implementations in vxlan_get_route(). Note that this patch only touches IPv4, while IPv6 still uses vxlan6_get_route(). After IPv6 route lookup gets converted as well, vxlan_xmit_one() can be simplified by removing local variables that will be passed via "struct ip_tunnel_key", such as remote_ip, local_ip, flow_flags, label. Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13vxlan: vxlan_core: Support FDB flushing by destination IPAmit Cohen
Add support for flush VXLAN FDB entries by destination IP. FDB entry is stored as {MAC, SRC_VNI} + remote. The destination IP is an attribute of the remote. For multicast entries, the VXLAN driver stores a linked list of remotes for a given key. In user space, each remote is represented as a separate entry, so when flush is sent with filter of 'destination IP', flush only the match remotes. In case that there are no additional remotes, destroy the entry. For example, the following are stored as one entry with several remotes: $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.3 self permanent 00:00:00:00:00:00 dst 192.1.1.1 self permanent 00:00:00:00:00:00 dst 192.1.1.2 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 1000 self permanent When user flush by destination IP x, only the relevant remotes will be flushed: $ bridge fdb flush dev vx10 dst 192.1.1.1 $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.3 self permanent 00:00:00:00:00:00 dst 192.1.1.2 self permanent Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13vxlan: vxlan_core: Support FDB flushing by destination portAmit Cohen
Add support for flush VXLAN FDB entries by destination port. FDB entry is stored as {MAC, SRC_VNI} + remote. The destination port is an attribute of the remote. For multicast entries, the VXLAN driver stores a linked list of remotes for a given key. In user space, each remote is represented as a separate entry, so when flush is sent with filter of 'destination port', flush only the match remotes. In case that there are no additional remotes, destroy the entry. For example, the following are stored as one entry with several remotes: $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 3000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent When user flush by port x, only the relevant remotes will be flushed: $ bridge fdb flush dev vx10 port 1111 $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13vxlan: vxlan_core: Support FDB flushing by destination VNIAmit Cohen
Add support for flush VXLAN FDB entries by destination VNI. FDB entry is stored as {MAC, SRC_VNI} + remote. The destination VNI is an attribute of the remote. For multicast entries, the VXLAN driver stores a linked list of remotes for a given key. In user space, each remote is represented as a separate entry, so when flush is sent with filter of 'destination VNI', flush only the match remotes. In case that there are no additional remotes, destroy the entry. For example, the following are stored as one entry with several remotes: $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.2 vni 2000 self permanent When user flush by VNI x, only the relevant remotes will be flushed: $ bridge fdb flush dev vx10 vni 2000 $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13vxlan: vxlan_core: Support FDB flushing by nexthop IDAmit Cohen
Add support for flush VXLAN FDB entries by nexthop ID. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13vxlan: vxlan_core: Support FDB flushing by source VNIAmit Cohen
Add support for flush VXLAN FDB entries by source VNI. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>