2021-10-13mlxsw: reg: Fix a typo in a group headingPetr Machata
There is no such thing as "traffic group". The group that this is a heading of is "per traffic class counters". Fix the heading. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13net: enetc: fix check for allocation failureDan Carpenter
This was supposed to be a check for if dma_alloc_coherent() failed but it has a copy and paste bug so it will not work. Fixes: fb8629e2cbfc ("net: enetc: add support for software TSO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20211013080456.GC6010@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13net: dsa: unregister cross-chip notifier after ds->ops->teardownVladimir Oltean
To be symmetric with the error unwind path of dsa_switch_setup(), call dsa_switch_unregister_notifier() after ds->ops->teardown. The implication is that ds->ops->teardown cannot emit cross-chip notifiers. For example, currently the dsa_tag_8021q_unregister() call from sja1105_teardown() does not propagate to the entire tree due to this reason. However I cannot find an actual issue caused by this, observed using code inspection. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211012123735.2545742-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13marvell: octeontx2: build error: unknown type name 'u64'Anders Roxell
Building an allmodconfig arm64 kernel, the following build error shows up:

  In file included from drivers/crypto/marvell/octeontx2/cn10k_cpt.c:4:
  include/linux/soc/marvell/octeontx2/asm.h:38:15: error: unknown type name 'u64'
     38 | static inline u64 otx2_atomic64_fetch_add(u64 incr, u64 *ptr)
        |               ^~~

Include linux/types.h in asm.h so the compiler knows what the type 'u64' is. Fixes: af3826db74d1 ("octeontx2-pf: Use hardware register for CQE count") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Link: https://lore.kernel.org/r/20211013135743.3826594-1-anders.roxell@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13net: remove single-byte netdev->dev_addr writesJakub Kicinski
Make the drivers which use single-byte netdev addresses (netdev->addr_len == 1) use the appropriate address setting helpers. arcnet copies from int variables and io reads a lot, so add a helper for arcnet drivers to use. Similar helper could be reused for phonet and appletalk but there isn't any good central location where we could put it, and netdevice.h is already very crowded. Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> # for HSI Link: https://lore.kernel.org/r/20211012142757.4124842-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13Merge branch 'net-use-dev_addr_set-in-hamradio-and-ip-tunnels'Jakub Kicinski
Jakub Kicinski says: ==================== net: use dev_addr_set() in hamradio and ip tunnels Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. ==================== Link: https://lore.kernel.org/r/20211012160634.4152690-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ip: use dev_addr_set() in tunnelsJakub Kicinski
Use dev_addr_set() instead of writing to netdev->dev_addr directly in ip tunnels drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13hamradio: use dev_addr_set() for setting device addressJakub Kicinski
Use dev_addr_set() instead of writing to netdev->dev_addr directly in hamradio drivers. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13netdevice: demote the type of some dev_addr_set() helpersJakub Kicinski
__dev_addr_set() and dev_addr_mod() are pretty low level, so let the arguments be void; there's no chance for confusion in callers converted to use them. Keep u8 in dev_addr_set() because some of the callers are converted from a loop and we want to make sure assignments are not from an array of a different type. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
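The split between an untyped low-level helper and a typed wrapper can be illustrated in user space. This is a minimal sketch, not the kernel code: `struct fake_netdev`, `fake_dev_addr_set_len()` and `fake_dev_addr_set()` are hypothetical stand-ins for `struct net_device`, `__dev_addr_set()` and `dev_addr_set()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADDR_LEN 32 /* assumed to mirror the kernel constant of the same name */

/* Hypothetical stand-in for struct net_device; only the fields needed here. */
struct fake_netdev {
	unsigned char dev_addr[MAX_ADDR_LEN];
	unsigned char addr_len;
};

/* Low-level helper: untyped source plus explicit length. */
static void fake_dev_addr_set_len(struct fake_netdev *dev, const void *addr,
				  size_t len)
{
	assert(len <= MAX_ADDR_LEN);
	memcpy(dev->dev_addr, addr, len);
}

/*
 * Typed helper: keeping a byte pointer here means a caller that passes,
 * say, an int array gets a compiler diagnostic instead of a silent copy.
 */
static void fake_dev_addr_set(struct fake_netdev *dev,
			      const unsigned char *addr)
{
	fake_dev_addr_set_len(dev, addr, dev->addr_len);
}
```

A caller holding a 6-byte MAC in an `unsigned char` array would set `addr_len` to 6 and call `fake_dev_addr_set(&dev, mac)`.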
2021-10-13Merge branch 'net-constify-dev_addr-passing-for-protocols'Jakub Kicinski
Jakub Kicinski says: ==================== net: constify dev_addr passing for protocols Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. netdev->dev_addr will be made const to prevent direct writes. This set sprinkles const across variables and arguments in protocol code which are used to hold references to netdev->dev_addr. ==================== Link: https://lore.kernel.org/r/20211012155840.4151590-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13decnet: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in decnet constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13tipc: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in tipc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ipv6: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in ndisc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13llc/snap: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in LLC and SNAP constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13rose: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in rose constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ax25: constify dev_addr passingJakub Kicinski
In preparation for netdev->dev_addr being constant make all relevant arguments in AX25 constant. Modify callers as well (netrom, rose). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13Merge branch 'add-functional-support-for-gigabit-ethernet-driver'Jakub Kicinski
Biju Das says: ==================== Add functional support for Gigabit Ethernet driver The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are similar to the R-Car Ethernet AVB IP. The Gigabit Ethernet IP consists of Ethernet controller (E-MAC), Internal TCP/IP Offload Engine (TOE) and Dedicated Direct memory access controller (DMAC). With a few changes in the driver we can support both IPs. This patch series aims to add functional support for the Gigabit Ethernet driver by filling in all the stubs except set_features. The set_features patch will be sent as a separate RFC patch along with the rx_checksum patch, as it needs further discussion related to HW checksum. With this series, we can boot the kernel with rootFS mounted on NFS on RZ/G2L platforms. ==================== Link: https://lore.kernel.org/r/20211012163613.30030-1-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Fix typo AVB->DMACBiju Das
Fix the typo AVB->DMAC in comment, as the code following the comment is for DMAC on Gigabit Ethernet IP. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Update ravb_emac_init_gbeth()Biju Das
This patch enables the Receive/Transmit port of TOE and removes the setting of the promiscuous bit from the EMAC configuration mode register. This patch also updates the EMAC configuration mode comment from "PAUSE prohibition" to "EMAC Mode: PAUSE prohibition; Duplex; TX; RX; CRC Pass Through". Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Document PFRI register bitBiju Das
Document PFRI register bit, as it is documented on R-Car Gen3 and RZ/G2L hardware manuals. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Rename "nc_queue" feature bitBiju Das
Rename the feature bit "nc_queue" with "nc_queues" as AVB DMAC has RX and TX NC queues. There is no functional change. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Optimize ravb_emac_init_gbeth functionBiju Das
Optimize CXR31 register initialization on ravb_emac_init_gbeth function. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Rename "tsrq" variableBiju Das
Rename the variable "tsrq" with "tccr_mask" as we are passing TCCR mask to the ravb_wait() function. There is no functional change. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Add support to retrieve stats for GbEthernetBiju Das
Add support for retrieving stats information for GbEthernet. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Add carrier_counters to struct ravb_hw_infoBiju Das
RZ/G2L E-MAC supports carrier counters. Add a carrier_counter hw feature bit to struct ravb_hw_info to add this feature only for RZ/G2L. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Fillup ravb_rx_gbeth() stubBiju Das
Fillup ravb_rx_gbeth() function to support RZ/G2L. This patch also renames ravb_rcar_rx to ravb_rx_rcar to be consistent with the naming convention used in sh_eth driver. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Fillup ravb_rx_ring_format_gbeth() stubBiju Das
Fillup ravb_rx_ring_format_gbeth() function to support RZ/G2L. This patch also renames ravb_rx_ring_format to ravb_rx_ring_format_rcar to be consistent with the naming convention used in sh_eth driver. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Fillup ravb_rx_ring_free_gbeth() stubBiju Das
Fillup ravb_rx_ring_free_gbeth() function to support RZ/G2L. This patch also renames ravb_rx_ring_free to ravb_rx_ring_free_rcar to be consistent with the naming convention used in sh_eth driver. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Fillup ravb_alloc_rx_desc_gbeth() stubBiju Das
Fillup ravb_alloc_rx_desc_gbeth() function to support RZ/G2L. This patch also renames ravb_alloc_rx_desc to ravb_alloc_rx_desc_rcar to be consistent with the naming convention used in sh_eth driver. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Add rx_max_buf_size to struct ravb_hw_infoBiju Das
R-Car AVB-DMAC has maximum 2K size on RX buffer, whereas on RZ/G2L it is 8K. We need to allow for changing the MTU within the limit of the maximum size of a descriptor. Add a rx_max_buf_size variable to struct ravb_hw_info to handle this difference. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13ravb: Use ALIGN macro for max_rx_lenBiju Das
Use ALIGN macro for calculating the value for max_rx_len. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
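For readers unfamiliar with the kernel's ALIGN macro, it rounds a value up to the next multiple of a power-of-two alignment. The sketch below reproduces that arithmetic in user space; `align_up()` is a hypothetical name, and the commit message does not state which alignment value ravb uses, so the figures in the usage note are purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a non-zero power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	/* Adding a - 1 then clearing the low bits rounds up without a division. */
	return (x + a - 1) & ~(a - 1);
}
```

With an illustrative alignment of 128, `align_up(1500, 128)` yields 1536, while a value that is already aligned, such as 2048, is returned unchanged.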
2021-10-13net: qed_debug: fix check of false (grc_param < 0) expressionJean Sacren
The type of enum dbg_grc_params has the enumerator list starting from 0. When grc_param is declared by enum dbg_grc_params, (grc_param < 0) is always false. We should remove the check of this expression. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Acked-by: Shai Malin <smalin@marvell.com> Link: https://lore.kernel.org/r/20211012074645.12864-1-sakiwit@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
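The reasoning can be shown with a small stand-alone example. `enum sample_param` is a hypothetical stand-in for enum dbg_grc_params; the sketch assumes, as with GCC, that an enum without negative enumerators gets an unsigned underlying type, so only the upper bound is worth checking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum whose enumerator list starts at 0. */
enum sample_param {
	SAMPLE_PARAM_A,
	SAMPLE_PARAM_B,
	SAMPLE_PARAM_C,
	MAX_SAMPLE_PARAMS
};

/* Only the upper bound is tested; a "p < 0" test would never be true here. */
static bool sample_param_valid(enum sample_param p)
{
	return p < MAX_SAMPLE_PARAMS;
}

/* Look up a printable name; the caller must pass a valid parameter. */
static const char *sample_param_name(enum sample_param p)
{
	static const char *const names[MAX_SAMPLE_PARAMS] = { "a", "b", "c" };

	assert(sample_param_valid(p));
	return names[p];
}
```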
2021-10-13net: enetc: include ip6_checksum.h for csum_ipv6_magicIoana Ciornei
For those architectures which do not define _HAVE_ARCH_IPV6_CSUM, we need to include ip6_checksum.h which provides the csum_ipv6_magic() function. Fixes: fb8629e2cbfc ("net: enetc: add support for software TSO") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211012121358.16641-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12ionic: no devlink_unregister if not registeredShannon Nelson
Don't try to unregister the devlink if it hasn't been registered yet. This bit of error cleanup code got missed in the recent devlink registration changes. Fixes: 7911c8bd546f ("ionic: Move devlink registration to be last devlink command") Signed-off-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20211012231520.72582-1-snelson@pensando.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12Merge branch 'devlink-reload-simplification'Jakub Kicinski
Leon Romanovsky says: ==================== devlink reload simplification Simplify devlink reload APIs. ==================== Link: https://lore.kernel.org/r/cover.1634044267.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Delete reload enable/disable interfaceLeon Romanovsky
Commit a0c76345e3d3 ("devlink: disallow reload operation during device cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload operation from racing with device probe/dismantle. After recent changes to move devlink_register() to the end of device probe and devlink_unregister() to the beginning of device dismantle, these races can no longer happen. Reload operations will be denied if the devlink instance is unregistered and devlink_unregister() will block until all in-flight operations are done. Therefore, remove these devlink_reload_{enable,disable}() APIs. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net/mlx5: Set devlink reload feature bit for supported devices onlyLeon Romanovsky
A multiport slave device doesn't support devlink reload, so instead of complicating the initialization flow with devlink_reload_enable(), which will be removed in the next patch, don't set the DEVLINK_F_RELOAD feature bit for such devices. This fixes an error where reload counters are exposed (and equal to zero) for a mode that is not supported at all. Fixes: d89ddaae1766 ("net/mlx5: Disable devlink reload for multi port slave device") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Allow control devlink ops behavior through feature maskLeon Romanovsky
Introduce new devlink call to set feature mask to control devlink behavior during device initialization phase after devlink_alloc() is already called. This allows us to set reload ops based on device property which is not known at the beginning of driver initialization. For the sake of simplicity, this API lacks any type of locking and needs to be called before devlink_register() to make sure that no parallel access to the ops is possible at this stage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Annotate devlink API callsLeon Romanovsky
Initial annotation patch to separate calls that need to be executed before or after devlink_register(). Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Move netdev_to_devlink helpers to devlink.cLeon Romanovsky
Both netdev_to_devlink and netdev_to_devlink_port are used in devlink.c only, so move them in order to reduce their scope. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12devlink: Reduce struct devlink exposureLeon Romanovsky
The declaration of struct devlink in general header provokes the situation where internal fields can be accidentally used by the driver authors. In order to reduce such possible situations, let's reduce the namespace exposure of struct devlink. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12ethernet: tulip: avoid duplicate variable name on sparcJakub Kicinski
I recently added a variable called addr to tulip_init_one() but for sparc there's already a variable called that half way thru the function. Rename it to fix build. Fixes: ca8793175564 ("ethernet: tulip: remove direct netdev->dev_addr writes") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net: hns3: debugfs add support dumping page pool infoHao Chen
Add a file node "page_pool_info" for debugfs, then cat this file node to dump page pool info as below:

  QUEUE_ID  ALLOCATE_CNT  FREE_CNT  POOL_SIZE(PAGE_NUM)  ORDER  NUMA_ID  MAX_LEN
  0         512           0         512                  0      2        4K
  1         512           0         512                  0      2        4K
  2         512           0         512                  0      2        4K
  3         512           0         512                  0      2        4K
  4         512           0         512                  0      2        4K

Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12tulip: fix setting device address from romJakub Kicinski
I missed removing i from the array index when converting from a loop to a direct copy. Fixes: ca8793175564 ("ethernet: tulip: remove direct netdev->dev_addr writes") Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
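The class of bug described here is easy to reproduce: when a per-byte loop such as `dst[i] = rom[offset + i]` is replaced by a single copy, the source expression must drop the loop index. The sketch below shows the intended end state in user space. `copy_addr_from_rom()` and its parameters are hypothetical and are not the tulip driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_LEN 6

/*
 * Copy ADDR_LEN bytes starting at rom[offset]. No loop index appears in
 * the source expression; a leftover "+ i" from the old loop would shift
 * the copy by whatever value i happened to hold.
 */
static void copy_addr_from_rom(unsigned char *dst, const unsigned char *rom,
			       size_t rom_len, size_t offset)
{
	assert(offset <= rom_len && rom_len - offset >= ADDR_LEN);
	memcpy(dst, &rom[offset], ADDR_LEN);
}
```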
2021-10-12Merge branch 'Managed-Neighbor-Entries'David S. Miller
Daniel Borkmann says: ==================== Managed Neighbor Entries This series adds a couple of fixes related to NTF_EXT_LEARNED and NTF_USE neighbor flags, extends the UAPI with a new NDA_FLAGS_EXT netlink attribute in order to be able to add new neighbor flags from user space given all current struct ndmsg / ndm_flags bits are used up. Finally, the core of this series adds a new NTF_EXT_MANAGED flag to neighbors, which allows user space control planes to add 'managed' neighbor entries. Meaning, user space may either transition existing entries or can push down new L3 entries without lladdr into the kernel where the latter will periodically try to keep such NTF_EXT_MANAGED managed entries in reachable state. Main use case for this series are XDP / tc BPF load-balancers which make use of the bpf_fib_lookup() helper for backends. For more details, please see individual patches. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net, neigh: Add NTF_MANAGED flag for managed neighbor entriesDaniel Borkmann
Allow a user space control plane to insert entries with a new NTF_EXT_MANAGED flag. The flag then indicates to the kernel that the neighbor entry should be periodically probed for keeping the entry in NUD_REACHABLE state iff possible. The use case for this is targeting XDP or tc BPF load-balancers which use the bpf_fib_lookup() BPF helper in order to piggyback on neighbor resolution for their backends. Given they cannot be resolved in fast-path, a control plane inserts the L3 (without L2) entries manually into the neighbor table and lets the kernel do the neighbor resolution either on the gateway or on the backend directly in case the latter resides in the same L2. This avoids to deal with L2 in the control plane and to rebuild what the kernel already does best anyway. NTF_EXT_MANAGED can be combined with NTF_EXT_LEARNED in order to avoid GC eviction. The kernel then adds NTF_MANAGED flagged entries to a per-neighbor table which gets triggered by the system work queue to periodically call neigh_event_send() for performing the resolution. The implementation allows migration from/to NTF_MANAGED neighbor entries, so that already existing entries can be converted by the control plane if needed. Potentially, we could make the interval for periodically calling neigh_event_send() configurable; right now it's set to DELAY_PROBE_TIME which is also in line with mlxsw which has similar driver-internal infrastructure c723c735fa6b ("mlxsw: spectrum_router: Periodically update the kernel's neigh table"). In future, the latter could possibly reuse the NTF_MANAGED neighbors as well.

Example:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 managed extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a managed extern_learn REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Link: https://linuxplumbersconf.org/event/11/contributions/953/ Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net, neigh: Extend neigh->flags to 32 bit to allow for extensionsRoopa Prabhu
Currently, all bits in struct ndmsg's ndm_flags are used up with the most recent addition of 435f2e7cc0b7 ("net: bridge: add support for sticky fdb entries"). This makes it impossible to extend the neighboring subsystem with new NTF_* flags:

  struct ndmsg {
          __u8   ndm_family;
          __u8   ndm_pad1;
          __u16  ndm_pad2;
          __s32  ndm_ifindex;
          __u16  ndm_state;
          __u8   ndm_flags;
          __u8   ndm_type;
  };

There are ndm_pad{1,2} attributes which are not used. However, due to uncareful design, the kernel does not enforce them to be zero upon new neighbor entry addition, and given they've been around forever, it is not possible to reuse them today due to risk of breakage. One option to overcome this limitation is to add a new NDA_FLAGS_EXT attribute for extended flags. In struct neighbour, there is a 3 byte hole between protocol and ha_lock, which allows neigh->flags to be extended from 8 to 32 bits while still being on the same cacheline as before. This also allows for all future NTF_* flags being in neigh->flags rather than yet another flags field. Unknown flags in NDA_FLAGS_EXT will be rejected by the kernel. Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
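One way to picture the widened flags word is as the legacy 8-bit ndm_flags in the low byte with the extended flags placed above it, and any unknown extended bit rejected. The sketch below models that in user space. The names `merge_neigh_flags()`, `EXT_SHIFT` and `EXT_KNOWN_MASK`, as well as the exact bit layout, are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_SHIFT      8    /* assumed: extended flags sit above the legacy byte */
#define EXT_KNOWN_MASK 0x1u /* hypothetical: a single extended flag is defined */

/*
 * Merge the legacy 8-bit flags and the extended flags into one 32-bit word.
 * Returns false, leaving *out untouched, if ext carries an unknown bit.
 */
static bool merge_neigh_flags(uint8_t legacy, uint32_t ext, uint32_t *out)
{
	assert(out != NULL);

	if (ext & ~EXT_KNOWN_MASK)
		return false;

	*out = (uint32_t)legacy | (ext << EXT_SHIFT);
	return true;
}
```

Rejecting unknown bits up front is what keeps the remaining bit positions available for later flags, which the unchecked ndm_pad fields could not offer.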
2021-10-12net, neigh: Enable state migration between NUD_PERMANENT and NTF_USEDaniel Borkmann
Currently, it is not possible to migrate a neighbor entry between NUD_PERMANENT state and NTF_USE flag with a dynamic NUD state from a user space control plane. Similarly, it is not possible to add/remove NTF_EXT_LEARNED flag from an existing neighbor entry in combination with NTF_USE flag. This is due to the latter directly calling into neigh_event_send() without any meta data updates as happening in __neigh_update(). Thus, to enable this use case, extend the latter with a NEIGH_UPDATE_F_USE flag where we break the NUD_PERMANENT state in particular so that a latter neigh_event_send() is able to re-resolve a neighbor entry.

Before fix, NUD_PERMANENT -> NUD_* & NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]

As can be seen, despite the admin-triggered replace, the entry remains in the NUD_PERMANENT state.

After fix, NUD_PERMANENT -> NUD_* & NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn STALE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT
  [...]

After the fix, the admin-triggered replace switches to a dynamic state from the NTF_USE flag which triggered a new neighbor resolution. Likewise, we can transition back from there, if needed, into NUD_PERMANENT.

Similar before/after behavior can be observed for below transitions:

Before fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]

After fix, NTF_USE -> NTF_USE | NTF_EXT_LEARNED -> NTF_USE:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]
  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USEDaniel Borkmann
The NTF_EXT_LEARNED neigh flag is usually propagated back to user space upon dump of the neighbor table. However, when used in combination with NTF_USE flag this is not the case despite exempting the entry from the garbage collector. This results in inconsistent state since entries are typically marked in neigh->flags with NTF_EXT_LEARNED, but here they are not. Fix it by propagating the creation flag to ___neigh_create().

Before fix:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE
  [...]

After fix:

  # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
  # ./ip/ip n
  192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
  [...]

Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12net: hns: Prefer struct_size over open coded arithmeticLen Baker
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, take the opportunity to refactor the hnae_handle structure to switch the last member to flexible array, changing the code accordingly. Also, fix the comment in the hnae_vf_cb structure to inform that the ae_handle member must be the last member. Then, use the struct_size() helper to do the arithmetic instead of the argument "size + count * size" in the kzalloc() function. This code was detected with the help of Coccinelle and audited and fixed manually. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by: Len Baker <len.baker@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
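The overflow concern can be illustrated outside the kernel. The sketch below allocates a structure ending in a flexible array and refuses counts whose size computation would wrap. `struct handle`, `handle_alloc()` and `handle_set_queue()` are hypothetical names, and the explicit bound check is a user-space substitute for what struct_size() provides, not a copy of the hns change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct handle {
	size_t q_num;
	void *qs[]; /* must stay the last member */
};

/* Allocate a zeroed handle with room for count entries; NULL on overflow or OOM. */
static struct handle *handle_alloc(size_t count)
{
	struct handle *h;

	/* Reject counts where sizeof(*h) + count * sizeof(h->qs[0]) would wrap. */
	if (count > (SIZE_MAX - sizeof(*h)) / sizeof(h->qs[0]))
		return NULL;

	h = calloc(1, sizeof(*h) + count * sizeof(h->qs[0]));
	if (h)
		h->q_num = count;
	return h;
}

/* Store one entry; the index must lie inside the allocated array. */
static void handle_set_queue(struct handle *h, size_t i, void *q)
{
	assert(i < h->q_num);
	h->qs[i] = q;
}
```

Without the bound check, a very large count would wrap the size computation and produce a buffer far smaller than the caller expects, which is the failure mode the commit message describes.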