summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-05-23net: dsa: felix: update bridge fwd mask from ocelot lib when changing ↵Vladimir Oltean
tag_8021q CPU Add more logic to ocelot_port_{,un}set_dsa_8021q_cpu() from the ocelot switch lib by encapsulating the ocelot_apply_bridge_fwd_mask() call that felix used to have. This is necessary because the CPU port change procedure will also need to do this, and it's good to reduce code duplication by having an entry point in the ocelot switch lib that does all that is needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: dsa: felix: move the updating of PGID_CPU to the ocelot libVladimir Oltean
PGID_CPU must be updated every time a port is configured or unconfigured as a tag_8021q CPU port. The ocelot switch lib already has a hook for that operation, so move the updating of PGID_CPU to those hooks. These bits are pretty specific to DSA, so normally I would keep them out of the common switch lib, but when tag_8021q is in use, this has implications upon the forwarding mask determined by ocelot_apply_bridge_fwd_mask() and called extensively by the switch lib. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: dsa: fix missing adjustment of host broadcast floodingVladimir Oltean
PGID_BC is configured statically by ocelot_init() to flood towards the CPU port module, and dynamically by ocelot_port_set_bcast_flood() towards all user ports. When the tagging protocol changes, the intention is to turn off flooding towards the old pipe towards the host, and to turn it on towards the new pipe. Due to a recent change which removed the adjustment of PGID_BC from felix_set_host_flood(), 3 things happen. - when we change from NPI to tag_8021q mode: in this mode, the CPU port module is accessed via registers, and used to read PTP packets with timestamps. We fail to disable broadcast flooding towards the CPU port module, and to enable broadcast flooding towards the physical port that serves as a DSA tag_8021q CPU port. - from tag_8021q to NPI mode: in this mode, the CPU port module is redirected to a physical port. We fail to disable broadcast flooding towards the physical tag_8021q CPU port, and to enable it towards the CPU port module at ocelot->num_phys_ports. - when the ports are put in promiscuous mode, we also fail to update PGID_BC towards the host pipe of the current protocol. First issue means that felix_check_xtr_pkt() has to do extra work, because it will not see only PTP packets, but also broadcasts. It needs to dequeue these packets just to drop them. Third issue is inconsequential, since PGID_BC is allocated from the nonreserved multicast PGID space, and these PGIDs are conveniently initialized to 0x7f (i.e. flood towards all ports except the CPU port module). Broadcasts reach the NPI port via ocelot_init(), and reach the tag_8021q CPU port via the hardware defaults. Second issue is also inconsequential, because we fail both at disabling and at enabling broadcast flooding on a port, so the defaults mentioned above are preserved, and they are fine except for the performance impact. Fixes: 7a29d220f4c0 ("net: dsa: felix: reimplement tagging protocol change with function pointers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'fix-silence-gcc-12-warnings-in-drivers-net-wireless'Jakub Kicinski
Jakub Kicinski says: ==================== Fix/silence GCC 12 warnings in drivers/net/wireless/ as mentioned off list we'd like to get GCC 12 warnings quashed. This set takes care of the warnings we have in drivers/net/wireless/ mostly by relegating them to W=1/W=2 builds. ==================== Link: https://lore.kernel.org/r/20220520194320.2356236-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: carl9170: silence a GCC 12 -Warray-bounds warningJakub Kicinski
carl9170 has a big union (struct carl9170_cmd) with all the command types in it. But it allocates buffers only large enough for a given command. This upsets GCC 12: drivers/net/wireless/ath/carl9170/cmd.c:125:30: warning: array subscript ‘struct carl9170_cmd[0]’ is partly outside array bounds of ‘unsigned char[8]’ [-Warray-bounds] 125 | tmp->hdr.cmd = cmd; | ~~~~~~~~~~~~~^~~~~ Punt the warning to W=1 for now. Hopefully GCC will learn to recognize which fields are in-bounds. Acked-by: Christian Lamparter <chunkeey@gmail.com> Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: brcmfmac: work around a GCC 12 -Warray-bounds warningJakub Kicinski
GCC 12 really doesn't like partial struct allocations: drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:2202:32: warning: array subscript ‘struct brcmf_ext_join_params_le[0]’ is partly outside array bounds of ‘void[70]’ [-Warray-bounds] 2202 | ext_join_params->scan_le.passive_time = | ^~ brcmfmac is trying to save 2 bytes at the end by either allocating or not allocating a channel member. Let's keep @join_params_size the "right" size but kmalloc() the full structure. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: iwlwifi: use unsigned to silence a GCC 12 warningJakub Kicinski
GCC 12 says: drivers/net/wireless/intel/iwlwifi/mvm/sta.c:1076:37: warning: array subscript -1 is below array bounds of ‘struct iwl_mvm_tid_data[9]’ [-Warray-bounds] 1076 | if (mvmsta->tid_data[tid].state != IWL_AGG_OFF) | ~~~~~~~~~~~~~~~~^~~~~ Whatever, tid is a bit from for_each_set_bit(), it's clearly unsigned. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: ath6k: silence false positive -Wno-dangling-pointer warning on GCC 12Jakub Kicinski
For some reason GCC 12 decided to complain about the common pattern of queuing an object onto a list on the stack in ath6k: inlined from ‘ath6kl_htc_mbox_tx’ at drivers/net/wireless/ath/ath6kl/htc_mbox.c:1142:3: include/linux/list.h:74:19: warning: storing the address of local variable ‘queue’ in ‘*&packet_15(D)->list.prev’ [-Wdangling-pointer=] 74 | new->prev = prev; | ~~~~~~~~~~^~~~~~ Move the warning to W=1, hopefully it goes away with a compiler update. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: rtlwifi: remove always-true condition pointed out by GCC 12Jakub Kicinski
The .value is a two-dim array, not a pointer. struct iqk_matrix_regs { bool iqk_done; long value[1][IQK_MATRIX_REG_NUM]; }; Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: ath9k: silence array-bounds warning on GCC 12Jakub Kicinski
GCC 12 says: drivers/net/wireless/ath/ath9k/mac.c: In function ‘ath9k_hw_resettxqueue’: drivers/net/wireless/ath/ath9k/mac.c:373:22: warning: array subscript 32 is above array bounds of ‘struct ath9k_tx_queue_info[10]’ [-Warray-bounds] 373 | qi = &ah->txq[q]; | ~~~~~~~^~~ I don't know where it got the 32 from, relegate the warning to W=1+. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22wifi: plfxlc: remove redundant NULL-check for GCC 12Jakub Kicinski
GCC is upset that we check the return value of plfxlc_usb_dev() even tho it can't be NULL: drivers/net/wireless/purelifi/plfxlc/usb.c: In function ‘resume’: drivers/net/wireless/purelifi/plfxlc/usb.c:840:20: warning: the comparison will always evaluate as ‘true’ for the address of ‘dev’ will never be NULL [-Waddress] 840 | if (!pl || !plfxlc_usb_dev(pl)) | ^ plfxlc_usb_dev() returns an address of one of the members of pl, so it's safe to drop these checks. Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-22dt-bindings: net: toshiba,visconti-dwmac: Update the common clock propertiesNobuhiro Iwamatsu
The clock for this driver switched to the common clock controller driver. Therefore, update common clock properties for ethernet device in the binding document. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: fddi: skfp: smt: Remove extra parameters to vararg macroTom Rix
cppcheck reports [drivers/net/fddi/skfp/smt.c:750]: (warning) printf format string requires 0 parameters but 2 are given. DB_SBAN is a vararg macro, like DB_ESSN. Remove the extra args and the nl. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'mt7986-support'David S. Miller
Jakub Kicinski says: ==================== introduce mt7986 ethernet support Add support for mt7986-eth driver available on mt7986 soc. Changes since v2: - rely on GFP_KERNEL whenever possible - define mtk_reg_map struct to introduce soc register map and avoid macros - improve comments Changes since v1: - drop SRAM option - convert ring->dma to void - convert scratch_ring to void - enable port4 - fix irq dts bindings - drop gmac1 support from mt7986a-rfb dts for the moment ==================== Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: introduce support for mt7986 chipsetLorenzo Bianconi
Add support for mt7986-eth driver available on mt7986 soc. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: convert scratch_ring pointer to voidLorenzo Bianconi
Simplify the code converting scratch_ring pointer to void Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: convert ring dma pointer to voidLorenzo Bianconi
Simplify the code converting {tx,rx} ring dma pointer to void Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 supportLorenzo Bianconi
Introduce MTK_NETSYS_V2 support. MTK_NETSYS_V2 defines 32B TX/RX DMA descriptors. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: introduce device register mapLorenzo Bianconi
Introduce reg_map structure to add the capability to support different register definitions. Move register definitions in mtk_regmap structure. This is a preliminary patch to introduce mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on rxd_size field in mtk_rx_alloc/mtk_rx_cleanLorenzo Bianconi
Remove mtk_rx_dma structure layout dependency in mtk_rx_alloc/mtk_rx_clean. Initialize to 0 rxd3 and rxd4 in mtk_rx_alloc. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size field in mtk_poll_tx/mtk_poll_rxLorenzo Bianconi
This is a preliminary to ad mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: add rxd_size to mtk_soc_dataLorenzo Bianconi
Similar to tx counterpart, introduce rxd_size in mtk_soc_data data structure. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size in txd_to_idxLorenzo Bianconi
This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size in mtk_desc_to_tx_bufLorenzo Bianconi
This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on txd_size in mtk_tx_alloc/mtk_tx_cleanLorenzo Bianconi
This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: add txd_size to mtk_soc_dataLorenzo Bianconi
In order to remove mtk_tx_dma size dependency, introduce txd_size in mtk_soc_data data structure. Rely on txd_size in mtk_init_fq_dma() and mtk_dma_free() routines. This is a preliminary patch to add mt7986 ethernet support. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: move tx dma desc configuration in ↵Lorenzo Bianconi
mtk_tx_set_dma_desc Move tx dma descriptor configuration in mtk_tx_set_dma_desc routine. This is a preliminary patch to introduce mt7986 ethernet support since it relies on a different tx dma descriptor layout. Tested-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: ethernet: mtk_eth_soc: rely on GFP_KERNEL for dma_alloc_coherent ↵Lorenzo Bianconi
whenever possible Rely on GFP_KERNEL for dma descriptors mappings in mtk_tx_alloc(), mtk_rx_alloc() and mtk_init_fq_dma() since they are run in non-irq context. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22dt-bindings: net: mediatek,net: add mt7986-eth bindingLorenzo Bianconi
Introduce dts bindings for mt7986 soc in mediatek,net.yaml. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22arm64: dts: mediatek: mt7986: introduce ethernet nodesLorenzo Bianconi
Introduce ethernet nodes in mt7986 bindings in order to enable mt7986a/mt7986b ethernet support. Co-developed-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'net-gcc12-warnings'David S. Miller
Jakub Kicinski says: ==================== eth: silence the GCC 12 array-bounds warnings Silence the array-bounds warnings in Ethernet drivers. v2 uses -Wno-array-bounds directly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22eth: tg3: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 currently generates a rather inconsistent warning: drivers/net/ethernet/broadcom/tg3.c:17795:51: warning: array subscript 5 is above array bounds of ‘struct tg3_napi[5]’ [-Warray-bounds] 17795 | struct tg3_napi *tnapi = &tp->napi[i]; | ~~~~~~~~^~~ i is guaranteed < tp->irq_max which in turn is either 1 or 5. There are more loops like this one in the driver, but strangely GCC 12 dislikes only this single one. Silence this silliness for now. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22eth: ice: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 gets upset because driver allocates partial struct ice_aqc_sw_rules_elem buffers. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22eth: mtk_eth_soc: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 gets upset because in mtk_foe_entry_commit_subflow() this driver allocates a partial structure. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: mscc: ocelot: offload tc action "ok" using an empty action vectorVladimir Oltean
The "ok" tc action is useful when placed in front of a more generic filter to exclude some more specific rules from matching it. The ocelot switches can offload this tc action by creating an empty action vector (no _ENA fields set to 1). This makes sense for all of VCAP IS1, IS2 and ES0 (but not for PSFP). Add support for this action. Note that this makes the gact_drop_and_ok_test() selftest pass, where "action ok" is used in front of an "action drop" rule, both offloaded to VCAP IS2. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'ocelot-selftests'David S. Miller
Vladimir Oltean says: ==================== Streamline Ocelot tc-chains selftest This series changes the output and the argument format of the Ocelot switch selftest so that it is more similar to what can be found in tools/testing/selftests/net/forwarding/. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22selftests: ocelot: tc_flower_chains: reorder interfacesVladimir Oltean
Use the standard interface order h1, swp1, swp2, h2 that is used by the forwarding selftest framework. The previous order was confusing even with the ASCII drawing. That isn't needed anymore. This also drops the fixed MAC addresses and uses STABLE_MAC_ADDRS, which ensures the MAC addresses are unique. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22selftests: ocelot: tc_flower_chains: use conventional interface namesVladimir Oltean
This is a robotic rename as follows: eth0 -> swp1 eth1 -> swp2 eth2 -> h2 eth3 -> h1 This brings the selftest more in line with the other forwarding selftests, where h1 is connected to swp1, and h2 to swp2. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22selftests: ocelot: tc_flower_chains: streamline test outputVladimir Oltean
Bring this driver-specific selftest output in line with the other selftests. Before: Testing VLAN pop.. OK Testing VLAN push.. OK Testing ingress VLAN modification.. OK Testing egress VLAN modification.. OK Testing frame prioritization.. OK After: TEST: VLAN pop [ OK ] TEST: VLAN push [ OK ] TEST: Ingress VLAN modification [ OK ] TEST: Egress VLAN modification [ OK ] TEST: Frame prioritization [ OK ] Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: wrap the wireless pointers in struct net_device in an ifdefJakub Kicinski
Most protocol-specific pointers in struct net_device are under a respective ifdef. Wireless is the notable exception. Since there's a sizable number of custom-built kernels for datacenter workloads which don't build wireless it seems reasonable to ifdefy those pointers as well. While at it move IPv4 and IPv6 pointers up, those are special for obvious reasons. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> # ieee802154 Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: fec: Do proper error checking for enet_out clkUwe Kleine-König
An error code returned by devm_clk_get() might have other meanings than "This clock doesn't exist". So use devm_clk_get_optional() and handle all remaining errors as fatal. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: phy: DP83822: enable rgmii mode if phy_interface_is_rgmiiTommaso Merciai
RGMII mode can be enable from dp83822 straps, and also writing bit 9 of register 0x17 - RMII and Status Register (RCSR). When phy_interface_is_rgmii rgmii mode must be enabled, same for contrary, this prevents malconfigurations of hw straps References: - https://www.ti.com/lit/gpn/dp83822i p66 Signed-off-by: Tommaso Merciai <tommaso.merciai@amarulasolutions.com> Co-developed-by: Michael Trimarchi <michael@amarulasolutions.com> Suggested-by: Alberto Bianchi <alberto.bianchi@amarulasolutions.com> Tested-by: Tommaso Merciai <tommaso.merciai@amarulasolutions.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22net: selftests: Add stress_reuseport_listen to .gitignoreMuhammad Usama Anjum
Add newly added stress_reuseport_listen object to .gitignore file. Fixes: ec8cb4f617a2 ("net: selftests: Stress reuseport listen") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22Merge branch 'rxrpc-misc'David S. Miller
David Howells says: ==================== rxrpc: Miscellaneous changes Here are some miscellaneous changes for AF_RXRPC: (1) Allow the list of local endpoints to be viewed through /proc. (2) Switch to using refcount_t for refcounting. (3) Fix a locking issue found by lockdep. (4) Autogenerate tracing symbol enums from symbol->string maps to make it easier to keep them in sync. (5) Return an error to sendmsg() if a call it tried to set up failed. Because it failed at this point, no notification will be generated for recvmsg to pick up - but userspace still needs to know about the failure. (6) Fix the selection of abort codes generated by internal events. In particular, rxrpc and kafs shouldn't be generating RX_USER_ABORT unless it's because userspace did something to cancel a call. (7) Adjust the interpretation and handling of certain ACK types to try and detect NAT changes causing a call to seem to start mid-flow from a different peer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22afs: Adjust ACK interpretation to try and cope with NATDavid Howells
If a client's address changes, say if it is NAT'd, this can disrupt an in progress operation. For most operations, this is not much of a problem, but StoreData can be different as some servers modify the target file as the data comes in, so if a store request is disrupted, the file can get corrupted on the server. The problem is that the server doesn't recognise packets that come after the change of address as belonging to the original client and will bounce them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if the packet number falls within the initial sequence number window of a call or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting it. In both cases, firstPacket will be 1 and previousPacket will be 0 in the ACK information. Fix this by the following means: (1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1 and previousPacket as 0, assume this indicates that the server saw the incoming packets from a different peer and thus as a different call. Fail the call with error -ENETRESET. (2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs if the first packet has been hard-ACK'd. If it hasn't been hard-ACK'd, the ACK packet will cause it to get retransmitted, so the call will just be repeated. (3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of the operation. (4) Prioritise the error code over things like -ECONNRESET as the server did actually respond. (5) Make writeback treat -ENETRESET as a retryable error and make it redirty all the pages involved in a write so that the VM will retry. Note that there is still a circumstance that I can't easily deal with: if the operation is fully received and processed by the server, but the reply is lost due to address change. There's no way to know if the op happened. We can examine the server, but a conflicting change could have been made by a third party - and we can't tell the difference. In such a case, a message like: kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a) will be logged to dmesg on the next op to touch the file and the client will reset the inode state, including invalidating clean parts of the pagecache. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc, afs: Fix selection of abort codesDavid Howells
The RX_USER_ABORT code should really only be used to indicate that the user of the rxrpc service (ie. userspace) implicitly caused a call to be aborted - for instance if the AF_RXRPC socket is closed whilst the call was in progress. (The user may also explicitly abort a call and specify the abort code to use). Change some of the points of generation to use other abort codes instead: (1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see ENOMEM and EFAULT during received data delivery and abort with RX_CALL_DEAD in the default case. (2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a reply. (3) Abort with RX_CALL_DEAD if we stop hearing from the peer if we had heard from the peer and abort with RX_CALL_TIMEOUT if we hadn't. (4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not completed successfully or been aborted. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Return an error to sendmsg if call failedDavid Howells
If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that was being given data was aborted remotely or otherwise failed, return an error rather than returning the amount of data buffered for transmission. The call (presumably) did not complete, so there's not much point continuing with it. AF_RXRPC considers it "complete" and so will be unwilling to do anything else with it - and won't send a notification for it, deeming the return from sendmsg sufficient. Not returning an error causes afs to incorrectly handle a StoreData operation that gets interrupted by a change of address due to NAT reconfiguration. This doesn't normally affect most operations since their request parameters tend to fit into a single UDP packet and afs_make_call() returns before the server responds; StoreData is different as it involves transmission of a lot of data. This can be triggered on a client by doing something like: dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512 at one prompt, and then changing the network address at another prompt, e.g.: ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0 Tracing packets on an Auristor fileserver looks something like: 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538) 192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 <ARP exchange for 192.168.6.2> 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0) 192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001 192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001 The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the condition and generates an abort first and the unmarshal error is a consequence of that at the application layer. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Automatically generate trace tag enumsDavid Howells
Automatically generate trace tag enums from the symbol -> string mapping tables rather than having the enums as well, thereby reducing duplicated data. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Fix locking issueDavid Howells
There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); <Interrupt> lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22rxrpc: Use refcount_t rather than atomic_tDavid Howells
Move to using refcount_t rather than atomic_t for refcounts in rxrpc. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>