Diffstat (limited to 'Documentation/networking')
22 files changed, 696 insertions, 284 deletions
diff --git a/Documentation/networking/dccp.rst b/Documentation/networking/dccp.rst deleted file mode 100644 index 91e5c33ba3ff..000000000000 --- a/Documentation/networking/dccp.rst +++ /dev/null @@ -1,219 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -============= -DCCP protocol -============= - - -.. Contents - - Introduction - - Missing features - - Socket options - - Sysctl variables - - IOCTLs - - Other tunables - - Notes - - -Introduction -============ -Datagram Congestion Control Protocol (DCCP) is an unreliable, connection -oriented protocol designed to solve issues present in UDP and TCP, particularly -for real-time and multimedia (streaming) traffic. -It divides into a base protocol (RFC 4340) and pluggable congestion control -modules called CCIDs. Like pluggable TCP congestion control, at least one CCID -needs to be enabled in order for the protocol to function properly. In the Linux -implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as -the TCP-friendly CCID3 (RFC 4342), are optional. -For a brief introduction to CCIDs and suggestions for choosing a CCID to match -given applications, see section 10 of RFC 4340. - -It has a base protocol and pluggable congestion control IDs (CCIDs). - -DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol -is at http://www.ietf.org/html.charters/dccp-charter.html - - -Missing features -================ -The Linux DCCP implementation does not currently support all the features that are -specified in RFCs 4340...42. 
- -The known bugs are at: - - http://www.linuxfoundation.org/collaborate/workgroups/networking/todo#DCCP - -For more up-to-date versions of the DCCP implementation, please consider using -the experimental DCCP test tree; instructions for checking this out are on: -http://www.linuxfoundation.org/collaborate/workgroups/networking/dccp_testing#Experimental_DCCP_source_tree - - -Socket options -============== -DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes -a policy ID as argument and can only be set before the connection (i.e. changes -during an established connection are not supported). Currently, two policies are -defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special, -and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an -u32 priority value as ancillary data to sendmsg(), where higher numbers indicate -a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to -be formatted using a cmsg(3) message header filled in as follows:: - - cmsg->cmsg_level = SOL_DCCP; - cmsg->cmsg_type = DCCP_SCM_PRIORITY; - cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */ - -DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero -value is always interpreted as unbounded queue length. If different from zero, -the interpretation of this parameter depends on the current dequeuing policy -(see above): the "simple" policy will enforce a fixed queue size by returning -EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the -lowest-priority packet first. The default value for this parameter is -initialised from /proc/sys/net/dccp/default/tx_qlen. - -DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of -service codes (RFC 4340, sec. 8.1.2); if this socket option is not set, -the socket will fall back to 0 (which means that no meaningful service code -is present). 
On active sockets this is set before connect(); specifying more -than one code has no effect (all subsequent service codes are ignored). The -case is different for passive sockets, where multiple service codes (up to 32) -can be set before calling bind(). - -DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet -size (application payload size) in bytes, see RFC 4340, section 14. - -DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs -supported by the endpoint. The option value is an array of type uint8_t whose -size is passed as option length. The minimum array size is 4 elements, the -value returned in the optlen argument always reflects the true number of -built-in CCIDs. - -DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same -time, combining the operation of the next two socket options. This option is -preferable over the latter two, since often applications will use the same -type of CCID for both directions; and mixed use of CCIDs is not currently well -understood. This socket option takes as argument at least one uint8_t value, or -an array of uint8_t values, which must match available CCIDS (see above). CCIDs -must be registered on the socket before calling connect() or listen(). - -DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets -the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID. -Please note that the getsockopt argument type here is ``int``, not uint8_t. - -DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID. - -DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold -timewait state when closing the connection (RFC 4340, 8.3). The usual case is -that the closing server sends a CloseReq, whereupon the client holds timewait -state. When this boolean socket option is on, the server sends a Close instead -and will enter TIMEWAIT. This option must be set after accept() returns. 
- -DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the -partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums -always cover the entire packet and that only fully covered application data is -accepted by the receiver. Hence, when using this feature on the sender, it must -be enabled at the receiver, too with suitable choice of CsCov. - -DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the - range 0..15 are acceptable. The default setting is 0 (full coverage), - values between 1..15 indicate partial coverage. - -DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it - sets a threshold, where again values 0..15 are acceptable. The default - of 0 means that all packets with a partial coverage will be discarded. - Values in the range 1..15 indicate that packets with minimally such a - coverage value are also acceptable. The higher the number, the more - restrictive this setting (see [RFC 4340, sec. 9.2.1]). Partial coverage - settings are inherited to the child socket after accept(). - -The following two options apply to CCID 3 exclusively and are getsockopt()-only. -In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned. - -DCCP_SOCKOPT_CCID_RX_INFO - Returns a ``struct tfrc_rx_info`` in optval; the buffer for optval and - optlen must be set to at least sizeof(struct tfrc_rx_info). - -DCCP_SOCKOPT_CCID_TX_INFO - Returns a ``struct tfrc_tx_info`` in optval; the buffer for optval and - optlen must be set to at least sizeof(struct tfrc_tx_info). - -On unidirectional connections it is useful to close the unused half-connection -via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs. 
- - -Sysctl variables -================ -Several DCCP default parameters can be managed by the following sysctls -(sysctl net.dccp.default or /proc/sys/net/dccp/default): - -request_retries - The number of active connection initiation retries (the number of - Requests minus one) before timing out. In addition, it also governs - the behaviour of the other, passive side: this variable also sets - the number of times DCCP repeats sending a Response when the initial - handshake does not progress from RESPOND to OPEN (i.e. when no Ack - is received after the initial Request). This value should be greater - than 0, suggested is less than 10. Analogue of tcp_syn_retries. - -retries1 - How often a DCCP Response is retransmitted until the listening DCCP - side considers its connecting peer dead. Analogue of tcp_retries1. - -retries2 - The number of times a general DCCP packet is retransmitted. This has - importance for retransmitted acknowledgments and feature negotiation, - data packets are never retransmitted. Analogue of tcp_retries2. - -tx_ccid = 2 - Default CCID for the sender-receiver half-connection. Depending on the - choice of CCID, the Send Ack Vector feature is enabled automatically. - -rx_ccid = 2 - Default CCID for the receiver-sender half-connection; see tx_ccid. - -seq_window = 100 - The initial sequence window (sec. 7.5.2) of the sender. This influences - the local ackno validity and the remote seqno validity windows (7.5.1). - Values in the range Wmin = 32 (RFC 4340, 7.5.2) up to 2^32-1 can be set. - -tx_qlen = 5 - The size of the transmit buffer in packets. A value of 0 corresponds - to an unbounded transmit buffer. - -sync_ratelimit = 125 ms - The timeout between subsequent DCCP-Sync packets sent in response to - sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit - of this parameter is milliseconds; a value of 0 disables rate-limiting. 
- - -IOCTLS -====== -FIONREAD - Works as in udp(7): returns in the ``int`` argument pointer the size of - the next pending datagram in bytes, or 0 when no datagram is pending. - -SIOCOUTQ - Returns the number of unsent data bytes in the socket send queue as ``int`` - into the buffer specified by the argument pointer. - -Other tunables -============== -Per-route rto_min support - CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value - of the RTO timer. This setting can be modified via the 'rto_min' option - of iproute2; for example:: - - > ip route change 10.0.0.0/24 rto_min 250j dev wlan0 - > ip route add 10.0.0.254/32 rto_min 800j dev wlan0 - > ip route show dev wlan0 - - CCID-3 also supports the rto_min setting: it is used to define the lower - bound for the expiry of the nofeedback timer. This can be useful on LANs - with very low RTTs (e.g., loopback, Gbit ethernet). - - -Notes -===== -DCCP does not travel through NAT successfully at present on many boxes. This is -because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT -support for DCCP has been added. diff --git a/Documentation/networking/device_drivers/ethernet/huawei/hinic3.rst b/Documentation/networking/device_drivers/ethernet/huawei/hinic3.rst new file mode 100644 index 000000000000..e3dfd083fa52 --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/huawei/hinic3.rst @@ -0,0 +1,137 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================================================== +Linux kernel driver for Huawei Ethernet Device Driver (hinic3) family +===================================================================== + +Overview +======== + +The hinic3 is a network interface card (NIC) for Data Center. It supports +a range of link-speed devices (10GE, 25GE, 100GE, etc.). The hinic3 +devices can have multiple physical forms: LOM (Lan on Motherboard) NIC, +PCIe standard NIC, OCP (Open Compute Project) NIC, etc. 
+ +The hinic3 driver supports the following features: +- IPv4/IPv6 TCP/UDP checksum offload +- TSO (TCP Segmentation Offload), LRO (Large Receive Offload) +- RSS (Receive Side Scaling) +- MSI-X interrupt aggregation configuration and interrupt adaptation. +- SR-IOV (Single Root I/O Virtualization). + +Content +======= + +- Supported PCI vendor ID/device IDs +- Source Code Structure of Hinic3 Driver +- Management Interface + +Supported PCI vendor ID/device IDs +================================== + +19e5:0222 - hinic3 PF/PPF +19e5:375F - hinic3 VF + +Prime Physical Function (PPF) is responsible for the management of the +whole NIC card. For example, clock synchronization between the NIC and +the host. Any PF may serve as a PPF. The PPF is selected dynamically. + +Source Code Structure of Hinic3 Driver +====================================== + +======================== ================================================ +hinic3_pci_id_tbl.h Supported device IDs +hinic3_hw_intf.h Interface between HW and driver +hinic3_queue_common.[ch] Common structures and methods for NIC queues +hinic3_common.[ch] Encapsulation of memory operations in Linux +hinic3_csr.h Register definitions in the BAR +hinic3_hwif.[ch] Interface for BAR +hinic3_eqs.[ch] Interface for AEQs and CEQs +hinic3_mbox.[ch] Interface for mailbox +hinic3_mgmt.[ch] Management interface based on mailbox and AEQ +hinic3_wq.[ch] Work queue data structures and interface +hinic3_cmdq.[ch] Command queue is used to post command to HW +hinic3_hwdev.[ch] HW structures and methods abstractions +hinic3_lld.[ch] Auxiliary driver adaptation layer +hinic3_hw_comm.[ch] Interface for common HW operations +hinic3_mgmt_interface.h Interface between firmware and driver +hinic3_hw_cfg.[ch] Interface for HW configuration +hinic3_irq.c Interrupt request +hinic3_netdev_ops.c Operations registered to Linux kernel stack +hinic3_nic_dev.h NIC structures and methods abstractions +hinic3_main.c Main Linux kernel driver +hinic3_nic_cfg.[ch] 
NIC service configuration +hinic3_nic_io.[ch] Management plane interface for TX and RX +hinic3_rss.[ch] Interface for Receive Side Scaling (RSS) +hinic3_rx.[ch] Interface for receive +hinic3_tx.[ch] Interface for transmit +hinic3_ethtool.c Interface for ethtool operations (ops) +hinic3_filter.c Interface for MAC address +======================== ================================================ + +Management Interface +==================== + +Asynchronous Event Queue (AEQ) +------------------------------ + +AEQ receives high priority events from the HW over a descriptor queue. +Every descriptor is a fixed size of 64 bytes. AEQ can receive solicited or +unsolicited events. Every device, VF or PF, can have up to 4 AEQs. +Every AEQ is associated with a dedicated IRQ. AEQ can receive multiple types +of events, but in practice the hinic3 driver ignores all events except for +2 mailbox related events. + +Mailbox +------- + +Mailbox is a communication mechanism between the hinic3 driver and the HW. +Each device has an independent mailbox. Driver can use the mailbox to send +requests to management. Driver receives mailbox messages, such as responses +to requests, over the AEQ (using event HINIC3_AEQ_FOR_MBOX). Due to the +limited size of the mailbox data register, mailbox messages are sent +segment-by-segment. + +Every device can use its mailbox to post requests to firmware. The mailbox +can also be used to post requests and responses between the PF and its VFs. + +Completion Event Queue (CEQ) +---------------------------- + +The implementation of CEQ is the same as AEQ. It receives completion events +from HW over a fixed size descriptor of 32 bits. Every device can have up +to 32 CEQs. Every CEQ has a dedicated IRQ. CEQ only receives solicited +events that are responses to requests from the driver. 
CEQ can receive +multiple types of events, but in practice the hinic3 driver ignores all +events except for HINIC3_CMDQ that represents completion of previously +posted commands on a cmdq. + +Command Queue (cmdq) +-------------------- + +Every cmdq has a dedicated work queue on which commands are posted. +Commands on the work queue are fixed-size descriptors of 64 bytes. +Completion of a command will be indicated using ctrl bits in the +descriptor that carried the command. Notification of command completions +will also be provided via event on CEQ. Every device has 4 command queues +that are initialized as a set (called cmdqs), each with its own type. +Hinic3 driver only uses type HINIC3_CMDQ_SYNC. + +Work Queues(WQ) +--------------- + +Work queues are logical arrays of fixed size WQEs. The array may be spread +over multiple non-contiguous pages using an indirection table. Work queues are +used by I/O queues and command queues. + +Global function ID +------------------ + +Every function, PF or VF, has a unique ordinal identification within the device. +Many management commands (mbox or cmdq) contain this ID so HW can apply the +command effect to the right function. + +PF is allowed to post management commands to a subordinate VF by specifying the +VF's ID. A VF must provide its own ID. Anti-spoofing in the HW will cause +a command from a VF to fail if it contains the wrong ID. 
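The work queue layout described in the hinic3 document (a logical array of fixed-size WQEs spread over non-contiguous pages through an indirection table) can be sketched as follows. This is an illustration only, not the hinic3 implementation: the names `demo_wq` and `demo_wq_get_wqe`, the 4096-byte page size, and the 64-byte WQE size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes chosen for illustration only. */
#define DEMO_PAGE_SIZE     4096u
#define DEMO_WQE_SIZE      64u
#define DEMO_WQES_PER_PAGE (DEMO_PAGE_SIZE / DEMO_WQE_SIZE)

/* A logical array of WQEs backed by pages that need not be contiguous. */
struct demo_wq {
	uint8_t **pages;    /* indirection table: one pointer per page */
	uint32_t num_pages; /* number of entries in the table */
};

/* Translate a logical WQE index into the address of that WQE. */
static void *demo_wq_get_wqe(const struct demo_wq *wq, uint32_t idx)
{
	uint32_t page = idx / DEMO_WQES_PER_PAGE; /* which table entry */
	uint32_t slot = idx % DEMO_WQES_PER_PAGE; /* which WQE in that page */

	assert(page < wq->num_pages);
	return wq->pages[page] + (size_t)slot * DEMO_WQE_SIZE;
}
```

With these assumed sizes, logical indices 0..63 resolve into the first table entry and index 64 resolves to the start of the second, wherever that page happens to live in memory.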
+ diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 05d822b904b4..139b4c75a191 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -28,6 +28,7 @@ Contents: freescale/gianfar google/gve huawei/hinic + huawei/hinic3 intel/e100 intel/e1000 intel/e1000e @@ -55,6 +56,7 @@ Contents: ti/cpsw_switchdev ti/am65_nuss_cpsw_switchdev ti/tlan + ti/icssg_prueth wangxun/txgbe wangxun/ngbe diff --git a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst index 04e0595bb0a7..f8592dec8851 100644 --- a/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst +++ b/Documentation/networking/device_drivers/ethernet/meta/fbnic.rst @@ -28,9 +28,60 @@ devlink dev info provides version information for all three components. In addition to the version the hg commit hash of the build is included as a separate entry. +Upgrading Firmware +------------------ + +fbnic supports updating firmware using signed PLDM images with devlink dev +flash. PLDM images are written into the flash. Flashing does not interrupt +the operation of the device. + +On host boot the latest UEFI driver is always used, no explicit activation +is required. Firmware activation is required to run new control firmware. cmrt +firmware can only be activated by power cycling the NIC. 
+ Statistics ---------- +TX MAC Interface +~~~~~~~~~~~~~~~~ + + - ``ptp_illegal_req``: packets sent to the NIC with PTP request bit set but routed to BMC/FW + - ``ptp_good_ts``: packets successfully routed to MAC with PTP request bit set + - ``ptp_bad_ts``: packets destined for MAC with PTP request bit set but aborted because of some error (e.g., DMA read error) + +TX Extension (TEI) Interface (TTI) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + - ``tti_cm_drop``: control messages dropped at the TX Extension (TEI) Interface because of credit starvation + - ``tti_frame_drop``: packets dropped at the TX Extension (TEI) Interface because of credit starvation + - ``tti_tbi_drop``: packets dropped at the TX BMC Interface (TBI) because of credit starvation + +RXB (RX Buffer) Enqueue +~~~~~~~~~~~~~~~~~~~~~~~ + + - ``rxb_integrity_err[i]``: frames enqueued with integrity errors (e.g., multi-bit ECC errors) on RXB input i + - ``rxb_mac_err[i]``: frames enqueued with MAC end-of-frame errors (e.g., bad FCS) on RXB input i + - ``rxb_parser_err[i]``: frames experienced RPC parser errors + - ``rxb_frm_err[i]``: frames experienced signaling errors (e.g., missing end-of-packet/start-of-packet) on RXB input i + - ``rxb_drbo[i]_frames``: frames received at RXB input i + - ``rxb_drbo[i]_bytes``: bytes received at RXB input i + +RXB (RX Buffer) FIFO +~~~~~~~~~~~~~~~~~~~~ + + - ``rxb_fifo[i]_drop``: transitions into the drop state on RXB pool i + - ``rxb_fifo[i]_dropped_frames``: frames dropped on RXB pool i + - ``rxb_fifo[i]_ecn``: transitions into the ECN mark state on RXB pool i + - ``rxb_fifo[i]_level``: current occupancy of RXB pool i + +RXB (RX Buffer) Dequeue +~~~~~~~~~~~~~~~~~~~~~~~ + + - ``rxb_intf[i]_frames``: frames sent to the output i + - ``rxb_intf[i]_bytes``: bytes sent to the output i + - ``rxb_pbuf[i]_frames``: frames sent to output i from the perspective of internal packet buffer + - ``rxb_pbuf[i]_bytes``: bytes sent to output i from the perspective of internal packet buffer 
+ RPC (Rx parser) ~~~~~~~~~~~~~~~ @@ -44,6 +95,15 @@ RPC (Rx parser) - ``rpc_out_of_hdr_err``: frames where header was larger than parsable region - ``ovr_size_err``: oversized frames +Hardware Queues +~~~~~~~~~~~~~~~ + +1. RX DMA Engine: + + - ``rde_[i]_pkt_err``: packets with MAC EOP, RPC parser, RXB truncation, or RDE frame truncation errors. These errors are flagged in the packet metadata because of cut-through support but the actual drop happens once PCIE/RDE is reached. + - ``rde_[i]_pkt_cq_drop``: packets dropped because RCQ is full + - ``rde_[i]_pkt_bdq_drop``: packets dropped because HPQ or PPQ ran out of host buffer + PCIe ~~~~ diff --git a/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst b/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst new file mode 100644 index 000000000000..da21ddf431bb --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/ti/icssg_prueth.rst @@ -0,0 +1,56 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================== +Texas Instruments ICSSG PRUETH ethernet driver +============================================== + +:Version: 1.0 + +ICSSG Firmware +============== + +Every ICSSG core has two Programmable Real-Time Units (PRUs), two auxiliary +Real-Time Transfer Units (RTUs), and two Transmit Real-Time Transfer Units +(TX_PRUs). Each one of these runs its own firmware. The firmwares combined are +referred to as ICSSG Firmware. + +Firmware Statistics +=================== + +The ICSSG firmware maintains certain statistics which are dumped by the driver +via ``ethtool -S <interface>`` + +These statistics are as follows, + + - ``FW_RTU_PKT_DROP``: Diagnostic error counter which increments when RTU drops a locally injected packet due to port being disabled or rule violation. 
+ - ``FW_Q0_OVERFLOW``: TX overflow counter for queue0 + - ``FW_Q1_OVERFLOW``: TX overflow counter for queue1 + - ``FW_Q2_OVERFLOW``: TX overflow counter for queue2 + - ``FW_Q3_OVERFLOW``: TX overflow counter for queue3 + - ``FW_Q4_OVERFLOW``: TX overflow counter for queue4 + - ``FW_Q5_OVERFLOW``: TX overflow counter for queue5 + - ``FW_Q6_OVERFLOW``: TX overflow counter for queue6 + - ``FW_Q7_OVERFLOW``: TX overflow counter for queue7 + - ``FW_DROPPED_PKT``: This counter is incremented when a packet is dropped at PRU because of rule violation. + - ``FW_RX_ERROR``: Incremented if there was a CRC error or Min/Max frame error at PRU + - ``FW_RX_DS_INVALID``: Incremented when RTU detects Data Status invalid condition + - ``FW_TX_DROPPED_PACKET``: Counter for packets dropped via TX Port + - ``FW_TX_TS_DROPPED_PACKET``: Counter for packets with TS flag dropped via TX Port + - ``FW_INF_PORT_DISABLED``: Incremented when RX frame is dropped due to port being disabled + - ``FW_INF_SAV``: Incremented when RX frame is dropped due to Source Address violation + - ``FW_INF_SA_DL``: Incremented when RX frame is dropped due to Source Address being in the denylist + - ``FW_INF_PORT_BLOCKED``: Incremented when RX frame is dropped due to port being blocked and frame being a special frame + - ``FW_INF_DROP_TAGGED`` : Incremented when RX frame is dropped for being tagged + - ``FW_INF_DROP_PRIOTAGGED``: Incremented when RX frame is dropped for being priority tagged + - ``FW_INF_DROP_NOTAG``: Incremented when RX frame is dropped for being untagged + - ``FW_INF_DROP_NOTMEMBER``: Incremented when RX frame is dropped for port not being member of VLAN + - ``FW_RX_EOF_SHORT_FRMERR``: Incremented if End Of Frame (EOF) task is scheduled without seeing RX_B1 + - ``FW_RX_B0_DROP_EARLY_EOF``: Incremented when frame is dropped due to Early EOF + - ``FW_TX_JUMBO_FRM_CUTOFF``: Incremented when frame is cut off to prevent packet size > 2000 Bytes + - ``FW_RX_EXP_FRAG_Q_DROP``: Incremented when express 
frame is received in the same queue as the previous fragment + - ``FW_RX_FIFO_OVERRUN``: RX fifo overrun counter + - ``FW_CUT_THR_PKT``: Incremented when a packet is forwarded using Cut-Through forwarding method + - ``FW_HOST_RX_PKT_CNT``: Number of valid packets sent by Rx PRU to Host on PSI + - ``FW_HOST_TX_PKT_CNT``: Number of valid packets copied by RTU0 to Tx queues + - ``FW_HOST_EGRESS_Q_PRE_OVERFLOW``: Host Egress Q (Pre-emptible) Overflow Counter + - ``FW_HOST_EGRESS_Q_EXP_OVERFLOW``: Host Egress Q (Express) Overflow Counter diff --git a/Documentation/networking/devlink/devlink-info.rst b/Documentation/networking/devlink/devlink-info.rst index 23073bc219d8..dd6adc4d0559 100644 --- a/Documentation/networking/devlink/devlink-info.rst +++ b/Documentation/networking/devlink/devlink-info.rst @@ -86,6 +86,10 @@ In case software/firmware components are loaded from the disk (e.g. ``/lib/firmware``) only the running version should be reported via the kernel API. +Please note that any security versions reported via devlink are purely +informational. Devlink does not use a secure channel to communicate with +the device. + Generic Versions ================ diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 2c14dfe69b3a..5885e21e2212 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -451,7 +451,7 @@ be added to the following table: * - ``udp_parsing`` - ``drop`` - Traps packets dropped due to an error in the UDP header parsing. - This packet trap could include checksum errorrs, an improper UDP + This packet trap could include checksum errors, an improper UDP length detected (smaller than 8 bytes) or detection of header truncation. 
* - ``tcp_parsing`` diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst index 948c8c44e233..8319f43b5933 100644 --- a/Documentation/networking/devlink/index.rst +++ b/Documentation/networking/devlink/index.rst @@ -84,6 +84,7 @@ parameters, info versions, and other features it supports. i40e ionic ice + ixgbe mlx4 mlx5 mlxsw diff --git a/Documentation/networking/devlink/ixgbe.rst b/Documentation/networking/devlink/ixgbe.rst new file mode 100644 index 000000000000..c27d1436c70e --- /dev/null +++ b/Documentation/networking/devlink/ixgbe.rst @@ -0,0 +1,171 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +ixgbe devlink support +===================== + +This document describes the devlink features implemented by the ``ixgbe`` +device driver. + +Info versions +============= + +Any of the versions dealing with the security presented by ``devlink-info`` +is purely informational. Devlink does not use a secure channel to communicate +with the device. + +The ``ixgbe`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 5 90 + + * - Name + - Type + - Example + - Description + * - ``board.id`` + - fixed + - H49289-000 + - The Product Board Assembly (PBA) identifier of the board. + * - ``fw.undi`` + - running + - 1.1937.0 + - Version of the Option ROM containing the UEFI driver. The version is + reported in ``major.minor.patch`` format. The major version is + incremented whenever a major breaking change occurs, or when the + minor version would overflow. The minor version is incremented for + non-breaking changes and reset to 1 when the major version is + incremented. The patch version is normally 0 but is incremented when + a fix is delivered as a patch against an older base Option ROM. + * - ``fw.undi.srev`` + - running + - 4 + - Number indicating the security revision of the Option ROM. 
+ * - ``fw.bundle_id`` + - running + - 0x80000d0d + - Unique identifier of the firmware image file that was loaded onto + the device. Also referred to as the EETRACK identifier of the NVM. + * - ``fw.mgmt.api`` + - running + - 1.5.1 + - 3-digit version number (major.minor.patch) of the API exported over + the AdminQ by the management firmware. Used by the driver to + identify what commands are supported. Historical versions of the + kernel only displayed a 2-digit version number (major.minor). + * - ``fw.mgmt.build`` + - running + - 0x305d955f + - Unique identifier of the source for the management firmware. + * - ``fw.mgmt.srev`` + - running + - 3 + - Number indicating the security revision of the firmware. + * - ``fw.psid.api`` + - running + - 0.80 + - Version defining the format of the flash contents. + * - ``fw.netlist`` + - running + - 1.1.2000-6.7.0 + - The version of the netlist module. This module defines the device's + Ethernet capabilities and default settings, and is used by the + management firmware as part of managing link and device + connectivity. + * - ``fw.netlist.build`` + - running + - 0xee16ced7 + - The first 4 bytes of the hash of the netlist module contents. + +Flash Update +============ + +The ``ixgbe`` driver implements support for flash update using the +``devlink-flash`` interface. It supports updating the device flash using a +combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and +``fw.netlist`` components. + +.. list-table:: List of supported overwrite modes + :widths: 5 95 + + * - Bits + - Behavior + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` + - Do not preserve settings stored in the flash components being + updated. This includes overwriting the port configuration that + determines the number of physical functions the device will + initialize with. + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` + - Do not preserve either settings or identifiers. 
Overwrite everything + in the flash with the contents from the provided image, without + performing any preservation. This includes overwriting device + identifying fields such as the MAC address, Vital Product Data (VPD) area, + and device serial number. It is expected that this combination be used with an + image customized for the specific device. + +Reload +====== + +The ``ixgbe`` driver supports activating new firmware after a flash update +using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE`` +action. + +.. code:: shell + + $ devlink dev reload pci/0000:01:00.0 action fw_activate + +The new firmware is activated by issuing a device-specific Embedded +Management Processor reset which requests the device to reset and reload the +EMP firmware image. + +The driver does not currently support reloading the driver via +``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``. + +Regions +======= + +The ``ixgbe`` driver implements the following regions for accessing internal +device data. + +.. list-table:: regions implemented + :widths: 15 85 + + * - Name + - Description + * - ``nvm-flash`` + - The contents of the entire flash chip, sometimes referred to as + the device's Non Volatile Memory. + * - ``shadow-ram`` + - The contents of the Shadow RAM, which is loaded from the beginning + of the flash. Although the contents are primarily from the flash, + this area also contains data generated during device boot which is + not stored in flash. + * - ``device-caps`` + - The contents of the device firmware's capabilities buffer. Useful to + determine the current state and configuration of the device. + +Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a +snapshot. The ``device-caps`` region requires a snapshot as the contents are +sent by firmware and can't be split into separate reads. + +Users can request an immediate capture of a snapshot for all three regions +via the ``DEVLINK_CMD_REGION_NEW`` command. + +.. 
code:: shell + + $ devlink region show + pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1 + pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10 + + $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1 + + $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc + 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 + + $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + + $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1 diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst index eb678ca45496..a6cd7236bfbd 100644 --- a/Documentation/networking/devmem.rst +++ b/Documentation/networking/devmem.rst @@ -62,15 +62,15 @@ More Info https://lore.kernel.org/netdev/20240831004313.3713467-1-almasrymina@google.com/ -Interface -========= +RX Interface +============ Example ------- -tools/testing/selftests/net/ncdevmem.c:do_server shows an example of setting up -the RX path of this API. +./tools/testing/selftests/drivers/net/hw/ncdevmem:do_server shows an example of +setting up the RX path of this API. NIC Setup @@ -235,6 +235,148 @@ can be less than the tokens provided by the user in case of: (a) an internal kernel leak bug. (b) the user passed more than 1024 frags. +TX Interface +============ + + +Example +------- + +./tools/testing/selftests/drivers/net/hw/ncdevmem:do_client shows an example of +setting up the TX path of this API. 
+ + +NIC Setup +--------- + +The user must bind a TX dmabuf to a given NIC using the netlink API:: + + struct netdev_bind_tx_req *req = NULL; + struct netdev_bind_tx_rsp *rsp = NULL; + struct ynl_error yerr; + + *ys = ynl_sock_create(&ynl_netdev_family, &yerr); + + req = netdev_bind_tx_req_alloc(); + netdev_bind_tx_req_set_ifindex(req, ifindex); + netdev_bind_tx_req_set_fd(req, dmabuf_fd); + + rsp = netdev_bind_tx(*ys, req); + + tx_dmabuf_id = rsp->id; + + +The netlink API returns a dmabuf_id: a unique ID that refers to this dmabuf +that has been bound. + +The user can unbind the dmabuf from the netdevice by closing the netlink socket +that established the binding. We do this so that the binding is automatically +unbound even if the userspace process crashes. + +Note that any reasonably well-behaved dmabuf from any exporter should work with +devmem TCP, even if the dmabuf is not actually backed by devmem. An example of +this is udmabuf, which wraps user memory (non-devmem) in a dmabuf. + +Socket Setup +------------ + +The user application must use MSG_ZEROCOPY flag when sending devmem TCP. Devmem +cannot be copied by the kernel, so the semantics of the devmem TX are similar +to the semantics of MSG_ZEROCOPY:: + + setsockopt(socket_fd, SOL_SOCKET, SO_ZEROCOPY, &opt, sizeof(opt)); + +It is also recommended that the user binds the TX socket to the same interface +the dma-buf has been bound to via SO_BINDTODEVICE:: + + setsockopt(socket_fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname) + 1); + + +Sending data +------------ + +Devmem data is sent using the SCM_DEVMEM_DMABUF cmsg. + +The user should create a msghdr where, + +* iov_base is set to the offset into the dmabuf to start sending from +* iov_len is set to the number of bytes to be sent from the dmabuf + +The user passes the dma-buf id to send from via the dmabuf_tx_cmsg.dmabuf_id. + +The example below sends 1024 bytes from offset 100 into the dmabuf, and 2048 +from offset 2000 into the dmabuf. 
The dmabuf to send from is tx_dmabuf_id:: + + char ctrl_data[CMSG_SPACE(sizeof(struct dmabuf_tx_cmsg))]; + struct dmabuf_tx_cmsg ddmabuf; + struct msghdr msg = {}; + struct cmsghdr *cmsg; + struct iovec iov[2]; + + iov[0].iov_base = (void*)100; + iov[0].iov_len = 1024; + iov[1].iov_base = (void*)2000; + iov[1].iov_len = 2048; + + msg.msg_iov = iov; + msg.msg_iovlen = 2; + + msg.msg_control = ctrl_data; + msg.msg_controllen = sizeof(ctrl_data); + + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_DEVMEM_DMABUF; + cmsg->cmsg_len = CMSG_LEN(sizeof(struct dmabuf_tx_cmsg)); + + ddmabuf.dmabuf_id = tx_dmabuf_id; + + *((struct dmabuf_tx_cmsg *)CMSG_DATA(cmsg)) = ddmabuf; + + sendmsg(socket_fd, &msg, MSG_ZEROCOPY); + + +Reusing TX dmabufs +------------------ + +Similar to MSG_ZEROCOPY with regular memory, the user should not modify the +contents of the dma-buf while a send operation is in progress. This is because +the kernel does not keep a copy of the dmabuf contents. Instead, the kernel +will pin and send data from the buffer available to the userspace. + +Just as in MSG_ZEROCOPY, the kernel notifies the userspace of send completions +using MSG_ERRQUEUE:: + + int64_t tstop = gettimeofday_ms() + waittime_ms; + char control[CMSG_SPACE(100)] = {}; + struct sock_extended_err *serr; + struct msghdr msg = {}; + struct cmsghdr *cm; + int retries = 10; + __u32 hi, lo; + + msg.msg_control = control; + msg.msg_controllen = sizeof(control); + + while (gettimeofday_ms() < tstop) { + if (!do_poll(fd)) continue; + + ret = recvmsg(fd, &msg, MSG_ERRQUEUE); + + for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) { + serr = (void *)CMSG_DATA(cm); + + hi = serr->ee_data; + lo = serr->ee_info; + + fprintf(stdout, "tx complete [%d,%d]\n", lo, hi); + } + } + +After the associated sendmsg has been completed, the dmabuf can be reused by +the userspace. 
+ + Implementation & Caveats ======================== diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index c64133d309bf..ac90b82f3ce9 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -48,7 +48,6 @@ Contents: ax25 bonding cdc_mbim - dccp dctcp devmem dns_resolver diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 5c63ab928b97..0f1251cce314 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -37,8 +37,8 @@ ip_no_pmtu_disc - INTEGER Mode 3 is a hardened pmtu discover mode. The kernel will only accept fragmentation-needed errors if the underlying protocol can verify them besides a plain socket lookup. Current - protocols for which pmtu events will be honored are TCP, SCTP - and DCCP as they verify e.g. the sequence number or the + protocols for which pmtu events will be honored are TCP and + SCTP as they verify e.g. the sequence number or the association. This mode should not be enabled globally but is only intended to secure e.g. name servers in namespaces where TCP path mtu must still work but path MTU information of other @@ -735,7 +735,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables automatic tuning of that socket's receive buffer size, in which case this value is ignored. - Default: between 131072 and 6MB, depending on RAM size. + Default: between 131072 and 32MB, depending on RAM size. tcp_sack - BOOLEAN Enable select acknowledgments (SACKS). @@ -1099,7 +1099,7 @@ tcp_limit_output_bytes - INTEGER limits the number of bytes on qdisc or device to reduce artificial RTT/cwnd and reduce bufferbloat. 
- Default: 1048576 (16 * 65536) + Default: 4194304 (4 MB) tcp_challenge_ack_limit - INTEGER Limits number of Challenge ACK sent per second, as recommended diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst index 6327e689e8a8..c69cc89c958e 100644 --- a/Documentation/networking/net_cachelines/net_device.rst +++ b/Documentation/networking/net_cachelines/net_device.rst @@ -10,6 +10,7 @@ Type Name fastpath_tx_acce =================================== =========================== =================== =================== =================================================================================== unsigned_long:32 priv_flags read_mostly __dev_queue_xmit(tx) unsigned_long:1 lltx read_mostly HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx) +unsigned long:1 netmem_tx:1; read_mostly char name[16] struct netdev_name_node* name_node struct dev_ifalias* ifalias @@ -131,7 +132,7 @@ struct ref_tracker_dir refcnt_tracker struct list_head link_watch_list enum:8 reg_state bool dismantle -enum:16 rtnl_link_state +bool rtnl_link_initilizing bool needs_free_netdev void*priv_destructor struct net_device struct netpoll_info* npinfo read_mostly napi_poll/napi_poll_lock diff --git a/Documentation/networking/net_cachelines/snmp.rst b/Documentation/networking/net_cachelines/snmp.rst index bc96efc92cf5..bd44b3eebbef 100644 --- a/Documentation/networking/net_cachelines/snmp.rst +++ b/Documentation/networking/net_cachelines/snmp.rst @@ -37,6 +37,8 @@ unsigned_long LINUX_MIB_TIMEWAITKILLED unsigned_long LINUX_MIB_PAWSACTIVEREJECTED unsigned_long LINUX_MIB_PAWSESTABREJECTED unsigned_long LINUX_MIB_TSECR_REJECTED +unsigned_long LINUX_MIB_PAWS_OLD_ACK +unsigned_long LINUX_MIB_PAWS_TW_REJECTED unsigned_long LINUX_MIB_DELAYEDACKLOST unsigned_long LINUX_MIB_LISTENOVERFLOWS unsigned_long LINUX_MIB_LISTENDROPS diff --git a/Documentation/networking/netdev-features.rst b/Documentation/networking/netdev-features.rst index 
5014f7cc1398..02bd7536fc0c 100644 --- a/Documentation/networking/netdev-features.rst +++ b/Documentation/networking/netdev-features.rst @@ -188,3 +188,8 @@ Redundancy) frames from one port to another in hardware. This should be set for devices which duplicate outgoing HSR (High-availability Seamless Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically frames in hardware. + +* netmem-tx + +This should be set for devices which support netmem TX. See +Documentation/networking/netmem.rst diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index eab601ab2db0..7ebb6c36482d 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -8,7 +8,7 @@ Network Devices, the Kernel, and You! Introduction ============ The following is a random collection of documentation regarding -network devices. +network devices. It is intended for driver developers. struct net_device lifetime rules ================================ @@ -314,13 +314,8 @@ napi->poll: softirq will be called with interrupts disabled by netconsole. -struct netdev_queue_mgmt_ops synchronization rules -================================================== - -All queue management ndo callbacks are holding netdev instance lock. - -RTNL and netdev instance lock -============================= +netdev instance lock +==================== Historically, all networking control operations were protected by a single global lock known as ``rtnl_lock``. There is an ongoing effort to replace this @@ -328,10 +323,13 @@ global lock with separate locks for each network namespace. Additionally, properties of individual netdev are increasingly protected by per-netdev locks. For device drivers that implement shaping or queue management APIs, all control -operations will be performed under the netdev instance lock. Currently, this -instance lock is acquired within the context of ``rtnl_lock``. 
The drivers -can also explicitly request instance lock to be acquired via -``request_ops_lock``. In the future, there will be an option for individual +operations will be performed under the netdev instance lock. +Drivers can also explicitly request instance lock to be held during ops +by setting ``request_ops_lock`` to true. Code comments and docs refer +to drivers which have ops called under the instance lock as "ops locked". +See also the documentation of the ``lock`` member of struct net_device. + +In the future, there will be an option for individual drivers to opt out of using ``rtnl_lock`` and instead perform their control operations directly under the netdev instance lock. @@ -344,18 +342,59 @@ functions handle acquiring the instance lock themselves, while the ``netif_xxx`` functions assume that the driver has already acquired the instance lock. +struct net_device_ops +--------------------- + +``ndos`` are called without holding the instance lock for most drivers. + +"Ops locked" drivers will have most of the ``ndos`` invoked under +the instance lock. + +struct ethtool_ops +------------------ + +Similarly to ``ndos`` the instance lock is only held for select drivers. +For "ops locked" drivers all ethtool ops without exceptions should +be called under the instance lock. + +struct netdev_stat_ops +---------------------- + +"qstat" ops are invoked under the instance lock for "ops locked" drivers, +and under rtnl_lock for all other drivers. + +struct net_shaper_ops +--------------------- + +All net shaper callbacks are invoked while holding the netdev instance +lock. ``rtnl_lock`` may or may not be held. + +Note that supporting net shapers automatically enables "ops locking". + +struct netdev_queue_mgmt_ops +---------------------------- + +All queue management callbacks are invoked while holding the netdev instance +lock. ``rtnl_lock`` may or may not be held. + +Note that supporting struct netdev_queue_mgmt_ops automatically enables +"ops locking". 
+ Notifiers and netdev instance lock -================================== +---------------------------------- For device drivers that implement shaping or queue management APIs, some of the notifiers (``enum netdev_cmd``) are running under the netdev instance lock. +The following netdev notifiers are always run under the instance lock: +* ``NETDEV_XDP_FEAT_CHANGE`` + For devices with locked ops, currently only the following notifiers are running under the lock: +* ``NETDEV_CHANGE`` * ``NETDEV_REGISTER`` * ``NETDEV_UP`` -* ``NETDEV_CHANGE`` The following notifiers are running without the lock: * ``NETDEV_UNREGISTER`` diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst index 7de21ddb5412..b63aded46337 100644 --- a/Documentation/networking/netmem.rst +++ b/Documentation/networking/netmem.rst @@ -19,8 +19,8 @@ Benefits of Netmem : * Simplified Development: Drivers interact with a consistent API, regardless of the underlying memory implementation. -Driver Requirements -=================== +Driver RX Requirements +====================== 1. The driver must support page_pool. @@ -77,3 +77,22 @@ Driver Requirements that purpose, but be mindful that some netmem types might have longer circulation times, such as when userspace holds a reference in zerocopy scenarios. + +Driver TX Requirements +====================== + +1. The Driver must not pass the netmem dma_addr to any of the dma-mapping APIs + directly. This is because netmem dma_addrs may come from a source like + dma-buf that is not compatible with the dma-mapping APIs. + + Helpers like netmem_dma_unmap_page_attrs() & netmem_dma_unmap_addr_set() + should be used in lieu of dma_unmap_page[_attrs](), dma_unmap_addr_set(). + The netmem variants will handle netmem dma_addrs correctly regardless of the + source, delegating to the dma-mapping APIs when appropriate. + + Not all dma-mapping APIs have netmem equivalents at the moment. 
If your + driver relies on a missing netmem API, feel free to add and propose to + netdev@, or reach out to the maintainers and/or almasrymina@google.com for + help adding the netmem API. + +2. Driver should declare support by setting `netdev->netmem_tx = true` diff --git a/Documentation/networking/rds.rst b/Documentation/networking/rds.rst index 498395f5fbcb..41b0a6182fe4 100644 --- a/Documentation/networking/rds.rst +++ b/Documentation/networking/rds.rst @@ -265,7 +265,7 @@ RDS Protocol The bitmaps are allocated as connections are brought up. This avoids allocation in the interrupt handling path which queues - sages on sockets. The dense bitmaps let transports send the + messages on sockets. The dense bitmaps let transports send the entire bitmap on any bitmap change reasonably efficiently. This is much easier to implement than some finer-grained communication of per-port congestion. The sender does a very @@ -373,7 +373,7 @@ The recv path - validate header checksum - copy header to rds_ib_incoming struct if start of a new datagram - add to ibinc's fraglist - - if competed datagram: + - if completed datagram: - update cong map if datagram was cong update - call rds_recv_incoming() otherwise - note if ack is required @@ -415,7 +415,7 @@ Multipath RDS (mprds) I/O workqs and reconnect threads are driven from the rds_conn_path. Transports such as TCP that are multipath capable may then set up a TCP socket per rds_conn_path, and this is managed by the transport via - the transport privatee cp_transport_data pointer. + the transport private cp_transport_data pointer. Transports announce themselves as multipath capable by setting the t_mp_capable bit during registration with the rds core module. When the @@ -430,7 +430,7 @@ Multipath RDS (mprds) This is done by sending out a control packet exchange before the first data packet. The control packet exchange must have completed prior to outgoing hash completion in rds_sendmsg() when the transport - is mutlipath capable. 
+ is multipath capable. The control packet is an RDS ping packet (i.e., packet to rds dest port 0) with the ping packet having a rds extension header option of diff --git a/Documentation/networking/rxrpc.rst b/Documentation/networking/rxrpc.rst index e807e18ba32a..d63e3e27dd06 100644 --- a/Documentation/networking/rxrpc.rst +++ b/Documentation/networking/rxrpc.rst @@ -1062,30 +1062,6 @@ The kernel interface functions are as follows: first function to change. Note that this must be called in TASK_RUNNING state. - (#) Get remote client epoch:: - - u32 rxrpc_kernel_get_epoch(struct socket *sock, - struct rxrpc_call *call) - - This allows the epoch that's contained in packets of an incoming client - call to be queried. This value is returned. The function always - successful if the call is still in progress. It shouldn't be called once - the call has expired. Note that calling this on a local client call only - returns the local epoch. - - This value can be used to determine if the remote client has been - restarted as it shouldn't change otherwise. - - (#) Set the maximum lifespan on a call:: - - void rxrpc_kernel_set_max_life(struct socket *sock, - struct rxrpc_call *call, - unsigned long hard_timeout) - - This sets the maximum lifespan on a call to hard_timeout (which is in - jiffies). In the event of the timeout occurring, the call will be - aborted and -ETIME or -ETIMEDOUT will be returned. - (#) Apply the RXRPC_MIN_SECURITY_LEVEL sockopt to a socket from within in the kernel:: @@ -1172,3 +1148,18 @@ adjusted through sysctls in /proc/net/rxrpc/: header plus exactly 1412 bytes of data. The terminal packet must contain a four byte header plus any amount of data. In any event, a jumbo packet may not exceed rxrpc_rx_mtu in size. + + +API Function Reference +====================== + +.. kernel-doc:: net/rxrpc/af_rxrpc.c +.. kernel-doc:: net/rxrpc/call_object.c +.. kernel-doc:: net/rxrpc/key.c +.. kernel-doc:: net/rxrpc/oob.c +.. kernel-doc:: net/rxrpc/peer_object.c +.. 
kernel-doc:: net/rxrpc/recvmsg.c +.. kernel-doc:: net/rxrpc/rxgk.c +.. kernel-doc:: net/rxrpc/rxkad.c +.. kernel-doc:: net/rxrpc/sendmsg.c +.. kernel-doc:: net/rxrpc/server_key.c diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst index b8fef8101176..7aabead90648 100644 --- a/Documentation/networking/timestamping.rst +++ b/Documentation/networking/timestamping.rst @@ -811,11 +811,9 @@ Documentation/devicetree/bindings/ptp/timestamper.txt for more details. 3.2.4 Other caveats for MAC drivers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Stacked PHCs, especially DSA (but not only) - since that doesn't require any -modification to MAC drivers, so it is more difficult to ensure correctness of -all possible code paths - is that they uncover bugs which were impossible to -trigger before the existence of stacked PTP clocks. One example has to do with -this line of code, already presented earlier:: +The use of stacked PHCs may uncover MAC driver bugs which were impossible to +trigger without them. One example has to do with this line of code, already +presented earlier:: skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; diff --git a/Documentation/networking/tproxy.rst b/Documentation/networking/tproxy.rst index 7f7c1ff6f159..75e4990cc3db 100644 --- a/Documentation/networking/tproxy.rst +++ b/Documentation/networking/tproxy.rst @@ -69,9 +69,9 @@ add rules like this to the iptables ruleset above:: # iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \ --tproxy-mark 0x1/0x1 --on-port 50080 -Or the following rule to nft: +Or the following rule to nft:: -# nft add rule filter divert tcp dport 80 tproxy to :50080 meta mark set 1 accept + # nft add rule filter divert tcp dport 80 tproxy to :50080 meta mark set 1 accept Note that for this to work you'll have to modify the proxy to enable (SOL_IP, IP_TRANSPARENT) for the listening socket. 
diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm_device.rst index 7f24c09f2694..122204da0fff 100644 --- a/Documentation/networking/xfrm_device.rst +++ b/Documentation/networking/xfrm_device.rst @@ -65,9 +65,13 @@ Callbacks to implement /* from include/linux/netdevice.h */ struct xfrmdev_ops { /* Crypto and Packet offload callbacks */ - int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack); - void (*xdo_dev_state_delete) (struct xfrm_state *x); - void (*xdo_dev_state_free) (struct xfrm_state *x); + int (*xdo_dev_state_add)(struct net_device *dev, + struct xfrm_state *x, + struct netlink_ext_ack *extack); + void (*xdo_dev_state_delete)(struct net_device *dev, + struct xfrm_state *x); + void (*xdo_dev_state_free)(struct net_device *dev, + struct xfrm_state *x); bool (*xdo_dev_offload_ok) (struct sk_buff *skb, struct xfrm_state *x); void (*xdo_dev_state_advance_esn) (struct xfrm_state *x);