summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-14nfp: bpf: add support for direct packet access - readJakub Kicinski
In direct packet access bound checks are already done, we can simply dereference the packet pointer. Verifier/parser logic needs to record pointer type. Note that although verifier does protect us from CTX vs other pointer changes we will also want to differentiate between PACKET vs MAP_VALUE or STACK, so we can add the check already. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: separate I/O from checks for legacy data loadJakub Kicinski
Move data load into a separate function and separate it from packet length checks of legacy I/O. This makes the code more readable and easier to reuse. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: fix context accessesJakub Kicinski
Sizes of fields in struct xdp_md/xdp_buff and some in sk_buff depend on target architecture. Take that into account and use struct xdp_buff, not struct xdp_md. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: support BPF offload only on little endianJakub Kicinski
eBPF is host-endian specific. Translating both BE and LE eBPF to the NFP is feasible, but would require quite a bit of indirection. The fact that I don't have access to any BE hosts that would fit a 25G/40G/100G NIC is also limiting my ability to test big endian. For now restrict the offload to little endian hosts only. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: implement byte swap instructionJakub Kicinski
Implement byte swaps with rotations, shifts and byte loads. Remember to clear upper parts of the 64 bit registers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: add mov helperJakub Kicinski
Register move operation is encoded as alu no op. This means that one has to specify number of unused/none parameters to the emit_alu(). Add a helper. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: fix compare instructionsJakub Kicinski
Now that we have BPF assemebler support in LLVM 6 we can easily test all compare instructions (LLVM 4 didn't generate most of them from C). Fix the compare to immediates and refactor the order of compare to regs to make sure they both follow the same pattern. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: add missing return in jne_imm optimizationJakub Kicinski
We optimize comparisons to immediate 0 as if (reg.lo | reg.hi). The early return statement was missing, however, which means we would generate two comparisons - optimized one followed by a normal 2x 32 bit compare. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14nfp: bpf: reorder arguments to emit_ld_field_any()Jakub Kicinski
ld_field instruction has the following format in NFP assembler: ld_field[dst, 1000, src, <<24] reoder parameters to emit_ld_field_any() to make it closer to the familiar assembler order. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14bpf: verifier: set reg_type on context accesses in second passJakub Kicinski
Use a simplified is_valid_access() callback when verifier is used for program analysis by non-host JITs. This allows us to teach the verifier about packet start and packet end offsets for direct packet access. We can extend the callback as needed but for most packet processing needs there isn't much more the offloads may require. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14Merge branch 'stmmac-Improvements-for-multi-queuing-and-for-AVB'David S. Miller
Jose Abreu says: ==================== net: stmmac: Improvements for multi-queuing and for AVB Two improvements for stmmac: First one corrects the available fifo size per queue, second one corrects enabling of AVB queues. More info in commit log. Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Changes from v1: - Fix typo in second patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
2017-10-14net: stmmac: Disable flow ctrl for RX AVB queues and really enable TX AVB queuesJose Abreu
Flow control must be disabled for AVB enabled queues and TX AVB queues must be enabled by setting BIT(2) of TXQEN. Correct this by passing the queue mode to DMA callbacks and by checking in these functions wether we are in AVB performing the necessary adjustments. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14net: stmmac: Use correct values in TQS/RQS fieldsJose Abreu
Currently we are using all the available fifo size in RQS and TQS fields. This will not work correctly in multi-queues IP's because total fifo size must be splitted to the enabled queues. Correct this by computing the available fifo size per queue and setting the right value in TQS and RQS fields. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Joao Pinto <jpinto@synopsys.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-14icmp: don't fail on fragment reassembly time exceededMatteo Croce
The ICMP implementation currently replies to an ICMP time exceeded message (type 11) with an ICMP host unreachable message (type 3, code 1). However, time exceeded messages can either represent "time to live exceeded in transit" (code 0) or "fragment reassembly time exceeded" (code 1). Unconditionally replying to "fragment reassembly time exceeded" with host unreachable messages might cause unjustified connection resets which are now easily triggered as UFO has been removed, because, in turn, sending large buffers triggers IP fragmentation. The issue can be easily reproduced by running a lot of UDP streams which is likely to trigger IP fragmentation: # start netserver in the test namespace ip netns add test ip netns exec test netserver # create a VETH pair ip link add name veth0 type veth peer name veth0 netns test ip link set veth0 up ip -n test link set veth0 up for i in $(seq 20 29); do # assign addresses to both ends ip addr add dev veth0 192.168.$i.1/24 ip -n test addr add dev veth0 192.168.$i.2/24 # start the traffic netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 & done # wait send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host We need to differentiate instead: if fragment reassembly time exceeded is reported, we need to silently drop the packet, if time to live exceeded is reported, maintain the current behaviour. In both cases increment the related error count "icmpInTimeExcds". While at it, fix a typo in a comment, and convert the if statement into a switch to mate it more readable. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13i40e/i40evf: don't trust VF to reset itselfAlan Brady
When using 'ethtool -L' on a VF to change number of requested queues from PF, we shouldn't trust the VF to reset itself after making the request. Doing it that way opens the door for a potentially malicious VF to do nasty things to the PF which should never be the case. This makes it such that after VF makes a successful request, PF will then reset the VF to institute required changes. Only if the request fails will PF send a message back to VF letting it know the request was unsuccessful. Testing-hints: There should be no real functional changes. This is simply hardening against a potentially malicious VF. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: fix link reportingAlan Brady
When querying the NVM for supported phy_types, on some firmware versions, we were failing to actually fill out the phy_types which means ethtool wouldn't report any link types. Testing-hints: Check 'ethtool <iface>' if you have the right (wrong?) firmware. Without this patch, no link modes will be reported. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: make const array patterns static, reduces object code sizeColin Ian King
Don't populate const array patterns on the stack, instead make it static. Makes the object code smaller by over 60 bytes: Before: text data bss dec hex filename 1953 496 0 2449 991 i40e_diag.o After: text data bss dec hex filename 1798 584 0 2382 94e i40e_diag.o (gcc 6.3.0, x86-64) Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: Add support setting TC max bandwidth ratesAmritha Nambiar
This patch enables setting up maximum Tx rates for the traffic classes in i40e. The maximum rate is offloaded to the hardware through the mqprio framework by specifying the mode option as 'channel' and shaper option as 'bw_rlimit' and is configured for the VSI. Configuring minimum Tx rate limit is not supported in the device. The minimum usable value for Tx rate is 50Mbps. Example: # tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\ queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\ max_rate 4Gbit 5Gbit To dump the bandwidth rates: # tc qdisc show dev eth0 qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 queues:(0:3) (4:7) mode:channel shaper:bw_rlimit max_rate:4Gbit 5Gbit Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: Refactor VF BW rate limitingAmritha Nambiar
This patch refactors the BW rate limiting for Tx traffic on the VF to be reused in the next patch for rate limiting Tx traffic for the VSIs on the PF as well. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: Enable 'channel' mode in mqprio for TC configsAmritha Nambiar
The i40e driver is modified to enable the new mqprio hardware offload mode and factor the TCs and queue configuration by creating channel VSIs. In this mode, the priority to traffic class mapping and the user specified queue ranges are used to configure the traffic classes by setting the mode option to 'channel'. Example: map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5\ hw 1 mode channel qdisc mqprio 8038: root tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0 queues:(0:1) (2:3) (4:4) (5:5) mode:channel shaper:dcb The HW channels created are removed and all the queue configuration is set to default when the qdisc is detached from the root of the device. This patch also disables setting up channels via ethtool (ethtool -L) when the TCs are configured using mqprio scheduler. The patch also limits setting ethtool Rx flow hash indirection (ethtool -X eth0 equal N) to max queues configured via mqprio. The Rx flow hash indirection input through ethtool should be validated so that it is within in the queue range configured via tc/mqprio. The bound checking is achieved by reporting the current rss size to the kernel when queues are configured via mqprio. Example: map 0 0 0 1 0 2 3 0 queues 2@0 4@2 8@6 11@14\ hw 1 mode channel Cannot set RX flow hash configuration: Invalid argument Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: Add infrastructure for queue channel supportAmritha Nambiar
This patch sets up the infrastructure for offloading TCs and queue configurations to the hardware by creating HW channels(VSI). A new channel is created for each of the traffic class configuration offloaded via mqprio framework except for the first TC (TC0). TC0 for the main VSI is also reconfigured as per user provided queue parameters. Queue counts that are not power-of-2 are handled by reconfiguring RSS by reprogramming LUTs using the queue count value. This patch also handles configuring the TX rings for the channels, setting up the RX queue map for channel. Also, the channels so created are removed and all the queue configuration is set to default when the qdisc is detached from the root of the device. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13i40e: Add macro for PF reset bitAmritha Nambiar
Introduce a macro for the bit setting the PF reset flag and update its usages. This makes it easier to use this flag in functions to be introduced in future without encountering checkpatch issues related to alignment and line over 80 characters. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13mqprio: Introduce new hardware offload mode and shaper in mqprioAmritha Nambiar
The offload types currently supported in mqprio are 0 (no offload) and 1 (offload only TCs) by setting these values for the 'hw' option. If offloads are supported by setting the 'hw' option to 1, the default offload mode is 'dcb' where only the TC values are offloaded to the device. This patch introduces a new hardware offload mode called 'channel' with 'hw' set to 1 in mqprio which makes full use of the mqprio options, the TCs, the queue configurations and the QoS parameters for the TCs. This is achieved through a new netlink attribute for the 'mode' option which takes values such as 'dcb' (default) and 'channel'. The 'channel' mode also supports QoS attributes for traffic class such as minimum and maximum values for bandwidth rate limits. This patch enables configuring additional HW shaper attributes associated with a traffic class. Currently the shaper for bandwidth rate limiting is supported which takes options such as minimum and maximum bandwidth rates and are offloaded to the hardware in the 'channel' mode. The min and max limits for bandwidth rates are provided by the user along with the TCs and the queue configurations when creating the mqprio qdisc. The interface can be extended to support new HW shapers in future through the 'shaper' attribute. Introduces a new data structure 'tc_mqprio_qopt_offload' for offloading mqprio queue options and use this to be shared between the kernel and device driver. This contains a copy of the existing data structure for mqprio queue options. This new data structure can be extended when adding new attributes for traffic class such as mode, shaper, shaper parameters (bandwidth rate limits). The existing data structure for mqprio queue options will be shared between the kernel and userspace. Example: queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\ min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit To dump the bandwidth rates: qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 queues:(0:3) (4:7) mode:channel shaper:bw_rlimit min_rate:1Gbit 2Gbit max_rate:4Gbit 5Gbit Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-13Merge branch 'tipc-comm-groups'David S. Miller
Jon Maloy says: ==================== tipc: Introduce Communcation Group feature With this commit series we introduce a 'Group Communication' feature in order to resolve the datagram and multicast flow control problem. This new feature makes it possible for a user to instantiate multiple private virtual brokerless message buses by just creating and joining member sockets. The main features are as follows: --------------------------------- - Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN. If it is the first socket of the group this implies creation of the group. This call takes four parameters: 'type' serves as group identifier, 'instance' serves as member identifier, and 'scope' indicates the visibility of the group (node/cluster/zone). Finally, 'flags' indicates different options for the socket joining the group. For the time being, there are only two such flags: 1) 'LOOPBACK' indicates if the creator of the socket wants to receive a copy of broadcast or multicast messages it sends to the group, 2) EVENTS indicates if it wants to receive membership (JOINED/LEFT) events for the other members of the group. - Groups are closed, i.e., sockets which have not joined a group will not be able to send messages to or receive messages from members of the group, and vice versa. A socket can only be member of one group at a time. - There are four transmission modes. 1: Unicast. The sender transmits a message using the port identity (node:port tuple) of the receiving socket. 2: Anycast. The sender transmits a message using a port name (type: instance:scope) of one of the receiving sockets. If more than one member socket matches the given address a destination is selected according to a round-robin algorithm, but also considering the destination load (advertised window size) as an additional criteria. 3: Multicast. The sender transmits a message using a port name (type:instance:scope) of one or more of the receiving sockets. All sockets in the group matching the given address will receive a copy of the message. 4: Broadcast. The sender transmits a message using the primtive send(). All members of the group, irrespective of their member identity (instance) number receive a copy of the message. - TIPC broadcast is used for carrying messages in mode 3 or 4 when this is deemed more efficient, i.e., depending on number of actual destinations. - All transmission modes are flow controlled, so that messages never are dropped or rejected, just like we are used to from connection oriented communication. A special algorithm guarantees that this is true even for multipoint-to-point communication, i.e., at occasions where many source sockets may decide to send simultaneously towards the same destination socket. - Sequence order is always guaranteed, even between the different transmission modes. - Member join/leave events are received in all other member sockets in guaranteed order. I.e., a 'JOINED' (an empty message with the OOB bit set) will always be received before the first data message from a new member, and a 'LEAVE' (like 'JOINED', but with EOR bit set) will always arrive after the last data message from a leaving member. ----- v2: Reordered variable declarations in descending length order, as per feedback from David Miller. This was done as far as permitted by the the initialization order. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: add multipoint-to-point flow controlJon Maloy
We already have point-to-multipoint flow control within a group. But we even need the opposite; -a scheme which can handle that potentially hundreds of sources may try to send messages to the same destination simultaneously without causing buffer overflow at the recipient. This commit adds such a mechanism. The algorithm works as follows: - When a member detects a new, joining member, it initially set its state to JOINED and advertises a minimum window to the new member. This window is chosen so that the new member can send exactly one maximum sized message, or several smaller ones, to the recipient before it must stop and wait for an additional advertisement. This minimum window ADV_IDLE is set to 65 1kB blocks. - When a member receives the first data message from a JOINED member, it changes the state of the latter to ACTIVE, and advertises a larger window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can continue sending with minimal disturbances to the data flow. - The active members are kept in a dedicated linked list. Each time a message is received from an active member, it will be moved to the tail of that list. This way, we keep a record of which members have been most (tail) and least (head) recently active. - There is a maximum number (16) of permitted simultaneous active senders per receiver. When this limit is reached, the receiver will not advertise anything immediately to a new sender, but instead put it in a PENDING state, and add it to a corresponding queue. At the same time, it will pick the least recently active member, send it an advertisement RECLAIM message, and set this member to state RECLAIMING. - The reclaimee member has to respond with a REMIT message, meaning that it goes back to a send window of ADV_IDLE, and returns its unused advertised blocks beyond that value to the reclaiming member. - When the reclaiming member receives the REMIT message, it unlinks the reclaimee from its active list, resets its state to JOINED, and notes that it is now back at ADV_IDLE advertised blocks to that member. If there are still unread data messages sent out by reclaimee before the REMIT, the member goes into an intermediate state REMITTED, where it stays until the said messages have been consumed. - The returned advertised blocks can now be re-advertised to the pending member, which is now set to state ACTIVE and added to the active member list. - To be proactive, i.e., to minimize the risk that any member will end up in the pending queue, we start reclaiming resources already when the number of active members exceeds 3/4 of the permitted maximum. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: guarantee delivery of last broadcast before DOWN eventJon Maloy
The following scenario is possible: - A user sends a broadcast message, and thereafter immediately leaves the group. - The LEAVE message, following a different path than the broadcast, arrives ahead of the broadcast, and the sending member is removed from the receiver's list. - The broadcast message arrives, but is dropped because the sender now is unknown to the receipient. We fix this by sequence numbering membership events, just like ordinary unicast messages. Currently, when a JOIN is sent to a peer, it contains a synchronization point, - the sequence number of the next sent broadcast, in order to give the receiver a start synchronization point. We now let even LEAVE messages contain such an "end synchronization" point, so that the recipient can delay the removal of the sending member until it knows that all messages have been received. The received synchronization points are added as sequence numbers to the generated membership events, making it possible to handle them almost the same way as regular unicasts in the receiving filter function. In particular, a DOWN event with a too high sequence number will be kept in the reordering queue until the missing broadcast(s) arrive and have been delivered. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: guarantee delivery of UP event before first broadcastJon Maloy
The following scenario is possible: - A user joins a group, and immediately sends out a broadcast message to its members. - The broadcast message, following a different data path than the initial JOIN message sent out during the joining procedure, arrives to a receiver before the latter.. - The receiver drops the message, since it is not ready to accept any messages until the JOIN has arrived. We avoid this by treating group protocol JOIN messages like unicast messages. - We let them pass through the recipient's multicast input queue, just like ordinary unicasts. - We force the first following broadacst to be sent as replicated unicast and being acknowledged by the recipient before accepting any more broadcast transmissions. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: guarantee that group broadcast doesn't bypass group unicastJon Maloy
We need a mechanism guaranteeing that group unicasts sent out from a socket are not bypassed by later sent broadcasts from the same socket. We do this as follows: - Each time a unicast is sent, we set a the broadcast method for the socket to "replicast" and "mandatory". This forces the first subsequent broadcast message to follow the same network and data path as the preceding unicast to a destination, hence preventing it from overtaking the latter. - In order to make the 'same data path' statement above true, we let group unicasts pass through the multicast link input queue, instead of as previously through the unicast link input queue. - In the first broadcast following a unicast, we set a new header flag, requiring all recipients to immediately acknowledge its reception. - During the period before all the expected acknowledges are received, the socket refuses to accept any more broadcast attempts, i.e., by blocking or returning EAGAIN. This period should typically not be longer than a few microseconds. - When all acknowledges have been received, the sending socket will open up for subsequent broadcasts, this time giving the link layer freedom to itself select the best transmission method. - The forced and/or abrupt transmission method changes described above may lead to broadcasts arriving out of order to the recipients. We remedy this by introducing code that checks and if necessary re-orders such messages at the receiving end. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: guarantee group unicast doesn't bypass group broadcastJon Maloy
Group unicast messages don't follow the same path as broadcast messages, and there is a high risk that unicasts sent from a socket might bypass previously sent broadcasts from the same socket. We fix this by letting all unicast messages carry the sequence number of the next sent broadcast from the same node, but without updating this number at the receiver. This way, a receiver can check and if necessary re-order such messages before they are added to the socket receive buffer. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: introduce group multicast messagingJon Maloy
The previously introduced message transport to all group members is based on the tipc multicast service, but is logically a broadcast service within the group, and that is what we call it. We now add functionality for sending messages to all group members having a certain identity. Correspondingly, we call this feature 'group multicast'. The service is using unicast when only one destination is found, otherwise it will use the bearer broadcast service to transfer the messages. In the latter case, the receiving members filter arriving messages by looking at the intended destination instance. If there is no match, the message will be dropped, while still being considered received and read as seen by the flow control mechanism. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: introduce group anycast messagingJon Maloy
In this commit, we make it possible to send connectionless unicast messages to any member corresponding to the given member identity, when there is more than one such member. The sender must use a TIPC_ADDR_NAME address to achieve this effect. We also perform load balancing between the destinations, i.e., we primarily select one which has advertised sufficient send window to not cause a block/EAGAIN delay, if any. This mechanism is overlayed on the always present round-robin selection. Anycast messages are subject to the same start synchronization and flow control mechanism as group broadcast messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: introduce group unicast messagingJon Maloy
We now make it possible to send connectionless unicast messages within a communication group. To send a message, the sender can use either a direct port address, aka port identity, or an indirect port name to be looked up. This type of messages are subject to the same start synchronization and flow control mechanism as group broadcast messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: introduce flow control for group broadcast messagesJon Maloy
We introduce an end-to-end flow control mechanism for group broadcast messages. This ensures that no messages are ever lost because of destination receive buffer overflow, with minimal impact on performance. For now, the algorithm is based on the assumption that there is only one active transmitter at any moment in time. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: receive group membership events via member socketJon Maloy
Like with any other service, group members' availability can be subscribed for by connecting to be topology server. However, because the events arrive via a different socket than the member socket, there is a real risk that membership events my arrive out of synch with the actual JOIN/LEAVE action. I.e., it is possible to receive the first messages from a new member before the corresponding JOIN event arrives, just as it is possible to receive the last messages from a leaving member after the LEAVE event has already been received. Since each member socket is internally also subscribing for membership events, we now fix this problem by passing those events on to the user via the member socket. We leverage the already present member synch- ronization protocol to guarantee correct message/event order. An event is delivered to the user as an empty message where the two source addresses identify the new/lost member. Furthermore, we set the MSG_OOB bit in the message flags to mark it as an event. If the event is an indication about a member loss we also set the MSG_EOR bit, so it can be distinguished from a member addition event. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: add second source address to recvmsg()/recvfrom()Jon Maloy
With group communication, it becomes important for a message receiver to identify not only from which socket (identfied by a node:port tuple) the message was sent, but also the logical identity (type:instance) of the sending member. We fix this by adding a second instance of struct sockaddr_tipc to the source address area when a message is read. The extra address struct is filled in with data found in the received message header (type,) and in the local member representation struct (instance.) Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: introduce communication groupsJon Maloy
As a preparation for introducing flow control for multicast and datagram messaging we need a more strictly defined framework than we have now. A socket must be able keep track of exactly how many and which other sockets it is allowed to communicate with at any moment, and keep the necessary state for those. We therefore introduce a new concept we have named Communication Group. Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN. The call takes four parameters: 'type' serves as group identifier, 'instance' serves as an logical member identifier, and 'scope' indicates the visibility of the group (node/cluster/zone). Finally, 'flags' makes it possible to set certain properties for the member. For now, there is only one flag, indicating if the creator of the socket wants to receive a copy of broadcast or multicast messages it is sending via the socket, and if wants to be eligible as destination for its own anycasts. A group is closed, i.e., sockets which have not joined a group will not be able to send messages to or receive messages from members of the group, and vice versa. Any member of a group can send multicast ('group broadcast') messages to all group members, optionally including itself, using the primitive send(). The messages are received via the recvmsg() primitive. A socket can only be member of one group at a time. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: improve destination linked listJon Maloy
We often see a need for a linked list of destination identities, sometimes containing a port number, sometimes a node identity, and sometimes both. The currently defined struct u32_list is not generic enough to cover all cases, so we extend it to contain two u32 integers and rename it to struct tipc_dest_list. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: add new function for sending multiple small messagesJon Maloy
We see an increasing need to send multiple single-buffer messages of TIPC_SYSTEM_IMPORTANCE to different individual destination nodes. Instead of looping over the send queue and sending each buffer individually, as we do now, we add a new help function tipc_node_distr_xmit() to do this. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: refactor function filter_rcv()Jon Maloy
In the following commits we will need to handle multiple incoming and rejected/returned buffers in the function socket.c::filter_rcv(). As a preparation for this, we generalize the function by handling buffer queues instead of individual buffers. We also introduce a help function tipc_skb_reject(), and rename filter_rcv() to tipc_sk_filter_rcv() in line with other functions in socket.c. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: add ability to obtain node availability status from other filesJon Maloy
In the coming commits, functions at the socket level will need the ability to read the availability status of a given node. We therefore introduce a new function for this purpose, while renaming the existing static function currently having the wanted name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: improve address sanity check in tipc_connect()Jon Maloy
The address given to tipc_connect() is not completely sanity checked, under the assumption that this will be done later in the function __tipc_sendmsg() when the address is used there. However, the latter functon will in the next commits serve as caller to several other send functions, so we want to move the corresponding sanity check there to the beginning of that function, before we possibly need to grab the address stored by tipc_connect(). We must therefore be able to trust that this address already has been thoroughly checked. We do this in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13tipc: add ability to order and receive topology events in driverJon Maloy
As preparation for introducing communication groups, we add the ability to issue topology subscriptions and receive topology events from kernel space. This will make it possible for group member sockets to keep track of other group members. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-13mac80211: don't track HT capability changesJohannes Berg
The code here (more or less accidentally) tracks the HT capability of the AP when connected, and we found at least one AP that erroneously toggles its 20/40 capability bit when changing between 20/40 MHz. The connection to the AP is then broken because we set the 40 MHz disable flag based on this, as soon as it switches to 20 MHz, but because the flag then changed, we disconnect. I'd be inclined to just ignore this issue, since we then reconnect while the AP is in 20 MHz mode and never use 40 MHz with it again, but this code is a bit strange anyway - we don't use the capabilities for anything else. Change the code to simply not track the HT capabilities at all, which assumes that the AP at least sets 20/40 capability when operating in 40 MHz (or higher). If not, rate scaling might end up using only the narrower bandwidth. The new behaviour also mirrors what VHT does, where we only check the VHT operation. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-10-13cfg80211: fix CFG80211_EXTRA_REGDB_KEYDIR typoArnd Bergmann
The missing CONFIG_ prefix means this macro is never defined, leading to a possible Kbuild warning: net/wireless/reg.c:666:20: error: 'load_keys_from_buffer' defined but not used [-Werror=unused-function] static void __init load_keys_from_buffer(const u8 *p, unsigned int buflen) When we use the correct symbol, the warning also goes away. Fixes: 90a53e4432b1 ("cfg80211: implement regdb signature checking") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-10-13cfg80211: don't print log output for building shipped-certsArnd Bergmann
Building an allmodconfig kernel with 'make -s' now prints a single line: GEN net/wireless/shipped-certs.c Using '$(kecho)' here will skip the output with 'make -s' but otherwise keeps printing it, which is consistent with how we handle all the other output. Fixes: 90a53e4432b1 ("cfg80211: implement regdb signature checking") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-10-12selftests: rtnetlink: add a small macsec test caseFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12ravb: Consolidate clock handlingGeert Uytterhoeven
The module clock is used for two purposes: - Wake-on-LAN (WoL), which is optional, - gPTP Timer Increment (GTI) configuration, which is mandatory. As the clock is needed for GTI configuration anyway, WoL is always available. Hence remove duplication and repeated obtaining of the clock by making GTI use the stored clock for WoL use. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12Merge branch 'net-support-bgmac-with-B50212E-B1-PHY'David S. Miller
Rafał Miłecki says: ==================== net: support bgmac with B50212E B1 PHY I got a report that a board with BCM47189 SoC and B50212E B1 PHY doesn't work well some devices as there is massive ping loss. After analyzing PHY state it has appeared that is runs in slave mode and doesn't auto switch to master properly when needed. This patchset fixes this by: 1) Adding new flag support to the PHY driver for setting master mode 2) Modifying bgmac to request master mode for reported hardware ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12net: bgmac: enable master mode for BCM54210E and B50212E PHYsRafał Miłecki
There are 4 very similar PHYs: 0x600d84a1: BCM54210E (rev B0) 0x600d84a2: BCM54210E (rev B1) 0x600d84a5: B50212E (rev B0) 0x600d84a6: B50212E (rev B1) that need setting master mode manually. It's because they run in slave mode by default with Automatic Slave/Master configuration disabled which can lead to unreliable connection with massive ping loss. So far it was reported for a board with BCM47189 SoC and B50212E B1 PHY connected to the bgmac supported ethernet device. Telling PHY driver to setup PHY properly solves this issue. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-12net: phy: broadcom: support new device flag for setting master modeRafał Miłecki
Some of Broadcom's PHYs run by default in slave mode with Automatic Slave/Master configuration disabled. It stops them from working properly with some devices. So far it has been verified for BCM54210E and BCM50212E which don't work well with Intel's I217-LM and I218-LM: http://ark.intel.com/products/60019/Intel-Ethernet-Connection-I217-LM http://ark.intel.com/products/71307/Intel-Ethernet-Connection-I218-LM I was told there is massive ping loss. This commit adds support for a new flag which can be set by an ethernet driver to fixup PHY setup. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>