summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-08bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()Sunil Challa
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to incorrect passing of pointer and size of l3_mask argument of is_wildcard(). Fixed this. Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08Merge branch 'for-4.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This contains fixes for the following two non-trivial issues: - The task iterator got broken while adding thread mode support for v4.14. It was less visible because it only triggers when both cgroup1 and cgroup2 hierarchies are in use. The recent versions of systemd uses cgroup2 for process management even when cgroup1 is used for resource control exposing this issue. - cpuset CPU hotplug path could deadlock when racing against exits. There also are two patches to replace unlimited strcpy() usages with strlcpy()" * 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC cgroup: Fix deadlock in cpu hotplug path cgroup: use strlcpy() instead of strscpy() to avoid spurious warning cgroup: avoid copying strings longer than the buffers
2018-01-08tcp: Split BUG_ON() in tcp_tso_should_defer() into two assertionsStefano Brivio
The two conditions triggering BUG_ON() are somewhat unrelated: the tcp_skb_pcount() check is meant to catch TSO flaws, the second one checks sanity of congestion window bookkeeping. Split them into two separate BUG_ON() assertions on two lines, so that we know which one actually triggers, when they do. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: ipv6: Allow connect to linklocal address from socket bound to vrfDavid Ahern
Allow a process bound to a VRF to connect to a linklocal address. Currently, this fails because of a mismatch between the scope of the linklocal address and the sk_bound_dev_if inherited by the VRF binding: $ ssh -6 fe80::70b8:cff:fedd:ead8%eth1 ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument Relax the scope check to allow the socket to be bound to the same L3 device as the scope id. This makes ipv6 linklocal consistent with other relaxed checks enabled by commits 1ff23beebdd3 ("net: l3mdev: Allow send on enslaved interface") and 7bb387c5ab12a ("net: Allow IP_MULTICAST_IF to set index to L3 slave"). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08sh_eth: remove sh_eth_plat_data::edmac_endianSergei Shtylyov
Since the commit 888cc8c20cf ("sh_eth: remove EDMAC_BIG_ENDIAN") (geez, I didn't realize that was 2 years ago!) the initializers in the SuperH platform code for the 'sh_eth_plat_data::edmac_endian' stopped to matter, so we can remove that field for good (not sure if it was ever useful -- SH7786 Ether has been reported to have the same EDMAC descriptor/register endiannes as configured for the SuperH CPU)... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08Merge branch 'hns3-next'David S. Miller
Peng Li says: ==================== add some new features and fix some bugs for HNS3 driver This patchset adds some new features support and fixes some bugs: [Patch 1/20] adds support to enable/disable vlan filter with ethtool [Patch 2/20] disables VFs change rxvlan offload status [Patch 3/20 - 13/120 fix bugs and refine some codes for packet statistics, support query with both ifconfig and ethtool. [Patch 14/20 - 20/20] fix some other bugs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Add more packet size statiscticsJian Shen
The statistics of rx/tx packets size greater than 1518 are not detailed. This patch adds more statistics for different packet size range. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: remove redundant semicolonPeng Li
There is a redundant semicolon, this patch removes it. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for not setting pause parametersFuyun Liang
Pause parameters include source address, transmit gap and pause time. The default value of the pause source address is zero in the hardware. Default pause parameters need to be set to the hardware. Also, when setting new mac address, the pause source address need to be updated. Fixes: 9dc2145d910e ("net: hns3: Add support for PFC setting in TM module") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: add MTU initialization for hardwareFuyun Liang
When initializing the MAC, the MTU vlaue need to be set to the hardware too. Otherwise, the MTU value of software will be different from the MTU value of hardware. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for changing MTUFuyun Liang
when changing MTU, The new MTU must need to be set to netdevice. Fixes: a8e8b7ff3517 ("net: hns3: Add support to change MTU in HNS3 hardware") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for setting MTUFuyun Liang
When setting MTU, actually what we do is configuring the max frame size for the hardware. ETH_HLEN、ETH_FCS_LEN and VLAN_HLEN must need to be considered. And the frame size which is less than the default value should not be set to the hardware. Because in the hardware, the the max frame size not only controls the RX packet size, but also controls the TX packet size. the RX packets whose size are greater than the setting value will be dropped. This patch fixes the bug setting a error max frame size to hardware. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: fix for updating fc_mode_last_timeFuyun Liang
commit a9c782822166 ("net: hns3: add support for set_pauseparam") adds set_pauseparam support for ethtool cmd, but forgets to update fc_mode_last_time when PFC mode is disabled in hclge_cfg_pauseparam(). The wrong fc_mode_last_time will be used to update flow control mode when lldpad has been running. As a result, when using the ethtool command "-a", user will get a wrong pause parameter. This patch adds the fc_mode_last_time update when PFC mode is disabled. Fixes: a9c782822166 ("net: hns3: add support for set_pauseparam") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix a response data read error of tqp statistics queryJian Shen
The result of tqp statistics query was read with an error position, fix it according to the user manual. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Add packet statistics of netdevJian Shen
Add packet statistics of netdev for ethtool -S, in order to show the statistics data for current net device. Remove update_stats() calling because it has been completed in hns3_get_netdev_stats(). Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Remove a useless member of struct hns3_statsJian Shen
The member "stats_size" of struct hns3_stats is useless, remove it and fix the macro definition which has uses this struct. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix an error macro definition of HNS3_TQP_STATJian Shen
The member "stats_offset" was designed to indicate the offset of each member of struct ring_stats in struct hns3_enet_ring, but forgot to add the offset of the member in struct ring_stats. Fixes: 496d03e960a ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix a loop index error of tqp statistics queryJian Shen
An error loop index was used while querying statistics data of tqps, which may cause call trace. Fixes: 496d03e960ae ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix an error of total drop packet statisticsJian Shen
The dropped tx/rx packets number of each tqp should also be counted into the total drop tx/rx packets numbers. Fixes: 76ad4f0ee74 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Mask the packet statistics query when NIC is downJian Shen
Update the HNS3_NIC_STATE_DOWN bit when NIC state changes. When NIC is down, mask the packet statistics for querying with ifconfig command. It's a common practice. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Modify the update period of packet statisticsJian Shen
It takes more than 200 query response messages between driver and IMP, while updating the packet statistics. It's too heavy for IMP to update it per second. Extend the update period of packet statistics data from 1 second to 300 seconds(if too long, the statistics may overflow). As a result, we need to update it while querying with ifconfig tool to keep the statistics data fresh. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Remove repeat statistic of rx_errorsJian Shen
The igu_rx_err_pkt indicates the same error with mac_rx_fcs_err_pkt_num, so remove it. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Fix spelling errorsJian Shen
Fix spelling error "overrsize" --> "oversize". Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Unify the strings display of packet statisticsJian Shen
Some members of packet statistics are named in different styles. This patch unifies them with new internal name rules, the main modification are below: trans --> tx rcv --> rx rcb_q%d_tx --> txq#%d rcb_q%d_rx --> rxq#%d sw_err_cnt(tx side) --> tx_dropped sw_err_cnt(rx side) --> rx_dropped pkts --> packets tx_err_cnt --> errors rx_err_cnt --> errors Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Disable VFs change rxvlan offload statusJian Shen
Rxvlan offload status can only be changed by PF. Initialize the value of NETIF_F_HW_VLAN_CTAG_RX bit of hw_features for VFS to false, make sure user can't be able to change it. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: hns3: Add ethtool interface for vlan filterJian Shen
This patch adds vlan filter enable switch to support ethtool -K ethX rx-vlan-filter on/off. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08Merge branch 'net-qualcomm-rmnet-Enable-csum-offloads'David S. Miller
Subash Abhinov Kasiviswanathan says: ==================== net: qualcomm: rmnet: Enable csum offloads This series introduces the MAPv4 packet format for checksum offload plus some other minor changes. Patches 1-3 are cleanups. Patch 4 renames the ingress format to data format so that all data formats can be configured using this going forward. Patch 5 uses the pacing helper to improve TCP transmit performance. Patch 6-9 defines the the MAPv4 for checksum offload for RX and TX. A new header and trailer format are used as part of MAPv4. For RX checksum offload, only the 1's complement of the IP payload portion is computed by hardware. The meta data from RX header is used to verify the checksum field in the packet. Note that the IP packet and its field itself is not modified by hardware. This gives metadata to help with the RX checksum. For TX, the required metadata is filled up so hardware can compute the checksum. Patch 10 enables GSO on rmnet devices v1->v2: Fix sparse errors reported by kbuild test robot v2->v3: Update the commit message for Patch 5 based on Eric's comments ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Add support for GSOSubash Abhinov Kasiviswanathan
Real devices may support scatter gather(SG), so enable SG on rmnet devices to use GSO. GSO reduces CPU cycles by 20% for a rate of 146Mpbs for a single stream TCP connection. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Add support for TX checksum offloadSubash Abhinov Kasiviswanathan
TX checksum offload applies to TCP / UDP packets which are not fragmented using the MAPv4 checksum trailer. The following needs to be done to have checksum computed in hardware - 1. Set the checksum start offset and inset offset. 2. Set the csum_enabled bit 3. Compute and set 1's complement of partial checksum field in transport header. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Handle command packets with checksum trailerSubash Abhinov Kasiviswanathan
When using the MAPv4 packet format in conjunction with MAP commands, a dummy DL checksum trailer will be appended to the packet. Before this packet is sent out as an ACK, the DL checksum trailer needs to be removed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Add support for RX checksum offloadSubash Abhinov Kasiviswanathan
When using the MAPv4 packet format, receive checksum offload can be enabled in hardware. The checksum computation over pseudo header is not offloaded but the rest of the checksum computation over the payload is offloaded. This applies only for TCP / UDP packets which are not fragmented. rmnet validates the TCP/UDP checksum for the packet using the checksum from the checksum trailer added to the packet by hardware. The validation performed is as following - 1. Perform 1's complement over the checksum value from the trailer 2. Compute 1's complement checksum over IPv4 / IPv6 header and subtracts it from the value from step 1 3. Computes 1's complement checksum over IPv4 / IPv6 pseudo header and adds it to the value from step 2 4. Subtracts the checksum value from the TCP / UDP header from the value from step 3. 5. Compares the value from step 4 to the checksum value from the TCP / UDP header. 6. If the comparison in step 5 succeeds, CHECKSUM_UNNECESSARY is set and the packet is passed on to network stack. If there is a failure, then the packet is passed on as such without modifying the ip_summed field. The checksum field is also checked for UDP checksum 0 as per RFC 768 and for unexpected TCP checksum of 0. If checksum offload is disabled when using MAPv4 packet format in receive path, the packet is queued as is to network stack without the validations above. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Define the MAPv4 packet formatsSubash Abhinov Kasiviswanathan
The MAPv4 packet format adds support for RX / TX checksum offload. For a bi-directional UDP stream at a rate of 570 / 146 Mbps, roughly 10% CPU cycles are saved. For receive path, there is a checksum trailer appended to the end of the MAP packet. The valid field indicates if hardware has computed the checksum. csum_start_offset indicates the offset from the start of the IP header from which hardware has computed checksum. csum_length is the number of bytes over which the checksum was computed and the resulting value is csum_value. In the transmit path, a header is appended between the end of the MAP header and the start of the IP packet. csum_start_offset is the offset in bytes from which hardware will compute the checksum if the csum_enabled bit is set. udp_ip4_ind indicates if the checksum value of 0 is valid or not. csum_insert_offset is the offset from the csum_start_offset where hardware will insert the computed checksum. The use of this additional packet format for checksum offload is explained in subsequent patches. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Set pacing shiftSubash Abhinov Kasiviswanathan
The real device over which the rmnet devices are installed also aggregate multiple IP packets and sends them as a single large aggregate frame to the hardware. This causes degraded throughput for TCP TX due to bufferbloat. To overcome this problem, pacing shift value of 8 is set using the sk_pacing_shift_update() helper. This value was determined based on experiments with a single stream TCP TX using iperf for a duration of 30s. Pacing shift | Observed data rate (Mbps) 10 | 9 9 | 140 8 | 146 (Max link rate) Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Rename ingress data format to data formatSubash Abhinov Kasiviswanathan
This is done so that we can use this field for both ingress and egress flags. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Remove unused function declarationSubash Abhinov Kasiviswanathan
rmnet_map_demultiplex() is only declared but not defined anywhere, so remove it. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Remove invalid condition while stamping mux idSubash Abhinov Kasiviswanathan
rmnet devices cannot have a mux id of 255. This is validated when assigning the mux id to the rmnet devices. As a result, checking for mux id 255 does not apply in egress path. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: qualcomm: rmnet: Remove redundant check when stamping map headerSubash Abhinov Kasiviswanathan
We already check the headroom once in rmnet_map_egress_handler(), so this is not needed. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08platform/x86: wmi: Call acpi_wmi_init() laterRafael J. Wysocki
Calling acpi_wmi_init() at the subsys_initcall() level causes ordering issues to appear on some systems and they are difficult to reproduce, because there is no guaranteed ordering between subsys_initcall() calls, so they may occur in different orders on different systems. In particular, commit 86d9f48534e8 (mm/slab: fix kmemcg cache creation delayed issue) exposed one of these issues where genl_init() and acpi_wmi_init() are both called at the same initcall level, but the former must run before the latter so as to avoid a NULL pointer dereference. For this reason, move the acpi_wmi_init() invocation to the initcall_sync level which should still be early enough for things to work correctly in the WMI land. Link: https://marc.info/?t=151274596700002&r=1&w=2 Reported-by: Jonathan McDowell <noodles@earth.li> Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Tested-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-01-08netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to ↵Jozsef Kadlecsik
ip_set_net_exit() Patch "netfilter: ipset: use nfnl_mutex_is_locked" is added the real mutex locking check, which revealed the missing locking in ip_set_net_exit(). Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Reported-by: syzbot+36b06f219f2439fe62e1@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: ipset: Fix "don't update counters" mode when counters used at the ↵Jozsef Kadlecsik
matching The matching of the counters was not taken into account, fixed. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: ipset: use swap macro instead of _manually_ swapping valuesGustavo A. R. Silva
Make use of the swap macro and remove unnecessary variables tmp. This makes the code easier to read and maintain. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: flow offload expressionPablo Neira Ayuso
Add new instruction for the nf_tables VM that allows us to specify what flows are offloaded into a given flow table via name. This new instruction creates the flow entry and adds it to the flow table. Only established flows, ie. we have seen traffic in both directions, are added to the flow table. You can still decide to offload entries at a later stage via packet counting or checking the ct status in case you want to offload assured conntracks. This new extension depends on the conntrack subsystem. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for the mixed IPv4/IPv6 familyPablo Neira Ayuso
This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for IPv6Pablo Neira Ayuso
This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. This patch exports ip6_dst_mtu_forward() that is required to check for mtu to pass up packets that need PMTUD handling to the classic forwarding path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for IPv4Pablo Neira Ayuso
This patch adds the IPv4 flow table type, that implements the datapath flow table to forward IPv4 traffic. Rationale is: 1) Look up for the packet in the flow table, from the ingress hook. 2) If there's a hit, decrement ttl and pass it on to the neighbour layer for transmission. 3) If there's a miss, packet is passed up to the classic forwarding path. This patch also supports layer 3 source and destination NAT. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: add generic flow table infrastructurePablo Neira Ayuso
This patch defines the API to interact with flow tables, this allows to add, delete and lookup for entries in the flow table. This also adds the generic garbage code that removes entries that have expired, ie. no traffic has been seen for a while. Users of the flow table infrastructure can delete entries via flow_offload_dead(), which sets the dying bit, this signals the garbage collector to release an entry from user context. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: add flow table netlink frontendPablo Neira Ayuso
This patch introduces a netlink control plane to create, delete and dump flow tables. Flow tables are identified by name, this name is used from rules to refer to an specific flow table. Flow tables use the rhashtable class and a generic garbage collector to remove expired entries. This also adds the infrastructure to add different flow table types, so we can add one for each layer 3 protocol family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_conntrack: add IPS_OFFLOAD status bitPablo Neira Ayuso
This new bit tells us that the conntrack entry is owned by the flow table offload infrastructure. # cat /proc/net/nf_conntrack ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2 Note the [OFFLOAD] tag in the listing. The timer of such conntrack entries look like stopped from userspace. In practise, to make sure the conntrack entry does not go away, the conntrack timer is periodically set to an arbitrary large value that gets refreshed on every iteration from the garbage collector, so it never expires- and they display no internal state in the case of TCP flows. This allows us to save a bitcheck from the packet path via nf_ct_is_expired(). Conntrack entries that have been offloaded to the flow table infrastructure cannot be deleted/flushed via ctnetlink. The flow table infrastructure is also responsible for releasing this conntrack entry. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_tables: remove nft_dereference()Pablo Neira Ayuso
This macro is unnecessary, it just hides details for one single caller. nfnl_dereference() is just enough. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: remove defensive check on malformed packets from raw socketsPablo Neira Ayuso
Users cannot forge malformed IPv4/IPv6 headers via raw sockets that they can inject into the stack. Specifically, not for IPv4 since 55888dfb6ba7 ("AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)"). IPv6 raw sockets also ensure that packets have a well-formed IPv6 header available in the skbuff. At quick glance, br_netfilter also validates layer 3 headers and it drops malformed both IPv4 and IPv6 packets. Therefore, let's remove this defensive check all over the place. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>