summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-02net: phy: marvell10g: add support for half duplex 100M and 10MRussell King
Add support for half-duplex 100M and 10M copper connections by parsing the advertisment results rather than trying to decode the negotiated speed from one of the PHYs "vendor" registers. This allows us to decode the duplex as well, which means we can support half-duplex mode for the slower speeds. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: phy: add helper to convert negotiation result to phy settingsRussell King
Add a helper to convert the result of the autonegotiation advertisment into the PHYs speed and duplex settings. If the result is full duplex, also extract the pause mode settings from the link partner advertisment. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: phy: marvell10g: clean up interface mode switchingRussell King
Centralise the PHY interface mode switching, rather than having it in two places. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: phy: marvell10g: add MDI swap reportingRussell King
Add reporting of the MDI swap to the Marvell 10G PHY driver by providing a generic implementation for the standard 10GBASE-T pair swap register and polarity register. We also support reading the MDI swap status for 1G and below from a PCS register. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: phy: marvell10g: update header commentsRussell King
Update header comments to indicate the newly found behaviour with XAUI interfaces. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: hns: add ACPI mode support for ethtool -pJian Shen
The locate operation interface of fiber port can only work with DT mode. Add a new interface to control the locate led for ACPI mode. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Tested-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02cxgb4: Check alignment constraint for T6Ganesh Goudar
Update the check for setting IPV4 filters and align filter_id to multiple of 2, only for IPv6 filters in case of T6. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Merge branch 'ena-next'David S. Miller
Netanel Belgazal says: ==================== update ENA driver to version 1.5.0 This patchset contains two changes: * Add a robust mechanism for detection of stuck Rx/Tx rings due to missed or misrouted MSI-X * Increase the driver version to 1.5.0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: ena: increase ena driver version to 1.5.0Netanel Belgazal
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: ena: add detection and recovery mechanism for handling missed/misrouted ↵Netanel Belgazal
MSI-X A mechanism for detection of stuck Rx/Tx rings due to missed or misrouted interrupts. Check if there are unhandled completion descriptors before the first MSI-X interrupt arrived. The check is per queue and per interrupt vector. Once such condition is detected, driver and device reset is scheduled. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Merge branch 'tcp-sctp-dccp-Replace-jprobe-usage-with-trace-events'David S. Miller
Masami Hiramatsu says: ==================== net: tcp: sctp: dccp: Replace jprobe usage with trace events This series is v7 of the replacement of jprobe usage with trace events. This version fixes net/dccp/trace.h to avoid sparse warning. Since the TP_STORE_ADDR_PORTS macro can be shared with trace/events/tcp.h, it also introduce a new common header file and move the definition of that macro. Previous version is here; https://lkml.org/lkml/2017/12/28/7 Changes from v6: [5/6]: Avoid preprocessor directives in tracepoint macro args ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: dccp: Remove dccpprobe moduleMasami Hiramatsu
Remove DCCP probe module since jprobe has been deprecated. That function is now replaced by dccp/dccp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: dccp: Add DCCP sendmsg trace eventMasami Hiramatsu
Add DCCP sendmsg trace event (dccp/dccp_probe) for replacing dccpprobe. User can trace this event via ftrace or perftools. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: sctp: Remove debug SCTP probe moduleMasami Hiramatsu
Remove SCTP probe module since jprobe has been deprecated. That function is now replaced by sctp/sctp_probe and sctp/sctp_probe_path trace-events. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: sctp: Add SCTP ACK tracking trace eventMasami Hiramatsu
Add SCTP ACK tracking trace event to trace the changes of SCTP association state in response to incoming packets. It is used for debugging SCTP congestion control algorithms, and will replace sctp_probe module. Note that this event a bit tricky. Since this consists of 2 events (sctp_probe and sctp_probe_path) so you have to enable both events as below. # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Since sctp_probe_path event is always invoked from sctp_probe event, you can not see any output if you only enable sctp_probe_path. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: tcp: Remove TCP probe moduleMasami Hiramatsu
Remove TCP probe module since jprobe has been deprecated. That function is now replaced by tcp/tcp_probe trace-event. You can use it via ftrace or perftools. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: tcp: Add trace events for TCP congestion window tracingMasami Hiramatsu
This adds an event to trace TCP stat variables with slightly intrusive trace-event. This uses ftrace/perf event log buffer to trace those state, no needs to prepare own ring-buffer, nor custom user apps. User can use ftrace to trace this event as below; # cd /sys/kernel/debug/tracing # echo 1 > events/tcp/tcp_probe/enable (run workloads) # cat trace Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Merge branch 'qed-Advance-to-FW-8.33.1.0'David S. Miller
Tomer Tayar says: ==================== qed*: Advance to FW 8.33.1.0 This series advances all qed* drivers to use firmware 8.33.1.0 which brings new capabilities and initial support of new HW. The changes are mostly in qed, and include changes in the FW interface files, as well as updating the FW initialization and debug collection code. The protocol drivers have minor functional changes for this firmware. Patch 1 Rearranges and refactors the FW interface files in preparation of the new FW (no functional change). Patch 2 Prepares the code for support of new HW (no functional change). Patch 3 Actual utilization of the new FW. Patch 4 Advances drivers' version. v3->v4: Fix a compilation issue which was reported by krobot (dependency on CRC8). v2->v3: Resend the series with a fixed title in the cover letter. v1->v2: - Break the previous single patch into several patches. - Fix compilation issues which were reported by krobot. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02qed*: Advance drivers' version to 8.33.0.20Tomer Tayar
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com> Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02qed*: Utilize FW 8.33.1.0Tomer Tayar
Advance the qed* drivers to use firmware 8.33.1.0: Modify core driver (qed) to utilize the new FW and initialize the device with it. This is the lion's share of the patch, and includes changes to FW interface files, device initialization flows, FW interaction flows, and debug collection flows. Modify Ethernet driver (qede) to make use of new FW in fastpath. Modify RoCE/iWARP driver (qedr) to make use of new FW in fastpath. Modify FCoE driver (qedf) to make use of new FW in fastpath. Modify iSCSI driver (qedi) to make use of new FW in fastpath. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Bason <Yuval.Bason@cavium.com> Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com> Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com> Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02qed*: HSI renaming for different types of HWTomer Tayar
This patch renames defines and structures in the FW HSI files to allow a distinction between different types of HW. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com> Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02qed*: Refactoring and rearranging FW API with no functional impactTomer Tayar
This patch refactors and reorders the FW API files in preparation of upgrading the code to support new FW. - Make use of the BIT macro in appropriate places. - Whitespace changes to align values and code blocks. - Comments are updated (spelling mistakes, removed if not clear). - Group together code blocks which are related or deal with similar matters. Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02inet_diag: Add equal-operator for portsKristian Evensen
inet_diag currently provides less/greater than or equal operators for comparing ports when filtering sockets. An equal comparison can be performed by combining the two existing operators, or a user can for example request a port range and then do the final filtering in userspace. However, these approaches both have drawbacks. Implementing equal using LE/GE causes the size and complexity of a filter to grow quickly as the number of ports increase, while it on busy machines would be great if the kernel only returns information about relevant sockets. This patch introduces source and destination port equal operators. INET_DIAG_BC_S_EQ is used to match a source port, INET_DIAG_BC_D_EQ a destination port, and usage is the same as for the existing port operators. I.e., the port to match is stored in the no-member of the next inet_diag_bc_op-struct in the filter. Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Merge branch 's390-next'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: updates 2017-12-27 please apply some post-christmas leftovers for 4.16. Two patches to improve IP address management on L3, and two that add support to auto-config the transport mode on z/VM VNICs. Note that one patch in the series touches arch/s390 (acked by Martin). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02s390/qeth: support early setup for z/VM NICsJulian Wiedmann
The transport mode that a z/VM NIC is configured in, must match the hypervisor-internal network which the NIC is coupled to. To get this right automatically, have qeth issue a diag26c hypervisor call that provides all sorts of information for a specific VNIC. With z/VM update VM65918, this also includes the VNIC's required transport mode. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02s390/diag: add diag26c support for VNIC infoJulian Wiedmann
With subcode 0x24, diag26c returns all sorts of VNIC-related information. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02s390/qeth: use common helper to display rxip/vipaJulian Wiedmann
By parameterising the address type, we need just one helper that walks the IP table and builds up the response string. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02s390/qeth: improve error reporting on IP add/removalJulian Wiedmann
When adding & removing IP entries for rxip/vipa/ipato/hsuid, forward any resulting errors back to the sysfs-level caller. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02openvswitch: drop unneeded newlineJulia Lawall
OVS_NLERR prints a newline at the end of the message string, so the message string does not need to include a newline explicitly. Done using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: dccp: drop unneeded newlineJulia Lawall
DCCP_CRIT prints some other text and then a newline after the message string, so the message string does not need to include a newline explicitly. Done using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: sched: fix skb leak in dev_requeue_skb()Wei Yongjun
When dev_requeue_skb() is called with bulked skb list, only the first skb of the list will be requeued to qdisc layer, and leak the others without free them. TCP is broken due to skb leak since no free skb will be considered as still in the host queue and never be retransmitted. This happend when dev_requeue_skb() called from qdisc_restart(). qdisc_restart |-- dequeue_skb |-- sch_direct_xmit() |-- dev_requeue_skb() <-- skb may bluked Fix dev_requeue_skb() to requeue the full bluked list. Also change to use __skb_queue_tail() in __dev_requeue_skb() to avoid skb out of order. Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02cxgb4: use CLIP with LIP6 on T6 for TCAM filtersGanesh Goudar
On T6, LIP compression is always enabled for IPv6 and uncompressed IPv6 for LIP is not supported. So, for IPv6 TCAM filters on T6, add LIP6 to CLIP on filter creation, and release the same on filter deletion. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02selftests: rtnetlink: add erspan and ip6erspanWilliam Tu
Add test cases for ipv4, ipv6 erspan, v1 and v2 native mode and external (collect metadata) mode. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02forcedeth: optimize the rx with likelyZhu Yanjun
In the rx fastpath, the function netdev_alloc_skb rarely fails. Therefore, a likely() optimization is added to this error check conditional. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02selftests/net: fix bugs in address and port initializationSowmini Varadhan
Address/port initialization should work correctly regardless of the order in which command line arguments are supplied, E.g, cfg_port should be used to connect to the remote host even if it is processed after -D, src/dst address initialization should not require that [-4|-6] be specified before the -S or -D args, receiver should be able to bind to *.<cfg_port> Achieve this by making sure that the address/port structures are initialized after all command line options are parsed. Store cfg_port in host-byte order, and use htons() to set up the sin_port/sin6_port before bind/connect, so that the network system calls get the correct values in network-byte order. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02Merge branch 'net-sched-Fix-RED-qdisc-offload-flag'David S. Miller
Nogah Frankel says: ==================== net: sched: Fix RED qdisc offload flag Replacing the RED offload flag (TC_RED_OFFLOADED) with the generic one (TCQ_F_OFFLOADED) deleted some of the logic behind it. This patchset fixes this problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: sched: Move offload check till after dump callNogah Frankel
Move the check of the offload state to after the qdisc dump action was called, so the qdisc could update it if it was changed. Fixes: 7a4fa29106d9 ("net: sched: Add TCA_HW_OFFLOAD") Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net_sch: red: Fix the new offload indicationNogah Frankel
Update the offload flag, TCQ_F_OFFLOADED, in each dump call (and ignore the offloading function return value in relation to this flag). This is done because a qdisc is being initialized, and therefore offloaded before being grafted. Since the ability of the driver to offload the qdisc depends on its location, a qdisc can be offloaded and un-offloaded by graft calls, that doesn't effect the qdisc itself. Fixes: 428a68af3a7c ("net: sched: Move to new offload indication in RED" Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02sky2: Replace mdelay with msleep in sky2_vpd_waitJia-Ju Bai
sky2_vpd_wait is not called in an interrupt handler nor holding a spinlock. The function mdelay in it can be replaced with msleep, to reduce busy wait. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: ptr_ring: otherwise safe empty checks can overrun array boundsJohn Fastabend
When running consumer and/or producer operations and empty checks in parallel its possible to have the empty check run past the end of the array. The scenario occurs when an empty check is run while __ptr_ring_discard_one() is in progress. Specifically after the consumer_head is incremented but before (consumer_head >= ring_size) check is made and the consumer head is zeroe'd. To resolve this, without having to rework how consumer/producer ops work on the array, simply add an extra dummy slot to the end of the array. Even if we did a rework to avoid the extra slot it looks like the normal case checks would suffer some so best to just allocate an extra pointer. Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-31Merge branch 'bpf-offload-report-dev'Daniel Borkmann
Jakub Kicinski says: ==================== This series is a redo of reporting offload device information to user space after the first attempt did not take into account name spaces. As requested by Kirill offloads are now protected by an r/w sem. This allows us to remove the workqueue and free the offload state fully when device is removed (suggested by Alexei). Net namespace is reported with a device/inode pair. The accompanying bpftool support is placed in common code because maps will have very similar info. Note that the UAPI information can't be nicely encapsulated into a struct, because in case we need to grow the device information the new fields will have to be added at the end of struct bpf_prog_info, we can't grow structures in the middle of bpf_prog_info. v3: - use dev_get_by_index(); - redo ns code (new patch 6). v2: - rework the locking in patch 1 (use RCU instead of locking dependencies); - grab RTNL for a short time in patch 6; - minor update to the test in patch 8. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31selftests/bpf: test device info reporting for bound progsJakub Kicinski
Check if bound programs report correct device info. Test in local namespace, in remote one, back to the local ns, remove the device and check that information is cleared. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31tools: bpftool: report device information for offloaded programsJakub Kicinski
Print the just-exposed device information about device to which program is bound. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31bpf: offload: report device information for offloaded programsJakub Kicinski
Report to the user ifindex and namespace information of offloaded programs. If device has disappeared return -ENODEV. Specify the namespace using dev/inode combination. CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31nsfs: generalize ns_get_path() for path resolution with a taskJakub Kicinski
ns_get_path() takes struct task_struct and proc_ns_ops as its parameters. For path resolution directly from a namespace, e.g. based on a networking device's net name space, we need more flexibility. Add a ns_get_path_cb() helper which will allow callers to use any method of obtaining the name space reference. Convert ns_get_path() to use ns_get_path_cb(). Following patches will bring a networking user. CC: Eric W. Biederman <ebiederm@xmission.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31bpf: offload: free program id when device disappearsJakub Kicinski
Bound programs are quite useless after their device disappears. They are simply waiting for reference count to go to zero, don't list them in BPF_PROG_GET_NEXT_ID by freeing their ID early. Note that orphaned offload programs will return -ENODEV on BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31bpf: offload: free prog->aux->offload when device disappearsJakub Kicinski
All bpf offload operations should now be under bpf_devs_lock, it's safe to free and clear the entire offload structure, not only the netdev pointer. __bpf_prog_offload_destroy() will no longer be called multiple times. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31bpf: offload: allow netdev to disappear while verifier is runningJakub Kicinski
To allow verifier instruction callbacks without any extra locking NETDEV_UNREGISTER notification would wait on a waitqueue for verifier to finish. This design decision was made when rtnl lock was providing all the locking. Use the read/write lock instead and remove the workqueue. Verifier will now call into the offload code, so dev_ops are moved to offload structure. Since verifier calls are all under bpf_prog_is_dev_bound() we no longer need static inline implementations to please builds with CONFIG_NET=n. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31bpf: offload: don't use prog->aux->offload as booleanJakub Kicinski
We currently use aux->offload to indicate that program is bound to a specific device. This forces us to keep the offload structure around even after the device is gone. Add a bool member to struct bpf_prog_aux to indicate if offload was requested. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31bpf: offload: don't require rtnl for dev list manipulationJakub Kicinski
We don't need the RTNL lock for all operations on offload state. We only need to hold it around ndo calls. The device offload initialization doesn't require it. The soon-to-come querying of the offload info will only need it partially. We will also be able to remove the waitqueue in following patches. Use struct rw_semaphore because map offload will require sleeping with the semaphore held for read. Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>