summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-05-14bpf, x64: clean up retpoline emission slightlyDaniel Borkmann
Make the RETPOLINE_{RA,ED}X_BPF_JIT() a bit more readable by cleaning up the macro, aligning comments and spacing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14bpf, sparc: remove unused variableDaniel Borkmann
Since fe83963b7c38 ("bpf, sparc64: remove ld_abs/ld_ind") it's not used anymore therefore remove it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14bpf, mips: remove unused functionDaniel Borkmann
The ool_skb_header_pointer() and size_to_len() is unused same as tmp_offset, therefore remove all of them. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-14net/mlx5e: Remove MLX5E_TEST_BIT macroGal Pressman
MLX5E_TEST_BIT macro is the same as the already existent test_bit, remove it and replace all usages. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Use test bit in en accel xmit flowGal Pressman
Replace (mask & bit) check with test_bit. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Use __set_bit for adaptive-moderation bit in RQ stateGal Pressman
Make the code more clear by replacing the existing code with __set_bit. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Report all channels with min RX WQEs timeoutEran Ben Elisha
Report all channels which got timeout on posting the minimal number of RX WQEs and not only the first one. Avoid busy wait on every channel, when one of the RQs check got timeout, poll once for the remaining RQs. In addition, add channel index to log when failed to get min RX WQEs This info is needed in order to debug in case of dysfunctional channel. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Support offloaded TC flows with no matches on headersOr Gerlitz
For example: tc filter add dev ens2f0_0 parent ffff: flower skip_sw action drop Note that for eswitch flows, we still always match on the source port. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Get the required HW match level while parsing TC flow matchesOr Gerlitz
Introduce levels of matching on headers of offloaded flows (none, L2, L3, L4) that follow the inline mode levels. This is pre-step for us to offload flows without any matches on headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Properly order min inline mode setup while parsing TC matchesOr Gerlitz
Set the initial value to none instead of L2, and set to L2 where the previous initial value was assumed. Make sure to parse L2 matches before L3 matches and L3 before L4. This is a pre-step to get the match level for more purposes other than the validating the needed vs. actual inline level. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Use local actions var while processing offloaded TC flow actionsOr Gerlitz
Use local actions variable while parsing the actions of offloaded TC flow. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Return success when TC offloaded fdb actions parsed okOr Gerlitz
Reaching here, means we didn't err anywhere, so lets just return success. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Avoid redundant zeroing of offloaded TC flow attributesOr Gerlitz
This is not needed as the attributes are zeroed out on allocation. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Clean static checker complaints on TC offload and VF reps codeOr Gerlitz
Clean warning/check complaints made by checkpatch on en_{tc,rep}.c Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Remove double defined DMAC header re-write elementOr Gerlitz
The firmware DMAC_47_16 header re-write token was defined twice, clean it up. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Use bool as return type for mlx5e_xdp_handleTariq Toukan
Function returns boolean values, use bool instead of int. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Use u8 instead of int for LRO number of segmentsTariq Toukan
Range of LRO number of segments fits in u8. Also, bring initialization and declaration together to save code. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Skip redundant checks when providing NUD lastuse feedbackRoi Dayan
It's redundant to continue the loop if we found one flow whose lastuse value being newer than the last one we reported, since this is enough for us to trigger a NUD update (neigh_event_send()). Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14net/mlx5e: Remove redundant vport context vlan updateSaeed Mahameed
In delete vlan flow an extra call to mlx5e_vport_context_update_vlans was added by mistake, remove it. Fixes: 86d722ad2c3b ("net/mlx5: Use flow steering infrastructure for mlx5_en") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-05-14samples/bpf: xdp_monitor, accept short optionsPrashant Bhole
Updated optstring parameter for getopt_long() to accept short options. Also updated usage() function. Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14Merge branch 'bpf-stackmap-nmi'Daniel Borkmann
Song Liu says: ==================== Changes v2 -> v3: Improve syntax based on suggestion by Tobin C. Harding. Changes v1 -> v2: 1. Rename some variables to (hopefully) reduce confusion; 2. Check irq_work status with IRQ_WORK_BUSY (instead of work->sem); 3. In Kconfig, let BPF_SYSCALL select IRQ_WORK; 4. Add static to DEFINE_PER_CPU(); 5. Remove pr_info() in stack_map_init(). ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14bpf: add selftest for stackmap with build_id in NMI contextSong Liu
This new test captures stackmap with build_id with hardware event PERF_COUNT_HW_CPU_CYCLES. Because we only support one ips-to-build_id lookup per cpu in NMI context, stack_amap will not be able to do the lookup in this test. Therefore, we didn't do compare_stack_ips(), as it will alwasy fail. urandom_read.c is extended to run configurable cycles so that it can be caught by the perf event. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14bpf: enable stackmap with build_id in nmi contextSong Liu
Currently, we cannot parse build_id in nmi context because of up_read(&current->mm->mmap_sem), this makes stackmap with build_id less useful. This patch enables parsing build_id in nmi by putting the up_read() call in irq_work. To avoid memory allocation in nmi context, we use per cpu variable for the irq_work. As a result, only one irq_work per cpu is allowed. If the irq_work is in-use, we fallback to only report ips. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-14cxgb4: do not fail vf instatiation in slave modeArjun Vynipadath
We no longer require a check for cxgb4 to be MASTER when configuring SRIOV, It was required when we had module parameter to instantiate vf. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14mlxsw: spectrum_span: Support LAG under mirror-to-gretapPetr Machata
When resolving a path that the packet will take after being encapsulated in mirror-to-gretap scenarios, one of the devices en route could be a LAG. In that case, mirror to first up slave that corresponds to a front panel port. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: ethernet: ti: Use ERR_CAST instead of ERR_PTR(PTR_ERR())Hernán Gonzalez
Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)). drivers/net/ethernet/ti/cpts.c:567:9-16: WARNING: ERR_CAST can be used with cpts->refclk Generated by: scripts/coccinelle/api/err_cast.cocci Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14sched: cls: enable verbose loggingMarcelo Ricardo Leitner
Currently, when the rule is not to be exclusively executed by the hardware, extack is not passed along and offloading failures don't get logged. The idea was that hardware failures are okay because the rule will get executed in software then and this way it doesn't confuse unware users. But this is not helpful in case one needs to understand why a certain rule failed to get offloaded. Considering it may have been a temporary failure, like resources exceeded or so, reproducing it later and knowing that it is triggering the same reason may be challenging. The ultimate goal is to improve Open vSwitch debuggability when using flower offloading. This patch adds a new flag to enable verbose logging. With the flag set, extack will be passed to the driver, which will be able to log the error. As the operation itself probably won't fail (not because of this, at least), current iproute will already log it as a Warning. The flag is generic, so it can be reused later. No need to restrict it just for HW offloading. The command line will follow the syntax that tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning. For example: # ./tc qdisc add dev p7p1 ingress # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \ flower verbose \ src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \ src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop Warning: TC offload is disabled on net device. # echo $? 0 # ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \ flower \ src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \ src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop # echo $? 0 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14Merge branch 'stmmac-dwmac-sun8i-Support-R40'David S. Miller
Chen-Yu Tsai says: ==================== net: stmmac: dwmac-sun8i: Support R40 This is a resend of the patches for net-next split out from my R40 Ethernet support v2 series, as requested by David Miller. The arm-soc bits will follow, once I rework the A64 system controller compatible. Patches 1, 2, and 3 clean up the dwmac-sun8i binding. Patch 4 adds device tree binding for Allwinner R40's Ethernet controller. Patch 5 converts regmap access of the syscon region in the dwmac-sun8i driver to regmap_field, in anticipation of different field widths on the R40. Patch 6 introduces custom plumbing in the dwmac-sun8i driver to fetch a regmap from another device, by looking up said device via a phandle, then getting the regmap associated with that device. Patch 7 adds support for different or absent TX/RX delay chain ranges to the dwmac-sun8i driver. Patch 8 adds support for the R40's ethernet controller. Excerpt from original cover letter: Changes since v1: - Default to fetching regmap from device pointed to by syscon phandle, and falling back to syscon API if that fails. - Dropped .syscon_from_dev field in device data as a result of the previous change. - Added a large comment block explaining the first change. - Simplified description of syscon property in sun8i-dwmac binding. - Regmap now only exposes the EMAC/GMAC register, but retains the offset within its address space. - Added patches for A64, which reuse the same sun8i-dwmac changes. This series adds support for the DWMAC based Ethernet controller found on the Allwinner R40 SoC. The controller is either a DWMAC clone or DWMAC core with its registers rearranged. This is already supported by the dwmac-sun8i driver. The glue layer control registers, unlike other sun8i family SoCs, is not in the system controller region, but in the clock control unit, like with the older A20 and A31 SoCs. While we reuse the bindings for dwmac-sun8i using a syscon phandle reference, we need some custom plumbing for the clock driver to export a regmap that only allows access to the GMAC register to the dwmac-sun8i driver. An alternative would be to allow drivers to register custom syscon devices with their own regmap and locking. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: stmmac: dwmac-sun8i: Add support for GMAC on Allwinner R40 SoCChen-Yu Tsai
The Allwinner R40 SoC has the EMAC controller supported by dwmac-sun8i. It is named "GMAC", while EMAC refers to the 10/100 Mbps Ethernet controller supported by sun4i-emac. The controller is the same, but the R40 has the glue layer controls in the clock control unit (CCU), with a reduced RX delay chain, and no TX delay chain. This patch adds support for it using the framework laid out by previous patches to map the differences. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: stmmac: dwmac-sun8i: Support different ranges for TX/RX delay chainsChen-Yu Tsai
On the R40 SoC, the RX delay chain only has a range of 0~7 (hundred picoseconds), instead of 0~31. Also the TX delay chain is completely absent. This patch adds support for different ranges by adding per-compatible maximum values in the variant data. A maximum of 0 indicates that the delay chain is not supported or absent. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: stmmac: dwmac-sun8i: Allow getting syscon regmap from external deviceChen-Yu Tsai
On the Allwinner R40 SoC, the "GMAC clock" register is in the CCU address space. Using a standard syscon to access it provides no coordination with the CCU driver for register access. Neither does it prevent this and other drivers from accessing other, maybe critical, clock control registers. On other SoCs, the register is in the "system control" address space, which might also contain controls for mapping SRAM to devices or the CPU. This hardware has the same issues. Instead, for these types of setups, we let the device containing the control register create a regmap tied to it. We can then get the device from the existing syscon phandle, and retrieve the regmap with dev_get_regmap(). Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: stmmac: dwmac-sun8i: Use regmap_field for syscon register accessChen-Yu Tsai
On the Allwinner R40, the "GMAC clock" register is located in the CCU block, at a different register address than the other SoCs that have it in the "system control" block. This patch converts the use of regmap to regmap_field for mapping and accessing the syscon register, so we can have the register address in the variants data, and not in the actual register manipulation code. This patch only converts regmap_read() and regmap_write() calls to regmap_field_read() and regmap_field_write() calls. There are some places where it might make sense to switch to regmap_field_update_bits(), but this is not done here to keep the patch simple. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: Add binding for GMAC on Allwinner R40 SoCChen-Yu Tsai
The Allwinner R40 SoC has the EMAC controller supported by dwmac-sun8i. It is named "GMAC", while EMAC refers to the 10/100 Mbps Ethernet controller supported by sun4i-emac. The controller is the same, but the R40 has the glue layer controls in the clock control unit (CCU), with a reduced RX delay chain, and no TX delay chain. This patch adds the R40 specific bits to the dwmac-sun8i binding. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: simplify description of syscon propertyChen-Yu Tsai
The syscon property is used to point to the device that holds the glue layer control register known as the "EMAC (or GMAC) clock register". We do not need to explicitly list what compatible strings are needed, as this information is readily available in the user manuals. Also the "syscon" device type is more of an implementation detail. There are many ways to access a register not in a device's address range, the syscon interface being the most generic and unrestricted one. Simplify the description so that it says what it is supposed to describe. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: Sort syscon compatibles by alphabetical orderChen-Yu Tsai
The A83T syscon compatible was appended to the syscon compatibles list, instead of inserted in to preserve the ordering. Move it to the proper place to keep the list sorted. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptionsChen-Yu Tsai
The clock delay chains found in the glue layer for dwmac-sun8i are only used with RGMII PHYs. They are not intended for non-RGMII PHYs, such as MII external PHYs or the internal PHY. Also, a recent SoC has a smaller range of possible values for the delay chain. This patch reformats the delay chain section of the device tree binding to make it clear that the delay chains only apply to RGMII PHYs, and make it easier to add the R40-specific bits later. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14Merge branch 'dsa-mv88e6xxx-remove-Global-1-setup'David S. Miller
Vivien Didelot says: ==================== net: dsa: mv88e6xxx: remove Global 1 setup The mv88e6xxx driver is still writing arbitrary registers at setup time, e.g. priority override bits. Add ops for them and provide specific setup functions for priority and stats before getting rid of the erroneous mv88e6xxx_g1_setup code, as previously done with Global 2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: dsa: mv88e6xxx: add a stats setup functionVivien Didelot
Now that the Global 1 specific setup function only setup the statistics unit, kill it in favor of a mv88e6xxx_stats_setup function. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: dsa: mv88e6xxx: add IEEE and IP mapping opsVivien Didelot
All Marvell switch families except 88E6390 have direct registers in Global 1 for IEEE and IP priorities override mapping. The 88E6390 uses indirect tables instead. Add .ieee_pri_map and .ip_pri_map ops to distinct that and call them from a mv88e6xxx_pri_setup helper. Only non-6390 are concerned ATM. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14net: dsa: mv88e6xxx: use helper for 6390 histogramVivien Didelot
The Marvell 88E6390 model has its histogram mode bits moved in the Global 1 Control 2 register. Use the previously introduced mv88e6xxx_g1_ctl2_mask helper to set them. At the same time complete the documentation of the said register. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-05-14 This series contains updates to virtchnl, i40e and i40evf. Bruce cleans up whitespace and unnecessary parentheses in virtchnl. Jake does a number of stat cleanups in the i40e driver, including cleanup of code indentation, whitespace issues, remove duplicate stats, fix grammar in code comment and general spring cleaning of the statistics code. Patryk fixes an issue where we recalculate vectors left and vectors wanted but do not take into account the reduced number of queue pairs per VSI. Harshitha adds tx_busy stat to ethtool stats to track the number of times we return NETDEV_TX_BUSY to the stack during transmit. Paweł fixes a potential system crash when unloading the VF driver after a hardware reset. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14Merge branch ↵David S. Miller
'kernel-add-support-to-collect-hardware-logs-in-crash-recovery-kernel' Rahul Lakkireddy says: ==================== kernel: add support to collect hardware logs in crash recovery kernel On production servers running variety of workloads over time, kernel panic can happen sporadically after days or even months. It is important to collect as much debug logs as possible to root cause and fix the problem, that may not be easy to reproduce. Snapshot of underlying hardware/firmware state (like register dump, firmware logs, adapter memory, etc.), at the time of kernel panic will be very helpful while debugging the culprit device driver. This series of patches add new generic framework that enable device drivers to collect device specific snapshot of the hardware/firmware state of the underlying device in the crash recovery kernel. In crash recovery kernel, the collected logs are added as elf notes to /proc/vmcore, which is copied by user space scripts for post-analysis. The sequence of actions done by device drivers to append their device specific hardware/firmware logs to /proc/vmcore are as follows: 1. During probe (before hardware is initialized), device drivers register to the vmcore module (via vmcore_add_device_dump()), with callback function, along with buffer size and log name needed for firmware/hardware log collection. 2. vmcore module allocates the buffer with requested size. It adds an elf note and invokes the device driver's registered callback function. 3. Device driver collects all hardware/firmware logs into the buffer and returns control back to vmcore module. The device specific hardware/firmware logs can be seen as elf notes with note type 0x700, as shown below: Displaying notes found at file offset 0x00001000 with length 0x040032c0: Owner Data size Description LINUX 0x02000fec Unknown note type: (0x00000700) LINUX 0x02000fec Unknown note type: (0x00000700) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) CORE 0x00000150 NT_PRSTATUS (prstatus structure) VMCOREINFO 0x00000785 Unknown note type: (0x00000000) Patch 1 adds API to vmcore module to allow drivers to register callback to collect the device specific hardware/firmware logs. The logs will be added to /proc/vmcore as elf notes. Patch 2 updates read and mmap logic to append device specific hardware/ firmware logs as elf notes. Patch 3 shows a cxgb4 driver example using the API to collect hardware/firmware logs in crash recovery kernel, before hardware is initialized. Thanks, Rahul --- v8: - Added missing linux/types.h header include. - Removed __vmcore_add_device_dump(). v7: - Removed "CHELSIO" vendor identifier in Elf Note name. Instead, writing "LINUX". - Moved vmcoredd_header to new file include/uapi/linux/vmcore.h - Reworked vmcoredd_header to include Elf Note as part of the header itself. - Removed vmcoredd_get_note_size(). - Renamed vmcoredd_write_note() to vmcoredd_write_header(). - Replaced all "unsigned long" with "unsigned int" for device dump size since max size of Elf Word is u32. v6: - Reworked device dump elf note name to contain vendor identifier. - Added vmcoredd_header that precedes actual dump in the Elf Note. - Device dump's name is moved inside vmcoredd_header. - Added "CHELSIO" string as vendor identifier in the Elf Note name for cxgb4 device dumps. v5: - Removed enabling CONFIG_PROC_VMCORE_DEVICE_DUMP by default and updated help message. v4: - Made __vmcore_add_device_dump() static. - Moved compile check to define vmcore_add_device_dump() to crash_dump.h to fix compilation when vmcore.c is not compiled in. - Convert ---help--- to help in Kconfig as indicated by checkpatch. - Rebased to tip. v3: - Dropped sysfs crashdd module. - Exported dumps as elf notes. Suggested by Eric Biederman <ebiederm@xmission.com>. Added as patch 2 in this version. - Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device dump support. - Moved logic related to adding dumps from crashdd to vmcore module. - Rename all crashdd* to vmcoredd*. - Updated comments. v2: - Added ABI Documentation for crashdd. - Directly use octal permission instead of macro. Changes since rfc v2: - Moved exporting crashdd from procfs to sysfs. Suggested by Stephen Hemminger <stephen@networkplumber.org> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory. - Replaced all proc API with sysfs API and updated comments. - Calling driver callback before creating the binary file under crashdd sysfs. - Changed binary dump file permission from S_IRUSR to S_IRUGO. - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP. rfc v2: - Collecting logs in 2nd kernel instead of during kernel panic. Suggested by Eric Biederman <ebiederm@xmission.com>. - Added new crashdd module that exports /proc/crashdd/ containing driver's registered hardware/firmware logs in patch 1. - Replaced the API to allow drivers to register their hardware/firmware log collect routine in crash recovery kernel in patch 1. - Updated patch 2 to use the new API in patch 1. ==================== Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14cxgb4: collect hardware dump in second kernelRahul Lakkireddy
Register callback to collect hardware/firmware dumps in second kernel before hardware/firmware is initialized. The dumps for each device will be available as elf notes in /proc/vmcore in second kernel. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14vmcore: append device dumps to vmcore as elf notesRahul Lakkireddy
Update read and mmap logic to append device dumps as additional notes before the other elf notes. We add device dumps before other elf notes because the other elf notes may not fill the elf notes buffer completely and we will end up with zero-filled data between the elf notes and the device dumps. Tools will then try to decode this zero-filled data as valid notes and we don't want that. Hence, adding device dumps before the other elf notes ensure that zero-filled data can be avoided. This also ensures that the device dumps and the other elf notes can be properly mmaped at page aligned address. Incorporate device dump size into the total vmcore size. Also update offsets for other program headers after the device dumps are added. Suggested-by: Eric Biederman <ebiederm@xmission.com>. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14vmcore: add API to collect hardware dump in second kernelRahul Lakkireddy
The sequence of actions done by device drivers to append their device specific hardware/firmware logs to /proc/vmcore are as follows: 1. During probe (before hardware is initialized), device drivers register to the vmcore module (via vmcore_add_device_dump()), with callback function, along with buffer size and log name needed for firmware/hardware log collection. 2. vmcore module allocates the buffer with requested size. It adds an Elf note and invokes the device driver's registered callback function. 3. Device driver collects all hardware/firmware logs into the buffer and returns control back to vmcore module. Ensure that the device dump buffer size is always aligned to page size so that it can be mmaped. Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more generic and reserve NT_VMCOREDD note type to indicate vmcore device dump. Suggested-by: Eric Biederman <ebiederm@xmission.com>. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-14i40evf: Fix a hardware reset support in VF driverPaweł Jabłoński
This patch fixes a hardware reset support in VF driver. It is needed because when a hardware reset is detected adapter->state is in __I40EVF_RESETTING state before i40evf_reset_task is called. Without this patch unloading VF driver after a hardware reset ends with a system crash. Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-14i40e: free the skb after clearing the bitlockJacob Keller
In commit bbc4e7d273b5 ("i40e: fix race condition with PTP_TX_IN_PROGRESS bits") we modified the code which handles Tx timestamps so that we would clear the progress bit as soon as possible. A later commit 0bc0706b46cd ("i40e: check for Tx timestamp timeouts during watchdog") introduced similar code for detecting and handling cleanup of a blocked Tx timestamp. This code did not use the same pattern for cleaning up the skb. Update this code to wait to free the skb until after the bit lock is free, by first setting the ptp_tx_skb to NULL and clearing the lock. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-14i40e: cleanup wording in a header commentJacob Keller
Fix up the English in the header comment for i40e_ptp_tx_hang. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-14i40evf: remove MAX_QUEUES and just use I40EVF_MAX_REQ_QUEUESJacob Keller
We don't really need to have separate definitions for MAX_QUEUES and I40EVF_MAX_REQ_QUEUES, since we'll always be limited by how many queues we request anyways. If we haven't enabled requesting the maximum number of queues, there's no reason to have our call to alloc_etherdev_mq actually pass the higher value, since we'd never enable those queues anyways. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-05-14i40e: add tx_busy to ethtool statsHarshitha Ramamurthy
This patch adds the tx_busy stat to the ethtool stats. The tx_busy stat tracks the number of times we return NETDEV_TX_BUSY to the stack during transmit. Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>