Age | Commit message (Collapse) | Author |
|
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This fixes a crash when changing rss with multiple traffic flows.
While interface teardown, disable tx queues after all NAPI threads
are done. If done otherwise tx queues might be woken up inside NAPI
if any CQE_TX are processed.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a txq (SQ) remains in stopped state after this timeout its
considered as stuck and interface is reinited.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously TXQ is wakedup whenever napi is executed
and irrespective of if any CQE_TX are processed or not.
Added 'txq_stop' and 'txq_wake' counters to aid in debugging
if there are any future issues.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Suppressing standard alloc_pages() warnings. Some kernel configs limit
alloc size and the network driver may fail. Do not drop a kernel
warning in this case, instead just drop a oneliner that the network
driver could not be loaded since the buffer could not be allocated.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixing TSO packages not being counted.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix for memory leak when changing queue/channel count via ethtool
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With earlier configured value sufficient number of CQEs are not
being reserved for transmitted packets. Hence under heavy incoming
traffic load, receive notifications will take away most of the CQ
thus transmit notifications will be lost resulting in tx skbs not
being freed.
Finally SQ will be full and it will be stopped, watchdog timer
will kick in. After this fix receive notifications will not take
morethan half of CQ reserving the rest for transmit notifications.
Also changed CQ & SQ sizes from 16k to 4k.
This is also due to the receive notifications taking first half of
CQ under heavy load and time taken by NAPI to clear transmit notifications
will increase with higher queue sizes. Again results in SQ being stopped.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixed 'tso_hdrs' memory not being freed properly.
Also fixed SQ skbuff maintenance issues.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Switching back to LDD transactions from LDWB.
While transmitting packets out with LDWB transactions
data integrity issues are seen very frequently.
hence switching back to LDD.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current driver always uses vlan-promisc mode, i.e., it receives both
tagged and untagged traffic and lets the network stack drop packets
tagged with unrequested vlan tags.
This patch implements vlan-filtering offload in the driver -
Unless explicitly configured to promisc mode, only untagged packets or
packets tagged with requested vlans would reach the Rx flow.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Amir Vadai says:
====================
net/mlx5e: Driver update 29-Jul-2015
This patchset contain bug fixes and code cleaning patches to the ConnectX-4
Ethernet driver.
Patchset was applied and tested over commit 8c1a91f ("Merge branch
'mlx4-802.1ad-accel'")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It was used to update netdev priv parameters that require stopping
and re-opening the device in a generic way - it got the new
parameters and did: ndo_stop(), copy new parameters into current
parameters, ndo_open().
We chose to remove it for two reasons:
1) It requires additional instance of struct mlx5e_params on the
stack and looking forward we expect this struct to grow.
2) Sometimes we want to do additional operations (besides
just updating the priv parameters) while the netdev is stopped.
For example, updating netdev->mtu @mlx5e_change_mtu() should
be done while the netdev is stopped (done in this commit).
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce access functions to create/destroy RSS indrection table
and use it in the Ethernet driver.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since it is un-named at this time.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the already defined rq pointer directly.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is not needed by the mlx5 Eth driver since it has a CQ per RQ/SQ.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This field already exists under the mlx5e_params struct
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The page size of the device's RQ/SQ/CQ objects is defined in 4K
units regardless of the system pages size.
Thus using the Linux's PAGE_SHIFT macro yields wrong device
configuration in systems where PAGE_SHIFT!=12.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlx5_cmd_exec() might fail - need to check return value.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This readds the config option CONFIG_OPENVSWITCH_VXLAN to avoid a
hard dependency of OVS on VXLAN. It moves the VXLAN config compat
code to vport-vxlan.c and allows compliation as a module.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: 2661371ace96 ("openvswitch: fix compilation when vxlan is a module")
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is the IPv6 equivalent of commit
6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change")
Without it, we keep buggy neighbours in the cache, with destination
MAC address equal to our own MAC address.
Tested:
tcpdump -i eth0 -s 0 ip6 -n -e &
ip link set dev eth0 arp off
ping6 remote // sends buggy frames
ip link set dev eth0 arp on
ping6 remote // should work once kernel is patched
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mario Fanelli <mariofanelli@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Field pktgen_dev.allocated_skbs had been written to, but never read
from. The number of allocated skbs can be deduced anyway, from the total
number of sent packets and the 'clone_skb' param.
Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allocate enough space so as not to force the outgoing net device to do
skb_realloc_headroom().
Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Any external user should use the registration API instead of
accessing this directly.
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As part of defconfig consolidation using fragments, we'd like to be
able to have the same drivers enabled on 32-bit and 64-bit. Gianfar
happens to only exist on 32-bit systems, and when building the
resulting 64-bit kernel warnings were produced.
A couple of the warnings are trivial, but the rfbptr code has deeper
issues. It uses the virtual address as the DMA address, which again,
happens to work in the environments where this driver is currently
used, but is not the right thing to do.
Fixes: 45b679c9a3cc ("gianfar: Implement PAUSE frame generation
support")
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tom Herbert says:
====================
net: Initialize sk_hash to random value and reset for failing cnxs
This patch set implements a common function to simply set sk_txhash to
a random number instead of going through the trouble to call flow
dissector. From dst_negative_advice we now reset the sk_txhash in hopes
of finding a better ECMP path through the network. Changing sk_txhash
affects:
- IPv6 flow label and UDP source port which affect ECMP in the network
- Local ECMP route selection (pending changes to use sk_txhash)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a connection is failing a transport protocol calls
dst_negative_advice to try to get a better route. This patch includes
changing the sk_txhash in that function. This provides a rudimentary
method to try to find a different path in the network since sk_txhash
affects ECMP on the local host and through the network (via flow labels
or UDP source port in encapsulation).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Writing a large value into a voltage limit attribute can result
in an overflow due to an auto-conversion from unsigned long to
unsigned int.
Cc: Constantine Shulyupin <const@MakeLinux.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
pwm attributes have well defined names, which should be used.
Cc: Vadim V. Vlasov <vvlasov@dev.rtsoft.ru>
Cc: stable@vger.kernel.org #v4.1+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
into drm-fixes
Fix for nasty crash on mdp4 in disable path, fix for dma-buf export,
smb leak on mdp5 which could result in intermittent modeset fails, and
don't let interrupted system call disturb atomic commit once we are
past the point of no return.
* 'msm-fixes-4.2' of git://people.freedesktop.org/~robclark/linux:
drm/msm/mdp5: release SMB (shared memory blocks) in various cases
drm/msm: change to uninterruptible wait in atomic commit
drm/msm: mdp4: Fix drm_framebuffer dereference crash
drm/msm: fix msm_gem_prime_get_sg_table()
|
|
into drm-fixes
Radeon and amdgpu fixes for 4.2. The audio fix ended up being more
invasive than I would have liked, but this should finally fix up the
last of the regressions since DP audio support was added.
* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: add new parameter to seperate map and unmap
drm/amdgpu: hdp_flush is not needed for inside IB
drm/amdgpu: different emit_ib for gfx and compute
drm/amdgpu: information leak in amdgpu_info_ioctl()
drm/amdgpu: clean up init sequence for failures
drm/radeon/combios: add some validation of lvds values
drm/radeon: rework audio modeset to handle non-audio hdmi features
drm/radeon: rework audio detect (v4)
drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h
drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h
|
|
Murali Karicheri says:
====================
net: netcp: bug fixes for dynamic module support
This series fixes few bugs to allow keystone netcp modules to be
dynamically loaded and removed. Currently it allows following
sequence multiple times
insmod cpsw_ale.ko
insmod davinci_mdio.ko
insmod keystone_netcp.ko
insmod keystone_netcp_ethss.ko
ifup eth0
ifup eth1
ping <hosts on eth0>
ping <hosts on eth1>
ifdown eth1
ifdown eth0
rmmod keystone_netcp_ethss.ko
rmmod keystone_netcp.ko
rmmod davinci_mdio.ko
rmmod cpsw_ale.ko
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch clean up error handle code to use goto label properly. In some
cases, the code unnecessarily use goto instead of just returning the error
code. Code also make explicit calls to devm_* APIs on error which is
not necessary. In the gbe_remove() also it makes similar calls which is
also unnecessary.
Also fix few checkpatch warnings
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code seems to assume a null is returned when the list is empty
from first_sec_slave() to break the loop which is incorrect. Fix the
code by using list_empty().
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently if user do rmmod keystone_netcp.ko following warning is
seen :-
[ 59.035891] ------------[ cut here ]------------
[ 59.040535] WARNING: CPU: 2 PID: 1619 at drivers/net/ethernet/ti/
netcp_core.c:2127 netcp_remove)
This is because the interface list is not cleaned up in netcp_remove.
This patch fixes this. Also fix some checkpatch related warnings.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix three regressions, two recent ones (cpufreq core and ACPI
device power management) and one introduced during the 4.1 cycle
(intel_pstate).
Specifics:
- Fix a recently introduced issue in the cpufreq core causing it to
attempt to create duplicate symbolic links to the policy directory
in sysfs for CPUs that are offline when the cpufreq driver is being
registered (Rafael J Wysocki)
- Fix a recently introduced problem in the ACPI device power
management core code causing it to store an incorrect value in the
device object's power.state field in some cases which in turn leads
to attempts to turn power resources off while they should still be
on going forward (Mika Westerberg)
- Fix an intel_pstate driver issue introduced during the 4.1 cycle
which leads to kernel panics on boot on Knights Landing chips due
to incomplete support for them in that driver (Lukasz Anaczkowski)"
* tag 'pm+acpi-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid attempts to create duplicate symbolic links
ACPI / PM: Use target_state to set the device power state
intel_pstate: Add get_scaling cpu_defaults param to Knights Landing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- fix DM thinp to consistently return -ENOSPC when out of data space
- fix a logic bug in the DM cache smq policy's creation error path
- revert a DM cache 4.2-rc3 change that reduced writeback efficiency
- fix a hang on DM cache device destruction due to improper
prealloc_used accounting introduced in 4.2-rc3
- update URL for dm-crypt wiki page
* tag 'dm-4.2-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix device destroy hang due to improper prealloc_used accounting
Revert "dm cache: do not wake_worker() in free_migration()"
dm crypt: update wiki page URL
dm cache policy smq: fix alloc_bitset check that always evaluates as false
dm thin: return -ENOSPC when erroring retry list due to out of data space
|
|
Radha Mohan Chintakuntla says:
====================
Add MDIO support to ThunderX NIC driver
This patch series adds MDIO support to ThunderX NIC driver by making use
of existing mdio-octeon driver. In the process modified the mdio-octeon
driver to work on both Octeon and ThunderX platforms.
* From v1:
- Removed default selection in Kconfig for MDIO_OCTEON
- Replace uint64 with u64 as suggested by David Daney
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The CONFIG_MDIO_OCTEON is required so that the ThunderX NIC driver can
talk to the PHY drivers.
Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes a possible crash in the octeon_mdiobus_probe function
if the return values are not handled properly.
Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch modifies the mdio-octeon driver to work on both ThunderX and
Octeon SoCs from Cavium Inc.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On some of the K2E and K2L platforms, the two DWORDs in
efuse occupied by the pre-programmed mac address for
slave port 1 are swapped. To workaround this issue,
this patch adds a new define NETCP_EFUSE_ADDR_SWAP (2)
which signifies the occurrence of such swapping so that
the driver can take proper action. The flag can be
enabled in the corresponding netcp interface dts binding
as efuse-mac = <2> under the corresponding netcp
interface node.
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
the following general protection fault out of an eBPF program with a simple
tail call, f.e. tracex5 (or a stripped down version of it):
[ 927.097918] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[...]
[ 927.100870] task: ffff8801f228b780 ti: ffff880016a64000 task.ti: ffff880016a64000
[ 927.102096] RIP: 0010:[<ffffffffa002440d>] [<ffffffffa002440d>] 0xffffffffa002440d
[ 927.103390] RSP: 0018:ffff880016a67a68 EFLAGS: 00010006
[ 927.104683] RAX: 5a5a5a5a5a5a5a5a RBX: 0000000000000000 RCX: 0000000000000001
[ 927.105921] RDX: 0000000000000000 RSI: ffff88014e438000 RDI: ffff880016a67e00
[ 927.107137] RBP: ffff880016a67c90 R08: 0000000000000000 R09: 0000000000000001
[ 927.108351] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880016a67e00
[ 927.109567] R13: 0000000000000000 R14: ffff88026500e460 R15: ffff880220a81520
[ 927.110787] FS: 00007fe7d5c1f740(0000) GS:ffff880265000000(0000) knlGS:0000000000000000
[ 927.112021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 927.113255] CR2: 0000003e7bbb91a0 CR3: 000000006e04b000 CR4: 00000000001407e0
[ 927.114500] Stack:
[ 927.115737] ffffc90008cdb000 ffff880016a67e00 ffff88026500e460 ffff880220a81520
[ 927.117005] 0000000100000000 000000000000001b ffff880016a67aa8 ffffffff8106c548
[ 927.118276] 00007ffcdaf22e58 0000000000000000 0000000000000000 ffff880016a67ff0
[ 927.119543] Call Trace:
[ 927.120797] [<ffffffff8106c548>] ? lookup_address+0x28/0x30
[ 927.122058] [<ffffffff8113d176>] ? __module_text_address+0x16/0x70
[ 927.123314] [<ffffffff8117bf0e>] ? is_ftrace_trampoline+0x3e/0x70
[ 927.124562] [<ffffffff810c1a0f>] ? __kernel_text_address+0x5f/0x80
[ 927.125806] [<ffffffff8102086f>] ? print_context_stack+0x7f/0xf0
[ 927.127033] [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
[ 927.128254] [<ffffffff810f7852>] ? __lock_acquire+0x572/0x2050
[ 927.129461] [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
[ 927.130654] [<ffffffff8119ee4a>] trace_call_bpf+0x8a/0x140
[ 927.131837] [<ffffffff8119edfa>] ? trace_call_bpf+0x3a/0x140
[ 927.133015] [<ffffffff8119f008>] kprobe_perf_func+0x28/0x220
[ 927.134195] [<ffffffff811a1668>] kprobe_dispatcher+0x38/0x60
[ 927.135367] [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
[ 927.136523] [<ffffffff81061400>] kprobe_ftrace_handler+0xf0/0x150
[ 927.137666] [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
[ 927.138802] [<ffffffff8117950c>] ftrace_ops_recurs_func+0x5c/0xb0
[ 927.139934] [<ffffffffa022b0d5>] 0xffffffffa022b0d5
[ 927.141066] [<ffffffff81174b91>] ? seccomp_phase1+0x1/0x230
[ 927.142199] [<ffffffff81174b95>] seccomp_phase1+0x5/0x230
[ 927.143323] [<ffffffff8102c0a4>] syscall_trace_enter_phase1+0xc4/0x150
[ 927.144450] [<ffffffff81174b95>] ? seccomp_phase1+0x5/0x230
[ 927.145572] [<ffffffff8102c0a4>] ? syscall_trace_enter_phase1+0xc4/0x150
[ 927.146666] [<ffffffff817f9a9f>] tracesys+0xd/0x44
[ 927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 21 83
c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 00 74
0a <48> 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 fd ff
ff 4c
[ 927.150046] RIP [<ffffffffa002440d>] 0xffffffffa002440d
The code section with the instructions that traps points into the eBPF JIT
image of the root program (the one invoking the tail call instruction).
Using bpf_jit_disasm -o on the eBPF root program image:
[...]
4e: mov -0x204(%rbp),%eax
8b 85 fc fd ff ff
54: cmp $0x20,%eax <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
83 f8 20
57: ja 0x000000000000007a
77 21
59: add $0x1,%eax <--- tail_call_cnt++
83 c0 01
5c: mov %eax,-0x204(%rbp)
89 85 fc fd ff ff
62: lea -0x80(%rsi,%rdx,8),%rax <--- prog = array->prog[index]
48 8d 44 d6 80
67: mov (%rax),%rax
48 8b 00
6a: cmp $0x0,%rax <--- check for NULL
48 83 f8 00
6e: je 0x000000000000007a
74 0a
70: mov 0x20(%rax),%rax <--- GPF triggered here! fetch of bpf_func
48 8b 40 20 [ matches <48> 8b 40 20 ... from above ]
74: add $0x33,%rax <--- prologue skip of new prog
48 83 c0 33
78: jmpq *%rax <--- jump to new prog insns
ff e0
[...]
The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
jump to map slot 0 is pointing to a poisoned page. The issue is the following:
lea instruction has a wrong offset, i.e. it should be ...
lea 0x80(%rsi,%rdx,8),%rax
... but it actually seems to be ...
lea -0x80(%rsi,%rdx,8),%rax
... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
to be positive instead of negative. Disassembling the interpreter, we btw
similarly do:
[...]
c88: lea 0x80(%rax,%rdx,8),%rax <--- prog = array->prog[index]
48 8d 84 d0 80 00 00 00
c90: add $0x1,%r13d
41 83 c5 01
c94: mov (%rax),%rax
48 8b 00
[...]
Now the other interesting fact is that this panic triggers only when things
like CONFIG_LOCKDEP are being used. In that case offsetof(struct bpf_array,
prog) starts at offset 0x80 and in non-CONFIG_LOCKDEP case at offset 0x50.
Reason is that the work_struct inside struct bpf_map grows by 48 bytes in my
case due to the lockdep_map member (which also has CONFIG_LOCK_STAT enabled
members).
Changing the emitter to always use the 4 byte displacement in the lea
instruction fixes the panic on my side. It increases the tail call instruction
emission by 3 more byte, but it should cover us from various combinations
(and perhaps other future increases on related structures).
After patch, disassembly:
[...]
9e: lea 0x80(%rsi,%rdx,8),%rax <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
48 8d 84 d6 80 00 00 00
a6: mov (%rax),%rax
48 8b 00
[...]
[...]
9e: lea 0x50(%rsi,%rdx,8),%rax <--- No CONFIG_LOCKDEP
48 8d 84 d6 50 00 00 00
a6: mov (%rax),%rax
48 8b 00
[...]
Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6
("tipc: reduce locking scope during packet reception") we introduced
a new function tipc_build_bcast_sync_msg(), which carries initial
synchronization data between two nodes at first contact and at
re-contact. In this function, we missed to add synchronization data,
with the effect that the broadcast link endpoints will fail to
synchronize correctly at re-contact between a running and a restarted
node. All other cases work as intended.
With this commit, we fix this bug.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Follow e8e85cc5eb57 ("packet: remove handling of tx_ring") and remove
the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only
called with tx_ring = 0.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since mdb states were introduced when deleting an entry the state was
left as it was set in the delete request from the user which leads to
the following output when doing a monitor (for example):
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
^^^
Note the "temp" state in the delete notification which is wrong since
the entry was permanent, the state in a delete is always reported as
"temp" regardless of the real state of the entry.
After this patch:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
There's one important note to make here that the state is actually not
matched when doing a delete, so one can delete a permanent entry by
stating "temp" in the end of the command, I've chosen this fix in order
not to break user-space tools which rely on this (incorrect) behaviour.
So to give an example after this patch and using the wrong state:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
Note the state of the entry that got deleted is correct in the
notification.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Holzheu says:
====================
s390/bpf: recache skb->data/hlen for skb_vlan_push/pop
Here the s390 backend for Alexei's patch 4e10df9a60d9 ("bpf: introduce
bpf_skb_vlan_push/pop() helpers") plus two bugfixes and two minor
improvements.
The first patch "s390/bpf: clear correct BPF accumulator register" will
also go upstream via Martin's "fixes" branch.
* v2: Integrated suggestions from Joe Perches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop
via helper functions. These functions may change skb->data/hlen.
This data is cached by s390 JIT to improve performance of ld_abs/ld_ind
instructions. Therefore after a change we have to reload the data.
In case of usage of skb_vlan_push/pop, in the prologue we store
the SKB pointer on the stack and restore it after BPF_JMP_CALL
to skb_vlan_push/pop.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|