linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2014-03-06	bonding: correctly handle out of range parameters for lp_interval	Sasha Levin
	We didn't correctly check cases where the value for lp_interval is not within the legal range due to a missing table terminator. This would let userspace trigger a kernel panic by specifying a value out of range: echo -1 > /sys/devices/virtual/net/bond0/bonding/lp_interval Introduced by commit 4325b374f84 ("bonding: convert lp_interval to use the new option API"). Acked-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	drm/radeon/dpm: fix typo in EVERGREEN_SMC_FIRMWARE_HEADER_softRegisters	Alex Deucher
	Should be at 0x8 rather than 0. fixes: https://bugzilla.kernel.org/show_bug.cgi?id=60523 Noticed by ArtForz on #radeon Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2014-03-06	drm/radeon/cik: fix typo in documentation	Alex Deucher
	Copy-paste typo. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-03-06	drm/radeon: silence GCC warning on 32 bit	Paul Bolle
	Building radeon_ttm.o on 32 bit x86 triggers a warning: In file included from include/asm-generic/bug.h:13:0, from [...]/arch/x86/include/asm/bug.h:38, from include/linux/bug.h:4, from include/drm/drm_mm.h:39, from include/drm/drm_vma_manager.h:26, from include/drm/ttm/ttm_bo_api.h:35, from drivers/gpu/drm/radeon/radeon_ttm.c:32: drivers/gpu/drm/radeon/radeon_ttm.c: In function 'radeon_ttm_gtt_read': include/linux/kernel.h:712:17: warning: comparison of distinct pointer types lacks a cast [enabled by default] (void) (&_min1 == &_min2); \ ^ drivers/gpu/drm/radeon/radeon_ttm.c:938:22: note: in expansion of macro 'min' ssize_t cur_size = min(size, PAGE_SIZE - off); ^ Silence this warning by using min_t(). Since cur_size will never be negative and its upper bound is PAGE_SIZE, we can change its type to size_t and use min_t(size_t, [...]) here. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2014-03-06	drm/radeon: resume old pm late	Alex Deucher
	Moving the pm resume up in the init order to fix dpm seems to have regressed somes cases with the old pm code. Move it back to late resume. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-03-06	drm/radeon: TTM must be init with cpu-visible VRAM, v2	Lauri Kasanen
	Without this, a bo may get created in the cpu-inaccessible vram. Before the CP engines get setup, all copies are done via cpu memcpy. This means that the cpu tries to read from inaccessible memory, fails, and the radeon module proceeds to disable acceleration. Doing this has no downsides, as the real VRAM size gets set as soon as the CP engines get init. This is a candidate for 3.14 fixes. v2: Add comment on why the function is used Signed-off-by: Lauri Kasanen <cand@gmx.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org
2014-03-06	ipv6: Fix exthdrs offload registration.	Anton Nayshtut
	Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds). This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS header. The issue detected and the fix verified by running MS HCK Offload LSO test on top of QEMU Windows guests, as this test sends IPv6 packets with IPPROTO_DSTOPTS. Signed-off-by: Anton Nayshtut <anton@swortex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	sparc: serial: Clean up the locking for -rt	David Miller
	Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Allen Pais <allen.pais@oracle.com>
2014-03-06	ibmveth: Fix endian issues with MAC addresses	Anton Blanchard
	The code to load a MAC address into a u64 for passing to the hypervisor via a register is broken on little endian. Create a helper function called ibmveth_encode_mac_addr which does the right thing in both big and little endian. We were storing the MAC address in a long in struct ibmveth_adapter. It's never used so remove it - we don't need another place in the driver where we create endian issues with MAC addresses. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	net: unix socket code abuses csum_partial	Anton Blanchard
	The unix socket code is using the result of csum_partial to hash into a lookup table: unix_hash_fold(csum_partial(sunaddr, len, 0)); csum_partial is only guaranteed to produce something that can be folded into a checksum, as its prototype explains: * returns a 32-bit number suitable for feeding into itself * or csum_tcpudp_magic The 32bit value should not be used directly. Depending on the alignment, the ppc64 csum_partial will return different 32bit partial checksums that will fold into the same 16bit checksum. This difference causes the following testcase (courtesy of Gustavo) to sometimes fail: #include <sys/socket.h> #include <stdio.h> int main() { int fd = socket(PF_LOCAL, SOCK_STREAM\|SOCK_CLOEXEC, 0); int i = 1; setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4); struct sockaddr addr; addr.sa_family = AF_LOCAL; bind(fd, &addr, 2); listen(fd, 128); struct sockaddr_storage ss; socklen_t sslen = (socklen_t)sizeof(ss); getsockname(fd, (struct sockaddr)&ss, &sslen); fd = socket(PF_LOCAL, SOCK_STREAM\|SOCK_CLOEXEC, 0); if (connect(fd, (struct sockaddr)&ss, sslen) == -1){ perror(NULL); return 1; } printf("OK\n"); return 0; } As suggested by davem, fix this by using csum_fold to fold the partial 32bit checksum into a 16bit checksum before using it. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	net: Improve SO_TIMESTAMPING documentation and fix a minor code bug	Andrew Lutomirski
	The original documentation was very unclear. The code fix is presumably related to the formerly unclear documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on __sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is set is pointless. This should have no user-observable effect. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	phy: fix compiler array bounds warning on settings[]	Bjorn Helgaas
	With -Werror=array-bounds, gcc v4.7.x warns that in phy_find_valid(), the settings[] "array subscript is above array bounds", I think because idx is a signed integer and if the caller supplied idx < 0, we pass the guard but still reference out of bounds. Fix this by making idx unsigned here and elsewhere. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	firewire: ohci: fix probe failure with Agere/LSI controllers	Stefan Richter
	Since commit bd972688eb24 "firewire: ohci: Fix 'failed to read phy reg' on FW643 rev8", there is a high chance that firewire-ohci fails to initialize LSI née Agere controllers. https://bugzilla.kernel.org/show_bug.cgi?id=65151 Peter Hurley points out the reason: IEEE 1394a:2000 clause 5A.1 (or IEEE 1394:2008 clause 17.2.1) say: "The PHY shall insure that no more than 10 ms elapse from the reassertion of LPS until the interface is reset. The link shall not assert LReq until the reset is complete." In other words, the link needs to give the PHY at least 10 ms to get the interface operational. With just the msleep(1) in bd972688eb24, the first read_phy_reg() during ohci_enable() may happen before the phy-link interface reset was finished, and fail. Due to the high variability of msleep(n) with small n, this failure was not fully reproducible, and not apparent at all with low CONFIG_HZ setting. On the other hand, Peter can no longer reproduce the issue with FW643 rev8. The read phy reg failures that happened back then may have had an unrelated cause. So, just revert bd972688eb24, except for the valid comment on TSB82AA2 cards. Reported-by: Mikhail Gavrilov Reported-by: Jay Fenlason <fenlason@redhat.com> Reported-by: Clemens Ladisch <clemens@ladisch.de> Reported-by: Peter Hurley <peter@hurleysoftware.com> Cc: stable@vger.kernel.org # v3.10+ Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2014-03-06	inet: frag: make sure forced eviction removes all frags	Florian Westphal
	Quoting Alexander Aring: While fragmentation and unloading of 6lowpan module I got this kernel Oops after few seconds: BUG: unable to handle kernel paging request at f88bbc30 [..] Modules linked in: ipv6 [last unloaded: 6lowpan] Call Trace: [<c012af4c>] ? call_timer_fn+0x54/0xb3 [<c012aef8>] ? process_timeout+0xa/0xa [<c012b66b>] run_timer_softirq+0x140/0x15f Problem is that incomplete frags are still around after unload; when their frag expire timer fires, we get crash. When a netns is removed (also done when unloading module), inet_frag calls the evictor with 'force' argument to purge remaining frags. The evictor loop terminates when accounted memory ('work') drops to 0 or the lru-list becomes empty. However, the mem accounting is done via percpu counters and may not be accurate, i.e. loop may terminate prematurely. Alter evictor to only stop once the lru list is empty when force is requested. Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Reported-by: Alexander Aring <alex.aring@gmail.com> Tested-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	MIPS: APRP: Choose the correct VPE loader by fixing the linking	Deng-Cheng Zhu
	Now we have CONFIG_MIPS_VPE_LOADER and CONFIG_MIPS_VPE_LOADER_[CMP\|MT]. The latter two are used by the 2 exclusive flavors. The vpe_run in malta-amon.c is for CMP APRP. Without the fix, this vpe_run will be used in MT APRP. Reviewed-by: Steven J. Hill <Steven.Hill@imgtec.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Cc: linux-mips@linux-mips.org Cc: john@phrozen.org Patchwork: https://patchwork.linux-mips.org/patch/6589/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-06	MIPS: APRP: Unregister rtlx interrupt hook at module exit	Deng-Cheng Zhu
	If the aprp_hook is not assigned back to NULL, it will still be called after module exits. This is not wanted. Reviewed-by: Steven J. Hill <Steven.Hill@imgtec.com> Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Cc: linux-mips@linux-mips.org Cc: john@phrozen.org Patchwork: https://patchwork.linux-mips.org/patch/6590/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-06	MIPS: APRP: Fix the linking of rtlx interrupt hook	Deng-Cheng Zhu
	There are 2 errors with the existing aprp_hook linking: - The prefix CONFIG_ is missing; - The hook should be linked exclusively in the cases of MT and CMP. Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Reviewed-by: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Cc: john@phrozen.org Patchwork: https://patchwork.linux-mips.org/patch/6588/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-06	MIPS: bcm47xx: Include missing errno.h for ENXIO	Markos Chandras
	Fixes the following build problen on allmodconfig: arch/mips/bcm47xx/board.c: In function 'bcm47xx_board_detect': arch/mips/bcm47xx/board.c:291:14: error: 'ENXIO' undeclared (first use in this function) Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/6571/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-06	MIPS: Alchemy: Fix unchecked kstrtoul return value	Manuel Lauss
	enabled __must_check logic triggers a build error for mtx1 and gpr in the prom init code. Fix by checking the kstrtoul() return value. Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Cc: Linux-MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/6574/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-06	MIPS: Fix randconfig build error.	Ralf Baechle
	CC arch/mips/kernel/ptrace.o In file included from arch/mips/kernel/ptrace.c:42:0: arch/mips/kernel/ptrace.c: In function ‘mips_get_syscall_arg’: /home/ralf/src/linux/linux-mips/arch/mips/include/asm/syscall.h:60:1: error: control reaches end of non-void function [-Werror=return-type] cc1: all warnings being treated as errors make[2]: * [arch/mips/kernel/ptrace.o] Error 1 make[1]: * [arch/mips/kernel] Error 2 make: *** [arch/mips] Error 2 Fixed by marking the end of mips_get_syscall_arg() as unreachable. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-03-06	Merge branch 'tipc'	David S. Miller
	Eric Hugne says: ==================== tipc: refcount and memory leak fixes v3: Remove error logging from data path completely. Rebased on top of latest net merge. v2: Drop specific -ENOMEM logging in patch #1 (tipc: allow connection shutdown callback to be invoked in advance) And add a general error message if an internal server tries to send a message on a closed/nonexisting connection. In addition to the fix for refcount leak and memory leak during module removal, we also fix a problem where the topology server listening socket where unexpectedly closed. We also eliminate an unnecessary context switch during accept()/recvmsg() for nonblocking sockets. It might be good to include this patchset in stable aswell. After the v3 rebase on latest merge from net all patches apply cleanly on that tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	tipc: don't log disabled tasklet handler errors	Erik Hugne
	Failure to schedule a TIPC tasklet with tipc_k_signal because the tasklet handler is disabled is not an error. It means TIPC is currently in the process of shutting down. We remove the error logging in this case. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	tipc: fix memory leak during module removal	Erik Hugne
	When the TIPC module is removed, the tasklet handler is disabled before all other subsystems. This will cause lingering publications in the name table because the node_down tasklets responsible to clean up publications from an unreachable node will never run. When the name table is shut down, these publications are detected and an error message is logged: tipc: nametbl_stop(): orphaned hash chain detected This is actually a memory leak, introduced with commit 993b858e37b3120ee76d9957a901cca22312ffaa ("tipc: correct the order of stopping services at rmmod") Instead of just logging an error and leaking memory, we free the orphaned entries during nametable shutdown. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	tipc: drop subscriber connection id invalidation	Erik Hugne
	When a topology server subscriber is disconnected, the associated connection id is set to zero. A check vs zero is then done in the subscription timeout function to see if the subscriber have been shut down. This is unnecessary, because all subscription timers will be cancelled when a subscriber terminates. Setting the connection id to zero is actually harmful because id zero is the identity of the topology server listening socket, and can cause a race that leads to this socket being closed instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	tipc: avoid to unnecessary process switch under non-block mode	Ying Xue
	When messages are received via tipc socket under non-block mode, schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is, the process of receiving messages will be scheduled once although timeout value passed to schedule_timeout() is 0. The same issue exists in accept()/wait_for_accept(). To avoid this unnecessary process switch, we only call schedule_timeout() if the timeout value is non-zero. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	tipc: fix connection refcount leak	Ying Xue
	When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a connection instance, its reference count value is increased if it's found. But subsequently if it's found that the connection is closed, the work of sending message is not queued into its server send workqueue, and the connection reference count is not decreased. This will cause a reference count leak. To reproduce this problem, an application would need to open and closes topology server connections with high intensity. We fix this by immediately decrementing the connection reference count if a send fails due to the connection being closed. Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	tipc: allow connection shutdown callback to be invoked in advance	Ying Xue
	Currently connection shutdown callback function is called when connection instance is released in tipc_conn_kref_release(), and receiving packets and sending packets are running in different threads. Even if connection is closed by the thread of receiving packets, its shutdown callback may not be called immediately as the connection reference count is non-zero at that moment. So, although the connection is shut down by the thread of receiving packets, the thread of sending packets doesn't know it. Before its shutdown callback is invoked to tell the sending thread its connection has been closed, the sending thread may deliver messages by tipc_conn_sendmsg(), this is why the following error information appears: "Sending subscription event failed, no memory" To eliminate it, allow connection shutdown callback function to be called before connection id is removed in tipc_close_conn(), which makes the sending thread know the truth in time that its socket is closed so that it doesn't send message to it. We also remove the "Sending XXX failed..." error reporting for topology and config services. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	l2tp: fix userspace reception on plain L2TP sockets	Guillaume Nault
	As pppol2tp_recv() never queues up packets to plain L2TP sockets, pppol2tp_recvmsg() never returns data to userspace, thus making the recv*() system calls unusable. Instead of dropping packets when the L2TP socket isn't bound to a PPP channel, this patch adds them to its reception queue. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	l2tp: fix manual sequencing (de)activation in L2TPv2	Guillaume Nault
	Commit e0d4435f "l2tp: Update PPP-over-L2TP driver to work over L2TPv3" broke the PPPOL2TP_SO_SENDSEQ setsockopt. The L2TP header length was previously computed by pppol2tp_l2t_header_len() before each call to l2tp_xmit_skb(). Now that header length is retrieved from the hdr_len session field, this field must be updated every time the L2TP header format is modified, or l2tp_xmit_skb() won't push the right amount of data for the L2TP header. This patch uses l2tp_session_set_header_len() to adjust hdr_len every time sequencing is (de)activated from userspace (either by the PPPOL2TP_SO_SENDSEQ setsockopt or the L2TP_ATTR_SEND_SEQ netlink attribute). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06	dm thin: fix Documentation for held metadata root feature	Mike Snitzer
	The Documentation for the thin provisioning target's held metadata root feature was incorrect. It is now available and the value for the held metadata root is in block units (not 512b sectors). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-03-06	x86, trace: Further robustify CR2 handling vs tracing	Peter Zijlstra
	Building on commit 0ac09f9f8cd1 ("x86, trace: Fix CR2 corruption when tracing page faults") this patch addresses another few issues: - Now that read_cr2() is lifted into trace_do_page_fault(), we should pass the address to trace_page_fault_entries() to avoid it re-reading a potentially changed cr2. - Put both trace_do_page_fault() and trace_page_fault_entries() under CONFIG_TRACING. - Mark both fault entry functions {,trace_}do_page_fault() as notrace to avoid getting __mcount or other function entry trace callbacks before we've observed CR2. - Mark __do_page_fault() as noinline to guarantee the function tracer does get to see the fault. Cc: <jolsa@redhat.com> Cc: <vincent.weaver@maine.edu> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06	mwifiex: save and copy AP's VHT capability info correctly	Amitkumar Karwar
	While preparing association request, intersection of device's VHT capability information and corresponding field advertised by AP is used. This patch fixes a couple errors while saving and copying vht_cap and vht_oper fields from AP's beacon. Cc: <stable@vger.kernel.org> # 3.9+ Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-03-06	mwifiex: copy AP's HT capability info correctly	Amitkumar Karwar
	While preparing association request, intersection of device's HT capability information and corresponding fields advertised by AP is used. This patch fixes an error while copying this field from AP's beacon. Cc: <stable@vger.kernel.org> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-03-06	Merge branch 'for-john' of ↵	John W. Linville
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
2014-03-06	ACPI / EC: Clear stale EC events on Samsung systems	Kieran Clancy
	A number of Samsung notebooks (530Uxx/535Uxx/540Uxx/550Pxx/900Xxx/etc) continue to log events during sleep (lid open/close, AC plug/unplug, battery level change), which accumulate in the EC until a buffer fills. After the buffer is full (tests suggest it holds 8 events), GPEs stop being triggered for new events. This state persists on wake or even on power cycle, and prevents new events from being registered until the EC is manually polled. This is the root cause of a number of bugs, including AC not being detected properly, lid close not triggering suspend, and low ambient light not triggering the keyboard backlight. The bug also seemed to be responsible for performance issues on at least one user's machine. Juan Manuel Cabo found the cause of bug and the workaround of polling the EC manually on wake. The loop which clears the stale events is based on an earlier patch by Lan Tianyu (see referenced attachment). This patch: - Adds a function acpi_ec_clear() which polls the EC for stale _Q events at most ACPI_EC_CLEAR_MAX (currently 100) times. A warning is logged if this limit is reached. - Adds a flag EC_FLAGS_CLEAR_ON_RESUME which is set to 1 if the DMI system vendor is Samsung. This check could be replaced by several more specific DMI vendor/product pairs, but it's likely that the bug affects more Samsung products than just the five series mentioned above. Further, it should not be harmful to run acpi_ec_clear() on systems without the bug; it will return immediately after finding no data waiting. - Runs acpi_ec_clear() on initialisation (boot), from acpi_ec_add() - Runs acpi_ec_clear() on wake, from acpi_ec_unblock_transactions() References: https://bugzilla.kernel.org/show_bug.cgi?id=44161 References: https://bugzilla.kernel.org/show_bug.cgi?id=45461 References: https://bugzilla.kernel.org/show_bug.cgi?id=57271 References: https://bugzilla.kernel.org/attachment.cgi?id=126801 Suggested-by: Juan Manuel Cabo <juanmanuel.cabo@gmail.com> Signed-off-by: Kieran Clancy <clancy.kieran@gmail.com> Reviewed-by: Lan Tianyu <tianyu.lan@intel.com> Reviewed-by: Dennis Jansen <dennis.jansen@web.de> Tested-by: Kieran Clancy <clancy.kieran@gmail.com> Tested-by: Juan Manuel Cabo <juanmanuel.cabo@gmail.com> Tested-by: Dennis Jansen <dennis.jansen@web.de> Tested-by: Maurizio D'Addona <mauritiusdadd@gmail.com> Tested-by: San Zamoyski <san@plusnet.pl> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06	cpufreq: Initialize governor for a new policy under policy->rwsem	Viresh Kumar
	policy->rwsem is used to lock access to all parts of code modifying struct cpufreq_policy, but it's not used on a new policy created by __cpufreq_add_dev(). Because of that, if cpufreq_update_policy() is called in a tight loop on one CPU in parallel with offline/online of another CPU, then the following crash can be triggered: Unable to handle kernel NULL pointer dereference at virtual address 00000020 pgd = c0003000 [00000020] pgd=80000000004003, pmd=00000000 Internal error: Oops: 206 [#1] PREEMPT SMP ARM PC is at __cpufreq_governor+0x10/0x1ac LR is at cpufreq_update_policy+0x114/0x150 ---[ end trace f23a8defea6cd706 ]--- Kernel panic - not syncing: Fatal exception CPU0: stopping CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396 [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) Fix that by taking locks at appropriate places in __cpufreq_add_dev() as well. Reported-by: Saravana Kannan <skannan@codeaurora.org> Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06	cpufreq: Initialize policy before making it available for others to use	Viresh Kumar
	Policy must be fully initialized before it is being made available for use by others. Otherwise cpufreq_cpu_get() would be able to grab a half initialized policy structure that might not have affected_cpus (for example) populated. Then, anybody accessing those fields will get a wrong value and that will lead to unpredictable results. In order to fix this, do all the necessary initialization before we make the policy structure available via cpufreq_cpu_get(). That will guarantee that any code accessing fields of the policy will get correct data from them. Reported-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06	cpufreq: use cpufreq_cpu_get() to avoid cpufreq_get() race conditions	Aaron Plattner
	If a module calls cpufreq_get while cpufreq is initializing, it's possible for it to be called after cpufreq_driver is set but before cpufreq_cpu_data is written during subsys_interface_register. This happens because cpufreq_get doesn't take the cpufreq_driver_lock around its use of cpufreq_cpu_data. Fix this by using cpufreq_cpu_get(cpu) to look up the policy rather than reading it out of cpufreq_cpu_data directly. cpufreq_cpu_get() takes the appropriate locks to prevent this race from happening. Since it's possible for policy to be NULL if the caller passes in an invalid CPU number or calls the function before cpufreq is initialized, delete the BUG_ON(!policy) and simply return 0. Don't try to return -ENOENT because that's negative and the function returns an unsigned integer. References: https://bbs.archlinux.org/viewtopic.php?id=177934 Signed-off-by: Aaron Plattner <aplattner@nvidia.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06	drm/i915: Fix PSR programming	Ben Widawsky
	\| has a higher precedence than ?. Therefore, the calculation doesn't do at all what you would expect. Thanks to Ken for convincing me that this was indeed the issue. Send me back to C programmer school, please. I'm sort of surprised PSR was continuing to work for people. It should be broken IMO (and it was broken for me, but I had assumed it never worked). Regression from: commit ed8546ac1f99b850879f07b1e9b06b42fb0a36d9 Author: Ben Widawsky <benjamin.widawsky@intel.com> Date: Mon Nov 4 22:45:05 2013 -0800 drm/i915/bdw: Support eDP PSR Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Kenneth Graunke <kenneth.w.graunke@intel.com> Cc: Art Runyan <arthur.j.runyan@intel.com> Reported-by: "Kumar, Kiran S" <kiran.s.kumar@intel.com> Cc: stable@vger.kernel.org [v3.13+] Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-03-06	clocksource: vf_pit_timer: use complement for sched_clock reading	Stefan Agner
	Vybrids PIT register is monitonic decreasing. However, sched_clock reading needs to be monitonic increasing. Use bitwise not to get the complement of the clock register. This fixes the clock going backward. Also, the clock now starts at 0 since we load the register with the maximum value at start. Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Shawn Guo <shawn.guo@linaro.org> Cc: daniel.lezcano@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Link: http://lkml.kernel.org/r/d25af915993aec1b486be653eb86f748ddef54fe.1394057313.git.stefan@agner.ch Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-06	ARM: KVM: fix non-VGIC compilation	Marc Zyngier
	Add a stub for kvm_vgic_addr when compiling without CONFIG_KVM_ARM_VGIC. The usefulness of this configurarion is extremely doubtful, but let's fix it anyway (until we decide that we'll always support a VGIC). Reported-by: Michele Paolino <m.paolino@virtualopensystems.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-05	clk: shmobile: rcar-gen2: Use kick bit to allow Z clock frequency change	Benoit Cousson
	The Z clock frequency change is effective only after setting the kick bit located in the FRQCRB register. Without that, the CA15 CPUs clock rate will never change. Fix that by checking if the kick bit is cleared and enable it to make the clock rate change effective. The bit is cleared automatically upon completion. Signed-off-by: Benoit Cousson <bcousson+renesas@baylibre.com> Acked-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2014-03-05	hyperv: Move state setting for link query	Haiyang Zhang
	It moves the state setting for query into rndis_filter_receive_response(). All callbacks including query-complete and status-callback are synchronized by channel->inbound_lock. This prevents pentential race between them. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	net: macb: DMA-unmap full rx-buffer	Soren Brinkmann
	When allocating RX buffers a fixed size is used, while freeing is based on actually received bytes, resulting in the following kernel warning when CONFIG_DMA_API_DEBUG is enabled: WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1051 check_unmap+0x258/0x894() macb e000b000.ethernet: DMA-API: device driver frees DMA memory with different size [device address=0x000000002d170040] [map size=1536 bytes] [unmap size=60 bytes] Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc3-xilinx-00220-g49f84081ce4f #65 [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14) [<c0011df8>] (show_stack) from [<c03c775c>] (dump_stack+0x7c/0xc8) [<c03c775c>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84) [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c) [<c0024670>] (warn_slowpath_fmt) from [<c0227d44>] (check_unmap+0x258/0x894) [<c0227d44>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70) [<c0228588>] (debug_dma_unmap_page) from [<c02ab78c>] (gem_rx+0x118/0x170) [<c02ab78c>] (gem_rx) from [<c02ac4d4>] (macb_poll+0x24/0x94) [<c02ac4d4>] (macb_poll) from [<c031222c>] (net_rx_action+0x6c/0x188) [<c031222c>] (net_rx_action) from [<c0028a28>] (__do_softirq+0x108/0x280) [<c0028a28>] (__do_softirq) from [<c0028e8c>] (irq_exit+0x84/0xf8) [<c0028e8c>] (irq_exit) from [<c000f360>] (handle_IRQ+0x68/0x8c) [<c000f360>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60) [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78) Exception stack(0xc056df20 to 0xc056df68) df20: 00000001 c0577430 00000000 c0577430 04ce8e0d 00000002 edfce238 00000000 df40: 04e20f78 00000002 c05981f4 00000000 00000008 c056df68 c0064008 c02d7658 df60: 20000013 ffffffff [<c0012904>] (__irq_svc) from [<c02d7658>] (cpuidle_enter_state+0x54/0xf8) [<c02d7658>] (cpuidle_enter_state) from [<c02d77dc>] (cpuidle_idle_call+0xe0/0x138) [<c02d77dc>] (cpuidle_idle_call) from [<c000f660>] (arch_cpu_idle+0x8/0x3c) [<c000f660>] (arch_cpu_idle) from [<c006bec4>] (cpu_startup_entry+0xbc/0x124) [<c006bec4>] (cpu_startup_entry) from [<c053daec>] (start_kernel+0x350/0x3b0) ---[ end trace d5fdc38641bd3a11 ]--- Mapped at: [<c0227184>] debug_dma_map_page+0x48/0x11c [<c02ab32c>] gem_rx_refill+0x154/0x1f8 [<c02ac7b4>] macb_open+0x270/0x3e0 [<c03152e0>] __dev_open+0x7c/0xfc [<c031554c>] __dev_change_flags+0x8c/0x140 Fixing this by passing the same size which is passed during mapping the memory to the unmap function as well. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	net: macb: Check DMA mappings for error	Soren Brinkmann
	With CONFIG_DMA_API_DEBUG enabled the following warning is printed: WARNING: CPU: 0 PID: 619 at lib/dma-debug.c:1101 check_unmap+0x758/0x894() macb e000b000.ethernet: DMA-API: device driver failed to check map error[device address=0x000000002d171c02] [size=322 bytes] [mapped as single] Modules linked in: CPU: 0 PID: 619 Comm: udhcpc Not tainted 3.14.0-rc3-xilinx-00219-gd158fc7f36a2 #63 [<c001516c>] (unwind_backtrace) from [<c0011df8>] (show_stack+0x10/0x14) [<c0011df8>] (show_stack) from [<c03c7714>] (dump_stack+0x7c/0xc8) [<c03c7714>] (dump_stack) from [<c00245cc>] (warn_slowpath_common+0x60/0x84) [<c00245cc>] (warn_slowpath_common) from [<c0024670>] (warn_slowpath_fmt+0x2c/0x3c) [<c0024670>] (warn_slowpath_fmt) from [<c0228244>] (check_unmap+0x758/0x894) [<c0228244>] (check_unmap) from [<c0228588>] (debug_dma_unmap_page+0x64/0x70) [<c0228588>] (debug_dma_unmap_page) from [<c02aba64>] (macb_interrupt+0x1f8/0x2dc) [<c02aba64>] (macb_interrupt) from [<c006c6e4>] (handle_irq_event_percpu+0x2c/0x178) [<c006c6e4>] (handle_irq_event_percpu) from [<c006c86c>] (handle_irq_event+0x3c/0x5c) [<c006c86c>] (handle_irq_event) from [<c006f548>] (handle_fasteoi_irq+0xb8/0x100) [<c006f548>] (handle_fasteoi_irq) from [<c006c148>] (generic_handle_irq+0x20/0x30) [<c006c148>] (generic_handle_irq) from [<c000f35c>] (handle_IRQ+0x64/0x8c) [<c000f35c>] (handle_IRQ) from [<c0008528>] (gic_handle_irq+0x3c/0x60) [<c0008528>] (gic_handle_irq) from [<c0012904>] (__irq_svc+0x44/0x78) Exception stack(0xed197f60 to 0xed197fa8) 7f60: 00000134 60000013 bd94362e bd94362e be96b37c 00000014 fffffd72 00000122 7f80: c000ebe4 ed196000 00000000 00000011 c032c0d8 ed197fa8 c0064008 c000ea20 7fa0: 60000013 ffffffff [<c0012904>] (__irq_svc) from [<c000ea20>] (ret_fast_syscall+0x0/0x48) ---[ end trace 478f921d0d542d1e ]--- Mapped at: [<c0227184>] debug_dma_map_page+0x48/0x11c [<c02aaca0>] macb_start_xmit+0x184/0x2a8 [<c03143c0>] dev_hard_start_xmit+0x334/0x470 [<c032c09c>] sch_direct_xmit+0x78/0x2f8 [<c0314814>] __dev_queue_xmit+0x318/0x708 due to missing checks of the dma mapping. Add the appropriate checks to fix this. Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunk	Daniel Borkmann
	While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable"), we noticed that there's a skb memory leakage in the error path. Running the same reproducer as in ec0223ec48a9 and by unconditionally jumping to the error label (to simulate an error condition) in sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about the unfreed chunk->auth_chunk skb clone: Unreferenced object 0xffff8800b8f3a000 (size 256): comm "softirq", pid 0, jiffies 4294769856 (age 110.757s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X......... backtrace: [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210 [<ffffffff81566929>] skb_clone+0x49/0xb0 [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp] [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp] [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp] [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210 [<ffffffff815a64af>] nf_reinject+0xbf/0x180 [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue] [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink] [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0 [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink] [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250 [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0 [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0 [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380 What happens is that commit bbd0d59809f9 clones the skb containing the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case that an endpoint requires COOKIE-ECHO chunks to be authenticated: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- ------------------ AUTH; COOKIE-ECHO ----------------> <-------------------- COOKIE-ACK --------------------- When we enter sctp_sf_do_5_1D_ce() and before we actually get to the point where we process (and subsequently free) a non-NULL chunk->auth_chunk, we could hit the "goto nomem_init" path from an error condition and thus leave the cloned skb around w/o freeing it. The fix is to centrally free such clones in sctp_chunk_destroy() handler that is invoked from sctp_chunk_free() after all refs have dropped; and also move both kfree_skb(chunk->auth_chunk) there, so that chunk->auth_chunk is either NULL (since sctp_chunkify() allocs new chunks through kmem_cache_zalloc()) or non-NULL with a valid skb pointer. chunk->skb and chunk->auth_chunk are the only skbs in the sctp_chunk structure that need to be handeled. While at it, we should use consume_skb() for both. It is the same as dev_kfree_skb() but more appropriately named as we are not a device but a protocol. Also, this effectively replaces the kfree_skb() from both invocations into consume_skb(). Functions are the same only that kfree_skb() assumes that the frame was being dropped after a failure (e.g. for tools like drop monitor), usage of consume_skb() seems more appropriate in function sctp_chunk_destroy() though. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <yasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	r8152: disable the ECM mode	hayeswang
	There are known issues for switching the drivers between ECM mode and vendor mode. The interrup transfer may become abnormal. The hardware may have the opportunity to die if you change the configuration without unloading the current driver first, because all the control transfers of the current driver would fail after the command of switching the configuration. Although to use the ecm driver and vendor driver independently is fine, it may have problems to change the driver from one to the other by switching the configuration. Additionally, now the vendor mode driver is more powerful than the ECM driver. Thus, disable the ECM mode driver, and let r8152 to set the configuration to vendor mode and reset the device automatically. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	net/mlx4: Support shutdown() interface	Gavin Shan
	In kexec scenario, we failed to load the mlx4 driver in the second kernel because the ownership bit was hold by the first kernel without release correctly. The patch adds shutdown() interface so that the ownership can be released correctly in the first kernel. It also helps avoiding EEH error happened during boot stage of the second kernel because of undesired traffic, which can't be handled by hardware during that stage on Power platform. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Tested-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	bridge: multicast: add sanity check for query source addresses	Linus Lüssing
	MLD queries are supposed to have an IPv6 link-local source address according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch adds a sanity check to ignore such broken MLD queries. Without this check, such malformed MLD queries can result in a denial of service: The queries are ignored by any MLD listener therefore they will not respond with an MLD report. However, without this patch these malformed MLD queries would enable the snooping part in the bridge code, potentially shutting down the according ports towards these hosts for multicast traffic as the bridge did not learn about these listeners. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Linus Lüssing <linus.luessing@web.de> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-05	net: fix for a race condition in the inet frag code	Nikolay Aleksandrov
	I stumbled upon this very serious bug while hunting for another one, it's a very subtle race condition between inet_frag_evictor, inet_frag_intern and the IPv4/6 frag_queue and expire functions (basically the users of inet_frag_kill/inet_frag_put). What happens is that after a fragment has been added to the hash chain but before it's been added to the lru_list (inet_frag_lru_add) in inet_frag_intern, it may get deleted (either by an expired timer if the system load is high or the timer sufficiently low, or by the fraq_queue function for different reasons) before it's added to the lru_list, then after it gets added it's a matter of time for the evictor to get to a piece of memory which has been freed leading to a number of different bugs depending on what's left there. I've been able to trigger this on both IPv4 and IPv6 (which is normal as the frag code is the same), but it's been much more difficult to trigger on IPv4 due to the protocol differences about how fragments are treated. The setup I used to reproduce this is: 2 machines with 4 x 10G bonded in a RR bond, so the same flow can be seen on multiple cards at the same time. Then I used multiple instances of ping/ping6 to generate fragmented packets and flood the machines with them while running other processes to load the attacked machine. *It is very important to have the _same flow_ coming in on multiple CPUs concurrently. Usually the attacked machine would die in less than 30 minutes, if configured properly to have many evictor calls and timeouts it could happen in 10 minutes or so. An important point to make is that any caller (frag_queue or timer) of inet_frag_kill will remove both the timer refcount and the original/guarding refcount thus removing everything that's keeping the frag from being freed at the next inet_frag_put. All of this could happen before the frag was ever added to the LRU list, then it gets added and the evictor uses a freed fragment. An example for IPv6 would be if a fragment is being added and is at the stage of being inserted in the hash after the hash lock is released, but before inet_frag_lru_add executes (or is able to obtain the lru lock) another overlapping fragment for the same flow arrives at a different CPU which finds it in the hash, but since it's overlapping it drops it invoking inet_frag_kill and thus removing all guarding refcounts, and afterwards freeing it by invoking inet_frag_put which removes the last refcount added previously by inet_frag_find, then inet_frag_lru_add gets executed by inet_frag_intern and we have a freed fragment in the lru_list. The fix is simple, just move the lru_add under the hash chain locked region so when a removing function is called it'll have to wait for the fragment to be added to the lru_list, and then it'll remove it (it works because the hash chain removal is done before the lru_list one and there's no window between the two list adds when the frag can get dropped). With this fix applied I couldn't kill the same machine in 24 hours with the same setup. Fixes: 3ef0eb0db4bf ("net: frag, move LRU list maintenance outside of rwlock") CC: Florian Westphal <fw@strlen.de> CC: Jesper Dangaard Brouer <brouer@redhat.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>