summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-09-01staging: fsl-dpaa2/eth: Delay netdev_register() callIoana Radulescu
Only call netdev_register() at the end of the probe function, once all other necessary bits and pieces are properly initialized. We keep the rest of the netdevice initialization code in place, at the earlier point of the probing sequence, including the settings previously done in ndo_init. Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01x86/vdso: Fix lsl operand orderSamuel Neves
In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
2018-09-01Merge tag 'linux-watchdog-4.19-rc2' of ↵Linus Torvalds
git://www.linux-watchdog.org/linux-watchdog Pull watchdog fixlet from Wim Van Sebroeck: "Document support for r8a774a1" * tag 'linux-watchdog-4.19-rc2' of git://www.linux-watchdog.org/linux-watchdog: dt-bindings: watchdog: renesas-wdt: Document r8a774a1 support
2018-09-01Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Two small fixes, one for the x86 Stoney SoC to get a more accurate clk frequency and the other to fix a bad allocation in the Nuvoton NPCM7XX driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: x86: Set default parent to 48Mhz clk: npcm7xx: fix memory allocation
2018-09-01kernel/dma/direct: take DMA offset into account in dma_direct_supportedChristoph Hellwig
When a device has a DMA offset the dma capable result will change due to the difference between the physical and DMA address. Take that into account. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-09-01x86/mce: Fix set_mce_nospec() to avoid #GP faultLuckTony
The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while the 1:1 map is updated used to work when unmapping the page. But it falls down horribly when attempting to directly set the page as uncacheable. The problem is that when the cache mode is changed to uncachable, the pages needs to be flushed from the cache first. But the decoy address is non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a #GP fault. Add code to change_page_attr_set_clr() to fix the address before calling flush. Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
2018-08-31ibmvnic: Include missing return code checks in reset functionThomas Falcon
Check the return codes of these functions and halt reset in case of failure. The driver will remain in a dormant state until the next reset event, when device initialization will be re-attempted. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31selftests: pmtu: detect correct binary to ping ipv6 addressesSabrina Dubroca
Some systems don't have the ping6 binary anymore, and use ping for everything. Detect the absence of ping6 and try to use ping instead. Fixes: d1f1b9cbf34c ("selftests: net: Introduce first PMTU test") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31selftests: pmtu: maximum MTU for vti4 is 2^16-1-20Sabrina Dubroca
Since commit 82612de1c98e ("ip_tunnel: restore binding to ifaces with a large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of the mysterious constant 0xFFF8. This makes this selftest fail. Fixes: 82612de1c98e ("ip_tunnel: restore binding to ifaces with a large mtu") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31bridge: Switch to bitmap_zalloc()Andy Shevchenko
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31tcp: do not restart timewait timer on rst receptionFlorian Westphal
RFC 1337 says: ''Ignore RST segments in TIME-WAIT state. If the 2 minute MSL is enforced, this fix avoids all three hazards.'' So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk expire rather than removing it instantly when a reset is received. However, Linux will also re-start the TIME-WAIT timer. This causes connect to fail when tying to re-use ports or very long delays (until syn retry interval exceeds MSL). packetdrill test case: // Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode. `sysctl net.ipv4.tcp_rfc1337=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Receive first segment 0.310 < P. 1:1001(1000) ack 1 win 46 // Send one ACK 0.310 > . 1:1(0) ack 1001 // read 1000 byte 0.310 read(4, ..., 1000) = 1000 // Application writes 100 bytes 0.350 write(4, ..., 100) = 100 0.350 > P. 1:101(100) ack 1001 // ACK 0.500 < . 1001:1001(0) ack 101 win 257 // close the connection 0.600 close(4) = 0 0.600 > F. 101:101(0) ack 1001 win 244 // Our side is in FIN_WAIT_1 & waits for ack to fin 0.7 < . 1001:1001(0) ack 102 win 244 // Our side is in FIN_WAIT_2 with no outstanding data. 0.8 < F. 1001:1001(0) ack 102 win 244 0.8 > . 102:102(0) ack 1002 win 244 // Our side is now in TIME_WAIT state, send ack for fin. 0.9 < F. 1002:1002(0) ack 102 win 244 0.9 > . 102:102(0) ack 1002 win 244 // Peer reopens with in-window SYN: 1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7> // Therefore, reply with ACK. 1.000 > . 102:102(0) ack 1002 win 244 // Peer sends RST for this ACK. Normally this RST results // in tw socket removal, but rfc1337=1 setting prevents this. 1.100 < R 1002:1002(0) win 244 // second syn. Due to rfc1337=1 expect another pure ACK. 31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7> 31.0 > . 102:102(0) ack 1002 win 244 // .. and another RST from peer. 31.1 < R 1002:1002(0) win 244 31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT` // third syn after one minute. Time-Wait socket should have expired by now. 63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7> // so we expect a syn-ack & 3whs to proceed from here on. 63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> Without this patch, 'ss' shows restarts of tw timer and last packet is thus just another pure ack, more than one minute later. This restores the original code from commit 283fd6cf0be690a83 ("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git . For some reason the else branch was removed/lost in 1f28b683339f7 ("Merge in TCP/UDP optimizations and [..]") and timer restart became unconditional. Reported-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net/rds: RDS is not Radio Data SystemPavel Machek
Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is easily confused with Radio Data System (which we may want to support in kernel, too). Signed-off-by: Pavel Machek <pavel@ucw.cz> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()Dexuan Cui
This patch fixes the race between netvsc_probe() and rndis_set_subchannel(), which can cause a deadlock. These are the related 3 paths which show the deadlock: path #1: Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus] Call Trace: schedule schedule_preempt_disabled __mutex_lock __device_attach bus_probe_device device_add vmbus_device_register vmbus_onoffer vmbus_onmessage_work process_one_work worker_thread kthread ret_from_fork path #2: schedule schedule_preempt_disabled __mutex_lock netvsc_probe vmbus_probe really_probe __driver_attach bus_for_each_dev driver_attach_async async_run_entry_fn process_one_work worker_thread kthread ret_from_fork path #3: Workqueue: events netvsc_subchan_work [hv_netvsc] Call Trace: schedule rndis_set_subchannel netvsc_subchan_work process_one_work worker_thread kthread ret_from_fork Before path #1 finishes, path #2 can start to run, because just before the "bus_probe_device(dev);" in device_add() in path #1, there is a line "object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can immediately try to load hv_netvsc and hence path #2 can start to run. Next, path #2 offloads the subchannal's initialization to a workqueue, i.e. path #3, so we can end up in a deadlock situation like this: Path #2 gets the device lock, and is trying to get the rtnl lock; Path #3 gets the rtnl lock and is waiting for all the subchannel messages to be processed; Path #1 is trying to get the device lock, but since #2 is not releasing the device lock, path #1 has to sleep; since the VMBus messages are processed one by one, this means the sub-channel messages can't be procedded, so #3 has to sleep with the rtnl lock held, and finally #2 has to sleep... Now all the 3 paths are sleeping and we hit the deadlock. With the patch, we can make sure #2 gets both the device lock and the rtnl lock together, gets its job done, and releases the locks, so #1 and #3 will not be blocked for ever. Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net: dsa: mv88e6xxx: Share main switch IRQMarek Behún
On some boards the interrupt can be shared between multiple devices. For example on Turris Mox the interrupt is shared between all switches. Signed-off-by: Marek Behun <marek.behun@nic.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net/ipv6: Do not reset nl_net in ip6_route_info_createDavid Ahern
nl_net is set on entry to ip6_route_info_create. Only devices within that namespace are considered so no need to reset it before returning. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net/ipv4: Add extack message that dev is required for ONLINKDavid Ahern
Make IPv4 consistent with IPv6 and return an extack message that the ONLINK flag requires a nexthop device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31tcp: change IPv6 flow-label upon receiving spurious retransmissionYuchung Cheng
Currently a Linux IPv6 TCP sender will change the flow label upon timeouts to potentially steer away from a data path that has gone bad. However this does not help if the problem is on the ACK path and the data path is healthy. In this case the receiver is likely to receive repeated spurious retransmission because the sender couldn't get the ACKs in time and has recurring timeouts. This patch adds another feature to mitigate this problem. It leverages the DSACK states in the receiver to change the flow label of the ACKs to speculatively re-route the ACK packets. In order to allow triggering on the second consecutive spurious RTO, the receiver changes the flow label upon sending a second consecutive DSACK for a sequence number below RCV.NXT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31nfp: wait for posted reconfigs when disabling the deviceJakub Kicinski
To avoid leaking a running timer we need to wait for the posted reconfigs after netdev is unregistered. In common case the process of deinitializing the device will perform synchronous reconfigs which wait for posted requests, but especially with VXLAN ports being actively added and removed there can be a race condition leaving a timer running after adapter structure is freed leading to a crash. Add an explicit flush after deregistering and for a good measure a warning to check if timer is running just before structures are freed. Fixes: 3d780b926a12 ("nfp: add async reconfiguration mechanism") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Revert "packet: switch kvzalloc to allocate memory"Eric Dumazet
This reverts commit 71e41286203c017d24f041a7cd71abea7ca7b1e0. mmap()/munmap() can not be backed by kmalloced pages : We fault in : VM_BUG_ON_PAGE(PageSlab(page), page); unmap_single_vma+0x8a/0x110 unmap_vmas+0x4b/0x90 unmap_region+0xc9/0x140 do_munmap+0x274/0x360 vm_munmap+0x81/0xc0 SyS_munmap+0x2b/0x40 do_syscall_64+0x13e/0x1c0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: 71e41286203c ("packet: switch kvzalloc to allocate memory") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: John Sperbeck <jsperbeck@google.com> Bisected-by: John Sperbeck <jsperbeck@google.com> Cc: Zhang Yu <zhangyu31@baidu.com> Cc: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net_sched: add missing tcf_lock for act_connmarkCong Wang
According to the new locking rule, we have to take tcf_lock for both ->init() and ->dump(), as RTNL will be removed. However, it is missing for act_connmark. Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31veth: add software timestampingMichael Walle
Provide a software TX timestamp as well as the ethtool query interface and report the software timestamp capabilities. Tested with "ethtool -T" and two linuxptp instances each bound to a tunnel endpoint. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Revert "net: sched: act: add extack for lookup callback"Cong Wang
This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback"). This extack is never used after 6 months... In fact, it can be just set in the caller, right after ->lookup(). Cc: Alexander Aring <aring@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-09-01 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus. 2) BPF verifier improvements by giving each register its own liveness chain which allows to simplify and getting rid of skip_callee() logic, from Edward. 3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap and percpu lru hashmap. Also add generic percpu formatted print on bpftool so the same can be dumped there, from Yonghong. 4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and TCP_SAVED_SYN options to allow reflection of tos/tclass from received SYN packet, from Nikita. 5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2 interaction and removal of incorrect shutdown() calls, from John. 6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01xsk: i40e: get rid of useless struct xdp_umem_propsMagnus Karlsson
This commit gets rid of the structure xdp_umem_props. It was there to be able to break a dependency at one point, but this is no longer needed. The values in the struct are instead stored directly in the xdp_umem structure. This simplifies the xsk code as well as af_xdp zero-copy drivers and as a bonus gets rid of one internal header file. The i40e driver is also adapted to the new interface in this commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01i40e: fix possible compiler warning in xsk TX pathMagnus Karlsson
With certain gcc versions, it was possible to get the warning "'tx_desc' may be used uninitialized in this function" for the i40e_xmit_zc. This was not possible, however this commit simplifies the code path so that this warning is no longer emitted. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01bpf: add selftest for bpf's (set|get)_sockopt for SAVE_SYNNikita V. Shirokov
adding selftest for feature, introduced in commit 9452048c79404 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt"). Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01samples/bpf: xdpsock, minor fixesPrashant Bhole
- xsks_map size was fixed to 4, changed it MAX_SOCKS - Remove redundant definition of MAX_SOCKS in xdpsock_user.c - In dump_stats(), add NULL check for xsks[i] Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01xsk: remove unnecessary assignmentPrashant Bhole
Since xdp_umem_query() was added one assignment of bpf.command was missed from cleanup. Removing the assignment statement. Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample programNikita V. Shirokov
Sample program which shows TCP_SAVE_SYN/TCP_SAVED_SYN usage example: bpf program which is doing TOS/TCLASS reflection (server would reply with a same TOS/TCLASS as client). Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockoptNikita V. Shirokov
Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get). This would allow for bpf program to build logic based on data from ingress SYN packet (e.g. doing tcp's tos/ tclass reflection (see sample prog)) and do it transparently from userspace program point of view. Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-01xdp: remove redundant variable 'headroom'Colin Ian King
Variable 'headroom' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable ‘headroom’ set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-31Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-08-30 This series contains updates to i40e, i40evf and virtchnl. Jake implements helper functions to use an array to handle the queue stats which reduces the boiler plate code as well as keep the complexity localized to a few functions. Paweł adds the ability to change a VF's MAC address from the host side without having to reload the VF driver on the guest side. Paul adds a check to ensure that the number of queues that the PF sends to the VF is equal to or less than the maximum number of queues the VF can support. Mitch fixes an issue caught by GCC 8, where we need to not include the terminating null in the length of the string for strncpy(). Lihong fixes a VF issue to ensure that it does not enter into promiscuous mode when macvlan is added to the VF. Fixed a potential crash after a VF is removed, since the workqueue sync for the adminq task was not being cancelled. Harshitha fixes the type for field_flags in the virtchnl_filter struct. Martyna removes an unnecessary check in a conditional if statement. Björn fixes an issue reported by Jesper Dangaard Brouer, where the driver was reporting incorrect statistics when XDP was enabled. Jan fixes the potential reporting of incorrect speed settings. Patryk fixed an issue where the flag I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING was getting set when any offload is set via ethtool. Resolved by only setting this flag when VLAN offload is enabled. Also ensure we hold the rtnl lock when we are clearing the interrupt scheme. Added a check when deleting the MAC address from the VF to ensure that the MAC address was not set by the PF and if it was, do not delete it. v2: updated patch 2 in the series based on community feedback from David Miller to inline a function ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A few arm64 fixes came in this week, specifically fixing some nasty truncation of return values from firmware calls and resolving a VM_BUG_ON due to accessing uninitialised struct pages corresponding to NOMAP pages. Summary: - Fix typos in SVE documentation - Fix type-checking and implicit truncation for SMCCC calls - Force CONFIG_HOLES_IN_ZONE=y so that SLAB doesn't fall over NOMAP regions" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mm: always enable CONFIG_HOLES_IN_ZONE arm/arm64: smccc-1.1: Handle function result as parameters arm/arm64: smccc-1.1: Make return values unsigned long Documentation/arm64/sve: Couple of improvements and typos
2018-08-31x86/efi: Load fixmap GDT in efi_call_phys_epilog()Joerg Roedel
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap for the simple reason that this address is also mapped for user-space. The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT to call EFI runtime services and switch back to the kernel GDT when they return. But the switch-back uses the writable GDT, not the fixmap GDT. When that happened and and the CPU returns to user-space it switches to the user %cr3 and tries to restore user segment registers. This fails because the writable GDT is not mapped in the user page-table, and without a GDT the fault handlers also can't be launched. The result is a triple fault and reboot of the machine. Fix that by restoring the GDT back to the fixmap GDT which is also mapped in the user page-table. Fixes: 7757d607c6b3 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: hpa@zytor.com Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
2018-08-31Merge tag 'for-linus-4.19b-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - minor cleanup avoiding a warning when building with new gcc - a patch to add a new sysfs node for Xen frontend/backend drivers to make it easier to obtain the state of a pv device - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable PTEs * tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove redundant variable save_pud xen: export device state to sysfs x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear x86/xen: don't write ptes directly in 32-bit PV guests
2018-08-31Merge tag 'm68k-for-v4.19-tag2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k fix from Geert Uytterhoeven: "Just a single fix for a bug introduced during the merge window: fix wrong date and time on PMU-based Macs" * tag 'm68k-for-v4.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k/mac: Use correct PMU response format
2018-08-31Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - regression fixes for i801 and designware - better API and leak fix for releasing DMA safe buffers - better greppable strings for the bitbang algorithm * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: sh_mobile: fix leak when using DMA bounce buffer i2c: sh_mobile: define start_ch() void as it only returns 0 anyhow i2c: refactor function to release a DMA safe buffer i2c: algos: bit: make the error messages grepable i2c: designware: Re-init controllers with pm_disabled set on resume i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus
2018-08-31x86/nmi: Fix NMI uaccess race against CR3 switchingAndy Lutomirski
A NMI can hit in the middle of context switching or in the middle of switch_mm_irqs_off(). In either case, CR3 might not match current->mm, which could cause copy_from_user_nmi() and friends to read the wrong memory. Fix it by adding a new nmi_uaccess_okay() helper and checking it in copy_from_user_nmi() and in __copy_from_user_nmi()'s callers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@surriel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
2018-08-31x86: Allow generating user-space headers without a compilerBen Hutchings
When bootstrapping an architecture, it's usual to generate the kernel's user-space headers (make headers_install) before building a compiler. Move the compiler check (for asm goto support) to the archprepare target so that it is only done when building code for the target. Fixes: e501ce957a78 ("x86: Force asm-goto") Reported-by: Helmut Grohne <helmutg@debian.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
2018-08-31x86/dumpstack: Don't dump kernel memory based on usermode RIPJann Horn
show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: 7cccf0725cf7 ("x86/dumpstack: Add a show_ip() function") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
2018-08-31of: Add device_type access helper functionsRob Herring
In preparation to remove direct access to device_node.type, add of_node_is_type() and of_node_get_device_type() helpers to check and retrieve the device type. Cc: Frank Rowand <frowand.list@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
2018-08-31cpu/hotplug: Remove skip_onerr field from cpuhp_step structureMukesh Ojha
When notifiers were there, `skip_onerr` was used to avoid calling particular step startup/teardown callbacks in the CPU up/down rollback path, which made the hotplug asymmetric. As notifiers are gone now after the full state machine conversion, the `skip_onerr` field is no longer required. Remove it from the structure and its usage. Signed-off-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org
2018-08-31arm64: mm: always enable CONFIG_HOLES_IN_ZONEJames Morse
Commit 6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA") only enabled HOLES_IN_ZONE for NUMA systems because the NUMA code was choking on the missing zone for nomap pages. This problem doesn't just apply to NUMA systems. If the architecture doesn't set HAVE_ARCH_PFN_VALID, pfn_valid() will return true if the pfn is part of a valid sparsemem section. When working with multiple pages, the mm code uses pfn_valid_within() to test each page it uses within the sparsemem section is valid. On most systems memory comes in MAX_ORDER_NR_PAGES chunks which all have valid/initialised struct pages. In this case pfn_valid_within() is optimised out. Systems where this isn't true (e.g. due to nomap) should set HOLES_IN_ZONE and provide HAVE_ARCH_PFN_VALID so that mm tests each page as it works with it. Currently non-NUMA arm64 systems can't enable HOLES_IN_ZONE, leading to a VM_BUG_ON(): | page:fffffdff802e1780 is uninitialized and poisoned | raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff | raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff | page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) | ------------[ cut here ]------------ | kernel BUG at include/linux/mm.h:978! | Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] | CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7 | Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | pstate: 40000085 (nZcv daIf -PAN -UAO) | pc : move_freepages_block+0x144/0x248 | lr : move_freepages_block+0x144/0x248 | sp : fffffe0071177680 [...] | Process dd (pid: 25236, stack limit = 0x0000000094cc07fb) | Call trace: | move_freepages_block+0x144/0x248 | steal_suitable_fallback+0x100/0x16c | get_page_from_freelist+0x440/0xb20 | __alloc_pages_nodemask+0xe8/0x838 | new_slab+0xd4/0x418 | ___slab_alloc.constprop.27+0x380/0x4a8 | __slab_alloc.isra.21.constprop.26+0x24/0x34 | kmem_cache_alloc+0xa8/0x180 | alloc_buffer_head+0x1c/0x90 | alloc_page_buffers+0x68/0xb0 | create_empty_buffers+0x20/0x1ec | create_page_buffers+0xb0/0xf0 | __block_write_begin_int+0xc4/0x564 | __block_write_begin+0x10/0x18 | block_write_begin+0x48/0xd0 | blkdev_write_begin+0x28/0x30 | generic_perform_write+0x98/0x16c | __generic_file_write_iter+0x138/0x168 | blkdev_write_iter+0x80/0xf0 | __vfs_write+0xe4/0x10c | vfs_write+0xb4/0x168 | ksys_write+0x44/0x88 | sys_write+0xc/0x14 | el0_svc_naked+0x30/0x34 | Code: aa1303e0 90001a01 91296421 94008902 (d4210000) | ---[ end trace 1601ba47f6e883fe ]--- Remove the NUMA dependency. Link: https://www.spinics.net/lists/arm-kernel/msg671851.html Cc: <stable@vger.kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-08-31m68k/mac: Use correct PMU response formatFinn Thain
Now that the 68k Mac port has adopted the via-pmu driver, it must decode the PMU response accordingly otherwise the date and time will be wrong. Fixes: ebd722275f9cfc67 ("macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-08-30Merge tag 'drm-fixes-2018-08-31' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes pull: - Mediatek has a bunch of fixes to their RDMA and Overlay engines. - i915 has some Cannonlake/Geminilake watermark workarounds, LSPCON fix, HDCP free fix, audio fix and a ppgtt reference counting fix. - amdgpu has some SRIOV, Kasan, memory leaks and other misc fixes" * tag 'drm-fixes-2018-08-31' of git://anongit.freedesktop.org/drm/drm: (35 commits) drm/i915/audio: Hook up component bindings even if displays are disabled drm/i915: Increase LSPCON timeout drm/i915: Stop holding a ref to the ppgtt from each vma drm/i915: Free write_buf that we allocated with kzalloc. drm/i915: Fix glk/cnl display w/a #1175 drm/amdgpu: Need to set moved to true when evict bo drm/amdgpu: Remove duplicated power source update drm/amd/display: Fix memory leak caused by missed dc_sink_release drm/amdgpu: fix holding mn_lock while allocating memory drm/amdgpu: Power on uvd block when hw_fini drm/amdgpu: Update power state at the end of smu hw_init. drm/amdgpu: Fix vce initialize failed on Kaveri/Mullins drm/amdgpu: Enable/disable gfx PG feature in rlc safe mode drm/amdgpu: Adjust the VM size based on system memory size v2 drm/mediatek: fix connection from RDMA2 to DSI1 drm/mediatek: update some variable name from ovl to comp drm/mediatek: use layer_nr function to get layer number to init plane drm/mediatek: add function to return RDMA layer number drm/mediatek: add function to return OVL layer number drm/mediatek: add function to get layer number for component ...
2018-08-30disable stringop truncation warnings for nowStephen Rothwell
They are too noisy Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-30Merge tag 'pm-4.19-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These address a corner case in the menu cpuidle governor and fix error handling in the PM core's generic clock management code. Specifics: - Make the menu cpuidle governor avoid stopping the scheduler tick if the predicted idle duration exceeds the tick period length, but the selected idle state is shallow and deeper idle states with high target residencies are available (Rafael Wysocki). - Make the PM core's generic clock management code use a proper data type for one variable to make error handling work (Dan Carpenter)" * tag 'pm-4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle: menu: Retain tick when shallow state is selected PM / clk: signedness bug in of_pm_clk_add_clks()
2018-08-31Merge branch 'pm-core'Rafael J. Wysocki
Merge a generic clock management fix for 4.19-rc2. * pm-core: PM / clk: signedness bug in of_pm_clk_add_clks()
2018-08-30clk: x86: Set default parent to 48MhzAkshu Agrawal
System clk provided in ST soc can be set to: 48Mhz, non-spread 25Mhz, spread To get accurate rate, we need it to set it at non-spread option which is 48Mhz. Signed-off-by: Akshu Agrawal <akshu.agrawal@amd.com> Reviewed-by: Daniel Kurtz <djkurtz@chromium.org> Fixes: 421bf6a1f061 ("clk: x86: Add ST oscout platform clock") Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-08-30i2c: sh_mobile: fix leak when using DMA bounce bufferWolfram Sang
We only freed the bounce buffer after successful DMA, missing the cases where DMA setup may have gone wrong. Use a better location which always gets called after each message and use 'stop_after_dma' as a flag for a successful transfer. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>