summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-01Merge branch 'bpf-sockmap-fixes'Daniel Borkmann
John Fastabend says: ==================== This addresses two syzbot issues that lead to identifying (by Eric and Wei) a class of bugs where we don't correctly check for IPv4/v6 sockets and their associated state. The second issue was a locking omission in sockhash. The first patch addresses IPv6 socks and fixing an error where sockhash would overwrite the prot pointer with IPv4 prot. To fix this build similar solution to TLS ULP. Although we continue to allow socks in all states not just ESTABLISH in this patch set because as Martin points out there should be no issue with this on the sockmap ULP because we don't use the ctx in this code. Once multiple ULPs coexist we may need to revisit this. However we can do this in *next trees. The other issue syzbot found that the tcp_close() handler missed locking the hash bucket lock which could result in corrupting the sockhash bucket list if delete and close ran at the same time. And also the smap_list_remove() routine was not working correctly at all. This was not caught in my testing because in general my tests (to date at least lets add some more robust selftest in bpf-next) do things in the "expected" order, create map, add socks, delete socks, then tear down maps. The tests we have that do the ops out of this order where only working on single maps not multi- maps so we never saw the issue. Thanks syzbot. The fix is to restructure the tcp_close() lock handling. And fix the obvious bug in smap_list_remove(). Finally, during review I noticed the release handler was omitted from the upstream code (patch 4) due to an incorrect merge conflict fix when I ported the code to latest bpf-next before submitting. This would leave references to the map around if the user never closes the map. v3: rework patches, dropping ESTABLISH check and adding rcu annotation along with the smap_list_remove fix v4: missed one more case where maps was being accessed without the sk_callback_lock, spoted by Martin as well. v5: changed to use a specific lock for maps and reduced callback lock so that it is only used to gaurd sk callbacks. I think this makes the logic a bit cleaner and avoids confusion ovoer what each lock is doing. Also big thanks to Martin for thorough review he caught at least one case where I missed a rcu_call(). ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01bpf: sockhash, add release routineJohn Fastabend
Add map_release_uref pointer to hashmap ops. This was dropped when original sockhash code was ported into bpf-next before initial commit. Fixes: 81110384441a ("bpf: sockmap, add hash map support") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01bpf: sockhash fix omitted bucket lock in sock_closeJohn Fastabend
First the sk_callback_lock() was being used to protect both the sock callback hooks and the psock->maps list. This got overly convoluted after the addition of sockhash (in sockmap it made some sense because masp and callbacks were tightly coupled) so lets split out a specific lock for maps and only use the callback lock for its intended purpose. This fixes a couple cases where we missed using maps lock when it was in fact needed. Also this makes it easier to follow the code because now we can put the locking closer to the actual code its serializing. Next, in sock_hash_delete_elem() the pattern was as follows, sock_hash_delete_elem() [...] spin_lock(bucket_lock) l = lookup_elem_raw() if (l) hlist_del_rcu() write_lock(sk_callback_lock) .... destroy psock ... write_unlock(sk_callback_lock) spin_unlock(bucket_lock) The ordering is necessary because we only know the {p}sock after dereferencing the hash table which we can't do unless we have the bucket lock held. Once we have the bucket lock and the psock element it is deleted from the hashmap to ensure any other path doing a lookup will fail. Finally, the refcnt is decremented and if zero the psock is destroyed. In parallel with the above (or free'ing the map) a tcp close event may trigger tcp_close(). Which at the moment omits the bucket lock altogether (oops!) where the flow looks like this, bpf_tcp_close() [...] write_lock(sk_callback_lock) for each psock->maps // list of maps this sock is part of hlist_del_rcu(ref_hash_node); .... destroy psock ... write_unlock(sk_callback_lock) Obviously, and demonstrated by syzbot, this is broken because we can have multiple threads deleting entries via hlist_del_rcu(). To fix this we might be tempted to wrap the hlist operation in a bucket lock but that would create a lock inversion problem. In summary to follow locking rules the psocks maps list needs the sk_callback_lock (after this patch maps_lock) but we need the bucket lock to do the hlist_del_rcu. To resolve the lock inversion problem pop the head of the maps list repeatedly and remove the reference until no more are left. If a delete happens in parallel from the BPF API that is OK as well because it will do a similar action, lookup the lock in the map/hash, delete it from the map/hash, and dec the refcnt. We check for this case before doing a destroy on the psock to ensure we don't have two threads tearing down a psock. The new logic is as follows, bpf_tcp_close() e = psock_map_pop(psock->maps) // done with map lock bucket_lock() // lock hash list bucket l = lookup_elem_raw(head, hash, key, key_size); if (l) { //only get here if elmnt was not already removed hlist_del_rcu() ... destroy psock... } bucket_unlock() And finally for all the above to work add missing locking around map operations per above. Then add RCU annotations and use rcu_dereference/rcu_assign_pointer to manage values relying on RCU so that the object is not free'd from sock_hash_free() while it is being referenced in bpf_tcp_close(). Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com Fixes: 81110384441a ("bpf: sockmap, add hash map support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01bpf: sockmap, fix smap_list_map_remove when psock is in many mapsJohn Fastabend
If a hashmap is free'd with open socks it removes the reference to the hash entry from the psock. If that is the last reference to the psock then it will also be free'd by the reference counting logic. However the current logic that removes the hash reference from the list of references is broken. In smap_list_remove() we first check if the sockmap entry matches and then check if the hashmap entry matches. But, the sockmap entry sill always match because its NULL in this case which causes the first entry to be removed from the list. If this is always the "right" entry (because the user adds/removes entries in order) then everything is OK but otherwise a subsequent bpf_tcp_close() may reference a free'd object. To fix this create two list handlers one for sockmap and one for sockhash. Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com Fixes: 81110384441a ("bpf: sockmap, add hash map support") Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01bpf: sockmap, fix crash when ipv6 sock is addedJohn Fastabend
This fixes a crash where we assign tcp_prot to IPv6 sockets instead of tcpv6_prot. Previously we overwrote the sk->prot field with tcp_prot even in the AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot are used. Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously crashing case here. Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-30ACPICA: Drop leading newlines from error messagesRafael J. Wysocki
Commit 5088814a6e93 (ACPICA: AML parser: attempt to continue loading table after error) unintentionally added leading newlines to error messages emitted by ACPICA which caused unexpected things to be printed to the kernel log. Drop these newlines (which effectively reverts the part of commit 5088814a6e93 adding them). Fixes: 5088814a6e93 (ACPICA: AML parser: attempt to continue loading table after error) Reported-by: Toralf Förster <toralf.foerster@gmx.de> Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: 4.17+ <stable@vger.kernel.org> # 4.17+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-30PCI / ACPI / PM: Resume bridges w/o drivers on suspend-to-RAMRafael J. Wysocki
It is reported that commit c62ec4610c40 (PM / core: Fix direct_complete handling for devices with no callbacks) introduced a system suspend regression on Samsung 305V4A by allowing a PCI bridge (not a PCIe port) to stay in D3 over suspend-to-RAM, which is a side effect of setting power.direct_complete for the children of that bridge that have no PM callbacks. On the majority of systems PCI bridges are not allowed to be runtime-suspended (the power/control sysfs attribute is set to "on" for them by default), but user space can change that setting and if it does so and a given bridge has no children with PM callbacks, the direct_complete optimization will be applied to it and it will stay in suspend over system suspend. Apparently, that confuses the platform firmware on the affected machine and that may very well happen elsewhere, so avoid the direct_complete optimization for PCI bridges with no drivers (if there is a driver, it should take care of the PM handling) on suspend-to-RAM altogether (that should not matter for suspend-to-idle as platform firmware is not involved in it). Fixes: c62ec4610c40 (PM / core: Fix direct_complete handling for devices with no callbacks) Link: https://bugzilla.kernel.org/show_bug.cgi?id=199941 Reported-by: n0000b.n000b@gmail.com Tested-by: n0000b.n000b@gmail.com Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: 4.15+ <stable@vger.kernel.org> # 4.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-30Merge branch 'parisc-4.18-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes and cleanups from Helge Deller: "Nothing exiting in this patchset, just - small cleanups of header files - default to 4 CPUs when building a SMP kernel - mark 16kB and 64kB page sizes broken - addition of the new io_pgetevents syscall" * 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Build kernel without -ffunction-sections parisc: Reduce debug output in unwind code parisc: Wire up io_pgetevents syscall parisc: Default to 4 SMP CPUs parisc: Convert printk(KERN_LEVEL) to pr_lvl() parisc: Mark 16kB and 64kB page sizes BROKEN parisc: Drop struct sigaction from not exported header file
2018-06-30Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "A smaller batch for the end of the week (let's see if I can keep the weekly cadence going for once). All medium-grade fixes here, nothing worrisome: - Fixes for some fairly old bugs around SD card write-protect detection and GPIO interrupt assignments on Davinci. - Wifi module suspend fix for Hikey. - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one of which solves booting with third-party u-boot" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: dts: hikey960: Define wl1837 power capabilities arm64: dts: hikey: Define wl1835 power capabilities ARM64: dts: meson-gxl: fix Mali GPU compatible string ARM64: dts: meson-axg: fix ethernet stability issue ARM64: dts: meson-gx: fix ATF reserved memory region ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0 ARM64: dts: meson: fix register ranges for SD/eMMC ARM64: dts: meson: disable sd-uhs modes on the libretech-cc ARM: dts: da850: Fix interrups property for gpio ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD
2018-06-30Merge tag 'kbuild-fixes-v4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - introduce __diag_* macros and suppress -Wattribute-alias warnings from GCC 8 - fix stack protector test script for x86_64 - fix line number handling in Kconfig - document that '#' starts a comment in Kconfig - handle P_SYMBOL property in dump debugging of Kconfig - correct help message of LD_DEAD_CODE_DATA_ELIMINATION - fix occasional segmentation faults in Kconfig * tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kconfig: loop boundary condition fix kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION kconfig: handle P_SYMBOL in print_symbol() kconfig: document Kconfig source file comments kconfig: fix line numbers for if-entries in menu tree stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y powerpc: Remove -Wattribute-alias pragmas disable -Wattribute-alias warning for SYSCALL_DEFINEx() kbuild: add macro for controlling warnings to linux/compiler.h
2018-06-30Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "The biggest diffstat comes from self-test updates, plus there's entry code fixes, 5-level paging related fixes, console debug output fixes, and misc fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Clean up the printk()s in show_fault_oops() x86/mm: Drop unneeded __always_inline for p4d page table helpers x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y selftests/x86/sigreturn: Do minor cleanups selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80" x86/mm: Don't free P4D table when it is folded at runtime x86/entry/32: Add explicit 'l' instruction suffix x86/mm: Get rid of KERN_CONT in show_fault_oops()
2018-06-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Tooling fixes mostly, plus a build warning fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf/core: Move inline keyword at the beginning of declaration tools/headers: Pick up latest kernel ABIs perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE] perf script: Fix crash because of missing evsel->priv perf script: Add missing output fields in a hint perf bench: Fix numa report output code perf stat: Remove duplicate event counting perf alias: Rebuild alias expression string to make it comparable perf alias: Remove trailing newline when reading sysfs files perf tools: Fix a clang 7.0 compilation error tools include uapi: Synchronize bpf.h with the kernel tools include uapi: Update if_link.h to pick IFLA_{BRPORT_ISOLATED,VXLAN_TTL_INHERIT} tools include powerpc: Update arch/powerpc/include/uapi/asm/unistd.h copy to get 'rseq' syscall perf tools: Update x86's syscall_64.tbl, adding 'io_pgetevents' and 'rseq' tools headers uapi: Synchronize drm/drm.h perf intel-pt: Fix packet decoding of CYC packets perf tests: Add valid callback for parse-events test perf tests: Add event parsing error handling to parse events test perf report powerpc: Fix crash if callchain is empty perf test session topology: Fix test on s390 ...
2018-06-30Merge tag 'selinux-pr-20180629' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "One fairly straightforward patch to fix a longstanding issue where a process could stall while accessing files in selinuxfs and block everyone else due to a held mutex. The patch passes all our tests and looks to apply cleanly to your current tree" * tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: move user accesses in selinuxfs out of locked regions
2018-06-30Merge tag 'for-linus-20180629' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Small set of fixes for this series. Mostly just minor fixes, the only oddball in here is the sg change. The sg change came out of the stall fix for NVMe, where we added a mempool and limited us to a single page allocation. CONFIG_SG_DEBUG sort-of ruins that, since we'd need to account for that. That's actually a generic problem, since lots of drivers need to allocate SG lists. So this just removes support for CONFIG_SG_DEBUG, which I added back in 2007 and to my knowledge it was never useful. Anyway, outside of that, this pull contains: - clone of request with special payload fix (Bart) - drbd discard handling fix (Bart) - SATA blk-mq stall fix (me) - chunk size fix (Keith) - double free nvme rdma fix (Sagi)" * tag 'for-linus-20180629' of git://git.kernel.dk/linux-block: sg: remove ->sg_magic member drbd: Fix drbd_request_prepare() discard handling blk-mq: don't queue more if we get a busy return block: Fix cloning of requests with a special payload nvme-rdma: fix possible double free of controller async event buffer block: Fix transfer when chunk sectors exceeds max
2018-06-30net: fib_rules: bring back rule_exists to match rule during addRoopa Prabhu
After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find for existing rule lookup in both the add and del paths. While this is good for the delete path, it solves a few problems but opens up a few invalid key matches in the add path. $ip -4 rule add table main tos 10 fwmark 1 $ip -4 rule add table main tos 10 RTNETLINK answers: File exists The problem here is rule_find does not check if the key masks in the new and old rule are the same and hence ends up matching a more secific rule. Rule key masks cannot be easily compared today without an elaborate if-else block. Its best to introduce key masks for easier and accurate rule comparison in the future. Until then, due to fear of regressions this patch re-introduces older loose rule_exists during add. Also fixes both rule_exists and rule_find to cover missing attributes. Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30hv_netvsc: split sub-channel setup into async and syncStephen Hemminger
When doing device hotplug the sub channel must be async to avoid deadlock issues because device is discovered in softirq context. When doing changes to MTU and number of channels, the setup must be synchronous to avoid races such as when MTU and device settings are done in a single ip command. Reported-by: Thomas Walker <Thomas.Walker@twosigma.com> Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug") Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30net: use dev_change_tx_queue_len() for SIOCSIFTXQLENCong Wang
As noticed by Eric, we need to switch to the helper dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too, otheriwse still miss dev_qdisc_change_tx_queue_len(). Fixes: 6a643ddb5624 ("net: introduce helper dev_change_tx_queue_len()") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30atm: zatm: Fix potential Spectre v1Gustavo A. R. Silva
pool can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue 'zatm_dev->pool_info' (local cap) Fix this by sanitizing pool before using it to index zatm_dev->pool_info Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30Merge branch 's390-qeth-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2018-06-29 please apply a few qeth fixes for -net and your 4.17 stable queue. Patches 1-3 fix several issues wrt to MAC address management that were introduced during the 4.17 cycle. Patch 4 tackles a long-standing issue with busy multi-connection workloads on devices in af_iucv mode. Patch 5 makes sure to re-enable all active HW offloads, after a card was previously set offline and thus lost its HW context. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30s390/qeth: consistently re-enable device featuresJulian Wiedmann
commit e830baa9c3f0 ("qeth: restore device features after recovery") and commit ce3443564145 ("s390/qeth: rely on kernel for feature recovery") made sure that the HW functions for device features get re-programmed after recovery. But we missed that the same handling is also required when a card is first set offline (destroying all HW context), and then online again. Fix this by moving the re-enable action out of the recovery-only path. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30s390/qeth: don't clobber buffer on async TX completionJulian Wiedmann
If qeth_qdio_output_handler() detects that a transmit requires async completion, it replaces the pending buffer's metadata object (qeth_qdio_out_buffer) so that this queue buffer can be re-used while the data is pending completion. Later when the CQ indicates async completion of such a metadata object, qeth_qdio_cq_handler() tries to free any data associated with this object (since HW has now completed the transfer). By calling qeth_clear_output_buffer(), it erronously operates on the queue buffer that _previously_ belonged to this transfer ... but which has been potentially re-used several times by now. This results in double-free's of the buffer's data, and failing transmits as the buffer descriptor is scrubbed in mid-air. The correct way of handling this situation is to 1. scrub the queue buffer when it is prepared for re-use, and 2. later obtain the data addresses from the async-completion notifier (ie. the AOB), instead of the queue buffer. All this only affects qeth devices used for af_iucv HiperTransport. Fixes: 0da9581ddb0f ("qeth: exploit asynchronous delivery of storage blocks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]Vasily Gorbik
*ether_addr*_64bits functions have been introduced to optimize performance critical paths, which access 6-byte ethernet address as u64 value to get "nice" assembly. A harmless hack works nicely on ethernet addresses shoved into a structure or a larger buffer, until busted by Kasan on smth like plain (u8 *)[6]. qeth_l2_set_mac_address calls qeth_l2_remove_mac passing u8 old_addr[ETH_ALEN] as an argument. Adding/removing macs for an ethernet adapter is not that performance critical. Moreover is_multicast_ether_addr_64bits itself on s390 is not faster than is_multicast_ether_addr: is_multicast_ether_addr(%r2) -> %r2 llc %r2,0(%r2) risbg %r2,%r2,63,191,0 is_multicast_ether_addr_64bits(%r2) -> %r2 llgc %r2,0(%r2) risbg %r2,%r2,63,191,0 So, let's just use is_multicast_ether_addr instead of is_multicast_ether_addr_64bits. Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence") Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30s390/qeth: fix race when setting MAC addressJulian Wiedmann
When qeth_l2_set_mac_address() finds the card in a non-reachable state, it merely copies the new MAC address into dev->dev_addr so that __qeth_l2_set_online() can later register it with the HW. But __qeth_l2_set_online() may very well be running concurrently, so we can't trust the card state without appropriate locking: If the online sequence is past the point where it registers dev->dev_addr (but not yet in SOFTSETUP state), any address change needs to be properly programmed into the HW. Otherwise the netdevice ends up with a different MAC address than what's set in the HW, and inbound traffic is not forwarded as expected. This is most likely to occur for OSD in LPAR, where commit 21b1702af12e ("s390/qeth: improve fallback to random MAC address") now triggers eg. systemd to immediately change the MAC when the netdevice is registered with a NET_ADDR_RANDOM address. Fixes: bcacfcbc82b4 ("s390/qeth: fix MAC address update sequence") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30Revert "s390/qeth: use Read device to query hypervisor for MAC"Julian Wiedmann
This reverts commit b7493e91c11a757cf0f8ab26989642ee4bb2c642. On its own, querying RDEV for a MAC address works fine. But when upgrading from a qeth that previously queried DDEV on a z/VM NIC (ie. any kernel with commit ec61bd2fd2a2), the RDEV query now returns a _different_ MAC address than the DDEV query. If the NIC is configured with MACPROTECT, z/VM apparently requires us to use the MAC that was initially returned (on DDEV) and registered. So after upgrading to a kernel that uses RDEV, the SETVMAC registration cmd for the new MAC address fails and we end up with a non-operabel interface. To avoid regressions on upgrade, switch back to using DDEV for the MAC address query. The downgrade path (first RDEV, later DDEV) is fine, in this case both queries return the same MAC address. Fixes: b7493e91c11a ("s390/qeth: use Read device to query hypervisor for MAC") Reported-by: Michal Kubecek <mkubecek@suse.com> Tested-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30alx: take rtnl before calling __alx_open from resumeSabrina Dubroca
The __alx_open function can be called from ndo_open, which is called under RTNL, or from alx_resume, which isn't. Since commit d768319cd427, we're calling the netif_set_real_num_{tx,rx}_queues functions, which need to be called under RTNL. This is similar to commit 0c2cc02e571a ("igb: Move the calls to set the Tx and Rx queues into igb_open"). Fixes: d768319cd427 ("alx: enable multiple tx queues") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30sfc: correctly initialise filter rwsem for farchBert Kenward
Fixes: fc7a6c287ff3 ("sfc: use a semaphore to lock farch filters too") Suggested-by: Joseph Korty <joe.korty@concurrent-rt.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30net: phy: DP83TC811: Fix disabling interruptsDan Murphy
Fix a bug where INT_STAT1 was written twice and INT_STAT2 was ignored when disabling interrupts. Fixes: b753a9faaf9a ("net: phy: DP83TC811: Introduce support for the DP83TC811 phy") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30net/ipv6: Fix updates to prefix routeDavid Ahern
Sowmini reported that a recent commit broke prefix routes for linklocal addresses. The newly added modify_prefix_route is attempting to add a new prefix route when the ifp priority does not match the route metric however the check needs to account for the default priority. In addition, the route add fails because the route already exists, and then the delete removes the one that exists. Flip the order to do the delete first. Fixes: 8308f3ff1753 ("net/ipv6: Add support for specifying metric of connected routes") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30net: cleanup gfp mask in alloc_skb_with_fragsMichal Hocko
alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations which is just a noop and a little bit confusing. __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for high order allocations") to prevent from the OOM killer. Yet this was not enough because fb05e7a89f50 ("net: don't wait for order-3 page allocation") didn't want an excessive reclaim for non-costly orders so it made it completely NOWAIT while it preserved __GFP_NORETRY in place which is now redundant. Drop the pointless __GFP_NORETRY because this function is used as copy&paste source for other places. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30Merge branch 'DPAA-fixes'David S. Miller
Madalin Bucur says: ==================== DPAA fixes A couple of fixes for the DPAA drivers, addressing an issue with short UDP or TCP frames (with padding) that were marked as having a wrong checksum and dropped by the FMan hardware and a problem with the buffer used for the scatter-gather table being too small as per the hardware requirements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30dpaa_eth: DPAA SGT needs to be 256BMadalin Bucur
The DPAA HW requires that at least 256 bytes from the start of the first scatter-gather table entry are allocated and accessible. The hardware reads the maximum size the table can have in one access, thus requiring that the allocation and mapping to be done for the maximum size of 256B even if there is a smaller number of entries in the table. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30fsl/fman: fix parser reporting bad checksum on short framesMadalin Bucur
The FMan hardware parser needs to be configured to remove the short frame padding from the checksum calculation, otherwise short UDP and TCP frames are likely to be marked as having a bad checksum. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30bnx2x: Fix receiving tx-timeout in error or recovery state.Sudarsana Reddy Kalluru
Driver performs the internal reload when it receives tx-timeout event from the OS. Internal reload might fail in some scenarios e.g., fatal HW issues. In such cases OS still see the link, which would result in undesirable functionalities such as re-generation of tx-timeouts. The patch addresses this issue by indicating the link-down to OS when tx-timeout is detected, and keeping the link in down state till the internal reload is successful. Please consider applying it to 'net' branch. Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30cnic: tidy up a size calculationDan Carpenter
Static checkers complain that id_tbl->table points to longs and 4 bytes is smaller than sizeof(long). But the since other side is dividing by 32 instead of sizeof(long), that means the current code works fine. Anyway, it's more conventional to use the BITS_TO_LONGS() macro when we're allocating a bitmap. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30atm: iphase: fix a 64 bit bugDan Carpenter
The code assumes that there is 4 bytes in a pointer and it doesn't allocate enough memory. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30tcp: fix Fast Open key endiannessYuchung Cheng
Fast Open key could be stored in different endian based on the CPU. Previously hosts in different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing it as little endian to keep same API for LE hosts. Reported-by: Daniele Iamartino <danielei@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29Merge tag 'powerpc-4.18-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Two regression fixes, and a new syscall wire-up: - A fix for the recent conversion to time64_t in the powermac RTC routines, which caused time to go backward. - Another fix for fallout from the split PMD PTL conversion. - Wire up the new io_pgetevents() syscall. Thanks to: Aneesh Kumar K.V, Arnd Bergmann, Breno Leitao, Mathieu Malaterre" * tag 'powerpc-4.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powermac: Fix rtc read/write functions powerpc/mm/32: Fix pgtable_page_dtor call powerpc: Wire up io_pgetevents
2018-06-29Merge tag 'davinci-fixes-for-v4.18' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes This fixes polarity of SD card write-protect pin on DA850 EVM and fixes interrupt property for DA850 SoC GPIO as defined in device-tree. Both of these are not introduced with v4.18 merge but have existed prior. * tag 'davinci-fixes-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci: ARM: dts: da850: Fix interrups property for gpio ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-29Merge tag 'hisi-fixes-for-4.18' of git://github.com/hisilicon/linux-hisi ↵Olof Johansson
into fixes ARM64: hisi fixes for 4.18 - Added power capabilities for the mmc host controller on the hikey and hikey960 boards to avoid broken wifi. * tag 'hisi-fixes-for-4.18' of git://github.com/hisilicon/linux-hisi: arm64: dts: hikey960: Define wl1837 power capabilities arm64: dts: hikey: Define wl1835 power capabilities Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-29Merge tag 'amlogic-fixes' of ↵Olof Johansson
https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into fixes Amlogic fixes for v4.18-rc - minor 64-bit DT fixes * tag 'amlogic-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: ARM64: dts: meson-gxl: fix Mali GPU compatible string ARM64: dts: meson-axg: fix ethernet stability issue ARM64: dts: meson-gx: fix ATF reserved memory region ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0 ARM64: dts: meson: fix register ranges for SD/eMMC ARM64: dts: meson: disable sd-uhs modes on the libretech-cc Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-29Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - The alternatives patching code uses flush_icache_range() which itself uses alternatives. Change the code to use an unpatched variant of cache maintenance - Remove unnecessary ISBs from set_{pte,pmd,pud} - perf: xgene_pmu: Fix IOB SLOW PMU parser error * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Remove unnecessary ISBs from set_{pte,pmd,pud} arm64: Avoid flush_icache_range() in alternatives patching code drivers/perf: xgene_pmu: Fix IOB SLOW PMU parser error
2018-06-29Merge branch 'i2c/for-current-fixed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - a revert because of bugzilla #200045 (and some documentation about it) - another regression fix in the i2c-gpio driver - a leak fix for the i2c core * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: gpio: initialize SCL to HIGH again i2c: smbus: kill memory leak on emulated and failed DMA SMBus xfers i2c: algos: bit: mention our experience about initial states Revert "i2c: algo-bit: init the bus to a known state"
2018-06-29Merge tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fix from Ilya Dryomov: "A trivial dentry leak fix from Zheng" * tag 'ceph-for-4.18-rc3' of git://github.com/ceph/ceph-client: ceph: fix dentry leak in splice_dentry()
2018-06-29Merge branch 'bpf-fixes'Alexei Starovoitov
Daniel Borkmann says: ==================== This set contains three fixes that are mostly JIT and set_memory_*() related. The third in the series in particular fixes the syzkaller bugs that were still pending; aside from local reproduction & testing, also 'syz test' wasn't able to trigger them anymore. I've tested this series on x86_64, arm64 and s390x, and kbuild bot wasn't yelling either for the rest. For details, please see patches as usual, thanks! ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-29bpf: undo prog rejection on read-only lock failureDaniel Borkmann
Partially undo commit 9facc336876f ("bpf: reject any prog that failed read-only lock") since it caused a regression, that is, syzkaller was able to manage to cause a panic via fault injection deep in set_memory_ro() path by letting an allocation fail: In x86's __change_page_attr_set_clr() it was able to change the attributes of the primary mapping but not in the alias mapping via cpa_process_alias(), so the second, inner call to the __change_page_attr() via __change_page_attr_set_clr() had to split a larger page and failed in the alloc_pages() with the artifically triggered allocation error which is then propagated down to the call site. Thus, for set_memory_ro() this means that it returned with an error, but from debugging a probe_kernel_write() revealed EFAULT on that memory since the primary mapping succeeded to get changed. Therefore the subsequent hdr->locked = 0 reset triggered the panic as it was performed on read-only memory, so call-site assumptions were infact wrong to assume that it would either succeed /or/ not succeed at all since there's no such rollback in set_memory_*() calls from partial change of mappings, in other words, we're left in a state that is "half done". A later undo via set_memory_rw() is succeeding though due to matching permissions on that part (aka due to the try_preserve_large_page() succeeding). While reproducing locally with explicitly triggering this error, the initial splitting only happens on rare occasions and in real world it would additionally need oom conditions, but that said, it could partially fail. Therefore, it is definitely wrong to bail out on set_memory_ro() error and reject the program with the set_memory_*() semantics we have today. Shouldn't have gone the extra mile since no other user in tree today infact checks for any set_memory_*() errors, e.g. neither module_enable_ro() / module_disable_ro() for module RO/NX handling which is mostly default these days nor kprobes core with alloc_insn_page() / free_insn_page() as examples that could be invoked long after bootup and original 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") did neither when it got first introduced to BPF so "improving" with bailing out was clearly not right when set_memory_*() cannot handle it today. Kees suggested that if set_memory_*() can fail, we should annotate it with __must_check, and all callers need to deal with it gracefully given those set_memory_*() markings aren't "advisory", but they're expected to actually do what they say. This might be an option worth to move forward in future but would at the same time require that set_memory_*() calls from supporting archs are guaranteed to be "atomic" in that they provide rollback if part of the range fails, once that happened, the transition from RW -> RO could be made more robust that way, while subsequent RO -> RW transition /must/ continue guaranteeing to always succeed the undo part. Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com Fixes: 9facc336876f ("bpf: reject any prog that failed read-only lock") Cc: Laura Abbott <labbott@redhat.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-29bpf, s390: fix potential memleak when later bpf_jit_prog failsDaniel Borkmann
If we would ever fail in the bpf_jit_prog() pass that writes the actual insns to the image after we got header via bpf_jit_binary_alloc() then we also need to make sure to free it through bpf_jit_binary_free() again when bailing out. Given we had prior bpf_jit_prog() passes to initially probe for clobbered registers, program size and to fill in addrs arrray for jump targets, this is more of a theoretical one, but at least make sure this doesn't break with future changes. Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-29bpf, arm32: fix to use bpf_jit_binary_lock_ro apiDaniel Borkmann
Any eBPF JIT that where its underlying arch supports ARCH_HAS_SET_MEMORY would need to use bpf_jit_binary_{un,}lock_ro() pair instead of the set_memory_{ro,rw}() pair directly as otherwise changes to the former might break. arm32's eBPF conversion missed to change it, so fix this up here. Fixes: 39c13c204bb1 ("arm: eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-29parisc: Build kernel without -ffunction-sectionsHelge Deller
As suggested by Nick Piggin it seems we can drop the -ffunction-sections compile flag, now that the kernel uses thin archives. Testing with 32- and 64-bit kernel showed no difference in kernel size. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
2018-06-29sg: remove ->sg_magic memberJens Axboe
This was introduced more than a decade ago when sg chaining was added, but we never really caught anything with it. The scatterlist entry size can be critical, since drivers allocate it, so remove the magic member. Recently it's been triggering allocation stalls and failures in NVMe. Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-29Merge tag 'pci-v4.18-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Fix crash caused by endpoint library initialization order change (Alan Douglas) - Fix shpchp NULL pointer dereference regression on non-ACPI platforms (Bjorn Helgaas) - Move PCI_DOMAINS selection to fix build regression (Lorenzo Pieralisi) * tag 'pci-v4.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: controller: Move PCI_DOMAINS selection to arch Kconfig PCI: Initialize endpoint library before controllers PCI: shpchp: Manage SHPC unconditionally on non-ACPI systems