summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-11-29mm: introduce get_user_pages_longtermDan Williams
Patch series "introduce get_user_pages_longterm()", v2. Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control). In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for filesystem-dax case where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle. Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series. Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references. I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode. It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch. This patch (of 4): Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2). [akpm@linux-foundation.org: use kcalloc()] Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Doug Ledford <dledford@redhat.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Joonyoung Shim <jy0922.shim@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Seung-Woo Kim <sw0312.kim@samsung.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29device-dax: implement ->split() to catch invalid munmap attemptsDan Williams
Similar to how device-dax enforces that the 'address', 'offset', and 'len' parameters to mmap() be aligned to the device's fundamental alignment, the same constraints apply to munmap(). Implement ->split() to fail munmap calls that violate the alignment constraint. Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path with crash signatures of the form: vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000 next (null) prev (null) mm ffff8800b61150c0 prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240 pgoff 0 file ffff8800b638ef80 private_data (null) flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage) ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:2014! [..] RIP: 0010:__split_huge_pud+0x12a/0x180 [..] Call Trace: unmap_page_range+0x245/0xa40 ? __vma_adjust+0x301/0x990 unmap_vmas+0x4c/0xa0 unmap_region+0xae/0x120 ? __vma_rb_erase+0x11a/0x230 do_munmap+0x276/0x410 vm_munmap+0x6a/0xa0 SyS_munmap+0x1d/0x30 Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm, hugetlbfs: introduce ->split() to vm_operations_structDan Williams
Patch series "device-dax: fix unaligned munmap handling" When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and fail attempts to split vmas into unaligned ranges. It would be messy to teach the munmap path about device-dax alignment constraints in the same (hstate) way that hugetlbfs communicates this constraint. Instead, these patches introduce a new ->split() vm operation. This patch (of 2): The device-dax interface has similar constraints as hugetlbfs in that it requires the munmap path to unmap in huge page aligned units. Rather than add more custom vma handling code in __split_vma() introduce a new vm operation to perform this vma specific check. Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29scripts/faddr2line: extend usage on generic archLiu, Changcheng
When cross-compiling, fadd2line should use the binary tool used for the target system, rather than that of the host. Link: http://lkml.kernel.org/r/20171121092911.GA150711@sofia Signed-off-by: Liu Changcheng <changcheng.liu@intel.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: NeilBrown <neilb@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: replace pte_write with pte_access_permitted in fault + gup pathsDan Williams
The 'access_permitted' helper is used in the gup-fast path and goes beyond the simple _PAGE_RW check to also: - validate that the mapping is writable from a protection keys standpoint - validate that the pte has _PAGE_USER set since all fault paths where pte_write is must be referencing user-memory. Link: http://lkml.kernel.org/r/151043111604.2842.8051684481794973100.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: replace pmd_write with pmd_access_permitted in fault + gup pathsDan Williams
The 'access_permitted' helper is used in the gup-fast path and goes beyond the simple _PAGE_RW check to also: - validate that the mapping is writable from a protection keys standpoint - validate that the pte has _PAGE_USER set since all fault paths where pmd_write is must be referencing user-memory. Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: replace pud_write with pud_access_permitted in fault + gup pathsDan Williams
The 'access_permitted' helper is used in the gup-fast path and goes beyond the simple _PAGE_RW check to also: - validate that the mapping is writable from a protection keys standpoint - validate that the pte has _PAGE_USER set since all fault paths where pud_write is must be referencing user-memory. [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITEDan Williams
In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm: fix device-dax pud write-faults triggered by get_user_pages()Dan Williams
Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted() and to date it has been unimplemented, just calls BUG_ON(). kernel BUG at ./include/linux/hugetlb.h:244! [..] RIP: 0010:follow_devmap_pud+0x482/0x490 [..] Call Trace: follow_page_mask+0x28c/0x6e0 __get_user_pages+0xe4/0x6c0 get_user_pages_unlocked+0x130/0x1b0 get_user_pages_fast+0x89/0xb0 iov_iter_get_pages_alloc+0x114/0x4a0 nfs_direct_read_schedule_iovec+0xd2/0x350 ? nfs_start_io_direct+0x63/0x70 nfs_file_direct_read+0x1e0/0x250 nfs_file_read+0x90/0xc0 For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s). [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm/cma: fix alloc_contig_range ret code/potential leakMike Kravetz
If the call __alloc_contig_migrate_range() in alloc_contig_range returns -EBUSY, processing continues so that test_pages_isolated() is called where there is a tracepoint to identify the busy pages. However, it is possible for busy pages to become available between the calls to these two routines. In this case, the range of pages may be allocated. Unfortunately, the original return code (ret == -EBUSY) is still set and returned to the caller. Therefore, the caller believes the pages were not allocated and they are leaked. Update the comment to indicate that allocation is still possible even if __alloc_contig_migrate_range returns -EBUSY. Also, clear return code in this case so that it is not accidentally used or returned to caller. Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm, oom_reaper: gather each vma to prevent leaking TLB entryWang Nan
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory space. In this case, tlb->fullmm is true. Some archs like arm64 doesn't flush TLB when tlb->fullmm is true: commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1"). Which causes leaking of tlb entries. Will clarifies his patch: "Basically, we tag each address space with an ASID (PCID on x86) which is resident in the TLB. This means we can elide TLB invalidation when pulling down a full mm because we won't ever assign that ASID to another mm without doing TLB invalidation elsewhere (which actually just nukes the whole TLB). I think that means that we could potentially not fault on a kernel uaccess, because we could hit in the TLB" There could be a window between complete_signal() sending IPI to other cores and all threads sharing this mm are really kicked off from cores. In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to flush TLB then frees pages. However, due to the above problem, the TLB entries are not really flushed on arm64. Other threads are possible to access these pages through TLB entries. Moreover, a copy_to_user() can also write to these pages without generating page fault, causes use-after-free bugs. This patch gathers each vma instead of gathering full vm space. In this case tlb->fullmm is not true. The behavior of oom reaper become similar to munmapping before do_exit, which should be safe for all archs. Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Wang Nan <wangnan0@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Bob Liu <liubo95@huawei.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29mm, memory_hotplug: do not back off draining pcp free pages from kworker contextMichal Hocko
drain_all_pages backs off when called from a kworker context since commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue context") because the original IPI based pcp draining has been replaced by a WQ based one and the check wanted to prevent from recursion and inter workers dependencies. This has made some sense at the time because the system WQ has been used and one worker holding the lock could be blocked while waiting for new workers to emerge which can be a problem under OOM conditions. Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer so we shouldn't depend on any other WQ activity to make a forward progress so calling drain_all_pages from a worker context is safe as long as this doesn't happen from mm_percpu_wq itself which is not the case because all workers are required to _not_ depend on any MM locks. Why is this a problem in the first place? ACPI driven memory hot-remove (acpi_device_hotplug) is executed from the worker context. We end up calling __offline_pages to free all the pages and that requires both lru_add_drain_all_cpuslocked and drain_all_pages to do their job otherwise we can have dangling pages on pcp lists and fail the offline operation (__test_page_isolated_in_pageblock would see a page with 0 ref count but without PageBuddy set). Fix the issue by removing the worker check in drain_all_pages. lru_add_drain_all_cpuslocked doesn't have this restriction so it works as expected. Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context") Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> [4.11+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "I screwed up my merge window pull request; I only sent half of what I meant to. There were no new features, just bugfixes of various importance and some very minor cleanup, so I think it's all still appropriate for -rc2. Highlights: - Fixes from Trond for some races in the NFSv4 state code. - Fix from Naofumi Honda for a typo in the blocked lock notificiation code - Fixes from Vasily Averin for some problems starting and stopping lockd especially in network namespaces" * tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits) lockd: fix "list_add double add" caused by legacy signal interface nlm_shutdown_hosts_net() cleanup race of nfsd inetaddr notifiers vs nn->nfsd_serv change race of lockd inetaddr notifiers vs nlmsvc_rqst change SUNRPC: make cache_detail structures const NFSD: make cache_detail structures const sunrpc: make the function arg as const nfsd: check for use of the closed special stateid nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat lockd: lost rollback of set_grace_period() in lockd_down_net() lockd: added cleanup checks in exit_net hook grace: replace BUG_ON by WARN_ONCE in exit_net hook nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class lockd: remove net pointer from messages nfsd: remove net pointer from debug messages nfsd: Fix races with check_stateid_generation() nfsd: Ensure we check stateid validity in the seqid operation checks nfsd: Fix race in lock stateid creation nfsd4: move find_lock_stateid nfsd: Ensure we don't recognise lock stateids after freeing them ...
2017-11-29Merge tag 'for-4.15-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "We've collected some fixes in since the pre-merge window freeze. There's technically only one regression fix for 4.15, but the rest seems important and candidates for stable. - fix missing flush bio puts in error cases (is serious, but rarely happens) - fix reporting stat::st_blocks for buffered append writes - fix space cache invalidation - fix out of bound memory access when setting zlib level - fix potential memory corruption when fsync fails in the middle - fix crash in integrity checker - incremetnal send fix, path mixup for certain unlink/rename combination - pass flags to writeback so compressed writes can be throttled properly - error handling fixes" * tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: incremental send, fix wrong unlink path after renaming file btrfs: tree-checker: Fix false panic for sanity test Btrfs: fix list_add corruption and soft lockups in fsync btrfs: Fix wild memory access in compression level parser btrfs: fix deadlock when writing out space cache btrfs: clear space cache inode generation always Btrfs: fix reported number of inode blocks after buffered append writes Btrfs: move definition of the function btrfs_find_new_delalloc_bytes Btrfs: bail out gracefully rather than BUG_ON btrfs: dev_alloc_list is not protected by RCU, use normal list_del btrfs: add missing device::flush_bio puts btrfs: Fix transaction abort during failure in btrfs_rm_dev_item Btrfs: add write_flags for compression bio
2017-11-29Merge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
Pull Microblaze fix from Michal Simek: "Add missing header to mmu_context_mm.h" * tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: add missing include to mmu_context_mm.h
2017-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc fix from David Miller: "Sparc T4 and later cpu bootup regression fix" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix boot on T4 and later.
2017-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones missed one spot. From Zhu Yanjun. 2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg. 3) Fix checksum offloading in thunderx driver, from Sunil Goutham. 4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger. 5) Fix use after free of packet headers in TIPC, from Jon Maloy. 6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva. 7) Tunneling fixes in mlxsw driver, from Petr Machata. 8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike Maloney. 9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric Dumazet. 10) Fix regression in sch_sfq.c due to one of the timer_setup() conversions. From Paolo Abeni. 11) SCTP does list_for_each_entry() using wrong struct member, fix from Xin Long. 12) Don't use big endian netlink attribute read for IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin Long. 13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing adding filters there. From Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) ethernet: dwmac-stm32: Fix copyright net: via: via-rhine: use %p to format void * address instead of %x net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit myri10ge: Update MAINTAINERS net: sched: cbq: create block for q->link.block atm: suni: remove extraneous space to fix indentation atm: lanai: use %p to format kernel addresses instead of %x VSOCK: Don't set sk_state to TCP_CLOSE before testing it atm: fore200e: use %pK to format kernel addresses instead of %x ambassador: fix incorrect indentation of assignment statement vxlan: use __be32 type for the param vni in __vxlan_fdb_delete bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM sctp: use right member as the param of list_for_each_entry sch_sfq: fix null pointer dereference at timer expiration cls_bpf: don't decrement net's refcount when offload fails net/packet: fix a race in packet_bind() and packet_notifier() packet: fix crash in fanout_demux_rollover() sctp: remove extern from stream sched sctp: force the params with right types for sctp csum apis sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail ...
2017-11-29sparc64: Fix boot on T4 and later.David S. Miller
If we don't put the NG4fls.o object into the same part of the link as the generic sparc64 objects for fls() and __fls() then the relocation in the branch we use for patching will not fit. Move NG4fls.o into lib-y to fix this problem. Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above") Signed-off-by: David S. Miller <davem@davemloft.net> Reported-by: Anatoly Pugachev <matorola@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com>
2017-11-29drm/radeon: remove init of CIK VMIDs 8-16 for amdkfdOded Gabbay
VMIDs 8-16 in Kaveri were reserved for use by the amdkfd driver. Because we removed amdkfd support from radeon, those VMIDs are now used by radeon and are initialized by radeon. This patch removes the function that initialized those VMIDs for amdkfd use. This initialization overridden the radeon initialization and caused GPU faults and GUI crashed. Fixes: f4fa88ab28ab ("drm/radeon: deprecate and remove KFD interface") Rported-by: Michel Dänzer <michel.daenzer@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-11-29drm/ttm: fix populate_and_map() functions once moreChristian König
This reverts "drm/ttm: Fix configuration error around populate_and_map() functions". This fix has gone into the wrong direction. Those helpers should be available even when neither CONFIG_INTEL_IOMMU nor CONFIG_SWIOTLB are set. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-11-29vsprintf: don't use 'restricted_pointer()' when not restrictingLinus Torvalds
Instead, just fall back on the new '%p' behavior which hashes the pointer. Otherwise, '%pK' - that was intended to mark a pointer as restricted - just ends up leaking pointers that a normal '%p' wouldn't leak. Which just make the whole thing pointless. I suspect we should actually get rid of '%pK' entirely, and make it just work as '%p' regardless, but this is the minimal obvious fix. People who actually use 'kptr_restrict' should weigh in on which behavior they want. Cc: Tobin Harding <me@tobin.cc> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29SUNRPC: Allow connect to return EHOSTUNREACHTrond Myklebust
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-29NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"Trond Myklebust
gcc 4.4.4 is too old to have full C11 anonymous union support, so the current initialiser fails to compile. Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> (compile-)Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-11-29kallsyms: take advantage of the new '%px' formatLinus Torvalds
The conditional kallsym hex printing used a special fixed-width '%lx' output (KALLSYM_FMT) in preparation for the hashing of %p, but that series ended up adding a %px specifier to help with the conversions. Use it, and avoid the "print pointer as an unsigned long" code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29Merge tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linuxLinus Torvalds
Pull printk pointer hashing update from Tobin Harding: "Here is the patch set that implements hashing of printk specifier %p. First we have two clean up patches then we do the hashing. Hashing is done via the SipHash algorithm. The next patch adds printk specifier %px for printing pointers when we _really_ want to see the address i.e %px is functionally equivalent to %lx. Final patch in the set fixes KASAN since we break it by hashing %p. For the record here is the justification for the series: Currently there exist approximately 14 000 places in the Kernel where addresses are being printed using an unadorned %p. This potentially leaks sensitive information about the Kernel layout in memory. Many of these calls are stale, instead of fixing every call we hash the address by default before printing. We then add %px to provide a way to print the actual address. Although this is achievable using %lx, using %px will assist us if we ever want to change pointer printing behaviour. %px is more uniquely grep'able (there are already >50 000 uses of %lx). The added advantage of hashing %p is that security is now opt-out, if you _really_ want the address you have to work a little harder and use %px. This will of course break some users, forcing code printing needed addresses to be updated" [ I do expect this to be an annoyance, and a number of %px users to be added for debuggability. But nobody is willing to audit existing %p users for information leaks, and a number of places really only use the pointer as an object identifier rather than really 'I need the address'. IOW - sorry for the inconvenience, but it's the least inconvenient of the options. - Linus ] * tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux: kasan: use %px to print addresses instead of %p vsprintf: add printk specifier %px printk: hash addresses printed with %p vsprintf: refactor %pK code out of pointer() docs: correct documentation for %pK
2017-11-29Revert "mm, thp: Do not make pmd/pud dirty without a reason"Linus Torvalds
This reverts commit 152e93af3cfe2d29d8136cc0a02a8612507136ee. It was a nice cleanup in theory, but as Nicolai Stange points out, we do need to make the page dirty for the copy-on-write case even when we didn't end up making it writable, since the dirty bit is what we use to check that we've gone through a COW cycle. Reported-by: Michal Hocko <mhocko@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29Merge branch 'nvme-4.15' of git://git.infradead.org/nvme into for-linusJens Axboe
Pull NVMe fixes from Christoph: "A few more nvme updates for 4.15. A single small PCIe fix, and a number of patches for RDMA that are a little larger than what I'd like to see for -rc2, but they fix important issues seen in the wild."
2017-11-29quota: Check for register_shrinker() failure.Tetsuo Handa
register_shrinker() might return -ENOMEM error since Linux 3.12. Call panic() as with other failure checks in this function if register_shrinker() failed. Fixes: 1d3d4437eae1 ("vmscan: per-node deferred work") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Jan Kara <jack@suse.com> Cc: Michal Hocko <mhocko@suse.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-11-29ethernet: dwmac-stm32: Fix copyrightBenjamin Gaignard
Uniformize STMicroelectronics copyrights header Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com> CC: Alexandre Torgue <alexandre.torgue@st.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29eeprom: at24: check at24_read/write argumentsHeiner Kallweit
So far we completely rely on the caller to provide valid arguments. To be on the safe side perform an own sanity check. Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2017-11-29eeprom: at24: fix reading from 24MAC402/24MAC602Heiner Kallweit
Chip datasheet mentions that word addresses other than the actual start position of the MAC delivers undefined results. So fix this. Current implementation doesn't work due to this wrong offset. Cc: stable@vger.kernel.org Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2017-11-29net: via: via-rhine: use %p to format void * address instead of %xColin Ian King
Don't use %x and casting to print out an address, instead use %p and remove the casting. Cleans up smatch warnings: drivers/net/ethernet/via/via-rhine.c:998 rhine_init_one_common() warn: argument 4 to %lx specifier is cast from pointer Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bitGeert Uytterhoeven
On 64-bit (e.g. powerpc64/allmodconfig): drivers/net/ethernet/xilinx/ll_temac_main.c: In function 'temac_start_xmit_done': drivers/net/ethernet/xilinx/ll_temac_main.c:633:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] dev_kfree_skb_irq((struct sk_buff *)cur_p->app4); ^ cdmac_bd.app4 is u32, so it is too small to hold a kernel pointer. Note that several other fields in struct cdmac_bd are also too small to hold physical addresses on 64-bit platforms. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29drm/fb_helper: Disable all crtc's when initial setup fails.Maarten Lankhorst
Some drivers like i915 start with crtc's enabled, but with deferred fbcon setup they were no longer disabled as part of fbdev setup. Headless units could no longer enter pc3 state because the crtc was still enabled. Fix this by calling restore_fbdev_mode when we would have called it otherwise once during initial fbdev setup. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Fixes: ca91a2758fce ("drm/fb-helper: Support deferred setup") Cc: <stable@vger.kernel.org> # v4.14+ Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20171128111603.62757-1-maarten.lankhorst@linux.intel.com
2017-11-29myri10ge: Update MAINTAINERSHyong-Youb Kim
Change the maintainer to Chris Lee who has access to Myricom hardware and can test/review. Update the website URL. Signed-off-by: Hyong-Youb Kim <hykim@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-29MAINTAINERS: change maintainer for Rockchip drm driversMark Yao
For personal reasons, Mark Yao will leave rockchip, can not continue maintain drm/rockchip, Sandy Huang and Heiko Stübner will take over drm/rockchip. Cc: Sandy Huang <hjc@rock-chips.com> Cc: Heiko Stübner <heiko@sntech.de> Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Acked-by: Heiko Stuebner <heiko@sntech.de> [seanpaul added Heiko] Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20171129142459.32351-1-seanpaul@chromium.org
2017-11-29eeprom: at24: correctly set the size for at24mac402Bartosz Golaszewski
There's an ilog2() expansion in AT24_DEVICE_MAGIC() which rounds down the actual size of EUI-48 byte array in at24mac402 eeproms to 4 from 6, making it impossible to read it all. Fix it by manually adjusting the value in probe(). This patch contains a temporary fix that is suitable for stable branches. Eventually we'll probably remove the call to ilog2() while converting the magic values to actual structs. Cc: stable@vger.kernel.org Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series") Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2017-11-29drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressiveLucas Stach
drm_atomic_helper_setup_commit expects that flipping of previous commits has happened when it is called to set up a new commit. This can be violated by commits where userspace doesn't get a flip completion event to synchronize against i.e. legacy modesets and property changes. The expectation is that those are done by blocking commits, which wait for completion. Most drivers call drm_atomic_helper_wait_for_vblanks in the commit_tail to ensure completion, but the wait for next vblank might not actually happen if the commit didn't change any planes. Make the wait more agressive by also waiting if no planes changed. This is the minimal regression fix for the 4.15 kernel series. Long term drivers should switch away from drm_atomic_helper_wait_for_vblanks and use drm_atomic_helper_wait_for_flip_done instead. Fixes: de39bec1a0c4 ("drm/atomic: Remove waits in drm_atomic_helper_commit_cleanup_done, v2.") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171129110431.6300-1-l.stach@pengutronix.de
2017-11-29drm/vblank: Fix vblank timestamp debugsVille Syrjälä
We're currently calling ktime_to_timespec64() on stack garbage hence the debug output for vblank timestamps also contains garbage. Let's assing something to the ktime_t first before we go converting it to a timespec. While at it micro-optimize the ktime_to_timespec64() calls away when vblank debugging isn't enabled. Fixes: 67680d3c0464 ("drm: vblank: use ktime_t instead of timeval") Cc: Arnd Bergmann <arnd@arndb.de> Cc: Keith Packard <keithp@keithp.com> Cc: Sean Paul <seanpaul@chromium.org> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171113150210.11311-1-ville.syrjala@linux.intel.com Acked-by: Arnd Bergmann <arnd@arndb.de>
2017-11-29mmc: core: prepend 0x to OCR entry in sysfsBastian Stender
The sysfs entry "ocr" was missing the 0x prefix to identify it as hex formatted. Fixes: 5fb06af7a33b ("mmc: core: Extend sysfs with OCR register") Signed-off-by: Bastian Stender <bst@pengutronix.de> Cc: <stable@vger.kernel.org> # v4.8+ [Ulf: Amended change to also cover SD-cards] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-11-29mmc: core: prepend 0x to pre_eol_info entry in sysfsBastian Stender
The sysfs entry "pre_eol_info" was missing the 0x prefix to identify it as hex formatted. Fixes: 46bc5c408e4e ("mmc: core: Export device lifetime information through sysfs") Signed-off-by: Bastian Stender <bst@pengutronix.de> Cc: <stable@vger.kernel.org> # v4.11+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-11-29powerpc: Do not assign thread.tidr if already assignedVaibhav Jain
If set_thread_tidr() is called twice for same task_struct then it will allocate a new tidr value to it leaving the previous value still dangling in the vas_thread_ida table. To fix this the patch changes set_thread_tidr() to check if a tidr value is already assigned to the task_struct and if yes then returns zero. Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [mpe: Modify to return 0 in the success case, not the TID value] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-29powerpc: Avoid signed to unsigned conversion in set_thread_tidr()Vaibhav Jain
There is an unsafe signed to unsigned conversion in set_thread_tidr() that may cause an error value to be assigned to SPRN_TIDR register and used as thread-id. The issue happens as assign_thread_tidr() returns an int and thread.tidr is an unsigned-long. So a negative error code returned from assign_thread_tidr() will fail the error check and gets assigned as tidr as a large positive value. To fix this the patch assigns the return value of assign_thread_tidr() to a temporary int and assigns it to thread.tidr iff its '> 0'. The patch shouldn't impact the calling convention of set_thread_tidr() i.e all -ve return-values are error codes and a return value of '0' indicates success. Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR") Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-29kasan: use %px to print addresses instead of %pTobin C. Harding
Pointers printed with %p are now hashed by default. Kasan needs the actual address. We can use the new printk specifier %px for this purpose. Use %px instead of %p to print addresses. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29vsprintf: add printk specifier %pxTobin C. Harding
printk specifier %p now hashes all addresses before printing. Sometimes we need to see the actual unmodified address. This can be achieved using %lx but then we face the risk that if in future we want to change the way the Kernel handles printing of pointers we will have to grep through the already existent 50 000 %lx call sites. Let's add specifier %px as a clear, opt-in, way to print a pointer and maintain some level of isolation from all the other hex integer output within the Kernel. Add printk specifier %px to print the actual unmodified address. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29printk: hash addresses printed with %pTobin C. Harding
Currently there exist approximately 14 000 places in the kernel where addresses are being printed using an unadorned %p. This potentially leaks sensitive information regarding the Kernel layout in memory. Many of these calls are stale, instead of fixing every call lets hash the address by default before printing. This will of course break some users, forcing code printing needed addresses to be updated. Code that _really_ needs the address will soon be able to use the new printk specifier %px to print the address. For what it's worth, usage of unadorned %p can be broken down as follows (thanks to Joe Perches). $ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c 1084 arch 20 block 10 crypto 32 Documentation 8121 drivers 1221 fs 143 include 101 kernel 69 lib 100 mm 1510 net 40 samples 7 scripts 11 security 166 sound 152 tools 2 virt Add function ptr_to_id() to map an address to a 32 bit unique identifier. Hash any unadorned usage of specifier %p and any malformed specifiers. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29vsprintf: refactor %pK code out of pointer()Tobin C. Harding
Currently code to handle %pK is all within the switch statement in pointer(). This is the wrong level of abstraction. Each of the other switch clauses call a helper function, pK should do the same. Refactor code out of pointer() to new function restricted_pointer(). Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-29docs: correct documentation for %pKTobin C. Harding
Current documentation indicates that %pK prints a leading '0x'. This is not the case. Correct documentation for printk specifier %pK. Signed-off-by: Tobin C. Harding <me@tobin.cc>
2017-11-28Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - avoid potential bogus alignment for some AEAD operations - fix crash in algif_aead - avoid sleeping in softirq context with async af_alg * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: skcipher - Fix skcipher_walk_aead_common crypto: af_alg - remove locking in async callback crypto: algif_aead - skip SGL entries with NULL page
2017-11-28drm/amd/display: USB-C / thunderbolt dock specific workaroundHersen Wu
reading dpcd 0x600 cause link loss for a particular USB-C dock with thurderbolt. workaround by avoiding dcpd 0x600 read unless it's necessary. Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Reviewed-by: Tony Cheng <Tony.Cheng@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>