summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-07-24net: switch copy_bpf_fprog_from_user to sockptr_tChristoph Hellwig
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: add a new sockptr_t typeChristoph Hellwig
Add a uptr_t type that can hold a pointer to either a user or kernel memory region, and simply helpers to copy to and from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24bpfilter: reject kernel addressesChristoph Hellwig
The bpfilter user mode helper processes the optval address using process_vm_readv. Don't send it kernel addresses fed under set_fs(KERNEL_DS) as that won't work. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net/bpfilter: split __bpfilter_process_sockoptChristoph Hellwig
Split __bpfilter_process_sockopt into a low-level send request routine and the actual setsockopt hook to split the init time ping from the actual setsockopt processing. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24bpfilter: fix up a sparse annotationChristoph Hellwig
The __user doesn't make sense when casting to an integer type, just switch to a uintptr_t cast which also removes the need for the __force. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24Merge branch 'TC-datapath-hash-api'David S. Miller
Ariel Levkovich says: ==================== TC datapath hash api Hash based packet classification allows user to set up rules that provide load balancing of traffic across multiple vports and for ECMP path selection while keeping the number of rule at minimum. Instead of matching on exact flow spec, which requires a rule per flow, user can define rules based on a their hash value and distribute the flows to different buckets. The number of rules in this case will be constant and equal to the number of buckets. The series introduces an extention to the cls flower classifier and allows user to add rules that match on the hash value that is stored in skb->hash while assuming the value was set prior to the classification. Setting the skb->hash can be done in various ways and is not defined in this series - for example: 1. By the device driver upon processing an rx packet. 2. Using tc action bpf with a program which computes and sets the skb->hash value. $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 2 proto ip \ flower hash 0x0/0xf \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 2 proto ip \ flower hash 0x1/0xf \ action mirred egress redirect dev ens1f0_2 v3 -> v4: *Drop hash setting code leaving only the classidication parts. Setting the hash will be possible via existing tc action bpf. v2 -> v3: *Split hash algorithm option into 2 different actions. Asym_l4 available via act_skbedit and bpf via new act_hash. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net/sched: cls_flower: Add hash info to flow classificationAriel Levkovich
Adding new cls flower keys for hash value and hash mask and dissect the hash info from the skb into the flow key towards flow classication. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net/flow_dissector: add packet hash dissectionAriel Levkovich
Retreive a hash value from the SKB and store it in the dissector key for future matching. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: hyperv: dump TX indirection table to ethtool regsChi Song
An imbalanced TX indirection table causes netvsc to have low performance. This table is created and managed during runtime. To help better diagnose performance issues caused by imbalanced tables, it needs make TX indirection tables visible. Because TX indirection table is driver specified information, so display it via ethtool register dump. Signed-off-by: Chi Song <chisong@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24flow_offload: Move rhashtable inclusion to the source fileHerbert Xu
I noticed that touching linux/rhashtable.h causes lib/vsprintf.c to be rebuilt. This dependency came through a bogus inclusion in the file net/flow_offload.h. This patch moves it to the right place. This patch also removes a lingering rhashtable inclusion in cls_api created by the same commit. Fixes: 4e481908c51b ("flow_offload: move tc indirect block to...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24Merge branch 'akpm' into master (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "Subsystems affected by this patch series: mm/pagemap, mm/shmem, mm/hotfixes, mm/memcg, mm/hugetlb, mailmap, squashfs, scripts, io-mapping, MAINTAINERS, and gdb" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: scripts/gdb: fix lx-symbols 'gdb.error' while loading modules MAINTAINERS: add KCOV section io-mapping: indicate mapping failure scripts/decode_stacktrace: strip basepath from all paths squashfs: fix length field overlap check in metadata reading mailmap: add entry for Mike Rapoport khugepaged: fix null-pointer dereference due to race mm/hugetlb: avoid hardcoding while checking if cma is enabled mm: memcg/slab: fix memory leak at non-root kmem_cache destroy mm/memcg: fix refcount error while moving and swapping mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages() mm: initialize return of vm_insert_pages vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right way mm/mmap.c: close race between munmap() and expand_upwards()/downwards()
2020-07-24Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into master Pull xtensa csum regression fix from Al Viro: "Max Filippov caught a breakage introduced in xtensa this cycle by the csum_and_copy_..._user() series. Cut'n'paste from the wrong source - the check that belongs in csum_and_copy_to_user() ended up both there and in csum_and_copy_from_user()" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xtensa: fix access check in csum_and_copy_from_user
2020-07-24Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into master Pull arm64 fix from Will Deacon: "Fix compat vDSO build flags for recent versions of clang to tell it where to find the assembler" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: vdso32: Fix '--prefix=' value for newer versions of clang
2020-07-24Merge tag 'for-5.8-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into master Pull btrfs fixes from David Sterba: "A few resouce leak fixes from recent patches, all are stable material. The problems have been observed during testing or have a reproducer" * tag 'for-5.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix mount failure caused by race with umount btrfs: fix page leaks after failure to lock page for delalloc btrfs: qgroup: fix data leak caused by race between writeback and truncate btrfs: fix double free on ulist after backref resolution failure
2020-07-24Merge tag 'zonefs-5.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs into master Pull zonefs fixes from Damien Le Moal: "Two fixes, the first one to remove compilation warnings and the second to avoid potentially inefficient allocation of BIOs for direct writes into sequential zones" * tag 'zonefs-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: count pages after truncating the iterator zonefs: Fix compilation warning
2020-07-24Merge tag 'io_uring-5.8-2020-07-24' of git://git.kernel.dk/linux-block into ↵Linus Torvalds
master Pull io_uring fixes from Jens Axboe: - Fix discrepancy in how sqe->flags are treated for a few requests, this makes it consistent (Daniele) - Ensure that poll driven retry works with double waitqueue poll users - Fix a missing io_req_init_async() (Pavel) * tag 'io_uring-5.8-2020-07-24' of git://git.kernel.dk/linux-block: io_uring: missed req_init_async() for IOSQE_ASYNC io_uring: always allow drain/link/hardlink/async sqe flags io_uring: ensure double poll additions work with both request types
2020-07-24Merge tag 'iommu-fix-v5.8-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into master Pull iommu fix from Joerg Roedel: "Fix a NULL-ptr dereference in the QCOM IOMMU driver" * tag 'iommu-fix-v5.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/qcom: Use domain rather than dev as tlb cookie
2020-07-24Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into master Pull rdma fixes from Jason Gunthorpe: "One merge window regression, some corruption bugs in HNS and a few more syzkaller fixes: - Two long standing syzkaller races - Fix incorrect HW configuration in HNS - Restore accidentally dropped locking in IB CM - Fix ODP prefetch bug added in the big rework several versions ago" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/mlx5: Prevent prefetch from racing with implicit destruction RDMA/cm: Protect access to remote_sidr_table RDMA/core: Fix race in rdma_alloc_commit_uobject() RDMA/hns: Fix wrong PBL offset when VA is not aligned to PAGE_SIZE RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC RDMA/mlx5: Use xa_lock_irq when access to SRQ table
2020-07-24Merge tag 'for-5.8/dm-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm into master Pull device mapper fix from Mike Snitzer: "A stable fix for DM integrity target's integrity recalculation that gets skipped when resuming a device. This is a fix for a previous stable@ fix" * tag 'for-5.8/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm integrity: fix integrity recalculation that is improperly skipped
2020-07-24Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into master Pull i2c fixes from Wolfram Sang: "Again some driver bugfixes and some documentation fixes" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: i2c-qcom-geni: Fix DMA transfer race i2c: rcar: always clear ICSAR to avoid side effects MAINTAINERS: i2c: at91: handover maintenance to Codrin Ciubotariu i2c: drop duplicated word in the header file i2c: cadence: Clear HOLD bit at correct time in Rx path Revert "i2c: cadence: Fix the hold bit setting"
2020-07-24Merge tag 'mmc-v5.8-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc into master Pull MMC fix from Ulf Hansson: "Fix clock divider calculation in the ASPEED SDHCI controller" * tag 'mmc-v5.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-of-aspeed: Fix clock divider calculation
2020-07-24Merge tag 'drm-fixes-2020-07-24' of git://anongit.freedesktop.org/drm/drm ↵Linus Torvalds
into master Pull drm fixes from Dave Airlie: "Quiet fixes, I may have a single regression fix follow up to this for nouveau, but it might be next week, Ben was testing it a bit more . Otherwise two amdgpu fixes, one lima and one sun4i: amdgpu: - Fix crash when overclocking VegaM - Fix possible crash when editing dpm levels sun4i: - Fix inverted HPD result; fixes an earlier fix lima: - fix timeout during reset" * tag 'drm-fixes-2020-07-24' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: Fix NULL dereference in dpm sysfs handlers drm/amd/powerplay: fix a crash when overclocking Vega M drm/lima: fix wait pp reset timeout drm: sun4i: hdmi: Fix inverted HPD result
2020-07-24scripts/gdb: fix lx-symbols 'gdb.error' while loading modulesStefano Garzarella
Commit ed66f991bb19 ("module: Refactor section attr into bin attribute") removed the 'name' field from 'struct module_sect_attr' triggering the following error when invoking lx-symbols: (gdb) lx-symbols loading vmlinux scanning for modules in linux/build loading @0xffffffffc014f000: linux/build/drivers/net/tun.ko Python Exception <class 'gdb.error'> There is no member named name.: Error occurred in Python: There is no member named name. This patch fixes the issue taking the module name from the 'struct attribute'. Fixes: ed66f991bb19 ("module: Refactor section attr into bin attribute") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Kieran Bingham <kbingham@kernel.org> Link: http://lkml.kernel.org/r/20200722102239.313231-1-sgarzare@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24MAINTAINERS: add KCOV sectionAndrey Konovalov
To link KCOV to the kasan-dev@ mailing list. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Link: http://lkml.kernel.org/r/5fa344db7ac4af2213049e5656c0f43d6ecaa379.1595331682.git.andreyknvl@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24io-mapping: indicate mapping failureMichael J. Ruhl
The !ATOMIC_IOMAP version of io_maping_init_wc will always return success, even when the ioremap fails. Since the ATOMIC_IOMAP version returns NULL when the init fails, and callers check for a NULL return on error this is unexpected. During a device probe, where the ioremap failed, a crash can look like this: BUG: unable to handle page fault for address: 0000000000210000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page Oops: 0002 [#1] PREEMPT SMP CPU: 0 PID: 177 Comm: RIP: 0010:fill_page_dma [i915] gen8_ppgtt_create [i915] i915_ppgtt_create [i915] intel_gt_init [i915] i915_gem_init [i915] i915_driver_probe [i915] pci_device_probe really_probe driver_probe_device The remap failure occurred much earlier in the probe. If it had been propagated, the driver would have exited with an error. Return NULL on ioremap failure. [akpm@linux-foundation.org: detect ioremap_wc() errors earlier] Fixes: cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping") Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24scripts/decode_stacktrace: strip basepath from all pathsPi-Hsun Shih
Currently the basepath is removed only from the beginning of the string. When the symbol is inlined and there's multiple line outputs of addr2line, only the first line would have basepath removed. Change to remove the basepath prefix from all lines. Fixes: 31013836a71e ("scripts/decode_stacktrace: match basepath using shell prefix operator, not regex") Co-developed-by: Shik Chen <shik@chromium.org> Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Signed-off-by: Shik Chen <shik@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Sasha Levin <sashal@kernel.org> Cc: Nicolas Boichat <drinkcat@chromium.org> Cc: Jiri Slaby <jslaby@suse.cz> Link: http://lkml.kernel.org/r/20200720082709.252805-1-pihsun@chromium.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24squashfs: fix length field overlap check in metadata readingPhillip Lougher
This is a regression introduced by the "migrate from ll_rw_block usage to BIO" patch. Squashfs packs structures on byte boundaries, and due to that the length field (of the metadata block) may not be fully in the current block. The new code rewrote and introduced a faulty check for that edge case. Fixes: 93e72b3c612adcaca1 ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: Bernd Amend <bernd.amend@gmail.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Adrien Schildknecht <adrien+dev@schischi.me> Cc: Guenter Roeck <groeck@chromium.org> Cc: Daniel Rosenberg <drosen@google.com> Link: http://lkml.kernel.org/r/20200717195536.16069-1-phillip@squashfs.org.uk Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mailmap: add entry for Mike RapoportMike Rapoport
Add an entry to correct my email addresses. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200708095414.12275-1-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24khugepaged: fix null-pointer dereference due to raceKirill A. Shutemov
khugepaged has to drop mmap lock several times while collapsing a page. The situation can change while the lock is dropped and we need to re-validate that the VMA is still in place and the PMD is still subject for collapse. But we miss one corner case: while collapsing an anonymous pages the VMA could be replaced with file VMA. If the file VMA doesn't have any private pages we get NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] anon_vma_lock_write include/linux/rmap.h:120 [inline] collapse_huge_page mm/khugepaged.c:1110 [inline] khugepaged_scan_pmd mm/khugepaged.c:1349 [inline] khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline] khugepaged_do_scan mm/khugepaged.c:2193 [inline] khugepaged+0x3bba/0x5a10 mm/khugepaged.c:2238 The fix is to make sure that the VMA is anonymous in hugepage_vma_revalidate(). The helper is only used for collapsing anonymous pages. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Reported-by: syzbot+ed318e8b790ca72c5ad0@syzkaller.appspotmail.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200722121439.44328-1-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mm/hugetlb: avoid hardcoding while checking if cma is enabledBarry Song
hugetlb_cma[0] can be NULL due to various reasons, for example, node0 has no memory. so NULL hugetlb_cma[0] doesn't necessarily mean cma is not enabled. gigantic pages might have been reserved on other nodes. This patch fixes possible double reservation and CMA leak. [akpm@linux-foundation.org: fix CONFIG_CMA=n warning] [sfr@canb.auug.org.au: better checks before using hugetlb_cma] Link: http://lkml.kernel.org/r/20200721205716.6dbaa56b@canb.auug.org.au Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200710005726.36068-1-song.bao.hua@hisilicon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mm: memcg/slab: fix memory leak at non-root kmem_cache destroyMuchun Song
If the kmem_cache refcount is greater than one, we should not mark the root kmem_cache as dying. If we mark the root kmem_cache dying incorrectly, the non-root kmem_cache can never be destroyed. It resulted in memory leak when memcg was destroyed. We can use the following steps to reproduce. 1) Use kmem_cache_create() to create a new kmem_cache named A. 2) Coincidentally, the kmem_cache A is an alias for kmem_cache B, so the refcount of B is just increased. 3) Use kmem_cache_destroy() to destroy the kmem_cache A, just decrease the B's refcount but mark the B as dying. 4) Create a new memory cgroup and alloc memory from the kmem_cache B. It leads to create a non-root kmem_cache for allocating memory. 5) When destroy the memory cgroup created in the step 4), the non-root kmem_cache can never be destroyed. If we repeat steps 4) and 5), this will cause a lot of memory leak. So only when refcount reach zero, we mark the root kmem_cache as dying. Fixes: 92ee383f6daa ("mm: fix race between kmem_cache destroy, create and deactivate") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200716165103.83462-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mm/memcg: fix refcount error while moving and swappingHugh Dickins
It was hard to keep a test running, moving tasks between memcgs with move_charge_at_immigrate, while swapping: mem_cgroup_id_get_many()'s refcount is discovered to be 0 (supposedly impossible), so it is then forced to REFCOUNT_SATURATED, and after thousands of warnings in quick succession, the test is at last put out of misery by being OOM killed. This is because of the way moved_swap accounting was saved up until the task move gets completed in __mem_cgroup_clear_mc(), deferred from when mem_cgroup_move_swap_account() actually exchanged old and new ids. Concurrent activity can free up swap quicker than the task is scanned, bringing id refcount down 0 (which should only be possible when offlining). Just skip that optimization: do that part of the accounting immediately. Fixes: 615d66c37c75 ("mm: memcontrol: fix memcg id ref counter on swap charge move") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alex Shi <alex.shi@linux.alibaba.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2007071431050.4726@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mm/memcontrol: fix OOPS inside mem_cgroup_get_nr_swap_pages()Bhupesh Sharma
Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages() function in a corner case seen on some arm64 boards when kdump kernel runs with "cgroup_disable=memory" passed to the kdump kernel via bootargs. The root-cause behind the same is that currently mem_cgroup_swap_init() function is implemented as a subsys_initcall() call instead of a core_initcall(), this means 'cgroup_memory_noswap' still remains set to the default value (false) even when memcg is disabled via "cgroup_disable=memory" boot parameter. This may result in premature OOPS inside mem_cgroup_get_nr_swap_pages() function in corner cases: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 [0000000000000188] user address but active_mm is swapper Internal error: Oops: 96000006 [#1] SMP Modules linked in: <..snip..> Call trace: mem_cgroup_get_nr_swap_pages+0x9c/0xf4 shrink_lruvec+0x404/0x4f8 shrink_node+0x1a8/0x688 do_try_to_free_pages+0xe8/0x448 try_to_free_pages+0x110/0x230 __alloc_pages_slowpath.constprop.106+0x2b8/0xb48 __alloc_pages_nodemask+0x2ac/0x2f8 alloc_page_interleave+0x20/0x90 alloc_pages_current+0xdc/0xf8 atomic_pool_expand+0x60/0x210 __dma_atomic_pool_init+0x50/0xa4 dma_atomic_pool_init+0xac/0x158 do_one_initcall+0x50/0x218 kernel_init_freeable+0x22c/0x2d0 kernel_init+0x18/0x110 ret_from_fork+0x10/0x18 Code: aa1403e3 91106000 97f82a27 14000011 (f940c663) ---[ end trace 9795948475817de4 ]--- Kernel panic - not syncing: Fatal exception Rebooting in 10 seconds.. Fixes: eccb52e78809 ("mm: memcontrol: prepare swap controller setup for integration") Reported-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/r/1593641660-13254-2-git-send-email-bhsharma@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mm: initialize return of vm_insert_pagesTom Rix
clang static analysis reports a garbage return In file included from mm/memory.c:84: mm/memory.c:1612:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return err; ^~~~~~~~~~ The setting of err depends on a loop executing. So initialize err. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200703155354.29132-1-trix@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24vfs/xattr: mm/shmem: kernfs: release simple xattr entry in a right wayChengguang Xu
After commit fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc"), simple xattr entry is allocated with kvmalloc() instead of kmalloc(), so we should release it with kvfree() instead of kfree(). Fixes: fdc85222d58e ("kernfs: kvmalloc xattr value instead of kmalloc") Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Xu <dxu@dxuuu.xyz> Cc: Chris Down <chris@chrisdown.name> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [5.7] Link: http://lkml.kernel.org/r/20200704051608.15043-1-cgxu519@mykernel.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24mm/mmap.c: close race between munmap() and expand_upwards()/downwards()Kirill A. Shutemov
VMA with VM_GROWSDOWN or VM_GROWSUP flag set can change their size under mmap_read_lock(). It can lead to race with __do_munmap(): Thread A Thread B __do_munmap() detach_vmas_to_be_unmapped() mmap_write_downgrade() expand_downwards() vma->vm_start = address; // The VMA now overlaps with // VMAs detached by the Thread A // page fault populates expanded part // of the VMA unmap_region() // Zaps pagetables partly // populated by Thread B Similar race exists for expand_upwards(). The fix is to avoid downgrading mmap_lock in __do_munmap() if detached VMAs are next to VM_GROWSDOWN or VM_GROWSUP VMA. [akpm@linux-foundation.org: s/mmap_sem/mmap_lock/ in comment] Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> [4.20+] Link: http://lkml.kernel.org/r/20200709105309.42495-1-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-24uprobes: Change handle_swbp() to send SIGTRAP with si_code=SI_KERNEL, to fix ↵Oleg Nesterov
GDB regression If a tracee is uprobed and it hits int3 inserted by debugger, handle_swbp() does send_sig(SIGTRAP, current, 0) which means si_code == SI_USER. This used to work when this code was written, but then GDB started to validate si_code and now it simply can't use breakpoints if the tracee has an active uprobe: # cat test.c void unused_func(void) { } int main(void) { return 0; } # gcc -g test.c -o test # perf probe -x ./test -a unused_func # perf record -e probe_test:unused_func gdb ./test -ex run GNU gdb (GDB) 10.0.50.20200714-git ... Program received signal SIGTRAP, Trace/breakpoint trap. 0x00007ffff7ddf909 in dl_main () from /lib64/ld-linux-x86-64.so.2 (gdb) The tracee hits the internal breakpoint inserted by GDB to monitor shared library events but GDB misinterprets this SIGTRAP and reports a signal. Change handle_swbp() to use force_sig(SIGTRAP), this matches do_int3_user() and fixes the problem. This is the minimal fix for -stable, arch/x86/kernel/uprobes.c is equally wrong; it should use send_sigtrap(TRAP_TRACE) instead of send_sig(SIGTRAP), but this doesn't confuse GDB and needs another x86-specific patch. Reported-by: Aaron Merey <amerey@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200723154420.GA32043@redhat.com
2020-07-24sched: Warn if garbage is passed to default_wake_function()Chris Wilson
Since the default_wake_function() passes its flags onto try_to_wake_up(), warn if those flags collide with internal values. Given that the supplied flags are garbage, no repair can be done but at least alert the user to the damage they are causing. In the belief that these errors should be picked up during testing, the warning is only compiled in under CONFIG_SCHED_DEBUG. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: https://lore.kernel.org/r/20200723201042.18861-1-chris@chris-wilson.co.uk
2020-07-23vrf: Handle CONFIG_SYSCTL not setDavid Ahern
Randy reported compile failure when CONFIG_SYSCTL is not set/enabled: ERROR: modpost: "sysctl_vals" [drivers/net/vrf.ko] undefined! Fix by splitting out the sysctl init and cleanup into helpers that can be set to do nothing when CONFIG_SYSCTL is disabled. In addition, move vrf_strict_mode and vrf_strict_mode_change to above vrf_shared_table_handler (code move only) and wrap all of it in the ifdef CONFIG_SYSCTL. Update the strict mode tests to check for the existence of the /proc/sys entry. Fixes: 33306f1aaf82 ("vrf: add sysctl parameter for strict mode") Cc: Andrea Mayer <andrea.mayer@uniroma2.it> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix NAT hook deletion when table is dormant, from Florian Westphal. 2) Fix IPVS sync stalls, from guodeqing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23ice: add 1G SGMII PHY typePaul M Stillwell Jr
There isn't a case for 1G SGMII in ice_get_media_type() so add the handling for it. Also handle the special case where some direct attach cables may report that they support 1G SGMII, but that is erroneous since SGMII is supposed to be a backplane media type (between a MAC and a PHY). If the driver doesn't handle this special case then a user could see the 'Port' in ethtool change from 'Direct attach Copper' to 'Backplane' when they have forced the speed to 1G, but the cable hasn't changed. Lastly, change ice_aq_get_phy_caps() to save the module_type info if the function was called with ICE_AQC_REPORT_TOPO_CAP. This call uses the media information to populate the module_type. If no media is present then the values in module_type will be 0. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: Report AOC PHY Types as FiberDoug Dziggel
Report AOC types as fiber instead of unknown. Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: add AQC get link topology handle supportPaul Greenwalt
Add AQC get link topology handle support. This is needed to determine Direct Attach (DA) or backplane media type for PHY types that support either. Get link topology handle cage node type request can be used to determine if a cage is present or not. If a cage is present for PHY types that supports both DA and backplane media type, then the media type is DA, else the media type is backplane. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: Rename low_power_ctrlLev Faerman
Rename the low_power_ctrl field to low_power_ctrl_an to be properly descriptive of it being an autoneg field. Signed-off-by: Lev Faerman <lev.faerman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: update reporting of autoneg capabilitiesPaul Greenwalt
Firmware now reports AN28, AN32, and AN73. Add a helper and check these new values and report PHY autoneg capability. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: add ice_aq_get_phy_caps() debug logsPaul Greenwalt
Add debug logs for ice_aq_get_phy_caps(), and format ice_aq_set_phy_cfg() and ice_aq_get_link_info() debug logs to make them more readable. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: support Total Port Shutdown on devices that support itBruce Allan
When the Port Disable bit is set in the Link Default Override Mask TLV PFA module in the NVM, Total Port Shutdown mode is supported and enabled. In this mode, the driver should act as if the link-down-on-close ethtool private flag is always enabled and dis-allow any change to that flag. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: add link lenient and default override supportPaul Greenwalt
Adds functions to check for link override firmware support and get the override settings for a port. The previously supported/default link mode was strict mode. In strict mode link is configured based on get PHY capabilities PHY types with media. Lenient mode is now the default link mode. In lenient mode the link is configured based on get PHY capabilities PHY types without media. This allows the user to configure link that the media does not report. Limit the minimum supported link mode to 25G for devices that support 100G, and 1G for devices that support less than 100G. Default override is only supported in lenient mode. If default override is supported and enabled, then default override values are used for configuring speed and FEC. Default override provide persistent link settings in the NVM. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Evan Swanson <evan.swanson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23geneve: fix an uninitialized value in geneve_changelink()Cong Wang
geneve_nl2info() sets 'df' conditionally, so we have to initialize it by copying the value from existing geneve device in geneve_changelink(). Fixes: 56c09de347e4 ("geneve: allow changing DF behavior after creation") Reported-by: syzbot+7ebc2e088af5e4c0c9fa@syzkaller.appspotmail.com Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23bonding: check return value of register_netdevice() in bond_newlink()Cong Wang
Very similar to commit 544f287b8495 ("bonding: check error value of register_netdevice() immediately"), we should immediately check the return value of register_netdevice() before doing anything else. Fixes: 005db31d5f5f ("bonding: set carrier off for devices created through netlink") Reported-and-tested-by: syzbot+bbc3a11c4da63c1b74d6@syzkaller.appspotmail.com Cc: Beniamino Galvani <bgalvani@redhat.com> Cc: Taehee Yoo <ap420073@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>