summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-04memblock: Fix doc for memblock_phys_freeMiaoqian Lin
memblock_phys_free() is the counterpart to memblock_phys_alloc. Change memblock_alloc_xx() with memblock_phys_alloc_xx() to keep consistency. Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20221216100304.688209-1-linmq006@gmail.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-01-04memblock tests: Fix compilation error.Aaron Thompson
Commit cf4694be2b2cf ("tools: Add atomic_test_and_set_bit()") changed tools/arch/x86/include/asm/atomic.h to include <asm/asm.h>, which causes 'make -C tools/testing/memblock' to fail with: In file included from ../../include/asm/atomic.h:6, from ../../include/linux/atomic.h:5, from ./linux/mmzone.h:5, from ../../include/linux/mm.h:5, from ../../include/linux/pfn.h:5, from ./linux/memory_hotplug.h:6, from ./linux/init.h:7, from ./linux/memblock.h:11, from tests/common.h:8, from tests/basic_api.h:5, from main.c:2: ../../include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/asm.h: No such file or directory 11 | #include <asm/asm.h> | ^~~~~~~~~~~ Create a symlink to asm/asm.h in the same manner as the existing one to asm/cmpxchg.h. Signed-off-by: Aaron Thompson <dev@aaront.org> Link: https://lore.kernel.org/r/010101857c402765-96e2dbc6-b82b-47e2-a437-4834dbe0b96b-000000@us-west-2.amazonses.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-01-04x86/bugs: Flush IBP in ib_prctl_set()Rodrigo Branco
We missed the window between the TIF flag update and the next reschedule. Signed-off-by: Rodrigo Branco <bsdaemon@google.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org>
2023-01-04MAINTAINERS: Update maintainers for ptp_vmw driverSrivatsa S. Bhat (VMware)
Vivek has decided to transfer the maintainership of the VMware virtual PTP clock driver (ptp_vmw) to Srivatsa and Deep. Update the MAINTAINERS file to reflect this change, and also add Alexey as a reviewer for the driver. Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Acked-by: Vivek Thampi <vivek@vivekthampi.com> Acked-by: Deep Shah <sdeep@vmware.com> Acked-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-04cifs: fix race in assemble_neg_contexts()Paulo Alcantara
Serialise access of TCP_Server_Info::hostname in assemble_neg_contexts() by holding the server's mutex otherwise it might end up accessing an already-freed hostname pointer from cifs_reconnect() or cifs_resolve_server(). Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-04cifs: ignore ipc reconnect failures during dfs failoverPaulo Alcantara
If it failed to reconnect ipc used for getting referrals, we can just ignore it as it is not required for reconnecting the share. The worst case would be not being able to detect or chase nested links as long as dfs root server is unreachable. Before patch: $ mount.cifs //root/dfs/link /mnt -o echo_interval=10,... -> target share: /fs0/share disconnect root & fs0 $ ls /mnt ls: cannot access '/mnt': Host is down connect fs0 $ ls /mnt ls: cannot access '/mnt': Resource temporarily unavailable After patch: $ mount.cifs //root/dfs/link /mnt -o echo_interval=10,... -> target share: /fs0/share disconnect root & fs0 $ ls /mnt ls: cannot access '/mnt': Host is down connect fs0 $ ls /mnt bar.rtf dir1 foo Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-04cifs: Fix kmap_local_page() unmappingIra Weiny
kmap_local_page() requires kunmap_local() to unmap the mapping. In addition memcpy_page() is provided to perform this common memcpy pattern. Replace the kmap_local_page() and broken kunmap() with memcpy_page() Fixes: d406d26745ab ("cifs: skip alloc when request has no pages") Reviewed-by: Paulo Alcantara <pc@cjr.nz> Reviewed-by: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-03io_uring: lockdep annotate CQ lockingPavel Begunkov
Locking around CQE posting is complex and depends on options the ring is created with, add more thorough lockdep annotations checking all invariants. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/aa3770b4eacae3915d782cc2ab2f395a99b4b232.1672795976.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-03io_uring: pin context while queueing deferred twPavel Begunkov
Unlike normal tw, nothing prevents deferred tw to be executed right after an tw item added to ->work_llist in io_req_local_work_add(). For instance, the waiting task may get waken up by CQ posting or a normal tw. Thus we need to pin the ring for the rest of io_req_local_work_add() Cc: stable@vger.kernel.org Fixes: c0e0d6ba25f18 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1a79362b9c10b8523ef70b061d96523650a23344.1672795998.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-03dt-bindings: soundwire: qcom,soundwire: correct sizes related to number of portsKrzysztof Kozlowski
There are several properties depending on number of ports. Some of them had maximum limit of 5 and some of 8. SM8450 AudioReach comes with 8 ports, so fix the limits: sm8450-sony-xperia-nagara-pdx224.dtb: soundwire-controller@3250000: qcom,ports-word-length: 'oneOf' conditional failed, one must be fixed: [[255, 255, 255, 255, 255, 255, 255, 255]] is too short [255, 255, 255, 255, 255, 255, 255, 255] is too long Fixes: febc50b82bc9 ("dt-bindings: soundwire: Convert text bindings to DT Schema") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221223132159.81211-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-01-03of/fdt: run soc memory setup when early_init_dt_scan_memory failsAndreas Rammhold
If memory has been found early_init_dt_scan_memory now returns 1. If it hasn't found any memory it will return 0, allowing other memory setup mechanisms to carry on. Previously early_init_dt_scan_memory always returned 0 without distinguishing between any kind of memory setup being done or not. Any code path after the early_init_dt_scan memory call in the ramips plat_mem_setup code wouldn't be executed anymore. Making early_init_dt_scan_memory the only way to initialize the memory. Some boards, including my mt7621 based Cudy X6 board, depend on memory initialization being done via the soc_info.mem_detect function pointer. Those wouldn't be able to obtain memory and panic the kernel during early bootup with the message "early_init_dt_alloc_memory_arch: Failed to allocate 12416 bytes align=0x40". Fixes: 1f012283e936 ("of/fdt: Rework early_init_dt_scan_memory() to call directly") Cc: stable@vger.kernel.org Signed-off-by: Andreas Rammhold <andreas@rammhold.de> Link: https://lore.kernel.org/r/20221223112748.2935235-1-andreas@rammhold.de Signed-off-by: Rob Herring <robh@kernel.org>
2023-01-03drm/amd/display: Uninitialized variables causing 4k60 UCLK to stay at DPM1 ↵Samson Tam
and not DPM0 [Why] SwathSizePerSurfaceY[] and SwathSizePerSurfaceC[] values are uninitialized because we are using += instead of = operator. [How] Assign values in loop with = operator. Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Samson Tam <samson.tam@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.0.x, 6.1.x
2023-01-03drm/amdkfd: Fix kernel warning during topology setupMukul Joshi
This patch fixes the following kernel warning seen during driver load by correctly initializing the p2plink attr before creating the sysfs file: [ +0.002865] ------------[ cut here ]------------ [ +0.002327] kobject: '(null)' (0000000056260cfb): is not initialized, yet kobject_put() is being called. [ +0.004780] WARNING: CPU: 32 PID: 1006 at lib/kobject.c:718 kobject_put+0xaa/0x1c0 [ +0.001361] Call Trace: [ +0.001234] <TASK> [ +0.001067] kfd_remove_sysfs_node_entry+0x24a/0x2d0 [amdgpu] [ +0.003147] kfd_topology_update_sysfs+0x3d/0x750 [amdgpu] [ +0.002890] kfd_topology_add_device+0xbd7/0xc70 [amdgpu] [ +0.002844] ? lock_release+0x13c/0x2e0 [ +0.001936] ? smu_cmn_send_smc_msg_with_param+0x1e8/0x2d0 [amdgpu] [ +0.003313] ? amdgpu_dpm_get_mclk+0x54/0x60 [amdgpu] [ +0.002703] kgd2kfd_device_init.cold+0x39f/0x4ed [amdgpu] [ +0.002930] amdgpu_amdkfd_device_init+0x13d/0x1f0 [amdgpu] [ +0.002944] amdgpu_device_init.cold+0x1464/0x17b4 [amdgpu] [ +0.002970] ? pci_bus_read_config_word+0x43/0x80 [ +0.002380] amdgpu_driver_load_kms+0x15/0x100 [amdgpu] [ +0.002744] amdgpu_pci_probe+0x147/0x370 [amdgpu] [ +0.002522] local_pci_probe+0x40/0x80 [ +0.001896] work_for_cpu_fn+0x10/0x20 [ +0.001892] process_one_work+0x26e/0x5a0 [ +0.002029] worker_thread+0x1fd/0x3e0 [ +0.001890] ? process_one_work+0x5a0/0x5a0 [ +0.002115] kthread+0xea/0x110 [ +0.001618] ? kthread_complete_and_exit+0x20/0x20 [ +0.002422] ret_from_fork+0x1f/0x30 [ +0.001808] </TASK> [ +0.001103] irq event stamp: 59837 [ +0.001718] hardirqs last enabled at (59849): [<ffffffffb30fab12>] __up_console_sem+0x52/0x60 [ +0.004414] hardirqs last disabled at (59860): [<ffffffffb30faaf7>] __up_console_sem+0x37/0x60 [ +0.004414] softirqs last enabled at (59654): [<ffffffffb307d9c7>] irq_exit_rcu+0xd7/0x130 [ +0.004205] softirqs last disabled at (59649): [<ffffffffb307d9c7>] irq_exit_rcu+0xd7/0x130 [ +0.004203] ---[ end trace 0000000000000000 ]--- Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-01-03Merge tag 'drm-misc-next-fixes-2023-01-03' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Maxime writes: "The drm-misc-next-fixes leftovers. It addresses a bug in drm/scheduler ending up causing a lockup, and reduces the stack usage of some drm/mm kunit tests." Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230103144926.bmjjni3xnuis2jmq@houat
2023-01-03perf lock contention: Fix core dump related to not finding the ↵Thomas Richter
"__sched_text_end" symbol on s/390 The test case perf lock contention dumps core on s390. Run the following commands: # ./perf lock record -- ./perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 2.799 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ] # # ./perf lock contention Segmentation fault (core dumped) # The function call stack is lengthy, here are the top 5 functions: # gdb ./perf core.24048 GNU gdb (GDB) Fedora Linux 12.1-6.fc37 Core was generated by `./perf lock contention'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356 3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start); (gdb) where #0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356 #1 0x000000000109f244 in callchain_id (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:957 #2 0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290, addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:586 #3 0x000000000109f4d0 in report_lock_contention_begin_event (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1004 #4 0x00000000010a00ae in evsel__process_contention_begin (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1254 #5 0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, event=0x3ff85601ef8, sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464 ..... The issue is in function machine__is_lock_function() in file ./util/machine.c lines 3355: /* should not fail from here */ sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end", &kmap); machine->sched.text_end = kmap->unmap_ip(kmap, sym->start) On s390 the symbol __sched_text_end is *NOT* in the symbol list and the resulting pointer sym is set to NULL. The sym->start is then a NULL pointer access and generates the core dump. The reason why __sched_text_end is not in the symbol list on s390 is simple: When the symbol list is created at perf start up with function calls dso__load +--> dso__load_vmlinux_path +--> dso__load_vmlinux +--> dso__load_sym +--> dso__load_sym_internal (reads kernel symbols) +--> symbols__fixup_end +--> symbols__fixup_duplicate The issue is in function symbols__fixup_duplicate(). It deletes all symbols with have the same address. On s390: # nm -g ~/linux/vmlinux| fgrep c68390 0000000000c68390 T __cpuidle_text_start 0000000000c68390 T __sched_text_end # two symbols have identical addresses and __sched_text_end is considered duplicate (in ascending sort order) and removed from the symbol list. Therefore it is missing and an invalid pointer reference occurs. The code checks for symbol __sched_text_start and when it exists assumes symbol __sched_text_end is also in the symbol table. However this is not the case on s390. Same situation exists for symbol __lock_text_start: 0000000000c68770 T __cpuidle_text_end 0000000000c68770 T __lock_text_start This symbol is also removed from the symbol table but used in function machine__is_lock_function(). To fix this and keep duplicate symbols in the symbol table, set symbol_conf.allow_aliases to true. This prevents the removal of duplicate symbols in function symbols__fixup_duplicate(). Output After: # ./perf lock contention contended total wait max wait avg wait type caller 48 124.39 ms 123.99 ms 2.59 ms rwsem:W unlink_anon_vmas+0x24a 47 83.68 ms 83.26 ms 1.78 ms rwsem:W free_pgtables+0x132 5 41.22 us 10.55 us 8.24 us rwsem:W free_pgtables+0x140 4 40.12 us 20.55 us 10.03 us rwsem:W copy_process+0x1ac8 # Fixes: 0d2997f750d1de39 ("perf lock: Look up callchain for the contended locks") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20221230102627.2410847-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-03perf build: Don't propagate subdir to submakes for install_headersIan Rogers
subdir is added to the OUTPUT which fails as part of building install_headers when passed from "make -C tools perf_install". Committer testing: The original reporter (see the Link: below) had trouble with this: $ make -C tools perf_install That ended up with errors like this: /var/home/acme/git/perf-urgent/tools/scripts/Makefile.include:17: *** output directory "/var/home/acme/git/perf-urgent/tools/perf/libperf/perf/" does not exist. Stop. With this patch applied we now get it installed at: INSTALL /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h As expected: $ ls -la /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h -rw-r--r--. 1 acme acme 1146 Jan 3 15:42 /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h And if we clean tools with: $ make -C tools clean it gets cleaned up: $ ls -la /var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h ls: cannot access '/var/home/acme/git/perf-urgent/tools/perf/libperf/include/perf/bpf_perf.h': No such file or directory $ Fixes: 746bd29e348f99b4 ("perf build: Use tools/lib headers from install path") Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/fa4b3115-d555-3d7f-54d1-018002e99350@secunet.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-03xfs: xfs_qm: remove unnecessary ‘0’ values from errorLi zeming
error is assigned first, so it does not need to initialize the assignment. Signed-off-by: Li zeming <zeming@nfschina.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-01-03xfs: Fix deadlock on xfs_inodegc_workerWu Guanghao
We are doing a test about deleting a large number of files when memory is low. A deadlock problem was found. [ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 1240.280450] lock_acquire+0x197/0x460 [ 1240.281548] fs_reclaim_acquire.part.0+0x20/0x30 [ 1240.282625] kmem_cache_alloc+0x2b/0x940 [ 1240.283816] xfs_trans_alloc+0x8a/0x8b0 [ 1240.284757] xfs_inactive_ifree+0xe4/0x4e0 [ 1240.285935] xfs_inactive+0x4e9/0x8a0 [ 1240.286836] xfs_inodegc_worker+0x160/0x5e0 [ 1240.287969] process_one_work+0xa19/0x16b0 [ 1240.289030] worker_thread+0x9e/0x1050 [ 1240.290131] kthread+0x34f/0x460 [ 1240.290999] ret_from_fork+0x22/0x30 [ 1240.291905] [ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}: [ 1240.293569] check_prev_add+0x160/0x2490 [ 1240.294473] __lock_acquire+0x2c4d/0x5160 [ 1240.295544] lock_acquire+0x197/0x460 [ 1240.296403] __flush_work+0x6bc/0xa20 [ 1240.297522] xfs_inode_mark_reclaimable+0x6f0/0xdc0 [ 1240.298649] destroy_inode+0xc6/0x1b0 [ 1240.299677] dispose_list+0xe1/0x1d0 [ 1240.300567] prune_icache_sb+0xec/0x150 [ 1240.301794] super_cache_scan+0x2c9/0x480 [ 1240.302776] do_shrink_slab+0x3f0/0xaa0 [ 1240.303671] shrink_slab+0x170/0x660 [ 1240.304601] shrink_node+0x7f7/0x1df0 [ 1240.305515] balance_pgdat+0x766/0xf50 [ 1240.306657] kswapd+0x5bd/0xd20 [ 1240.307551] kthread+0x34f/0x460 [ 1240.308346] ret_from_fork+0x22/0x30 [ 1240.309247] [ 1240.309247] other info that might help us debug this: [ 1240.309247] [ 1240.310944] Possible unsafe locking scenario: [ 1240.310944] [ 1240.312379] CPU0 CPU1 [ 1240.313363] ---- ---- [ 1240.314433] lock(fs_reclaim); [ 1240.315107] lock((work_completion)(&gc->work)); [ 1240.316828] lock(fs_reclaim); [ 1240.318088] lock((work_completion)(&gc->work)); [ 1240.319203] [ 1240.319203] *** DEADLOCK *** ... [ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker [ 2438.432089] Call Trace: [ 2438.432562] __schedule+0xa94/0x1d20 [ 2438.435787] schedule+0xbf/0x270 [ 2438.436397] schedule_timeout+0x6f8/0x8b0 [ 2438.445126] wait_for_completion+0x163/0x260 [ 2438.448610] __flush_work+0x4c4/0xa40 [ 2438.455011] xfs_inode_mark_reclaimable+0x6ef/0xda0 [ 2438.456695] destroy_inode+0xc6/0x1b0 [ 2438.457375] dispose_list+0xe1/0x1d0 [ 2438.458834] prune_icache_sb+0xe8/0x150 [ 2438.461181] super_cache_scan+0x2b3/0x470 [ 2438.461950] do_shrink_slab+0x3cf/0xa50 [ 2438.462687] shrink_slab+0x17d/0x660 [ 2438.466392] shrink_node+0x87e/0x1d40 [ 2438.467894] do_try_to_free_pages+0x364/0x1300 [ 2438.471188] try_to_free_pages+0x26c/0x5b0 [ 2438.473567] __alloc_pages_slowpath.constprop.136+0x7aa/0x2100 [ 2438.482577] __alloc_pages+0x5db/0x710 [ 2438.485231] alloc_pages+0x100/0x200 [ 2438.485923] allocate_slab+0x2c0/0x380 [ 2438.486623] ___slab_alloc+0x41f/0x690 [ 2438.490254] __slab_alloc+0x54/0x70 [ 2438.491692] kmem_cache_alloc+0x23e/0x270 [ 2438.492437] xfs_trans_alloc+0x88/0x880 [ 2438.493168] xfs_inactive_ifree+0xe2/0x4e0 [ 2438.496419] xfs_inactive+0x4eb/0x8b0 [ 2438.497123] xfs_inodegc_worker+0x16b/0x5e0 [ 2438.497918] process_one_work+0xbf7/0x1a20 [ 2438.500316] worker_thread+0x8c/0x1060 [ 2438.504938] ret_from_fork+0x22/0x30 When the memory is insufficient, xfs_inonodegc_worker will trigger memory reclamation when memory is allocated, then flush_work() may be called to wait for the work to complete. This causes a deadlock. So use memalloc_nofs_save() to avoid triggering memory reclamation in xfs_inodegc_worker. Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-01-03xfs: get root inode correctly at bulkstatHironori Shiina
The root inode number should be set to `breq->startino` for getting stat information of the root when XFS_BULK_IREQ_SPECIAL_ROOT is used. Otherwise, the inode search is started from 1 (XFS_BULK_IREQ_SPECIAL_ROOT) and the inode with the lowest number in a filesystem is returned. Fixes: bf3cb3944792 ("xfs: allow single bulkstat of special inodes") Signed-off-by: Hironori Shiina <shiina.hironori@fujitsu.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-01-03xfs: fix off-by-one error in xfs_btree_space_to_heightDarrick J. Wong
Lately I've been stress-testing extreme-sized rmap btrees by using the (new) xfs_db bmap_inflate command to clone bmbt mappings billions of times and then using xfs_repair to build new rmap and refcount btrees. This of course is /much/ faster than actually FICLONEing a file billions of times. Unfortunately, xfs_repair fails in xfs_btree_bload_compute_geometry with EOVERFLOW, which indicates that xfs_mount.m_rmap_maxlevels is not sufficiently large for the test scenario. For a 1TB filesystem (~67 million AG blocks, 4 AGs) the btheight command reports: $ xfs_db -c 'btheight -n 4400801200 -w min rmapbt' /dev/sda rmapbt: worst case per 4096-byte block: 84 records (leaf) / 45 keyptrs (node) level 0: 4400801200 records, 52390491 blocks level 1: 52390491 records, 1164234 blocks level 2: 1164234 records, 25872 blocks level 3: 25872 records, 575 blocks level 4: 575 records, 13 blocks level 5: 13 records, 1 block 6 levels, 53581186 blocks total The AG is sufficiently large to build this rmap btree. Unfortunately, m_rmap_maxlevels is 5. Augmenting the loop in the space->height function to report height, node blocks, and blocks remaining produces this: ht 1 node_blocks 45 blockleft 67108863 ht 2 node_blocks 2025 blockleft 67108818 ht 3 node_blocks 91125 blockleft 67106793 ht 4 node_blocks 4100625 blockleft 67015668 final height: 5 The goal of this function is to compute the maximum height btree that can be stored in the given number of ondisk fsblocks. Starting with the top level of the tree, each iteration through the loop adds the fanout factor of the next level down until we run out of blocks. IOWs, maximum height is achieved by using the smallest fanout factor that can apply to that level. However, the loop setup is not correct. Top level btree blocks are allowed to contain fewer than minrecs items, so the computation is incorrect because the first time through the loop it should be using a fanout factor of 2. With this corrected, the above becomes: ht 1 node_blocks 2 blockleft 67108863 ht 2 node_blocks 90 blockleft 67108861 ht 3 node_blocks 4050 blockleft 67108771 ht 4 node_blocks 182250 blockleft 67104721 ht 5 node_blocks 8201250 blockleft 66922471 final height: 6 Fixes: 9ec691205e7d ("xfs: compute the maximum height of the rmap btree when reflink enabled") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-01-03perf/x86/rapl: Treat Tigerlake like IcelakeChris Wilson
Since Tigerlake seems to have inherited its cstates and other RAPL power caps from Icelake, assume it also follows Icelake for its RAPL events. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Zhang Rui <rui.zhang@intel.com> Link: https://lore.kernel.org/r/20221228113454.1199118-1-rodrigo.vivi@intel.com
2023-01-03x86/insn: Avoid namespace clash by separating instruction decoder MMIO type ↵Jason A. Donenfeld
from MMIO trace type Both <linux/mmiotrace.h> and <asm/insn-eval.h> define various MMIO_ enum constants, whose namespace overlaps. Rename the <asm/insn-eval.h> ones to have a INSN_ prefix, so that the headers can be used from the same source file. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230101162910.710293-2-Jason@zx2c4.com
2023-01-03f2fs: let's avoid panic if extent_tree is not createdJaegeuk Kim
This patch avoids the below panic. pc : __lookup_extent_tree+0xd8/0x760 lr : f2fs_do_write_data_page+0x104/0x87c sp : ffffffc010cbb3c0 x29: ffffffc010cbb3e0 x28: 0000000000000000 x27: ffffff8803e7f020 x26: ffffff8803e7ed40 x25: ffffff8803e7f020 x24: ffffffc010cbb460 x23: ffffffc010cbb480 x22: 0000000000000000 x21: 0000000000000000 x20: ffffffff22e90900 x19: 0000000000000000 x18: ffffffc010c5d080 x17: 0000000000000000 x16: 0000000000000020 x15: ffffffdb1acdbb88 x14: ffffff888759e2b0 x13: 0000000000000000 x12: ffffff802da49000 x11: 000000000a001200 x10: ffffff8803e7ed40 x9 : ffffff8023195800 x8 : ffffff802da49078 x7 : 0000000000000001 x6 : 0000000000000000 x5 : 0000000000000006 x4 : ffffffc010cbba28 x3 : 0000000000000000 x2 : ffffffc010cbb480 x1 : 0000000000000000 x0 : ffffff8803e7ed40 Call trace: __lookup_extent_tree+0xd8/0x760 f2fs_do_write_data_page+0x104/0x87c f2fs_write_single_data_page+0x420/0xb60 f2fs_write_cache_pages+0x418/0xb1c __f2fs_write_data_pages+0x428/0x58c f2fs_write_data_pages+0x30/0x40 do_writepages+0x88/0x190 __writeback_single_inode+0x48/0x448 writeback_sb_inodes+0x468/0x9e8 __writeback_inodes_wb+0xb8/0x2a4 wb_writeback+0x33c/0x740 wb_do_writeback+0x2b4/0x400 wb_workfn+0xe4/0x34c process_one_work+0x24c/0x5bc worker_thread+0x3e8/0xa50 kthread+0x150/0x1b4 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: should use a temp extent_info for lookupJaegeuk Kim
Otherwise, __lookup_extent_tree() will override the given extent_info which will be used by caller. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: don't mix to use union values in extent_infoJaegeuk Kim
Let's explicitly use the defined values in block_age case only. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: initialize extent_cache parameterJaegeuk Kim
This can avoid confusing tracepoint values. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()Chao Yu
With below two cases, it will cause NULL pointer dereference when accessing SM_I(sbi)->fcc_info in f2fs_issue_flush(). a) If kthread_run() fails in f2fs_create_flush_cmd_control(), it will release SM_I(sbi)->fcc_info, - mount -o noflush_merge /dev/vda /mnt/f2fs - mount -o remount,flush_merge /dev/vda /mnt/f2fs -- kthread_run() fails - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync b) we will never allocate memory for SM_I(sbi)->fcc_info w/ below testcase, - mount -o ro /dev/vda /mnt/f2fs - mount -o rw,remount /dev/vda /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync In order to fix this issue, let change as below: - fix error path handling in f2fs_create_flush_cmd_control(). - allocate SM_I(sbi)->fcc_info even if readonly is on. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03x86/asm: Fix an assembler warning with current binutilsMikulas Patocka
Fix a warning: "found `movsd'; assuming `movsl' was meant" Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org
2023-01-03btrfs: fix compat_ro checks against remountQu Wenruo
[BUG] Even with commit 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling"), btrfs can still mount a fs with unsupported compat_ro flags read-only, then remount it RW: # btrfs ins dump-super /dev/loop0 | grep compat_ro_flags -A 3 compat_ro_flags 0x403 ( FREE_SPACE_TREE | FREE_SPACE_TREE_VALID | unknown flag: 0x400 ) # mount /dev/loop0 /mnt/btrfs mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. ^^^ RW mount failed as expected ^^^ # dmesg -t | tail -n5 loop0: detected capacity change from 0 to 1048576 BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146) BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm BTRFS info (device loop0): using free space tree BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403) BTRFS error (device loop0): open_ctree failed # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs ^^^ RW remount succeeded unexpectedly ^^^ [CAUSE] Currently we use btrfs_check_features() to check compat_ro flags against our current mount flags. That function get reused between open_ctree() and btrfs_remount(). But for btrfs_remount(), the super block we passed in still has the old mount flags, thus btrfs_check_features() still believes we're mounting read-only. [FIX] Replace the existing @sb argument with @is_rw_mount. As originally we only use @sb to determine if the mount is RW. Now it's callers' responsibility to determine if the mount is RW, and since there are only two callers, the check is pretty simple: - caller in open_ctree() Just pass !sb_rdonly(). - caller in btrfs_remount() Pass !(*flags & SB_RDONLY), as our check should be against the new flags. Now we can correctly reject the RW remount: # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs mount: /mnt/btrfs: mount point not mounted or bad option. dmesg(1) may have more information after failed mount system call. # dmesg -t | tail -n 1 BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403) Reported-by: Chung-Chiang Cheng <shepjeng@gmail.com> Fixes: 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: always report error in run_one_delayed_ref()Qu Wenruo
Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but if end users hit such problem, there will be no chance that btrfs_debug() is enabled. This can lead to very little useful info for debugging. This patch will: - Add extra info for error reporting Including: * logical bytenr * num_bytes * type * action * ref_mod - Replace the btrfs_debug() with btrfs_err() - Move the error reporting into run_one_delayed_ref() This is to avoid use-after-free, the @node can be freed in the caller. This error should only be triggered at most once. As if run_one_delayed_ref() failed, we trigger the error message, then causing the call chain to error out: btrfs_run_delayed_refs() `- btrfs_run_delayed_refs() `- btrfs_run_delayed_refs_for_head() `- run_one_delayed_ref() And we will abort the current transaction in btrfs_run_delayed_refs(). If we have to run delayed refs for the abort transaction, run_one_delayed_ref() will just cleanup the refs and do nothing, thus no new error messages would be output. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: handle case when repair happens with dev-replaceQu Wenruo
[BUG] There is a bug report that a BUG_ON() in btrfs_repair_io_failure() (originally repair_io_failure() in v6.0 kernel) got triggered when replacing a unreliable disk: BTRFS warning (device sda1): csum failed root 257 ino 2397453 off 39624704 csum 0xb0d18c75 expected csum 0x4dae9c5e mirror 3 kernel BUG at fs/btrfs/extent_io.c:2380! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 3614331 Comm: kworker/u257:2 Tainted: G OE 6.0.0-5-amd64 #1 Debian 6.0.10-2 Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40 PRO WIFI (MS-7C60), BIOS 2.70 07/01/2021 Workqueue: btrfs-endio btrfs_end_bio_work [btrfs] RIP: 0010:repair_io_failure+0x24a/0x260 [btrfs] Call Trace: <TASK> clean_io_failure+0x14d/0x180 [btrfs] end_bio_extent_readpage+0x412/0x6e0 [btrfs] ? __switch_to+0x106/0x420 process_one_work+0x1c7/0x380 worker_thread+0x4d/0x380 ? rescuer_thread+0x3a0/0x3a0 kthread+0xe9/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 [CAUSE] Before the BUG_ON(), we got some read errors from the replace target first, note the mirror number (3, which is beyond RAID1 duplication, thus it's read from the replace target device). Then at the BUG_ON() location, we are trying to writeback the repaired sectors back the failed device. The check looks like this: ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, &map_length, &bioc, mirror_num); if (ret) goto out_counter_dec; BUG_ON(mirror_num != bioc->mirror_num); But inside btrfs_map_block(), we can modify bioc->mirror_num especially for dev-replace: if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 && !need_full_stripe(op) && dev_replace->tgtdev != NULL) { ret = get_extra_mirror_from_replace(fs_info, logical, *length, dev_replace->srcdev->devid, &mirror_num, &physical_to_patch_in_first_stripe); patch_the_first_stripe_for_dev_replace = 1; } Thus if we're repairing the replace target device, we're going to trigger that BUG_ON(). But in reality, the read failure from the replace target device may be that, our replace hasn't reached the range we're reading, thus we're reading garbage, but with replace running, the range would be properly filled later. Thus in that case, we don't need to do anything but let the replace routine to handle it. [FIX] Instead of a BUG_ON(), just skip the repair if we're repairing the device replace target device. Reported-by: 小太 <nospam@kota.moe> Link: https://lore.kernel.org/linux-btrfs/CACsxjPYyJGQZ+yvjzxA1Nn2LuqkYqTCcUH43S=+wXhyf8S00Ag@mail.gmail.com/ CC: stable@vger.kernel.org # 6.0+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: fix off-by-one in delalloc search during lseekFilipe Manana
During lseek, when searching for delalloc in a range that represents a hole and that range has a length of 1 byte, we end up not doing the actual delalloc search in the inode's io tree, resulting in not correctly reporting the offset with data or a hole. This actually only happens when the start offset is 0 because with any other start offset we round it down by sector size. Reproducer: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -c "pwrite -q 0 1" /mnt/sdc/foo $ xfs_io -c "seek -d 0" /mnt/sdc/foo Whence Result DATA EOF It should have reported an offset of 0 instead of EOF. Fix this by updating btrfs_find_delalloc_in_range() and count_range_bits() to deal with inclusive ranges properly. These functions are already supposed to work with inclusive end offsets, they just got it wrong in a couple places due to off-by-one mistakes. A test case for fstests will be added later. Reported-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Link: https://lore.kernel.org/linux-btrfs/20221223020509.457113-1-joanbrugueram@gmail.com/ Fixes: b6e833567ea1 ("btrfs: make hole and data seeking a lot more efficient") CC: stable@vger.kernel.org # 6.1 Tested-by: Joan Bruguera Micó <joanbrugueram@gmail.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: fix false alert on bad tree level checkQu Wenruo
[BUG] There is a bug report that on a RAID0 NVMe btrfs system, under heavy write load the filesystem can flip RO randomly. With extra debugging, it shows some tree blocks failed to pass their level checks, and if that happens at critical path of a transaction, we abort the transaction: BTRFS error (device nvme0n1p3): level verify failed on logical 5446121209856 mirror 1 wanted 0 found 1 BTRFS error (device nvme0n1p3: state A): Transaction aborted (error -5) BTRFS: error (device nvme0n1p3: state A) in btrfs_finish_ordered_io:3343: errno=-5 IO failure BTRFS info (device nvme0n1p3: state EA): forced readonly [CAUSE] The reporter has already bisected to commit 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()"). And with extra debugging, it shows we can have btrfs_tree_parent_check filled with all zeros in the following call trace: submit_one_bio+0xd4/0xe0 submit_extent_page+0x142/0x550 read_extent_buffer_pages+0x584/0x9c0 ? __pfx_end_bio_extent_readpage+0x10/0x10 ? folio_unlock+0x1d/0x50 btrfs_read_extent_buffer+0x98/0x150 read_tree_block+0x43/0xa0 read_block_for_search+0x266/0x370 btrfs_search_slot+0x351/0xd30 ? lock_is_held_type+0xe8/0x140 btrfs_lookup_csum+0x63/0x150 btrfs_csum_file_blocks+0x197/0x6c0 ? sched_clock_cpu+0x9f/0xc0 ? lock_release+0x14b/0x440 ? _raw_read_unlock+0x29/0x50 btrfs_finish_ordered_io+0x441/0x860 btrfs_work_helper+0xfe/0x400 ? lock_is_held_type+0xe8/0x140 process_one_work+0x294/0x5b0 worker_thread+0x4f/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 Currently we only copy the btrfs_tree_parent_check structure into bbio at read_extent_buffer_pages() after we have assembled the bbio. But as shown above, submit_extent_page() itself can already submit the bbio, leaving the bbio->parent_check uninitialized, and cause the false alert. [FIX] Instead of copying @check into bbio after bbio is assembled, we pass @check in btrfs_bio_ctrl::parent_check, and copy the content of parent_check in submit_one_bio() for metadata read. By this we should be able to pass the needed info for metadata endio verification, and fix the false alert. Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/ Fixes: 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()") Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: add error message for metadata level mismatchQu Wenruo
From a recent regression report, we found that after commit 947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()") if we have a level mismatch (false alert though), there is no error message at all. This makes later debugging harder. This patch will add the proper error message for such case. Link: https://lore.kernel.org/linux-btrfs/CABXGCsNzVxo4iq-tJSGm_kO1UggHXgq6CdcHDL=z5FL4njYXSQ@mail.gmail.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: fix ASSERT em->len condition in btrfs_get_extentTanmay Bhushan
The em->len value is supposed to be verified in the assertion condition though we expect it to be same as the sectorsize. Fixes: a196a8944f77 ("btrfs: do not reset extent map members for inline extents read") Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Tanmay Bhushan <007047221b@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03perf test record_probe_libc_inet_pton: Fix failure due to extra inet_pton() ↵Arnaldo Carvalho de Melo
backtrace in glibc >= 2.35 Starting with glibc 2.35 there are extra inet_pton() calls when doing a IPv6 ping as in one of the 'perf test' entry, which makes it fail: # perf test inet_pton 89: probe libc's inet_pton & backtrace it with ping : FAILED! # If we look at what this script is expecting (commenting out the removal of the temporary files in it): # cat /tmp/expected.aT6 ping[][0-9 \.:]+probe_libc:inet_pton: \([[:xdigit:]]+\) .*inet_pton\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/libc.so.6|inlined\)$ getaddrinfo\+0x[[:xdigit:]]+[[:space:]]\(/usr/lib64/libc.so.6\)$ .*(\+0x[[:xdigit:]]+|\[unknown\])[[:space:]]\(.*/bin/ping.*\)$ # And looking at what we are getting out of 'perf script', to match with the above: # cat /tmp/perf.script.IUC ping 623883 [006] 265438.471610: probe_libc:inet_pton: (7f32bcf314c0) 1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) 29510 __libc_start_call_main+0x80 (/usr/lib64/libc.so.6) ping 623883 [006] 265438.471664: probe_libc:inet_pton: (7f32bcf314c0) 1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) fa6c6 getaddrinfo+0x126 (/usr/lib64/libc.so.6) 491e [unknown] (/usr/bin/ping) # We see that its just the first call to inet_pton() that didn't came thru getaddrinfo(), so if we ignore the first the script matches what it expects, testing that using 'perf probe' + 'perf record' + 'perf script' with callchains on userspace targets is producing the expected results. Since we don't have a 'perf script --skip' to help us here, use tac + grep to do that, resulting in a one liner that makes this script work on both older glibc versions as well as with 2.35. With it, on fedora 36, x86, glibc 2.35: # perf test inet_pton 90: probe libc's inet_pton & backtrace it with ping : Ok # perf test -v inet_pton 90: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 627197 ping 627220 1 267956.962402: probe_libc:inet_pton_1: (7f488bf314c0) 1314c0 __GI___inet_pton+0x0 (/usr/lib64/libc.so.6) fa6c6 getaddrinfo+0x126 (/usr/lib64/libc.so.6) 491e n (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok # And on Ubuntu 22.04.1 LTS on a Libre Computer ROC-RK3399-PC arm64 system: Before this patch it works (see that the script used has no 'tac' to remove the first event): root@roc-rk3399-pc:~# dpkg -l | grep libc-bin ii libc-bin 2.35-0ubuntu3.1 arm64 GNU C Library: Binaries root@roc-rk3399-pc:~# grep -w tac ~acme/libexec/perf-core/tests/shell/record+probe_libc_inet_pton.sh root@roc-rk3399-pc:~# perf test inet_pton 86: probe libc's inet_pton & backtrace it with ping : Ok root@roc-rk3399-pc:~# perf test -v inet_pton 86: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 1375 ping 1399 [000] 4114.417450: probe_libc:inet_pton: (ffffb3e26120) 106120 inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc.so.6) d18bc getaddrinfo+0xec (/usr/lib/aarch64-linux-gnu/libc.so.6) 2b68 [unknown] (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok root@roc-rk3399-pc:~# And after it continues to work: root@roc-rk3399-pc:~# grep -w tac ~acme/libexec/perf-core/tests/shell/record+probe_libc_inet_pton.sh perf script -i $perf_data | tac | grep -m1 ^ping -B9 | tac > $perf_script root@roc-rk3399-pc:~# perf test inet_pton 86: probe libc's inet_pton & backtrace it with ping : Ok root@roc-rk3399-pc:~# perf test -v inet_pton 86: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 6995 ping 7019 [005] 4832.160741: probe_libc:inet_pton: (ffffa62e6120) 106120 inet_pton+0x0 (/usr/lib/aarch64-linux-gnu/libc.so.6) d18bc getaddrinfo+0xec (/usr/lib/aarch64-linux-gnu/libc.so.6) 2b68 [unknown] (/usr/bin/ping) test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok root@roc-rk3399-pc:~# Reported-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/Y7QyPkPlDYip3cZH@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-03drm/scheduler: Fix lockup in drm_sched_entity_kill()Dmitry Osipenko
The drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy() while userspace process is exiting or being killed. First time it's invoked when sched entity is flushed and second time when entity is released. This causes a lockup within wait_for_completion(entity_idle) due to how completion API works. Calling wait_for_completion() more times than complete() was invoked is a error condition that causes lockup because completion internally uses counter for complete/wait calls. The complete_all() must be used instead in such cases. This patch fixes lockup of Panfrost driver that is reproducible by killing any application in a middle of 3d drawing operation. Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://patchwork.freedesktop.org/patch/msgid/20221123001303.533968-1-dmitry.osipenko@collabora.com
2023-01-03usb: rndis_host: Secure rndis_query check against int overflowSzymon Heidrich
Variables off and len typed as uint32 in rndis_query function are controlled by incoming RNDIS response message thus their value may be manipulated. Setting off to a unexpectetly large value will cause the sum with len and 8 to overflow and pass the implemented validation step. Consequently the response pointer will be referring to a location past the expected buffer boundaries allowing information leakage e.g. via RNDIS_OID_802_3_PERMANENT_ADDRESS OID. Fixes: ddda08624013 ("USB: rndis_host, various cleanups") Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-03net: dpaa: Fix dtsec check for PCS availabilitySean Anderson
We want to fail if the PCS is not available, not if it is available. Fix this condition. Fixes: 5d93cfcf7360 ("net: dpaa: Convert to phylink") Reported-by: Christian Zigotzky <info@xenosoft.de> Signed-off-by: Sean Anderson <seanga2@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-03octeontx2-pf: Fix lmtst ID used in aura freeGeetha sowjanya
Current code uses per_cpu pointer to get the lmtst_id mapped to the core on which aura_free() is executed. Using per_cpu pointer without preemption disable causing mismatch between lmtst_id and core on which pointer gets freed. This patch fixes the issue by disabling preemption around aura_free. Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-03drivers/net/bonding/bond_3ad: return when there's no aggregatorDaniil Tatianin
Otherwise we would dereference a NULL aggregator pointer when calling __set_agg_ports_ready on the line below. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm(). Reported by static analysis tooling, patch from Florian Westphal. 2) Missing set type checks in nf_tables: Validate that set declaration matches the an existing set type, otherwise bail out with EEXIST. Currently, nf_tables silently accepts the re-declaration with a different type but it bails out later with EINVAL when the user adds entries to the set. This fix is relatively large because it requires two preparation patches that are included in this batch. 3) Do not ignore updates of timeout and gc_interval parameters in existing sets. 4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of ipset. Except hash:net,port,net and hash:net,iface, the set types don't support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs a special handling in hash:net,port,net which was missing (hash:net,iface was not affected by this bug), from Jozsef Kadlecsik. 5) When adding/deleting large number of elements in one step in ipset, it can take a reasonable amount of time and can result in soft lockup errors. This patch is a complete rework of the previous version in order to use a smaller internal batch limit and at the same time removing the external hard limit to add arbitrary number of elements in one step. Also from Jozsef Kadlecsik. Except for patch #1, which fixes a bug introduced in the previous net-next development cycle, anything else has been broken for several releases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-03Merge drm/drm-fixes into drm-misc-fixesMaxime Ripard
Let's start the fixes cycle. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2023-01-02io_uring/io-wq: free worker if task_work creation is canceledJens Axboe
If we cancel the task_work, the worker will never come into existance. As this is the last reference to it, ensure that we get it freed appropriately. Cc: stable@vger.kernel.org Reported-by: 진호 <wnwlsgh98@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-02Merge tag 'for-6.2-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "First batch of regression and regular fixes: - regressions: - fix error handling after conversion to qstr for paths - fix raid56/scrub recovery caused by uninitialized variable after conversion to error bitmaps - restore qgroup backref lookup behaviour after recent refactoring - fix leak of device lists at module exit time - fix resolving backrefs for inline extent followed by prealloc - reset defrag ioctl buffer on memory allocation error" * tag 'for-6.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix fscrypt name leak after failure to join log transaction btrfs: scrub: fix uninitialized return value in recover_scrub_rbio btrfs: fix resolving backrefs for inline extent followed by prealloc btrfs: fix trace event name typo for FLUSH_DELAYED_REFS btrfs: restore BTRFS_SEQ_LAST when looking up qgroup backref lookup btrfs: fix leak of fs devices after removing btrfs module btrfs: fix an error handling path in btrfs_defrag_leaves() btrfs: fix an error handling path in btrfs_rename()
2023-01-02fs/ntfs3: don't hold ni_lock when calling truncate_setsize()Tetsuo Handa
syzbot is reporting hung task at do_user_addr_fault() [1], for there is a silent deadlock between PG_locked bit and ni_lock lock. Since filemap_update_page() calls filemap_read_folio() after calling folio_trylock() which will set PG_locked bit, ntfs_truncate() must not call truncate_setsize() which will wait for PG_locked bit to be cleared when holding ni_lock lock. Link: https://lore.kernel.org/all/00000000000060d41f05f139aa44@google.com/ Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae [1] Reported-by: syzbot <syzbot+bed15dbf10294aa4f2ae@syzkaller.appspotmail.com> Debugged-by: Linus Torvalds <torvalds@linux-foundation.org> Co-developed-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-02x86/kexec: Fix double-free of elf header bufferTakashi Iwai
After b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer"), freeing image->elf_headers in the error path of crash_load_segments() is not needed because kimage_file_post_load_cleanup() will take care of that later. And not clearing it could result in a double-free. Drop the superfluous vfree() call at the error path of crash_load_segments(). Fixes: b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20221122115122.13937-1-tiwai@suse.de
2023-01-02kcsan: test: don't put the expect array on the stackMax Filippov
Size of the 'expect' array in the __report_matches is 1536 bytes, which is exactly the default frame size warning limit of the xtensa architecture. As a result allmodconfig xtensa kernel builds with the gcc that does not support the compiler plugins (which otherwise would push the said warning limit to 2K) fail with the following message: kernel/kcsan/kcsan_test.c:257:1: error: the frame size of 1680 bytes is larger than 1536 bytes Fix it by dynamically allocating the 'expect' array. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com>
2023-01-02perf tools: Fix segfault when trying to process tracepoints in perf.data and ↵Arnaldo Carvalho de Melo
not linked with libtraceevent When we have a perf.data file with tracepoints, such as: # perf evlist -f probe_perf:lzma_decompress_to_file # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events # We end up segfaulting when using perf built with NO_LIBTRACEEVENT=1 by trying to find an evsel with a NULL 'event_name' variable: (gdb) run report --stdio -f Starting program: /root/bin/perf report --stdio -f Program received signal SIGSEGV, Segmentation fault. 0x000000000055219d in find_evsel (evlist=0xfda7b0, event_name=0x0) at util/sort.c:2830 warning: Source file is more recent than executable. 2830 if (event_name[0] == '%') { Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-11.fc36.x86_64 cyrus-sasl-lib-2.1.27-18.fc36.x86_64 elfutils-debuginfod-client-0.188-3.fc36.x86_64 elfutils-libelf-0.188-3.fc36.x86_64 elfutils-libs-0.188-3.fc36.x86_64 glibc-2.35-20.fc36.x86_64 keyutils-libs-1.6.1-4.fc36.x86_64 krb5-libs-1.19.2-12.fc36.x86_64 libbrotli-1.0.9-7.fc36.x86_64 libcap-2.48-4.fc36.x86_64 libcom_err-1.46.5-2.fc36.x86_64 libcurl-7.82.0-12.fc36.x86_64 libevent-2.1.12-6.fc36.x86_64 libgcc-12.2.1-4.fc36.x86_64 libidn2-2.3.4-1.fc36.x86_64 libnghttp2-1.51.0-1.fc36.x86_64 libpsl-0.21.1-5.fc36.x86_64 libselinux-3.3-4.fc36.x86_64 libssh-0.9.6-4.fc36.x86_64 libstdc++-12.2.1-4.fc36.x86_64 libunistring-1.0-1.fc36.x86_64 libunwind-1.6.2-2.fc36.x86_64 libxcrypt-4.4.33-4.fc36.x86_64 libzstd-1.5.2-2.fc36.x86_64 numactl-libs-2.0.14-5.fc36.x86_64 opencsd-1.2.0-1.fc36.x86_64 openldap-2.6.3-1.fc36.x86_64 openssl-libs-3.0.5-2.fc36.x86_64 slang-2.3.2-11.fc36.x86_64 xz-libs-5.2.5-9.fc36.x86_64 zlib-1.2.11-33.fc36.x86_64 (gdb) bt #0 0x000000000055219d in find_evsel (evlist=0xfda7b0, event_name=0x0) at util/sort.c:2830 #1 0x0000000000552416 in add_dynamic_entry (evlist=0xfda7b0, tok=0xffb6eb "trace", level=2) at util/sort.c:2976 #2 0x0000000000552d26 in sort_dimension__add (list=0xf93e00 <perf_hpp_list>, tok=0xffb6eb "trace", evlist=0xfda7b0, level=2) at util/sort.c:3193 #3 0x0000000000552e1c in setup_sort_list (list=0xf93e00 <perf_hpp_list>, str=0xffb6eb "trace", evlist=0xfda7b0) at util/sort.c:3227 #4 0x00000000005532fa in __setup_sorting (evlist=0xfda7b0) at util/sort.c:3381 #5 0x0000000000553cdc in setup_sorting (evlist=0xfda7b0) at util/sort.c:3608 #6 0x000000000042eb9f in cmd_report (argc=0, argv=0x7fffffffe470) at builtin-report.c:1596 #7 0x00000000004aee7e in run_builtin (p=0xf64ca0 <commands+288>, argc=3, argv=0x7fffffffe470) at perf.c:330 #8 0x00000000004af0f2 in handle_internal_command (argc=3, argv=0x7fffffffe470) at perf.c:384 #9 0x00000000004af241 in run_argv (argcp=0x7fffffffe29c, argv=0x7fffffffe290) at perf.c:428 #10 0x00000000004af5fc in main (argc=3, argv=0x7fffffffe470) at perf.c:562 (gdb) So check if we have tracepoint events in add_dynamic_entry() and bail out instead: # perf report --stdio -f This perf binary isn't linked with libtraceevent, can't process probe_perf:lzma_decompress_to_file Error: Unknown --sort key: `trace' # Fixes: 378ef0f5d9d7f465 ("perf build: Use libtraceevent from the system") Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/Y7MDb7kRaHZB6APC@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-02nfsd: fix handling of readdir in v4root vs. mount upcall timeoutJeff Layton
If v4 READDIR operation hits a mountpoint and gets back an error, then it will include that entry in the reply and set RDATTR_ERROR for it to the error. That's fine for "normal" exported filesystems, but on the v4root, we need to be more careful to only expose the existence of dentries that lead to exports. If the mountd upcall times out while checking to see whether a mountpoint on the v4root is exported, then we have no recourse other than to fail the whole operation. Cc: Steve Dickson <steved@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777 Reported-by: JianHong Yin <yin-jianhong@163.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org>