summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-07-18Merge tag 'linux-can-fixes-for-6.5-20230717' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-07-17 The 1st patch is by Ziyang Xuan and fixes a possible memory leak in the receiver handling in the CAN RAW protocol. YueHaibing contributes a use after free in bcm_proc_show() of the Broad Cast Manager (BCM) CAN protocol. The next 2 patches are by me and fix a possible null pointer dereference in the RX path of the gs_usb driver with activated hardware timestamps and the candlelight firmware. The last patch is by Fedor Ross, Marek Vasut and me and targets the mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode() is increased to fix bus joining on busy CAN buses and very low bit rate. * tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout can: gs_usb: fix time stamp counter initialization can: gs_usb: gs_can_open(): improve error handling can: bcm: Fix UAF in bcm_proc_show() can: raw: fix receiver memory leak ==================== Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18mailmap: Add entry for old intel emailJohn Fastabend
Fix old email to avoid bouncing email from net/drivers and older netdev work. Anyways my @intel email hasn't been active for years. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230717173306.38407-1-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18mailmap: add entries for past livesShannon Nelson
Update old emails for my current work email. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20230717193242.43670-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18Merge branch 'selftests-tc-increase-timeout-and-add-missing-kconfig'Jakub Kicinski
Matthieu Baerts says: ==================== selftests: tc: increase timeout and add missing kconfig When looking for something else in LKFT reports [1], I noticed that the TC selftest ended with a timeout error: not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds I also noticed most of the tests were skipped because the "teardown stage" did not complete successfully. It was due to missing kconfig. These patches fix these two errors plus an extra one because this selftest reads info from "/proc/net/nf_conntrack". Thank you Pedro for having helped me fixing these issues [2]. Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] ==================== Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-0-1eb4fd3a96e7@tessares.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18selftests: tc: add ConnTrack procfs kconfigMatthieu Baerts
When looking at the TC selftest reports, I noticed one test was failing because /proc/net/nf_conntrack was not available. not ok 373 3992 - Add ct action triggering DNAT tuple conflict Could not match regex pattern. Verify command output: cat: /proc/net/nf_conntrack: No such file or directory It is only available if NF_CONNTRACK_PROCFS kconfig is set. So the issue can be fixed simply by adding it to the list of required kconfig. Fixes: e46905641316 ("tc-testing: add test for ct DNAT tuple collision") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [1] Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-3-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18selftests: tc: add 'ct' action kconfig depMatthieu Baerts
When looking for something else in LKFT reports [1], I noticed most of the tests were skipped because the "teardown stage" did not complete successfully. Pedro found out this is due to the fact CONFIG_NF_FLOW_TABLE is required but not listed in the 'config' file. Adding it to the list fixes the issues on LKFT side. CONFIG_NET_ACT_CT is now set to 'm' in the final kconfig. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] Suggested-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-2-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18selftests: tc: set timeout to 15 minutesMatthieu Baerts
When looking for something else in LKFT reports [1], I noticed that the TC selftest ended with a timeout error: not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds The timeout had been introduced 3 years ago, see the Fixes commit below. This timeout is only in place when executing the selftests via the kselftests runner scripts. I guess this is not what most TC devs are using and nobody noticed the issue before. The new timeout is set to 15 minutes as suggested by Pedro [2]. It looks like it is plenty more time than what it takes in "normal" conditions. Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] Suggested-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-1-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-18bpf, arm64: Fix BTI type used for freplace attached functionsAlexander Duyck
When running an freplace attached bpf program on an arm64 system w were seeing the following issue: Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI After a bit of work to track it down I determined that what appeared to be happening is that the 'bti c' at the start of the program was somehow being reached after a 'br' instruction. Further digging pointed me toward the fact that the function was attached via freplace. This in turn led me to build_plt which I believe is invoking the long jump which is triggering this error. To resolve it we can replace the 'bti c' with 'bti jc' and add a comment explaining why this has to be modified as such. Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18Merge branch 'two-more-fixes-for-check_max_stack_depth'Alexei Starovoitov
Kumar Kartikeya Dwivedi says: ==================== Two more fixes for check_max_stack_depth I noticed two more bugs while reviewing the code, description and examples available in the patches. One leads to incorrect subprog index to be stored in the frame stack maintained by the function (leading to incorrect tail_call_reachable marks, among other things). The other problem is missing exploration pass of other async callbacks when they are not called from the main prog. Call chains rooted at them can thus bypass the stack limits (32 call frames * max permitted stack depth per function). Changelog: ---------- v1 -> v2 v1: https://lore.kernel.org/bpf/20230713003118.1327943-1-memxor@gmail.com * Fix commit message for patch 2 (Alexei) ==================== Link: https://lore.kernel.org/r/20230717161530.1238-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18selftests/bpf: Add more tests for check_max_stack_depth bugKumar Kartikeya Dwivedi
Another test which now exercies the path of the verifier where it will explore call chains rooted at the async callback. Without the prior fixes, this program loads successfully, which is incorrect. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230717161530.1238-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18bpf: Repeat check_max_stack_depth for async callbacksKumar Kartikeya Dwivedi
While the check_max_stack_depth function explores call chains emanating from the main prog, which is typically enough to cover all possible call chains, it doesn't explore those rooted at async callbacks unless the async callback will have been directly called, since unlike non-async callbacks it skips their instruction exploration as they don't contribute to stack depth. It could be the case that the async callback leads to a callchain which exceeds the stack depth, but this is never reachable while only exploring the entry point from main subprog. Hence, repeat the check for the main subprog *and* all async callbacks marked by the symbolic execution pass of the verifier, as execution of the program may begin at any of them. Consider functions with following stack depths: main: 256 async: 256 foo: 256 main: rX = async bpf_timer_set_callback(...) async: foo() Here, async is not descended as it does not contribute to stack depth of main (since it is referenced using bpf_pseudo_func and not bpf_pseudo_call). However, when async is invoked asynchronously, it will end up breaching the MAX_BPF_STACK limit by calling foo. Hence, in addition to main, we also need to explore call chains beginning at all async callback subprogs in a program. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18bpf: Fix subprog idx logic in check_max_stack_depthKumar Kartikeya Dwivedi
The assignment to idx in check_max_stack_depth happens once we see a bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of the code performs a few checks and then pushes the frame to the frame stack, except the case of async callbacks. If the async callback case causes the loop iteration to be skipped, the idx assignment will be incorrect on the next iteration of the loop. The value stored in the frame stack (as the subprogno of the current subprog) will be incorrect. This leads to incorrect checks and incorrect tail_call_reachable marking. Save the target subprog in a new variable and only assign to idx once we are done with the is_async_cb check which may skip pushing of frame to the frame stack and subsequent stack depth checks and tail call markings. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18Merge tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Don't group events when computing metrics that require more than the maximum number of simultaneously enabled events on AMD systems. - Fix multi CU handling in 'perf probe', add a 'perf test' entry to regress test it. - Make the 'perf test task_exit' stop generating samples by using the 'dummy' event, all it is testing is if a PERF_RECORD_EXIT is generated at the end of a perf session. This makes this perf test to stop sometimes failing on some systems due to a full ring buffer. - Avoid SEGV if PMU lookup fails for legacy cache terms. - Fix libsubcmd SEGV/use-after-free when commands aren't excluded. - Fix OpenCSD (ARM64's CoreSight hardware tracing) library path resolution when specifying CSLIBS= in the make command line. - Fix broken feature check for libtracefs due to external lib changes, use the provided pkgconfig file instead future proof it. - Sync drm, fcntl, kvm, mount, prctl, socket, vhost, asound, arm64's cputype headers with the kernel sources, in some cases this made the tools become aware of new kernel APIs such as ioctls and the cachestat sysctl. * tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf test task_exit: No need for a cycles event to check if we get an PERF_RECORD_EXIT tools headers arm64: Sync arm64's cputype.h with the kernel sources tools include UAPI: Sync the sound/asound.h copy with the kernel sources tools include UAPI: Sync linux/vhost.h with the kernel sources perf beauty: Update copy of linux/socket.h with the kernel sources perf parse-events: Avoid SEGV if PMU lookup fails for legacy cache terms libsubcmd: Avoid SEGV/use-after-free when commands aren't excluded tools headers UAPI: Sync linux/prctl.h with the kernel sources perf build: Fix broken feature check for libtracefs due to external lib changes tools include UAPI: Sync linux/mount.h copy with the kernel sources tools headers UAPI: Sync linux/kvm.h with the kernel sources tools headers uapi: Sync linux/fcntl.h with the kernel sources perf vendor events amd: Fix large metrics perf build: Fix library not found error when using CSLIBS tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources tools headers UAPI: Sync drm/i915_drm.h with the kernel sources perf probe: Read DWARF files from the correct CU perf probe: Add test for regression introduced by switch to die_get_decl_file()
2023-07-18Merge tag 'mm-hotfixes-stable-2023-07-18-12-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Seven hotfixes, six of which are cc:stable and one of which addresses a post-6.5 issue" * tag 'mm-hotfixes-stable-2023-07-18-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: maple_tree: fix node allocation testing on 32 bit maple_tree: fix 32 bit mas_next testing selftests/mm: mkdirty: fix incorrect position of #endif maple_tree: set the node limit when creating a new root node mm/mlock: fix vma iterator conversion of apply_vma_lock_flags() prctl: move PR_GET_AUXV out of PR_MCE_KILL selftests/mm: give scripts execute permission
2023-07-18io_uring: don't audit the capability check in io_uring_create()Ondrej Mosnacek
The check being unconditional may lead to unwanted denials reported by LSMs when a process has the capability granted by DAC, but denied by an LSM. In the case of SELinux such denials are a problem, since they can't be effectively filtered out via the policy and when not silenced, they produce noise that may hide a true problem or an attack. Since not having the capability merely means that the created io_uring context will be accounted against the current user's RLIMIT_MEMLOCK limit, we can disable auditing of denials for this check by using ns_capable_noaudit() instead of capable(). Fixes: 2b188cc1bb85 ("Add io_uring IO interface") Link: https://bugzilla.redhat.com/show_bug.cgi?id=2193317 Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/20230718115607.65652-1-omosnace@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-18drm/amdgpu: use a macro to define no xcp partition caseGuchun Chen
~0 as no xcp partition is used in several places, so improve its definition by a macro for code consistency. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu/vm: use the same xcp_id from root PDGuchun Chen
Other PDs/PTs allocation should just use the same xcp_id as that stored in root PD. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_createGuchun Chen
Recent code set xcp_id stored from file private data when opening device to amdgpu bo for accounting memory usage etc, but not all VMs are attached to this fpriv structure like the vm cases in amdgpu_mes_self_test, otherwise, KASAN will complain below out of bound access. And more importantly, VM code should not touch fpriv structure, so drop fpriv code handling from amdgpu_vm_pt. [ 77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069 [ 77.294146] Call Trace: [ 77.294178] <TASK> [ 77.294208] dump_stack_lvl+0x49/0x63 [ 77.294260] print_report+0x16f/0x4a6 [ 77.294307] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.295979] ? kasan_complete_mode_report_info+0x3c/0x200 [ 77.296057] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.297556] kasan_report+0xb4/0x130 [ 77.297609] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.299202] __asan_load4+0x6f/0x90 [ 77.299272] amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.300796] ? amdgpu_init+0x6e/0x1000 [amdgpu] [ 77.302222] ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu] [ 77.303721] ? preempt_count_sub+0x18/0xc0 [ 77.303786] amdgpu_vm_init+0x39e/0x870 [amdgpu] [ 77.305186] ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu] [ 77.306683] ? kasan_set_track+0x25/0x30 [ 77.306737] ? kasan_save_alloc_info+0x1b/0x30 [ 77.306795] ? __kasan_kmalloc+0x87/0xa0 [ 77.306852] amdgpu_mes_self_test+0x169/0x620 [amdgpu] v2: without specifying xcp partition for PD/PT bo, the xcp id is -1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686 Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: Allocate root PD on correct partitionGuchun Chen
file_priv needs to be setup firstly, otherwise, root PD will always be allocated on partition 0, even if opening the device from other partitions. Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: Keep PHY active for DP displays on DCN31Nicholas Kazlauskas
[Why & How] Port of a change that went into DCN314 to keep the PHY enabled when we have a connected and active DP display. The PHY can hang if PHY refclk is disabled inadvertently. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Josip Pavic <josip.pavic@amd.com> Acked-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: Prevent vtotal from being set to 0Daniel Miess
[Why] In dcn314 DML the destination pipe vtotal was being set to the crtc adjustment vtotal_min value even in cases where that value is 0. [How] Only set vtotal to the crtc adjustment vtotal_min value in cases where the value is non-zero. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Daniel Miess <daniel.miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: Disable MPC split by default on special asicZhikai Zhai
[WHY] All of pipes will be used when the MPC split enable on the dcn which just has 2 pipes. Then MPO enter will trigger the minimal transition which need programe dcn from 2 pipes MPC split to 2 pipes MPO. This action will cause lag if happen frequently. [HOW] Disable the MPC split for the platform which dcn resource is limited Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Zhikai Zhai <zhikai.zhai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: check TG is non-null before checking if enabledTaimur Hassan
[Why & How] If there is no TG allocation we can dereference a NULL pointer when checking if the TG is enabled. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Taimur Hassan <syed.hassan@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: Add polling method to handle MST reply packetWayne Lin
[Why] Specific TBT4 dock doesn't send out short HPD to notify source that IRQ event DOWN_REP_MSG_RDY is set. Which violates the spec and cause source can't send out streams to mst sinks. [How] To cover this misbehavior, add an additional polling method to detect DOWN_REP_MSG_RDY is set. HPD driven handling method is still kept. Just hook up our handler to drm mgr->cbs->poll_hpd_irq(). Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Acked-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: Clean up errors & warnings in amdgpu_dm.cSrinivasan Shanmugam
Fix the following errors & warnings reported by checkpatch: ERROR: space required before the open brace '{' ERROR: space required before the open parenthesis '(' ERROR: that open brace { should be on the previous line ERROR: space prohibited before that ',' (ctx:WxW) ERROR: else should follow close brace '}' ERROR: open brace '{' following function definitions go on the next line ERROR: code indent should use tabs where possible WARNING: braces {} are not necessary for single statement blocks WARNING: void function return statements are not generally useful WARNING: Block comments use * on subsequent lines WARNING: Block comments use a trailing */ on a separate line Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: Allow the initramfs generator to include psp_13_0_6_taCandice Li
Allow the initramfs generator to automatically include psp_13_0_6_ta firmware to initramfs. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu/pm: make mclk consistent for smu 13.0.7Alex Deucher
Use current uclk to be consistent with other dGPUs. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-07-18drm/amdgpu/pm: make gfxclock consistent for sienna cichlidAlex Deucher
Use average gfxclock for consistency with other dGPUs. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-07-18drm/amd/display: only accept async flips for fast updatesSimon Ser
Up until now, amdgpu was silently degrading to vsync when user-space requested an async flip but the hardware didn't support it. The hardware doesn't support immediate flips when the update changes the FB pitch, the DCC state, the rotation, enables or disables CRTCs or planes, etc. This is reflected in the dm_crtc_state.update_type field: UPDATE_TYPE_FAST means that immediate flip is supported. Silently degrading async flips to vsync is not the expected behavior from a uAPI point-of-view. Xorg expects async flips to fail if unsupported, to be able to fall back to a blit. i915 already behaves this way. This patch aligns amdgpu with uAPI expectations and returns a failure when an async flip is not possible. Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-07-18drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancelGuchun Chen
In below thousands of screen rotation loop tests with virtual display enabled, a CPU hard lockup issue may happen, leading system to unresponsive and crash. do { xrandr --output Virtual --rotate inverted xrandr --output Virtual --rotate right xrandr --output Virtual --rotate left xrandr --output Virtual --rotate normal } while (1); NMI watchdog: Watchdog detected hard LOCKUP on cpu 1 ? hrtimer_run_softirq+0x140/0x140 ? store_vblank+0xe0/0xe0 [drm] hrtimer_cancel+0x15/0x30 amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu] drm_vblank_disable_and_save+0x185/0x1f0 [drm] drm_crtc_vblank_off+0x159/0x4c0 [drm] ? record_print_text.cold+0x11/0x11 ? wait_for_completion_timeout+0x232/0x280 ? drm_crtc_wait_one_vblank+0x40/0x40 [drm] ? bit_wait_io_timeout+0xe0/0xe0 ? wait_for_completion_interruptible+0x1d7/0x320 ? mutex_unlock+0x81/0xd0 amdgpu_vkms_crtc_atomic_disable It's caused by a stuck in lock dependency in such scenario on different CPUs. CPU1 CPU2 drm_crtc_vblank_off hrtimer_interrupt grab event_lock (irq disabled) __hrtimer_run_queues grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate amdgpu_vkms_disable_vblank drm_handle_vblank hrtimer_cancel grab dev->event_lock So CPU1 stucks in hrtimer_cancel as timer callback is running endless on current clock base, as that timer queue on CPU2 has no chance to finish it because of failing to hold the lock. So NMI watchdog will throw the errors after its threshold, and all later CPUs are impacted/blocked. So use hrtimer_try_to_cancel to fix this, as disable_vblank callback does not need to wait the handler to finish. And also it's not necessary to check the return value of hrtimer_try_to_cancel, because even if it's -1 which means current timer callback is running, it will be reprogrammed in hrtimer_start with calling enable_vblank to make it works. v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes tag as well v3: drop warn printing (Christian) v4: drop superfluous check of blank->enabled in timer function, as it's guaranteed in drm_handle_vblank (Christian) Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)") Cc: stable@vger.kernel.org Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: add DCN301 specific logic for OTG programmingAurabindo Pillai
[Why&How] DCN301 does not have FAMS hence the workaround needed on other DCN3x variants related to OTG min/max selector programming is not applicable for it. Hence isolate it and have it use the old sequence without workaround. Fixes: 1598fc576420 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+") Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd/display: export some optc function for reuseAurabindo Pillai
[Why&How] Make a few functions non static so that they can be reused for other asic. This is in preparation for separating out OTG programming sequence for DCN301 Fixes: 1598fc576420 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+") Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amd: Use amdgpu_device_pcie_dynamic_switching_supported() for SMU7Mario Limonciello
SMU7 does a check if the dGPU is inserted into a Rocket Lake system, to turn off DPM. Extend this check to all systems that have problems with dynamic switching by using the amdgpu_device_pcie_dynamic_switching_supported() helper. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18Merge tag 'linux-kselftest-fixes-6.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Fixes to bugs that are interfering with arm64 and risc workflows. Also two fixes to timer and mincore tests that are causing test failures" * tag 'linux-kselftest-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/arm64: fix build failure during the "emit_tests" step selftests/riscv: fix potential build failure during the "emit_tests" step tools: timers: fix freq average calculation selftests/mincore: fix skip condition for check_huge_pages test
2023-07-18Merge tag 'tpmdd-v6.5-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen. Mostly interrupt storm fixes, with some other minor changes. * tag 'tpmdd-v6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs tpm/tpm_tis: Disable interrupts for Lenovo L590 devices tpm: Do not remap from ACPI resources again for Pluton TPM tpm/tpm_tis: Disable interrupts for Framework Laptop Intel 13th gen tpm/tpm_tis: Disable interrupts for Framework Laptop Intel 12th gen security: keys: Modify mismatched function name tpm: return false from tpm_amd_is_rng_defective on non-x86 platforms keys: Fix linking a duplicate key to a keyring's assoc_array tpm: tis_i2c: Limit write bursts to I2C_SMBUS_BLOCK_MAX (32) bytes tpm: tis_i2c: Limit read bursts to I2C_SMBUS_BLOCK_MAX (32) bytes tpm_tis_spi: Release chip select when flow control fails tpm: tpm_tis: Disable interrupts *only* for AEON UPX-i11 tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
2023-07-18ALSA: hda/realtek: Add quirk for Clevo NS70AUChristoffer Sandberg
Fixes headset detection on Clevo NS70AU. Co-developed-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Christoffer Sandberg <cs@tuxedo.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230718145722.10592-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-07-18s390/mm: fix per vma lock fault handlingSven Schnelle
With per-vma locks, handle_mm_fault() may return non-fatal error flags. In this case the code should reset the fault flags before returning. Fixes: e06f47a16573 ("s390/mm: try VMA lock-based page fault handling first") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-18octeontx2-pf: Dont allocate BPIDs for LBK interfacesGeetha sowjanya
Current driver enables backpressure for LBK interfaces. But these interfaces do not support this feature. Hence, this patch fixes the issue by skipping the backpressure configuration for these interfaces. Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool"). Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-18vrf: Fix lockdep splat in output pathIdo Schimmel
Cited commit converted the neighbour code to use the standard RCU variant instead of the RCU-bh variant, but the VRF code still uses rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup code in its IPv4 and IPv6 output paths, resulting in lockdep splats [1][2]. Can be reproduced using [3]. Fix by switching to rcu_read_lock() / rcu_read_unlock(). [1] ============================= WARNING: suspicious RCU usage 6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted ----------------------------- include/net/neighbour.h:302 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ping/183: #0: ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030 stack backtrace: CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xc1/0xf0 lockdep_rcu_suspicious+0x211/0x3b0 vrf_output+0x1380/0x2030 ip_push_pending_frames+0x125/0x2a0 raw_sendmsg+0x200d/0x33c0 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2aa/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [2] ============================= WARNING: suspicious RCU usage 6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted ----------------------------- include/net/neighbour.h:302 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ping6/182: #0: ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310 stack backtrace: CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xc1/0xf0 lockdep_rcu_suspicious+0x211/0x3b0 vrf_output6+0xd32/0x1310 ip6_local_out+0xb4/0x1a0 ip6_send_skb+0xbc/0x340 ip6_push_pending_frames+0xe5/0x110 rawv6_sendmsg+0x2e6e/0x3e50 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2aa/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [3] #!/bin/bash ip link add name vrf-red up numtxqueues 2 type vrf table 10 ip link add name swp1 up master vrf-red type dummy ip address add 192.0.2.1/24 dev swp1 ip address add 2001:db8:1::1/64 dev swp1 ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1 ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1 ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230715153605.4068066-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-18Merge branch '100GbE' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-14 (ice) This series contains updates to ice driver only. Petr Oros removes multiple calls made to unregister netdev and devlink_port. Michal fixes null pointer dereference that can occur during reload. ==================== Link: https://lore.kernel.org/r/20230714201041.1717834-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-07-18KVM: s390: pv: fix index value of replaced ASCEClaudio Imbrenda
The index field of the struct page corresponding to a guest ASCE should be 0. When replacing the ASCE in s390_replace_asce(), the index of the new ASCE should also be set to 0. Having the wrong index might lead to the wrong addresses being passed around when notifying pte invalidations, and eventually to validity intercepts (VM crash) if the prefix gets unmapped and the notifier gets called with the wrong address. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Fixes: faa2f72cb356 ("KVM: s390: pv: leak the topmost page table when destroy fails") Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20230705111937.33472-3-imbrenda@linux.ibm.com>
2023-07-18KVM: s390: pv: simplify shutdown and fix raceClaudio Imbrenda
Simplify the shutdown of non-protected VMs. There is no need to do complex manipulations of the counter if it was zero. This also fixes a very rare race which caused pages to be torn down from the address space with a non-zero counter even on older machines that don't support the UVC instruction, causing a crash. Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20230705111937.33472-2-imbrenda@linux.ibm.com>
2023-07-18btrfs: fix warning when putting transaction with qgroups enabled after abortFilipe Manana
If we have a transaction abort with qgroups enabled we get a warning triggered when doing the final put on the transaction, like this: [552.6789] ------------[ cut here ]------------ [552.6815] WARNING: CPU: 4 PID: 81745 at fs/btrfs/transaction.c:144 btrfs_put_transaction+0x123/0x130 [btrfs] [552.6817] Modules linked in: btrfs blake2b_generic xor (...) [552.6819] CPU: 4 PID: 81745 Comm: btrfs-transacti Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [552.6819] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [552.6819] RIP: 0010:btrfs_put_transaction+0x123/0x130 [btrfs] [552.6821] Code: bd a0 01 00 (...) [552.6821] RSP: 0018:ffffa168c0527e28 EFLAGS: 00010286 [552.6821] RAX: ffff936042caed00 RBX: ffff93604a3eb448 RCX: 0000000000000000 [552.6821] RDX: ffff93606421b028 RSI: ffffffff92ff0878 RDI: ffff93606421b010 [552.6821] RBP: ffff93606421b000 R08: 0000000000000000 R09: ffffa168c0d07c20 [552.6821] R10: 0000000000000000 R11: ffff93608dc52950 R12: ffffa168c0527e70 [552.6821] R13: ffff93606421b000 R14: ffff93604a3eb420 R15: ffff93606421b028 [552.6821] FS: 0000000000000000(0000) GS:ffff93675fb00000(0000) knlGS:0000000000000000 [552.6821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [552.6821] CR2: 0000558ad262b000 CR3: 000000014feda005 CR4: 0000000000370ee0 [552.6822] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [552.6822] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [552.6822] Call Trace: [552.6822] <TASK> [552.6822] ? __warn+0x80/0x130 [552.6822] ? btrfs_put_transaction+0x123/0x130 [btrfs] [552.6824] ? report_bug+0x1f4/0x200 [552.6824] ? handle_bug+0x42/0x70 [552.6824] ? exc_invalid_op+0x14/0x70 [552.6824] ? asm_exc_invalid_op+0x16/0x20 [552.6824] ? btrfs_put_transaction+0x123/0x130 [btrfs] [552.6826] btrfs_cleanup_transaction+0xe7/0x5e0 [btrfs] [552.6828] ? _raw_spin_unlock_irqrestore+0x23/0x40 [552.6828] ? try_to_wake_up+0x94/0x5e0 [552.6828] ? __pfx_process_timeout+0x10/0x10 [552.6828] transaction_kthread+0x103/0x1d0 [btrfs] [552.6830] ? __pfx_transaction_kthread+0x10/0x10 [btrfs] [552.6832] kthread+0xee/0x120 [552.6832] ? __pfx_kthread+0x10/0x10 [552.6832] ret_from_fork+0x29/0x50 [552.6832] </TASK> [552.6832] ---[ end trace 0000000000000000 ]--- This corresponds to this line of code: void btrfs_put_transaction(struct btrfs_transaction *transaction) { (...) WARN_ON(!RB_EMPTY_ROOT( &transaction->delayed_refs.dirty_extent_root)); (...) } The warning happens because btrfs_qgroup_destroy_extent_records(), called in the transaction abort path, we free all entries from the rbtree "dirty_extent_root" with rbtree_postorder_for_each_entry_safe(), but we don't actually empty the rbtree - it's still pointing to nodes that were freed. So set the rbtree's root node to NULL to avoid this warning (assign RB_ROOT). Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: fix ordered extent split error handling in btrfs_dio_submit_ioChristoph Hellwig
When the call to btrfs_extract_ordered_extent in btrfs_dio_submit_io fails to allocate memory for a new ordered_extent, it calls into the btrfs_dio_end_io for error handling. btrfs_dio_end_io then assumes that bbio->ordered is set because it is supposed to be at this point, except for this error handling corner case. Try to not overload the btrfs_dio_end_io with error handling of a bio in a non-canonical state, and instead call btrfs_finish_ordered_extent and iomap_dio_bio_end_io directly for this error case. Reported-by: syzbot <syzbot+5b82f0e951f8c2bcdb8f@syzkaller.appspotmail.com> Fixes: b41b6f6937dc ("btrfs: use btrfs_finish_ordered_extent to complete direct writes") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Tested-by: syzbot <syzbot+5b82f0e951f8c2bcdb8f@syzkaller.appspotmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expandJosef Bacik
While trying to get the subpage blocksize tests running, I hit the following panic on generic/476 assertion failed: PagePrivate(page) && page->private, in fs/btrfs/subpage.c:229 kernel BUG at fs/btrfs/subpage.c:229! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 1 PID: 1453 Comm: fsstress Not tainted 6.4.0-rc7+ #12 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20230301gitf80f052277c8-26.fc38 03/01/2023 pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : btrfs_subpage_assert+0xbc/0xf0 lr : btrfs_subpage_assert+0xbc/0xf0 Call trace: btrfs_subpage_assert+0xbc/0xf0 btrfs_subpage_clear_checked+0x38/0xc0 btrfs_page_clear_checked+0x48/0x98 btrfs_truncate_block+0x5d0/0x6a8 btrfs_cont_expand+0x5c/0x528 btrfs_write_check.isra.0+0xf8/0x150 btrfs_buffered_write+0xb4/0x760 btrfs_do_write_iter+0x2f8/0x4b0 btrfs_file_write_iter+0x1c/0x30 do_iter_readv_writev+0xc8/0x158 do_iter_write+0x9c/0x210 vfs_iter_write+0x24/0x40 iter_file_splice_write+0x224/0x390 direct_splice_actor+0x38/0x68 splice_direct_to_actor+0x12c/0x260 do_splice_direct+0x90/0xe8 generic_copy_file_range+0x50/0x90 vfs_copy_file_range+0x29c/0x470 __arm64_sys_copy_file_range+0xcc/0x498 invoke_syscall.constprop.0+0x80/0xd8 do_el0_svc+0x6c/0x168 el0_svc+0x50/0x1b0 el0t_64_sync_handler+0x114/0x120 el0t_64_sync+0x194/0x198 This happens because during btrfs_cont_expand we'll get a page, set it as mapped, and if it's not Uptodate we'll read it. However between the read and re-locking the page we could have called release_folio() on the page, but left the page in the file mapping. release_folio() can clear the page private, and thus further down we blow up when we go to modify the subpage bits. Fix this by putting the set_page_extent_mapped() after the read. This is safe because read_folio() will call set_page_extent_mapped() before it does the read, and then if we clear page private but leave it on the mapping we're completely safe re-setting set_page_extent_mapped(). With this patch I can now run generic/476 without panicing. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: raid56: always verify the P/Q contents for scrubQu Wenruo
[REGRESSION] Commit 75b470332965 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap") changed the behavior of scrub_rbio(). Initially if we have no error reading the raid bio, we will assign @need_check to true, then finish_parity_scrub() would later verify the content of P/Q stripes before writeback. But after that commit we never verify the content of P/Q stripes and just writeback them. This can lead to unrepaired P/Q stripes during scrub, or already corrupted P/Q copied to the dev-replace target. [FIX] The situation is more complex than the regression, in fact the initial behavior is not 100% correct either. If we have the following rare case, it can still lead to the same problem using the old behavior: 0 16K 32K 48K 64K Data 1: |IIIIIII| | Data 2: | | Parity: | |CCCCCCC| | Where "I" means IO error, "C" means corruption. In the above case, we're scrubbing the parity stripe, then read out all the contents of Data 1, Data 2, Parity stripes. But found IO error in Data 1, which leads to rebuild using Data 2 and Parity and got the correct data. In that case, we would not verify if the Parity is correct for range [16K, 32K). So here we have to always verify the content of Parity no matter if we did recovery or not. This patch would remove the @need_check parameter of finish_parity_scrub() completely, and would always do the P/Q verification before writeback. Fixes: 75b470332965 ("btrfs: raid56: migrate recovery and scrub recovery path to use error_bitmap") CC: stable@vger.kernel.org # 6.2+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: use irq safe locking when running and adding delayed iputsFilipe Manana
Running delayed iputs, which never happens in an irq context, needs to lock the spinlock fs_info->delayed_iput_lock. When finishing bios for data writes (irq context, bio.c) we call btrfs_put_ordered_extent() which needs to add a delayed iput and for that it needs to acquire the spinlock fs_info->delayed_iput_lock. Without disabling irqs when running delayed iputs we can therefore deadlock on that spinlock. The same deadlock can also happen when adding an inode to the delayed iputs list, since this can be done outside an irq context as well. Syzbot recently reported this, which results in the following trace: ================================ WARNING: inconsistent lock state 6.4.0-syzkaller-09904-ga507db1d8fdc #0 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. btrfs-cleaner/16079 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888107804d20 (&fs_info->delayed_iput_lock){+.?.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline] ffff888107804d20 (&fs_info->delayed_iput_lock){+.?.}-{2:2}, at: btrfs_run_delayed_iputs+0x28/0xe0 fs/btrfs/inode.c:3523 {IN-SOFTIRQ-W} state was registered at: lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5726 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:350 [inline] btrfs_add_delayed_iput+0x128/0x390 fs/btrfs/inode.c:3490 btrfs_put_ordered_extent fs/btrfs/ordered-data.c:559 [inline] btrfs_put_ordered_extent+0x2f6/0x610 fs/btrfs/ordered-data.c:547 __btrfs_bio_end_io fs/btrfs/bio.c:118 [inline] __btrfs_bio_end_io+0x136/0x180 fs/btrfs/bio.c:112 btrfs_orig_bbio_end_io+0x86/0x2b0 fs/btrfs/bio.c:163 btrfs_simple_end_io+0x105/0x380 fs/btrfs/bio.c:378 bio_endio+0x589/0x690 block/bio.c:1617 req_bio_endio block/blk-mq.c:766 [inline] blk_update_request+0x5c5/0x1620 block/blk-mq.c:911 blk_mq_end_request+0x59/0x680 block/blk-mq.c:1032 lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370 blk_complete_reqs+0xb3/0xf0 block/blk-mq.c:1110 __do_softirq+0x1d4/0x905 kernel/softirq.c:553 run_ksoftirqd kernel/softirq.c:921 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:913 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x344/0x440 kernel/kthread.c:389 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 irq event stamp: 39 hardirqs last enabled at (39): [<ffffffff81d5ebc4>] __do_kmem_cache_free mm/slab.c:3558 [inline] hardirqs last enabled at (39): [<ffffffff81d5ebc4>] kmem_cache_free mm/slab.c:3582 [inline] hardirqs last enabled at (39): [<ffffffff81d5ebc4>] kmem_cache_free+0x244/0x370 mm/slab.c:3575 hardirqs last disabled at (38): [<ffffffff81d5eb5e>] __do_kmem_cache_free mm/slab.c:3553 [inline] hardirqs last disabled at (38): [<ffffffff81d5eb5e>] kmem_cache_free mm/slab.c:3582 [inline] hardirqs last disabled at (38): [<ffffffff81d5eb5e>] kmem_cache_free+0x1de/0x370 mm/slab.c:3575 softirqs last enabled at (0): [<ffffffff814ac99f>] copy_process+0x227f/0x75c0 kernel/fork.c:2448 softirqs last disabled at (0): [<0000000000000000>] 0x0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&fs_info->delayed_iput_lock); <Interrupt> lock(&fs_info->delayed_iput_lock); *** DEADLOCK *** 1 lock held by btrfs-cleaner/16079: #0: ffff888107804860 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x103/0x4b0 fs/btrfs/disk-io.c:1463 stack backtrace: CPU: 3 PID: 16079 Comm: btrfs-cleaner Not tainted 6.4.0-syzkaller-09904-ga507db1d8fdc #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 print_usage_bug kernel/locking/lockdep.c:3978 [inline] valid_state kernel/locking/lockdep.c:4020 [inline] mark_lock_irq kernel/locking/lockdep.c:4223 [inline] mark_lock.part.0+0x1102/0x1960 kernel/locking/lockdep.c:4685 mark_lock kernel/locking/lockdep.c:4649 [inline] mark_usage kernel/locking/lockdep.c:4598 [inline] __lock_acquire+0x8e4/0x5e20 kernel/locking/lockdep.c:5098 lock_acquire kernel/locking/lockdep.c:5761 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5726 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:350 [inline] btrfs_run_delayed_iputs+0x28/0xe0 fs/btrfs/inode.c:3523 cleaner_kthread+0x2e5/0x4b0 fs/btrfs/disk-io.c:1478 kthread+0x344/0x440 kernel/kthread.c:389 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> So fix this by using spin_lock_irq() and spin_unlock_irq() when running delayed iputs, and using spin_lock_irqsave() and spin_unlock_irqrestore() when adding a delayed iput(). Reported-by: syzbot+da501a04be5ff533b102@syzkaller.appspotmail.com Fixes: ec63b84d4611 ("btrfs: add an ordered_extent pointer to struct btrfs_bio") Link: https://lore.kernel.org/linux-btrfs/000000000000d5c89a05ffbd39dd@google.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: fix iput() on error pointer after error during orphan cleanupFilipe Manana
At btrfs_orphan_cleanup(), if we can't find an inode (btrfs_iget() returns an -ENOENT error pointer), we proceed with 'ret' set to -ENOENT and the inode pointer set to ERR_PTR(-ENOENT). Later when we proceed to the body of the following if statement: if (ret == -ENOENT || inode->i_nlink) { (...) trans = btrfs_start_transaction(root, 1); if (IS_ERR(trans)) { ret = PTR_ERR(trans); iput(inode); goto out; } (...) ret = btrfs_del_orphan_item(trans, root, found_key.objectid); btrfs_end_transaction(trans); if (ret) { iput(inode); goto out; } continue; } If we get an error from btrfs_start_transaction() or from the call to btrfs_del_orphan_item() we end calling iput() against an inode pointer that has a value of ERR_PTR(-ENOENT), resulting in a crash with the following trace: [876.667] BUG: kernel NULL pointer dereference, address: 0000000000000096 [876.667] #PF: supervisor read access in kernel mode [876.667] #PF: error_code(0x0000) - not-present page [876.667] PGD 0 P4D 0 [876.668] Oops: 0000 [#1] PREEMPT SMP PTI [876.668] CPU: 0 PID: 2356187 Comm: mount Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [876.668] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [876.668] RIP: 0010:iput+0xa/0x20 [876.668] Code: ff ff ff 66 (...) [876.669] RSP: 0018:ffffafa9c0c9f9d0 EFLAGS: 00010282 [876.669] RAX: ffffffffffffffe4 RBX: 000000000009453b RCX: 0000000000000000 [876.669] RDX: 0000000000000001 RSI: ffffafa9c0c9f930 RDI: fffffffffffffffe [876.669] RBP: ffff95c612f3b800 R08: 0000000000000001 R09: ffffffffffffffe4 [876.670] R10: 00018f2a71010000 R11: 000000000ead96e3 R12: ffff95cb7d6909a0 [876.670] R13: fffffffffffffffe R14: ffff95c60f477000 R15: 00000000ffffffe4 [876.670] FS: 00007f5fbe30a840(0000) GS:ffff95ccdfa00000(0000) knlGS:0000000000000000 [876.670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [876.671] CR2: 0000000000000096 CR3: 000000055e9f6004 CR4: 0000000000370ef0 [876.671] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [876.671] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [876.672] Call Trace: [876.744] <TASK> [876.744] ? __die_body+0x1b/0x60 [876.744] ? page_fault_oops+0x15d/0x450 [876.745] ? __kmem_cache_alloc_node+0x47/0x410 [876.745] ? do_user_addr_fault+0x65/0x8a0 [876.745] ? exc_page_fault+0x74/0x170 [876.746] ? asm_exc_page_fault+0x22/0x30 [876.746] ? iput+0xa/0x20 [876.746] btrfs_orphan_cleanup+0x221/0x330 [btrfs] [876.746] btrfs_lookup_dentry+0x58f/0x5f0 [btrfs] [876.747] btrfs_lookup+0xe/0x30 [btrfs] [876.747] __lookup_slow+0x82/0x130 [876.785] walk_component+0xe5/0x160 [876.786] path_lookupat.isra.0+0x6e/0x150 [876.786] filename_lookup+0xcf/0x1a0 [876.786] ? mod_objcg_state+0xd2/0x360 [876.786] ? obj_cgroup_charge+0xf5/0x110 [876.787] ? should_failslab+0xa/0x20 [876.787] ? kmem_cache_alloc+0x47/0x450 [876.787] vfs_path_lookup+0x51/0x90 [876.788] mount_subtree+0x8d/0x130 [876.788] btrfs_mount+0x149/0x410 [btrfs] [876.788] ? __kmem_cache_alloc_node+0x47/0x410 [876.788] ? vfs_parse_fs_param+0xc0/0x110 [876.789] legacy_get_tree+0x24/0x50 [876.834] vfs_get_tree+0x22/0xd0 [876.852] path_mount+0x2d8/0x9c0 [876.852] do_mount+0x79/0x90 [876.852] __x64_sys_mount+0x8e/0xd0 [876.853] do_syscall_64+0x38/0x90 [876.899] entry_SYSCALL_64_after_hwframe+0x72/0xdc [876.958] RIP: 0033:0x7f5fbe50b76a [876.959] Code: 48 8b 0d a9 (...) [876.959] RSP: 002b:00007fff01925798 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [876.959] RAX: ffffffffffffffda RBX: 00007f5fbe694264 RCX: 00007f5fbe50b76a [876.960] RDX: 0000561bde6c8720 RSI: 0000561bde6bdec0 RDI: 0000561bde6c31a0 [876.960] RBP: 0000561bde6bdc70 R08: 0000000000000000 R09: 0000000000000001 [876.960] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [876.960] R13: 0000561bde6c31a0 R14: 0000561bde6c8720 R15: 0000561bde6bdc70 [876.960] </TASK> So fix this by setting 'inode' to NULL whenever we get an error from btrfs_iget(), and to make the code simpler, stop testing for 'ret' being -ENOENT to check if we have an inode - instead test for 'inode' being NULL or not. Having a NULL 'inode' prevents any iput() call from crashing, as iput() ignores NULL inode pointers. Also, stop testing for a NULL return value from btrfs_iget() with PTR_ERR_OR_ZERO(), because btrfs_iget() never returns NULL - in case an inode is not found, it returns ERR_PTR(-ENOENT), and in case of memory allocation failure, it returns ERR_PTR(-ENOMEM). We also don't need the extra iput() calls on the error branches for the btrfs_start_transaction() and btrfs_del_orphan_item() calls, as we have already called iput() before, so remove them. Fixes: a13bb2c03848 ("btrfs: add missing iputs on orphan cleanup failure") CC: stable@vger.kernel.org # 6.4 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: fix double iput() on inode after an error during orphan cleanupFilipe Manana
At btrfs_orphan_cleanup(), if we were able to find the inode, we do an iput() on the inode, then if btrfs_drop_verity_items() succeeds and then either btrfs_start_transaction() or btrfs_del_orphan_item() fail, we do another iput() in the respective error paths, resulting in an extra iput() on the inode. Fix this by setting inode to NULL after the first iput(), as iput() ignores a NULL inode pointer argument. Fixes: a13bb2c03848 ("btrfs: add missing iputs on orphan cleanup failure") CC: stable@vger.kernel.org # 6.4 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-18btrfs: zoned: fix memory leak after finding block group with super blocksFilipe Manana
At exclude_super_stripes(), if we happen to find a block group that has super blocks mapped to it and we are on a zoned filesystem, we error out as this is not supposed to happen, indicating either a bug or maybe some memory corruption for example. However we are exiting the function without freeing the memory allocated for the logical address of the super blocks. Fix this by freeing the logical address. Fixes: 12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>