summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-29Merge tag 'loongarch-fixes-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Remove the unused dma-direct.h, and some bug & warning fixes" * tag 'loongarch-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Invalidate guest steal time address on vCPU reset LoongArch: Add ifdefs to fix LSX and LASX related warnings LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE LoongArch: Remove the unused dma-direct.h
2024-08-29Merge tag 'platform-drivers-x86-v6.11-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: - platform/x86/amd/pmc: AMD 1Ah model 60h series support (2nd attempt) - asus-wmi: Prevent spurious rfkill on Asus Zenbook Duo - x86-android-tablets: Relax DMI match to cover another model * tag 'platform-drivers-x86-v6.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: x86-android-tablets: Make Lenovo Yoga Tab 3 X90F DMI match less strict platform/x86: asus-wmi: Fix spurious rfkill on UX8406MA platform/x86/amd/pmc: Extend support for PMC features on new AMD platform platform/x86/amd/pmc: Fix SMU command submission path on new AMD platform
2024-08-29Merge tag 'nfsd-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a number of crashers - Update email address for an NFSD reviewer * tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs/nfsd: fix update of inode attrs in CB_GETATTR nfsd: fix potential UAF in nfsd4_cb_getattr_release nfsd: hold reference to delegation when updating it for cb_getattr MAINTAINERS: Update Olga Kornievskaia's email address nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open nfsd: ensure that nfsd4_fattr_args.context is zeroed out
2024-08-29Merge tag 'for-6.11-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix use-after-free when submitting bios for read, after an error and partially submitted bio the original one is freed while it can be still be accessed again - fix fstests case btrfs/301, with enabled quotas wait for delayed iputs when flushing delalloc - fix periodic block group reclaim, an unitialized value can be returned if there are no block groups to reclaim - fix build warning (-Wmaybe-uninitialized) * tag 'for-6.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix uninitialized return value from btrfs_reclaim_sweep() btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk() btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap() btrfs: run delayed iputs when flushing delalloc
2024-08-28fuse: update stats for pages in dropped aux writeback listJoanne Koong
In the case where the aux writeback list is dropped (e.g. the pages have been truncated or the connection is broken), the stats for its pages and backing device info need to be updated as well. Fixes: e2653bd53a98 ("fuse: fix leaked aux requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: clear PG_uptodate when using a stolen pageMiklos Szeredi
Originally when a stolen page was inserted into fuse's page cache by fuse_try_move_page(), it would be marked uptodate. Then fuse_readpages_end() would call SetPageUptodate() again on the already uptodate page. Commit 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()") changed that by replacing the SetPageUptodate() + unlock_page() combination with folio_end_read(), which does mostly the same, except it sets the uptodate flag with an xor operation, which in the above scenario resulted in the uptodate flag being cleared, which in turn resulted in EIO being returned on the read. Fix by clearing PG_uptodate instead of setting it in fuse_try_move_page(), conforming to the expectation of folio_end_read(). Reported-by: Jürg Billeter <j@bitron.ch> Debugged-by: Matthew Wilcox <willy@infradead.org> Fixes: 413e8f014c8b ("fuse: Convert fuse_readpages_end() to use folio_end_read()") Cc: <stable@vger.kernel.org> # v6.10 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: fix memory leak in fuse_create_openyangyun
The memory of struct fuse_file is allocated but not freed when get_create_ext return error. Fixes: 3e2b6fdbdc9a ("fuse: send security context of inode on file") Cc: stable@vger.kernel.org # v5.17 Signed-off-by: yangyun <yangyun50@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: check aborted connection before adding requests to pending list for ↵Joanne Koong
resending There is a race condition where inflight requests will not be aborted if they are in the middle of being re-sent when the connection is aborted. If fuse_resend has already moved all the requests in the fpq->processing lists to its private queue ("to_queue") and then the connection starts and finishes aborting, these requests will be added to the pending queue and remain on it indefinitely. Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v6.9 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28fuse: use unsigned type for getxattr/listxattr size truncationJann Horn
The existing code uses min_t(ssize_t, outarg.size, XATTR_LIST_MAX) when parsing the FUSE daemon's response to a zero-length getxattr/listxattr request. On 32-bit kernels, where ssize_t and outarg.size are the same size, this is wrong: The min_t() will pass through any size values that are negative when interpreted as signed. fuse_listxattr() will then return this userspace-supplied negative value, which callers will treat as an error value. This kind of bug pattern can lead to fairly bad security bugs because of how error codes are used in the Linux kernel. If a caller were to convert the numeric error into an error pointer, like so: struct foo *func(...) { int len = fuse_getxattr(..., NULL, 0); if (len < 0) return ERR_PTR(len); ... } then it would end up returning this userspace-supplied negative value cast to a pointer - but the caller of this function wouldn't recognize it as an error pointer (IS_ERR_VALUE() only detects values in the narrow range in which legitimate errno values are), and so it would just be treated as a kernel pointer. I think there is at least one theoretical codepath where this could happen, but that path would involve virtio-fs with submounts plus some weird SELinux configuration, so I think it's probably not a concern in practice. Cc: stable@vger.kernel.org # v4.9 Fixes: 63401ccdb2ca ("fuse: limit xattr returned size") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-08-28mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4KSam Protsenko
Commit 616f87661792 ("mmc: pass queue_limits to blk_mq_alloc_disk") [1] revealed the long living issue in dw_mmc.c driver, existing since the time when it was first introduced in commit f95f3850f7a9 ("mmc: dw_mmc: Add Synopsys DesignWare mmc host driver."), also making kernel boot broken on platforms using dw_mmc driver with 16K or 64K pages enabled, with this message in dmesg: mmcblk: probe of mmc0:0001 failed with error -22 That's happening because mmc_blk_probe() fails when it calls blk_validate_limits() consequently, which returns the error due to failed max_segment_size check in this code: /* * The maximum segment size has an odd historic 64k default that * drivers probably should override. Just like the I/O size we * require drivers to at least handle a full page per segment. */ ... if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE)) return -EINVAL; In case when IDMAC (Internal DMA Controller) is used, dw_mmc.c always sets .max_seg_size to 4 KiB: mmc->max_seg_size = 0x1000; The comment in the code above explains why it's incorrect. Arnd suggested setting .max_seg_size to .max_req_size to fix it, which is also what some other drivers are doing: $ grep -rl 'max_seg_size.*=.*max_req_size' drivers/mmc/host/ | \ wc -l 18 This change is not only fixing the boot with 16K/64K pages, but also leads to a better MMC performance. The linear write performance was tested on E850-96 board (eMMC only), before commit [1] (where it's possible to boot with 16K/64K pages without this fix, to be able to do a comparison). It was tested with this command: # dd if=/dev/zero of=somefile bs=1M count=500 oflag=sync Test results are as follows: - 4K pages, .max_seg_size = 4 KiB: 94.2 MB/s - 4K pages, .max_seg_size = .max_req_size = 512 KiB: 96.9 MB/s - 16K pages, .max_seg_size = 4 KiB: 126 MB/s - 16K pages, .max_seg_size = .max_req_size = 2 MiB: 128 MB/s - 64K pages, .max_seg_size = 4 KiB: 138 MB/s - 64K pages, .max_seg_size = .max_req_size = 8 MiB: 138 MB/s Unfortunately, SD card controller is not enabled in E850-96 yet, so it wasn't possible for me to run the test on some cheap SD cards to check this patch's impact on those. But it's possible that this change might also reduce the writes count, thus improving SD/eMMC longevity. All credit for the analysis and the suggested solution goes to Arnd. [1] https://lore.kernel.org/all/20240215070300.2200308-18-hch@lst.de/ Fixes: f95f3850f7a9 ("mmc: dw_mmc: Add Synopsys DesignWare mmc host driver.") Suggested-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYtddf2Fd3be+YShHP6CmSDNcn0ptW8qg+stUKW+Cn0rjQ@mail.gmail.com/ Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240306232052.21317-1-semen.protsenko@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-08-28cpufreq/amd-pstate: Remove warning for X86_FEATURE_CPPC on certain Zen modelsGautham R. Shenoy
commit bff7d13c190a ("cpufreq: amd-pstate: add debug message while CPPC is supported and disabled by SBIOS") issues a warning on plaforms where the X86_FEATURE_CPPC is expected to be enabled, but is not due to it being disabled in the BIOS. This feature bit corresponds to CPUID 0x80000008.ebx[27] which is a reserved bit on the Zen1 processors and a reserved bit on Zen2 based models 0x70-0x7F, and is expected to be cleared on these platforms. Thus printing the warning message for these models when X86_FEATURE_CPPC is unavailable is incorrect. Fix this. Modify some of the comments, and use switch-case for model range checking for improved readability while at it. Fixes: bff7d13c190a ("cpufreq: amd-pstate: add debug message while CPPC is supported and disabled by SBIOS") Cc: Xiaojian Du <xiaojian.du@amd.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/lkml/20240730140111.4491-1-00107082@163.com/ Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Acked-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-08-28mmc: sdhci-of-aspeed: fix module autoloadingLiao Chen
Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from of_device_id table. Signed-off-by: Liao Chen <liaochen4@huawei.com> Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au> Fixes: bb7b8ec62dfb ("mmc: sdhci-of-aspeed: Add support for the ASPEED SD controller") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240826124851.379759-1-liaochen4@huawei.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-08-28block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroesDarrick J. Wong
On error, blkdev_issue_write_zeroes used to recheck the block device's WRITE SAME queue limits after submitting WRITE SAME bios. As stated in the comment, the purpose of this was to collapse all IO errors to EOPNOTSUPP if the effect of issuing bios was that WRITE SAME got turned off in the queue limits. Therefore, it does not make sense to reuse the zeroes limit that was read earlier in the function because we only care about the queue limit *now*, not what it was at the start of the function. Found by running generic/351 from fstests. Fixes: 64b582ca88ca1 ("block: Read max write zeroes once for __blkdev_issue_write_zeroes()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240827175340.GB1977952@frogsfrogsfrogs Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-28drm/v3d: Disable preemption while updating GPU statsTvrtko Ursulin
We forgot to disable preemption around the write_seqcount_begin/end() pair while updating GPU stats: [ ] WARNING: CPU: 2 PID: 12 at include/linux/seqlock.h:221 __seqprop_assert.isra.0+0x128/0x150 [v3d] [ ] Workqueue: v3d_bin drm_sched_run_job_work [gpu_sched] <...snip...> [ ] Call trace: [ ] __seqprop_assert.isra.0+0x128/0x150 [v3d] [ ] v3d_job_start_stats.isra.0+0x90/0x218 [v3d] [ ] v3d_bin_job_run+0x23c/0x388 [v3d] [ ] drm_sched_run_job_work+0x520/0x6d0 [gpu_sched] [ ] process_one_work+0x62c/0xb48 [ ] worker_thread+0x468/0x5b0 [ ] kthread+0x1c4/0x1e0 [ ] ret_from_fork+0x10/0x20 Fix it. Cc: Maíra Canal <mcanal@igalia.com> Cc: stable@vger.kernel.org # v6.10+ Fixes: 6abe93b621ab ("drm/v3d: Fix race-condition between sysfs/fdinfo and interrupt handler") Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240813102505.80512-1-tursulin@igalia.com
2024-08-28drm/amd/pm: Drop unsupported features on smu v14_0_2Candice Li
Drop unsupported features on smu v14_0_2. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3376f922bfe070eff762164b3fc66981e3079417)
2024-08-28drm/amd/pm: Add support for new P2S table revisionLijo Lazar
Add p2s table support for a new revision of SMUv13.0.6. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 010cc730ace807c6d267481b5fb6ff99acc35c46)
2024-08-28drm/amdgpu: support for gc_info table v1.3Likun Gao
Add gc_info table v1.3 for IP discovery. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 875ff9a7ee8824200885384effa7743892a34ed6)
2024-08-28drm/amd/display: avoid using null object of framebufferMa Ke
Instead of using state->fb->obj[0] directly, get object from framebuffer by calling drm_gem_fb_get_obj() and return error code when object is null to avoid using null object of framebuffer. Fixes: 5d945cbcd4b1 ("drm/amd/display: Create a file dedicated to planes") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 73dd0ad9e5dad53766ea3e631303430116f834b3)
2024-08-28drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDsAlex Deucher
This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3575 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 40318a2406bd426c6f4591269669c04e8eda571d)
2024-08-28drm/amd/pm: update message interface for smu v14.0.2/3Kenneth Feng
update message interface for smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 01bfabc2d1d8aaffe5268f8df0843a6d916dcbaa)
2024-08-28drm/amdgpu/swsmu: always force a state reprogram on initAlex Deucher
Always reprogram the hardware state on init. This ensures the PMFW state is explicitly programmed and we are not relying on the default PMFW state. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3131 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c50fe289ed7207f71df3b5f1720512a9620e84fb) Cc: stable@vger.kernel.org
2024-08-28drm/amdgpu/smu13.0.7: print index for profilesAlex Deucher
Print the index for the profiles. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3543 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b86a6a57b8ad1699ba8b1c270a79678383baf632)
2024-08-28drm/amdgpu: align pp_power_profile_mode with kernel docsAlex Deucher
The kernel doc says you need to select manual mode to adjust this, but the code only allows you to adjust it when manual mode is not selected. Remove the manual mode check. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bbb05f8a9cd87f5046d05a0c596fddfb714ee457) Cc: stable@vger.kernel.org
2024-08-28dmaengine: dw-edma: Do not enable watermark interrupts for HDMAMrinmay Sarkar
DW_HDMA_V0_LIE and DW_HDMA_V0_RIE are initialized as BIT(3) and BIT(4) respectively in dw_hdma_control enum. But as per HDMA register these bits are corresponds to LWIE and RWIE bit i.e local watermark interrupt enable and remote watermarek interrupt enable. In linked list mode LWIE and RWIE bits only enable the local and remote watermark interrupt. Since the watermark interrupts are not used but enabled, this leads to spurious interrupts getting generated. So remove the code that enables them to avoid generating spurious watermark interrupts. And also rename DW_HDMA_V0_LIE to DW_HDMA_V0_LWIE and DW_HDMA_V0_RIE to DW_HDMA_V0_RWIE as there is no LIE and RIE bits in HDMA and those bits are corresponds to LWIE and RWIE bits. Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") cc: stable@vger.kernel.org Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/1724674261-3144-3-git-send-email-quic_msarkar@quicinc.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-08-28dmaengine: dw-edma: Fix unmasking STOP and ABORT interrupts for HDMAMrinmay Sarkar
The current logic is enabling both STOP_INT_MASK and ABORT_INT_MASK bit. This is apparently masking those particular interrupts rather than unmasking the same. If the interrupts are masked, they would never get triggered. So fix the issue by unmasking the STOP and ABORT interrupts properly. Fixes: e74c39573d35 ("dmaengine: dw-edma: Add support for native HDMA") cc: stable@vger.kernel.org Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/1724674261-3144-2-git-send-email-quic_msarkar@quicinc.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-08-28cifs: Fix copy offload to flush destination regionDavid Howells
Fix cifs_file_copychunk_range() to flush the destination region before invalidating it to avoid potential loss of data should the copy fail, in whole or in part, in some way. Fixes: 7b2404a886f8 ("cifs: Fix flushing, invalidation and file size with copy_file_range()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-28netfs, cifs: Fix handling of short DIO readDavid Howells
Short DIO reads, particularly in relation to cifs, are not being handled correctly by cifs and netfslib. This can be tested by doing a DIO read of a file where the size of read is larger than the size of the file. When it crosses the EOF, it gets a short read and this gets retried, and in the case of cifs, the retry read fails, with the failure being translated to ENODATA. Fix this by the following means: (1) Add a flag, NETFS_SREQ_HIT_EOF, for the filesystem to set when it detects that the read did hit the EOF. (2) Make the netfslib read assessment stop processing subrequests when it encounters one with that flag set. (3) Return rreq->transferred, the accumulated contiguous amount read to that point, to userspace for a DIO read. (4) Make cifs set the flag and clear the error if the read RPC returned ENODATA. (5) Make cifs set the flag and clear the error if a short read occurred without error and the read-to file position is now at the remote inode size. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-28cifs: Fix lack of credit renegotiation on read retryDavid Howells
When netfslib asks cifs to issue a read operation, it prefaces this with a call to ->clamp_length() which cifs uses to negotiate credits, providing receive capacity on the server; however, in the event that a read op needs reissuing, netfslib doesn't call ->clamp_length() again as that could shorten the subrequest, leaving a gap. This causes the retried read to be done with zero credits which causes the server to reject it with STATUS_INVALID_PARAMETER. This is a problem for a DIO read that is requested that would go over the EOF. The short read will be retried, causing EINVAL to be returned to the user when it fails. Fix this by making cifs_req_issue_read() negotiate new credits if retrying (NETFS_SREQ_RETRYING now gets set in the read side as well as the write side in this instance). This isn't sufficient, however: the new credits might not be sufficient to complete the remainder of the read, so also add an additional field, rreq->actual_len, that holds the actual size of the op we want to perform without having to alter subreq->len. We then rely on repeated short reads being retried until we finish the read or reach the end of file and make a zero-length read. Also fix a couple of places where the subrequest start and length need to be altered by the amount so far transferred when being used. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-28KVM: SEV: Update KVM_AMD_SEV Kconfig entry and mention SEV-SNPVitaly Kuznetsov
SEV-SNP support is present since commit 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support") but Kconfig entry wasn't updated and still mentions SEV and SEV-ES only. Add SEV-SNP there and, while on it, expand 'SEV' in the description as 'Encrypted VMs' is not what 'SEV' stands for. No functional change. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20240828122111.160273-1-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-08-28ASoc: SOF: topology: Clear SOF link platform name upon unloadChen-Yu Tsai
The SOF topology loading function sets the device name for the platform component link. This should be unset when unloading the topology, otherwise a machine driver unbind/bind or reprobe would complain about an invalid component as having both its component name and of_node set: mt8186_mt6366 sound: ASoC: Both Component name/of_node are set for AFE_SOF_DL1 mt8186_mt6366 sound: error -EINVAL: Cannot register card mt8186_mt6366 sound: probe with driver mt8186_mt6366 failed with error -22 This happens with machine drivers that set the of_node separately. Clear the SOF link platform name in the topology unload callback. Fixes: 311ce4fe7637 ("ASoC: SOF: Add support for loading topologies") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://patch.msgid.link/20240821041006.2618855-1-wenst@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-08-28x86/resctrl: Fix arch_mbm_* array overrun on SNCPeter Newman
When using resctrl on systems with Sub-NUMA Clustering enabled, monitoring groups may be allocated RMID values which would overrun the arch_mbm_{local,total} arrays. This is due to inconsistencies in whether the SNC-adjusted num_rmid value or the unadjusted value in resctrl_arch_system_num_rmid_idx() is used. The num_rmid value for the L3 resource is currently: resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache As a simple fix, make resctrl_arch_system_num_rmid_idx() return the SNC-adjusted, L3 num_rmid value on x86. Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache") Signed-off-by: Peter Newman <peternewman@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20240822190212.1848788-1-peternewman@google.com
2024-08-28drm/i915/dp_mst: Fix MST state after a sink resetImre Deak
In some cases the sink can reset itself after it was configured into MST mode, without the driver noticing the disconnected state. For instance the reset may happen in the middle of a modeset, or the (long) HPD pulse generated may be not long enough for the encoder detect handler to observe the HPD's deasserted state. In this case the sink's DPCD register programmed to enable MST will be reset, while the driver still assumes MST is still enabled. Detect this condition, which will tear down and recreate/re-enable the MST topology. v2: - Add a code comment about adjusting the expected DP_MSTM_CTRL register value for SST + SideBand. (Suraj, Jani) - Print a debug message about detecting the link reset. (Jani) - Verify the DPCD MST state only if it wasn't already determined that the sink is disconnected. Cc: stable@vger.kernel.org Cc: Jani Nikula <jani.nikula@intel.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11195 Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> (v1) Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823162918.1211875-1-imre.deak@intel.com (cherry picked from commit 594cf78dc36f31c0c7e0de4567e644f406d46bae) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-08-28ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devicesChristoffer Sandberg
The Sirius notebooks have two sets of speakers 0x17 (sides) and 0x1d (top center). The side speakers are active by default but the top speakers aren't. This patch provides a pincfg quirk to activate the top speakers. Signed-off-by: Christoffer Sandberg <cs@tuxedo.de> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240827102540.9480-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-08-28Merge tag 'v6.11-rc5-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - two RDMA/smbdirect fixes and a minor cleanup - punch hole fix * tag 'v6.11-rc5-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix FALLOC_FL_PUNCH_HOLE support smb/client: fix rdma usage in smb2_async_writev() smb/client: remove unused rq_iter_size from struct smb_rqst smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req()
2024-08-28Merge tag 'tpmdd-next-6.11-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull TPM fix from Jarkko Sakkinen: "A bug fix for tpm_ibmvtpm driver so that it will take the bus encryption into use" * tag 'tpmdd-next-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: ibmvtpm: Call tpm2_sessions_init() to initialize session support
2024-08-27sctp: fix association labeling in the duplicate COOKIE-ECHO caseOndrej Mosnacek
sctp_sf_do_5_2_4_dupcook() currently calls security_sctp_assoc_request() on new_asoc, but as it turns out, this association is always discarded and the LSM labels never get into the final association (asoc). This can be reproduced by having two SCTP endpoints try to initiate an association with each other at approximately the same time and then peel off the association into a new socket, which exposes the unitialized labels and triggers SELinux denials. Fix it by calling security_sctp_assoc_request() on asoc instead of new_asoc. Xin Long also suggested limit calling the hook only to cases A, B, and D, since in cases C and E the COOKIE ECHO chunk is discarded and the association doesn't enter the ESTABLISHED state, so rectify that as well. One related caveat with SELinux and peer labeling: When an SCTP connection is set up simultaneously in this way, we will end up with an association that is initialized with security_sctp_assoc_request() on both sides, so the MLS component of the security context of the association will get swapped between the peers, instead of just one side setting it to the other's MLS component. However, at that point security_sctp_assoc_request() had already been called on both sides in sctp_sf_do_unexpected_init() (on a temporary association) and thus if the exchange didn't fail before due to MLS, it won't fail now either (most likely both endpoints have the same MLS range). Tested by: - reproducer from https://src.fedoraproject.org/tests/selinux/pull-request/530 - selinux-testsuite (https://github.com/SELinuxProject/selinux-testsuite/) - sctp-tests (https://github.com/sctp/sctp-tests) - no tests failed that wouldn't fail also without the patch applied Fixes: c081d53f97a1 ("security: pass asoc to sctp_assoc_request and sctp_sk_clone") Suggested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Xin Long <lucien.xin@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM/SELinux) Link: https://patch.msgid.link/20240826130711.141271-1-omosnace@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27Merge branch 'mptcp-close-subflow-when-receiving-tcp-fin-and-misc'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: close subflow when receiving TCP+FIN and misc. Here are different fixes: Patch 1 closes the subflow after having received a FIN, instead of leaving it half-closed until the end of the MPTCP connection. A fix for v5.12. Patch 2 validates the previous patch. Patch 3 is a fix for a recent fix to check both directions for the backup flag. It can follow the 'Fixes' commit and be backported up to v5.7. Patch 4 adds a missing \n at the end of pr_debug(), causing debug messages to be displayed with a delay, which confuses the debugger. A fix for v5.6. ==================== Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-0-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27mptcp: pr_debug: add missing \n at the endMatthieu Baerts (NGI0)
pr_debug() have been added in various places in MPTCP code to help developers to debug some situations. With the dynamic debug feature, it is easy to enable all or some of them, and asks users to reproduce issues with extra debug. Many of these pr_debug() don't end with a new line, while no 'pr_cont()' are used in MPTCP code. So the goal was not to display multiple debug messages on one line: they were then not missing the '\n' on purpose. Not having the new line at the end causes these messages to be printed with a delay, when something else needs to be printed. This issue is not visible when many messages need to be printed, but it is annoying and confusing when only specific messages are expected, e.g. # echo "func mptcp_pm_add_addr_echoed +fmp" \ > /sys/kernel/debug/dynamic_debug/control # ./mptcp_join.sh "signal address"; \ echo "$(awk '{print $1}' /proc/uptime) - end"; \ sleep 5s; \ echo "$(awk '{print $1}' /proc/uptime) - restart"; \ ./mptcp_join.sh "signal address" 013 signal address (...) 10.75 - end 15.76 - restart 013 signal address [ 10.367935] mptcp:mptcp_pm_add_addr_echoed: MPTCP: msk=(...) (...) => a delay of 5 seconds: printed with a 10.36 ts, but after 'restart' which was printed at the 15.76 ts. The 'Fixes' tag here below points to the first pr_debug() used without '\n' in net/mptcp. This patch could be split in many small ones, with different Fixes tag, but it doesn't seem worth it, because it is easy to re-generate this patch with this simple 'sed' command: git grep -l pr_debug -- net/mptcp | xargs sed -i "s/\(pr_debug(\".*[^n]\)\(\"[,)]\)/\1\\\n\2/g" So in case of conflicts, simply drop the modifications, and launch this command. Fixes: f870fa0b5768 ("mptcp: Add MPTCP socket stubs") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-4-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27mptcp: sched: check both backup in retransMatthieu Baerts (NGI0)
The 'mptcp_subflow_context' structure has two items related to the backup flags: - 'backup': the subflow has been marked as backup by the other peer - 'request_bkup': the backup flag has been set by the host Looking only at the 'backup' flag can make sense in some cases, but it is not the behaviour of the default packet scheduler when selecting paths. As explained in the commit b6a66e521a20 ("mptcp: sched: check both directions for backup"), the packet scheduler should look at both flags, because that was the behaviour from the beginning: the 'backup' flag was set by accident instead of the 'request_bkup' one. Now that the latter has been fixed, get_retrans() needs to be adapted as well. Fixes: b6a66e521a20 ("mptcp: sched: check both directions for backup") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-3-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27selftests: mptcp: join: cannot rm sf if closedMatthieu Baerts (NGI0)
Thanks to the previous commit, the MPTCP subflows are now closed on both directions even when only the MPTCP path-manager of one peer asks for their closure. In the two tests modified here -- "userspace pm add & remove address" and "userspace pm create destroy subflow" -- one peer is controlled by the userspace PM, and the other one by the in-kernel PM. When the userspace PM sends a RM_ADDR notification, the in-kernel PM will automatically react by closing all subflows using this address. Now, thanks to the previous commit, the subflows are properly closed on both directions, the userspace PM can then no longer closes the same subflows if they are already closed. Before, it was OK to do that, because the subflows were still half-opened, still OK to send a RM_ADDR. In other words, thanks to the previous commit closing the subflows, an error will be returned to the userspace if it tries to close a subflow that has already been closed. So no need to run this command, which mean that the linked counters will then not be incremented. These tests are then no longer sending both a RM_ADDR, then closing the linked subflow just after. The test with the userspace PM on the server side is now removing one subflow linked to one address, then sending a RM_ADDR for another address. The test with the userspace PM on the client side is now only removing the subflow that was previously created. Fixes: 4369c198e599 ("selftests: mptcp: test userspace pm out of transfer") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-2-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27mptcp: close subflow when receiving TCP+FINMatthieu Baerts (NGI0)
When a peer decides to close one subflow in the middle of a connection having multiple subflows, the receiver of the first FIN should accept that, and close the subflow on its side as well. If not, the subflow will stay half closed, and would even continue to be used until the end of the MPTCP connection or a reset from the network. The issue has not been seen before, probably because the in-kernel path-manager always sends a RM_ADDR before closing the subflow. Upon the reception of this RM_ADDR, the other peer will initiate the closure on its side as well. On the other hand, if the RM_ADDR is lost, or if the path-manager of the other peer only closes the subflow without sending a RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it, leaving the subflow half-closed. So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if the MPTCP connection has not been closed before with a DATA_FIN, the kernel owning the subflow schedules its worker to initiate the closure on its side as well. This issue can be easily reproduced with packetdrill, as visible in [1], by creating an additional subflow, injecting a FIN+ACK before sending the DATA_FIN, and expecting a FIN+ACK in return. Fixes: 40947e13997a ("mptcp: schedule worker when subflow is closed") Cc: stable@vger.kernel.org Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1] Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-1-905199fe1172@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27tcp: fix forever orphan socket caused by tcp_abortXueming Feng
We have some problem closing zero-window fin-wait-1 tcp sockets in our environment. This patch come from the investigation. Previously tcp_abort only sends out reset and calls tcp_done when the socket is not SOCK_DEAD, aka orphan. For orphan socket, it will only purging the write queue, but not close the socket and left it to the timer. While purging the write queue, tp->packets_out and sk->sk_write_queue is cleared along the way. However tcp_retransmit_timer have early return based on !tp->packets_out and tcp_probe_timer have early return based on !sk->sk_write_queue. This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched and socket not being killed by the timers, converting a zero-windowed orphan into a forever orphan. This patch removes the SOCK_DEAD check in tcp_abort, making it send reset to peer and close the socket accordingly. Preventing the timer-less orphan from happening. According to Lorenzo's email in the v1 thread, the check was there to prevent force-closing the same socket twice. That situation is handled by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is already closed. The -ENOENT code comes from the associate patch Lorenzo made for iproute2-ss; link attached below, which also conform to RFC 9293. At the end of the patch, tcp_write_queue_purge(sk) is removed because it was already called in tcp_done_with_error(). p.s. This is the same patch with v2. Resent due to mis-labeled "changes requested" on patchwork.kernel.org. Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/ Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.") Signed-off-by: Xueming Feng <kuro@kuroa.me> Tested-by: Lorenzo Colitti <lorenzo@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27gtp: fix a potential NULL pointer dereferenceCong Wang
When sockfd_lookup() fails, gtp_encap_enable_socket() returns a NULL pointer, but its callers only check for error pointers thus miss the NULL pointer case. Fix it by returning an error pointer with the error code carried from sockfd_lookup(). (I found this bug during code inspection.) Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional") Cc: Andreas Schultz <aschultz@tpip.net> Cc: Harald Welte <laforge@gnumonks.org> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://patch.msgid.link/20240825191638.146748-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27rust: allow `stable_features` lintMiguel Ojeda
Support for several Rust compiler versions started in commit 63b27f4a0074 ("rust: start supporting several compiler versions"). Since we currently need to use a number of unstable features in the kernel, it is a matter of time until one gets stabilized and the `stable_features` lint warns. For instance, the `new_uninit` feature may become stable soon, which would give us multiple warnings like the following: warning: the feature `new_uninit` has been stable since 1.82.0-dev and no longer requires an attribute to enable --> rust/kernel/lib.rs:17:12 | 17 | #![feature(new_uninit)] | ^^^^^^^^^^ | = note: `#[warn(stable_features)]` on by default Thus allow the `stable_features` lint to avoid such warnings. This is the simplest approach -- we do not have that many cases (and the goal is to stop using unstable features anyway) and cleanups can be easily done when we decide to update the minimum version. An alternative would be to conditionally enable them based on the compiler version (with the upcoming `RUSTC_VERSION` or maybe with the unstable `cfg(version(...))`, but that one apparently will not work for the nightly case). However, doing so is more complex and may not work well for different nightlies of the same version, unless we do not care about older nightlies. Another alternative is using explicit tests of the feature calling `rustc`, but that is also more complex and slower. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240827100403.376389-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-27docs: rust: remove unintended blockquote in Quick StartJon Mulder
Remove indentation within the "Hacking" section of the Rust Quick Start guide, i.e. remove a `<blockquote>` HTML element from the rendered documentation. Reported-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://github.com/Rust-for-Linux/linux/issues/1103 Fixes: d07479b211b7 ("docs: add Rust documentation") Signed-off-by: Jon Mulder <jon.e.mulder@gmail.com> Link: https://lore.kernel.org/r/20240826-pr-docs-rust-remove-quickstart-blockquote-v1-1-c51317d8d71a@gmail.com [ Added Fixes tag, reworded slightly and matched title to a previous, similar commit. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-08-27Merge branch 'fixes-for-ipsec-over-bonding'Jakub Kicinski
Jianbo Liu says: ==================== Fixes for IPsec over bonding This patchset provides bug fixes for IPsec over bonding driver. It adds the missing xdo_dev_state_free API, and fixes "scheduling while atomic" by using mutex lock instead. Series generated against: commit c07ff8592d57 ("netem: fix return value if duplicate enqueue fails") ==================== Link: https://patch.msgid.link/20240823031056.110999-1-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: change ipsec_lock from spin lock to mutexJianbo Liu
In the cited commit, bond->ipsec_lock is added to protect ipsec_list, hence xdo_dev_state_add and xdo_dev_state_delete are called inside this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep, "scheduling while atomic" will be triggered when changing bond's active slave. [ 101.055189] BUG: scheduling while atomic: bash/902/0x00000200 [ 101.055726] Modules linked in: [ 101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1 [ 101.058760] Hardware name: [ 101.059434] Call Trace: [ 101.059436] <TASK> [ 101.060873] dump_stack_lvl+0x51/0x60 [ 101.061275] __schedule_bug+0x4e/0x60 [ 101.061682] __schedule+0x612/0x7c0 [ 101.062078] ? __mod_timer+0x25c/0x370 [ 101.062486] schedule+0x25/0xd0 [ 101.062845] schedule_timeout+0x77/0xf0 [ 101.063265] ? asm_common_interrupt+0x22/0x40 [ 101.063724] ? __bpf_trace_itimer_state+0x10/0x10 [ 101.064215] __wait_for_common+0x87/0x190 [ 101.064648] ? usleep_range_state+0x90/0x90 [ 101.065091] cmd_exec+0x437/0xb20 [mlx5_core] [ 101.065569] mlx5_cmd_do+0x1e/0x40 [mlx5_core] [ 101.066051] mlx5_cmd_exec+0x18/0x30 [mlx5_core] [ 101.066552] mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core] [ 101.067163] ? bonding_sysfs_store_option+0x4d/0x80 [bonding] [ 101.067738] ? kmalloc_trace+0x4d/0x350 [ 101.068156] mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core] [ 101.068747] mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core] [ 101.069312] bond_change_active_slave+0x392/0x900 [bonding] [ 101.069868] bond_option_active_slave_set+0x1c2/0x240 [bonding] [ 101.070454] __bond_opt_set+0xa6/0x430 [bonding] [ 101.070935] __bond_opt_set_notify+0x2f/0x90 [bonding] [ 101.071453] bond_opt_tryset_rtnl+0x72/0xb0 [bonding] [ 101.071965] bonding_sysfs_store_option+0x4d/0x80 [bonding] [ 101.072567] kernfs_fop_write_iter+0x10c/0x1a0 [ 101.073033] vfs_write+0x2d8/0x400 [ 101.073416] ? alloc_fd+0x48/0x180 [ 101.073798] ksys_write+0x5f/0xe0 [ 101.074175] do_syscall_64+0x52/0x110 [ 101.074576] entry_SYSCALL_64_after_hwframe+0x4b/0x53 As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called from bond_change_active_slave, which requires holding the RTNL lock. And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user context. So ipsec_lock doesn't have to be spin lock, change it to mutex, and thus the above issue can be resolved. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: extract the use of real_device into local variableJianbo Liu
Add a local variable for slave->dev, to prepare for the lock change in the next patch. There is no functionality change. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-3-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27bonding: implement xdo_dev_state_free and call it after deletionJianbo Liu
Add this implementation for bonding, so hardware resources can be freed from the active slave after xfrm state is deleted. The netdev used to invoke xdo_dev_state_free callback, is saved in the xfrm state (xs->xso.real_dev), which is also the bond's active slave. To prevent it from being freed, acquire netdev reference before leaving RCU read-side critical section, and release it after callback is done. And call it when deleting all SAs from old active real interface while switching current active slave. Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Link: https://patch.msgid.link/20240823031056.110999-2-jianbol@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-27selftests: forwarding: local_termination: Down ports on cleanupPetr Machata
This test neglects to put ports down on cleanup. Fix it. Fixes: 90b9566aa5cd ("selftests: forwarding: add a test for local_termination.sh") Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/bf9b79f45de378f88344d44550f0a5052b386199.1724692132.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>