summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-21btrfs: fix data overwriting bug during buffered write when block size < page ↵Qu Wenruo
size [BUG] When running generic/418 with a btrfs whose block size < page size (subpage cases), it always fails. And the following minimal reproducer is more than enough to trigger it reliably: workload() { mkfs.btrfs -s 4k -f $dev > /dev/null dmesg -C mount $dev $mnt $fsstree_dir/src/dio-invalidate-cache -r -b 4096 -n 3 -i 1 -f $mnt/diotest ret=$? umount $mnt stop_trace if [ $ret -ne 0 ]; then fail fi } for (( i = 0; i < 1024; i++)); do echo "=== $i/$runtime ===" workload done [CAUSE] With extra trace printk added to the following functions: - btrfs_buffered_write() * Which folio is touched * The file offset (start) where the buffered write is at * How many bytes are copied * The content of the write (the first 2 bytes) - submit_one_sector() * Which folio is touched * The position inside the folio * The content of the page cache (the first 2 bytes) - pagecache_isize_extended() * The parameters of the function itself * The parameters of the folio_zero_range() Which are enough to show the problem: 22.158114: btrfs_buffered_write: folio pos=0 start=0 copied=4096 content=0x0101 22.158161: submit_one_sector: r/i=5/257 folio=0 pos=0 content=0x0101 22.158609: btrfs_buffered_write: folio pos=0 start=4096 copied=4096 content=0x0101 22.158634: btrfs_buffered_write: folio pos=0 start=8192 copied=4096 content=0x0101 22.158650: pagecache_isize_extended: folio=0 from=4096 to=8192 bsize=4096 zero off=4096 len=8192 22.158682: submit_one_sector: r/i=5/257 folio=0 pos=4096 content=0x0000 22.158686: submit_one_sector: r/i=5/257 folio=0 pos=8192 content=0x0101 The tool dio-invalidate-cache will start 3 threads, each doing a buffered write with 0x01 at offset 0, 4096 and 8192, do a fsync, then do a direct read, and compare the read buffer with the write buffer. Note that all 3 btrfs_buffered_write() are writing the correct 0x01 into the page cache. But at submit_one_sector(), at file offset 4096, the content is zeroed out, by pagecache_isize_extended(). The race happens like this: Thread A is writing into range [4K, 8K). Thread B is writing into range [8K, 12k). Thread A | Thread B -------------------------------------+------------------------------------ btrfs_buffered_write() | btrfs_buffered_write() |- old_isize = 4K; | |- old_isize = 4096; |- btrfs_inode_lock() | | |- write into folio range [4K, 8K) | | |- pagecache_isize_extended() | | | extend isize from 4096 to 8192 | | | no folio_zero_range() called | | |- btrfs_inode_lock() | | | |- btrfs_inode_lock() | |- write into folio range [8K, 12K) | |- pagecache_isize_extended() | | calling folio_zero_range(4K, 8K) | | This is caused by the old_isize is | | grabbed too early, without any | | inode lock. | |- btrfs_inode_unlock() The @old_isize is grabbed without inode lock, causing race between two buffered write threads and making pagecache_isize_extended() to zero range which is still containing cached data. And this is only affecting subpage btrfs, because for regular blocksize == page size case, the function pagecache_isize_extended() will do nothing if the block size >= page size. [FIX] Grab the old i_size while holding the inode lock. This means each buffered write thread will have a stable view of the old inode size, thus avoid the above race. CC: stable@vger.kernel.org # 5.15+ Fixes: 5e8b9ef30392 ("btrfs: move pos increment and pagecache extension to btrfs_buffered_write") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-02-21btrfs: output an error message if btrfs failed to find the seed fsidQu Wenruo
[BUG] If btrfs failed to locate the seed device for whatever reason, mounting the sprouted device will fail without any meaning error message: # mkfs.btrfs -f /dev/test/scratch1 # btrfstune -S1 /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # btrfs dev add -f /dev/test/scratch2 /mnt/btrfs # umount /mnt/btrfs # btrfs dev scan -u # btrfs mount /dev/test/scratch2 /mnt/btrfs mount: /mnt/btrfs: fsconfig system call failed: No such file or directory. dmesg(1) may have more information after failed mount system call. # dmesg -t | tail -n6 BTRFS info (device dm-5): first mount of filesystem 64252ded-5953-4868-b962-cea48f7ac4ea BTRFS info (device dm-5): using crc32c (crc32c-generic) checksum algorithm BTRFS info (device dm-5): using free-space-tree BTRFS error (device dm-5): failed to read chunk tree: -2 BTRFS error (device dm-5): open_ctree failed: -2 [CAUSE] The failure to mount is pretty straight forward, just unable to find the seed device and its fsid, caused by `btrfs dev scan -u`. But the lack of any useful info is a problem. [FIX] Just add an extra error message in open_seed_devices() to indicate the error. Now the error message would look like this: BTRFS info (device dm-4): first mount of filesystem 7769223d-4db1-4e4c-ac29-0a96f53576ab BTRFS info (device dm-4): using crc32c (crc32c-generic) checksum algorithm BTRFS info (device dm-4): using free-space-tree BTRFS error (device dm-4): failed to find fsid e87c12e6-584b-4e98-8b88-962c33a619ff when attempting to open seed devices BTRFS error (device dm-4): failed to read chunk tree: -2 BTRFS error (device dm-4): open_ctree failed: -2 Link: https://github.com/kdave/btrfs-progs/issues/959 Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-02-21btrfs: do regular iput instead of delayed iput during extent map shrinkingFilipe Manana
The extent map shrinker now runs in the system unbound workqueue and no longer in kswapd context so it can directly do an iput() on inodes even if that blocks or needs to acquire any lock (we aren't holding any locks when requesting the delayed iput from the shrinker). So we don't need to add a delayed iput, wake up the cleaner and delegate the iput() to the cleaner, which also adds extra contention on the spinlock that protects the delayed iputs list. Reported-by: Ivan Shapovalov <intelfx@intelfx.name> Tested-by: Ivan Shapovalov <intelfx@intelfx.name> Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12f.camel@intelfx.name/ CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-02-21btrfs: skip inodes without loaded extent maps when shrinking extent mapsFilipe Manana
If there are inodes that don't have any loaded extent maps, we end up grabbing a reference on them and later adding a delayed iput, which wakes up the cleaner and makes it do unnecessary work. This is common when for example the inodes were open only to run stat(2) or all their extent maps were already released through the folio release callback (btrfs_release_folio()) or released by a previous run of the shrinker, or directories which never have extent maps. Reported-by: Ivan Shapovalov <intelfx@intelfx.name> Tested-by: Ivan Shapovalov <intelfx@intelfx.name> Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12f.camel@intelfx.name/ CC: stable@vger.kernel.org # 6.13+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-02-21btrfs: fix use-after-free on inode when scanning root during em shrinkingFilipe Manana
At btrfs_scan_root() we are accessing the inode's root (and fs_info) in a call to btrfs_fs_closing() after we have scheduled the inode for a delayed iput, and that can result in a use-after-free on the inode in case the cleaner kthread does the iput before we dereference the inode in the call to btrfs_fs_closing(). Fix this by using the fs_info stored already in a local variable instead of doing inode->root->fs_info. Fixes: 102044384056 ("btrfs: make the extent map shrinker run asynchronously as a work queue job") CC: stable@vger.kernel.org # 6.13+ Tested-by: Ivan Shapovalov <intelfx@intelfx.name> Link: https://lore.kernel.org/linux-btrfs/0414d690ac5680d0d77dfc930606cdc36e42e12f.camel@intelfx.name/ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-02-20scsi: ufs: core: bsg: Fix crash when arpmb command failsArthur Simchaev
If the device doesn't support arpmb we'll crash due to copying user data in bsg_transport_sg_io_fn(). In the case where ufs_bsg_exec_advanced_rpmb_req() returns an error, do not set the job's reply_len. Memory crash backtrace: 3,1290,531166405,-;ufshcd 0000:00:12.5: ARPMB OP failed: error code -22 4,1308,531166555,-;Call Trace: 4,1309,531166559,-; <TASK> 4,1310,531166565,-; ? show_regs+0x6d/0x80 4,1311,531166575,-; ? die+0x37/0xa0 4,1312,531166583,-; ? do_trap+0xd4/0xf0 4,1313,531166593,-; ? do_error_trap+0x71/0xb0 4,1314,531166601,-; ? usercopy_abort+0x6c/0x80 4,1315,531166610,-; ? exc_invalid_op+0x52/0x80 4,1316,531166622,-; ? usercopy_abort+0x6c/0x80 4,1317,531166630,-; ? asm_exc_invalid_op+0x1b/0x20 4,1318,531166643,-; ? usercopy_abort+0x6c/0x80 4,1319,531166652,-; __check_heap_object+0xe3/0x120 4,1320,531166661,-; check_heap_object+0x185/0x1d0 4,1321,531166670,-; __check_object_size.part.0+0x72/0x150 4,1322,531166679,-; __check_object_size+0x23/0x30 4,1323,531166688,-; bsg_transport_sg_io_fn+0x314/0x3b0 Fixes: 6ff265fc5ef6 ("scsi: ufs: core: bsg: Add advanced RPMB support in ufs_bsg") Cc: stable@vger.kernel.org Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Arthur Simchaev <arthur.simchaev@sandisk.com> Link: https://lore.kernel.org/r/20250220142039.250992-1-arthur.simchaev@sandisk.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20scsi: ufs: core: Set default runtime/system PM levels before ufshcd_hba_init()Manivannan Sadhasivam
Commit bb9850704c04 ("scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers") introduced the check for setting default PM levels only if the levels are uninitialized by the host controller drivers. But it missed the fact that the levels could be initialized to 0 (UFS_PM_LVL_0) on purpose by the controller drivers. Even though none of the drivers are doing so now, the logic should be fixed irrespectively. So set the default levels unconditionally before calling ufshcd_hba_init() API which initializes the controller drivers. It ensures that the controller drivers could override the default levels if required. Fixes: bb9850704c04 ("scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers") Reported-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250219105047.49932-1-manivannan.sadhasivam@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20scsi: core: Clear driver private data when retrying requestYe Bin
After commit 1bad6c4a57ef ("scsi: zero per-cmd private driver data for each MQ I/O"), the xen-scsifront/virtio_scsi/snic drivers all removed code that explicitly zeroed driver-private command data. In combination with commit 464a00c9e0ad ("scsi: core: Kill DRIVER_SENSE"), after virtio_scsi performs a capacity expansion, the first request will return a unit attention to indicate that the capacity has changed. And then the original command is retried. As driver-private command data was not cleared, the request would return UA again and eventually time out and fail. Zero driver-private command data when a request is retried. Fixes: f7de50da1479 ("scsi: xen-scsifront: Remove code that zeroes driver-private command data") Fixes: c2bb87318baa ("scsi: virtio_scsi: Remove code that zeroes driver-private command data") Fixes: c3006a926468 ("scsi: snic: Remove code that zeroes driver-private command data") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250217021628.2929248-1-yebin@huaweicloud.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-02-20Merge tag 'for-v6.14-rc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: - core: Fix extension related lockdep warning for LED triggers - axp20x-battery: Fix fault handling for AXP717 - da9150-fg: fix potential overflow * tag 'for-v6.14-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: supply: axp20x_battery: Fix fault handling for AXP717 power: supply: core: Fix extension related lockdep warning power: supply: da9150-fg: fix potential overflow
2025-02-20Merge tag 'ata-6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fix from Niklas Cassel: - Fix an unintentional masking of AHCI ports when the device tree does not define port child nodes (Damien) * tag 'ata-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: libahci_platform: Do not set mask_port_map when not needed
2025-02-20net: stmmac: dwmac-loongson: Add fix_soc_reset() callbackQunqin Zhao
Loongson's DWMAC device may take nearly two seconds to complete DMA reset, however, the default waiting time for reset is 200 milliseconds. Therefore, the following error message may appear: [14.427169] dwmac-loongson-pci 0000:00:03.2: Failed to reset the dma Fixes: 803fc61df261 ("net: stmmac: dwmac-loongson: Add Loongson Multi-channels GMAC support") Cc: stable@vger.kernel.org Signed-off-by: Qunqin Zhao <zhaoqunqin@loongson.cn> Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Yanteng Si <si.yanteng@linux.dev> Link: https://patch.msgid.link/20250219020701.15139-1-zhaoqunqin@loongson.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21Merge tag 'drm-msm-fixes-2025-02-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v6.14-rc4 Display: * More catalog fixes: - to skip watchdog programming through top block if its not present - fix the setting of WB mask to ensure the WB input control is programmed correctly through ping-pong - drop lm_pair for sm6150 as that chipset does not have any 3dmerge block * Fix the mode validation logic for DP/eDP to account for widebus (2ppc) to allow high clock resolutions * Fix to disable dither during encoder disable as otherwise this was causing kms_writeback failure due to resource sharing between * WB and DSI paths as DSI uses dither but WB does not * Fixes for virtual planes, namely to drop extraneous return and fix uninitialized variables * Fix to avoid spill-over of DSC encoder block bits when programming the bits-per-component * Fixes in the DSI PHY to protect against concurrent access of PHY_CMN_CLK_CFG regs between clock and display drivers Core/GPU: * Fix non-blocking fence wait incorrectly rounding up to 1 jiffy timeout * Only print GMU fw version once, instead of each time the GPU resumes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtt2AODBXdod8ULXcAygf_qYvwRDVeUVtODx=2jErp6cA@mail.gmail.com
2025-02-21Merge tag 'drm-intel-fixes-2025-02-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Use spin_lock_irqsave() in interruptible context on guc submission (Krzysztof) - Fixes on DDI and TRANS programming (Imre) - Make sure all planes in use by the joiner have their crtc included (Ville) - Fix 128b/132b modeset issues (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z7dgcUG_hvityvHn@intel.com
2025-02-20Merge tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme into block-6.14Jens Axboe
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.14 - FC controller state check fixes (Daniel) - PCI Endpoint fixes (Damien) - TCP connection failure fixe (Caleb) - TCP handling C2HTermReq PDU (Maurizio) - RDMA queue state check (Ruozhu) - Apple controller fixes (Hector) - Target crash on disbaled namespace (Hannes)" * tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme: nvme: only allow entering LIVE from CONNECTING state nvme-fc: rely on state transitions to handle connectivity loss apple-nvme: Support coprocessors left idle apple-nvme: Release power domains when probe fails nvmet: Use enum definitions instead of hardcoded values nvme: Cleanup the definition of the controller config register fields nvme/ioctl: add missing space in err message nvme-tcp: fix connect failure on receiving partial ICResp PDU nvme: tcp: Fix compilation warning with W=1 nvmet: pci-epf: Avoid RCU stalls under heavy workload nvmet: pci-epf: Do not uselessly write the CSTS register nvmet: pci-epf: Correctly initialize CSTS when enabling the controller nvmet-rdma: recheck queue state is LIVE in state lock in recv done nvmet: Fix crash when a namespace is disabled nvme-tcp: add basic support for the C2HTermReq PDU nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
2025-02-21Merge tag 'drm-xe-fixes-2025-02-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Fix error handling in xe_irq_install (Lucas) - Fix devcoredump format (Jose, Lucas) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Z7dePS3a9POnjrVL@intel.com
2025-02-20Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire) - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi) - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima) - Fix a deadlock when freeing BPF cgroup storage (Abel Wu) - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko) - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida) - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai) - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen) - Two fixes for BPF map lookup nullness elision (Daniel Xu) - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests: bpf: test batch lookup on array of maps with holes bpf: skip non exist keys in generic_map_lookup_batch bpf: Handle allocation failure in acquire_lock_state bpf: verifier: Disambiguate get_constant_map_key() errors bpf: selftests: Test constant key extraction on irrelevant maps bpf: verifier: Do not extract constant map keys for irrelevant maps bpf: Fix softlockup in arena_map_free on 64k page kernel net: Add rx_skb of kfree_skb to raw_tp_null_args[]. bpf: Fix deadlock when freeing cgroup storage selftests/bpf: Add strparser test for bpf selftests/bpf: Fix invalid flag of recv() bpf: Disable non stream socket for strparser bpf: Fix wrong copied_seq calculation strparser: Add read_sock callback bpf: avoid holding freeze_mutex during mmap operation bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic selftests/bpf: Adjust data size to have ETH_HLEN bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type() bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20MAINTAINERS: Change maintainer for RDTFenghua Yu
Due to job transition, I am stepping down as RDT maintainer. Add Tony as a co-maintainer. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20250131190731.3981085-1-fenghua.yu%40intel.com
2025-02-21Merge tag 'drm-misc-fixes-2025-02-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes An reset signal polarity fix for the jd9365da-h3 panel, a folio handling fix and config fix in nouveau, a dmem cgroup descendant pool handling fix, and a missing header for amdxdna. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220-glorious-cockle-of-might-5b35f7@houat
2025-02-20bcachefs: Fix memmove when move keys downAlan Huang
The fix alone doesn't fix [1], but should be applied before debugging that. [1] https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-20bcachefs: print op->nonce on data update inconsistencyKent Overstreet
"nonce inconstancy" is popping up again, causing us to go emergency read-only. This one looks less serious, i.e. specific to the encryption path and not indicative of a data corruption bug. But we'll need more info to track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-20soc: loongson: loongson2_guts: Add check for devm_kstrdup()Haoxiang Li
Add check for the return value of devm_kstrdup() in loongson2_guts_probe() to catch potential exception. Fixes: b82621ac8450 ("soc: loongson: add GUTS driver for loongson-2 platforms") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Link: https://lore.kernel.org/r/20250220081714.2676828-1-haoxiang_li2024@163.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-02-20Merge tag 'scmi-fix-6.14' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Arm SCMI fix for v6.14 Just a single fix to address the incorrect size of the Tx buffer in the function scmi_imx_misc_ctrl_set() which is part of NXP/i.MX SCMI vendor extensions. * tag 'scmi-fix-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_scmi: imx: Correct tx size of scmi_imx_misc_ctrl_set Link: https://lore.kernel.org/r/20250217155246.1668182-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-02-20net: phy: qcom: qca807x fix condition for DAC_DSP_BIAS_CURRENTGeorge Moussalem
While setting the DAC value, the wrong boolean value is evaluated to set the DSP bias current. So let's correct the conditional statement and use the right boolean value read from the DTS set in the priv. Cc: stable@vger.kernel.org Fixes: d1cb613efbd3 ("net: phy: qcom: add support for QCA807x PHY Family") Signed-off-by: George Moussalem <george.moussalem@outlook.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250219130923.7216-1-ansuelsmth@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-20MAINTAINERS: Add entry for DMEM cgroup controllerMaarten Lankhorst
The cgroups controller is currently maintained through the drm-misc tree, so lets add Maxime Ripard, Natalie Vock and me as specific maintainers for dmem. We keep the cgroup mailing list CC'd on all cgroup specific patches. Acked-by: Maxime Ripard <mripard@kernel.org> Acked-by: Natalie Vock <natalie.vock@gmx.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Koutný <mkoutny@suse.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220140757.16823-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-02-20Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP responseLuiz Augusto von Dentz
L2CAP_ECRED_CONN_RSP needs to respond DCID in the same order received as SCID but the order is reversed due to use of list_add which actually prepend channels to the list so the response is reversed: > ACL Data RX: Handle 16 flags 0x02 dlen 26 LE L2CAP: Enhanced Credit Connection Request (0x17) ident 2 len 18 PSM: 39 (0x0027) MTU: 256 MPS: 251 Credits: 65535 Source CID: 116 Source CID: 117 Source CID: 118 Source CID: 119 Source CID: 120 < ACL Data TX: Handle 16 flags 0x00 dlen 26 LE L2CAP: Enhanced Credit Connection Response (0x18) ident 2 len 18 MTU: 517 MPS: 247 Credits: 3 Result: Connection successful (0x0000) Destination CID: 68 Destination CID: 67 Destination CID: 66 Destination CID: 65 Destination CID: 64 Also make sure the response don't include channels that are not on BT_CONNECT2 since the chan->ident can be set to the same value as in the following trace: < ACL Data TX: Handle 16 flags 0x00 dlen 12 LE L2CAP: LE Flow Control Credit (0x16) ident 6 len 4 Source CID: 64 Credits: 1 ... > ACL Data RX: Handle 16 flags 0x02 dlen 18 LE L2CAP: Enhanced Credit Connection Request (0x17) ident 6 len 10 PSM: 39 (0x0027) MTU: 517 MPS: 251 Credits: 255 Source CID: 70 < ACL Data TX: Handle 16 flags 0x00 dlen 20 LE L2CAP: Enhanced Credit Connection Response (0x18) ident 6 len 12 MTU: 517 MPS: 247 Credits: 3 Result: Connection successful (0x0000) Destination CID: 64 Destination CID: 68 Closes: https://github.com/bluez/bluez/issues/1094 Fixes: 9aa9d9473f15 ("Bluetooth: L2CAP: Fix responding with wrong PDU type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-20Bluetooth: Always allow SCO packets for user channelHsin-chen Chuang
The SCO packets from Bluetooth raw socket are now rejected because hci_conn_num is left 0. This patch allows such the usecase to enable the userspace SCO support. Fixes: b16b327edb4d ("Bluetooth: btusb: add sysfs attribute to control USB alt setting") Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-02-20Merge tag 'net-6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Smaller than usual with no fixes from any subtree. Current release - regressions: - core: fix race of rtnl_net_lock(dev_net(dev)) Previous releases - regressions: - core: remove the single page frag cache for good - flow_dissector: fix handling of mixed port and port-range keys - sched: cls_api: fix error handling causing NULL dereference - tcp: - adjust rcvq_space after updating scaling ratio - drop secpath at the same time as we currently drop dst - eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl(). Previous releases - always broken: - vsock: - fix variables initialization during resuming - for connectible sockets allow only connected - eth: - geneve: fix use-after-free in geneve_find_dev() - ibmvnic: don't reference skb after sending to VIOS" * tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) Revert "net: skb: introduce and use a single page frag cache" net: allow small head cache usage with large MAX_SKB_FRAGS values nfp: bpf: Add check for nfp_app_ctrl_msg_alloc() tcp: drop secpath at the same time as we currently drop dst net: axienet: Set mac_managed_pm arp: switch to dev_getbyhwaddr() in arp_req_set_public() net: Add non-RCU dev_getbyhwaddr() helper sctp: Fix undefined behavior in left shift operation selftests/bpf: Add a specific dst port matching flow_dissector: Fix port range key handling in BPF conversion selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range flow_dissector: Fix handling of mixed port and port-range keys geneve: Suppress list corruption splat in geneve_destroy_tunnels(). gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl(). dev: Use rtnl_net_dev_lock() in unregister_netdev(). net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net(). net: Add net_passive_inc() and net_passive_dec(). net: pse-pd: pd692x0: Fix power limit retrieval MAINTAINERS: trim the GVE entry gve: set xdp redirect target only when it is available ...
2025-02-20smb: client: Add check for next_buffer in receive_encrypted_standard()Haoxiang Li
Add check for the return value of cifs_buf_get() and cifs_small_buf_get() in receive_encrypted_standard() to prevent null pointer dereference. Fixes: eec04ea11969 ("smb: client: fix OOB in receive_encrypted_standard()") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-02-20nvme: only allow entering LIVE from CONNECTING stateDaniel Wagner
The fabric transports and also the PCI transport are not entering the LIVE state from NEW or RESETTING. This makes the state machine more restrictive and allows to catch not supported state transitions, e.g. directly switching from RESETTING to LIVE. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-20nvme-fc: rely on state transitions to handle connectivity lossDaniel Wagner
It's not possible to call nvme_state_ctrl_state with holding a spin lock, because nvme_state_ctrl_state calls cancel_delayed_work_sync when fastfail is enabled. Instead syncing the ASSOC_FLAG and state transitions using a lock, it's possible to only rely on the state machine transitions. That means nvme_fc_ctrl_connectivity_loss should unconditionally call nvme_reset_ctrl which avoids the read race on the ctrl state variable. Actually, it's not necessary to test in which state the ctrl is, the reset work will only scheduled when the state machine is in LIVE state. In nvme_fc_create_association, the LIVE state can only be entered if it was previously CONNECTING. If this is not possible then the reset handler got triggered. Thus just error out here. Fixes: ee59e3820ca9 ("nvme-fc: do not ignore connectivity loss during connecting") Closes: https://lore.kernel.org/all/denqwui6sl5erqmz2gvrwueyxakl5txzbbiu3fgebryzrfxunm@iwxuthct377m/ Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-20ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2Dmitry Panchenko
Re-add the sample-rate quirk for the Pioneer DJM-900NXS2. This device does not work without setting sample-rate. Signed-off-by: Dmitry Panchenko <dmitry@d-systems.ee> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250220161540.3624660-1-dmitry@d-systems.ee Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-20Merge tag 'v6.14-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix for chmod regression - Two reparse point related fixes - One minor cleanup (for GCC 14 compiles) - Fix for SMB3.1.1 POSIX Extensions reporting incorrect file type * tag 'v6.14-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point() smb311: failure to open files of length 1040 when mounting with SMB3.1.1 POSIX extensions smb: client, common: Avoid multiple -Wflex-array-member-not-at-end warnings smb: client: fix chmod(2) regression with ATTR_READONLY
2025-02-20Merge tag 'bcachefs-2025-02-20' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Small stuff: - The fsck code for Hongbo's directory i_size patch was wrong, caught by transaction restart injection: we now have the CI running another test variant with restart injection enabled - Another fixup for reflink pointers to missing indirect extents: previous fix was for fsck code, this fixes the normal runtime paths - Another small srcu lock hold time fix, reported by jpsollie" * tag 'bcachefs-2025-02-20' of git://evilpiepirate.org/bcachefs: bcachefs: Fix srcu lock warning in btree_update_nodes_written() bcachefs: Fix bch2_indirect_extent_missing_error() bcachefs: Fix fsck directory i_size checking
2025-02-20Merge tag 'xfs-fixes-6.14-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "Just a collection of bug fixes, nothing really stands out" * tag 'xfs-fixes-6.14-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: flush inodegc before swapon xfs: rename xfs_iomap_swapfile_activate to xfs_vm_swap_activate xfs: Do not allow norecovery mount with quotacheck xfs: do not check NEEDSREPAIR if ro,norecovery mount. xfs: fix data fork format filtering during inode repair xfs: fix online repair probing when CONFIG_XFS_ONLINE_REPAIR=n
2025-02-20KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2Oliver Upton
Vladimir reports that a race condition to attach a VMID to a stage-2 MMU sometimes results in a vCPU entering the guest with a VMID of 0: | CPU1 | CPU2 | | | | kvm_arch_vcpu_ioctl_run | | vcpu_load <= load VTTBR_EL2 | | kvm_vmid->id = 0 | | | kvm_arch_vcpu_ioctl_run | | vcpu_load <= load VTTBR_EL2 | | with kvm_vmid->id = 0| | kvm_arm_vmid_update <= allocates fresh | | kvm_vmid->id and | | reload VTTBR_EL2 | | | | | kvm_arm_vmid_update <= observes that kvm_vmid->id | | already allocated, | | skips reload VTTBR_EL2 Oh yeah, it's as bad as it looks. Remember that VHE loads the stage-2 MMU eagerly but a VMID only gets attached to the MMU later on in the KVM_RUN loop. Even in the "best case" where VTTBR_EL2 correctly gets reprogrammed before entering the EL1&0 regime, there is a period of time where hardware is configured with VMID 0. That's completely insane. So, rather than decorating the 'late' binding with another hack, just allocate the damn thing up front. Attaching a VMID from vcpu_load() is still rollover safe since (surprise!) it'll always get called after a vCPU was preempted. Excuse me while I go find a brown paper bag. Cc: stable@vger.kernel.org Fixes: 934bf871f011 ("KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()") Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250219220737.130842-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-20perf/x86/intel: Fix event constraints for LNCKan Liang
According to the latest event list, update the event constraint tables for Lion Cove core. The general rule (the event codes < 0x90 are restricted to counters 0-3.) has been removed. There is no restriction for most of the performance monitoring events. Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support") Reported-by: Amiri Khalil <amiri.khalil@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250219141005.2446823-1-kan.liang@linux.intel.com
2025-02-20Merge tag 'md-6.14-20250218' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14 Pull MD fix from Yu: "This patch, by Bart Van Assche, fixes queue limits error handling for raid0, raid1 and raid10." * tag 'md-6.14-20250218' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md/raid*: Fix the set_queue_limits implementations
2025-02-20fuse: don't truncate cached, mutated symlinkMiklos Szeredi
Fuse allows the value of a symlink to change and this property is exploited by some filesystems (e.g. CVMFS). It has been observed, that sometimes after changing the symlink contents, the value is truncated to the old size. This is caused by fuse_getattr() racing with fuse_reverse_inval_inode(). fuse_reverse_inval_inode() updates the fuse_inode's attr_version, which results in fuse_change_attributes() exiting before updating the cached attributes This is okay, as the cached attributes remain invalid and the next call to fuse_change_attributes() will likely update the inode with the correct values. The reason this causes problems is that cached symlinks will be returned through page_get_link(), which truncates the symlink to inode->i_size. This is correct for filesystems that don't mutate symlinks, but in this case it causes bad behavior. The solution is to just remove this truncation. This can cause a regression in a filesystem that relies on supplying a symlink larger than the file size, but this is unlikely. If that happens we'd need to make this behavior conditional. Reported-by: Laura Promberger <laura.promberger@cern.ch> Tested-by: Sam Lewis <samclewis@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250220100258.793363-1-mszeredi@redhat.com Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-20gpiolib: don't bail out if get_direction() fails in gpiochip_add_data()Bartosz Golaszewski
Since commit 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") we check the return value of the get_direction() callback as per its API contract. Some drivers have been observed to fail to register now as they may call get_direction() in gpiochip_add_data() in contexts where it has always silently failed. Until we audit all drivers, replace the bail-out to a kernel log warning. Fixes: 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/Z7VFB1nST6lbmBIo@finisterre.sirena.org.uk/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/dfe03f88-407e-4ef1-ad30-42db53bbd4e4@samsung.com/ Tested-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250219144356.258635-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-20Merge branch 'net-remove-the-single-page-frag-cache-for-good'Paolo Abeni
Paolo Abeni says: ==================== net: remove the single page frag cache for good This is another attempt at reverting commit dbae2b062824 ("net: skb: introduce and use a single page frag cache"), as it causes regressions in specific use-cases. Reverting such commit uncovers an allocation issue for build with CONFIG_MAX_SKB_FRAGS=45, as reported by Sabrina. This series handle the latter in patch 1 and brings the revert in patch 2. Note that there is a little chicken-egg problem, as I included into the patch 1's changelog the splat that would be visible only applying first the revert: I think current patch order is better for bisectability, still the splat is useful for correct attribution. ==================== Link: https://patch.msgid.link/cover.1739899357.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20Revert "net: skb: introduce and use a single page frag cache"Paolo Abeni
After the previous commit is finally safe to revert commit dbae2b062824 ("net: skb: introduce and use a single page frag cache"): do it here. The intended goal of such change was to counter a performance regression introduced by commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs"). Unfortunately, the blamed commit introduces another regression for the virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny size, so that the whole head frag could fit a 512-byte block. The single page frag cache uses a 1K fragment for such allocation, and the additional overhead, under small UDP packets flood, makes the page allocator a bottleneck. Thanks to commit bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head"), this revert does not re-introduce the original regression. Actually, in the relevant test on top of this revert, I measure a small but noticeable positive delta, just above noise level. The revert itself required some additional mangling due to recent updates in the affected code. Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: dbae2b062824 ("net: skb: introduce and use a single page frag cache") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20net: allow small head cache usage with large MAX_SKB_FRAGS valuesPaolo Abeni
Sabrina reported the following splat: WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0 Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48 RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6 RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168 R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> gro_cells_init+0x1ba/0x270 xfrm_input_init+0x4b/0x2a0 xfrm_init+0x38/0x50 ip_rt_init+0x2d7/0x350 ip_init+0xf/0x20 inet_init+0x406/0x590 do_one_initcall+0x9d/0x2e0 do_initcalls+0x23b/0x280 kernel_init_freeable+0x445/0x490 kernel_init+0x20/0x1d0 ret_from_fork+0x46/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> irq event stamp: 584330 hardirqs last enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0 hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0 softirqs last enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470 softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0 on kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024) is smaller than GRO_MAX_HEAD. Such built additionally contains the revert of the single page frag cache so that napi_get_frags() ends up using the page frag allocator, triggering the splat. Note that the underlying issue is independent from the mentioned revert; address it ensuring that the small head cache will fit either TCP and GRO allocation and updating napi_alloc_skb() and __netdev_alloc_skb() to select kmalloc() usage for any allocation fitting such cache. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()Haoxiang Li
Add check for the return value of nfp_app_ctrl_msg_alloc() in nfp_bpf_cmsg_alloc() to prevent null pointer dereference. Fixes: ff3d43f7568c ("nfp: bpf: implement helpers for FW map ops") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Link: https://patch.msgid.link/20250218030409.2425798-1-haoxiang_li2024@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20tcp: drop secpath at the same time as we currently drop dstSabrina Dubroca
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while running tests that boil down to: - create a pair of netns - run a basic TCP test over ipcomp6 - delete the pair of netns The xfrm_state found on spi_byaddr was not deleted at the time we delete the netns, because we still have a reference on it. This lingering reference comes from a secpath (which holds a ref on the xfrm_state), which is still attached to an skb. This skb is not leaked, it ends up on sk_receive_queue and then gets defer-free'd by skb_attempt_defer_free. The problem happens when we defer freeing an skb (push it on one CPU's defer_list), and don't flush that list before the netns is deleted. In that case, we still have a reference on the xfrm_state that we don't expect at this point. We already drop the skb's dst in the TCP receive path when it's no longer needed, so let's also drop the secpath. At this point, tcp_filter has already called into the LSM hooks that may require the secpath, so it should not be needed anymore. However, in some of those places, the MPTCP extension has just been attached to the skb, so we cannot simply drop all extensions. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613.git.sd@queasysnail.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20net: axienet: Set mac_managed_pmNick Hu
The external PHY will undergo a soft reset twice during the resume process when it wake up from suspend. The first reset occurs when the axienet driver calls phylink_of_phy_connect(), and the second occurs when mdio_bus_phy_resume() invokes phy_init_hw(). The second soft reset of the external PHY does not reinitialize the internal PHY, which causes issues with the internal PHY, resulting in the PHY link being down. To prevent this, setting the mac_managed_pm flag skips the mdio_bus_phy_resume() function. Fixes: a129b41fe0a8 ("Revert "net: phy: dp83867: perform soft reset and retain established link"") Signed-off-by: Nick Hu <nick.hu@sifive.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250217055843.19799-1-nick.hu@sifive.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20RDMA/mlx5: Fix AH static rate parsingPatrisious Haddad
Previously static rate wasn't translated according to our PRM but simply used the 4 lower bytes. Correctly translate static rate value passed in AH creation attribute according to our PRM expected values. In addition change 800GB mapping to zero, which is the PRM specified value. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://patch.msgid.link/18ef4cc5396caf80728341eb74738cd777596f60.1739187089.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-20RDMA/mlx5: Fix implicit ODP hang on parent deregistrationYishai Hadas
Fix the destroy_unused_implicit_child_mr() to prevent hanging during parent deregistration as of below [1]. Upon entering destroy_unused_implicit_child_mr(), the reference count for the implicit MR parent is incremented using: refcount_inc_not_zero(). A corresponding decrement must be performed if free_implicit_child_mr_work() is not called. The code has been updated to properly manage the reference count that was incremented. [1] INFO: task python3:2157 blocked for more than 120 seconds. Not tainted 6.12.0-rc7+ #1633 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:python3 state:D stack:0 pid:2157 tgid:2157 ppid:1685 flags:0x00000000 Call Trace: <TASK> __schedule+0x420/0xd30 schedule+0x47/0x130 __mlx5_ib_dereg_mr+0x379/0x5d0 [mlx5_ib] ? __pfx_autoremove_wake_function+0x10/0x10 ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs] ? lock_release+0xc6/0x280 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 ? kmem_cache_free+0x221/0x400 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f20f21f017b RSP: 002b:00007ffcfc4a77c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffcfc4a78d8 RCX: 00007f20f21f017b RDX: 00007ffcfc4a78c0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffcfc4a78a0 R08: 000056147d125190 R09: 00007f20f1f14c60 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffcfc4a7890 R13: 000000000000001c R14: 000056147d100fc0 R15: 00007f20e365c9d0 </TASK> Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Link: https://patch.msgid.link/80f2fcd19952dfa7d9981d93fd6359b4471f8278.1739186929.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-19Merge branch 'net-core-improvements-to-device-lookup-by-hardware-address'Jakub Kicinski
Breno Leitao says: ==================== net: core: improvements to device lookup by hardware address. The first patch adds a new dev_getbyhwaddr() helper function for finding devices by hardware address when the rtnl lock is held. This prevents PROVE_LOCKING warnings that occurred when rtnl lock was held but the RCU read lock wasn't. The common address comparison logic is extracted into dev_comp_addr() to avoid code duplication. The second coverts arp_req_set_public() to the new helper. ==================== Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-0-d3d6892db9e1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19arp: switch to dev_getbyhwaddr() in arp_req_set_public()Breno Leitao
The arp_req_set_public() function is called with the rtnl lock held, which provides enough synchronization protection. This makes the RCU variant of dev_getbyhwaddr() unnecessary. Switch to using the simpler dev_getbyhwaddr() function since we already have the required rtnl locking. This change helps maintain consistency in the networking code by using the appropriate helper function for the existing locking context. Since we're not holding the RCU read lock in arp_req_set_public() existing code could trigger false positive locking warnings. Fixes: 941666c2e3e0 ("net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()") Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-2-d3d6892db9e1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19net: Add non-RCU dev_getbyhwaddr() helperBreno Leitao
Add dedicated helper for finding devices by hardware address when holding rtnl_lock, similar to existing dev_getbyhwaddr_rcu(). This prevents PROVE_LOCKING warnings when rtnl_lock is held but RCU read lock is not. Extract common address comparison logic into dev_addr_cmp(). The context about this change could be found in the following discussion: Link: https://lore.kernel.org/all/20250206-scarlet-ermine-of-improvement-1fcac5@leitao/ Cc: kuniyu@amazon.com Cc: ushankar@purestorage.com Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-1-d3d6892db9e1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>