summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-02-20nvme-fc: rely on state transitions to handle connectivity lossDaniel Wagner
It's not possible to call nvme_state_ctrl_state with holding a spin lock, because nvme_state_ctrl_state calls cancel_delayed_work_sync when fastfail is enabled. Instead syncing the ASSOC_FLAG and state transitions using a lock, it's possible to only rely on the state machine transitions. That means nvme_fc_ctrl_connectivity_loss should unconditionally call nvme_reset_ctrl which avoids the read race on the ctrl state variable. Actually, it's not necessary to test in which state the ctrl is, the reset work will only scheduled when the state machine is in LIVE state. In nvme_fc_create_association, the LIVE state can only be entered if it was previously CONNECTING. If this is not possible then the reset handler got triggered. Thus just error out here. Fixes: ee59e3820ca9 ("nvme-fc: do not ignore connectivity loss during connecting") Closes: https://lore.kernel.org/all/denqwui6sl5erqmz2gvrwueyxakl5txzbbiu3fgebryzrfxunm@iwxuthct377m/ Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <wagi@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-02-20ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2Dmitry Panchenko
Re-add the sample-rate quirk for the Pioneer DJM-900NXS2. This device does not work without setting sample-rate. Signed-off-by: Dmitry Panchenko <dmitry@d-systems.ee> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20250220161540.3624660-1-dmitry@d-systems.ee Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-02-20Merge tag 'v6.14-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix for chmod regression - Two reparse point related fixes - One minor cleanup (for GCC 14 compiles) - Fix for SMB3.1.1 POSIX Extensions reporting incorrect file type * tag 'v6.14-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point() smb311: failure to open files of length 1040 when mounting with SMB3.1.1 POSIX extensions smb: client, common: Avoid multiple -Wflex-array-member-not-at-end warnings smb: client: fix chmod(2) regression with ATTR_READONLY
2025-02-20Merge tag 'bcachefs-2025-02-20' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Small stuff: - The fsck code for Hongbo's directory i_size patch was wrong, caught by transaction restart injection: we now have the CI running another test variant with restart injection enabled - Another fixup for reflink pointers to missing indirect extents: previous fix was for fsck code, this fixes the normal runtime paths - Another small srcu lock hold time fix, reported by jpsollie" * tag 'bcachefs-2025-02-20' of git://evilpiepirate.org/bcachefs: bcachefs: Fix srcu lock warning in btree_update_nodes_written() bcachefs: Fix bch2_indirect_extent_missing_error() bcachefs: Fix fsck directory i_size checking
2025-02-20Merge tag 'xfs-fixes-6.14-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: "Just a collection of bug fixes, nothing really stands out" * tag 'xfs-fixes-6.14-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: flush inodegc before swapon xfs: rename xfs_iomap_swapfile_activate to xfs_vm_swap_activate xfs: Do not allow norecovery mount with quotacheck xfs: do not check NEEDSREPAIR if ro,norecovery mount. xfs: fix data fork format filtering during inode repair xfs: fix online repair probing when CONFIG_XFS_ONLINE_REPAIR=n
2025-02-20KVM: arm64: Ensure a VMID is allocated before programming VTTBR_EL2Oliver Upton
Vladimir reports that a race condition to attach a VMID to a stage-2 MMU sometimes results in a vCPU entering the guest with a VMID of 0: | CPU1 | CPU2 | | | | kvm_arch_vcpu_ioctl_run | | vcpu_load <= load VTTBR_EL2 | | kvm_vmid->id = 0 | | | kvm_arch_vcpu_ioctl_run | | vcpu_load <= load VTTBR_EL2 | | with kvm_vmid->id = 0| | kvm_arm_vmid_update <= allocates fresh | | kvm_vmid->id and | | reload VTTBR_EL2 | | | | | kvm_arm_vmid_update <= observes that kvm_vmid->id | | already allocated, | | skips reload VTTBR_EL2 Oh yeah, it's as bad as it looks. Remember that VHE loads the stage-2 MMU eagerly but a VMID only gets attached to the MMU later on in the KVM_RUN loop. Even in the "best case" where VTTBR_EL2 correctly gets reprogrammed before entering the EL1&0 regime, there is a period of time where hardware is configured with VMID 0. That's completely insane. So, rather than decorating the 'late' binding with another hack, just allocate the damn thing up front. Attaching a VMID from vcpu_load() is still rollover safe since (surprise!) it'll always get called after a vCPU was preempted. Excuse me while I go find a brown paper bag. Cc: stable@vger.kernel.org Fixes: 934bf871f011 ("KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()") Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250219220737.130842-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-20perf/x86/intel: Fix event constraints for LNCKan Liang
According to the latest event list, update the event constraint tables for Lion Cove core. The general rule (the event codes < 0x90 are restricted to counters 0-3.) has been removed. There is no restriction for most of the performance monitoring events. Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support") Reported-by: Amiri Khalil <amiri.khalil@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250219141005.2446823-1-kan.liang@linux.intel.com
2025-02-20Merge tag 'md-6.14-20250218' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into block-6.14 Pull MD fix from Yu: "This patch, by Bart Van Assche, fixes queue limits error handling for raid0, raid1 and raid10." * tag 'md-6.14-20250218' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md/raid*: Fix the set_queue_limits implementations
2025-02-20fuse: don't truncate cached, mutated symlinkMiklos Szeredi
Fuse allows the value of a symlink to change and this property is exploited by some filesystems (e.g. CVMFS). It has been observed, that sometimes after changing the symlink contents, the value is truncated to the old size. This is caused by fuse_getattr() racing with fuse_reverse_inval_inode(). fuse_reverse_inval_inode() updates the fuse_inode's attr_version, which results in fuse_change_attributes() exiting before updating the cached attributes This is okay, as the cached attributes remain invalid and the next call to fuse_change_attributes() will likely update the inode with the correct values. The reason this causes problems is that cached symlinks will be returned through page_get_link(), which truncates the symlink to inode->i_size. This is correct for filesystems that don't mutate symlinks, but in this case it causes bad behavior. The solution is to just remove this truncation. This can cause a regression in a filesystem that relies on supplying a symlink larger than the file size, but this is unlikely. If that happens we'd need to make this behavior conditional. Reported-by: Laura Promberger <laura.promberger@cern.ch> Tested-by: Sam Lewis <samclewis@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250220100258.793363-1-mszeredi@redhat.com Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-20bus: simple-pm-bus: fix forced runtime PM useJohan Hovold
The simple-pm-bus driver only enables runtime PM for some buses ('simple-pm-bus') yet has started calling pm_runtime_force_suspend() and pm_runtime_force_resume() during system suspend unconditionally. This currently works, but that is not obvious and depends on implementation details which may change at some point. Add dedicated system sleep ops and only call pm_runtime_force_suspend() and pm_runtime_force_resume() for buses that use runtime PM to avoid any future surprises. Fixes: c45839309c3d ("drivers: bus: simple-pm-bus: Use clocks") Cc: Liu Ying <victor.liu@nxp.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Liu Ying <victor.liu@nxp.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20250214102130.3000-1-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20char: misc: deallocate static minor in error pathThadeu Lima de Souza Cascardo
When creating sysfs files fail, the allocated minor must be freed such that it can be later reused. That is specially harmful for static minor numbers, since those would always fail to register later on. Fixes: 6d04d2b554b1 ("misc: misc_minor_alloc to use ida for all dynamic/misc dynamic minors") Cc: stable <stable@kernel.org> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://lore.kernel.org/r/20250123123249.4081674-5-cascardo@igalia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20eeprom: digsy_mtc: Make GPIO lookup table match the deviceAndy Shevchenko
The dev_id value in the GPIO lookup table must match to the device instance name, which in this case is combined of name and platform device ID, i.e. "spi_gpio.1". But the table assumed that there was no platform device ID defined, which is wrong. Fix the dev_id value accordingly. Fixes: 9b00bc7b901f ("spi: spi-gpio: Rewrite to use GPIO descriptors") Cc: stable <stable@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250206220311.1554075-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctlHaoyu Li
In the "pmcmd_ioctl" function, three memory objects allocated by kmalloc are initialized by "hcall_get_cpu_state", which are then copied to user space. The initializer is indeed implemented in "acrn_hypercall2" (arch/x86/include/asm/acrn.h). There is a risk of information leakage due to uninitialized bytes. Fixes: 3d679d5aec64 ("virt: acrn: Introduce interfaces to query C-states and P-states allowed by hypervisor") Signed-off-by: Haoyu Li <lihaoyu499@gmail.com> Cc: stable <stable@kernel.org> Acked-by: Fei Li <fei1.li@intel.com> Link: https://lore.kernel.org/r/20250130115811.92424-1-lihaoyu499@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20binderfs: fix use-after-free in binder_devicesCarlos Llamas
Devices created through binderfs are added to the global binder_devices list but are not removed before being destroyed. This leads to dangling pointers in the list and subsequent use-after-free errors: ================================================================== BUG: KASAN: slab-use-after-free in binder_add_device+0x5c/0x9c Write of size 8 at addr ffff0000c258d708 by task mount/653 CPU: 7 UID: 0 PID: 653 Comm: mount Not tainted 6.13.0-09030-g6d61a53dd6f5 #1 Hardware name: linux,dummy-virt (DT) Call trace: binder_add_device+0x5c/0x9c binderfs_binder_device_create+0x690/0x84c [...] __arm64_sys_mount+0x324/0x3bc Allocated by task 632: binderfs_binder_device_create+0x168/0x84c binder_ctl_ioctl+0xfc/0x184 [...] __arm64_sys_ioctl+0x110/0x150 Freed by task 649: kfree+0xe0/0x338 binderfs_evict_inode+0x138/0x1dc [...] ================================================================== Remove devices from binder_devices before destroying them. Cc: Li Li <dualli@google.com> Reported-by: syzbot+7015dcf45953112c8b45@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7015dcf45953112c8b45 Fixes: 12d909cac1e1 ("binderfs: add new binder devices to binder_devices") Signed-off-by: Carlos Llamas <cmllamas@google.com> Tested-by: syzbot+7015dcf45953112c8b45@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20250130215823.1518990-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20slimbus: messaging: Free transaction ID in delayed interrupt scenarioVisweswara Tanuku
In case of interrupt delay for any reason, slim_do_transfer() returns timeout error but the transaction ID (TID) is not freed. This results into invalid memory access inside qcom_slim_ngd_rx_msgq_cb() due to invalid TID. Fix the issue by freeing the TID in slim_do_transfer() before returning timeout error to avoid invalid memory access. Call trace: __memcpy_fromio+0x20/0x190 qcom_slim_ngd_rx_msgq_cb+0x130/0x290 [slim_qcom_ngd_ctrl] vchan_complete+0x2a0/0x4a0 tasklet_action_common+0x274/0x700 tasklet_action+0x28/0x3c _stext+0x188/0x620 run_ksoftirqd+0x34/0x74 smpboot_thread_fn+0x1d8/0x464 kthread+0x178/0x238 ret_from_fork+0x10/0x20 Code: aa0003e8 91000429 f100044a 3940002b (3800150b) ---[ end trace 0fe00bec2b975c99 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt. Fixes: afbdcc7c384b ("slimbus: Add messaging APIs to slimbus framework") Cc: stable <stable@kernel.org> Signed-off-by: Visweswara Tanuku <quic_vtanuku@quicinc.com> Link: https://lore.kernel.org/r/20250124125740.16897-1-quic_vtanuku@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20vbox: add HAS_IOPORT dependencyArnd Bergmann
The vboxguest driver depends on port I/O for debug output: include/asm-generic/io.h:626:15: error: call to '_outl' declared with attribute error: outl() requires CONFIG_HAS_IOPORT 626 | #define _outl _outl include/asm-generic/io.h:663:14: note: in expansion of macro '_outl' 663 | #define outl _outl | ^~~~~ drivers/virt/vboxguest/vboxguest_utils.c:102:9: note: in expansion of macro 'outl' 102 | outl(phys_req, gdev->io_port + VMMDEV_PORT_OFF_REQUEST); | ^~~~ Most arm64 platforms don't actually support port I/O, though it is currently enabled unconditionally. Refine the vbox dependency to allow turning HAS_IOPORT off in the future when building for platforms without port I/O and allow compile-testing on all architectures. Fixes: 5cf8f938bf5c ("vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20250122065445.1469218-1-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20cdx: Fix possible UAF error in driver_override_show()Qiu-ji Chen
Fixed a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c This function driver_override_show() is part of DEVICE_ATTR_RW, which includes both driver_override_show() and driver_override_store(). These functions can be executed concurrently in sysfs. The driver_override_store() function uses driver_set_override() to update the driver_override value, and driver_set_override() internally locks the device (device_lock(dev)). If driver_override_show() reads cdx_dev->driver_override without locking, it could potentially access a freed pointer if driver_override_store() frees the string concurrently. This could lead to printing a kernel address, which is a security risk since DEVICE_ATTR can be read by all users. Additionally, a similar pattern is used in drivers/amba/bus.c, as well as many other bus drivers, where device_lock() is taken in the show function, and it has been working without issues. This potential bug was detected by our experimental static analysis tool, which analyzes locking APIs and paired functions to identify data races and atomicity violations. Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus") Cc: stable <stable@kernel.org> Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20gpiolib: don't bail out if get_direction() fails in gpiochip_add_data()Bartosz Golaszewski
Since commit 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") we check the return value of the get_direction() callback as per its API contract. Some drivers have been observed to fail to register now as they may call get_direction() in gpiochip_add_data() in contexts where it has always silently failed. Until we audit all drivers, replace the bail-out to a kernel log warning. Fixes: 9d846b1aebbe ("gpiolib: check the return value of gpio_chip::get_direction()") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/Z7VFB1nST6lbmBIo@finisterre.sirena.org.uk/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/all/dfe03f88-407e-4ef1-ad30-42db53bbd4e4@samsung.com/ Tested-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250219144356.258635-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-02-20usb: gadget: Fix setting self-powered state on suspendMarek Szyprowski
cdev->config might be NULL, so check it before dereferencing. CC: stable <stable@kernel.org> Fixes: 40e89ff5750f ("usb: gadget: Set self-powered based on MaxPower and bmAttributes") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250220120314.3614330-1-m.szyprowski@samsung.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20drivers: core: fix device leak in __fw_devlink_relax_cycles()Luca Ceresoli
Commit bac3b10b78e5 ("driver core: fw_devlink: Stop trying to optimize cycle detection logic") introduced a new struct device *con_dev and a get_dev_from_fwnode() call to get it, but without adding a corresponding put_device(). Closes: https://lore.kernel.org/all/20241204124826.2e055091@booty/ Fixes: bac3b10b78e5 ("driver core: fw_devlink: Stop trying to optimize cycle detection logic") Cc: stable@vger.kernel.org Reviewed-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://lore.kernel.org/r/20250213-fix__fw_devlink_relax_cycles_missing_device_put-v2-1-8cd3b03e6a3f@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20Merge branch 'net-remove-the-single-page-frag-cache-for-good'Paolo Abeni
Paolo Abeni says: ==================== net: remove the single page frag cache for good This is another attempt at reverting commit dbae2b062824 ("net: skb: introduce and use a single page frag cache"), as it causes regressions in specific use-cases. Reverting such commit uncovers an allocation issue for build with CONFIG_MAX_SKB_FRAGS=45, as reported by Sabrina. This series handle the latter in patch 1 and brings the revert in patch 2. Note that there is a little chicken-egg problem, as I included into the patch 1's changelog the splat that would be visible only applying first the revert: I think current patch order is better for bisectability, still the splat is useful for correct attribution. ==================== Link: https://patch.msgid.link/cover.1739899357.git.pabeni@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20Revert "net: skb: introduce and use a single page frag cache"Paolo Abeni
After the previous commit is finally safe to revert commit dbae2b062824 ("net: skb: introduce and use a single page frag cache"): do it here. The intended goal of such change was to counter a performance regression introduced by commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs"). Unfortunately, the blamed commit introduces another regression for the virtio_net driver. Such a driver calls napi_alloc_skb() with a tiny size, so that the whole head frag could fit a 512-byte block. The single page frag cache uses a 1K fragment for such allocation, and the additional overhead, under small UDP packets flood, makes the page allocator a bottleneck. Thanks to commit bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head"), this revert does not re-introduce the original regression. Actually, in the relevant test on top of this revert, I measure a small but noticeable positive delta, just above noise level. The revert itself required some additional mangling due to recent updates in the affected code. Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: dbae2b062824 ("net: skb: introduce and use a single page frag cache") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20net: allow small head cache usage with large MAX_SKB_FRAGS valuesPaolo Abeni
Sabrina reported the following splat: WARNING: CPU: 0 PID: 1 at net/core/dev.c:6935 netif_napi_add_weight_locked+0x8f2/0xba0 Modules linked in: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc1-net-00092-g011b03359038 #996 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:netif_napi_add_weight_locked+0x8f2/0xba0 Code: e8 c3 e6 6a fe 48 83 c4 28 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc c7 44 24 10 ff ff ff ff e9 8f fb ff ff e8 9e e6 6a fe <0f> 0b e9 d3 fe ff ff e8 92 e6 6a fe 48 8b 04 24 be ff ff ff ff 48 RSP: 0000:ffffc9000001fc60 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88806ce48128 RCX: 1ffff11001664b9e RDX: ffff888008f00040 RSI: ffffffff8317ca42 RDI: ffff88800b325cb6 RBP: ffff88800b325c40 R08: 0000000000000001 R09: ffffed100167502c R10: ffff88800b3a8163 R11: 0000000000000000 R12: ffff88800ac1c168 R13: ffff88800ac1c168 R14: ffff88800ac1c168 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888008201000 CR3: 0000000004c94001 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> gro_cells_init+0x1ba/0x270 xfrm_input_init+0x4b/0x2a0 xfrm_init+0x38/0x50 ip_rt_init+0x2d7/0x350 ip_init+0xf/0x20 inet_init+0x406/0x590 do_one_initcall+0x9d/0x2e0 do_initcalls+0x23b/0x280 kernel_init_freeable+0x445/0x490 kernel_init+0x20/0x1d0 ret_from_fork+0x46/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> irq event stamp: 584330 hardirqs last enabled at (584338): [<ffffffff8168bf87>] __up_console_sem+0x77/0xb0 hardirqs last disabled at (584345): [<ffffffff8168bf6c>] __up_console_sem+0x5c/0xb0 softirqs last enabled at (583242): [<ffffffff833ee96d>] netlink_insert+0x14d/0x470 softirqs last disabled at (583754): [<ffffffff8317c8cd>] netif_napi_add_weight_locked+0x77d/0xba0 on kernel built with MAX_SKB_FRAGS=45, where SKB_WITH_OVERHEAD(1024) is smaller than GRO_MAX_HEAD. Such built additionally contains the revert of the single page frag cache so that napi_get_frags() ends up using the page frag allocator, triggering the splat. Note that the underlying issue is independent from the mentioned revert; address it ensuring that the small head cache will fit either TCP and GRO allocation and updating napi_alloc_skb() and __netdev_alloc_skb() to select kmalloc() usage for any allocation fitting such cache. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20arm64: dts: rockchip: remove supports-cqe from rk3588 tigerHeiko Stuebner
The sdhci controller supports cqe it seems and necessary code also is in place - in theory. At this point Jaguar and Tiger are the only boards enabling cqe support on the rk3588 and we are seeing reliability issues under load. This can be caused by either a controller-, hw- or driver-issue and definitly needs more investigation to work properly it seems. So disable cqe support on Tiger for now. Fixes: 6173ef24b35b ("arm64: dts: rockchip: add RK3588-Q7 (Tiger) SoM") Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de> Reviewed-by: Quentin Schulz <quentin.schulz@cherry.de> Link: https://lore.kernel.org/r/20250219093303.2320517-2-heiko@sntech.de Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-02-20arm64: dts: rockchip: remove supports-cqe from rk3588 jaguarHeiko Stuebner
The sdhci controller supports cqe it seems and necessary code also is in place - in theory. At this point Jaguar and Tiger are the only boards enabling cqe support on the rk3588 and we are seeing reliability issues under load. This can be caused by either a controller-, hw- or driver-issue and definitly needs more investigation to work properly it seems. So disable cqe support on Jaguar for now. Fixes: d1b8b36a2cc5 ("arm64: dts: rockchip: add Theobroma Jaguar SBC") Signed-off-by: Heiko Stuebner <heiko.stuebner@cherry.de> Reviewed-by: Quentin Schulz <quentin.schulz@cherry.de> Link: https://lore.kernel.org/r/20250219093303.2320517-1-heiko@sntech.de Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-02-20intel_th: pci: Add Panther Lake-P/U supportAlexander Shishkin
Add support for the Trace Hub in Panther Lake-P/U. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20250211185017.1759193-6-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20intel_th: pci: Add Panther Lake-H supportAlexander Shishkin
Add support for the Trace Hub in Panther Lake-H. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20250211185017.1759193-5-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20intel_th: pci: Add Arrow Lake supportPawel Chmielewski
Add support for the Trace Hub in Arrow Lake. Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20250211185017.1759193-4-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20intel_th: msu: Fix less trivial kernel-doc warningsAlexander Shishkin
Correct function comments to prevent kernel-doc warnings found when using "W=1" that the drive-by fixers had trouble documenting and skipped over. msu.c:168: warning: Function parameter or struct member 'msu_base' not described in 'msc' msu.c:168: warning: Function parameter or struct member 'work' not described in 'msc' msu.c:168: warning: Function parameter or struct member 'switch_on_unlock' not described in 'msc' msu.c:168: warning: Function parameter or struct member 'iter_list' not described in 'msc' msu.c:168: warning: Function parameter or struct member 'stop_on_full' not described in 'msc' msu.c:168: warning: Function parameter or struct member 'do_irq' not described in 'msc' msu.c:168: warning: Function parameter or struct member 'multi_is_broken' not described in 'msc' Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250211185017.1759193-3-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20intel_th: msu: Fix kernel-doc warningsAndy Shevchenko
Correct function comments to prevent kernel-doc warnings found when using "W=1". msu.c:162: warning: Function parameter or struct member 'mbuf_priv' not described in 'msc' msu.c:164: warning: Function parameter or struct member 'orig_addr' not described in 'msc' msu.c:164: warning: Function parameter or struct member 'orig_sz' not described in 'msc' Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20250211185017.1759193-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-20nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()Haoxiang Li
Add check for the return value of nfp_app_ctrl_msg_alloc() in nfp_bpf_cmsg_alloc() to prevent null pointer dereference. Fixes: ff3d43f7568c ("nfp: bpf: implement helpers for FW map ops") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com> Link: https://patch.msgid.link/20250218030409.2425798-1-haoxiang_li2024@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20tcp: drop secpath at the same time as we currently drop dstSabrina Dubroca
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while running tests that boil down to: - create a pair of netns - run a basic TCP test over ipcomp6 - delete the pair of netns The xfrm_state found on spi_byaddr was not deleted at the time we delete the netns, because we still have a reference on it. This lingering reference comes from a secpath (which holds a ref on the xfrm_state), which is still attached to an skb. This skb is not leaked, it ends up on sk_receive_queue and then gets defer-free'd by skb_attempt_defer_free. The problem happens when we defer freeing an skb (push it on one CPU's defer_list), and don't flush that list before the netns is deleted. In that case, we still have a reference on the xfrm_state that we don't expect at this point. We already drop the skb's dst in the TCP receive path when it's no longer needed, so let's also drop the secpath. At this point, tcp_filter has already called into the LSM hooks that may require the secpath, so it should not be needed anymore. However, in some of those places, the MPTCP extension has just been attached to the skb, so we cannot simply drop all extensions. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/5055ba8f8f72bdcb602faa299faca73c280b7735.1739743613.git.sd@queasysnail.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20net: axienet: Set mac_managed_pmNick Hu
The external PHY will undergo a soft reset twice during the resume process when it wake up from suspend. The first reset occurs when the axienet driver calls phylink_of_phy_connect(), and the second occurs when mdio_bus_phy_resume() invokes phy_init_hw(). The second soft reset of the external PHY does not reinitialize the internal PHY, which causes issues with the internal PHY, resulting in the PHY link being down. To prevent this, setting the mac_managed_pm flag skips the mdio_bus_phy_resume() function. Fixes: a129b41fe0a8 ("Revert "net: phy: dp83867: perform soft reset and retain established link"") Signed-off-by: Nick Hu <nick.hu@sifive.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250217055843.19799-1-nick.hu@sifive.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-20Merge tag 'mhi-fixes-for-v6.14' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-linus Manivannan writes: MHI Host ======== - Use pci_try_reset_function() to reset the MHI function during recovery process to avoid the deadlock reported on the X1E80100 CRD device. The deadlock can happen if the caller has already acquired the 'device_lock()' while calling the recovery function. So using pci_try_reset_function() avoids the deadlock by returning -EAGAIN if the lock was already acquired. * tag 'mhi-fixes-for-v6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
2025-02-20RDMA/mlx5: Fix AH static rate parsingPatrisious Haddad
Previously static rate wasn't translated according to our PRM but simply used the 4 lower bytes. Correctly translate static rate value passed in AH creation attribute according to our PRM expected values. In addition change 800GB mapping to zero, which is the PRM specified value. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://patch.msgid.link/18ef4cc5396caf80728341eb74738cd777596f60.1739187089.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-20RDMA/mlx5: Fix implicit ODP hang on parent deregistrationYishai Hadas
Fix the destroy_unused_implicit_child_mr() to prevent hanging during parent deregistration as of below [1]. Upon entering destroy_unused_implicit_child_mr(), the reference count for the implicit MR parent is incremented using: refcount_inc_not_zero(). A corresponding decrement must be performed if free_implicit_child_mr_work() is not called. The code has been updated to properly manage the reference count that was incremented. [1] INFO: task python3:2157 blocked for more than 120 seconds. Not tainted 6.12.0-rc7+ #1633 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:python3 state:D stack:0 pid:2157 tgid:2157 ppid:1685 flags:0x00000000 Call Trace: <TASK> __schedule+0x420/0xd30 schedule+0x47/0x130 __mlx5_ib_dereg_mr+0x379/0x5d0 [mlx5_ib] ? __pfx_autoremove_wake_function+0x10/0x10 ib_dereg_mr_user+0x5f/0x120 [ib_core] ? lock_release+0xc6/0x280 destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs] uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs] uobj_destroy+0x3f/0x70 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs] ? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs] ? lock_acquire+0xc1/0x2f0 ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0x116/0x170 [ib_uverbs] ? lock_release+0xc6/0x280 ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs] ? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs] __x64_sys_ioctl+0x1b0/0xa70 ? kmem_cache_free+0x221/0x400 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f20f21f017b RSP: 002b:00007ffcfc4a77c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffcfc4a78d8 RCX: 00007f20f21f017b RDX: 00007ffcfc4a78c0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffcfc4a78a0 R08: 000056147d125190 R09: 00007f20f1f14c60 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffcfc4a7890 R13: 000000000000001c R14: 000056147d100fc0 R15: 00007f20e365c9d0 </TASK> Fixes: d3d930411ce3 ("RDMA/mlx5: Fix implicit ODP use after free") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Artemy Kovalyov <artemyko@nvidia.com> Link: https://patch.msgid.link/80f2fcd19952dfa7d9981d93fd6359b4471f8278.1739186929.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-19Merge branch 'net-core-improvements-to-device-lookup-by-hardware-address'Jakub Kicinski
Breno Leitao says: ==================== net: core: improvements to device lookup by hardware address. The first patch adds a new dev_getbyhwaddr() helper function for finding devices by hardware address when the rtnl lock is held. This prevents PROVE_LOCKING warnings that occurred when rtnl lock was held but the RCU read lock wasn't. The common address comparison logic is extracted into dev_comp_addr() to avoid code duplication. The second coverts arp_req_set_public() to the new helper. ==================== Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-0-d3d6892db9e1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19arp: switch to dev_getbyhwaddr() in arp_req_set_public()Breno Leitao
The arp_req_set_public() function is called with the rtnl lock held, which provides enough synchronization protection. This makes the RCU variant of dev_getbyhwaddr() unnecessary. Switch to using the simpler dev_getbyhwaddr() function since we already have the required rtnl locking. This change helps maintain consistency in the networking code by using the appropriate helper function for the existing locking context. Since we're not holding the RCU read lock in arp_req_set_public() existing code could trigger false positive locking warnings. Fixes: 941666c2e3e0 ("net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()") Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-2-d3d6892db9e1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19net: Add non-RCU dev_getbyhwaddr() helperBreno Leitao
Add dedicated helper for finding devices by hardware address when holding rtnl_lock, similar to existing dev_getbyhwaddr_rcu(). This prevents PROVE_LOCKING warnings when rtnl_lock is held but RCU read lock is not. Extract common address comparison logic into dev_addr_cmp(). The context about this change could be found in the following discussion: Link: https://lore.kernel.org/all/20250206-scarlet-ermine-of-improvement-1fcac5@leitao/ Cc: kuniyu@amazon.com Cc: ushankar@purestorage.com Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250218-arm_fix_selftest-v5-1-d3d6892db9e1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19sctp: Fix undefined behavior in left shift operationYu-Chun Lin
According to the C11 standard (ISO/IEC 9899:2011, 6.5.7): "If E1 has a signed type and E1 x 2^E2 is not representable in the result type, the behavior is undefined." Shifting 1 << 31 causes signed integer overflow, which leads to undefined behavior. Fix this by explicitly using '1U << 31' to ensure the shift operates on an unsigned type, avoiding undefined behavior. Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Link: https://patch.msgid.link/20250218081217.3468369-1-eleanor15x@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19Merge branch 'flow_dissector-fix-handling-of-mixed-port-and-port-range-keys'Jakub Kicinski
Cong Wang says: ==================== flow_dissector: Fix handling of mixed port and port-range keys This patchset contains two fixes for flow_dissector handling of mixed port and port-range keys, for both tc-flower case and bpf case. Each of them also comes with a selftest. ==================== Link: https://patch.msgid.link/20250218043210.732959-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19selftests/bpf: Add a specific dst port matchingCong Wang
After this patch: #102/1 flow_dissector_classification/ipv4:OK #102/2 flow_dissector_classification/ipv4_continue_dissect:OK #102/3 flow_dissector_classification/ipip:OK #102/4 flow_dissector_classification/gre:OK #102/5 flow_dissector_classification/port_range:OK #102/6 flow_dissector_classification/ipv6:OK #102 flow_dissector_classification:OK Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250218043210.732959-5-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19flow_dissector: Fix port range key handling in BPF conversionCong Wang
Fix how port range keys are handled in __skb_flow_bpf_to_target() by: - Separating PORTS and PORTS_RANGE key handling - Using correct key_ports_range structure for range keys - Properly initializing both key types independently This ensures port range information is correctly stored in its dedicated structure rather than incorrectly using the regular ports key structure. Fixes: 59fb9b62fb6c ("flow_dissector: Fix to use new variables for port ranges in bpf hook") Reported-by: Qiang Zhang <dtzq01@gmail.com> Closes: https://lore.kernel.org/netdev/CAPx+-5uvFxkhkz4=j_Xuwkezjn9U6kzKTD5jz4tZ9msSJ0fOJA@mail.gmail.com/ Cc: Yoshiki Komachi <komachi.yoshiki@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250218043210.732959-4-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19selftests/net/forwarding: Add a test case for tc-flower of mixed port and ↵Cong Wang
port-range After this patch: # ./tc_flower_port_range.sh TEST: Port range matching - IPv4 UDP [ OK ] TEST: Port range matching - IPv4 TCP [ OK ] TEST: Port range matching - IPv6 UDP [ OK ] TEST: Port range matching - IPv6 TCP [ OK ] TEST: Port range matching - IPv4 UDP Drop [ OK ] Cc: Qiang Zhang <dtzq01@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250218043210.732959-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19flow_dissector: Fix handling of mixed port and port-range keysCong Wang
This patch fixes a bug in TC flower filter where rules combining a specific destination port with a source port range weren't working correctly. The specific case was when users tried to configure rules like: tc filter add dev ens38 ingress protocol ip flower ip_proto udp \ dst_port 5000 src_port 2000-3000 action drop The root cause was in the flow dissector code. While both FLOW_DISSECTOR_KEY_PORTS and FLOW_DISSECTOR_KEY_PORTS_RANGE flags were being set correctly in the classifier, the __skb_flow_dissect_ports() function was only populating one of them: whichever came first in the enum check. This meant that when the code needed both a specific port and a port range, one of them would be left as 0, causing the filter to not match packets as expected. Fix it by removing the either/or logic and instead checking and populating both key types independently when they're in use. Fixes: 8ffb055beae5 ("cls_flower: Fix the behavior using port ranges with hw-offload") Reported-by: Qiang Zhang <dtzq01@gmail.com> Closes: https://lore.kernel.org/netdev/CAPx+-5uvFxkhkz4=j_Xuwkezjn9U6kzKTD5jz4tZ9msSJ0fOJA@mail.gmail.com/ Cc: Yoshiki Komachi <komachi.yoshiki@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250218043210.732959-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19Merge branch 'gtp-geneve-suppress-list_del-splat-during-exit_batch_rtnl'Jakub Kicinski
Kuniyuki Iwashima says: ==================== gtp/geneve: Suppress list_del() splat during ->exit_batch_rtnl(). The common pattern in tunnel device's ->exit_batch_rtnl() is iterating two netdev lists for each netns: (i) for_each_netdev() to clean up devices in the netns, and (ii) the device type specific list to clean up devices in other netns. list_for_each_entry(net, net_list, exit_list) { for_each_netdev_safe(net, dev, next) { /* (i) call unregister_netdevice_queue(dev, list) */ } list_for_each_entry_safe(xxx, xxx_next, &net->yyy, zzz) { /* (ii) call unregister_netdevice_queue(xxx->dev, list) */ } } Then, ->exit_batch_rtnl() could touch the same device twice. Say we have two netns A & B and device B that is created in netns A and moved to netns B. 1. cleanup_net() processes netns A and then B. 2. ->exit_batch_rtnl() finds the device B while iterating netns A's (ii) [ device B is not yet unlinked from netns B as unregister_netdevice_many() has not been called. ] 3. ->exit_batch_rtnl() finds the device B while iterating netns B's (i) gtp and geneve calls ->dellink() at 2. and 3. that calls list_del() for (ii) and unregister_netdevice_queue(). Calling unregister_netdevice_queue() twice is fine because it uses list_move_tail(), but the 2nd list_del() triggers a splat when CONFIG_DEBUG_LIST is enabled. Possible solution is either of (a) Use list_del_init() in ->dellink() (b) Iterate dev with empty ->unreg_list for (i) like #define for_each_netdev_alive(net, d) \ list_for_each_entry(d, &(net)->dev_base_head, dev_list) \ if (list_empty(&d->unreg_list)) (c) Remove (i) and delegate it to default_device_exit_batch(). This series avoids the 2nd ->dellink() by (c) to suppress the splat for gtp and geneve. Note that IPv4/IPv6 tunnels calls just unregister_netdevice() during ->exit_batch_rtnl() and dev is unlinked from (ii) later in ->ndo_uninit(), so they are safe. Also, pfcp has the same pattern but is safe because unregister_netdevice_many() is called for each netns. ==================== Link: https://patch.msgid.link/20250217203705.40342-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19geneve: Suppress list corruption splat in geneve_destroy_tunnels().Kuniyuki Iwashima
As explained in the previous patch, iterating for_each_netdev() and gn->geneve_list during ->exit_batch_rtnl() could trigger ->dellink() twice for the same device. If CONFIG_DEBUG_LIST is enabled, we will see a list_del() corruption splat in the 2nd call of geneve_dellink(). Let's remove for_each_netdev() in geneve_destroy_tunnels() and delegate that part to default_device_exit_batch(). Fixes: 9593172d93b9 ("geneve: Fix use-after-free in geneve_find_dev().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250217203705.40342-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().Kuniyuki Iwashima
Brad Spengler reported the list_del() corruption splat in gtp_net_exit_batch_rtnl(). [0] Commit eb28fd76c0a0 ("gtp: Destroy device along with udp socket's netns dismantle.") added the for_each_netdev() loop in gtp_net_exit_batch_rtnl() to destroy devices in each netns as done in geneve and ip tunnels. However, this could trigger ->dellink() twice for the same device during ->exit_batch_rtnl(). Say we have two netns A & B and gtp device B that resides in netns B but whose UDP socket is in netns A. 1. cleanup_net() processes netns A and then B. 2. gtp_net_exit_batch_rtnl() finds the device B while iterating netns A's gn->gtp_dev_list and calls ->dellink(). [ device B is not yet unlinked from netns B as unregister_netdevice_many() has not been called. ] 3. gtp_net_exit_batch_rtnl() finds the device B while iterating netns B's for_each_netdev() and calls ->dellink(). gtp_dellink() cleans up the device's hash table, unlinks the dev from gn->gtp_dev_list, and calls unregister_netdevice_queue(). Basically, calling gtp_dellink() multiple times is fine unless CONFIG_DEBUG_LIST is enabled. Let's remove for_each_netdev() in gtp_net_exit_batch_rtnl() and delegate the destruction to default_device_exit_batch() as done in bareudp. [0]: list_del corruption, ffff8880aaa62c00->next (autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object]) is LIST_POISON1 (ffffffffffffff02) (prev is 0xffffffffffffff04) kernel BUG at lib/list_debug.c:58! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 UID: 0 PID: 1804 Comm: kworker/u8:7 Tainted: G T 6.12.13-grsec-full-20250211091339 #1 Tainted: [T]=RANDSTRUCT Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:[<ffffffff84947381>] __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58 Code: c2 76 91 31 c0 e8 9f b1 f7 fc 0f 0b 4d 89 f0 48 c7 c1 02 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 e0 c2 76 91 31 c0 e8 7f b1 f7 fc <0f> 0b 4d 89 e8 48 c7 c1 04 ff ff ff 48 89 ea 48 89 ee 48 c7 c7 60 RSP: 0018:fffffe8040b4fbd0 EFLAGS: 00010283 RAX: 00000000000000cc RBX: dffffc0000000000 RCX: ffffffff818c4054 RDX: ffffffff84947381 RSI: ffffffff818d1512 RDI: 0000000000000000 RBP: ffff8880aaa62c00 R08: 0000000000000001 R09: fffffbd008169f32 R10: fffffe8040b4f997 R11: 0000000000000001 R12: a1988d84f24943e4 R13: ffffffffffffff02 R14: ffffffffffffff04 R15: ffff8880aaa62c08 RBX: kasan shadow of 0x0 RCX: __wake_up_klogd.part.0+0x74/0xe0 kernel/printk/printk.c:4554 RDX: __list_del_entry_valid_or_report+0x141/0x200 lib/list_debug.c:58 RSI: vprintk+0x72/0x100 kernel/printk/printk_safe.c:71 RBP: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc00/0x1000 [slab object] RSP: process kstack fffffe8040b4fbd0+0x7bd0/0x8000 [kworker/u8:7+netns 1804 ] R09: kasan shadow of process kstack fffffe8040b4f990+0x7990/0x8000 [kworker/u8:7+netns 1804 ] R10: process kstack fffffe8040b4f997+0x7997/0x8000 [kworker/u8:7+netns 1804 ] R15: autoslab_size_M_dev_P_net_core_dev_11127_8_1328_8_S_4096_A_64_n_139+0xc08/0x1000 [slab object] FS: 0000000000000000(0000) GS:ffff888116000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000748f5372c000 CR3: 0000000015408000 CR4: 00000000003406f0 shadow CR4: 00000000003406f0 Stack: 0000000000000000 ffffffff8a0c35e7 ffffffff8a0c3603 ffff8880aaa62c00 ffff8880aaa62c00 0000000000000004 ffff88811145311c 0000000000000005 0000000000000001 ffff8880aaa62000 fffffe8040b4fd40 ffffffff8a0c360d Call Trace: <TASK> [<ffffffff8a0c360d>] __list_del_entry_valid include/linux/list.h:131 [inline] fffffe8040b4fc28 [<ffffffff8a0c360d>] __list_del_entry include/linux/list.h:248 [inline] fffffe8040b4fc28 [<ffffffff8a0c360d>] list_del include/linux/list.h:262 [inline] fffffe8040b4fc28 [<ffffffff8a0c360d>] gtp_dellink+0x16d/0x360 drivers/net/gtp.c:1557 fffffe8040b4fc28 [<ffffffff8a0d0404>] gtp_net_exit_batch_rtnl+0x124/0x2c0 drivers/net/gtp.c:2495 fffffe8040b4fc88 [<ffffffff8e705b24>] cleanup_net+0x5a4/0xbe0 net/core/net_namespace.c:635 fffffe8040b4fcd0 [<ffffffff81754c97>] process_one_work+0xbd7/0x2160 kernel/workqueue.c:3326 fffffe8040b4fd88 [<ffffffff81757195>] process_scheduled_works kernel/workqueue.c:3407 [inline] fffffe8040b4fec0 [<ffffffff81757195>] worker_thread+0x6b5/0xfa0 kernel/workqueue.c:3488 fffffe8040b4fec0 [<ffffffff817782a0>] kthread+0x360/0x4c0 kernel/kthread.c:397 fffffe8040b4ff78 [<ffffffff814d8594>] ret_from_fork+0x74/0xe0 arch/x86/kernel/process.c:172 fffffe8040b4ffb8 [<ffffffff8110f509>] ret_from_fork_asm+0x29/0xc0 arch/x86/entry/entry_64.S:399 fffffe8040b4ffe8 </TASK> Modules linked in: Fixes: eb28fd76c0a0 ("gtp: Destroy device along with udp socket's netns dismantle.") Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250217203705.40342-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19Merge tag 'mm-hotfixes-stable-2025-02-19-17-49' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "18 hotfixes. 5 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 10 are for MM and 8 are for non-MM. All are singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined kasan: don't call find_vm_area() in a PREEMPT_RT kernel MAINTAINERS: update Nick's contact info selftests/mm: fix check for running THP tests mm: hugetlb: avoid fallback for specific node allocation of 1G pages memcg: avoid dead loop when setting memory.max mailmap: update Nick's entry mm: pgtable: fix incorrect reclaim of non-empty PTE pages taskstats: modify taskstats version getdelays: fix error format characters mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() tools/mm: fix build warnings with musl-libc mailmap: add entry for Feng Tang .mailmap: add entries for Jeff Johnson mm,madvise,hugetlb: check for 0-length range after end address adjustment mm/zswap: fix inconsistency when zswap_store_page() fails lib/iov_iter: fix import_iovec_ubuf iovec management procfs: fix a locking bug in a vmcore_add_device_dump() error path
2025-02-19bcachefs: Fix srcu lock warning in btree_update_nodes_written()Kent Overstreet
We don't want to be holding the srcu lock while waiting on btree write completions - easily fixed. Reported-by: Janpieter Sollie <janpieter.sollie@edpnet.be> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>