summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
3 daysnet: Lock netdevices during dev_shutdownCosmin Ratiu
__qdisc_destroy() calls into various qdiscs .destroy() op, which in turn can call .ndo_setup_tc(), which requires the netdev instance lock. This commit extends the critical section in unregister_netdevice_many_notify() to cover dev_shutdown() (and dev_tcx_uninstall() as a side-effect) and acquires the netdev instance lock in __dev_change_net_namespace() for the other dev_shutdown() call. This should now guarantee that for all qdisc ops, the netdev instance lock is held during .ndo_setup_tc(). Fixes: a0527ee2df3f ("net: hold netdev instance lock during qdisc ndo_setup_tc") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250505194713.1723399-1-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysice: use DSN instead of PCI BDF for ice_adapter indexPrzemek Kitszel
Use Device Serial Number instead of PCI bus/device/function for the index of struct ice_adapter. Functions on the same physical device should point to the very same ice_adapter instance, but with two PFs, when at least one of them is PCI-e passed-through to a VM, it is no longer the case - PFs will get seemingly random PCI BDF values, and thus indices, what finally leds to each of them being on their own instance of ice_adapter. That causes them to don't attempt any synchronization of the PTP HW clock usage, or any other future resources. DSN works nicely in place of the index, as it is "immutable" in terms of virtualization. Fixes: 0e2bddf9e5f9 ("ice: add ice_adapter for shared data across PFs on the same NIC") Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Suggested-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20250505161939.2083581-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysx86/Kconfig: make CFI_AUTO_DEFAULT depend on !RUST or Rust >= 1.88Paweł Anikiel
Calling core::fmt::write() from rust code while FineIBT is enabled results in a kernel panic: [ 4614.199779] kernel BUG at arch/x86/kernel/cet.c:132! [ 4614.205343] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 4614.211781] CPU: 2 UID: 0 PID: 6057 Comm: dmabuf_dump Tainted: G U O 6.12.17-android16-0-g6ab38c534a43 #1 9da040f27673ec3945e23b998a0f8bd64c846599 [ 4614.227832] Tainted: [U]=USER, [O]=OOT_MODULE [ 4614.241247] RIP: 0010:do_kernel_cp_fault+0xea/0xf0 ... [ 4614.398144] RIP: 0010:_RNvXs5_NtNtNtCs3o2tGsuHyou_4core3fmt3num3impyNtB9_7Display3fmt+0x0/0x20 [ 4614.407792] Code: 48 f7 df 48 0f 48 f9 48 89 f2 89 c6 5d e9 18 fd ff ff 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 41 81 ea 14 61 af 2c 74 03 0f 0b 90 <66> 0f 1f 00 55 48 89 e5 48 89 f2 48 8b 3f be 01 00 00 00 5d e9 e7 [ 4614.428775] RSP: 0018:ffffb95acfa4ba68 EFLAGS: 00010246 [ 4614.434609] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000 [ 4614.442587] RDX: 0000000000000007 RSI: ffffb95acfa4ba70 RDI: ffffb95acfa4bc88 [ 4614.450557] RBP: ffffb95acfa4bae0 R08: ffff0a00ffffff05 R09: 0000000000000070 [ 4614.458527] R10: 0000000000000000 R11: ffffffffab67eaf0 R12: ffffb95acfa4bcc8 [ 4614.466493] R13: ffffffffac5d50f0 R14: 0000000000000000 R15: 0000000000000000 [ 4614.474473] ? __cfi__RNvXs5_NtNtNtCs3o2tGsuHyou_4core3fmt3num3impyNtB9_7Display3fmt+0x10/0x10 [ 4614.484118] ? _RNvNtCs3o2tGsuHyou_4core3fmt5write+0x1d2/0x250 This happens because core::fmt::write() calls core::fmt::rt::Argument::fmt(), which currently has CFI disabled: library/core/src/fmt/rt.rs: 171 // FIXME: Transmuting formatter in new and indirectly branching to/calling 172 // it here is an explicit CFI violation. 173 #[allow(inline_no_sanitize)] 174 #[no_sanitize(cfi, kcfi)] 175 #[inline] 176 pub(super) unsafe fn fmt(&self, f: &mut Formatter<'_>) -> Result { This causes a Control Protection exception, because FineIBT has sealed off the original function's endbr64. This makes rust currently incompatible with FineIBT. Add a Kconfig dependency that prevents FineIBT from getting turned on by default if rust is enabled. [ Rust 1.88.0 (scheduled for 2025-06-26) should have this fixed [1], and thus we relaxed the condition with Rust >= 1.88. When `objtool` lands checking for this with e.g. [2], the plan is to ideally run that in upstream Rust's CI to prevent regressions early [3], since we do not control `core`'s source code. Alice tested the Rust PR backported to an older compiler. Peter would like that Rust provides a stable `core` which can be pulled into the kernel: "Relying on that much out of tree code is 'unfortunate'". - Miguel ] Signed-off-by: Paweł Anikiel <panikiel@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://github.com/rust-lang/rust/pull/139632 [1] Link: https://lore.kernel.org/rust-for-linux/20250410154556.GB9003@noisy.programming.kicks-ass.net/ [2] Link: https://github.com/rust-lang/rust/pull/139632#issuecomment-2801950873 [3] Link: https://lore.kernel.org/r/20250410115420.366349-1-panikiel@google.com Link: https://lore.kernel.org/r/att0-CANiq72kjDM0cKALVy4POEzhfdT4nO7tqz0Pm7xM+3=_0+L1t=A@mail.gmail.com [ Reduced splat. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 daysrust: clean Rust 1.88.0's `clippy::uninlined_format_args` lintMiguel Ojeda
Starting with Rust 1.88.0 (expected 2025-06-26) [1], `rustc` may move back the `uninlined_format_args` to `style` from `pedantic` (it was there waiting for rust-analyzer suppotr), and thus we will start to see lints like: warning: variables can be used directly in the `format!` string --> rust/macros/kunit.rs:105:37 | 105 | let kunit_wrapper_fn_name = format!("kunit_rust_wrapper_{}", test); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args help: change this to | 105 - let kunit_wrapper_fn_name = format!("kunit_rust_wrapper_{}", test); 105 + let kunit_wrapper_fn_name = format!("kunit_rust_wrapper_{test}"); There is even a case that is a pure removal: warning: variables can be used directly in the `format!` string --> rust/macros/module.rs:51:13 | 51 | format!("{field}={content}\0", field = field, content = content) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args help: change this to | 51 - format!("{field}={content}\0", field = field, content = content) 51 + format!("{field}={content}\0") The lints all seem like nice cleanups, thus just apply them. We may want to disable `allow-mixed-uninlined-format-args` in the future. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Link: https://github.com/rust-lang/rust-clippy/pull/14160 [1] Acked-by: Benno Lossin <lossin@kernel.org> Reviewed-by: Tamir Duberstein <tamird@gmail.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250502140237.1659624-6-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 daysrust: clean Rust 1.88.0's warning about `clippy::disallowed_macros` ↵Miguel Ojeda
configuration Starting with Rust 1.88.0 (expected 2025-06-26) [1], Clippy may start warning about paths that do not resolve in the `disallowed_macros` configuration: warning: `kernel::dbg` does not refer to an existing macro --> .clippy.toml:10:5 | 10 | { path = "kernel::dbg", reason = "the `dbg!` macro is intended as a debugging tool" }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is a lint we requested at [2], due to the trouble debugging the lint due to false negatives (e.g. [3]), which we use to emulate `clippy::dbg_macro` [4]. See commit 8577c9dca799 ("rust: replace `clippy::dbg_macro` with `disallowed_macros`") for more details. Given the false negatives are not resolved yet, it is expected that Clippy complains about not finding this macro. Thus, until the false negatives are fixed (and, even then, probably we will need to wait for the MSRV to raise enough), use the escape hatch to allow an invalid path. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Link: https://github.com/rust-lang/rust-clippy/pull/14397 [1] Link: https://github.com/rust-lang/rust-clippy/issues/11432 [2] Link: https://github.com/rust-lang/rust-clippy/issues/11431 [3] Link: https://github.com/rust-lang/rust-clippy/issues/11303 [4] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250502140237.1659624-5-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 daysrust: clean Rust 1.88.0's `unnecessary_transmutes` lintMiguel Ojeda
Starting with Rust 1.88.0 (expected 2025-06-26) [1][2], `rustc` may introduce a new lint that catches unnecessary transmutes, e.g.: error: unnecessary transmute --> rust/uapi/uapi_generated.rs:23242:18 | 23242 | unsafe { ::core::mem::transmute(self._bitfield_1.get(0usize, 1u8) as u8) } | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace this with: `(self._bitfield_1.get(0usize, 1u8) as u8 == 1)` | = note: `-D unnecessary-transmutes` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unnecessary_transmutes)]` There are a lot of them (at least 300), but luckily they are all in `bindgen`-generated code. Thus clean all up by allowing it there. Since unknown lints trigger a lint itself in older compilers, do it conditionally so that we can keep the `unknown_lints` lint enabled. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Link: https://github.com/rust-lang/rust/pull/136083 [1] Link: https://github.com/rust-lang/rust/issues/136067 [2] Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250502140237.1659624-4-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 daysrust: allow Rust 1.87.0's `clippy::ptr_eq` lintMiguel Ojeda
Starting with Rust 1.87.0 (expected 2025-05-15) [1], Clippy may expand the `ptr_eq` lint, e.g.: error: use `core::ptr::eq` when comparing raw pointers --> rust/kernel/list.rs:438:12 | 438 | if self.first == item { | ^^^^^^^^^^^^^^^^^^ help: try: `core::ptr::eq(self.first, item)` | = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_eq = note: `-D clippy::ptr-eq` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::ptr_eq)]` It is expected that a PR to relax the lint will be backported [2] by the time Rust 1.87.0 releases, since the lint was considered too eager (at least by default) [3]. Thus allow the lint temporarily just in case. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Link: https://github.com/rust-lang/rust-clippy/pull/14339 [1] Link: https://github.com/rust-lang/rust-clippy/pull/14526 [2] Link: https://github.com/rust-lang/rust-clippy/issues/14525 [3] Link: https://lore.kernel.org/r/20250502140237.1659624-3-ojeda@kernel.org [ Converted to `allow`s since backport was confirmed. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 daysobjtool/rust: add one more `noreturn` Rust function for Rust 1.87.0Miguel Ojeda
Starting with Rust 1.87.0 (expected 2025-05-15), `objtool` may report: rust/core.o: warning: objtool: _R..._4core9panicking9panic_fmt() falls through to next function _R..._4core9panicking18panic_nounwind_fmt() rust/core.o: warning: objtool: _R..._4core9panicking18panic_nounwind_fmt() falls through to next function _R..._4core9panicking5panic() The reason is that `rust_begin_unwind` is now mangled: _R..._7___rustc17rust_begin_unwind Thus add the mangled one to the list so that `objtool` knows it is actually `noreturn`. See commit 56d680dd23c3 ("objtool/rust: list `noreturn` Rust functions") for more details. Alternatively, we could remove the fixed one in `noreturn.h` and relax this test to cover both, but it seems best to be strict as long as we can. Cc: stable@vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs). Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250502140237.1659624-2-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
3 dayswifi: iwlwifi: add support for Killer on MTLJohannes Berg
For now, we need another entry for these devices, this will be changed completely for 6.16. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219926 Link: https://patch.msgid.link/20250506214258.2efbdc9e9a82.I31915ec252bd1c74bd53b89a0e214e42a74b6f2e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 dayswifi: mac80211: fix the type of status_code for negotiated TID to Link MappingMichael-CY Lee
The status code should be type of __le16. Fixes: 83e897a961b8 ("wifi: ieee80211: add definitions for negotiated TID to Link map") Fixes: 8f500fbc6c65 ("wifi: mac80211: process and save negotiated TID to Link mapping request") Signed-off-by: Michael-CY Lee <michael-cy.lee@mediatek.com> Link: https://patch.msgid.link/20250505081946.3927214-1-michael-cy.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 dayswifi: cfg80211: fix out-of-bounds access during multi-link element ↵Veerendranath Jakkam
defragmentation Currently during the multi-link element defragmentation process, the multi-link element length added to the total IEs length when calculating the length of remaining IEs after the multi-link element in cfg80211_defrag_mle(). This could lead to out-of-bounds access if the multi-link element or its corresponding fragment elements are the last elements in the IEs buffer. To address this issue, correctly calculate the remaining IEs length by deducting the multi-link element end offset from total IEs end offset. Cc: stable@vger.kernel.org Fixes: 2481b5da9c6b ("wifi: cfg80211: handle BSS data contained in ML probe responses") Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://patch.msgid.link/20250424-fix_mle_defragmentation_oob_access-v1-1-84412a1743fa@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
3 daysvfio/pci: Align huge faults to orderAlex Williamson
The vfio-pci huge_fault handler doesn't make any attempt to insert a mapping containing the faulting address, it only inserts mappings if the faulting address and resulting pfn are aligned. This works in a lot of cases, particularly in conjunction with QEMU where DMA mappings linearly fault the mmap. However, there are configurations where we don't get that linear faulting and pages are faulted on-demand. The scenario reported in the bug below is such a case, where the physical address width of the CPU is greater than that of the IOMMU, resulting in a VM where guest firmware has mapped device MMIO beyond the address width of the IOMMU. In this configuration, the MMIO is faulted on demand and tracing indicates that occasionally the faults generate a VM_FAULT_OOM. Given the use case, this results in a "error: kvm run failed Bad address", killing the VM. The host is not under memory pressure in this test, therefore it's suspected that VM_FAULT_OOM is actually the result of a NULL return from __pte_offset_map_lock() in the get_locked_pte() path from insert_pfn(). This suggests a potential race inserting a pte concurrent to a pmd, and maybe indicates some deficiency in the mm layer properly handling such a case. Nevertheless, Peter noted the inconsistency of vfio-pci's huge_fault handler where our mapping granularity depends on the alignment of the faulting address relative to the order rather than aligning the faulting address to the order to more consistently insert huge mappings. This change not only uses the page tables more consistently and efficiently, but as any fault to an aligned page results in the same mapping, the race condition suspected in the VM_FAULT_OOM is avoided. Reported-by: Adolfo <adolfotregosa@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220057 Fixes: 09dfc8a5f2ce ("vfio/pci: Fallback huge faults for unaligned pfn") Cc: stable@vger.kernel.org Tested-by: Adolfo <adolfotregosa@gmail.com> Co-developed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250502224035.3183451-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
3 daysMerge tag 'for-6.15-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - revert device path canonicalization, this does not work as intended with namespaces and is not reliable in all setups - fix crash in scrub when checksum tree is not valid, e.g. when mounted with rescue=ignoredatacsums - fix crash when tracepoint btrfs_prelim_ref_insert is enabled - other minor fixups: - open code folio_index(), meant to be used in MM code - use matching type for sizeof in compression allocation * tag 'for-6.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: open code folio_index() in btree_clear_folio_dirty_tag() Revert "btrfs: canonicalize the device path before adding it" btrfs: avoid NULL pointer dereference if no valid csum tree btrfs: handle empty eb->folios in num_extent_folios() btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref btrfs: compression: adjust cb->compressed_folios allocation type
3 daysMerge tag 'for-6.15/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mikulas Patocka: - fix reading past the end of allocated memory - fix missing dm_put_live_table() in dm_keyslot_evict() * tag 'for-6.15/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: fix copying after src array boundaries dm: add missing unlock on in dm_keyslot_evict()
4 dayscan: gw: fix RCU/BH usage in cgw_create_job()Oliver Hartkopp
As reported by Sebastian Andrzej Siewior the use of local_bh_disable() is only feasible in uni processor systems to update the modification rules. The usual use-case to update the modification rules is to update the data of the modifications but not the modification types (AND/OR/XOR/SET) or the checksum functions itself. To omit additional memory allocations to maintain fast modification switching times, the modification description space is doubled at gw-job creation time so that only the reference to the active modification description is changed under rcu protection. Rename cgw_job::mod to cf_mod and make it a RCU pointer. Allocate in cgw_create_job() and free it together with cgw_job in cgw_job_free_rcu(). Update all users to dereference cgw_job::cf_mod with a RCU accessor and if possible once. [bigeasy: Replace mod1/mod2 from the Oliver's original patch with dynamic allocation, use RCU annotation and accessor] Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Closes: https://lore.kernel.org/linux-can/20231031112349.y0aLoBrz@linutronix.de/ Fixes: dd895d7f21b2 ("can: cangw: introduce optional uid to reference created routing jobs") Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20250429070555.cs-7b_eZ@linutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 daysMerge patch series "can: rx-offload: fix order of unregistration calls"Marc Kleine-Budde
Marc Kleine-Budde <mkl@pengutronix.de> says: If a driver is removed, the driver framework invokes the driver's remove callback. A CAN driver's remove function calls unregister_candev(), which calls net_device_ops::ndo_stop further down in the call stack for interfaces which are in the "up" state. With the mcp251xfd driver the removal of the module causes the following warning: | WARNING: CPU: 0 PID: 352 at net/core/dev.c:7342 __netif_napi_del_locked+0xc8/0xd8 as can_rx_offload_del() deletes the NAPI, while it is still active, because the interface is still up. To fix the warning, first unregister the network interface, which calls net_device_ops::ndo_stop, which disables the NAPI, and then call can_rx_offload_del(). All other driver using the rx-offload helper have been checked and the same issue has been found in the rockchip and m_can driver. These have been fixed, but only compile time tested. On the mcp251xfd the fix was tested on hardware. Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-0-59a9b131589d@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 dayscan: mcan: m_can_class_unregister(): fix order of unregistration callsMarc Kleine-Budde
If a driver is removed, the driver framework invokes the driver's remove callback. A CAN driver's remove function calls unregister_candev(), which calls net_device_ops::ndo_stop further down in the call stack for interfaces which are in the "up" state. The removal of the module causes a warning, as can_rx_offload_del() deletes the NAPI, while it is still active, because the interface is still up. To fix the warning, first unregister the network interface, which calls net_device_ops::ndo_stop, which disables the NAPI, and then call can_rx_offload_del(). Fixes: 1be37d3b0414 ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-3-59a9b131589d@pengutronix.de Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 dayscan: rockchip_canfd: rkcanfd_remove(): fix order of unregistration callsMarc Kleine-Budde
If a driver is removed, the driver framework invokes the driver's remove callback. A CAN driver's remove function calls unregister_candev(), which calls net_device_ops::ndo_stop further down in the call stack for interfaces which are in the "up" state. The removal of the module causes a warning, as can_rx_offload_del() deletes the NAPI, while it is still active, because the interface is still up. To fix the warning, first unregister the network interface, which calls net_device_ops::ndo_stop, which disables the NAPI, and then call can_rx_offload_del(). Fixes: ff60bfbaf67f ("can: rockchip_canfd: add driver for Rockchip CAN-FD controller") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-2-59a9b131589d@pengutronix.de Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 daysblock: only update request sector if neededJohannes Thumshirn
In case of a ZONE APPEND write, regardless of native ZONE APPEND or the emulation layer in the zone write plugging code, the sector the data got written to by the device needs to be updated in the bio. At the moment, this is done for every native ZONE APPEND write and every request that is flagged with 'BIO_ZONE_WRITE_PLUGGING'. But thus superfluously updates the sector for regular writes to a zoned block device. Check if a bio is a native ZONE APPEND write or if the bio is flagged as 'BIO_EMULATES_ZONE_APPEND', meaning the block layer's zone write plugging code handles the ZONE APPEND and translates it into a regular write and back. Only if one of these two criterion is met, update the sector in the bio upon completion. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/dea089581cb6b777c1cd1500b38ac0b61df4b2d1.1746530748.git.jth@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 dayscan: mcp251xfd: mcp251xfd_remove(): fix order of unregistration callsMarc Kleine-Budde
If a driver is removed, the driver framework invokes the driver's remove callback. A CAN driver's remove function calls unregister_candev(), which calls net_device_ops::ndo_stop further down in the call stack for interfaces which are in the "up" state. With the mcp251xfd driver the removal of the module causes the following warning: | WARNING: CPU: 0 PID: 352 at net/core/dev.c:7342 __netif_napi_del_locked+0xc8/0xd8 as can_rx_offload_del() deletes the NAPI, while it is still active, because the interface is still up. To fix the warning, first unregister the network interface, which calls net_device_ops::ndo_stop, which disables the NAPI, and then call can_rx_offload_del(). Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-1-59a9b131589d@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 dayscan: mcp251xfd: fix TDC setting for low data bit ratesKelsey Maes
The TDC is currently hardcoded enabled. This means that even for lower CAN-FD data bitrates (with a DBRP (data bitrate prescaler) > 2) a TDC is configured. This leads to a bus-off condition. ISO 11898-1 section 11.3.3 says "Transmitter delay compensation" (TDC) is only applicable if DBRP is 1 or 2. To fix the problem, switch the driver to use the TDC calculation provided by the CAN driver framework (which respects ISO 11898-1 section 11.3.3). This has the positive side effect that userspace can control TDC as needed. Demonstration of the feature in action: | $ ip link set can0 up type can bitrate 125000 dbitrate 500000 fd on | $ ip -details link show can0 | 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10 | link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 | can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0 | bitrate 125000 sample-point 0.875 | tq 50 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 10 brp 2 | mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp_inc 1 | dbitrate 500000 dsample-point 0.875 | dtq 125 dprop-seg 6 dphase-seg1 7 dphase-seg2 2 dsjw 1 dbrp 5 | mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp_inc 1 | tdcv 0..63 tdco 0..63 | clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus spi parentdev spi0.0 | $ ip link set can0 up type can bitrate 1000000 dbitrate 4000000 fd on | $ ip -details link show can0 | 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10 | link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 | can <FD,TDC-AUTO> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0 | bitrate 1000000 sample-point 0.750 | tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 5 brp 1 | mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp_inc 1 | dbitrate 4000000 dsample-point 0.700 | dtq 25 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1 | tdco 7 | mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp_inc 1 | tdcv 0..63 tdco 0..63 | clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 parentbus spi parentdev spi0.0 There has been some confusion about the MCP2518FD using a relative or absolute TDCO due to the datasheet specifying a range of [-64,63]. I have a custom board with a 40 MHz clock and an estimated loop delay of 100 to 216 ns. During testing at a data bit rate of 4 Mbit/s I found that using can_get_relative_tdco() resulted in bus-off errors. The final TDCO value was 1 which corresponds to a 10% SSP in an absolute configuration. This behavior is expected if the TDCO value is really absolute and not relative. Using priv->can.tdc.tdco instead results in a final TDCO of 8, setting the SSP at exactly 80%. This configuration works. The automatic, manual, and off TDC modes were tested at speeds up to, and including, 8 Mbit/s on real hardware and behave as expected. Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Reported-by: Kelsey Maes <kelsey@vpprocess.com> Closes: https://lore.kernel.org/all/C2121586-C87F-4B23-A933-845362C29CA1@vpprocess.com Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Kelsey Maes <kelsey@vpprocess.com> Link: https://patch.msgid.link/20250430161501.79370-1-kelsey@vpprocess.com [mkl: add comment] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 dayscan: m_can: m_can_class_allocate_dev(): initialize spin lock on device probeAntonios Salios
The spin lock tx_handling_spinlock in struct m_can_classdev is not being initialized. This leads the following spinlock bad magic complaint from the kernel, eg. when trying to send CAN frames with cansend from can-utils: | BUG: spinlock bad magic on CPU#0, cansend/95 | lock: 0xff60000002ec1010, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 | CPU: 0 UID: 0 PID: 95 Comm: cansend Not tainted 6.15.0-rc3-00032-ga79be02bba5c #5 NONE | Hardware name: MachineWare SIM-V (DT) | Call Trace: | [<ffffffff800133e0>] dump_backtrace+0x1c/0x24 | [<ffffffff800022f2>] show_stack+0x28/0x34 | [<ffffffff8000de3e>] dump_stack_lvl+0x4a/0x68 | [<ffffffff8000de70>] dump_stack+0x14/0x1c | [<ffffffff80003134>] spin_dump+0x62/0x6e | [<ffffffff800883ba>] do_raw_spin_lock+0xd0/0x142 | [<ffffffff807a6fcc>] _raw_spin_lock_irqsave+0x20/0x2c | [<ffffffff80536dba>] m_can_start_xmit+0x90/0x34a | [<ffffffff806148b0>] dev_hard_start_xmit+0xa6/0xee | [<ffffffff8065b730>] sch_direct_xmit+0x114/0x292 | [<ffffffff80614e2a>] __dev_queue_xmit+0x3b0/0xaa8 | [<ffffffff8073b8fa>] can_send+0xc6/0x242 | [<ffffffff8073d1c0>] raw_sendmsg+0x1a8/0x36c | [<ffffffff805ebf06>] sock_write_iter+0x9a/0xee | [<ffffffff801d06ea>] vfs_write+0x184/0x3a6 | [<ffffffff801d0a88>] ksys_write+0xa0/0xc0 | [<ffffffff801d0abc>] __riscv_sys_write+0x14/0x1c | [<ffffffff8079ebf8>] do_trap_ecall_u+0x168/0x212 | [<ffffffff807a830a>] handle_exception+0x146/0x152 Initializing the spin lock in m_can_class_allocate_dev solves that problem. Fixes: 1fa80e23c150 ("can: m_can: Introduce a tx_fifo_in_flight counter") Signed-off-by: Antonios Salios <antonios@mwa.re> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250425111744.37604-2-antonios@mwa.re Reviewed-by: Markus Schneider-Pargmann <msp@baylibre.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
4 daysksmbd: Fix UAF in __close_file_table_idsSean Heelan
A use-after-free is possible if one thread destroys the file via __ksmbd_close_fd while another thread holds a reference to it. The existing checks on fp->refcount are not sufficient to prevent this. The fix takes ft->lock around the section which removes the file from the file table. This prevents two threads acquiring the same file pointer via __close_file_table_ids, as well as the other functions which retrieve a file from the IDR and which already use this same lock. Cc: stable@vger.kernel.org Signed-off-by: Sean Heelan <seanheelan@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
4 daysksmbd: prevent out-of-bounds stream writes by validating *posNorbert Szetei
ksmbd_vfs_stream_write() did not validate whether the write offset (*pos) was within the bounds of the existing stream data length (v_len). If *pos was greater than or equal to v_len, this could lead to an out-of-bounds memory write. This patch adds a check to ensure *pos is less than v_len before proceeding. If the condition fails, -EINVAL is returned. Cc: stable@vger.kernel.org Signed-off-by: Norbert Szetei <norbert@doyensec.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
4 daysdm: fix copying after src array boundariesTudor Ambarus
The blammed commit copied to argv the size of the reallocated argv, instead of the size of the old_argv, thus reading and copying from past the old_argv allocated memory. Following BUG_ON was hit: [ 3.038929][ T1] kernel BUG at lib/string_helpers.c:1040! [ 3.039147][ T1] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP ... [ 3.056489][ T1] Call trace: [ 3.056591][ T1] __fortify_panic+0x10/0x18 (P) [ 3.056773][ T1] dm_split_args+0x20c/0x210 [ 3.056942][ T1] dm_table_add_target+0x13c/0x360 [ 3.057132][ T1] table_load+0x110/0x3ac [ 3.057292][ T1] dm_ctl_ioctl+0x424/0x56c [ 3.057457][ T1] __arm64_sys_ioctl+0xa8/0xec [ 3.057634][ T1] invoke_syscall+0x58/0x10c [ 3.057804][ T1] el0_svc_common+0xa8/0xdc [ 3.057970][ T1] do_el0_svc+0x1c/0x28 [ 3.058123][ T1] el0_svc+0x50/0xac [ 3.058266][ T1] el0t_64_sync_handler+0x60/0xc4 [ 3.058452][ T1] el0t_64_sync+0x1b0/0x1b4 [ 3.058620][ T1] Code: f800865e a9bf7bfd 910003fd 941f48aa (d4210000) [ 3.058897][ T1] ---[ end trace 0000000000000000 ]--- [ 3.059083][ T1] Kernel panic - not syncing: Oops - BUG: Fatal exception Fix it by copying the size of src, and not the size of dst, as it was. Fixes: 5a2a6c428190 ("dm: always update the array size in realloc_argv on success") Cc: stable@vger.kernel.org Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
4 daysdrm/panel: simple: Update timings for AUO G101EVN010Kevin Baker
Switch to panel timings based on datasheet for the AUO G101EVN01.0 LVDS panel. Default timings were tested on the panel. Previous mode-based timings resulted in horizontal display shift. Signed-off-by: Kevin Baker <kevinb@ventureresearch.com> Fixes: 4fb86404a977 ("drm/panel: simple: Add AUO G101EVN010 panel support") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250505170256.1385113-1-kevinb@ventureresearch.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20250505170256.1385113-1-kevinb@ventureresearch.com
4 daysnet: ethernet: mtk_eth_soc: do not reset PSE when setting FEFrank Wunderlich
Remove redundant PSE reset. When setting FE register there is no need to reset PSE, doing so may cause FE to work abnormal. Link: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/3a5223473e086a4b54a2b9a44df7d9ddcc2bc75a Fixes: dee4dd10c79aa ("net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Link: https://patch.msgid.link/18f0ac7d83f82defa3342c11ef0d1362f6b81e88.1746406763.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysnet: ethernet: mtk_eth_soc: reset all TX queues on DMA freeDaniel Golle
The purpose of resetting the TX queue is to reset the byte and packet count as well as to clear the software flow control XOFF bit. MediaTek developers pointed out that netdev_reset_queue would only resets queue 0 of the network device. Queues that are not reset may cause unexpected issues. Packets may stop being sent after reset and "transmit timeout" log may be displayed. Import fix from MediaTek's SDK to resolve this issue. Link: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/319c0d9905579a46dc448579f892f364f1f84818 Fixes: f63959c7eec31 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/c9ff9adceac4f152239a0f65c397f13547639175.1746406763.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysarm64: cpufeature: Move arm64_use_ng_mappings to the .data section to ↵Yeoreum Yun
prevent wrong idmap generation The PTE_MAYBE_NG macro sets the nG page table bit according to the value of "arm64_use_ng_mappings". This variable is currently placed in the .bss section. create_init_idmap() is called before the .bss section initialisation which is done in early_map_kernel(). Therefore, data/test_prot in create_init_idmap() could be set incorrectly through the PAGE_KERNEL -> PROT_DEFAULT -> PTE_MAYBE_NG macros. # llvm-objdump-21 --syms vmlinux-gcc | grep arm64_use_ng_mappings ffff800082f242a8 g O .bss 0000000000000001 arm64_use_ng_mappings The create_init_idmap() function disassembly compiled with llvm-21: // create_init_idmap() ffff80008255c058: d10103ff sub sp, sp, #0x40 ffff80008255c05c: a9017bfd stp x29, x30, [sp, #0x10] ffff80008255c060: a90257f6 stp x22, x21, [sp, #0x20] ffff80008255c064: a9034ff4 stp x20, x19, [sp, #0x30] ffff80008255c068: 910043fd add x29, sp, #0x10 ffff80008255c06c: 90003fc8 adrp x8, 0xffff800082d54000 ffff80008255c070: d280e06a mov x10, #0x703 // =1795 ffff80008255c074: 91400409 add x9, x0, #0x1, lsl #12 // =0x1000 ffff80008255c078: 394a4108 ldrb w8, [x8, #0x290] ------------- (1) ffff80008255c07c: f2e00d0a movk x10, #0x68, lsl #48 ffff80008255c080: f90007e9 str x9, [sp, #0x8] ffff80008255c084: aa0103f3 mov x19, x1 ffff80008255c088: aa0003f4 mov x20, x0 ffff80008255c08c: 14000000 b 0xffff80008255c08c <__pi_create_init_idmap+0x34> ffff80008255c090: aa082d56 orr x22, x10, x8, lsl #11 -------- (2) Note (1) is loading the arm64_use_ng_mappings value in w8 and (2) is set the text or data prot with the w8 value to set PTE_NG bit. If the .bss section isn't initialized, x8 could include a garbage value and generate an incorrect mapping. Annotate arm64_use_ng_mappings as __read_mostly so that it is placed in the .data section. Fixes: 84b04d3e6bdb ("arm64: kernel: Create initial ID map from C code") Cc: stable@vger.kernel.org # 6.9.x Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Link: https://lore.kernel.org/r/20250502180412.3774883-1-yeoreum.yun@arm.com [catalin.marinas@arm.com: use __read_mostly instead of __ro_after_init] [catalin.marinas@arm.com: slight tweaking of the code comment] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
4 daysdrm/ttm: Remove the struct ttm_backup abstractionThomas Hellström
The abstraction was previously added to support separate ttm_backup implementations. However with the current implementation casting from a struct file to a struct ttm_backup, we run into trouble since struct file may have randomized the layout and gcc complains. Remove the struct ttm_backup abstraction Cc: dri-devel@lists.freedesktop.org Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Kees Cook <kees@kernel.org> Closes: https://lore.kernel.org/dri-devel/9c8dbbafdaf9f3f089da2cde5a772d69579b3795.camel@linux.intel.com/T/#mb153ab9216cb813b92bdeb36f391ad4808c2ba29 Suggested-by: Christian König <christian.koenig@amd.com> Fixes: 70d645deac98 ("drm/ttm: Add helpers for shrinking") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250502130014.3156-1-thomas.hellstrom@linux.intel.com
4 daysdrm/ttm: Fix ttm_backup kerneldocThomas Hellström
The docs were not properly updated from an earlier version of the code. Fixes: e7b5d23e5d47 ("drm/ttm: Provide a shmem backup implementation") Cc: Christian König <christian.koenig@amd.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250502130101.3185-1-thomas.hellstrom@linux.intel.com
4 daystools: ynl-gen: validate 0 len strings from kernelDavid Wei
Strings from the kernel are guaranteed to be null terminated and ynl_attr_validate() checks for this. But it doesn't check if the string has a len of 0, which would cause problems when trying to access data[len - 1]. Fix this by checking that len is positive. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503043050.861238-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysMerge branch 'selftests-drv-net-fix-ping-py-test-failure'Jakub Kicinski
Mohsin Bashir says: ==================== selftests: drv: net: fix `ping.py` test failure Fix `ping.py` test failure on an ipv6 system, and appropriately handle the cases where either one of the two address families (ipv4, ipv6) is not present. ==================== Link: https://patch.msgid.link/20250503013518.1722913-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysselftests: drv: net: add version indicatorMohsin Bashir
Currently, the test result does not differentiate between the cases when either one of the address families are configured or if both the address families are configured. Ideally, the result should report if a particular case was skipped. ./drivers/net/ping.py TAP version 13 1..7 ok 1 ping.test_default_v4 # SKIP Test requires IPv4 connectivity ok 2 ping.test_default_v6 ok 3 ping.test_xdp_generic_sb ok 4 ping.test_xdp_generic_mb ok 5 ping.test_xdp_native_sb ok 6 ping.test_xdp_native_mb ok 7 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:5 fail:0 xfail:0 xpass:0 skip:2 error:0 Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503013518.1722913-4-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysselftests: drv: net: avoid skipping testsMohsin Bashir
On a system with either of the ipv4 or ipv6 information missing, tests are currently skipped. Ideally, the test should run as long as at least one address family is present. This patch make test run whenever possible. Before: ./drivers/net/ping.py TAP version 13 1..6 ok 1 ping.test_default # SKIP Test requires IPv4 connectivity ok 2 ping.test_xdp_generic_sb # SKIP Test requires IPv4 connectivity ok 3 ping.test_xdp_generic_mb # SKIP Test requires IPv4 connectivity ok 4 ping.test_xdp_native_sb # SKIP Test requires IPv4 connectivity ok 5 ping.test_xdp_native_mb # SKIP Test requires IPv4 connectivity ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:0 fail:0 xfail:0 xpass:0 skip:6 error:0 After: ./drivers/net/ping.py TAP version 13 1..6 ok 1 ping.test_default ok 2 ping.test_xdp_generic_sb ok 3 ping.test_xdp_generic_mb ok 4 ping.test_xdp_native_sb ok 5 ping.test_xdp_native_mb ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:5 fail:0 xfail:0 xpass:0 skip:1 error:0 Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20250503013518.1722913-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysselftests: drv: net: fix test failure on ipv6 sysMohsin Bashir
The `get_interface_info` call has ip version hard-coded which leads to failures on an IPV6 system. The NetDrvEnv class already gathers information about remote interface, so instead of fixing the local implementation switch to using cfg.remote_ifname. Before: ./drivers/net/ping.py Traceback (most recent call last): File "/new_tests/./drivers/net/ping.py", line 217, in <module> main() File "/new_tests/./drivers/net/ping.py", line 204, in main get_interface_info(cfg) File "/new_tests/./drivers/net/ping.py", line 128, in get_interface_info raise KsftFailEx('Can not get remote interface') net.lib.py.ksft.KsftFailEx: Can not get remote interface After: ./drivers/net/ping.py TAP version 13 1..6 ok 1 ping.test_default # SKIP Test requires IPv4 connectivity ok 2 ping.test_xdp_generic_sb # SKIP Test requires IPv4 connectivity ok 3 ping.test_xdp_generic_mb # SKIP Test requires IPv4 connectivity ok 4 ping.test_xdp_native_sb # SKIP Test requires IPv4 connectivity ok 5 ping.test_xdp_native_mb # SKIP Test requires IPv4 connectivity ok 6 ping.test_xdp_offload # SKIP device does not support offloaded XDP Totals: pass:0 fail:0 xfail:0 xpass:0 skip:6 error:0 Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250503013518.1722913-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysMerge branch 'gre-reapply-ipv6-link-local-address-generation-fix'Jakub Kicinski
Guillaume Nault says: ==================== gre: Reapply IPv6 link-local address generation fix. Reintroduce the IPv6 link-local address generation fix for GRE and its kernel selftest. These patches were introduced by merge commit b3fc5927de4b ("Merge branch 'gre-fix-regressions-in-ipv6-link-local-address-generation'") but have been reverted by commit 8417db0be5bb ("Merge branch 'gre-revert-ipv6-link-local-address-fix'"), because it uncovered another bug in multipath routing. Now that this bug has been investigated and fixed, we can apply the GRE link-local address fix and its kernel selftest again. For convenience, here's the original cover letter: IPv6 link-local address generation has some special cases for GRE devices. This has led to several regressions in the past, and some of them are still not fixed. This series fixes the remaining problems, like the ipv6.conf.<dev>.addr_gen_mode sysctl being ignored and the router discovery process not being started (see details in patch 1). To avoid any further regressions, patch 2 adds selftests covering IPv4 and IPv6 gre/gretap devices with all combinations of currently supported addr_gen_mode values. ==================== Link: https://patch.msgid.link/cover.1746225213.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysselftests: Add IPv6 link-local address generation tests for GRE devices.Guillaume Nault
GRE devices have their special code for IPv6 link-local address generation that has been the source of several regressions in the past. Add selftest to check that all gre, ip6gre, gretap and ip6gretap get an IPv6 link-link local address in accordance with the net.ipv6.conf.<dev>.addr_gen_mode sysctl. Note: This patch was originally applied as commit 6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices."). However, it was then reverted by commit 355d940f4d5a ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."") because the commit it depended on was going to be reverted. Now that the situation is resolved, we can add this selftest again (no changes since original patch, appart from context update in tools/testing/selftests/net/Makefile). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/2c3a5733cb3a6e3119504361a9b9f89fda570a2d.1746225214.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysgre: Fix again IPv6 link-local address generation.Guillaume Nault
Use addrconf_addr_gen() to generate IPv6 link-local addresses on GRE devices in most cases and fall back to using add_v4_addrs() only in case the GRE configuration is incompatible with addrconf_addr_gen(). GRE used to use addrconf_addr_gen() until commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") restricted this use to gretap and ip6gretap devices, and created add_v4_addrs() (borrowed from SIT) for non-Ethernet GRE ones. The original problem came when commit 9af28511be10 ("addrconf: refuse isatap eui64 for INADDR_ANY") made __ipv6_isatap_ifid() fail when its addr parameter was 0. The commit says that this would create an invalid address, however, I couldn't find any RFC saying that the generated interface identifier would be wrong. Anyway, since gre over IPv4 devices pass their local tunnel address to __ipv6_isatap_ifid(), that commit broke their IPv6 link-local address generation when the local address was unspecified. Then commit e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") tried to fix that case by defining add_v4_addrs() and calling it to generate the IPv6 link-local address instead of using addrconf_addr_gen() (apart for gretap and ip6gretap devices, which would still use the regular addrconf_addr_gen(), since they have a MAC address). That broke several use cases because add_v4_addrs() isn't properly integrated into the rest of IPv6 Neighbor Discovery code. Several of these shortcomings have been fixed over time, but add_v4_addrs() remains broken on several aspects. In particular, it doesn't send any Router Sollicitations, so the SLAAC process doesn't start until the interface receives a Router Advertisement. Also, add_v4_addrs() mostly ignores the address generation mode of the interface (/proc/sys/net/ipv6/conf/*/addr_gen_mode), thus breaking the IN6_ADDR_GEN_MODE_RANDOM and IN6_ADDR_GEN_MODE_STABLE_PRIVACY cases. Fix the situation by using add_v4_addrs() only in the specific scenario where the normal method would fail. That is, for interfaces that have all of the following characteristics: * run over IPv4, * transport IP packets directly, not Ethernet (that is, not gretap interfaces), * tunnel endpoint is INADDR_ANY (that is, 0), * device address generation mode is EUI64. In all other cases, revert back to the regular addrconf_addr_gen(). Also, remove the special case for ip6gre interfaces in add_v4_addrs(), since ip6gre devices now always use addrconf_addr_gen() instead. Note: This patch was originally applied as commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation."). However, it was then reverted by commit fc486c2d060f ("Revert "gre: Fix IPv6 link-local address generation."") because it uncovered another bug that ended up breaking net/forwarding/ip6gre_custom_multipath_hash.sh. That other bug has now been fixed by commit 4d0ab3a6885e ("ipv6: Start path selection from the first nexthop"). Therefore we can now revive this GRE patch (no changes since original commit 183185a18ff9 ("gre: Fix IPv6 link-local address generation."). Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/a88cc5c4811af36007645d610c95102dccb360a6.1746225214.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysdt-bindings: net: ethernet-controller: Add informative text about RGMII delaysAndrew Lunn
Device Tree and Ethernet MAC driver writers often misunderstand RGMII delays. Rewrite the Normative section in terms of the PCB, is the PCB adding the 2ns delay. This meaning was previous implied by the definition, but often wrongly interpreted due to the ambiguous wording and looking at the definition from the wrong perspective. The new definition concentrates clearly on the hardware, and should be less ambiguous. Add an Informative section to the end of the binding describing in detail what the four RGMII delays mean. This expands on just the PCB meaning, adding in the implications for the MAC and PHY. Additionally, when the MAC or PHY needs to add a delay, which is software configuration, describe how Linux does this, in the hope of reducing errors. Make it clear other users of device tree binding may implement the software configuration in other ways while still conforming to the binding. Fixes: 9d3de3c58347 ("dt-bindings: net: Add YAML schemas for the generic Ethernet options") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250430-v6-15-rc3-net-rgmii-delays-v2-1-099ae651d5e5@lunn.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysvirtio-net: free xsk_buffs on error in virtnet_xsk_pool_enable()Jakub Kicinski
The selftests added to our CI by Bui Quang Minh recently reveals that there is a mem leak on the error path of virtnet_xsk_pool_enable(): unreferenced object 0xffff88800a68a000 (size 2048): comm "xdp_helper", pid 318, jiffies 4294692778 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): __kvmalloc_node_noprof+0x402/0x570 virtnet_xsk_pool_enable+0x293/0x6a0 (drivers/net/virtio_net.c:5882) xp_assign_dev+0x369/0x670 (net/xdp/xsk_buff_pool.c:226) xsk_bind+0x6a5/0x1ae0 __sys_bind+0x15e/0x230 __x64_sys_bind+0x72/0xb0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Acked-by: Jason Wang <jasowang@redhat.com> Fixes: e9f3962441c0 ("virtio_net: xsk: rx: support fill with xsk buffer") Link: https://patch.msgid.link/20250430163836.3029761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysvirtio-net: don't re-enable refill work too early when NAPI is disabledJakub Kicinski
Commit 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx") fixed a deadlock between reconfig paths and refill work trying to disable the same NAPI instance. The refill work can't run in parallel with reconfig because trying to double-disable a NAPI instance causes a stall under the instance lock, which the reconfig path needs to re-enable the NAPI and therefore unblock the stalled thread. There are two cases where we re-enable refill too early. One is in the virtnet_set_queues() handler. We call it when installing XDP: virtnet_rx_pause_all(vi); ... virtnet_napi_tx_disable(..); ... virtnet_set_queues(..); ... virtnet_rx_resume_all(..); We want the work to be disabled until we call virtnet_rx_resume_all(), but virtnet_set_queues() kicks it before NAPIs were re-enabled. The other case is a more trivial case of mis-ordering in __virtnet_rx_resume() found by code inspection. Taking the spin lock in virtnet_set_queues() (requested during review) may be unnecessary as we are under rtnl_lock and so are all paths writing to ->refill_enabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com> Fixes: 4bc12818b363 ("virtio-net: disable delayed refill when pausing rx") Fixes: 413f0271f396 ("net: protect NAPI enablement with netdev_lock()") Link: https://patch.msgid.link/20250430163758.3029367-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysMerge branch 'net_sched-fix-a-regression-in-sch_htb'Jakub Kicinski
Cong Wang says: ==================== net_sched: fix a regression in sch_htb This patchset contains a fix for the regression reported by Alan and a selftest to cover that case. Please see each patch description for more details. ==================== Link: https://patch.msgid.link/20250428232955.1740419-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysselftests/tc-testing: Add a test case to cover basic HTB+FQ_CODEL caseCong Wang
Integrate the reproducer from Alan into TC selftests and use scapy to generate TCP traffic instead of relying on ping command. Cc: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250428232955.1740419-3-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 dayssch_htb: make htb_deactivate() idempotentCong Wang
Alan reported a NULL pointer dereference in htb_next_rb_node() after we made htb_qlen_notify() idempotent. It turns out in the following case it introduced some regression: htb_dequeue_tree(): |-> fq_codel_dequeue() |-> qdisc_tree_reduce_backlog() |-> htb_qlen_notify() |-> htb_deactivate() |-> htb_next_rb_node() |-> htb_deactivate() For htb_next_rb_node(), after calling the 1st htb_deactivate(), the clprio[prio]->ptr could be already set to NULL, which means htb_next_rb_node() is vulnerable here. For htb_deactivate(), although we checked qlen before calling it, in case of qlen==0 after qdisc_tree_reduce_backlog(), we may call it again which triggers the warning inside. To fix the issues here, we need to: 1) Make htb_deactivate() idempotent, that is, simply return if we already call it before. 2) Make htb_next_rb_node() safe against ptr==NULL. Many thanks to Alan for testing and for the reproducer. Fixes: 5ba8b837b522 ("sch_htb: make htb_qlen_notify() idempotent") Reported-by: Alan J. Wylie <alan@wylie.me.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250428232955.1740419-2-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysbcachefs: Call bch2_fs_start before getting vfs superblockKent Overstreet
This reverts 1fdbe0b184c8 bcachefs: Make sure c->vfs_sb is set before starting fs switched up bch2_fs_get_tree() so that we got a superblock before calling bch2_fs_start, so that c->vfs_sb would always be initialized while the filesystem was active. This turned out not to be necessary, because blk_holder_ops were implemented using our own locking, not vfs locking. And this had the side effect of creating a super_block and doing our full recovery (including potentially fsck) before setting SB_BORN, which causes things like sync calls to hang until our recovery is finished. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 daysbcachefs: fix hung task timeout in journal readKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 daysbcachefs: Add missing barriers before wake_up_bit()Kent Overstreet
wake_up() doesn't require a barrier - but wake_up_bit() does. This only affected non x86, and primarily lead to lost wakeups after btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 daysbcachefs: Ensure proper write alignmentKent Overstreet
There was a buggy version of bcachefs-tools which picked misaligned bucket sizes when formatting, and we're also about to do dynamic block sizes - which will allow picking logical block size or physical block size of the device per-write, allowing for better compression ratios at the cost of slightly worse write performance (i.e. forcing the device to do RMW or extra buffering). To account for this, tweak bch2_alloc_sectors_start() to properly align open_buckets to the blocksize of the write we're about to do. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 daysbcachefs: Improve want_cached_ptr()Kent Overstreet
If promote target isn't set, rebalance should still leave a cached copy on the faster device. Fall back to foreground_target if it's set, or allow a cached copy on any device if neither are set. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>