Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
- Do not free "head" variable in filter_free_subsystem_filters()
The first error path jumps to "free_now" label but first frees the
newly allocated "head" variable. But the "free_now" code checks this
variable, and if it is not NULL, it will iterate the list. As this
list variable was already initialized, the "free_now" code will not
do anything as it is empty. But freeing it will cause a UAF bug.
The error path should simply jump to the "free_now" label and leave
the "head" variable alone.
* tag 'trace-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Do not free "head" on error path of filter_free_subsystem_filters()
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Rework of system register accessors for system registers that are
directly writen to memory, so that sanitisation of the in-memory
value happens at the correct time (after the read, or before the
write). For convenience, RMW-style accessors are also provided.
- Multiple fixes for the so-called "arch-timer-edge-cases' selftest,
which was always broken.
x86:
- Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to
pass only the "untouched" addresses and flipping the shared/private
bit in the implementation.
- Disable SEV-SNP support on initialization failure
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY
KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases
KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases
KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases
KVM: arm64: selftests: Fix help text for arch_timer_edge_cases
KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand
KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg
KVM: arm64: Add RMW specific sysreg accessor
KVM: arm64: Add assignment-specific sysreg accessor
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a broken self-test in hkdf (new regression)"
* tag 'v6.16-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: hkdf - move to late_initcall
|
|
Pull bcachefs fixes from Kent Overstreet:
"As usual, highlighting the ones users have been noticing:
- Fix a small issue with has_case_insensitive not being propagated on
snapshot creation; this led to fsck errors, which we're harmless
because we're not using this flag yet (it's for overlayfs +
casefolding).
- Log the error being corrected in the journal when we're doing fsck
repair: this was one of the "lessons learned" from the i_nlink 0 ->
subvolume deletion bug, where reconstructing what had happened by
analyzing the journal was a bit more difficult than it needed to
be.
- Don't schedule btree node scan to run in the superblock: this fixes
a regression from the 6.16 recovery passes rework, and let to it
running unnecessarily.
The real issue here is that we don't have online, "self healing"
style topology repair yet: topology repair currently has to run
before we go RW, which means that we may schedule it unnecessarily
after a transient error. This will be fixed in the future.
- We now track, in btree node flags, the reason it was scheduled to
be rewritten. We discovered a deadlock in recovery when many btree
nodes need to be rewritten because they're degraded: fully fixing
this will take some work but it's now easier to see what's going
on.
For the bug report where this came up, a device had been kicked RO
due to transient errors: manually setting it back to RW was
sufficient to allow recovery to succeed.
- Mark a few more fsck errors as autofix: as a reminder to users,
please do keep reporting cases where something needs to be repaired
and is not repaired automatically (i.e. cases where -o fix_errors
or fsck -y is required).
- rcu_pending.c now works with PREEMPT_RT
- 'bcachefs device add', then umount, then remount wasn't working -
we now emit a uevent so that the new device's new superblock is
correctly picked up
- Assorted repair fixes: btree node scan will no longer incorrectly
update sb->version_min,
- Assorted syzbot fixes"
* tag 'bcachefs-2025-06-12' of git://evilpiepirate.org/bcachefs: (23 commits)
bcachefs: Don't trace should_be_locked unless changing
bcachefs: Ensure that snapshot creation propagates has_case_insensitive
bcachefs: Print devices we're mounting on multi device filesystems
bcachefs: Don't trust sb->nr_devices in members_to_text()
bcachefs: Fix version checks in validate_bset()
bcachefs: ioctl: avoid stack overflow warning
bcachefs: Don't pass trans to fsck_err() in gc_accounting_done
bcachefs: Fix leak in bch2_fs_recovery() error path
bcachefs: Fix rcu_pending for PREEMPT_RT
bcachefs: Fix downgrade_table_extra()
bcachefs: Don't put rhashtable on stack
bcachefs: Make sure opts.read_only gets propagated back to VFS
bcachefs: Fix possible console lock involved deadlock
bcachefs: mark more errors autofix
bcachefs: Don't persistently run scan_for_btree_nodes
bcachefs: Read error message now prints if self healing
bcachefs: Only run 'increase_depth' for keys from btree node csan
bcachefs: Mark need_discard_freespace_key_bad autofix
bcachefs: Update /dev/disk/by-uuid on device add
bcachefs: Add more flags to btree nodes for rewrite reason
...
|
|
Pull bitmap fix from Yury Norov:
"Fix for __GENMASK() and __GENMASK_ULL() in UAPI"
* tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux:
uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Add some missing pins on the Qualcomm QCM2290, along with a managed
resources patch that make it clean and nice
- Drop an unused function in the ST Micro driver
- Drop bouncing MAINTAINER entry
- Drop of_match_ptr() macro to rid compile warnings in the TB10x
driver
- Fix up calculation of pin numbers from base in the Sunxi driver
* tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: dt: Consider pin base when calculating bank number from pin
pinctrl: tb10x: Drop of_match_ptr for ID table
pinctrl: MAINTAINERS: Drop bouncing Jianlong Huang
pinctrl: st: Drop unused st_gpio_bank() function
pinctrl: qcom: pinctrl-qcm2290: Add missing pins
pinctrl: qcom: switch to devm_gpiochip_add_data()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- arch_atomic64_cmpxchg relaxed variant [Jason]
- use of inbuilt swap in stack unwinder [Yu-Chun Lin]
- use of __ASSEMBLER__ in kernel headers [Thomas Huth]
* tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headers
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
ARC: unwind: Use built-in sort swap to reduce code size and improve performance
ARC: atomics: Implement arch_atomic64_cmpxchg using _relaxed
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Another quick round of updates:
- revert mwifiex HT40 that was causing issues
- many ath10k/ath11k/ath12k fixes
- re-add some iwlwifi code I lost in a merge
- use kfree_sensitive() on an error path in cfg80211
* tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: use kfree_sensitive() for connkeys cleanup
wifi: iwlwifi: fix merge damage related to iwl_pci_resume
Revert "wifi: mwifiex: Fix HT40 bandwidth issue."
wifi: ath12k: fix uaf in ath12k_core_init()
wifi: ath12k: Fix hal_reo_cmd_status kernel-doc
wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850
wifi: ath11k: validate ath11k_crypto_mode on top of ath11k_core_qmi_firmware_ready
wifi: ath11k: consistently use ath11k_mac_get_fw_stats()
wifi: ath11k: move locking outside of ath11k_mac_get_fw_stats()
wifi: ath11k: adjust unlock sequence in ath11k_update_stats_event()
wifi: ath11k: move some firmware stats related functions outside of debugfs
wifi: ath11k: don't wait when there is no vdev started
wifi: ath11k: don't use static variables in ath11k_debugfs_fw_stats_process()
wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
wil6210: fix support for sparrow chipsets
wifi: ath10k: Avoid vdev delete timeout when firmware is already down
ath10k: snoc: fix unbalanced IRQ enable in crash recovery
====================
Link: https://patch.msgid.link/20250612082519.11447-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gal Pressman says:
====================
Fix ntuple rules targeting default RSS
This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.
The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.
This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
contexts
Patch 1 fixes the validation logic to skip the existence check for
context 0.
Patch 2 adds a selftest that verifies this behavior.
v3: https://lore.kernel.org/20250609120250.1630125-1-gal@nvidia.com
v2: https://lore.kernel.org/20250225071348.509432-1-gal@nvidia.com
====================
Link: https://patch.msgid.link/20250612071958.1696361-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
context
Add test_rss_default_context_rule() to verify that ntuple rules can
correctly direct traffic to the default RSS context (context 0).
The test creates two ntuple rules with explicit location priorities:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This validates that:
1. Rules targeting the default context function properly.
2. Traffic steering works as expected when mixing default and
additional RSS contexts.
The test was written by AI, and reviewed by humans.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Context 0 (default context) always exists, there is no need to check
whether it exists or not when adding a flow steering rule.
The existing check fails when creating a flow steering rule for context
0 as it is not stored in the rss_ctx xarray.
For example:
$ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
An example usecase for this could be:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This is a user-visible regression that was caught in our testing
environment, it was not reported by a user yet.
Fixes: de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- eir: Fix NULL pointer deference on eir_get_service_data
- eir: Fix possible crashes on eir_create_adv_data
- hci_sync: Fix broadcast/PA when using an existing instance
- ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
- ISO: Fix not using bc_sid as advertisement SID
- MGMT: Fix sparse errors
* tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix sparse errors
Bluetooth: ISO: Fix not using bc_sid as advertisement SID
Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
Bluetooth: eir: Fix possible crashes on eir_create_adv_data
Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance
Bluetooth: Fix NULL pointer deference on eir_get_service_data
====================
Link: https://patch.msgid.link/20250611204944.1559356-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before the cited commit, the kernel unconditionally embedded SCM
credentials to skb for embryo sockets even when both the sender
and listener disabled SO_PASSCRED and SO_PASSPIDFD.
Now, the credentials are added to skb only when configured by the
sender or the listener.
However, as reported in the link below, it caused a regression for
some programs that assume credentials are included in every skb,
but sometimes not now.
The only problematic scenario would be that a socket starts listening
before setting the option. Then, there will be 2 types of non-small
race window, where a client can send skb without credentials, which
the peer receives as an "invalid" message (and aborts the connection
it seems ?):
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s1.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
or
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s3, _ = s1.accept() <-- Inherit cred options
s2.send() <-- w/o cred but not set yet
s3.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
It's unfortunate that buggy programs depend on the behaviour,
but let's restore the previous behaviour.
Fixes: 3f84d577b79d ("af_unix: Inherit sk_flags at connect().")
Reported-by: Jacek Łuczak <difrost.kernel@gmail.com>
Closes: https://lore.kernel.org/all/68d38b0b-1666-4974-85d4-15575789c8d4@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Christian Heusel <christian@heusel.eu>
Tested-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Jacek Łuczak <difrost.kernel@gmail.com>
Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller created an IPv6 route from a malformed packet, which has
a prefix len > 128, triggering the splat below. [0]
This is a similar issue fixed by commit 586ceac9acb7 ("ipv6: Restore
fib6_config validation for SIOCADDRT.").
The cited commit removed fib6_config validation from some callers
of ip6_add_route().
Let's move the validation back to ip6_route_add() and
ip6_route_multipath_add().
[0]:
UBSAN: array-index-out-of-bounds in ./include/net/ipv6.h:616:34
index 20 is out of range for type '__u8 [16]'
CPU: 1 UID: 0 PID: 7444 Comm: syz.0.708 Not tainted 6.16.0-rc1-syzkaller-g19272b37aa4f #0 PREEMPT
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80078a80>] dump_backtrace+0x2e/0x3c arch/riscv/kernel/stacktrace.c:132
[<ffffffff8000327a>] show_stack+0x30/0x3c arch/riscv/kernel/stacktrace.c:138
[<ffffffff80061012>] __dump_stack lib/dump_stack.c:94 [inline]
[<ffffffff80061012>] dump_stack_lvl+0x12e/0x1a6 lib/dump_stack.c:120
[<ffffffff800610a6>] dump_stack+0x1c/0x24 lib/dump_stack.c:129
[<ffffffff8001c0ea>] ubsan_epilogue+0x14/0x46 lib/ubsan.c:233
[<ffffffff819ba290>] __ubsan_handle_out_of_bounds+0xf6/0xf8 lib/ubsan.c:455
[<ffffffff85b363a4>] ipv6_addr_prefix include/net/ipv6.h:616 [inline]
[<ffffffff85b363a4>] ip6_route_info_create+0x8f8/0x96e net/ipv6/route.c:3793
[<ffffffff85b635da>] ip6_route_add+0x2a/0x1aa net/ipv6/route.c:3889
[<ffffffff85b02e08>] addrconf_prefix_route+0x2c4/0x4e8 net/ipv6/addrconf.c:2487
[<ffffffff85b23bb2>] addrconf_prefix_rcv+0x1720/0x1e62 net/ipv6/addrconf.c:2878
[<ffffffff85b92664>] ndisc_router_discovery+0x1a06/0x3504 net/ipv6/ndisc.c:1570
[<ffffffff85b99038>] ndisc_rcv+0x500/0x600 net/ipv6/ndisc.c:1874
[<ffffffff85bc2c18>] icmpv6_rcv+0x145e/0x1e0a net/ipv6/icmp.c:988
[<ffffffff85af6798>] ip6_protocol_deliver_rcu+0x18a/0x1976 net/ipv6/ip6_input.c:436
[<ffffffff85af8078>] ip6_input_finish+0xf4/0x174 net/ipv6/ip6_input.c:480
[<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<ffffffff85af8262>] ip6_input+0x16a/0x70c net/ipv6/ip6_input.c:491
[<ffffffff85af8dcc>] ip6_mc_input+0x5c8/0x1268 net/ipv6/ip6_input.c:588
[<ffffffff85af6112>] dst_input include/net/dst.h:469 [inline]
[<ffffffff85af6112>] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
[<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<ffffffff85af6112>] ipv6_rcv+0x5ae/0x6e0 net/ipv6/ip6_input.c:309
[<ffffffff85087e84>] __netif_receive_skb_one_core+0x106/0x16e net/core/dev.c:5977
[<ffffffff85088104>] __netif_receive_skb+0x2c/0x144 net/core/dev.c:6090
[<ffffffff850883c6>] netif_receive_skb_internal net/core/dev.c:6176 [inline]
[<ffffffff850883c6>] netif_receive_skb+0x1aa/0xbf2 net/core/dev.c:6235
[<ffffffff8328656e>] tun_rx_batched.isra.0+0x430/0x686 drivers/net/tun.c:1485
[<ffffffff8329ed3a>] tun_get_user+0x2952/0x3d6c drivers/net/tun.c:1938
[<ffffffff832a21e0>] tun_chr_write_iter+0xc4/0x21c drivers/net/tun.c:1984
[<ffffffff80b9b9ae>] new_sync_write fs/read_write.c:593 [inline]
[<ffffffff80b9b9ae>] vfs_write+0x56c/0xa9a fs/read_write.c:686
[<ffffffff80b9c2be>] ksys_write+0x126/0x228 fs/read_write.c:738
[<ffffffff80b9c42e>] __do_sys_write fs/read_write.c:749 [inline]
[<ffffffff80b9c42e>] __se_sys_write fs/read_write.c:746 [inline]
[<ffffffff80b9c42e>] __riscv_sys_write+0x6e/0x94 fs/read_write.c:746
[<ffffffff80076912>] syscall_handler+0x94/0x118 arch/riscv/include/asm/syscall.h:112
[<ffffffff8637e31e>] do_trap_ecall_u+0x396/0x530 arch/riscv/kernel/traps.c:341
[<ffffffff863a69e2>] handle_exception+0x146/0x152 arch/riscv/kernel/entry.S:197
Fixes: fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().")
Reported-by: syzbot+4c2358694722d304c44e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6849b8c3.a00a0220.1eb5f5.00f0.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611193551.2999991-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netdevsim supports netpoll. Make sure we don't call napi_complete()
from it, since it may not be scheduled. Breno reports hitting a
warning in napi_complete_done():
WARNING: CPU: 14 PID: 104 at net/core/dev.c:6592 napi_complete_done+0x2cc/0x560
__napi_poll+0x2d8/0x3a0
handle_softirqs+0x1fe/0x710
This is presumably after netpoll stole the SCHED bit prematurely.
Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250611174643.2769263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails.
Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The veth peer device is RCU protected, but when the peer device gets
deleted (veth_dellink) then the pointer is assigned NULL (via
RCU_INIT_POINTER).
This patch adds a necessary NULL check in veth_xdp_rcv when accessing
the veth peer net_device.
This fixes a bug introduced in commit dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops"). The bug is a race
and only triggers when having inflight packets on a veth that is being
deleted.
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Closes: https://lore.kernel.org/all/fecfcad0-7a16-42b8-bff2-66ee83a6e5c4@linux.dev/
Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/683da55e.a00a0220.d8eae.0052.GAE@google.com/
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net_sched: no longer use qdisc_tree_flush_backlog()
This series is based on a report from Gerrard Tai.
Essentially, all users of qdisc_tree_flush_backlog() are racy.
We must instead use qdisc_purge_queue().
====================
Link: https://patch.msgid.link/20250611111515.1983366-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This function is no longer used after the four prior fixes.
Given all prior uses were wrong, it seems better to remove it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: 0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: 7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.
We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.
Tested:
tc qd add dev lo root sfq perturb -10 # negative value : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb 1000000000 # too big : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb 2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When performing a non-exact phy_caps lookup, we are looking for a
supported mode that matches as closely as possible the passed speed/duplex.
Blamed patch broke that logic by returning a match too early in case
the caller asks for half-duplex, as a full-duplex linkmode may match
first, and returned as a non-exact match without even trying to mach on
half-duplex modes.
Reported-by: Jijie Shao <shaojijie@huawei.com>
Closes: https://lore.kernel.org/netdev/20250603102500.4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576
Tested-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Fixes: fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only let userspace pass the same addresses that were used in KVM_SET_USER_MEMORY_REGION
(or KVM_SET_USER_MEMORY_REGION2); gpas in the the upper half of the address space
are an implementation detail of TDX and KVM.
Extracted from a patch by Sean Christopherson <seanjc@google.com>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bug[*] reported for TDX case when enabling KVM_PRE_FAULT_MEMORY in QEMU.
It turns out that @gpa passed to kvm_mmu_do_page_fault() doesn't have
shared bit set when the memory attribute of it is shared, and it leads
to wrong root in tdp_mmu_get_root_for_fault().
Fix it by embedding the direct bits in the gpa that is passed to
kvm_tdp_map_page(), when the memory of the gpa is not private.
[*] https://lore.kernel.org/qemu-devel/4a757796-11c2-47f1-ae0d-335626e818fd@intel.com/
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Closes: https://lore.kernel.org/qemu-devel/4a757796-11c2-47f1-ae0d-335626e818fd@intel.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250611001018.2179964-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We normally can't create a new directory with the case-insensitive
option already set - except when we're creating a snapshot.
And if casefolding is enabled filesystem wide, we should still set it
even though not strictly required, for consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we only ever logged the filesystem UUID.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We have to be able to print superblock sections even if they fail to
validate (for debugging), so we have to calculate the number of entries
from the field size.
Reported-by: syzbot+5138f00559ffb3cb3610@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It seems btree node scan picked up a partially overwritten btree node,
and corrected the "bset version older than sb version_min" error -
resulting in an invalid superblock with a bad version_min field.
Don't run this check at all when we're in btree node scan, and when we
do run it, do something saner if the bset version is totally crazy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Multiple ioctl handlers individually use a lot of stack space, and clang chooses
to inline them into the bch2_fs_ioctl() function, blowing through the warning
limit:
fs/bcachefs/chardev.c:655:6: error: stack frame size (1032) exceeds limit (1024) in 'bch2_fs_ioctl' [-Werror,-Wframe-larger-than]
655 | long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg)
By marking the largest two of them as noinline_for_stack, no indidual code path
ends up using this much, which avoids the warning and reduces the possible
total stack usage in the ioctl handler.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck_err() can return a transaction restart if passed a transaction
object - this has always been true when it has to drop locks to prompt
for user input, but we're seeing this more now that we're logging the
error being corrected in the journal.
gc_accounting_done() doesn't call fsck_err() from an actual commit loop,
and it doesn't need to be holding btree locks when it calls fsck_err(),
so the easy fix here for the unhandled transaction restart is to just
not pass it the transaction object. We'll miss out on the fancy new
logging, but that's ok.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a small leak of the superblock 'clean' section.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
PREEMPT_RT redefines how standard spinlocks work, so local_irq_save() +
spin_lock() is no longer equivalent to spin_lock_irqsave(). Fortunately,
we don't strictly need to do it that way.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a UAF: we were calling darray_make_room() and retaining a pointer to
the old buffer.
And fix an UBSAN warning: struct bch_sb_field_downgrade_entry uses
__counted_by, so set dst->nr_errors before assigning to the array entry.
Reported-by: syzbot+14c52d86ddbd89bea13e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Object debugging generally needs special provisions for putting said
objects on the stack, which rhashtable does not have.
Reported-by: syzbot+bcc38a9556d0324c2ec2@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we think we're read-only but the VFS doesn't, fun will ensue.
And now that we know we have to be able to do this safely, just make
nochanges imply ro.
Reported-by: syzbot+a7d6ceaba099cc21dee4@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Link: https://lore.kernel.org/all/6822ab02.050a0220.f2294.00cb.GAE@google.com/T/
Reported-by: syzbot+2c3ef91c9523c3d1a25c@syzkaller.appspotmail.com
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_lost_data() gets called on btree node read error, but the
error might be transient.
btree_node_scan is expensive, and there's no need to run it persistently
(marking it in the superblock as required to run) - check_topology
will run it if required, via bch2_get_scanned_nodes().
Running it non-persistently is fine, to avoid check_topology having to
rewind recovery to run it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_increase_depth() was originally for disaster recovery, to get
some data back from the journal when a btree root was bad.
We don't need it for that purpose anymore; on bad btree root we'll
launch btree node scan and reconstruct all the interior nodes.
If there's a key in the journal for a depth that doesn't exists, and
it's not from check_topology/btree node scan, we should just ignore it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Invalidate pagecache after we write the new superblock and send a
uevent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It seems excessive forced btree node rewrites can cause interior btree
updates to become wedged during recovery, before we're using the write
buffer for backpointer updates.
Add more flags so we can determine where these are coming from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We had a deadlock during recovery where interior btree updates became
wedged and all open_buckets were consumed; start adding more
introspection.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Log the specific error being corrected in the journal when we're
repairing, this helps greatly with 'bcachefs list_journal' analysis.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|