Age | Commit message (Collapse) | Author |
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix regression disallowing 64K SVM migration (Maarten)
- Use a bounce buffer for WA BB (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aEsBQoh5Si3ouPgE@fedora
|
|
Pull bitmap fix from Yury Norov:
"Fix for __GENMASK() and __GENMASK_ULL() in UAPI"
* tag 'bitmap-for-6.16-rc2' of https://github.com/norov/linux:
uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again
|
|
Currently, cached directory contents were not reused across subsequent
'ls' operations because the cache validity check relied on comparing
the ctx pointer, which changes with each readdir invocation. As a
result, the cached dir entries was not marked as valid and the cache was
not utilized for subsequent 'ls' operations.
This change uses the file pointer, which remains consistent across all
readdir calls for a given directory instance, to associate and validate
the cache. As a result, cached directory contents can now be
correctly reused, improving performance for repeated directory listings.
Performance gains with local windows SMB server:
Without the patch and default actimeo=1:
1000 directory enumeration operations on dir with 10k files took 135.0s
With this patch and actimeo=0:
1000 directory enumeration operations on dir with 10k files took just 5.1s
Signed-off-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Customer reported that one of their applications started failing to
open files with STATUS_INSUFFICIENT_RESOURCES due to NetApp server
hitting the maximum number of opens to same file that it would allow
for a single client connection.
It turned out the client was failing to reuse open handles with
deferred closes because matching ->f_flags directly without masking
off O_CREAT|O_EXCL|O_TRUNC bits first broke the comparision and then
client ended up with thousands of deferred closes to same file. Those
bits are already satisfied on the original open, so no need to check
them against existing open handles.
Reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#define NR_THREADS 4
#define NR_ITERATIONS 2500
#define TEST_FILE "/mnt/1/test/dir/foo"
static char buf[64];
static void *worker(void *arg)
{
int i, j;
int fd;
for (i = 0; i < NR_ITERATIONS; i++) {
fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_APPEND, 0666);
for (j = 0; j < 16; j++)
write(fd, buf, sizeof(buf));
close(fd);
}
}
int main(int argc, char *argv[])
{
pthread_t t[NR_THREADS];
int fd;
int i;
fd = open(TEST_FILE, O_WRONLY|O_CREAT|O_TRUNC, 0666);
close(fd);
memset(buf, 'a', sizeof(buf));
for (i = 0; i < NR_THREADS; i++)
pthread_create(&t[i], NULL, worker, NULL);
for (i = 0; i < NR_THREADS; i++)
pthread_join(t[i], NULL);
return 0;
}
Before patch:
$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1391
After patch:
$ mount.cifs //srv/share /mnt/1 -o ...
$ mkdir -p /mnt/1/test/dir
$ gcc repro.c && ./a.out
...
number of opens: 1
Cc: linux-cifs@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Pierguido Lambri <plambri@redhat.com>
Fixes: b8ea3b1ff544 ("smb: enable reuse of deferred file handles for write operations")
Acked-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and wireless.
Current release - regressions:
- af_unix: allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD
Current release - new code bugs:
- eth: airoha: correct enable mask for RX queues 16-31
- veth: prevent NULL pointer dereference in veth_xdp_rcv when peer
disappears under traffic
- ipv6: move fib6_config_validate() to ip6_route_add(), prevent
invalid routes
Previous releases - regressions:
- phy: phy_caps: don't skip better duplex match on non-exact match
- dsa: b53: fix untagged traffic sent via cpu tagged with VID 0
- Revert "wifi: mwifiex: Fix HT40 bandwidth issue.", it caused
transient packet loss, exact reason not fully understood, yet
Previous releases - always broken:
- net: clear the dst when BPF is changing skb protocol (IPv4 <> IPv6)
- sched: sfq: fix a potential crash on gso_skb handling
- Bluetooth: intel: improve rx buffer posting to avoid causing issues
in the firmware
- eth: intel: i40e: make reset handling robust against multiple
requests
- eth: mlx5: ensure FW pages are always allocated on the local NUMA
node, even when device is configure to 'serve' another node
- wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850,
prevent kernel crashes
- wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
for 3 sec if fw_stats_done is not set"
* tag 'net-6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
selftests: drv-net: rss_ctx: Add test for ntuple rules targeting default RSS context
net: ethtool: Don't check if RSS context exists in case of context 0
af_unix: Allow passing cred for embryo without SO_PASSCRED/SO_PASSPIDFD.
ipv6: Move fib6_config_validate() to ip6_route_add().
net: drv: netdevsim: don't napi_complete() from netpoll
net/mlx5: HWS, Add error checking to hws_bwc_rule_complex_hash_node_get()
veth: prevent NULL pointer dereference in veth_xdp_rcv
net_sched: remove qdisc_tree_flush_backlog()
net_sched: ets: fix a race in ets_qdisc_change()
net_sched: tbf: fix a race in tbf_change()
net_sched: red: fix a race in __red_change()
net_sched: prio: fix a race in prio_tune()
net_sched: sch_sfq: reject invalid perturb period
net: phy: phy_caps: Don't skip better duplex macth on non-exact match
MAINTAINERS: Update Kuniyuki Iwashima's email address.
selftests: net: add test case for NAT46 looping back dst
net: clear the dst when changing skb protocol
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
net/mlx5e: Fix leak of Geneve TLV option object
net/mlx5: HWS, make sure the uplink is the last destination
...
|
|
In case the BO is in iomem, we can't simply take the vaddr and write to
it. Instead, prepare a separate buffer that is later copied into io
memory. Right now it's just a few words that could be using
xe_map_write32(), but the intention is to grow the WA BB for other
uses.
Fixes: 617d824c5323 ("drm/xe: Add WA BB to capture active context utilization")
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250604-wa-bb-fix-v1-1-0dfc5dafcef0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit ef48715b2d3df17c060e23b9aa636af3d95652f8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Add some missing pins on the Qualcomm QCM2290, along with a managed
resources patch that make it clean and nice
- Drop an unused function in the ST Micro driver
- Drop bouncing MAINTAINER entry
- Drop of_match_ptr() macro to rid compile warnings in the TB10x
driver
- Fix up calculation of pin numbers from base in the Sunxi driver
* tag 'pinctrl-v6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: dt: Consider pin base when calculating bank number from pin
pinctrl: tb10x: Drop of_match_ptr for ID table
pinctrl: MAINTAINERS: Drop bouncing Jianlong Huang
pinctrl: st: Drop unused st_gpio_bank() function
pinctrl: qcom: pinctrl-qcm2290: Add missing pins
pinctrl: qcom: switch to devm_gpiochip_add_data()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- arch_atomic64_cmpxchg relaxed variant [Jason]
- use of inbuilt swap in stack unwinder [Yu-Chun Lin]
- use of __ASSEMBLER__ in kernel headers [Thomas Huth]
* tag 'arc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headers
ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers
ARC: unwind: Use built-in sort swap to reduce code size and improve performance
ARC: atomics: Implement arch_atomic64_cmpxchg using _relaxed
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
Another quick round of updates:
- revert mwifiex HT40 that was causing issues
- many ath10k/ath11k/ath12k fixes
- re-add some iwlwifi code I lost in a merge
- use kfree_sensitive() on an error path in cfg80211
* tag 'wireless-2025-06-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: use kfree_sensitive() for connkeys cleanup
wifi: iwlwifi: fix merge damage related to iwl_pci_resume
Revert "wifi: mwifiex: Fix HT40 bandwidth issue."
wifi: ath12k: fix uaf in ath12k_core_init()
wifi: ath12k: Fix hal_reo_cmd_status kernel-doc
wifi: ath12k: fix GCC_GCC_PCIE_HOT_RST definition for WCN7850
wifi: ath11k: validate ath11k_crypto_mode on top of ath11k_core_qmi_firmware_ready
wifi: ath11k: consistently use ath11k_mac_get_fw_stats()
wifi: ath11k: move locking outside of ath11k_mac_get_fw_stats()
wifi: ath11k: adjust unlock sequence in ath11k_update_stats_event()
wifi: ath11k: move some firmware stats related functions outside of debugfs
wifi: ath11k: don't wait when there is no vdev started
wifi: ath11k: don't use static variables in ath11k_debugfs_fw_stats_process()
wifi: ath11k: avoid burning CPU in ath11k_debugfs_fw_stats_request()
wil6210: fix support for sparrow chipsets
wifi: ath10k: Avoid vdev delete timeout when firmware is already down
ath10k: snoc: fix unbalanced IRQ enable in crash recovery
====================
Link: https://patch.msgid.link/20250612082519.11447-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gal Pressman says:
====================
Fix ntuple rules targeting default RSS
This series addresses a regression in ethtool flow steering where rules
targeting the default RSS context (context 0) were incorrectly rejected.
The default RSS context always exists but is not stored in the rss_ctx
xarray like additional contexts. The current validation logic was
checking for the existence of context 0 in this array, causing valid
flow steering rules to be rejected.
This prevented configurations such as:
- High priority rules directing specific traffic to the default context
- Low priority catch-all rules directing remaining traffic to additional
contexts
Patch 1 fixes the validation logic to skip the existence check for
context 0.
Patch 2 adds a selftest that verifies this behavior.
v3: https://lore.kernel.org/20250609120250.1630125-1-gal@nvidia.com
v2: https://lore.kernel.org/20250225071348.509432-1-gal@nvidia.com
====================
Link: https://patch.msgid.link/20250612071958.1696361-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
context
Add test_rss_default_context_rule() to verify that ntuple rules can
correctly direct traffic to the default RSS context (context 0).
The test creates two ntuple rules with explicit location priorities:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This validates that:
1. Rules targeting the default context function properly.
2. Traffic steering works as expected when mixing default and
additional RSS contexts.
The test was written by AI, and reviewed by humans.
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250612071958.1696361-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Context 0 (default context) always exists, there is no need to check
whether it exists or not when adding a flow steering rule.
The existing check fails when creating a flow steering rule for context
0 as it is not stored in the rss_ctx xarray.
For example:
$ ethtool --config-ntuple eth2 flow-type tcp4 dst-ip 194.237.147.23 dst-port 19983 context 0 loc 618
rmgr: Cannot insert RX class rule: Invalid argument
Cannot insert classification rule
An example usecase for this could be:
- A high-priority rule (loc 0) directing specific port traffic to
context 0.
- A low-priority rule (loc 1) directing all other TCP traffic to context
1.
This is a user-visible regression that was caught in our testing
environment, it was not reported by a user yet.
Fixes: de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250612071958.1696361-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- eir: Fix NULL pointer deference on eir_get_service_data
- eir: Fix possible crashes on eir_create_adv_data
- hci_sync: Fix broadcast/PA when using an existing instance
- ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
- ISO: Fix not using bc_sid as advertisement SID
- MGMT: Fix sparse errors
* tag 'for-net-2025-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: MGMT: Fix sparse errors
Bluetooth: ISO: Fix not using bc_sid as advertisement SID
Bluetooth: ISO: Fix using BT_SK_PA_SYNC to detect BIS sockets
Bluetooth: eir: Fix possible crashes on eir_create_adv_data
Bluetooth: hci_sync: Fix broadcast/PA when using an existing instance
Bluetooth: Fix NULL pointer deference on eir_get_service_data
====================
Link: https://patch.msgid.link/20250611204944.1559356-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before the cited commit, the kernel unconditionally embedded SCM
credentials to skb for embryo sockets even when both the sender
and listener disabled SO_PASSCRED and SO_PASSPIDFD.
Now, the credentials are added to skb only when configured by the
sender or the listener.
However, as reported in the link below, it caused a regression for
some programs that assume credentials are included in every skb,
but sometimes not now.
The only problematic scenario would be that a socket starts listening
before setting the option. Then, there will be 2 types of non-small
race window, where a client can send skb without credentials, which
the peer receives as an "invalid" message (and aborts the connection
it seems ?):
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s1.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
or
Client Server
------ ------
s1.listen() <-- No SO_PASS{CRED,PIDFD}
s2.connect()
s2.send() <-- w/o cred
s3, _ = s1.accept() <-- Inherit cred options
s2.send() <-- w/o cred but not set yet
s3.setsockopt(SO_PASS{CRED,PIDFD})
s2.send() <-- w/ cred
It's unfortunate that buggy programs depend on the behaviour,
but let's restore the previous behaviour.
Fixes: 3f84d577b79d ("af_unix: Inherit sk_flags at connect().")
Reported-by: Jacek Łuczak <difrost.kernel@gmail.com>
Closes: https://lore.kernel.org/all/68d38b0b-1666-4974-85d4-15575789c8d4@gmail.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Tested-by: Christian Heusel <christian@heusel.eu>
Tested-by: André Almeida <andrealmeid@igalia.com>
Tested-by: Jacek Łuczak <difrost.kernel@gmail.com>
Link: https://patch.msgid.link/20250611202758.3075858-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller created an IPv6 route from a malformed packet, which has
a prefix len > 128, triggering the splat below. [0]
This is a similar issue fixed by commit 586ceac9acb7 ("ipv6: Restore
fib6_config validation for SIOCADDRT.").
The cited commit removed fib6_config validation from some callers
of ip6_add_route().
Let's move the validation back to ip6_route_add() and
ip6_route_multipath_add().
[0]:
UBSAN: array-index-out-of-bounds in ./include/net/ipv6.h:616:34
index 20 is out of range for type '__u8 [16]'
CPU: 1 UID: 0 PID: 7444 Comm: syz.0.708 Not tainted 6.16.0-rc1-syzkaller-g19272b37aa4f #0 PREEMPT
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80078a80>] dump_backtrace+0x2e/0x3c arch/riscv/kernel/stacktrace.c:132
[<ffffffff8000327a>] show_stack+0x30/0x3c arch/riscv/kernel/stacktrace.c:138
[<ffffffff80061012>] __dump_stack lib/dump_stack.c:94 [inline]
[<ffffffff80061012>] dump_stack_lvl+0x12e/0x1a6 lib/dump_stack.c:120
[<ffffffff800610a6>] dump_stack+0x1c/0x24 lib/dump_stack.c:129
[<ffffffff8001c0ea>] ubsan_epilogue+0x14/0x46 lib/ubsan.c:233
[<ffffffff819ba290>] __ubsan_handle_out_of_bounds+0xf6/0xf8 lib/ubsan.c:455
[<ffffffff85b363a4>] ipv6_addr_prefix include/net/ipv6.h:616 [inline]
[<ffffffff85b363a4>] ip6_route_info_create+0x8f8/0x96e net/ipv6/route.c:3793
[<ffffffff85b635da>] ip6_route_add+0x2a/0x1aa net/ipv6/route.c:3889
[<ffffffff85b02e08>] addrconf_prefix_route+0x2c4/0x4e8 net/ipv6/addrconf.c:2487
[<ffffffff85b23bb2>] addrconf_prefix_rcv+0x1720/0x1e62 net/ipv6/addrconf.c:2878
[<ffffffff85b92664>] ndisc_router_discovery+0x1a06/0x3504 net/ipv6/ndisc.c:1570
[<ffffffff85b99038>] ndisc_rcv+0x500/0x600 net/ipv6/ndisc.c:1874
[<ffffffff85bc2c18>] icmpv6_rcv+0x145e/0x1e0a net/ipv6/icmp.c:988
[<ffffffff85af6798>] ip6_protocol_deliver_rcu+0x18a/0x1976 net/ipv6/ip6_input.c:436
[<ffffffff85af8078>] ip6_input_finish+0xf4/0x174 net/ipv6/ip6_input.c:480
[<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<ffffffff85af8262>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<ffffffff85af8262>] ip6_input+0x16a/0x70c net/ipv6/ip6_input.c:491
[<ffffffff85af8dcc>] ip6_mc_input+0x5c8/0x1268 net/ipv6/ip6_input.c:588
[<ffffffff85af6112>] dst_input include/net/dst.h:469 [inline]
[<ffffffff85af6112>] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
[<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:317 [inline]
[<ffffffff85af6112>] NF_HOOK include/linux/netfilter.h:311 [inline]
[<ffffffff85af6112>] ipv6_rcv+0x5ae/0x6e0 net/ipv6/ip6_input.c:309
[<ffffffff85087e84>] __netif_receive_skb_one_core+0x106/0x16e net/core/dev.c:5977
[<ffffffff85088104>] __netif_receive_skb+0x2c/0x144 net/core/dev.c:6090
[<ffffffff850883c6>] netif_receive_skb_internal net/core/dev.c:6176 [inline]
[<ffffffff850883c6>] netif_receive_skb+0x1aa/0xbf2 net/core/dev.c:6235
[<ffffffff8328656e>] tun_rx_batched.isra.0+0x430/0x686 drivers/net/tun.c:1485
[<ffffffff8329ed3a>] tun_get_user+0x2952/0x3d6c drivers/net/tun.c:1938
[<ffffffff832a21e0>] tun_chr_write_iter+0xc4/0x21c drivers/net/tun.c:1984
[<ffffffff80b9b9ae>] new_sync_write fs/read_write.c:593 [inline]
[<ffffffff80b9b9ae>] vfs_write+0x56c/0xa9a fs/read_write.c:686
[<ffffffff80b9c2be>] ksys_write+0x126/0x228 fs/read_write.c:738
[<ffffffff80b9c42e>] __do_sys_write fs/read_write.c:749 [inline]
[<ffffffff80b9c42e>] __se_sys_write fs/read_write.c:746 [inline]
[<ffffffff80b9c42e>] __riscv_sys_write+0x6e/0x94 fs/read_write.c:746
[<ffffffff80076912>] syscall_handler+0x94/0x118 arch/riscv/include/asm/syscall.h:112
[<ffffffff8637e31e>] do_trap_ecall_u+0x396/0x530 arch/riscv/kernel/traps.c:341
[<ffffffff863a69e2>] handle_exception+0x146/0x152 arch/riscv/kernel/entry.S:197
Fixes: fa76c1674f2e ("ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().")
Reported-by: syzbot+4c2358694722d304c44e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6849b8c3.a00a0220.1eb5f5.00f0.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611193551.2999991-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
netdevsim supports netpoll. Make sure we don't call napi_complete()
from it, since it may not be scheduled. Breno reports hitting a
warning in napi_complete_done():
WARNING: CPU: 14 PID: 104 at net/core/dev.c:6592 napi_complete_done+0x2cc/0x560
__napi_poll+0x2d8/0x3a0
handle_softirqs+0x1fe/0x710
This is presumably after netpoll stole the SCHED bit prematurely.
Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20250611174643.2769263-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Check for if ida_alloc() or rhashtable_lookup_get_insert_fast() fails.
Fixes: 17e0accac577 ("net/mlx5: HWS, support complex matchers")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Link: https://patch.msgid.link/aEmBONjyiF6z5yCV@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The veth peer device is RCU protected, but when the peer device gets
deleted (veth_dellink) then the pointer is assigned NULL (via
RCU_INIT_POINTER).
This patch adds a necessary NULL check in veth_xdp_rcv when accessing
the veth peer net_device.
This fixes a bug introduced in commit dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops"). The bug is a race
and only triggers when having inflight packets on a veth that is being
deleted.
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Closes: https://lore.kernel.org/all/fecfcad0-7a16-42b8-bff2-66ee83a6e5c4@linux.dev/
Reported-by: syzbot+c4c7bf27f6b0c4bd97fe@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/683da55e.a00a0220.d8eae.0052.GAE@google.com/
Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://patch.msgid.link/174964557873.519608.10855046105237280978.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net_sched: no longer use qdisc_tree_flush_backlog()
This series is based on a report from Gerrard Tai.
Essentially, all users of qdisc_tree_flush_backlog() are racy.
We must instead use qdisc_purge_queue().
====================
Link: https://patch.msgid.link/20250611111515.1983366-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This function is no longer used after the four prior fixes.
Given all prior uses were wrong, it seems better to remove it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in TBF, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: b05972f01e7d ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://patch.msgid.link/20250611111515.1983366-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in RED, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: 0c8d13ac9607 ("net: sched: red: delay destroying child qdisc on replace")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported a race condition in PRIO, whenever SFQ perturb timer
fires at the wrong time.
The race is as follows:
CPU 0 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
|
| [5]: lock root
| [6]: rehash
| [7]: qdisc_tree_reduce_backlog()
|
[4]: qdisc_put()
This can be abused to underflow a parent's qlen.
Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.
Fixes: 7b8e0b6e6599 ("net: sched: prio: delay destroying child qdiscs on change")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Gerrard Tai reported that SFQ perturb_period has no range check yet,
and this can be used to trigger a race condition fixed in a separate patch.
We want to make sure ctl->perturb_period * HZ will not overflow
and is positive.
Tested:
tc qd add dev lo root sfq perturb -10 # negative value : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb 1000000000 # too big : error
Error: sch_sfq: invalid perturb period.
tc qd add dev lo root sfq perturb 2000000 # acceptable value
tc -s -d qd sh dev lo
qdisc sfq 8005: root refcnt 2 limit 127p quantum 64Kb depth 127 flows 128 divisor 1024 perturb 2000000sec
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250611083501.1810459-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When performing a non-exact phy_caps lookup, we are looking for a
supported mode that matches as closely as possible the passed speed/duplex.
Blamed patch broke that logic by returning a match too early in case
the caller asks for half-duplex, as a full-duplex linkmode may match
first, and returned as a non-exact match without even trying to mach on
half-duplex modes.
Reported-by: Jijie Shao <shaojijie@huawei.com>
Closes: https://lore.kernel.org/netdev/20250603102500.4ec743cf@fedora/T/#m22ed60ca635c67dc7d9cbb47e8995b2beb5c1576
Tested-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Fixes: fc81e257d19f ("net: phy: phy_caps: Allow looking-up link caps based on speed and duplex")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20250606094321.483602-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sqpoll thread is dereferenced with rcu read protection in one place,
so it needs to be annotated as an __rcu type, and should consistently
use rcu helpers for access and assignment to make sparse happy.
Since most of the accesses occur under the sqd->lock, we can use
rcu_dereference_protected() without declaring an rcu read section.
Provide a simple helper to get the thread from a locked context.
Fixes: ac0b8b327a5677d ("io_uring: fix use-after-free of sq->thread in __io_uring_show_fdinfo()")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20250611205343.1821117-1-kbusch@meta.com
[axboe: fold in fix for register.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This function takes super_lock in shared mode, so it should release the
same lock.
Cc: stable@vger.kernel.org # v6.16-rc1
Fixes: af7551cf13cf7f ("super: remove pointless s_root checks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Link: https://lore.kernel.org/20250611164044.GF6138@frogsfrogsfrogs
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We want to print the name in case of mkdir failure and now we will
get a cryptic (efault) as name.
Fixes: c54b386969a5 ("VFS: Change vfs_mkdir() to return the dentry.")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250612072245.2825938-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Merge CPUFreq fixes for 6.16-rc from Viresh Kumar:
"- Implement CpuId rust abstraction and use it to fix doctest failure
(Viresh Kumar).
- Minor cleanups in the `# Safety` sections for cpufreq abstractions
(Viresh Kumar)."
* tag 'cpufreq-arm-fixes-6.16-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
rust: cpu: Add CpuId::current() to retrieve current CPU ID
rust: Use CpuId in place of raw CPU numbers
rust: cpu: Introduce CpuId abstraction
cpufreq: Convert `/// SAFETY` lines to `# Safety` sections
|
|
After commit a934a57a42f64a4 ("scripts/misc-check: check missing #include
<linux/export.h> when W=1") and 7d95680d64ac8e836c ("scripts/misc-check:
check unnecessary #include <linux/export.h> when W=1"), we get some build
warnings with W=1:
init/main.c: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing
init/initramfs.c: warning: EXPORT_SYMBOL() is used, but #include <linux/export.h> is missing
So fix these build warnings for the init code.
Link: https://lkml.kernel.org/r/20250608141235.155206-1-chenhuacai@loongson.cn
Fixes: a934a57a42f6 ("scripts/misc-check: check missing #include <linux/export.h> when W=1")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
I have been actively contributing to mTHP and reviewing related patches
for an extended period, and I would like to continue supporting patch
reviews.
Link: https://lkml.kernel.org/r/20250609002442.1856-1-21cnbao@gmail.com
Signed-off-by: Barry Song <baohua@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Dev Jain <dev.jain@arm.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In
riocm_cdev_ioctl(RIO_CM_CHAN_SEND)
-> cm_chan_msg_send()
-> riocm_ch_send()
cm_chan_msg_send() checks that userspace didn't send too much data but
riocm_ch_send() failed to check that userspace sent sufficient data. The
result is that riocm_ch_send() can write to fields in the rio_ch_chan_hdr
which were outside the bounds of the space which cm_chan_msg_send()
allocated.
Address this by teaching riocm_ch_send() to check that the entire
rio_ch_chan_hdr was copied in from userspace.
Reported-by: maher azz <maherazz04@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a
parallel reclaim leaving stale TLB entries") described a theoretical race
as such:
"""
Nadav Amit identified a theoretical race between page reclaim and mprotect
due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such as
munmap, mremap and madvise.
"""
The solution was to introduce flush_tlb_batched_pending() and call it
under the PTL from mprotect/madvise/munmap/mremap to complete any pending
tlb flushes.
However, while madvise_free_pte_range() and
madvise_cold_or_pageout_pte_range() were both retro-fitted to call
flush_tlb_batched_pending() immediately after initially acquiring the PTL,
they both temporarily release the PTL to split a large folio if they
stumble upon one. In this case, where re-acquiring the PTL
flush_tlb_batched_pending() must be called again, but it previously was
not. Let's fix that.
There are 2 Fixes: tags here: the first is the commit that fixed
madvise_free_pte_range(). The second is the commit that added
madvise_cold_or_pageout_pte_range(), which looks like it copy/pasted the
faulty pattern from madvise_free_pte_range().
This is a theoretical bug discovered during code review.
Link: https://lkml.kernel.org/r/20250606092809.4194056-1-ryan.roberts@arm.com
Fixes: 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries")
Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While an OOM failure in commit_merge() isn't really feasible due to the
allocation which might fail (a maple tree pre-allocation) being 'too small
to fail', we do need to handle this case correctly regardless.
In vma_merge_existing_range(), we can theoretically encounter failures
which result in an OOM error in two ways - firstly dup_anon_vma() might
fail with an OOM error, and secondly commit_merge() failing, ultimately,
to pre-allocate a maple tree node.
The abort logic for dup_anon_vma() resets the VMA iterator to the initial
range, ensuring that any logic looping on this iterator will correctly
proceed to the next VMA.
However the commit_merge() abort logic does not do the same thing. This
resulted in a syzbot report occurring because mlockall() iterates through
VMAs, is tolerant of errors, but ended up with an incorrect previous VMA
being specified due to incorrect iterator state.
While making this change, it became apparent we are duplicating logic -
the logic introduced in commit 41e6ddcaa0f1 ("mm/vma: add give_up_on_oom
option on modify/merge, use in uffd release") duplicates the
vmg->give_up_on_oom check in both abort branches.
Additionally, we observe that we can perform the anon_dup check safely on
dup_anon_vma() failure, as this will not be modified should this call
fail.
Finally, we need to reset the iterator in both cases, so now we can simply
use the exact same code to abort for both.
We remove the VM_WARN_ON(err != -ENOMEM) as it would be silly for this to
be otherwise and it allows us to implement the abort check more neatly.
Link: https://lkml.kernel.org/r/20250606125032.164249-1-lorenzo.stoakes@oracle.com
Fixes: 47b16d0462a4 ("mm: abort vma_modify() on merge out of memory failure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: syzbot+d16409ea9ecc16ed261a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/6842cc67.a00a0220.29ac89.003b.GAE@google.com/
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove outdated VM_DENYWRITE("dw") reference and add missing
VM_LOCKONFAULT("lf") and VM_UFFD_MINOR("ui") flags.
[akpm@linux-foundation.org: add "dp" (VM_DROPPABLE), per Tal]
Link: https://lkml.kernel.org/r/20250607153614.81914-1-wangfushuai@baidu.com
Signed-off-by: wangfushuai <wangfushuai@baidu.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mariano Pache <npache@redhat.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Tal Zussman <tz2294@columbia.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Using "@argname@" in kernel-doc produces "argname****" (with "argname" in
bold) in the generated html output, so use the expected kernel-doc
notation of just "@argname" instead.
"Fixes:" lines are added in case Matthew's patch [1] is backported.
Link: https://lkml.kernel.org/r/20250605002337.2842659-1-rdunlap@infradead.org
Link: https://lore.kernel.org/linux-doc/3bc4e779-7a79-42c1-8867-024f643a22fc@infradead.org/T/#m5d2bd9d21fb34f297aa4e7db069f09bc27b89007 [1]
Fixes: 0db9299f48eb ("SG: Move functions to lib/scatterlist.c and add sg chaining allocator helpers")
Fixes: 8d1d4b538bb1 ("scatterlist: inline sg_next()")
Fixes: 18dabf473e15 ("Change table chaining layout")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Unlike the other cases gup_longterm's memfd tests previously skipped the
test when failing to set up the file descriptor to test. Restore this
behavior to avoid hitting failures when hugetlb isn't configured.
Link: https://lkml.kernel.org/r/20250605-selftest-mm-gup-longterm-tweaks-v1-1-2fae34b05958@kernel.org
Fixes: 66bce7afbaca ("selftests/mm: fix test result reporting in gup_longterm")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reported-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Closes: https://lkml.kernel.org/r/a76fc252-0fe3-4d4b-a9a1-4a2895c2680d@lucifer.local
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce `CpuId::current()`, a constructor that wraps the C function
`raw_smp_processor_id()` to retrieve the current CPU identifier without
guaranteeing stability.
This function should be used only when the caller can ensure that
the CPU ID won't change unexpectedly due to preemption or migration.
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
|
|
Use the newly defined `CpuId` abstraction instead of raw CPU numbers.
This also fixes a doctest failure for configurations where `nr_cpu_ids <
4`.
The C `cpumask_{set|clear}_cpu()` APIs emit a warning when given an
invalid CPU number — but only if `CONFIG_DEBUG_PER_CPU_MAPS=y` is set.
Meanwhile, `cpumask_weight()` only considers CPUs up to `nr_cpu_ids`,
which can cause inconsistencies: a CPU number greater than `nr_cpu_ids`
may be set in the mask, yet the weight calculation won't reflect it.
This leads to doctest failures when `nr_cpu_ids < 4`, as the test tries
to set CPUs 2 and 3:
rust_doctest_kernel_cpumask_rs_0.location: rust/kernel/cpumask.rs:180
rust_doctest_kernel_cpumask_rs_0: ASSERTION FAILED at rust/kernel/cpumask.rs:190
Fixes: 8961b8cb3099 ("rust: cpumask: Add initial abstractions")
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/rust-for-linux/CANiq72k3ozKkLMinTLQwvkyg9K=BeRxs1oYZSKhJHY-veEyZdg@mail.gmail.com/
Reported-by: Andreas Hindborg <a.hindborg@kernel.org>
Closes: https://lore.kernel.org/all/87qzzy3ric.fsf@kernel.org/
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
|
|
Only let userspace pass the same addresses that were used in KVM_SET_USER_MEMORY_REGION
(or KVM_SET_USER_MEMORY_REGION2); gpas in the the upper half of the address space
are an implementation detail of TDX and KVM.
Extracted from a patch by Sean Christopherson <seanjc@google.com>.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bug[*] reported for TDX case when enabling KVM_PRE_FAULT_MEMORY in QEMU.
It turns out that @gpa passed to kvm_mmu_do_page_fault() doesn't have
shared bit set when the memory attribute of it is shared, and it leads
to wrong root in tdp_mmu_get_root_for_fault().
Fix it by embedding the direct bits in the gpa that is passed to
kvm_tdp_map_page(), when the memory of the gpa is not private.
[*] https://lore.kernel.org/qemu-devel/4a757796-11c2-47f1-ae0d-335626e818fd@intel.com/
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Closes: https://lore.kernel.org/qemu-devel/4a757796-11c2-47f1-ae0d-335626e818fd@intel.com/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20250611001018.2179964-1-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We normally can't create a new directory with the case-insensitive
option already set - except when we're creating a snapshot.
And if casefolding is enabled filesystem wide, we should still set it
even though not strictly required, for consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, we only ever logged the filesystem UUID.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We have to be able to print superblock sections even if they fail to
validate (for debugging), so we have to calculate the number of entries
from the field size.
Reported-by: syzbot+5138f00559ffb3cb3610@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It seems btree node scan picked up a partially overwritten btree node,
and corrected the "bset version older than sb version_min" error -
resulting in an invalid superblock with a bad version_min field.
Don't run this check at all when we're in btree node scan, and when we
do run it, do something saner if the bset version is totally crazy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Multiple ioctl handlers individually use a lot of stack space, and clang chooses
to inline them into the bch2_fs_ioctl() function, blowing through the warning
limit:
fs/bcachefs/chardev.c:655:6: error: stack frame size (1032) exceeds limit (1024) in 'bch2_fs_ioctl' [-Werror,-Wframe-larger-than]
655 | long bch2_fs_ioctl(struct bch_fs *c, unsigned cmd, void __user *arg)
By marking the largest two of them as noinline_for_stack, no indidual code path
ends up using this much, which avoids the warning and reduces the possible
total stack usage in the ioctl handler.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsck_err() can return a transaction restart if passed a transaction
object - this has always been true when it has to drop locks to prompt
for user input, but we're seeing this more now that we're logging the
error being corrected in the journal.
gc_accounting_done() doesn't call fsck_err() from an actual commit loop,
and it doesn't need to be holding btree locks when it calls fsck_err(),
so the easy fix here for the unhandled transaction restart is to just
not pass it the transaction object. We'll miss out on the fancy new
logging, but that's ok.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a small leak of the superblock 'clean' section.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|