Age | Commit message (Collapse) | Author |
|
Currently, in-kernel PM specific helpers are prefixed with
'mptcp_pm_nl_'. But here 'mptcp_pm_nl_mp_prio_send_ack()' is not
specific to this PM: it is used by both the in-kernel and userspace PMs.
To avoid confusions, the '_nl' bit has been removed from the name.
No behavioural changes intended.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250307-net-next-mptcp-pm-reorg-v1-3-abef20ada03b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, in-kernel PM specific helpers are prefixed with
'mptcp_pm_nl_'. But here 'mptcp_pm_nl_addr_send_ack()' is not specific
to this PM: it is used by both the in-kernel and userspace PMs.
To avoid confusions, the '_nl' bit has been removed from the name.
No behavioural changes intended.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250307-net-next-mptcp-pm-reorg-v1-2-abef20ada03b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The following code in mptcp_userspace_pm_get_local_id() that assigns "skc"
to "new_entry" is not allowed in BPF if we use the same code to implement
the get_local_id() interface of a BFP path manager:
memset(&new_entry, 0, sizeof(struct mptcp_pm_addr_entry));
new_entry.addr = *skc;
new_entry.addr.id = 0;
new_entry.flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
To solve the issue, this patch moves this assignment to "new_entry" forward
to mptcp_pm_get_local_id(), and then passing "new_entry" as a parameter to
both mptcp_pm_nl_get_local_id() and mptcp_userspace_pm_get_local_id().
No behavioural changes intended.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250307-net-next-mptcp-pm-reorg-v1-1-abef20ada03b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Martin has left AMD and no longer works on the sfc driver.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20250307154731.211368-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Taehee Yoo says:
====================
eth: bnxt: fix several bugs in the bnxt module
The first fixes setting incorrect skb->truesize.
When xdp-mb prog returns XDP_PASS, skb is allocated and initialized.
Currently, The truesize is calculated as BNXT_RX_PAGE_SIZE *
sinfo->nr_frags, but sinfo->nr_frags is flushed by napi_build_skb().
So, it stores sinfo before calling napi_build_skb() and then use it
for calculate truesize.
The second fixes kernel panic in the bnxt_queue_mem_alloc().
The bnxt_queue_mem_alloc() accesses rx ring descriptor.
rx ring descriptors are allocated when the interface is up and it's
freed when the interface is down.
So, if bnxt_queue_mem_alloc() is called when the interface is down,
kernel panic occurs.
This patch makes the bnxt_queue_mem_alloc() return -ENETDOWN if rx ring
descriptors are not allocated.
The third patch fixes kernel panic in the bnxt_queue_{start | stop}().
When a queue is restarted bnxt_queue_{start | stop}() are called.
These functions set MRU to 0 to stop packet flow and then to set up the
remaining things.
MRU variable is a member of vnic_info[] the first vnic_info is for
default and the second is for ntuple.
The first vnic_info is always allocated when interface is up, but the
second is allocated only when ntuple is enabled.
(ethtool -K eth0 ntuple <on | off>).
Currently, the bnxt_queue_{start | stop}() access
vnic_info[BNXT_VNIC_NTUPLE] regardless of whether ntuple is enabled or
not.
So kernel panic occurs.
This patch make the bnxt_queue_{start | stop}() use bp->nr_vnics instead
of BNXT_VNIC_NTUPLE.
The fourth patch fixes a warning due to checksum state.
The bnxt_rx_pkt() checks whether skb->ip_summed is not CHECKSUM_NONE
before updating ip_summed. if ip_summed is not CHECKSUM_NONE, it WARNS
about it. However, the bnxt_xdp_build_skb() is called in XDP-MB-PASS
path and it updates ip_summed earlier than bnxt_rx_pkt().
So, in the XDP-MB-PASS path, the bnxt_rx_pkt() always warns about
checksum.
Updating ip_summed at the bnxt_xdp_build_skb() is unnecessary and
duplicate, so it is removed.
The fifth patch fixes a kernel panic in the
bnxt_get_queue_stats{rx | tx}().
The bnxt_get_queue_stats{rx | tx}() callback functions are called when
a queue is resetting.
These internally access rx and tx rings without null check, but rings
are allocated and initialized when the interface is up.
So, these functions are called when the interface is down, it
occurs a kernel panic.
The sixth patch fixes memory leak in queue reset logic.
When a queue is resetting, tpa_info is allocated for the new queue and
tpa_info for an old queue is not used anymore.
So it should be freed, but not.
The seventh patch makes net_devmem_unbind_dmabuf() ignore -ENETDOWN.
When devmem socket is closed, net_devmem_unbind_dmabuf() is called to
unbind/release resources.
If interface is down, the driver returns -ENETDOWN.
The -ENETDOWN return value is not an actual error, because the interface
will release resources when the interface is down.
So, net_devmem_unbind_dmabuf() needs to ignore -ENETDOWN.
The last patch adds XDP testcases to
tools/testing/selftests/drivers/net/ping.py.
====================
Link: https://patch.msgid.link/20250309134219.91670-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ping.py has 3 cases, test_v4, test_v6 and test_tcp.
But these cases are not executed on the XDP environment.
So, it adds XDP environment, existing tests(test_v4, test_v6, and
test_tcp) are executed too on the below XDP environment.
So, it adds XDP cases.
1. xdp-generic + single-buffer
2. xdp-generic + multi-buffer
3. xdp-native + single-buffer
4. xdp-native + multi-buffer
5. xdp-offload
It also makes test_{v4 | v6 | tcp} sending large size packets. this may
help to check whether multi-buffer is working or not.
Note that the physical interface may be down and then up when xdp is
attached or detached.
This takes some period to activate traffic. So sleep(10) is
added if the test interface is the physical interface.
netdevsim and veth type interfaces skip sleep.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250309134219.91670-9-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When devmem socket is closed, netdev_rx_queue_restart() is called to
reset queue by the net_devmem_unbind_dmabuf(). But callback may return
-ENETDOWN if the interface is down because queues are already freed
when the interface is down so queue reset is not needed.
So, it should not warn if the return value is -ENETDOWN.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250309134219.91670-8-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the queue is reset, the bnxt_alloc_one_tpa_info() is called to
allocate tpa_info for the new queue.
And then the old queue's tpa_info should be removed by the
bnxt_free_one_tpa_info(), but it is not called.
So memory leak occurs.
It adds the bnxt_free_one_tpa_info() in the bnxt_queue_mem_free().
unreferenced object 0xffff888293cc0000 (size 16384):
comm "ncdevmem", pid 2076, jiffies 4296604081
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 40 75 78 93 82 88 ff ff ........@ux.....
40 75 78 93 02 00 00 00 00 00 00 00 00 00 00 00 @ux.............
backtrace (crc 5d7d4798):
___kmalloc_large_node+0x10d/0x1b0
__kmalloc_large_node_noprof+0x17/0x60
__kmalloc_noprof+0x3f6/0x520
bnxt_alloc_one_tpa_info+0x5f/0x300 [bnxt_en]
bnxt_queue_mem_alloc+0x8e8/0x14f0 [bnxt_en]
netdev_rx_queue_restart+0x233/0x620
net_devmem_bind_dmabuf_to_queue+0x2a3/0x600
netdev_nl_bind_rx_doit+0xc00/0x10a0
genl_family_rcv_msg_doit+0x1d4/0x2b0
genl_rcv_msg+0x3fb/0x6c0
netlink_rcv_skb+0x12c/0x360
genl_rcv+0x24/0x40
netlink_unicast+0x447/0x710
netlink_sendmsg+0x712/0xbc0
__sys_sendto+0x3fd/0x4d0
__x64_sys_sendto+0xdc/0x1b0
Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250309134219.91670-7-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When qstats-get operation is executed, callbacks of netdev_stats_ops
are called. The bnxt_get_queue_stats{rx | tx} collect per-queue stats
from sw_stats in the rings.
But {rx | tx | cp}_ring are allocated when the interface is up.
So, these rings are not allocated when the interface is down.
The qstats-get is allowed even if the interface is down. However,
the bnxt_get_queue_stats{rx | tx}() accesses cp_ring and tx_ring
without null check.
So, it needs to avoid accessing rings if the interface is down.
Reproducer:
ip link set $interface down
./cli.py --spec netdev.yaml --dump qstats-get
OR
ip link set $interface down
python ./stats.py
Splat looks like:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1680fa067 P4D 1680fa067 PUD 16be3b067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 UID: 0 PID: 1495 Comm: python3 Not tainted 6.14.0-rc4+ #32 5cd0f999d5a15c574ac72b3e4b907341
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:bnxt_get_queue_stats_rx+0xf/0x70 [bnxt_en]
Code: c6 87 b5 18 00 00 02 eb a2 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 01
RSP: 0018:ffffabef43cdb7e0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffffc04c8710 RCX: 0000000000000000
RDX: ffffabef43cdb858 RSI: 0000000000000000 RDI: ffff8d504e850000
RBP: ffff8d506c9f9c00 R08: 0000000000000004 R09: ffff8d506bcd901c
R10: 0000000000000015 R11: ffff8d506bcd9000 R12: 0000000000000000
R13: ffffabef43cdb8c0 R14: ffff8d504e850000 R15: 0000000000000000
FS: 00007f2c5462b080(0000) GS:ffff8d575f600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000167fd0000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x15a/0x460
? sched_balance_find_src_group+0x58d/0xd10
? exc_page_fault+0x6e/0x180
? asm_exc_page_fault+0x22/0x30
? bnxt_get_queue_stats_rx+0xf/0x70 [bnxt_en cdd546fd48563c280cfd30e9647efa420db07bf1]
netdev_nl_stats_by_netdev+0x2b1/0x4e0
? xas_load+0x9/0xb0
? xas_find+0x183/0x1d0
? xa_find+0x8b/0xe0
netdev_nl_qstats_get_dumpit+0xbf/0x1e0
genl_dumpit+0x31/0x90
netlink_dump+0x1a8/0x360
Fixes: af7b3b4adda5 ("eth: bnxt: support per-queue statistics")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250309134219.91670-6-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bnxt_rx_pkt() updates ip_summed value at the end if checksum offload
is enabled.
When the XDP-MB program is attached and it returns XDP_PASS, the
bnxt_xdp_build_skb() is called to update skb_shared_info.
The main purpose of bnxt_xdp_build_skb() is to update skb_shared_info,
but it updates ip_summed value too if checksum offload is enabled.
This is actually duplicate work.
When the bnxt_rx_pkt() updates ip_summed value, it checks if ip_summed
is CHECKSUM_NONE or not.
It means that ip_summed should be CHECKSUM_NONE at this moment.
But ip_summed may already be updated to CHECKSUM_UNNECESSARY in the
XDP-MB-PASS path.
So the by skb_checksum_none_assert() WARNS about it.
This is duplicate work and updating ip_summed in the
bnxt_xdp_build_skb() is not needed.
Splat looks like:
WARNING: CPU: 3 PID: 5782 at ./include/linux/skbuff.h:5155 bnxt_rx_pkt+0x479b/0x7610 [bnxt_en]
Modules linked in: bnxt_re bnxt_en rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs veth xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_]
CPU: 3 UID: 0 PID: 5782 Comm: socat Tainted: G W 6.14.0-rc4+ #27
Tainted: [W]=WARN
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:bnxt_rx_pkt+0x479b/0x7610 [bnxt_en]
Code: 54 24 0c 4c 89 f1 4c 89 ff c1 ea 1f ff d3 0f 1f 00 49 89 c6 48 85 c0 0f 84 4c e5 ff ff 48 89 c7 e8 ca 3d a0 c8 e9 8f f4 ff ff <0f> 0b f
RSP: 0018:ffff88881ba09928 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000c7590303 RCX: 0000000000000000
RDX: 1ffff1104e7d1610 RSI: 0000000000000001 RDI: ffff8881c91300b8
RBP: ffff88881ba09b28 R08: ffff888273e8b0d0 R09: ffff888273e8b070
R10: ffff888273e8b010 R11: ffff888278b0f000 R12: ffff888273e8b080
R13: ffff8881c9130e00 R14: ffff8881505d3800 R15: ffff888273e8b000
FS: 00007f5a2e7be080(0000) GS:ffff88881ba00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff2e708ff8 CR3: 000000013e3b0000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
? __warn+0xcd/0x2f0
? bnxt_rx_pkt+0x479b/0x7610
? report_bug+0x326/0x3c0
? handle_bug+0x53/0xa0
? exc_invalid_op+0x14/0x50
? asm_exc_invalid_op+0x16/0x20
? bnxt_rx_pkt+0x479b/0x7610
? bnxt_rx_pkt+0x3e41/0x7610
? __pfx_bnxt_rx_pkt+0x10/0x10
? napi_complete_done+0x2cf/0x7d0
__bnxt_poll_work+0x4e8/0x1220
? __pfx___bnxt_poll_work+0x10/0x10
? __pfx_mark_lock.part.0+0x10/0x10
bnxt_poll_p5+0x36a/0xfa0
? __pfx_bnxt_poll_p5+0x10/0x10
__napi_poll.constprop.0+0xa0/0x440
net_rx_action+0x899/0xd00
...
Following ping.py patch adds xdp-mb-pass case. so ping.py is going
to be able to reproduce this issue.
Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250309134219.91670-5-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a queue is restarted, it sets MRU to 0 for stopping packet flow.
MRU variable is a member of vnic_info[], the first vnic_info is default
and the second is ntuple.
Only when ntuple is enabled(ethtool -K eth0 ntuple on), vnic_info for
ntuple is allocated in init logic.
The bp->nr_vnics indicates how many vnic_info are allocated.
However bnxt_queue_{start | stop}() accesses vnic_info[BNXT_VNIC_NTUPLE]
regardless of ntuple state.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Fixes: b9d2956e869c ("bnxt_en: stop packet flow during bnxt_queue_stop/start")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250309134219.91670-4-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bnxt_queue_mem_alloc() is called to allocate new queue memory when
a queue is restarted.
It internally accesses rx buffer descriptor corresponding to the index.
The rx buffer descriptor is allocated and set when the interface is up
and it's freed when the interface is down.
So, if queue is restarted if interface is down, kernel panic occurs.
Splat looks like:
BUG: unable to handle page fault for address: 000000000000b240
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 UID: 0 PID: 1563 Comm: ncdevmem2 Not tainted 6.14.0-rc2+ #9 844ddba6e7c459cafd0bf4db9a3198e
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
RIP: 0010:bnxt_queue_mem_alloc+0x3f/0x4e0 [bnxt_en]
Code: 41 54 4d 89 c4 4d 69 c0 c0 05 00 00 55 48 89 f5 53 48 89 fb 4c 8d b5 40 05 00 00 48 83 ec 15
RSP: 0018:ffff9dcc83fef9e8 EFLAGS: 00010202
RAX: ffffffffc0457720 RBX: ffff934ed8d40000 RCX: 0000000000000000
RDX: 000000000000001f RSI: ffff934ea508f800 RDI: ffff934ea508f808
RBP: ffff934ea508f800 R08: 000000000000b240 R09: ffff934e84f4b000
R10: ffff9dcc83fefa30 R11: ffff934e84f4b000 R12: 000000000000001f
R13: ffff934ed8d40ac0 R14: ffff934ea508fd40 R15: ffff934e84f4b000
FS: 00007fa73888c740(0000) GS:ffff93559f780000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000b240 CR3: 0000000145a2e000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x20/0x70
? page_fault_oops+0x15a/0x460
? exc_page_fault+0x6e/0x180
? asm_exc_page_fault+0x22/0x30
? __pfx_bnxt_queue_mem_alloc+0x10/0x10 [bnxt_en 7f85e76f4d724ba07471d7e39d9e773aea6597b7]
? bnxt_queue_mem_alloc+0x3f/0x4e0 [bnxt_en 7f85e76f4d724ba07471d7e39d9e773aea6597b7]
netdev_rx_queue_restart+0xc5/0x240
net_devmem_bind_dmabuf_to_queue+0xf8/0x200
netdev_nl_bind_rx_doit+0x3a7/0x450
genl_family_rcv_msg_doit+0xd9/0x130
genl_rcv_msg+0x184/0x2b0
? __pfx_netdev_nl_bind_rx_doit+0x10/0x10
? __pfx_genl_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
genl_rcv+0x24/0x40
...
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250309134219.91670-3-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When mb-xdp is set and return is XDP_PASS, packet is converted from
xdp_buff to sk_buff with xdp_update_skb_shared_info() in
bnxt_xdp_build_skb().
bnxt_xdp_build_skb() passes incorrect truesize argument to
xdp_update_skb_shared_info().
The truesize is calculated as BNXT_RX_PAGE_SIZE * sinfo->nr_frags but
the skb_shared_info was wiped by napi_build_skb() before.
So it stores sinfo->nr_frags before bnxt_xdp_build_skb() and use it
instead of getting skb_shared_info from xdp_get_shared_info_from_buff().
Splat looks like:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 0 at net/core/skbuff.c:6072 skb_try_coalesce+0x504/0x590
Modules linked in: xt_nat xt_tcpudp veth af_packet xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype nft_coms
CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.14.0-rc2+ #3
RIP: 0010:skb_try_coalesce+0x504/0x590
Code: 4b fd ff ff 49 8b 34 24 40 80 e6 40 0f 84 3d fd ff ff 49 8b 74 24 48 40 f6 c6 01 0f 84 2e fd ff ff 48 8d 4e ff e9 25 fd ff ff <0f> 0b e99
RSP: 0018:ffffb62c4120caa8 EFLAGS: 00010287
RAX: 0000000000000003 RBX: ffffb62c4120cb14 RCX: 0000000000000ec0
RDX: 0000000000001000 RSI: ffffa06e5d7dc000 RDI: 0000000000000003
RBP: ffffa06e5d7ddec0 R08: ffffa06e6120a800 R09: ffffa06e7a119900
R10: 0000000000002310 R11: ffffa06e5d7dcec0 R12: ffffe4360575f740
R13: ffffe43600000000 R14: 0000000000000002 R15: 0000000000000002
FS: 0000000000000000(0000) GS:ffffa0755f700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f147b76b0f8 CR3: 00000001615d4000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
? __warn+0x84/0x130
? skb_try_coalesce+0x504/0x590
? report_bug+0x18a/0x1a0
? handle_bug+0x53/0x90
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? skb_try_coalesce+0x504/0x590
inet_frag_reasm_finish+0x11f/0x2e0
ip_defrag+0x37a/0x900
ip_local_deliver+0x51/0x120
ip_sublist_rcv_finish+0x64/0x70
ip_sublist_rcv+0x179/0x210
ip_list_rcv+0xf9/0x130
How to reproduce:
<Node A>
ip link set $interface1 xdp obj xdp_pass.o
ip link set $interface1 mtu 9000 up
ip a a 10.0.0.1/24 dev $interface1
<Node B>
ip link set $interfac2 mtu 9000 up
ip a a 10.0.0.2/24 dev $interface2
ping 10.0.0.1 -s 65000
Following ping.py patch adds xdp-mb-pass case. so ping.py is going to be
able to reproduce this issue.
Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250309134219.91670-2-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
usb_control_msg() returns the number of transferred bytes or a negative
error code. The current implementation propagates the transferred byte
count, which is unintended. This affects code paths that assume a
boolean success/failure check, such as the EEPROM detection logic.
Fix this by ensuring lan78xx_read_reg() and lan78xx_write_reg() return
only 0 on success and preserve negative error codes.
This approach is consistent with existing usage, as the transferred byte
count is not explicitly checked elsewhere.
Fixes: 8b1b2ca83b20 ("net: usb: lan78xx: Improve error handling in EEPROM and OTP operations")
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/ac965de8-f320-430f-80f6-b16f4e1ba06d@sirena.org.uk
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20250307101223.3025632-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This code is trying to ensure that the last byte of the buffer is a NUL
terminator. However, the problem is that attr->value[] is an array of
__le32, not char, so it zeroes out 4 bytes way beyond the end of the
buffer. Cast the buffer to char to address this.
Fixes: e5cf5107c9e4 ("eth: fbnic: Update fbnic_tlv_attr_get_string() to work like nla_strscpy()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Lee Trager <lee@trager.us>
Link: https://patch.msgid.link/2791d4be-ade4-4e50-9b12-33307d8410f6@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix out-of-bounds access on CPU-less AMD NUMA systems by the
microcode code
- Make the kernel SGX CPU init code less passive-aggressive about
non-working SGX features, instead of silently keeping the driver
disabled, this is something people are running into. This doesn't
affect functionality, it's a sysadmin QoL fix
* tag 'x86-urgent-2025-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix out-of-bounds on systems with CPU-less NUMA nodes
x86/sgx: Warn explicitly if X86_FEATURE_SGX_LC is not enabled
|
|
Fix missing initialization of ts_info->phc_index in the dump command,
which could cause a netdev interface to incorrectly display a PTP provider
at index 0 instead of "none".
Fix it by initializing the phc_index to -1.
In the same time, restore missing initialization of ts_info.cmd for the
IOCTL case, as it was before the transition from ethnl_default_dumpit to
custom ethnl_tsinfo_dumpit.
Also, remove unnecessary zeroing of ts_info, as it is embedded within
reply_data, which is fully zeroed two lines earlier.
Fixes: b9e3f7dc9ed95 ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250307091255.463559-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Realtek confirmed that all RTL8125/RTL8126 chip versions support up to
16K jumbo packets. Reflect this in the driver.
Tested by Rui on RTL8125B with 12K jumbo packets.
Suggested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/396762ad-cc65-4e60-b01e-8847db89e98b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Willem de Bruijn says:
====================
follow-up on deduplicate cookie logic
1/3: I came across a leftover from cookie deduplication, due to UDP
having two code paths: lockless fast path and locked cork path.
3/3: Even though the leftover was in the fast path, this prompted me
to complete coverage to the cork path.
2/3: That uncovered a subtle API inconsistency in how dontfrag is
configured. It should not be possible to switch DF mid datagram.
====================
Link: https://patch.msgid.link/20250307033620.411611-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
UDP send with MSG_MORE takes a slightly different path than the
lockless fast path.
For completeness, add coverage to this case too.
Pass MSG_MORE on the initial sendmsg, then follow up with a zero byte
write to unplug the cork.
Unrelated: also add two missing endlines in usage().
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When spanning datagram construction over multiple send calls using
MSG_MORE, per datagram settings are configured on the first send.
That is when ip(6)_setup_cork stores these settings for subsequent use
in __ip(6)_append_data and others.
The only flag that escaped this was dontfrag. As a result, a datagram
could be constructed with df=0 on the first sendmsg, but df=1 on a
next. Which is what cmsg_ip.sh does in an upcoming MSG_MORE test in
the "diff" scenario.
Changing datagram conditions in the middle of constructing an skb
makes this already complex code path even more convoluted. It is here
unintentional. Bring this flag in line with expected sockopt/cmsg
behavior.
And stop passing ipc6 to __ip6_append_data, to avoid such issues
in the future. This is already the case for __ip_append_data.
inet6_cork had a 6 byte hole, so the 1B flag has no impact.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As of the blamed commit ipc6.dontfrag is always initialized at the
start of udpv6_sendmsg, by ipcm6_init_sk, to either 0 or 1.
Later checks against -1 are no longer needed and the branches are now
dead code.
The blamed commit had removed those branches. But I had overlooked
this one case.
UDP has both a lockless fast path and a slower path for corked
requests. This branch remained in the fast path.
Fixes: 096208592b09 ("ipv6: replace ipcm6_init calls with ipcm6_init_sk")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Joe Damato says:
====================
virtio-net: Link queues to NAPIs
Jakub recently commented [1] that I should not hold this series on
virtio-net linking queues to NAPIs behind other important work that is
on-going and suggested I re-spin, so here we are :)
As per the discussion on the v3 [2], now both RX and TX NAPIs use the
API to link queues to NAPIs. Since TX-only NAPIs don't have a NAPI ID,
commit 6597e8d35851 ("netdev-genl: Elide napi_id when not present") now
correctly elides the TX-only NAPIs (instead of printing zero) when the
queues and NAPIs are linked.
As per the discussion on the v4 [3], patch 3 has been refactored to hold
RTNL only in the specific locations which need it as Jason requested.
As per the discussion on the v5 [4], patch 3 now leaves refill_work
as-is and does not use the API to unlink and relink queues and NAPIs. A
comment has been left as suggested by Jakub [5] for future work.
See the commit message of patch 3 for an example of how to get the NAPI
to queue mapping information.
See the commit message of patch 4 for an example of how NAPI IDs are
persistent despite queue count changes.
[1]: https://lore.kernel.org/20250221142650.3c74dcac@kernel.org
[2]: https://lore.kernel.org/20250127142400.24eca319@kernel.org
[3]: https://lore.kernel.org/CACGkMEv=ejJnOWDnAu7eULLvrqXjkMkTL4cbi-uCTUhCpKN_GA@mail.gmail.com
[4]: https://lore.kernel.org/Z8X15hxz8t-vXpPU@LQ3V64L9R2
[5]: https://lore.kernel.org/20250303160355.5f8d82d8@kernel.org
v5: https://lore.kernel.org/20250227185017.206785-1-jdamato@fastly.com
v4: https://lore.kernel.org/20250225020455.212895-1-jdamato@fastly.com
rfcv3: https://lore.kernel.org/20250121191047.269844-1-jdamato@fastly.com
v2: https://lore.kernel.org/20250116055302.14308-1-jdamato@fastly.com
v1: https://lore.kernel.org/20250110202605.429475-1-jdamato@fastly.com
====================
Link: https://patch.msgid.link/20250307011215.266806-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use persistent NAPI config so that NAPI IDs are not renumbered as queue
counts change.
$ sudo ethtool -l ens4 | tail -5 | egrep -i '(current|combined)'
Current hardware settings:
Combined: 4
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'type': 'tx'}]
Now adjust the queue count, note that the NAPI IDs are not renumbered:
$ sudo ethtool -L ens4 combined 1
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'type': 'tx'}]
$ sudo ethtool -L ens4 combined 8
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
{'id': 4, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'},
{'id': 5, 'ifindex': 2, 'napi-id': 8198, 'type': 'rx'},
{'id': 6, 'ifindex': 2, 'napi-id': 8199, 'type': 'rx'},
{'id': 7, 'ifindex': 2, 'napi-id': 8200, 'type': 'rx'},
[...]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250307011215.266806-5-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use netif_queue_set_napi to map NAPIs to queue IDs so that the mapping
can be accessed by user apps. Note that the netif_queue_set_napi
currently requires RTNL, so care must be taken to ensure RTNL is held on
paths where this API might be reached.
The paths in the driver where this API can be reached appear to be:
- ndo_open, ndo_close, which hold RTNL so no driver change is needed.
- rx_pause, rx_resume, tx_pause, tx_resume are reached either via
an ethtool ioctl or via XSK - neither path requires a driver change.
- power management paths (which call open and close), which have been
updated to hold/release RTNL.
$ ethtool -i ens4 | grep driver
driver: virtio_net
$ sudo ethtool -L ens4 combined 4
$ ./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8289, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8290, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8291, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8292, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'type': 'tx'}]
Note that virtio_net has TX-only NAPIs which do not have NAPI IDs, so
the lack of 'napi-id' in the above output is expected.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250307011215.266806-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Create virtnet_napi_disable helper and refactor virtnet_napi_tx_disable
to take a struct send_queue.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250307011215.266806-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor virtnet_napi_enable and virtnet_napi_tx_enable to take a struct
receive_queue. Create a helper, virtnet_napi_do_enable, which contains
the logic to enable a NAPI.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://patch.msgid.link/20250307011215.266806-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In mlx5_chains_create_table(), the return value of mlx5_get_fdb_sub_ns()
and mlx5_get_flow_namespace() must be checked to prevent NULL pointer
dereferences. If either function fails, the function should log error
message with mlx5_core_warn() and return error pointer.
Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250307021820.2646-1-vulab@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The VMBus driver manages the MMIO space it owns via the hyperv_mmio
resource tree. Because the synthetic video framebuffer portion of the
MMIO space is initially setup by the Hyper-V host for each guest, the
VMBus driver does an early reserve of that portion of MMIO space in the
hyperv_mmio resource tree. It saves a pointer to that resource in
fb_mmio. When a VMBus driver requests MMIO space and passes "true"
for the "fb_overlap_ok" argument, the reserved framebuffer space is
used if possible. In that case it's not necessary to do another request
against the "shadow" hyperv_mmio resource tree because that resource
was already requested in the early reserve steps.
However, the vmbus_free_mmio() function currently does no special
handling for the fb_mmio resource. When a framebuffer device is
removed, or the driver is unbound, the current code for
vmbus_free_mmio() releases the reserved resource, leaving fb_mmio
pointing to memory that has been freed. If the same or another
driver is subsequently bound to the device, vmbus_allocate_mmio()
checks against fb_mmio, and potentially gets garbage. Furthermore
a second unbind operation produces this "nonexistent resource" error
because of the unbalanced behavior between vmbus_allocate_mmio() and
vmbus_free_mmio():
[ 55.499643] resource: Trying to free nonexistent
resource <0x00000000f0000000-0x00000000f07fffff>
Fix this by adding logic to vmbus_free_mmio() to recognize when
MMIO space in the fb_mmio reserved area would be released, and don't
release it. This filtering ensures the fb_mmio resource always exists,
and makes vmbus_free_mmio() more parallel with vmbus_allocate_mmio().
Fixes: be000f93e5d7 ("drivers:hv: Track allocations of children of hv_vmbus in private resource tree")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250310035208.275764-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250310035208.275764-1-mhklinux@outlook.com>
|
|
Currently, load_microcode_amd() iterates over all NUMA nodes, retrieves their
CPU masks and unconditionally accesses per-CPU data for the first CPU of each
mask.
According to Documentation/admin-guide/mm/numaperf.rst:
"Some memory may share the same node as a CPU, and others are provided as
memory only nodes."
Therefore, some node CPU masks may be empty and wouldn't have a "first CPU".
On a machine with far memory (and therefore CPU-less NUMA nodes):
- cpumask_of_node(nid) is 0
- cpumask_first(0) is CONFIG_NR_CPUS
- cpu_data(CONFIG_NR_CPUS) accesses the cpu_info per-CPU array at an
index that is 1 out of bounds
This does not have any security implications since flashing microcode is
a privileged operation but I believe this has reliability implications by
potentially corrupting memory while flashing a microcode update.
When booting with CONFIG_UBSAN_BOUNDS=y on an AMD machine that flashes
a microcode update. I get the following splat:
UBSAN: array-index-out-of-bounds in arch/x86/kernel/cpu/microcode/amd.c:X:Y
index 512 is out of range for type 'unsigned long[512]'
[...]
Call Trace:
dump_stack
__ubsan_handle_out_of_bounds
load_microcode_amd
request_microcode_amd
reload_store
kernfs_fop_write_iter
vfs_write
ksys_write
do_syscall_64
entry_SYSCALL_64_after_hwframe
Change the loop to go over only NUMA nodes which have CPUs before determining
whether the first CPU on the respective node needs microcode update.
[ bp: Massage commit message, fix typo. ]
Fixes: 7ff6edf4fef3 ("x86/microcode/AMD: Fix mixed steppings support")
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250310144243.861978-1-revest@chromium.org
|
|
The kernel requires X86_FEATURE_SGX_LC to be able to create SGX enclaves,
not just X86_FEATURE_SGX.
There is quite a number of hardware which has X86_FEATURE_SGX but not
X86_FEATURE_SGX_LC. A kernel running on such hardware does not create
the /dev/sgx_enclave file and does so silently.
Explicitly warn if X86_FEATURE_SGX_LC is not enabled to properly notify
users that the kernel disabled the SGX driver.
The X86_FEATURE_SGX_LC, a.k.a. SGX Launch Control, is a CPU feature
that enables LE (Launch Enclave) hash MSRs to be writable (with
additional opt-in required in the 'feature control' MSR) when running
enclaves, i.e. using a custom root key rather than the Intel proprietary
key for enclave signing.
I've hit this issue myself and have spent some time researching where
my /dev/sgx_enclave file went on SGX-enabled hardware.
Related links:
https://github.com/intel/linux-sgx/issues/837
https://patchwork.kernel.org/project/platform-driver-x86/patch/20180827185507.17087-3-jarkko.sakkinen@linux.intel.com/
[ mingo: Made the error message a bit more verbose, and added other cases
where the kernel fails to create the /dev/sgx_enclave device node. ]
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250309172215.21777-2-vdronov@redhat.com
|
|
Bring in the fix for afs_atcell_get_link() to handle RCU pathwalk from
the afs branch for this cycle. This fix has to go upstream now.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The ->get_link() method may be entered under RCU pathwalk conditions (in
which case, the dentry pointer is NULL). This is not taken account of by
afs_atcell_get_link() and lockdep will complain when it tries to lock an
rwsem.
Fix this by marking net->ws_cell as __rcu and using RCU access macros on it
and by making afs_atcell_get_link() just return a pointer to the name in
RCU pathwalk without taking net->cells_lock or a ref on the cell as RCU
will protect the name storage (the cell is already freed via call_rcu()).
Fixes: 30bca65bbbae ("afs: Make /afs/@cell and /afs/.@cell symlinks")
Reported-by: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250310094206.801057-2-dhowells@redhat.com/ # v4
|
|
Add recovery counters group layout of PPCNT (Ports Performance Counters
Register). This group counts recovery events per link. Also add the
corresponding bit in PCAM to indicate this group is supported.
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741545697-23041-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The hypercall in hv_mark_gpa_visibility() is invoked with an input
argument and an output argument. The output argument ostensibly returns
the number of pages that were processed. But in fact, the hypercall does
not provide any output, so the output argument is spurious.
The spurious argument is harmless because Hyper-V ignores it, but in the
interest of correctness and to avoid the potential for future problems,
remove it.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250226200612.2062-2-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250226200612.2062-2-mhklinux@outlook.com>
|
|
When a Hyper-V framebuffer device is unbind, hyperv_fb driver tries to
release the framebuffer forcefully. If this framebuffer is in use it
produce the following WARN and hence this framebuffer is never released.
[ 44.111220] WARNING: CPU: 35 PID: 1882 at drivers/video/fbdev/core/fb_info.c:70 framebuffer_release+0x2c/0x40
< snip >
[ 44.111289] Call Trace:
[ 44.111290] <TASK>
[ 44.111291] ? show_regs+0x6c/0x80
[ 44.111295] ? __warn+0x8d/0x150
[ 44.111298] ? framebuffer_release+0x2c/0x40
[ 44.111300] ? report_bug+0x182/0x1b0
[ 44.111303] ? handle_bug+0x6e/0xb0
[ 44.111306] ? exc_invalid_op+0x18/0x80
[ 44.111308] ? asm_exc_invalid_op+0x1b/0x20
[ 44.111311] ? framebuffer_release+0x2c/0x40
[ 44.111313] ? hvfb_remove+0x86/0xa0 [hyperv_fb]
[ 44.111315] vmbus_remove+0x24/0x40 [hv_vmbus]
[ 44.111323] device_remove+0x40/0x80
[ 44.111325] device_release_driver_internal+0x20b/0x270
[ 44.111327] ? bus_find_device+0xb3/0xf0
Fix this by moving the release of framebuffer and assosiated memory
to fb_ops.fb_destroy function, so that framebuffer framework handles
it gracefully.
While we fix this, also replace manual registrations/unregistration of
framebuffer with devm_register_framebuffer.
Fixes: 68a2d20b79b1 ("drivers/video: add Hyper-V Synthetic Video Frame Buffer Driver")
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1740845791-19977-3-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1740845791-19977-3-git-send-email-ssengar@linux.microsoft.com>
|
|
The device object required in 'hvfb_release_phymem' function
for 'dma_free_coherent' can also be obtained from the 'info'
pointer, making 'hdev' parameter in 'hvfb_putmem' redundant.
Remove the unnecessary 'hdev' argument from 'hvfb_putmem'.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1740845791-19977-2-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1740845791-19977-2-git-send-email-ssengar@linux.microsoft.com>
|
|
Gen 2 Hyper-V VMs boot via EFI and have a standard EFI framebuffer
device. When the kdump kernel runs in such a VM, loading the efifb
driver may hang because of accessing the framebuffer at the wrong
memory address.
The scenario occurs when the hyperv_fb driver in the original kernel
moves the framebuffer to a different MMIO address because of conflicts
with an already-running efifb or simplefb driver. The hyperv_fb driver
then informs Hyper-V of the change, which is allowed by the Hyper-V FB
VMBus device protocol. However, when the kexec command loads the kdump
kernel into crash memory via the kexec_file_load() system call, the
system call doesn't know the framebuffer has moved, and it sets up the
kdump screen_info using the original framebuffer address. The transition
to the kdump kernel does not go through the Hyper-V host, so Hyper-V
does not reset the framebuffer address like it would do on a reboot.
When efifb tries to run, it accesses a non-existent framebuffer
address, which traps to the Hyper-V host. After many such accesses,
the Hyper-V host thinks the guest is being malicious, and throttles
the guest to the point that it runs very slowly or appears to have hung.
When the kdump kernel is loaded into crash memory via the kexec_load()
system call, the problem does not occur. In this case, the kexec command
builds the screen_info table itself in user space from data returned
by the FBIOGET_FSCREENINFO ioctl against /dev/fb0, which gives it the
new framebuffer location.
This problem was originally reported in 2020 [1], resulting in commit
3cb73bc3fa2a ("hyperv_fb: Update screen_info after removing old
framebuffer"). This commit solved the problem by setting orig_video_isVGA
to 0, so the kdump kernel was unaware of the EFI framebuffer. The efifb
driver did not try to load, and no hang occurred. But in 2024, commit
c25a19afb81c ("fbdev/hyperv_fb: Do not clear global screen_info")
effectively reverted 3cb73bc3fa2a. Commit c25a19afb81c has no reference
to 3cb73bc3fa2a, so perhaps it was done without knowing the implications
that were reported with 3cb73bc3fa2a. In any case, as of commit
c25a19afb81c, the original problem came back again.
Interestingly, the hyperv_drm driver does not have this problem because
it never moves the framebuffer. The difference is that the hyperv_drm
driver removes any conflicting framebuffers *before* allocating an MMIO
address, while the hyperv_fb drivers removes conflicting framebuffers
*after* allocating an MMIO address. With the "after" ordering, hyperv_fb
may encounter a conflict and move the framebuffer to a different MMIO
address. But the conflict is essentially bogus because it is removed
a few lines of code later.
Rather than fix the problem with the approach from 2020 in commit
3cb73bc3fa2a, instead slightly reorder the steps in hyperv_fb so
conflicting framebuffers are removed before allocating an MMIO address.
Then the default framebuffer MMIO address should always be available, and
there's never any confusion about which framebuffer address the kdump
kernel should use -- it's always the original address provided by
the Hyper-V host. This approach is already used by the hyperv_drm
driver, and is consistent with the usage guidelines at the head of
the module with the function aperture_remove_conflicting_devices().
This approach also solves a related minor problem when kexec_load()
is used to load the kdump kernel. With current code, unbinding and
rebinding the hyperv_fb driver could result in the framebuffer moving
back to the default framebuffer address, because on the rebind there
are no conflicts. If such a move is done after the kdump kernel is
loaded with the new framebuffer address, at kdump time it could again
have the wrong address.
This problem and fix are described in terms of the kdump kernel, but
it can also occur with any kernel started via kexec.
See extensive discussion of the problem and solution at [2].
[1] https://lore.kernel.org/linux-hyperv/20201014092429.1415040-1-kasong@redhat.com/
[2] https://lore.kernel.org/linux-hyperv/BLAPR10MB521793485093FDB448F7B2E5FDE92@BLAPR10MB5217.namprd10.prod.outlook.com/
Reported-by: Thomas Tai <thomas.tai@oracle.com>
Fixes: c25a19afb81c ("fbdev/hyperv_fb: Do not clear global screen_info")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250218230130.3207-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250218230130.3207-1-mhklinux@outlook.com>
|
|
When a Hyper-V DRM device is probed, the driver allocates MMIO space for
the vram, and maps it cacheable. If the device removed, or in the error
path for device probing, the MMIO space is released but no unmap is done.
Consequently the kernel address space for the mapping is leaked.
Fix this by adding iounmap() calls in the device removal path, and in the
error path during device probing.
Fixes: f1f63cbb705d ("drm/hyperv: Fix an error handling path in hyperv_vmbus_probe()")
Fixes: a0ab5abced55 ("drm/hyperv : Removing the restruction of VRAM allocation with PCI bar size")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Tested-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250210193441.2414-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250210193441.2414-1-mhklinux@outlook.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use the specified $(LD) when building userprogs with Clang
- Pass the correct target triple when compile-testing UAPI headers
with Clang
- Fix pacman-pkg build error with KBUILD_OUTPUT
* tag 'kbuild-fixes-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: install-extmod-build: Fix build when specifying KBUILD_OUTPUT
docs: Kconfig: fix defconfig description
kbuild: hdrcheck: fix cross build with clang
kbuild: userprogs: use correct lld when linking through clang
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for some reported issues. These
contain:
- typec driver fixes
- dwc3 driver fixes
- xhci driver fixes
- renesas controller fixes
- gadget driver fixes
- a new USB quirk added
All of these have been in linux-next with no reported issues"
* tag 'usb-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix NULL pointer access
usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader
usb: xhci: Fix host controllers "dying" after suspend and resume
usb: dwc3: Set SUSPENDENABLE soon after phy init
usb: hub: lack of clearing xHC resources
usb: renesas_usbhs: Flush the notify_hotplug_work
usb: renesas_usbhs: Use devm_usb_get_phy()
usb: renesas_usbhs: Call clk_put()
usb: dwc3: gadget: Prevent irq storm when TH re-executes
usb: gadget: Check bmAttributes only if configuration is valid
xhci: Restrict USB4 tunnel detection for USB3 devices to Intel hosts
usb: xhci: Enable the TRB overfetch quirk on VIA VL805
usb: gadget: Fix setting self-powered state on suspend
usb: typec: ucsi: increase timeout for PPM reset operations
acpi: typec: ucsi: Introduce a ->poll_cci method
usb: typec: tcpci_rt1711h: Unmask alert interrupts to fix functionality
usb: gadget: Set self-powered based on MaxPower and bmAttributes
usb: gadget: u_ether: Set is_suspend flag if remote wakeup fails
usb: atm: cxacru: fix a flaw in existing endpoint checks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single driver core fix that resolves a reported memory leak.
It's been in linux-next for 2 weeks now with no reported problems"
* tag 'driver-core-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: core: fix device leak in __fw_devlink_relax_cycles()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver fixes from Greg KH:
"Here are a number of misc and char and iio driver fixes that have been
sitting in my tree for way too long. They contain:
- iio driver fixes for reported issues
- regression fix for rtsx_usb card reader
- mei and mhi driver fixes
- small virt driver fixes
- ntsync permissions fix
- other tiny driver fixes for reported problems.
All of these have been in linux-next for quite a while with no
reported issues"
* tag 'char-misc-6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection"
ntsync: Check wait count based on byte size.
bus: simple-pm-bus: fix forced runtime PM use
char: misc: deallocate static minor in error path
eeprom: digsy_mtc: Make GPIO lookup table match the device
drivers: virt: acrn: hsm: Use kzalloc to avoid info leak in pmcmd_ioctl
binderfs: fix use-after-free in binder_devices
slimbus: messaging: Free transaction ID in delayed interrupt scenario
vbox: add HAS_IOPORT dependency
cdx: Fix possible UAF error in driver_override_show()
intel_th: pci: Add Panther Lake-P/U support
intel_th: pci: Add Panther Lake-H support
intel_th: pci: Add Arrow Lake support
intel_th: msu: Fix less trivial kernel-doc warnings
intel_th: msu: Fix kernel-doc warnings
MAINTAINERS: change maintainer for FSI
ntsync: Set the permissions to be 0666
bus: mhi: host: pci_generic: Use pci_try_reset_function() to avoid deadlock
mei: vsc: Use "wakeuphostint" when getting the host wakeup GPIO
mei: me: add panther lake P DID
...
|
|
Pull KVM fixes from Paolo Bonzini:
"arm64:
- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode
x86:
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow
- Ensure DEBUGCTL is context switched on AMD to avoid running the
guest with the host's value, which can lead to unexpected bus lock
#DBs
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
properly emulate BTF. KVM's lack of context switching has meant BTF
has always been broken to some extent
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
the guest can enable DebugSwap without KVM's knowledge
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
to RO memory" phase without actually generating a write-protection
fault
- Fix a printf() goof in the SEV smoke test that causes build
failures with -Werror
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
PERFMON_V2 isn't supported by KVM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
KVM: selftests: Fix printf() format goof in SEV smoke test
KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
KVM: SVM: Save host DR masks on CPUs with DebugSwap
KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
KVM: arm64: Initialize HCR_EL2.E2H early
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: SVM: Suppress DEBUGCTL.BTF on AMD
KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
KVM: selftests: Assert that STI blocking isn't set after event injection
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
|
|
into HEAD
KVM x86 fixes for 6.14-rcN #2
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.
- Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
the host's value, which can lead to unexpected bus lock #DBs.
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
emulate BTF. KVM's lack of context switching has meant BTF has always been
broken to some extent.
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
can enable DebugSwap without KVM's knowledge.
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
memory" phase without actually generating a write-protection fault.
- Fix a printf() goof in the SEV smoke test that causes build failures with
-Werror.
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
isn't supported by KVM.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.14, take #4
- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"33 hotfixes. 24 are cc:stable and the remainder address post-6.13
issues or aren't considered necessary for -stable kernels.
26 are for MM and 7 are for non-MM.
- "mm: memory_failure: unmap poisoned folio during migrate properly"
from Ma Wupeng fixes a couple of two year old bugs involving the
migration of hwpoisoned folios.
- "selftests/damon: three fixes for false results" from SeongJae Park
fixes three one year old bugs in the SAMON selftest code.
The remainder are singletons and doubletons. Please see the individual
changelogs for details"
* tag 'mm-hotfixes-stable-2025-03-08-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (33 commits)
mm/page_alloc: fix uninitialized variable
rapidio: add check for rio_add_net() in rio_scan_alloc_net()
rapidio: fix an API misues when rio_add_net() fails
MAINTAINERS: .mailmap: update Sumit Garg's email address
Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
mm: fix finish_fault() handling for large folios
mm: don't skip arch_sync_kernel_mappings() in error paths
mm: shmem: remove unnecessary warning in shmem_writepage()
userfaultfd: fix PTE unmapping stack-allocated PTE copies
userfaultfd: do not block on locking a large folio with raised refcount
mm: zswap: use ATOMIC_LONG_INIT to initialize zswap_stored_pages
mm: shmem: fix potential data corruption during shmem swapin
mm: fix kernel BUG when userfaultfd_move encounters swapcache
selftests/damon/damon_nr_regions: sort collected regiosn before checking with min/max boundaries
selftests/damon/damon_nr_regions: set ops update for merge results check to 100ms
selftests/damon/damos_quota: make real expectation of quota exceeds
include/linux/log2.h: mark is_power_of_2() with __always_inline
NFS: fix nfs_release_folio() to not deadlock via kcompactd writeback
mm, swap: avoid BUG_ON in relocate_cluster()
mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 fixes from Ingo Molnar:
- Add more model IDs to the AMD microcode version check, more people
are hitting these checks
- Fix a Xen guest boot warning related to AMD northbridge setup
- Fix SEV guest bugs related to a recent changes in its locking logic
- Fix a missing definition of PTRS_PER_PMD that assembly builds can hit
* tag 'x86-urgent-2025-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Add some forgotten models to the SHA check
x86/mm: Define PTRS_PER_PMD for assembly code too
virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex
virt: sev-guest: Allocate request data dynamically
x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()
|
|
Add some more forgotten models to the SHA check.
Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches")
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org
|