linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-06-18	bpf: Adjust free target to avoid global starvation of LRU map	Willem de Bruijn
	BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the map is full, due to percpu reservations and force shrink before neighbor stealing. Once a CPU is unable to borrow from the global map, it will once steal one elem from a neighbor and after that each time flush this one element to the global list and immediately recycle it. Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map with 79 CPUs. CPU 79 will observe this behavior even while its neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%). CPUs need not be active concurrently. The issue can appear with affinity migration, e.g., irqbalance. Each CPU can reserve and then hold onto its 128 elements indefinitely. Avoid global list exhaustion by limiting aggregate percpu caches to half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count. This change has no effect on sufficiently large tables. Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable lru->free_target. The extra field fits in a hole in struct bpf_lru. The cacheline is already warm where read in the hot path. The field is only accessed with the lru lock held. Tested-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/r/20250618215803.3587312-1-willemdebruijn.kernel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-18	Merge branch 'net-fix-passive-tfo-socket-having-invalid-napi-id'	Jakub Kicinski
	David Wei says: ==================== net: fix passive TFO socket having invalid NAPI ID Found a bug where an accepted passive TFO socket returns an invalid NAPI ID (i.e. 0) from SO_INCOMING_NAPI_ID. Add a selftest for this using netdevsim and fix the bug. Patch 1 is a drive-by fix for the lib.sh include in an existing drivers/net/netdevsim/peer.sh selftest. Patch 2 adds a test binary for a simple TFO server/client. Patch 3 adds a selftest for checking that the NAPI ID of a passive TFO socket is valid. This will currently fail. Patch 4 adds a fix for the bug. ==================== Link: https://patch.msgid.link/20250617212102.175711-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	tcp: fix passive TFO socket having invalid NAPI ID	David Wei
	There is a bug with passive TFO sockets returning an invalid NAPI ID 0 from SO_INCOMING_NAPI_ID. Normally this is not an issue, but zero copy receive relies on a correct NAPI ID to process sockets on the right queue. Fix by adding a sk_mark_napi_id_set(). Fixes: e5907459ce7e ("tcp: Record Rx hash and NAPI ID in tcp_child_process") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250617212102.175711-5-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	selftests: net: add test for passive TFO socket NAPI ID	David Wei
	Add a test that checks that the NAPI ID of a passive TFO socket is valid i.e. not zero. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250617212102.175711-4-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	selftests: net: add passive TFO test binary	David Wei
	Add a simple passive TFO server and client test binary. This will be used to test the SO_INCOMING_NAPI_ID of passive TFO accepted sockets. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250617212102.175711-3-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	selftests: netdevsim: improve lib.sh include in peer.sh	David Wei
	Fix the peer.sh test to run from INSTALL_PATH. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250617212102.175711-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	Merge tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd	Linus Torvalds
	Pull smb server fixes from Steve French: - Fix alternate data streams bug - Important fix for null pointer deref with Kerberos authentication - Fix oops in smbdirect (RDMA) in free_transport * tag '6.16-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: handle set/get info file for streamed file ksmbd: fix null pointer dereference in destroy_previous_session ksmbd: add free_transport ops in ksmbd connection
2025-06-18	f2fs: fix to zero post-eof page	Chao Yu
	fstest reports a f2fs bug: generic/363 42s ... [failed, exit status 1]- output mismatch (see /share/git/fstests/results//generic/363.out.bad) --- tests/generic/363.out 2025-01-12 21:57:40.271440542 +0800 +++ /share/git/fstests/results//generic/363.out.bad 2025-05-19 19:55:58.000000000 +0800 @@ -1,2 +1,78 @@ QA output created by 363 fsx -q -S 0 -e 1 -N 100000 +READ BAD DATA: offset = 0xd6fb, size = 0xf044, fname = /mnt/f2fs/junk +OFFSET GOOD BAD RANGE +0x1540d 0x0000 0x2a25 0x0 +operation# (mod 256) for the bad data may be 37 +0x1540e 0x0000 0x2527 0x1 ... (Run 'diff -u /share/git/fstests/tests/generic/363.out /share/git/fstests/results//generic/363.out.bad' to see the entire diff) Ran: generic/363 Failures: generic/363 Failed 1 of 1 tests The root cause is user can update post-eof page via mmap [1], however, f2fs missed to zero post-eof page in below operations, so, once it expands i_size, then it will include dummy data locates previous post-eof page, so during below operations, we need to zero post-eof page. Operations which can include dummy data after previous i_size after expanding i_size: - write - mapwrite [1] - truncate - fallocate * preallocate * zero_range * insert_range * collapse_range - clone_range (doesn’t support in f2fs) - copy_range (doesn’t support in f2fs) [1] https://man7.org/linux/man-pages/man2/mmap.2.html 'BUG section' Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-06-18	Merge tag 'driver-core-6.16-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core fixes from Danilo Krummrich: - Fix a race condition in Devres::drop(). This depends on two other patches: - (Minimal) Rust abstractions for struct completion - Let Revocable indicate whether its data is already being revoked - Fix Devres to avoid exposing the internal Revocable - Add .mailmap entry for Danilo Krummrich - Add Madhavan Srinivasan to embargoed-hardware-issues.rst * tag 'driver-core-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: Documentation: embargoed-hardware-issues.rst: Add myself for Power mailmap: add entry for Danilo Krummrich rust: devres: do not dereference to the internal Revocable rust: devres: fix race in Devres::drop() rust: revocable: indicate whether `data` has been revoked already rust: completion: implement initial abstraction
2025-06-18	Merge tag 'cgroup-for-6.16-rc2-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: - In cgroup1 freezer, a task migrating into a frozen cgroup might not get frozen immediately due to the wrong operation order. Fix it. * tag 'cgroup-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup,freezer: fix incomplete freezing when attaching tasks
2025-06-18	Merge tag 'wq-for-6.16-rc2-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: - Fix missed early init of wq_isolated_cpumask * tag 'wq-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Initialize wq_isolated_cpumask in workqueue_init_early()
2025-06-18	tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer	Haixia Qu
	The reproduction steps: 1. create a tun interface 2. enable l2 bearer 3. TIPC_NL_UDP_GET_REMOTEIP with media name set to tun tipc: Started in network mode tipc: Node identity 8af312d38a21, cluster identity 4711 tipc: Enabled bearer <eth:syz_tun>, priority 1 Oops: general protection fault KASAN: null-ptr-deref in range CPU: 1 UID: 1000 PID: 559 Comm: poc Not tainted 6.16.0-rc1+ #117 PREEMPT Hardware name: QEMU Ubuntu 24.04 PC RIP: 0010:tipc_udp_nl_dump_remoteip+0x4a4/0x8f0 the ub was in fact a struct dev. when bid != 0 && skip_cnt != 0, bearer_list[bid] may be NULL or other media when other thread changes it. fix this by checking media_id. Fixes: 832629ca5c313 ("tipc: add UDP remoteip dump to netlink API") Signed-off-by: Haixia Qu <hxqu@hillstonenet.com> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Link: https://patch.msgid.link/20250617055624.2680-1-hxqu@hillstonenet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	Octeontx2-pf: Fix Backpresure configuration	Hariprasad Kelam
	NIX block can receive packets from multiple links such as MAC (RPM), LBK and CPT. ----------------- RPM --\| NIX \| ----------------- \| \| LBK Each link supports multiple channels for example RPM link supports 16 channels. In case of link oversubsribe, NIX will assert backpressure on receive channels. The previous patch considered a single channel per link, resulting in backpressure not being enabled on the remaining channels Fixes: a7ef63dbd588 ("octeontx2-af: Disable backpressure between CPT and NIX") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250617063403.3582210-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	Merge tag 'sched_ext-for-6.16-rc2-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix a couple bugs in cgroup cpu.weight support - Add the new sched-ext@lists.linux.dev to MAINTAINERS * tag 'sched_ext-for-6.16-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group() sched_ext: Make scx_group_set_weight() always update tg->scx.weight sched_ext: Update mailing list entry in MAINTAINERS
2025-06-18	Merge branch '100GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2025-06-17 (ice, e1000e) For ice: Krishna Kumar modifies aRFS match criteria to correctly identify matching filters. Grzegorz fixes a memory leak in eswitch legacy mode. For e1000e: Vitaly sets clock frequency on some Nahum systems which may misreport their value. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000e: set fixed clock frequency indication for Nahum 11 and Nahum 13 ice: fix eswitch code memory leak in reset scenario net: ice: Perform accurate aRFS flow match ==================== Link: https://patch.msgid.link/20250617172444.1419560-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: ftgmac100: select FIXED_PHY	Heiner Kallweit
	Depending on e.g. DT configuration this driver uses a fixed link. So we shouldn't rely on the user to enable FIXED_PHY, select it in Kconfig instead. We may end up with a non-functional driver otherwise. Fixes: 38561ded50d0 ("net: ftgmac100: support fixed link") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/476bb33b-5584-40f0-826a-7294980f2895@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	net: ethtool: remove duplicate defines for family info	Jakub Kicinski
	Commit under fixes switched to uAPI generation from the YAML spec. A number of custom defines were left behind, mostly for commands very hard to express in YAML spec. Among what was left behind was the name and version of the generic netlink family. Problem is that the codegen always outputs those values so we ended up with a duplicated, differently named set of defines. Provide naming info in YAML and remove the incorrect defines. Fixes: 8d0580c6ebdd ("ethtool: regenerate uapi header from the spec") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250617202240.811179-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-18	Merge tag 'libcrypto-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fixes from Eric Biggers: - Fix a regression in the arm64 Poly1305 code - Fix a couple compiler warnings * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto/poly1305: Fix arm64's poly1305_blocks_arch() lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older lib/crypto: Annotate crypto strings with nonstring
2025-06-18	cgroup,freezer: fix incomplete freezing when attaching tasks	Chen Ridong
	An issue was found: # cd /sys/fs/cgroup/freezer/ # mkdir test # echo FROZEN > test/freezer.state # cat test/freezer.state FROZEN # sleep 1000 & [1] 863 # echo 863 > test/cgroup.procs # cat test/freezer.state FREEZING When tasks are migrated to a frozen cgroup, the freezer fails to immediately freeze the tasks, causing the cgroup to remain in the "FREEZING". The freeze_task() function is called before clearing the CGROUP_FROZEN flag. This causes the freezing() check to incorrectly return false, preventing __freeze_task() from being invoked for the migrated task. To fix this issue, clear the CGROUP_FROZEN state before calling freeze_task(). Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Cc: stable@vger.kernel.org # v6.1+ Reported-by: Zhong Jiawei <zhongjiawei1@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-06-18	ACPICA: Refuse to evaluate a method if arguments are missing	Rafael J. Wysocki
	As reported in [1], a platform firmware update that increased the number of method parameters and forgot to update a least one of its callers, caused ACPICA to crash due to use-after-free. Since this a result of a clear AML issue that arguably cannot be fixed up by the interpreter (it cannot produce missing data out of thin air), address it by making ACPICA refuse to evaluate a method if the caller attempts to pass fewer arguments than expected to it. Closes: https://github.com/acpica/acpica/issues/1027 [1] Reported-by: Peter Williams <peter@newton.cx> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hans de Goede <hansg@kernel.org> Tested-by: Hans de Goede <hansg@kernel.org> # Dell XPS 9640 with BIOS 1.12.0 Link: https://patch.msgid.link/5909446.DvuYhMxLoT@rjwysocki.net Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-06-18	cifs: Remove duplicate fattr->cf_dtype assignment from wsl_to_fattr() function	Pali Rohár
	Commit 8bd25b61c5a5 ("smb: client: set correct d_type for reparse DFS/DFSR and mount point") deduplicated assignment of fattr->cf_dtype member from all places to end of the function cifs_reparse_point_to_fattr(). The only one missing place which was not deduplicated is wsl_to_fattr(). Fix it. Fixes: 8bd25b61c5a5 ("smb: client: set correct d_type for reparse DFS/DFSR and mount point") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-18	smb: fix secondary channel creation issue with kerberos by populating ↵	Bharath SM
	hostname when adding channels When mounting a share with kerberos authentication with multichannel support, share mounts correctly, but fails to create secondary channels. This occurs because the hostname is not populated when adding the channels. The hostname is necessary for the userspace cifs.upcall program to retrieve the required credentials and pass it back to kernel, without hostname secondary channels fails establish. Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reported-by: xfuren <xfuren@gmail.com> Link: https://bugzilla.samba.org/show_bug.cgi?id=15824 Signed-off-by: Steve French <stfrench@microsoft.com>
2025-06-18	EDAC/igen6: Fix NULL pointer dereference	Qiuxu Zhuo
	A kernel panic was reported with the following kernel log: EDAC igen6: Expected 2 mcs, but only 1 detected. BUG: unable to handle page fault for address: 000000000000d570 ... Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024 RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac] ... igen6_probe+0x2a0/0x343 [igen6_edac] ... igen6_init+0xc5/0xff0 [igen6_edac] ... This issue occurred because one memory controller was disabled by the BIOS but the igen6_edac driver still checked all the memory controllers, including this absent one, to identify the source of the error. Accessing the null MMIO for the absent memory controller resulted in the oops above. Fix this issue by reverting the configuration structure to non-const and updating the field 'res_cfg->num_imc' to reflect the number of detected memory controllers. Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/ Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Link: https://lore.kernel.org/r/20250618162307.1523736-1-qiuxu.zhuo@intel.com
2025-06-18	Merge tag 'selinux-pr-20250618' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "A small SELinux patch to resolve a UBSAN warning in the xfrm/labeled-IPsec code" * tag 'selinux-pr-20250618' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix selinux_xfrm_alloc_user() to set correct ctx_len
2025-06-18	Merge tag 'ftrace-v6.16-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace fix from Steven Rostedt: - Do not blindly enable function_graph tracer when updating funcgraph-args When the option to trace function arguments in the function graph trace is updated, it requires the function graph tracer to switch its callback routine. It disables function graph tracing, updates the callback and then re-enables function graph tracing. The issue is that it doesn't check if function graph tracing is currently enabled or not. If it is not enabled, it will try to disable it and re-enable it (which will actually enable it even though it is not the current tracer). This causes an issue in the accounting and will trigger a WARN_ON() if the function tracer is enabled after that. * tag 'ftrace-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: fgraph: Do not enable function_graph tracer when setting funcgraph-args
2025-06-18	Merge tag 'ata-6.16-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Niklas Cassel: - Force PIO for ATAPI devices on VT6415/VT6330 as the controller locks up on ATAPI DMA (Tasos) - Fix ACPI PATA cable type detection such that the controller is not forced down to a slow transfer mode (Tasos) - Fix build error on 32-bit UML (Johannes) - Fix a PCI region leak in the pata_macio driver so that the driver no longer fails to load after rmmod (Philipp) - Use correct DMI BIOS build date for ThinkPad W541 quirk (me) - Disallow LPM for ASUSPRO-D840SA motherboard as this board interestingly enough gets graphical corruptions on the iGPU when LPM is enabled (me) - Disallow LPM for Asus B550-F motherboard as this board will get command timeouts on ports 5 and 6, yet LPM with the same drive works fine on all other ports (Mikko) * tag 'ata-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: ahci: Disallow LPM for Asus B550-F motherboard ata: ahci: Disallow LPM for ASUSPRO-D840SA motherboard ata: ahci: Use correct BIOS build date for ThinkPad W541 quirk ata: pata_macio: Fix PCI region leak ata: pata_cs5536: fix build on 32-bit UML ata: libata-acpi: Do not assume 40 wire cable if no devices are enabled ata: pata_via: Force PIO for ATAPI devices on VT6415/VT6330
2025-06-18	drm/amdgpu/sdma5.2: init engine reset mutex	Alex Deucher
	Missing the mutex init. Fixes: 47454f2dc0bf ("drm/amdgpu: Register the new sdma function pointers for sdma_v5_2") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ea685ff30a51a25dd9be90786933ada49a088f65)
2025-06-18	drm/amdkfd: Fix race in GWS queue scheduling	Jay Cornwall
	q->gws is not updated atomically with qpd->mapped_gws_queue. If a runlist is created between pqm_set_gws and update_queue it will contain a queue which uses GWS in a process with no GWS allocated. This will result in a scheduler hang. Use q->properties.is_gws which is changed while holding the DQM lock. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b98370220eb3110e82248e3354e16a489a492cfb) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu/sdma5: init engine reset mutex	Alex Deucher
	Missing the mutex init. Fixes: e56d4bf57fab ("drm/amdgpu/: drm/amdgpu: Register the new sdma function pointers for sdma_v5_0") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3f4caf092f02f0de169c6122639af481c7edc8f9)
2025-06-18	drm/amdgpu: switch job hw_fence to amdgpu_fence	Alex Deucher
	Use the amdgpu fence container so we can store additional data in the fence. This also fixes the start_time handling for MCBP since we were casting the fence to an amdgpu_fence and it wasn't. Fixes: 3f4c175d62d8 ("drm/amdgpu: MCBP based on DRM scheduler (v9)") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf1cd14f9e2e1fdf981eed273ddd595863f5288c) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: Fix SDMA UTC_L1 handling during start/stop sequences	Jesse Zhang
	This commit makes two key fixes to SDMA v4.4.2 handling: 1. disable UTC_L1 in sdma_cntl register when stopping SDMA engines by reading the current value before modifying UTC_L1_ENABLE bit. 2. Ensure UTC_L1_ENABLE is consistently managed by: - Adding the missing register write when enabling UTC_L1 during start - Keeping UTC_L1 enabled by default as per hardware requirements v2: Correct SDMA_CNTL setting (Philip) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 375bf564654e85a7b1b0657b191645b3edca1bda) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: Release reset locks during failures	Lijo Lazar
	Make sure to release reset domain lock in case of failures. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Fixes: 11bb33766f66 ("drm/amdgpu: refactor amdgpu_device_gpu_recover") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ab11a82681eb33a66f423216cb063e7f40c6f85)
2025-06-18	drm/amd/display: Check dce_hwseq before dereferencing it	Alex Hung
	[WHAT] hws was checked for null earlier in dce110_blank_stream, indicating hws can be null, and should be checked whenever it is used. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 79db43611ff61280b6de58ce1305e0b2ecf675ad) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: VCN v5_0_1 to prevent FW checking RB during DPG pause	Sonny Jiang
	Add a protection to ensure programming are all complete prior VCPU starting. This is a WA for an unintended VCPU running. Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c29521b529fa5e225feaf709d863a636ca0cbbfa) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: Use logical instance ID for SDMA v4_4_2 queue operations	Jesse Zhang
	Simplify SDMA v4_4_2 queue reset and stop operations by: 1. Removing GET_INST(SDMA0) conversion for ring->me 2. Using the logical instance ID (ring->me) directly 3. Maintaining consistent behavior with other SDMA queue operations This change aligns with the existing queue handling logic where ring->me already represents the correct instance identifier. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3bab282dfe25dff7a55add432f56898505a6cc6c) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: Fix SDMA engine reset with logical instance ID	Jesse Zhang
	This commit makes the following improvements to SDMA engine reset handling: 1. Clarifies in the function documentation that instance_id refers to a logical ID 2. Adds conversion from logical to physical instance ID before performing reset using GET_INST(SDMA0, instance_id) macro 3. Improves error messaging to indicate when a logical instance reset fails 4. Adds better code organization with blank lines for readability The change ensures proper SDMA engine reset by using the correct physical instance ID while maintaining the logical ID interface for callers. V2: Remove harvest_config check and convert directly to physical instance (Lijo) Suggested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5efa6217c239ed1ceec0f0414f9b6f6927387dfc) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: add kicker fws loading for gfx11/smu13/psp13	Frank Min
	1. Add kicker firmwares loading for gfx11/smu13/psp13 2. Register additional MODULE_FIRMWARE entries for kicker fws - gc_11_0_0_rlc_kicker.bin - gc_11_0_0_imu_kicker.bin - psp_13_0_0_sos_kicker.bin - psp_13_0_0_ta_kicker.bin - smu_13_0_0_kicker.bin Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fb5ec2174d70a8989bc207d257db90ffeca3b163) Cc: stable@vger.kernel.org
2025-06-18	drm/amdgpu: Add kicker device detection	Frank Min
	1. add kicker device list 2. add kicker device checking helper function Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 09aa2b408f4ab689c3541d22b0968de0392ee406) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Export full brightness range to userspace	Mario Limonciello
	[WHY] Userspace currently is offered a range from 0-0xFF but the PWM is programmed from 0-0xFFFF. This can be limiting to some software that wants to apply greater granularity. [HOW] Convert internally to firmware values only when mapping custom brightness curves because these are in 0-0xFF range. Advertise full PWM range to userspace. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8dbd72cb790058ce52279af38a43c2b302fdd3e5) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Only read ACPI backlight caps once	Mario Limonciello
	[WHY] Backlight caps are read already in amdgpu_dm_update_backlight_caps(). They may be updated by update_connector_ext_caps(). Reading again when registering backlight device may cause wrong values to be used. [HOW] Use backlight caps already registered to the dm. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 148144f6d2f14b02eaaa39b86bbe023cbff350bd) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Fix RMCM programming seq errors	Yihan Zhu
	[WHY & HOW] Fix RMCM programming sequence errors and mapping issues to pass the RMCM test. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 11baa4975025033547f45f5894087a0dda6efbb8) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Fix mpv playback corruption on weston	Alex Hung
	[WHAT] Severe video playback corruption is observed in the following setup: weston 14.0.90 (built from source) + mpv v0.40.0 with command: mpv bbb_sunflower_1080p_60fps_normal.mp4 --vo=gpu [HOW] ABGR16161616 needs to be included in dml2/2.1 translation. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d023de809f85307ca819a9dbbceee6ae1f50e2ad) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Add more checks for DSC / HUBP ONO guarantees	Nicholas Kazlauskas
	[WHY] For non-zero DSC instances it's possible that the HUBP domain required to drive it for sequential ONO ASICs isn't met, potentially causing the logic to the tile to enter an undefined state leading to a system hang. [HOW] Add more checks to ensure that the HUBP domain matching the DSC instance is appropriately powered. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Duncan Ma <duncan.ma@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit da63df07112e5a9857a8d2aaa04255c4206754ec) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Get LTTPR IEEE OUI/Device ID From Closest LTTPR To Host	Michael Strauss
	[WHY] These fields are read for the explicit purpose of detecting embedded LTTPRs (i.e. between host ASIC and the user-facing port), and thus need to calculate the correct DPCD address offset based on LTTPR count to target the appropriate LTTPR's DPCD register space with these queries. [HOW] Cascaded LTTPRs in a link each snoop and increment LTTPR count when queried via DPCD read, so an LTTPR embedded in a source device (e.g. USB4 port on a laptop) will always be addressible using the max LTTPR count seen by the host. Therefore we simply need to use a recently added helper function to calculate the correct DPCD address to target potentially embedded LTTPRs based on the received LTTPR count. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 791897f5c77a2a65d0e500be4743af2ddf6eb061) Cc: stable@vger.kernel.org
2025-06-18	drm/amd/display: Add dc cap for dp tunneling	Peichen Huang
	[WHAT] 1. add dc cap for dp tunneling 2. add function to get index of host router Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Cruise Hung <cruise.hung@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 29e178d13979cf6fdb42c5fe2dfec2da2306c4ad) Cc: stable@vger.kernel.org
2025-06-18	drm/amdkfd: move SDMA queue reset capability check to node_show	Jesse.Zhang
	Relocate the per-SDMA queue reset capability check from kfd_topology_set_capabilities() to node_show() to ensure we read the latest value of sdma.supported_reset after all IP blocks are initialized. Fixes: ceb7114c961b ("drm/amdkfd: flag per-sdma queue reset supported to user space") Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e17df7b086cf908cedd919f448da9e00028419bb)
2025-06-18	ASoC: doc: cs35l56: Add CS35L63 to the list of supported devices	Richard Fitzgerald
	Add CS35L63 to the list of parts supported by the cs35l56 driver and mention the CS35L63 where the text refers to the supported part numbers. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20250618145547.152814-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-06-18	PCI: pciehp: Ignore belated Presence Detect Changed caused by DPC	Lukas Wunner
	Commit c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused by DPC") sought to ignore Presence Detect Changed events occurring as a side effect of Downstream Port Containment. The commit awaits recovery from DPC and then clears events which occurred in the meantime. However if the first event seen after DPC is Data Link Layer State Changed, only that event is cleared and not Presence Detect Changed. The object of the commit is thus defeated. That's because pciehp_ist() computes the events to clear based on the local "events" variable instead of "ctrl->pending_events". The former contains the events that had occurred when pciehp_ist() was entered, whereas the latter also contains events that have accumulated while awaiting DPC recovery. In practice, the order of PDC and DLLSC events is arbitrary and the delay in-between can be several milliseconds. So change the logic to always clear PDC events, even if they come after an initial DLLSC event. Fixes: c3be50f7547c ("PCI: pciehp: Ignore Presence Detect Changed caused by DPC") Reported-by: Lương Việt Hoàng <tcm4095@gmail.com> Reported-by: Joel Mathew Thomas <proxy0@tutamail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219765#c165 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Lương Việt Hoàng <tcm4095@gmail.com> Tested-by: Joel Mathew Thomas <proxy0@tutamail.com> Link: https://patch.msgid.link/d9c4286a16253af7e93eaf12e076e3ef3546367a.1750257164.git.lukas@wunner.de
2025-06-18	Documentation: embargoed-hardware-issues.rst: Add myself for Power	Madhavan Srinivasan
	Adding myself as the contact for Power Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://lore.kernel.org/r/20250614152925.82831-1-maddy@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-18	pinctrl: nuvoton: Fix boot on ma35dx platforms	Miquel Raynal
	As part of a wider cleanup trying to get rid of OF specific APIs, an incorrect (and partially unrelated) cleanup was introduced. The goal was to replace a device_for_each_chil_node() loop including an additional condition inside by a macro doing both the loop and the check on a single line. The snippet: device_for_each_child_node(dev, child) if (fwnode_property_present(child, "gpio-controller")) continue; was replaced by: for_each_gpiochip_node(dev, child) which expands into: device_for_each_child_node(dev, child) for_each_if(fwnode_property_present(child, "gpio-controller")) This change is actually doing the opposite of what was initially expected, breaking the probe of this driver, breaking at the same time the whole boot of Nuvoton platforms (no more console, the kernel WARN()). Revert these two changes to roll back to the correct behavior. Fixes: 693c9ecd8326 ("pinctrl: nuvoton: Reduce use of OF-specific APIs") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/20250613181312.1269794-1-miquel.raynal@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>