summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
2025-05-08Reapply: drm/amdgpu: Use generic hdp flush functionLijo Lazar
Except HDP v5.2 all use a common logic for HDP flush. Use a generic function. HDP v5.2 forces NO_KIQ logic, revisit it later. Reapply after fixing up an HDP regression. v2: merge the fix (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu/hdp7: use memcfg register to post the write for HDP flushAlex Deucher
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: 689275140cb8 ("drm/amdgpu/hdp7.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu/hdp6: use memcfg register to post the write for HDP flushAlex Deucher
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: abe1cbaec6cf ("drm/amdgpu/hdp6.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu: cleanup sriov function for psp v12Huang Rui
PSP v12 won't have SRIOV function. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flushAlex Deucher
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: f756dbac1ce1 ("drm/amdgpu/hdp5.2: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu/hdp5: use memcfg register to post the write for HDP flushAlex Deucher
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: cf424020e040 ("drm/amdgpu/hdp5.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-08drm/amdgpu: remove re-route ih in psp v12Huang Rui
APU doesn't have second IH ring, so re-routing action here is a no-op. It will take a lot of time to wait timeout from PSP during the initialization. So remove the function in psp v12. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu/hdp4: use memcfg register to post the write for HDP flushAlex Deucher
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: c9b8dcabb52a ("drm/amdgpu/hdp4.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5c937b4a6050316af37ef214825b6340b5e9e391) Cc: stable@vger.kernel.org
2025-05-07drm/amdgpu: fix pm notifier handlingAlex Deucher
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip the resource evictions properly in pm prepare based on whether we are suspending or hibernating. Drop the eviction as processes are not frozen at this time, we we can end up getting stuck trying to evict VRAM while applications continue to submit work which causes the buffers to get pulled back into VRAM. v2: Move suspend flags out of pm notifier (Mario) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 06f2dcc241e7e5c681f81fbc46cacdf4bfd7d6d7) Cc: stable@vger.kernel.org
2025-05-07Revert "drm/amd: Stop evicting resources on APUs in suspend"Alex Deucher
This reverts commit 3a9626c816db901def438dc2513622e281186d39. This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4 since prepare is used by both flows. The causes both the S3/s0ix and s4 flags to be set which breaks several checks in the driver which assume they are mutually exclusive. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ce8f7d95899c2869b47ea6ce0b3e5bf304b2fff4) Cc: stable@vger.kernel.org
2025-05-07drm/amdgpu/vcn: using separate VCN1_AON_SOC offsetRuijing Dong
VCN1_AON_SOC_ADDRESS_3_0 offset varies on different VCN generations, the issue in vcn4.0.5 is caused by a different VCN1_AON_SOC_ADDRESS_3_0 offset. This patch does the following: 1. use the same offset for other VCN generations. 2. use the vcn4.0.5 special offset 3. update vcn_4_0 and vcn_5_0 Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5c89ceda9984498b28716944633a9a01cbb2c90d) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Fix wrong handling for AUX_DEFER caseWayne Lin
[Why] We incorrectly ack all bytes get written when the reply actually is defer. When it's defer, means sink is not ready for the request. We should retry the request. [How] Only reply all data get written when receive I2C_ACK|AUX_ACK. Otherwise, reply the number of actual written bytes received from the sink. Add some messages to facilitate debugging as well. Fixes: ad6756b4d773 ("drm/amd/display: Shift dc link aux to aux_payload") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3637e457eb0000bc37d8bbbec95964aad2fb29fd) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Copy AUX read reply data whenever length > 0Wayne Lin
[Why] amdgpu_dm_process_dmub_aux_transfer_sync() should return all exact data reply from the sink side. Don't do the analysis job in it. [How] Remove unnecessary check condition AUX_TRANSACTION_REPLY_AUX_ACK. Fixes: ead08b95fa50 ("drm/amd/display: Fix race condition in DPIA AUX transfer") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9b540e3fe6796fec4fb1344f3be8952fc2f084d4) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Remove incorrect checking in dmub aux handlerWayne Lin
[Why & How] "Request length != reply length" is expected behavior defined in spec. It's not an invalid reply. Besides, replied data handling logic is not designed to be written in amdgpu_dm_process_dmub_aux_transfer_sync(). Remove the incorrectly handling section. Fixes: ead08b95fa50 ("drm/amd/display: Fix race condition in DPIA AUX transfer") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 81b5c6fa62af62fe89ae9576f41aae37830b94cb) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Fix the checking condition in dmub aux handlingWayne Lin
[Why & How] Fix the checking condition for detecting AUX_RET_ERROR_PROTOCOL_ERROR. It was wrongly checking by "not equals to" Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1db6c9e9b62e1a8912f0a281c941099fca678da3) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Shift DMUB AUX reply command if necessaryWayne Lin
[Why] Defined value of dmub AUX reply command field get updated but didn't adjust dm receiving side accordingly. [How] Check the received reply command value to see if it's updated version or not. Adjust it if necessary. Fixes: ead08b95fa50 ("drm/amd/display: Fix race condition in DPIA AUX transfer") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d5c9ade755a9afa210840708a12a8f44c0d532f4) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Call FP Protect Before Mode Programming/Mode SupportAustin Zheng
[Why] Memory allocation occurs within dml21_validate() for adding phantom planes. May cause kernel to be tainted due to usage of FP Start. [How] Move FP start from dml21_validate to before mode programming/mode support. Calculations requiring floating point are all done within mode programming or mode support. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fe3250f10819b411808ab9ae1d824c5fc9b59170)
2025-05-07drm/amd/display: Remove unnecessary DC_FP_START/DC_FP_ENDAlex Hung
[WHY & HOW] Remove the unnecessary DC_FP_START/DC_FP_END pair to reduce time in preempt_disable. It also fixes "BUG: sleeping function called from invalid context" error messages because of calling kzalloc with GFP_KERNEL which can sleep. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 94da0735b67b3ada90a3e2e017d306f070306f1d)
2025-05-07drm/amd/display: more liberal vmin/vmax update for freesyncAurabindo Pillai
[Why] FAMS2 expects vmin/vmax to be updated in the case when freesync is off, but supported. But we only update it when freesync is enabled. [How] Change the vsync handler such that dc_stream_adjust_vmin_vmax() its called irrespective of whether freesync is enabled. If freesync is supported, then there is no harm in updating vmin/vmax registers. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546 Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cfb2d41831ee5647a4ae0ea7c24971a92d5dfa0d) Cc: stable@vger.kernel.org
2025-05-07drm/amd/display: Fix invalid context error in dml helperRoman Li
[Why] "BUG: sleeping function called from invalid context" error. after: "drm/amd/display: Protect FPU in dml2_validate()/dml21_validate()" The populate_dml_plane_cfg_from_plane_state() uses the GFP_KERNEL flag for memory allocation, which shouldn't be used in atomic contexts. The allocation is needed only for using another helper function get_scaler_data_for_plane(). [How] Modify helpers to pass a pointer to scaler_data within existing context, eliminating the need for dynamic memory allocation/deallocation and copying. Fixes: 366e77cd4923 ("drm/amd/display: Protect FPU in dml2_validate()/dml21_validate()") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bd3e84bc98f81b44f2c43936bdadc3241d654259) Cc: stable@vger.kernel.org
2025-05-07drm/amd: Add per-ring reset for vcn v5.0.0 useMario Limonciello
If there is a problem requiring a reset of the VCN engine, it is better to reset the VCN engine rather than the entire GPU. Add a reset callback for the ring which will stop and start VCN if an issue happens. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250506204948.12048-4-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd: Add per-ring reset for vcn v4.0.0 useMario Limonciello
If there is a problem requiring a reset of the VCN engine, it is better to reset the VCN engine rather than the entire GPU. Add a reset callback for the ring which will stop and start VCN if an issue happens. Link: https://lore.kernel.org/r/20250506204948.12048-3-mario.limonciello@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd: Add per-ring reset for vcn v4.0.5 useMario Limonciello
There is a problem occurring on VCN 4.0.5 where in some situations a job is timing out. This triggers a job timeout which then causes a GPU reset for recovery. That has exposed a number of issues with GPU reset that have since been fixed. But also a GPU reset isn't actually needed for this circumstance. Just restarting the ring is enough. Add a reset callback for the ring which will stop and start VCN if the issue happens. Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3909 Link: https://lore.kernel.org/r/20250506204948.12048-2-mario.limonciello@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu/hdp4: use memcfg register to post the write for HDP flushAlex Deucher
Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: c9b8dcabb52a ("drm/amdgpu/hdp4.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07Revert "drm/amdgpu: Use generic hdp flush function"Alex Deucher
This reverts commit 18a878fd8aef0ec21648a3782f55a79790cd4073. Revert this temporarily to make it easier to fix a regression in the HDP handling. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd/pm/smu13: Remove unused smu_v3 functionsDr. David Alan Gilbert
smu_v13_0_display_clock_voltage_request() and smu_v13_0_set_min_deep_sleep_dcefclk() were added in 2020 by commit c05d1c401572 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)") but have remained unused. Remove them. smu_v13_0_display_clock_voltage_request() was the only user of smu_v13_0_set_hard_freq_limited_range(). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd/pm/smu11: Remove unused smu_v11_0_get_dpm_level_rangeDr. David Alan Gilbert
The last use of smu_v11_0_get_dpm_level_range() was removed in 2020 by commit 46a301e14e8a ("drm/amd/powerplay: drop unnecessary Navi1x specific APIs") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd/pm/smu7: Remove unused smu7_copy_bytes_from_smcDr. David Alan Gilbert
smu7_copy_bytes_from_smc() was added in 2016 by commit 1ff55f465103 ("drm/amd/powerplay: implement smu7_smumgr for asics with smu ip version 7.") but never used. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: fix the indentationSunil Khatri
fix the indentation drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c:6992 gfx_v11_ip_dump compiler: gcc-11 (Debian 11.3.0-12) 11.3.0 Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202505071619.7sHTLpNg-lkp@intel.com/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: remove mdelay in psp v12Huang Rui
Since secure firmware is more stable than bring up phase, I believe we don't need such mdelays any more before wait PSP response on PSP v12. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Trigger Huang <Trigger.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07amd/amdkfd: Trigger segfault for early userptr unmmappingShane Xiao
If applications unmap the memory before destroying the userptr, it needs trigger a segfault to notify user space to correct the free sequence in VM debug mode. v2: Send gpu access fault to user space v3: Report gpu address to user space, remove unnecessary params v4: update pr_err into one line, remove userptr log info Signed-off-by: Shane Xiao <shane.xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: Add debug bit for userptr usageShane Xiao
In VM debug mode, it is desirable to notify the application to correct the freeing sequence by unmapping the memory before destroying the userptr in the old userptr path. Add a bitmask to decide whether to send gpu vm fault to the applition. Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: unreserve the gem BO before returning from attach errorPrike Liang
It requires unlocking the reserved gem BO before returning from attaching the eviction fence error. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: promote the implicit sync to the dependent read fencesPrike Liang
The driver doesn't want to implicitly sync on the DMA_RESV_USAGE_BOOKKEEP usage fences, and the BOOKEEP fences should be synced explicitly. So, as the VM implicit syncing only need to return and sync the dependent read fences. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu/psp: mark securedisplay TA as optionalAlex Deucher
This is an optional TA which is only available on certain embedded systems. Mark it as optional to avoid user confusion. This mirrors what we already do for other optional TAs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4181 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: fix pm notifier handlingAlex Deucher
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip the resource evictions properly in pm prepare based on whether we are suspending or hibernating. Drop the eviction as processes are not frozen at this time, we we can end up getting stuck trying to evict VRAM while applications continue to submit work which causes the buffers to get pulled back into VRAM. v2: Move suspend flags out of pm notifier (Mario) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: Implement unrecoverable error message handling for VFsEllen Pan
This notification may arrive in VF mailbox while polling for response from another event. This patches covers the following scenarios: - If VF is already in RMA state, then do not attempt to contact the host. Host will ignore the VF after sending the notification. - If the notification is detected during polling, then set the RMA status, and return error to caller. - If the notification arrives by interrupt, then set the RMA status and queue a reset. This reset will fail and VF will stop runtime services. Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: Add unrecoverable error message definitions for VFsEllen Pan
Host may stop runtime services after reaching a bad page threshold. This notification will indicate to the VF that it no longer has access to the GPU. Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07Revert "drm/amd: Stop evicting resources on APUs in suspend"Alex Deucher
This reverts commit 3a9626c816db901def438dc2513622e281186d39. This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4 since prepare is used by both flows. The causes both the S3/s0ix and s4 flags to be set which breaks several checks in the driver which assume they are mutually exclusive. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu/vcn: using separate VCN1_AON_SOC offsetRuijing Dong
VCN1_AON_SOC_ADDRESS_3_0 offset varies on different VCN generations, the issue in vcn4.0.5 is caused by a different VCN1_AON_SOC_ADDRESS_3_0 offset. This patch does the following: 1. use the same offset for other VCN generations. 2. use the vcn4.0.5 special offset 3. update vcn_4_0 and vcn_5_0 Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: fix the eviction fence dereferencePrike Liang
The dma_resv_add_fence() already refers to the added fence. So when attaching the evciton fence to the gem bo, it needn't refer to it anymore. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: Implement Runtime Bad Page query for VFsEllen Pan
Host will send a notification when new bad pages are available. Uopn guest request, the first 256 bad page addresses will be placed into the PF2VF region. Guest should pause the PF2VF worker thread while the copy is in progress. Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: Add Runtime Bad Page message definitions for VFsEllen Pan
Currently VFs rely on poison consumption interrupt from HW to kick off the bad page retirement process. Part of this process includes a VF reset. This patch adds the following: 1) Host Bad Pages notification message. 2) Guest request bad pages message. When combined, VFs are able to reserve the pages early, and potentially avoid future poison consumption that will disrupt user services from consequent FLR. Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdgpu: Add documentation to some parts of the AMDGPU ring and wbRodrigo Siqueira
Add some random documentation associated with the ring buffer manipulations and writeback. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amdkfd: change error to warning message for SDMA queues creationEric Huang
SDMA doesn't support oversubsciption, it is the user matter to create queues over HW limit, but not supposed to be a KFD error. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd/display: Don't check for NULL divisor in fixpt codeHarry Wentland
[Why] We check for a NULL divisor but don't act on it. This check does nothing other than throw a warning. It does confuse static checkers though: See https://lkml.org/lkml/2025/4/26/371 [How] Drop the ASSERTs in both DC and SPL variants. Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Fixes: 6efc0ab3b05d ("drm/amd/display: add back quality EASF and ISHARP and dc dependency changes") Signed-off-by: Harry Wentland <harry.wentland@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Leo Li <sunpeng.li@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd/display: Use true/false for boolean variables in DML2 core filesIvan Shamliev
Replace 0 and 1 with false and true for boolean variables in dml2_core_dcn4_calcs.c and dml2_core_utils.c to align with the Linux kernel coding style guidelines, which recommend using C99 bool type with true/false values. Signed-off-by: Ivan Shamliev <ivan.shamliev.dev@abv.bg> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-07drm/amd/display: adds kernel-doc comment for dc_stream_remove_writeback()James Flowers
Adds kernel-doc for externally linked dc_stream_remove_writeback function. Signed-off-by: James Flowers <bold.zone2373@fastmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-05-06BackMerge tag 'v6.15-rc5' into drm-nextDave Airlie
Linux 6.15-rc5, requested by tzimmerman for fixes required in drm-next. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-05drm/amdgpu: only keep most recent fence for each contextArvind Yadav
Keep only the latest fences to reduce the number of values given back to userspace v2: - Export this code from dma-fence-unwrap.c(by Christian). v3: - To split this in a dma_buf patch and amd userq patch(by Sunil). - No need to add a new function just re-use existing(by Christian). v4: Export dma_fence_dedub_array function and used it(by Christian). Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>