git.armlinux.org.uk/linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2024-01-22	drm/amdgpu: Fix null pointer dereference	Hawking Zhang
	amdgpu_reg_state_sysfs_fini could be invoked at the time when asic_func is even not initialized, i.e., amdgpu_discovery_init fails for some reason. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: skip call ras_late_init if ras block is not supported	Yang Wang
	skip call ras_late_init callback if ras block is not supported. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Show vram vendor only if available	Lijo Lazar
	Ony if vram vendor info is available, show in sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Avoid fetching vram vendor information	Lijo Lazar
	For GFX 9.4.3 APUs, the current method of fetching vram vendor information is not reliable. Avoid fetching the information. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/pm: update the power cap setting	Kenneth Feng
	update the power cap setting for smu_v13.0.0/smu_v13.0.7 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2356 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu/pm: Fix the power source flag error	Ma Jun
	The power source flag should be updated when [1] System receives an interrupt indicating that the power source has changed. [2] System resumes from suspend or runtime suspend Signed-off-by: Ma Jun <Jun.Ma2@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Promote DAL to 3.2.269	Aric Cyr
	- FW Release 0.0.201.0 - Fix resizing video window for dcn321 - Fix timing bandwidth calculation for HDMI - Fix null-deref in dml2 assigned pipe search - Add GART memory support for dmcub - Add power_state and pme_pending flag - Add usb4_bw_alloc_support flag - Revert "Rework DC Z10 restore Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: [FW Promotion] Release 0.0.201.0	Anthony Koo
	- Add debug flag for Replay IPS visual confirm - Remove unused debug flags that should not be controlled inside Replay FSM Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Anthony Koo <anthony.koo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Replay + IPS + ABM in Full Screen VPB	ChunTao Tso
	[Why] Because ABM will wait VStart to start getting histogram data, it will cause we can't enter IPS while full screnn video playing. [How] Modify the panel refresh rate to the maximun multiple of current refresh rate. Reviewed-by: Dennis Chan <dennis.chan@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: ChunTao Tso <chuntao.tso@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: turn off windowed Mpo ODM feature for dcn321	Wenjing Liu
	[why] It has been found a regression caused by enabling this feature during ODM to MPC combine switch when user is resizing video window. The transition is only needed when the feature is enabled. During the transition driver will temporary switch to use max dppclk level through SMU set hard min interface. The interface times out and fail to configure the max dpp clock level, which caused system issue as the desired clock can't be set. We will continue investigating the issue and root cause the issue where max dppclk level can't be reached. But for now we have to disable this feature as this feature will cause us to hit this problem in common use cases during video playback unfortunately. The issue is dcn321 specific so it won't impact other dcn revisions. Reviewed-by: Martin Leung <martin.leung@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Add GART memory support for dmcub	Fudongwang
	[Why] In dump file, GART memory can be accessed while frame buffer cannot. [How] Add GART memory support for dmcub. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Fudongwang <fudong.wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Revert "Rework DC Z10 restore"	Charlene Liu
	This reverts commit e6f82bd44b401049367fcdee3328c7c720351419. It caused intermittent hangs when enabling IPS on static screen. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Charlene Liu <charlene.liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: add power_state and pme_pending flag	Muhammad Ahmed
	[what] Adding power_state to dc.h and pme_pending flag to clk_mgr_internal.h Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Muhammad Ahmed <ahmed.ahmed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Add IPS checks before dcn register access	Roman Li
	[Why] With IPS enabled a system hangs once PSR is active. PSR active triggers transition to IPS2 state. While in IPS2 an access to dcn registers results in hard hang. Existing check doesn't cover for PSR sequence. [How] Safeguard register access by disabling idle optimization in atomic commit and crtc scanout. It will be re-enabled on next vblank. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Add NULL-checks in dml2 assigned pipe search	Allen Pan
	[Why] NULL-deref regression after: "drm/amd/display: Fix dml2 assigned pipe search" [How] Add verification for potential NULLs Fixes: d451b534e0b4 ("drm/amd/display: Fix dml2 assigned pipe search") Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Gabe Teeger <gabe.teeger@amd.com> Signed-off-by: Allen Pan <allen.pan@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Add usb4_bw_alloc_support flag	Peichen Huang
	[Why] dc should have a flag for DM to enable usb4_bw_alloc in dptx [How] - Add usb4_bw_alloc_support flag in dc_config Reviewed-by: Wayne Lin <wayne.lin@amd.com> Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Peichen Huang <peichen.huang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Promote DAL to 3.2.268	Aric Cyr
	Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: create DCN3-specific log for MPC state	Melissa Wen
	Logging DCN3 MPC state was following DCN1 implementation that doesn't consider new DCN3 MPC color blocks. Create new elements according to DCN3 MPC color caps and a new DCN3-specific function for reading MPC data. v3: - remove gamut remap reg reading in favor of fixed31_32 matrix data Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Fix timing bandwidth calculation for HDMI	Leo (Hanghong) Ma
	[Why && How] The current bandwidth calculation for timing doesn't account for certain HDMI modes overhead which leads to DSC can't be enabled. Add support to calculate the actual bandwidth for these HDMI modes. Reviewed-by: Chris Park <chris.park@amd.com> Acked-by: Roman Li <roman.li@amd.com> Signed-off-by: Leo (Hanghong) Ma <hanghong.ma@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: add get_gamut_remap helper for MPC3	Melissa Wen
	We want to be able to read the MPC's gamut remap matrix similar to what we do with .dpp_get_gamut_remap functions. On the other hand, we don't need a hook here because only DCN3+ has the MPC gamut remap block, being absent in previous families. Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: fill up DCN3 DPP color state	Melissa Wen
	DCN3 DPP color state was uncollected and some state elements from DCN1 doesn't fit DCN3. Create new elements according to DCN3 color caps and fill them up for DTN log output. rfc-v2: - fix reading of gamcor and blnd gamma states - remove gamut remap register in favor of gamut remap matrix reading Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: read gamut remap matrix in fixed-point 31.32 format	Melissa Wen
	Instead of read gamut remap data from hw values, convert HW register values (S2D13) into a fixed-point 31.32 matrix for color state log. Change DCN10 log to print data in the format of the gamut remap matrix. Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Add dpp_get_gamut_remap functions	Harry Wentland
	We want to be able to read the DPP's gamut remap matrix. v2: - code-style and doc comments clean-up (Melissa) Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: decouple color state from hw state log	Melissa Wen
	Prepare to hook up color state log according to the DCN version. v3: - put functions in single line (Siqueira) Signed-off-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() ↵	Srinivasan Shanmugam
	& write_dpcd()' functions The 'status' variable in 'core_link_read_dpcd()' & 'core_link_write_dpcd()' was uninitialized. Thus, initializing 'status' variable to 'DC_ERROR_UNEXPECTED' by default. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:226 core_link_read_dpcd() error: uninitialized symbol 'status'. drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:248 core_link_write_dpcd() error: uninitialized symbol 'status'. Cc: stable@vger.kernel.org Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Jun Lei <Jun.Lei@amd.com> Cc: Wayne Lin <Wayne.Lin@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: update check condition of query for ras page retire	Tao Zhou
	Support page retirement handling in debug mode. v2: revert smu_v13_0_6_get_ecc_info directly. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	Revert "drm/amd/pm: smu v13_0_6 supports ecc info by default"	Tao Zhou
	This reverts commit 6fe08f56db798659beca41ab5b1727a31518f794. We use debug mode flag instead of this interface. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/display: Drop kdoc markers for some Panel Replay functions	Srinivasan Shanmugam
	Fixes the below gcc with W=1: drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:262: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Set REPLAY power optimization flags and coasting vtotal. drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dmub_replay.c:284: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst send Replay general cmd to DMUB. Fixes: e379787cbc2a ("drm/amd/display: Add some functions for Panel Replay") Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Cleanup inconsistent indenting in 'amdgpu_gfx_enable_kcq()'	Srinivasan Shanmugam
	Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting Cc: Le Ma <Le.Ma@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amd/pm: udpate smu v13.0.6 message permission	Yang Wang
	update smu v13.0.6 message to allow guest driver set gfx clock. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu:Support retiring multiple MCA error address pages	YiPeng Chai
	Support retiring multiple MCA error address pages in one in-band query for umc v12_0. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: add interface to check mca umc status	YiPeng Chai
	Add interface to check mca umc status. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Use asynchronous polling to handle umc_v12_0 poisoning	YiPeng Chai
	Use asynchronous polling to handle umc_v12_0 poisoning. v2: 1. Change function name. 2. Change the debugging information content. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Fix ras features value calltrace	Stanley.Yang
	The high three bits of ras features mask indicate socket id, it should skip to check high three bits of ras features mask before disable all ras features. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Prepare for asynchronous processing of umc page retirement	YiPeng Chai
	Preparing for asynchronous processing of umc page retirement. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Add log info for umc_v12_0	YiPeng Chai
	Add log info for umc_v12_0. v2: Delete redundant logs. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: fix wrong sizeof argument	Samasth Norway Ananda
	voltage_parameters is a point to a struct of type SET_VOLTAGE_PARAMETERS_V1_3. Passing just voltage_parameters would not print the right size of the struct variable. So we need to pass *voltage_parameters to sizeof(). Fixes: 4630d5031cd8 ("drm/amdgpu: check PS, WS index") Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/amdgpu: Enable seq64 manager and fix bugs	Arunpravin Paneer Selvam
	- Enable the seq64 mapping sequence. - Fix wflinfo va conflict and other bugs. v1: - The seq64 area needs to be included in the AMDGPU_VA_RESERVED_SIZE otherwise the areas will conflict with user space allocations (Alex) - It needs to be mapped read only in the user VM (Alex) v2: - Instead of just one define for TOP/BOTTOM reserved space separate them into two (Christian) - Fix the CPU and VA calculations and while at it also cleanup error handling and kerneldoc (Christian) Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2024-01-22	drm/radeon/ni_dpm: remove redundant NULL check	Nikita Zhandarovich
	'leakage_table' will always be successfully initialized as a pointer to '&rdev->pm.dpm.dyn_state.cac_leakage_table'. Remove unnecessary check if only to silence static checkers. Found by Linux Verification Center (linuxtesting.org) with static analysis tool Svace. Fixes: 69e0b57a91ad ("drm/radeon/kms: add dpm support for cayman (v5)") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-22	drm/radeon: remove dead code in ni_mc_load_microcode()	Nikita Zhandarovich
	Inside the if block with (running == 0), the checks for 'running' possibly being non-zero are redundant. Remove them altogether. This change is similar to the one authored by Heinrich Schuchardt <xypron.glpk@gmx.de> in commit ddbbd3be9679 ("drm/radeon: remove dead code, si_mc_load_microcode (v2)") Found by Linux Verification Center (linuxtesting.org) with static analysis tool Svace. Fixes: 0af62b016804 ("drm/radeon/kms: add ucode loader for NI") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amd/pm: enable amdgpu smu send message log	Yang Wang
	v1: enable amdgpu smu driver message log. v2: add smu/pmfw response value into debug log. Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: update error condition check for umc_v12_0_query_error_address	Tao Zhou
	Deferred error is also taken into account. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: Skip do PCI error slot reset during RAS recovery	Stanley.Yang
	Why: The PCI error slot reset maybe triggered after inject ue to UMC multi times, this caused system hang. [ 557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume [ 557.373718] [drm] PCIE GART of 512M enabled. [ 557.373722] [drm] PTB located at 0x0000031FED700000 [ 557.373788] [drm] VRAM is lost due to GPU reset! [ 557.373789] [drm] PSP is resuming... [ 557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset [ 557.547067] [drm] PCI error: detected callback, state(1)!! [ 557.547069] [drm] No support for XGMI hive yet... [ 557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter [ 557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations [ 557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. Exit, err = 0, result = 5, recovered [ 557.610492] [drm] PCI error: slot reset callback!! ... [ 560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded! [ 560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded! [ 560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI [ 560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G OE 5.15.0-91-generic #101-Ubuntu [ 560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023 [ 560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu] [ 560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu] [ 560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff <48> 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00 [ 560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202 [ 560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0 [ 560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010 [ 560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08 [ 560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000 [ 560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000 [ 560.803889] FS: 0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000 [ 560.812973] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0 [ 560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 560.843444] PKRU: 55555554 [ 560.846480] Call Trace: [ 560.849225] <TASK> [ 560.851580] ? show_trace_log_lvl+0x1d6/0x2ea [ 560.856488] ? show_trace_log_lvl+0x1d6/0x2ea [ 560.861379] ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu] [ 560.867778] ? show_regs.part.0+0x23/0x29 [ 560.872293] ? __die_body.cold+0x8/0xd [ 560.876502] ? die_addr+0x3e/0x60 [ 560.880238] ? exc_general_protection+0x1c5/0x410 [ 560.885532] ? asm_exc_general_protection+0x27/0x30 [ 560.891025] ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu] [ 560.898323] amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu] [ 560.904520] process_one_work+0x228/0x3d0 How: In RAS recovery, mode-1 reset is issued from RAS fatal error handling and expected all the nodes in a hive to be reset. no need to issue another mode-1 during this procedure. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: Show deferred error count for UMC	Stanley.Yang
	Show deferred error count for UMC syfs node Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: Enable GFXOFF for Compute on GFX11	Ori Messinger
	On GFX version 11, GFXOFF was disabled due to a MES KIQ firmware issue, which has since been fixed after version 64. This patch only re-enables GFXOFF for GFX version 11 if the GPU's MES KIQ firmware version is newer than version 64. V2: Keep GFXOFF disabled on GFX11 if MES KIQ is below version 64. V3: Add parentheses to avoid GCC warning for parentheses: "suggest parentheses around comparison in operand of ‘&’" V4: Remove "V3" from commit title V5: Change commit description and insert 'Acked-by' Signed-off-by: Ori Messinger <Ori.Messinger@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[]	Yang Wang
	fix array index out of bounds issue for ras_block_string[] array. Fixes: 30df05fb74f6 ("drm/amdgpu: Align ras block enum with firmware") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amd/amdgpu: Update RLC_SPM_MC_CNT by ring wreg in guest	YuanShang
	Submit command of wreg in GFX and COMPUTE ring to update RLC_SPM_MC_CNT in guest machine during runtime. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amd/display: Drop 'acrtc' and add 'new_crtc_state' NULL check for ↵	Srinivasan Shanmugam
	writeback requests. Return value of 'to_amdgpu_crtc' which is container_of(...) can't be null, so it's null check 'acrtc' is dropped. Fixing the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:9302 amdgpu_dm_atomic_commit_tail() error: we previously assumed 'acrtc' could be null (see line 9299) Added 'new_crtc_state' NULL check for function 'drm_atomic_get_new_crtc_state' that retrieves the new state for a CRTC, while enabling writeback requests. Cc: stable@vger.kernel.org Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: Remove unnecessary NULL check	Felix Kuehling
	A static checker pointed out, that bo_va->base.bo was already derefenced earlier in the same scope. Therefore this check is unnecessary here. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 50661eb1a2c8 ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs") Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18	drm/amdgpu: revert "Adjust removal control flow for smu v13_0_2"	Christian König
	Calling amdgpu_device_ip_resume_phase1() during shutdown leaves the HW in an active state and is an unbalanced use of the IP callbacks. Using the IP callbacks like this can lead to memory leaks, double free and imbalanced reference counters. Leaving the HW in an active state can lead to DMA accesses to memory now freed by the driver. Both is a complete no-go for driver unload so completely revert the workaround for now. This reverts commit f5c7e7797060255dbc8160734ccc5ad6183c5e04. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>