summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-09drm/amd/pm: correct hwmon power label nameYang Wang
only vangogh has 2 types of hwmon power node: "fastPPT" and "slowPPT", the other asic only has 1 type of hwmon power node: "PPT". Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amd/amdgpu/amdgpu_uvd: Fix forgotten unmap buffer objectzhanglianjie
After the buffer object is successfully mapped, call amdgpu_bo_kunmap before the function returns. Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/radeon/uvd: Fix forgotten unmap buffer objectszhanglianjie
After the buffer object is successfully mapped, call radeon_bo_kunmap before the function returns. Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdkfd: Consolidate MQD manager functionsMukul Joshi
A few MQD manager functions are duplicated for all versions of MQD manager. Remove this duplication by moving the common functions into kfd_mqd_manager.c file. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdkfd: Remove unused old debugger implementationMukul Joshi
Cleanup the kfd code by removing the unused old debugger implementation. The address watch was only ever implemented in the upstream driver for GFXv7 (Kaveri). The user mode tools runtime using this API was never open-sourced. Work on the old debugger prototype that used this API has been discontinued years ago. Only a small piece of resetting wavefronts is kept and is moved to kfd_device_queue_manager.c. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdkfd: Fix TLB flushing in KFD SVM with no HWSMukul Joshi
With no HWS, TLB flushing will not work in SVM code. Fix this by calling kfd_flush_tlb() which works for both HWS and no HWS case. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amd/pm: fix hwmon node of power1_label create issueYang Wang
it will cause hwmon node of power1_label is not created. v2: the hwmon node of "power1_label" is always needed for all ASICs. and the patch will remove ASIC type check for "power1_label". Fixes: ae07970a0621d6 ("drm/amd/pm: add support for hwmon control of slow and fast PPT limit on vangogh") Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amd/pm: drm/amd/pm: disable GetPptLimit message in sriov modeYang Wang
PPT limit cannot be queried from VF Fixes: f3527a6483fbcc ("drm/amd/pm: Enable sysfs required by rocm-smi tool for One VF mode") Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdkfd: use unmap all queues for poison consumptionTao Zhou
Replace reset queue for specific PASID with unmap all queues, reset queue could break CP scheduler. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasidTao Zhou
As the function is used in more different cases, use a more general name. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: move dpcs_3_0_3 headers from dcn to dpcsAlex Deucher
To align with other headers. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: move dpcs_3_0_0 headers from dcn to dpcsAlex Deucher
To align with other headers. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: add missing license to dpcs_3_0_0 headersAlex Deucher
MIT. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu/display: change pipe policy for DCN 2.0Alex Deucher
Fixes hangs on driver load with multiple displays on DCN 2.0 parts. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215511 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1877 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1886 Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: drop experimental flag on aldebaranAlex Deucher
These have been at production level for a while. Drop the flag. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: add missing prototypes to amdgpu_dpm_internalMaíra Canal
Include the header with the prototype to silence the following clang warnings: drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm_internal.c:29:6: warning: no previous prototype for function 'amdgpu_dpm_get_active_displays' [-Wmissing-prototypes] void amdgpu_dpm_get_active_displays(struct amdgpu_device *adev) ^ drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm_internal.c:29:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void amdgpu_dpm_get_active_displays(struct amdgpu_device *adev) ^ static drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm_internal.c:76:5: warning: no previous prototype for function 'amdgpu_dpm_get_vrefresh' [-Wmissing-prototypes] u32 amdgpu_dpm_get_vrefresh(struct amdgpu_device *adev) ^ drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm_internal.c:76:1: note: declare 'static' if the function is not intended to be used outside of this translation unit u32 amdgpu_dpm_get_vrefresh(struct amdgpu_device *adev) ^ static 2 warnings generated. Besides that, remove the duplicated prototype of the function amdgpu_dpm_get_vblank_time in order to keep the consistency of the headers. Fixes: 6ddbd37f1074 ("drm/amd/pm: optimize the amdgpu_pm_compute_clocks() implementations") Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: fix error handlingTom Rix
clang static analysis reports this error amdgpu_smu.c:2289:9: warning: Called function pointer is null (null dereference) return smu->ppt_funcs->emit_clk_levels( ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There is a logic error in the earlier check of emit_clk_levels. The error value is set to the ret variable but ret is never used. Return directly and remove the unneeded ret variable. Fixes: 5d64f9bbb628 ("amdgpu/pm: Implement new API function "emit" that accepts buffer base and write offset") Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: reserve the pd while cleaning up PRTsChristian König
We want to have lockdep annotation here, so make sure that we reserve the PD while removing PRTs even if it isn't strictly necessary since the VM object is about to be destroyed anyway. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: move lockdep assert to the right place.Christian König
Since newly added BOs don't have any mappings it's ok to add them without holding the VM lock. Only when we add per VM BOs the lock is mandatory. Signed-off-by: Christian König <christian.koenig@amd.com> Reported-by: Bhardwaj, Rajneesh <Rajneesh.Bhardwaj@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: handle null link encoderMartin Tsai
[Why] The link encoder mapping could return a null one and causes system crash. [How] Let the mapping can get an available link encoder without endpoint identification check. Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Martin Tsai <martin.tsai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: 3.2.172Aric Cyr
This version brings along the following fixes: -fix for build failure uninitalized error -Bug fix for DP2 using uncertified cable -limit unbounded request to 5k -fix DP LT sequence on EQ fail -Bug fixes for S3/S4 Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: [FW Promotion] Release 0.0.103.0Anthony Koo
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Anthony Koo <Anthony.Koo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: Fix DP LT sequence on EQ failIlya
[Why] The number of lanes wasn't being reset to maximum when reducing link rate due to an EQ failure. This could result in having fewer lanes in the verified link capabilities, a lower maximum link bandwidth, and fewer modes being supported. [How] Reset the number of lanes to max when dropping link rate due to EQ failure during link training. Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Ilya <Ilya.Bakoulin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: keep eDP Vdd on when eDP stream is already enabledZhan Liu
[Why] Even if can_apply_edp_fast_boot is set to 1 at boot, this flag will be cleared to 0 at S3 resume. [How] Keep eDP Vdd on when eDP stream is already enabled. Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Zhan Liu <Zhan.Liu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: change fastboot timing validationPaul Hsieh
[Why] VBIOS light up eDP with 6bpc but driver use 8bpc without disable valid stream then re-enable valid stream. Some panels can't runtime change color depth. [How] Change fastboot timing validation function. Not only check LANE_COUNT, LINK_RATE...etc Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Paul Hsieh <paul.hsieh@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: fix yellow carp wm clampingDmytro Laktyushkin
Fix clamping to match register field size Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu/display/dc: do blocked MST topology discovery at resume from S3/S4Bing Guo
Why: When resume from sleep or hiberation, blocked MST Topology discovery might need to be used. How: Added "DETECT_REASON_RESUMEFROMS3S4" to enum dc_detect_reason; use it to require blocked MST Topology discovery. Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Bing Guo <Bing.Guo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: remove static from optc31_set_drrEric Bernstein
remove static from optc31_set_drr Reviewed-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Eric Bernstein <eric.bernstein@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: limit unbounded requesting to 5kDmytro Laktyushkin
Unbounded requesting is unsupported on pipe split modes and this change prevents us running into such a situation with wide modes. Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: Fix stream->link_enc unassigned during stream removalNicholas Kazlauskas
[Why] Found when running igt@kms_atomic. Userspace attempts to do a TEST_COMMIT when 0 streams which calls dc_remove_stream_from_ctx. This in turn calls link_enc_unassign which ends up modifying stream->link = NULL directly, causing the global link_enc to be removed preventing further link activity and future link validation from passing. [How] We take care of link_enc unassignment at the start of link_enc_cfg_link_encs_assign so this call is no longer necessary. Fixes global state from being modified while unlocked. Reviewed-by: Jimmy Kizito <Jimmy.Kizito@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: Fix for variable may be used uninitialized errorEric Bernstein
[Why] Build failure due to ‘status’ may be used uninitialized [How] Initialize status to LINK_TRAINING_SUCCESS Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com> Acked-by: Jasdeep Dhillon <jdhillon@amd.com> Signed-off-by: Eric Bernstein <eric.bernstein@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: revise the implementation of ↵Evan Quan
smu_cmn_disable_all_features_with_exception As there is no internal cache for enabled ppfeatures now. Thus the 2nd parameter will be not needed any more. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: avoid consecutive retrieving for enabled ppfeaturesEvan Quan
As the enabled ppfeatures are just retrieved ahead. We can use that directly instead of retrieving again and again. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: drop the cache for enabled ppfeaturesEvan Quan
The following scenarios make the driver cache for enabled ppfeatures outdated and invalid: - Other tools interact with PMFW to change the enabled ppfeatures. - PMFW may enable/disable some features behind driver's back. E.g. for sienna_cichild, on gfxoff entering, PMFW will disable gfx related DPM features. All those are performed without driver's notice. Also considering driver does not actually interact with PMFW such frequently, the benefit brought by such cache is very limited. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: correct the usage for 'supported' member of smu_feature structureEvan Quan
The supported features should be retrieved just after EnableAllDpmFeatures message complete. And the check(whether some dpm feature is supported) is only needed when we decide to enable or disable it. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: update the data type for retrieving enabled ppfeaturesEvan Quan
Use uint64_t instead of an array of uint32_t. This can avoid some non-necessary intermediate uint32_t -> uint64_t conversions. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: unify the interface for retrieving enabled ppfeaturesEvan Quan
Instead of having two which do the same thing. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/pm: correct the way for retrieving enabled ppfeatures on RenoirEvan Quan
As other dGPU asics, Renoir should use smu_cmn_get_enabled_mask() for that job. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amd/display: Cap pflip irqs per max otg numberRoman Li
[Why] pflip interrupt order are mapped 1 to 1 to otg id. e.g. if irq_src=26 corresponds to otg0 then 27->otg1, 28->otg2... Linux DM registers pflip interrupts per number of crtcs. In fused pipe case crtc numbers can be less than otg id. e.g. if one pipe out of 3(otg#0-2) is fused adev->mode_info.num_crtc=2 so DM only registers irq_src 26,27. This is a bug since if pipe#2 remains unfused DM never gets otg2 pflip interrupt (irq_src=28) That may results in gfx failure due to pflip timeout. [How] Register pflip interrupts per max num of otg instead of num_crtc Signed-off-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: check the GART table before invalidating TLBAaron Liu
Bypass group programming (utcl2_harvest) aims to forbid UTCL2 to send invalidation command to harvested SE/SA. Once invalidation command comes into harvested SE/SA, SE/SA has no response and system hang. This patch is to add checking if the GART table is already allocated before invalidating TLB. The new procedure is as following: 1. Calling amdgpu_gtt_mgr_init() in amdgpu_ttm_init(). After this step GTT BOs can be allocated, but GART mappings are still ignored. 2. Calling amdgpu_gart_table_vram_alloc() from the GMC code. This allocates the GART backing store. 3. Initializing the hardware, and programming the backing store into VMID0 for all VMHUBs. 4. Calling amdgpu_gtt_mgr_recover() to make sure the table is updated with the GTT allocations done before it was allocated. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: add utcl2_harvest to gc 10.3.1Aaron Liu
Confirmed with hardware team, there is harvesting for gc 10.3.1. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: fix list add issue in vram reserveTao Zhou
The parameter order in the list_add_tail is incorrect, it causes the reuse of ras reserved page. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07Revert "drm/amdgpu: Add judgement to avoid infinite loop"yipechai
The commit d5e8ff5f7b2a ("drm/amdgpu: Fixed the defect of soft lock caused by infinite loop") had fixed this defect. Revert workaround commit a2170b4af62f ("drm/amdgpu: Add judgement to avoid infinite loop"). Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: Fixed the defect of soft lock caused by infinite loopyipechai
1. The infinite loop case only occurs on multiple cards support ras functions. 2. The explanation of root cause refer to commit 76641cbbf196 ("drm/amdgpu: Add judgement to avoid infinite loop"). 3. Create new node to manage each unique ras instance to guarantee each device .ras_list is completely independent. 4. Fixes: commit 7a6b8ab3231b51 ("drm/amdgpu: Unify ras block interface for each ras block"). 5. The soft locked logs are as follows: [ 262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G OE 5.13.0-27-generic #29~20.04.1-Ubuntu [ 262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020 [ 262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [ 262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu] [ 262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45 [ 262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202 [ 262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248 [ 262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000 [ 262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001 [ 262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e [ 262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003 [ 262.166258] FS: 0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000 [ 262.166261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0 [ 262.166267] Call Trace: [ 262.166272] amdgpu_ras_do_recovery+0x130/0x290 [amdgpu] [ 262.166529] ? psi_task_switch+0xd2/0x250 [ 262.166537] ? __switch_to+0x11d/0x460 [ 262.166542] ? __switch_to_asm+0x36/0x70 [ 262.166549] process_one_work+0x220/0x3c0 [ 262.166556] worker_thread+0x4d/0x3f0 [ 262.166560] ? process_one_work+0x3c0/0x3c0 [ 262.166563] kthread+0x12b/0x150 [ 262.166568] ? set_kthread_struct+0x40/0x40 [ 262.166571] ret_from_fork+0x22/0x30 Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: Set FRU bus for Aldebaran and Vega 20Luben Tuikov
The FRU and RAS EEPROMs share the same I2C bus on Aldebaran and Vega 20 ASICs. Set the FRU bus "pointer" to this single bus, as access to the FRU is sought through that bus "pointer" and not through the RAS bus "pointer". Cc: Roy Sun <Roy.Sun@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Fixes: 2f60dd50769efc ("drm/amd: Expose the FRU SMU I2C bus") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: Fix recursive locking warningRajneesh Bhardwaj
Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.000003] WARNING: possible recursive locking detected [ +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.000004] -------------------------------------------- [ +0.000002] python/4822 is trying to acquire lock: [ +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000203] but task is already holding lock: [ +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000017] other info that might help us debug this: [ +0.000002] Possible unsafe locking scenario: [ +0.000003] CPU0 [ +0.000002] ---- [ +0.000002] lock(reservation_ww_class_mutex); [ +0.000004] lock(reservation_ww_class_mutex); [ +0.000003] *** DEADLOCK *** [ +0.000002] May be due to missing lock nesting notation [ +0.000003] 7 locks held by python/4822: [ +0.000003] #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at: kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu] [ +0.000232] #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu] [ +0.000241] #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu] [ +0.000236] #3: ffffb2b35606fd28 (reservation_ww_class_acquire){+.+.}-{0:0}, at: amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu] [ +0.000235] #4: ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: ttm_eu_reserve_buffers+0x270/0x470 [ttm] [ +0.000015] #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at: drm_dev_enter+0x5/0xa0 [drm] [ +0.000038] #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3}, at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu] [ +0.000195] stack backtrace: [ +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted 5.13.0-kfd-rajneesh #1030 [ +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02 08/29/2018 [ +0.000003] Call Trace: [ +0.000003] dump_stack+0x6d/0x89 [ +0.000010] __lock_acquire+0xb93/0x1a90 [ +0.000009] lock_acquire+0x25d/0x2d0 [ +0.000005] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000184] __ww_mutex_lock.constprop.17+0xca/0x1060 [ +0.000007] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] ? lock_release+0x13f/0x270 [ +0.000005] ? lock_is_held_type+0xa2/0x110 [ +0.000006] ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000183] amdgpu_bo_release_notify+0xc4/0x160 [amdgpu] [ +0.000185] ttm_bo_release+0x4c6/0x580 [ttm] [ +0.000010] amdgpu_bo_unref+0x1a/0x30 [amdgpu] [ +0.000183] amdgpu_vm_free_table+0x76/0xa0 [amdgpu] [ +0.000189] amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu] [ +0.000189] amdgpu_vm_update_ptes+0x411/0x770 [amdgpu] [ +0.000191] amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu] [ +0.000191] amdgpu_vm_bo_update+0x251/0x610 [amdgpu] [ +0.000191] update_gpuvm_pte+0xcc/0x290 [amdgpu] [ +0.000229] ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu] [ +0.000190] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60 [amdgpu] [ +0.000234] kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu] [ +0.000218] kfd_ioctl+0x2b9/0x600 [amdgpu] [ +0.000216] ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu] [ +0.000216] ? lock_release+0x13f/0x270 [ +0.000006] ? __fget_files+0x107/0x1e0 [ +0.000007] __x64_sys_ioctl+0x8b/0xd0 [ +0.000007] do_syscall_64+0x36/0x70 [ +0.000004] entry_SYSCALL_64_after_hwframe+0x44/0xae [ +0.000007] RIP: 0033:0x7fbff90a7317 [ +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 [ +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX: 00007fbff90a7317 [ +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI: 0000000000000004 [ +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09: 00007fbcc402d880 [ +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12: 00000000c0184b18 [ +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15: 00007fbcc402d820 Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: Prevent random memory access in FRU codeLuben Tuikov
Prevent random memory access in the FRU EEPROM code by passing the size of the destination buffer to the reading routine, and reading no more than the size of the buffer. Cc: Kent Russell <kent.russell@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: Don't offset by 2 in FRU EEPROMLuben Tuikov
Read buffers no longer expose the I2C address, and so we don't need to offset by two when we get the read data. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Fixes: bd607166af7fe3 ("drm/amdgpu: Enable reading FRU chip via I2C v3") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdgpu: Nerf "buff" to "buf"Luben Tuikov
Buffer is abbreviated "buf" (buf-fer), not "buff" (buff-er). This is consistent with the rest of the kernel code. Cc: Kent Russell <kent.russell@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07drm/amdkfd: Bump up KFD API version for CRIURajneesh Bhardwaj
- Change KFD minor version to 7 for CRIU Proposed userspace changes: https://github.com/RadeonOpenCompute/criu Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>