summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-05-30drm/xe: replace format-less snprintf() with strscpy()Arnd Bergmann
Using snprintf() with a format string from task->comm is a bit dangerous since the string may be controlled by unprivileged userspace: drivers/gpu/drm/xe/xe_devcoredump.c: In function 'devcoredump_snapshot': drivers/gpu/drm/xe/xe_devcoredump.c:184:9: error: format not a string literal and no format arguments [-Werror=format-security] 184 | snprintf(ss->process_name, sizeof(ss->process_name), process_name); | ^~~~~~~~ In this case there is no reason for an snprintf(), so use a simpler string copy. Fixes: b10d0c5e9df7 ("drm/xe: Add process name to devcoredump") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240528133251.2310868-1-arnd@kernel.org
2024-05-29drm/xe: Decouple xe_exec_queue and xe_lrcNiranjana Vishwanathapura
Decouple xe_lrc from xe_exec_queue and reference count xe_lrc. Removing hard coupling between xe_exec_queue and xe_lrc allows flexible design where the user interface xe_exec_queue can be destroyed independent of the hardware/firmware interface xe_lrc. v2: Fix lrc indexing in wq_item_append() Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240530032211.29299-1-niranjana.vishwanathapura@intel.com
2024-05-29drm/xe: Remove unwanted mutex lockingNiranjana Vishwanathapura
Do not hold xef->exec_queue.lock mutex while parsing the xarray xef->exec_queue.xa in xe_file_close() as it is not needed and will cause an unwanted dependency between this lock and the vm->lock. This lock protects the exec queue lookup and reference taking which doesn't apply to this code path. When FD is closing, IOCTLs presumably can't be modifying the xarray. v2: Update commit text (Matt Brost) v3: Add more code comment (Rodrigo Vivi) v4: Further expand code comment (Rodirgo Vivi) Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240529221639.23117-1-niranjana.vishwanathapura@intel.com
2024-05-29drm/xe: Drop undesired prefix from the platform nameMichal Wajdeczko
We don't have to use exact names of the enumerators as the potentially user-facing platform names. When constructing platform descriptor fields, use the unique platform tag and add the XE_ prefix only to the generated enum field. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-05-29drm/xe/hwmon: Expose card power and energy attributes of BMGKarthik Poosa
In BMG there are separate registers for card/platform power and energy. These are exposed through channel 0 i.e power_1/energy1_xxx. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20240523144351.4040131-3-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240529050758.442056-3-balasubramani.vivekanandan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-29drm/xe/hwmon: Add HWMON support for BMGKarthik Poosa
Add HWMON support for BMG. Exposing the pkg power, current, energy info. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20240523144351.4040131-2-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240529050758.442056-2-balasubramani.vivekanandan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-29drm/xe: Add engine name to the engine reset and cat-err logNirmoy Das
Add engine name to the engine reset and cat error log which should be useful while debugging. v2: Add logical mask and engine class(Matt) Use xe_gt_{info|dbg} (Michal) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240528101445.27688-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2024-05-29drm/xe: Check empty pinned BO list with lock held.Nirmoy Das
Use lock that is meant to use for accessing the BO pin list. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240528115408.22094-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2024-05-28drm/xe/guc: Fix uninitialised count in GuC load debug printsJohn Harrison
The debug prints about how long the GuC load takes have a loop counter. However that was neither initialised nor incremented! Plus, counting loops is no longer meaningful given the wait function returns early for any change in the status value. So fix it to only count loops due to actual timeouts. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405250151.IbH0l8FG-lkp@intel.com/ Fixes: b0ac1b42dbdc ("drm/xe/guc: Port over the slow GuC loading support from i915") Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Fei Yang <fei.yang@intel.com> Cc: intel-xe@lists.freedesktop.org Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524202603.4011656-1-John.C.Harrison@Intel.com
2024-05-28MAINTAINERS: update Xe driver maintainersOded Gabbay
Because I left Intel, I'm removing myself from the list of Xe driver maintainers. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240515162222.12958-3-ogabbay@kernel.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-05-28drm/xe: Enable Coarse Power GatingRiana Tauro
Enable power gating for all units and sub-pipes that are disabled by default. v2: change the init function name use symmetric calls for enable/disable pg re-pharase commit message (Rodrigo) modify the sub-pipe power gating condition v3: set hysteresis value for render and media when GuC PC is disabled skip CPG for PVC (Vinay) v4: rebase Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2 Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524070916.143022-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-28drm/xe: Standardize power gate registersRiana Tauro
Standardize power gate registers No functional changes v2: change commit message (Rodrigo) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524070916.143022-2-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-28drm/xe: Don't refer to general LRC initialization as a "wa"Matt Roper
During engine LRC initialization a number of registers need to be programmed as general setup. This programming is not a "workaround" so naming the RTP table as "lrc_was" is misleading; switch to the name "lrc_setup" to more accurately describe what the table is actually for. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524230444.1447797-2-matthew.d.roper@intel.com
2024-05-28drm/xe: Use platform name in xe_assert()Michal Wajdeczko
We can now use more user-friendly platform name instead of previosly used magic platform enumerator value: [ ] xe 0000:00:02.0: [drm] Assertion `false` failed! platform: ALDERLAKE_S ... [ ] xe 0000:03:00.0: [drm] Assertion `false` failed! platform: DG2 ... vs [ ] xe 0000:00:02.0: [drm] Assertion `false` failed! platform: 3 ... [ ] xe 0000:03:00.0: [drm] Assertion `false` failed! platform: 7 ... Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-4-michal.wajdeczko@intel.com
2024-05-28drm/xe: Store platform name in xe_device.infoMichal Wajdeczko
We already maintain the platform name as part of the device descriptor, but in xe_device.info we only store platform enum, which is not the best for use in any user-facing messages. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-2-michal.wajdeczko@intel.com
2024-05-28drm/xe: allow unaligned start and size xe_res_cursor parametersAndrzej Hajda
xe_res_cursor code does not depend on the alignment. On the other side unaligned accesses are useful from pread/pwrite point of view. Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240418-xe_res_cursor-no-align-v1-1-8df7834266c9@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2024-05-28drm/xe: flush gtt before signalling user fence on all enginesAndrzej Hajda
Tests show that user fence signalling requires kind of write barrier, otherwise not all writes performed by the workload will be available to userspace. It is already done for render and compute, we need it also for the rest: video, gsc, copy. v2: added gsc and copy engines, added fixes and r-b tags Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522-xu_flush_vcs_before_ufence-v2-1-9ac3e9af0323@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
2024-05-27drm/xe: Do not access xe file when updating exec queue run_ticksUmesh Nerlige Ramappa
The current code is running into a use after free case where xe file is closed before the exec queue run_ticks can be updated. This is occurring in the xe_file_close path. To fix that, do not access xe file when updating the exec queue run_ticks. Instead store the exec queue run_ticks locally in the exec queue object and accumulate it when the user dumps the drm client stats. We know that the xe file is valid when user is dumping the run_ticks for the drm client, so this effectively removes the dependency on xe file object in xe_exec_queue_update_run_ticks(). v2: - Fix the accumulation of q->run_ticks delta into xe file run_ticks - s/runtime/run_ticks/ (Rodrigo) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908 Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-2-umesh.nerlige.ramappa@intel.com
2024-05-27drm/xe: Use run_ticks instead of runtime for client statsUmesh Nerlige Ramappa
Note that runtime is also used in the pm context, so it is confusing to use the same name to denote run time of the drm client. Use a more appropriate name for the client utilization. While at it, drop the incorrect multi-lrc comment in the helper description v2: s/show_runtime/show_run_ticks/ (Rodrigo) Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-1-umesh.nerlige.ramappa@intel.com
2024-05-27drm/xe: Move job creation out of the struct xe_migrate::job_mutexThomas Hellström
In order to be able to run gpu jobs from reclaim context, move job creation (where allocation takes place) out of the struct xe_migrate::job_mutex, and prime that mutex as reclaim tainted. Jobs that may need to run from reclaim context include CCS metadata extraction at shrinking time. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-6-thomas.hellstrom@linux.intel.com
2024-05-27drm/xe: Remove xe_lrc_create_seqno_fence()Thomas Hellström
It's not used anymore. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-5-thomas.hellstrom@linux.intel.com
2024-05-27drm/xe: Don't initialize fences at xe_sched_job_create()Thomas Hellström
Pre-allocate but don't initialize fences at xe_sched_job_create(), and initialize / arm them instead at xe_sched_job_arm(). This makes it possible to move xe_sched_job_create() with its memory allocation out of any lock that is required for fence initialization, and that may not allow memory allocation under it. Replaces the struct dma_fence_array for parallell jobs with a struct dma_fence_chain, since the former doesn't allow a split-up between allocation and initialization. v2: - Rebase. - Don't always use the first lrc when initializing parallel lrc fences. - Use dma_fence_chain_contained() to access the lrc fences. v4: - Add an assert that job->lrc_seqno == fence->seqno. (Matthew Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-4-thomas.hellstrom@linux.intel.com
2024-05-27drm/xe: Split lrc seqno fence creation upThomas Hellström
Since sometimes a lock is required to initialize a seqno fence, and it might be desirable not to hold that lock while performing memory allocations, split the lrc seqno fence creation up into an allocation phase and an initialization phase. Since lrc seqno fences under the hood are hw_fences, do the same for these and remove the xe_hw_fence_create() function since it is not used anymore. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-3-thomas.hellstrom@linux.intel.com
2024-05-27drm/xe: Decouple job seqno and lrc seqnoMatthew Brost
Tightly coupling these seqno presents problems if alternative fences for jobs are used. Decouple these for correctness. v2: - Slightly reword commit message (Thomas) - Make sure the lrc fence ops are used in comparison (Thomas) - Assume seqno is unsigned rather than signed in format string (Thomas) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas.hellstrom@linux.intel.com
2024-05-27drm/xe/vf: Use only assigned GGTT regionMichal Wajdeczko
Each VF is assigned a limited range of the GGTT address space. To ensure that the VF driver does not use GGTT allocations outside of the assigned region, explicitly reserve GGTT space below and above this region when initializing GGTT. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527112015.1020-1-michal.wajdeczko@intel.com
2024-05-27drm/xe/vf: Read VF configuration prior to GGTT initializationMichal Wajdeczko
Each VF will be assigned with only a limited range of the GGTT address space. Make sure that VF driver will read its own GGTT configuration before starting any GGTT initialization. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240524113714.932-2-michal.wajdeczko@intel.com
2024-05-24drm/xe/vf: Treat GMDID as another runtime registerMichal Wajdeczko
While the GMDID registers are not part of the runtime register list shared by the PF driver, we may still return cached values from our VF specific read32() helper function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-7-michal.wajdeczko@intel.com
2024-05-24drm/xe/vf: Cache value of the GMDID registerMichal Wajdeczko
Read and cache value of the GMDID register as part of the config query that VF driver is doing over MMIO. While the VF driver likely already obtained the value of the GMDID register once during the early driver probe, we couldn't cache it then as the GT structures were not ready yet. Cache it now, in case the driver needs it later when the GuC MMIO communication, required to query GMDID from GuC, could be no longer desired as it will be replaced by the CTB communication. While around, assert that we will query GMDID only when applicable. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-6-michal.wajdeczko@intel.com
2024-05-24drm/xe/vf: Provide early access to GMDID registerMichal Wajdeczko
VFs do not have direct access to the GMDID register and must obtain its value from the GuC. Since we need GMDID value very early in the driver probe flow, before we even start the full setup of GT and GuC data structures, we must do some early initializations ourselves. Additionally, since we also need GMDID for the media GT, which isn't created yet, temporarly tweak the root GT type into MEDIA to allow communication with the correct GuC, as only it can provide the value of the media GMDID register. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523223042.888-1-michal.wajdeczko@intel.com
2024-05-24drm/xe/vf: Obtain value of GMDID register from GuCMichal Wajdeczko
VFs don't have access to the GMDID register and must obtain it value using GuC VF ABI KLV query. Add function for doing that. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-4-michal.wajdeczko@intel.com
2024-05-24drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definitionMichal Wajdeczko
VF drivers can't access GMD_ID register over MMIO. The value of the GMD_ID register must be queried from GuC. It is available as GLOBAL_CFG_GMD_ID KLV. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-3-michal.wajdeczko@intel.com
2024-05-24drm/xe/vf: Use register values obtained from the PFMichal Wajdeczko
As part of the its initialization, the VF driver has already obtained a list of the runtime (fuse) register values from the PF driver. When VF driver is attempting to read register that is inaccessible to the VF, use the values from this list instead. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-2-michal.wajdeczko@intel.com
2024-05-23drm/xe/guc: Port over the slow GuC loading support from i915John Harrison
GuC loading can take longer than it is supposed to for various reasons. So add in the code to cope with that and to report it when it happens. There are also many different reasons why GuC loading can fail, so add in the code for checking for those and for reporting issues in a meaningful manner rather than just hitting a timeout and saying 'fail: status = %x'. Also, remove the 'FIXME' comment about an i915 bug that has never been applicable to Xe! v2: Actually report the requested and granted frequencies rather than showing granted twice (review feedback from Badal). v3: Locally code all the timeout and end condition handling because a helper function is not allowed (review feedback from Lucas/Rodrigo). v4: Add more documentation comments and rename a define to add units (review feedback from Lucas). v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from Lucas) and rebase (no more return value from guc_wait_ucode). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com
2024-05-23drm/xe: Make read_perf_limit_reasons globally accessibleJohn Harrison
Other driver code beyond the sysfs interface wants to know about throttling. So make the query function globally accessible. v2: Revert include order change (review feedback from Lucas) v3: Remove '_sysfs' from throttle file names and keep limit query in the same file rather than moving elsewhere (review feedback from Rodrigo). v4: Correct #include while renaming header file (review feedback from Lucas). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-2-John.C.Harrison@Intel.com
2024-05-23drm/xe: Nuke simple error captureJosé Roberto de Souza
This error capture prints into dmesg HW state when a gpu hang happens. It was useful when we did not had devcoredump, now it is a incompleted version of devcoredump that has potential to flood dmesg. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522203431.191594-1-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Add process name to devcoredumpJosé Roberto de Souza
Process name help us track what application caused the gpug hang, this is crucial when running several applications at the same time. v2: - handle Xe KMD exec_queues without VM v3: - use get_pid_task() (suggested by Nirmoy) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522201203.145403-1-jose.souza@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: remove unused struct 'xe_gt_desc'Dr. David Alan Gilbert
'xe_gt_desc' is unused since commit 1e6c20be6c83 ("drm/xe: Drop extra_gts[] declarations and XE_GT_TYPE_REMOTE"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240522175840.382107-1-linux@treblig.org Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Enable D3Cold on 'low' VRAM utilizationRodrigo Vivi
Now that we eliminated all the mem_access get/put with its locking issues from the inner calls of migration, we can allow D3Cold. Enable it when VRAM utilization is lower then 300Mb. On higher utilization we only allow D3hot so we don't increase so much the latency on runtime resume due to the memory restoration. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-7-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Stop checking for power_lost on D3ColdRodrigo Vivi
GuC reset status is not reliable for this purpose and it is once in a while ending up in a situation of D3Cold, where power_reset is false and without the proper memory restoration the GuC reload and Display will fail to come back from D3Cold. So, let's do a full restoration of everything if we have a risk of losing power, without further optimizations. v2: also remove the gut_in_reset function (Anshuman) Cc: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Prepare display for D3ColdRodrigo Vivi
Prepare power-well and DC handling for a full power lost during D3Cold, then sanitize it upon D3->D0. Otherwise we get a bunch of state mismatch. Ideally we could leave DC9 enabled and wouldn't need to move DC9->DC0 on every runtime resume, however, the disable_DC is part of the power-well checks and intrinsic to the dc_off power well. In the future that can be detangled so we can have even bigger power savings. But for now, let's focus on getting a D3Cold, which saves much more power by itself. v2: create new functions to avoid full-suspend-resume path, which would result in a deadlock between xe_gem_fault and the modeset-ioctl. v3: Only avoid the full modeset to avoid the race, for a more robust suspend-resume. Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-5-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Relax runtime pm protection around VMRodrigo Vivi
In the regular use case scenario, user space will create a VM, and keep it alive for the entire duration of its workload. For the regular desktop cases, it means that the VM is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power. Limit the VM protection solely for long-running workloads that are not protected by the scheduler references. By design, run_job for long-running workloads returns NULL and the scheduler drops all the references of it, hence protecting the VM for this case is necessary. v2: Update commit message to a more imperative language and to reflect why the VM protection is really needed. Also add a comment in the code to let the reason visbible. v3: Remove vma_access case and the mentions to mmap. Mmap cases are already protected by the gem page fault. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Relax runtime pm protection during executionRodrigo Vivi
Limit the protection only during moments of actual job execution, and introduce protection for guc submit fini, which is currently unprotected due to the absence of exec_queue life protection. In the regular use case scenario, user space will create an exec queue, and keep it alive to reuse that until it is done with that kind of workload. For the regular desktop cases, it means that the exec_queue is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power. Cc: Matthew Brost <matthew.brost@intel.com> Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Fix xe_pm_runtime_get_if_in_use documentationRodrigo Vivi
Let's be clear on what it is actually doing and align with xe_pm_runtime_get_if_active doc style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-2-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-23drm/xe: Fix xe_pm_runtime_get_if_active returnRodrigo Vivi
Current callers of this function are already taking the result to a boolean and using in an if. It might be a problem because current function might return negative error codes on failure, without increasing the reference counter. In this scenario we could end up with extra 'put' call ending in unbalanced scenarios. Let's fix it, while aligning with the current xe_pm_get_if_in_use style. Tested-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-05-22drm/xe: Properly handle alloc_guc_id() failureNiranjana Vishwanathapura
Release the submission_state lock if alloc_guc_id() fails. v2: Add Fixes tag and CC stable kernel Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521201711.4934-1-niranjana.vishwanathapura@intel.com
2024-05-22drm/xe/uc: Don't emit false error if running in execlist modeMichal Wajdeczko
When running in execlist mode (using force_execlist=1 modparam) we incorrectly select the error path in xe_uc_init(), leading to an unwanted error message like this: [ ] xe 0000:00:00.0: [drm] *ERROR* GT0: Failed to initialize uC (0000000000000000) Fix that by doing early return like we do in other similar cases. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521114857.712-1-michal.wajdeczko@intel.com
2024-05-22drm/xe/display: move device_remove over to drmmMatthew Auld
i915 display calls this when releasing the drm_device, match this also in xe by using drmm. intel_display_device_remove() is freeing purely software state for the drm_device. v2: fix build error Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-36-matthew.auld@intel.com
2024-05-22drm/xe/display: stop calling domains_driver_remove twiceMatthew Auld
Unclear why we call this twice. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-35-matthew.auld@intel.com
2024-05-22drm/xe/display: move display fini stuff to devmMatthew Auld
Match the i915 display handling here with calling both no_irq and noaccel when removing the device. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-34-matthew.auld@intel.com
2024-05-22drm/xe: reset mmio mappings with devmMatthew Auld
Set our various mmio mappings to NULL. This should make it easier to catch something rogue trying to mess with mmio after device removal. For example, we might unmap everything and then start hitting some mmio address which has already been unmamped by us and then remapped by something else, causing all kinds of carnage. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matthew.auld@intel.com