linux.git - Linus' kernel tree

author	Lucas De Marchi <lucas.demarchi@intel.com>	2024-11-07 21:33:17 -0800
committer	Lucas De Marchi <lucas.demarchi@intel.com>	2024-11-13 09:10:57 -0800
commit	db696095b08fb7186fedb93ce216f67121ec9b44 (patch)
tree	6141824a90a684359b7eb6a166f09e40bb24c51a /scripts/lib/kdoc/kdoc_output.py
parent	0fd4380c050d71334eb61067f3228a5d57172a45 (diff)

drm/xe: Sample gpu timestamp closer to exec queues

Move the force_wake_get to the beginning of the function so the gpu timestamp can be closer to the sampled value for exec queues. This avoids additional delays waiting for the force wake ack, which can make the proportion between cycles/total_cycles fluctuate around the real value. For a gputop-like application getting 2 samples to calculate the utilization:

sample 0:
        read_exec_queue_timestamp
                                <<<< (A)
        read_gpu_timestamp

sample 1:
        read_exec_queue_timestamp
                                <<<<< (B)
        read_gpu_timestamp

In the above case, utilization can be bigger or smaller than it should be, depending on whether (A) or (B) receives additional delay, respectively.

With this, an LNL system that was failing on `xe_drm_fdinfo --r utilization-single-full-load` after ~60 iterations now runs to 100 without a failure. This is still not perfect, and it's easy to introduce errors by just loading the CPU with `stress --cpu $(nproc)` - the same igt test in this case fails after 2 or 3 iterations. That will be dealt with in the test itself, using a longer sampling period.

v2: Rename function and add another to get "any engine", preparing for caching the hwe in future (Umesh / Jonathan)

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241108053318.3483678-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
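
The effect of a delay at (A) or (B) on the computed utilization can be illustrated with a small numeric sketch. This is not kernel or igt code: the function names, the tick units, the 1000-tick sampling period and the 50-tick delay are all hypothetical, and the model assumes a fully loaded engine whose exec queue timestamp advances at the same rate as the gpu timestamp.

```python
def sample(t, delay, load=1.0):
    """Return (queue_cycles, gpu_cycles) for a sample started at tick t.

    The exec queue timestamp is read at tick t; the gpu timestamp is read
    `delay` ticks later (e.g. while waiting for a force wake ack).
    `load` is the fraction of time the queue was running (hypothetical model).
    """
    queue_cycles = load * t
    gpu_cycles = t + delay
    return queue_cycles, gpu_cycles


def utilization(s0, s1):
    """Ratio of cycles to total_cycles between two samples."""
    d_cycles = s1[0] - s0[0]
    d_total = s1[1] - s0[1]
    return d_cycles / d_total


period = 1000  # ticks between sample 0 and sample 1 (assumed value)
extra = 50     # additional delay between the two reads (assumed value)

no_delay = utilization(sample(0, 0), sample(period, 0))
delay_a = utilization(sample(0, extra), sample(period, 0))  # delay at (A)
delay_b = utilization(sample(0, 0), sample(period, extra))  # delay at (B)

print(f"no delay:     {no_delay:.3f}")
print(f"delay at (A): {delay_a:.3f}")
print(f"delay at (B): {delay_b:.3f}")
```

In this model a delay at (A) shrinks the total_cycles delta and pushes the ratio above 1, while a delay at (B) stretches it and pushes the ratio below 1. A longer sampling period makes the same delay a smaller fraction of the total.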

Diffstat (limited to 'scripts/lib/kdoc/kdoc_output.py')

0 files changed, 0 insertions, 0 deletions
