summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/scheduler
AgeCommit message (Collapse)Author
2025-06-06Merge tag 'drm-fixes-2025-06-06' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull more drm fixes from Simona Vetter: "Another small batch of drm fixes, this time with a different baseline and hence separate. Drivers: - ivpu: - dma_resv locking - warning fixes - reset failure handling - improve logging - update fw file names - fix cmdqueue unregister - panel-simple: add Evervision VGG644804 Core Changes: - sysfb: screen_info type check - video: screen_info for relocated pci fb - drm/sched: signal fence of killed job - dummycon: deferred takeover fix" * tag 'drm-fixes-2025-06-06' of https://gitlab.freedesktop.org/drm/kernel: sysfb: Fix screen_info type check for VGA video: screen_info: Relocate framebuffers behind PCI bridges accel/ivpu: Fix warning in ivpu_gem_bo_free() accel/ivpu: Trigger device recovery on engine reset/resume failure accel/ivpu: Use dma_resv_lock() instead of a custom mutex drm/panel-simple: fix the warnings for the Evervision VGG644804 accel/ivpu: Reorder Doorbell Unregister and Command Queue Destruction accel/ivpu: Use firmware names from upstream repo accel/ivpu: Improve buffer object logging dummycon: Trigger redraw when switching consoles with deferred takeover drm/scheduler: signal scheduled fence when kill job
2025-05-22drm/scheduler: signal scheduled fence when kill jobLin.Cao
When an entity from application B is killed, drm_sched_entity_kill() removes all jobs belonging to that entity through drm_sched_entity_kill_jobs_work(). If application A's job depends on a scheduled fence from application B's job, and that fence is not properly signaled during the killing process, application A's dependency cannot be cleared. This leads to application A hanging indefinitely while waiting for a dependency that will never be resolved. Fix this issue by ensuring that scheduled fences are properly signaled when an entity is killed, allowing dependent applications to continue execution. Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250515020713.1110476-1-lincao12@amd.com
2025-04-07Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2025-03-28Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Outside of drm there are some rust patches from Danilo who maintains that area in here, and some pieces for drm header check tests. The major things in here are a new driver supporting the touchbar displays on M1/M2, the nova-core stub driver which is just the vehicle for adding rust abstractions and start developing a real driver inside of. xe adds support for SVM with a non-driver specific SVM core abstraction that will hopefully be useful for other drivers, along with support for shrinking for TTM devices. I'm sure xe and AMD support new devices, but the pipeline depth on these things is hard to know what they end up being in the marketplace! uapi: - add mediatek tiled fourcc - add support for notifying userspace on device wedged new driver: - appletbdrm: support for Apple Touchbar displays on m1/m2 - nova-core: skeleton rust driver to develop nova inside off firmware: - add some rust firmware pieces rust: - add 'LocalModule' type alias component: - add helper to query bound status fbdev: - fbtft: remove access to page->index media: - cec: tda998x: import driver from drm dma-buf: - add fast path for single fence merging tests: - fix lockdep warnings atomic: - allow full modeset on connector changes - clarify semantics of allow_modeset and drm_atomic_helper_check - async-flip: support on arbitary planes - writeback: fix UAF - Document atomic-state history format-helper: - support ARGB8888 to ARGB4444 conversions buddy: - fix multi-root cleanup ci: - update IGT dp: - support extended wake timeout - mst: fix RAD to string conversion - increase DPCD eDP control CAP size to 5 bytes - add DPCD eDP v1.5 definition - add helpers for LTTPR transparent mode panic: - encode QR code according to Fido 2.2 scheduler: - add parameter struct for init - improve job peek/pop operations - optimise drm_sched_job struct layout ttm: - refactor pool allocation - add helpers for TTM shrinker panel-orientation: - add a bunch of new quirks panel: - convert panels to multi-style functions - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e - visionox-r66451: use multi-style MIPI-DSI functions - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 bridge: - pass full atomic state to various callbacks - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - snd65dsi86: fix device IDs - nwl-dsi: set bridge type - ti-sn65si83: add error recovery and set bridge type - synopsys: add HDMI audio support xe: - support device-wedged event - add mmap support for PCI memory barrier - perf pmu integration and expose per-engien activity - add EU stall sampling support - GPU SVM and Xe SVM implementation - use TTM shrinker - add survivability mode to allow the driver to do firmware updates in critical failure states - PXP HWDRM support for MTL and LNL - expose package/vram temps over hwmon - enable DP tunneling - drop mmio_ext abstraction - Reject BO evcition if BO is bound to current VM - Xe suballocator improvements - re-use display vmas when possible - add GuC Buffer Cache abstraction - PCI ID update for Panther Lake and Battlemage - Enable SRIOV for Panther Lake - Refactor VRAM manager location i915: - enable extends wake timeout - support device-wedged event - Enable DP 128b/132b SST DSC - FBC dirty rectangle support for display version 30+ - convert i915/xe to drm client setup - Compute HDMI PLLS for rates not in fixed tables - Allow DSB usage when PSR is enabled on LNL+ - Enable panel replay without full modeset - Enable async flips with compressed buffers on ICL+ - support luminance based brightness via DPCD for eDP - enable VRR enable/disable without full modeset - allow GuC SLPC default strategies on MTL+ for performance - lots of display refactoring in move to struct intel_display amdgpu: - add device wedged event - support async page flips on overlay planes - enable broadcast RGB drm property - add info ioctl for virt mode - OEM i2c support for RGB lights - GC 11.5.2 + 11.5.3 support - SDMA 6.1.3 support - NBIO 7.9.1 + 7.11.2 support - MMHUB 1.8.1 + 3.3.2 support - DCN 3.6.0 support - Add dynamic workload profile switching for GC 10-12 - support larger VBIOS sizes - Mark gttsize parameters as deprecated - Initial JPEG queue resset support amdkfd: - add KFD per process flags for setting precision - sync pasid values between KGD and KFD - improve GTT/VRAM handling for APUs - fix user queue validation on GC7/8 - SDMA queue reset support raedeon: - rs400 hyperz fix i2c: - td998x: drop platform_data, split driver into media and bridge ast: - transmitter chip detection refactoring - vbios display mode refactoring - astdp: fix connection status and filter unsupported modes - cursor handling refactoring imagination: - check job dependencies with sched helper ivpu: - improve command queue handling - use workqueue for IRQ handling - add support HW fault injection - locking fixes mgag200: - add support for G200eH5 msm: - dpu: add concurrent writeback support for DPU 10.x+ - use LTTPR helpers - GPU: - Fix obscure GMU suspend failure - Expose syncobj timeline support - Extend GPU devcoredump with pagetable info - a623 support - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump - Display: - Add cpu-cfg interconnect paths on SM8560 and SM8650 - Introduce KMS OMMU fault handler, causing devcoredump snapshot - Fixed error pointer dereference in msm_kms_init_aspace() - DPU: - Fix mode_changing handling - Add writeback support on SM6150 (QCS615) - Fix DSC programming in 1:1:1 topology - Reworked hardware resource allocation, moving it to the CRTC code - Enabled support for Concurrent WriteBack (CWB) on SM8650 - Enabled CDM blocks on all relevant platforms - Reworked debugfs interface for BW/clocks debugging - Clear perf params before calculating bw - Support YUV formats on writeback - Fixed double inclusion - Fixed writeback in YUV formats when using cloned output, Dropped wb2_formats_rgb - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt kerneldocs - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode() - DSI: - DSC-related fixes - Rework clock programming - DSI PHY: - Fix 7nm (and lower) PHY programming - Add proper DT schema definitions for DSI PHY clocks - HDMI: - Rework the driver, enabling the use of the HDMI Connector framework - Bindings: - Added eDP PHY on SA8775P nouveau: - move drm_slave_encoder interface into driver - nvkm: refactor GSP RPC - use LTTPR helpers mediatek: - HDMI fixup and refinement - add MT8188 dsc compatible - MT8365 SoC support panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Improve locking qaic: - Add support for AIC200 renesas: - Fix limits in DT bindings rockchip: - support rk3562-mali - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings - analogix_dp: add eDP support - fix shutodnw solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - handle clock vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 - fix UAf xlnx: - Set correct DMA segment size - use mutex guards - Fix error handling - Fix docs" * tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits) drm/amd/pm: Update feature list for smu_v13_0_6 drm/amdgpu: Add parameter documentation for amdgpu_sync_fence drm/amdgpu/discovery: optionally use fw based ip discovery drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics drm/amdgpu/discovery: check ip_discovery fw file available drm/amd/pm: Remove unnecessay UQ10 to UINT conversion drm/amd/pm: Remove unnecessay UQ10 to UINT conversion drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush drm/amd/amdgpu: Increase max rings to enable SDMA page ring drm/amdgpu: Decode deferred error type in gfx aca bank parser drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs drm/amdgpu/mes: clean up SDMA HQD loop drm/amdgpu/mes: enable compute pipes across all MEC drm/amdgpu/mes: drop MES 10.x leftovers drm/amdgpu/mes: optimize compute loop handling drm/amdgpu/sdma: guilty tracking is per instance drm/amdgpu/sdma: fix engine reset handling drm/amdgpu: remove invalid usage of sched.ready drm/amdgpu: add cleaner shader trace point ...
2025-03-24drm/sched: Add a basic test for checking credit limitTvrtko Ursulin
Add a basic test for checking whether scheduler respects the configured credit limit. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250324092633.49746-7-tvrtko.ursulin@igalia.com
2025-03-24drm/sched: Add a basic test for modifying entities scheduler listTvrtko Ursulin
Add a basic test for exercising modifying the entities scheduler list at runtime. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250324092633.49746-6-tvrtko.ursulin@igalia.com
2025-03-24drm/sched: Add basic priority testsTvrtko Ursulin
Add some basic tests for exercising entity priority handling. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250324092633.49746-5-tvrtko.ursulin@igalia.com
2025-03-24drm/sched: Add a simple timeout testTvrtko Ursulin
Add a very simple timeout test which submits a single job and verifies that the timeout handling will run if the backend failed to complete the job in time. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250324092633.49746-4-tvrtko.ursulin@igalia.com
2025-03-24drm/sched: Add scheduler unit testing infrastructure and some basic testsTvrtko Ursulin
Implement a mock scheduler backend and add some basic test to exercise the core scheduler code paths. Mock backend (kind of like a very simple mock GPU) can either process jobs by tests manually advancing the "timeline" job at a time, or alternatively jobs can be configured with a time duration in which case they get completed asynchronously from the unit test code. Core scheduler classes are subclassed to support this mock implementation. The tests added are just a few simple submission patterns. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Philipp Stanner <phasta@kernel.org> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250324092633.49746-3-tvrtko.ursulin@igalia.com
2025-03-14drm/sched: Clarify docu concerning drm_sched_job_arm()Philipp Stanner
The documentation for drm_sched_job_arm() and especially drm_sched_job_cleanup() does not make it very clear why drm_sched_job_arm() is a point of no return, which it indeed is. Make the nature of drm_sched_job_arm() in the docu as clear as possible. Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250313093053.65001-2-phasta@kernel.org
2025-03-13drm/sched: Fix fence reference count leakqianyi liu
The last_scheduled fence leaks when an entity is being killed and adding the cleanup callback fails. Decrement the reference count of prev when dma_fence_add_callback() fails, ensuring proper balance. Cc: stable@vger.kernel.org # v6.2+ [phasta: add git tag info for stable kernel] Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Signed-off-by: qianyi liu <liuqianyi125@gmail.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250311060251.4041101-1-liuqianyi125@gmail.com
2025-03-12drm/sched: revert "drm_sched_job_cleanup(): correct false doc"Christian König
This reverts commit 44d2f310f008613c1dbe5e234c2cf2be90cbbfab. The function drm_sched_job_arm() is indeed the point of no return. The background is that it is nearly impossible for the driver to correctly retract the fence and signal it in the order enforced by the dma_fence framework. The code in drm_sched_job_cleanup() is for the purpose to cleanup after the job was armed through drm_sched_job_arm() *and* processed by the scheduler. We can certainly improve the documentation, but removing the warning is clearly not a good idea. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250312134400.2176393-1-christian.koenig@amd.com
2025-03-12Backmerge tag 'v6.14-rc6' into drm-nextDave Airlie
This is a backmerge from Linux 6.14-rc6, needed for the nova PR. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-03-06drm/sched: Document run_job() refcount hazardPhilipp Stanner
drm_sched_backend_ops.run_job() returns a dma_fence for the scheduler. That fence is signalled by the driver once the hardware completed the associated job. The scheduler does not increment the reference count on that fence, but implicitly expects to inherit this fence from run_job(). This is relatively subtle and prone to misunderstandings. This implies that, to keep a reference for itself, a driver needs to call dma_fence_get() in addition to dma_fence_init() in that callback. It's further complicated by the fact that the scheduler even decrements the refcount in drm_sched_run_job_work() since it created a new reference in drm_sched_fence_scheduled(). It does, however, still use its pointer to the fence after calling dma_fence_put() - which is safe because of the aforementioned new reference, but actually still violates the refcounting rules. Move the call to dma_fence_put() to the position behind the last usage of the fence. Suggested-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250305130551.136682-4-phasta@kernel.org
2025-03-05drm/sched: drm_sched_job_cleanup(): correct false docPhilipp Stanner
drm_sched_job_cleanup()'s documentation claims that calling drm_sched_job_arm() is a "point of no return", implying that afterwards a job cannot be cancelled anymore. This is not correct, as proven by the function's code itself, which takes a previous call to drm_sched_job_arm() into account. In truth, the decisive factors are whether fences have been shared (e.g., with other processes) and if the job has been submitted to an entity already. Correct the wrong docstring. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250304141346.102683-2-phasta@kernel.org
2025-03-04drm/sched: stop passing non struct drm_device to drm_err() and friendsJani Nikula
The expectation is that the struct drm_device based logging helpers get passed an actual struct drm_device pointer rather than some random struct pointer where you can dereference the ->dev member. Convert drm_err(sched, ...) to dev_err(sched->dev, ...) and similar. This matches current usage, as struct drm_device is not available, but drops "[drm]" or "[drm] *ERROR*" prefix from logging. Unfortunately, there's no dev_WARN_ON(), so the conversion is not exactly the same. Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fe441dd1469d2b03e6b2ff247078bdde2011c6e3.1737644530.git.jani.nikula@intel.com
2025-03-03drm/sched: Fix preprocessor guardPhilipp Stanner
When writing the header guard for gpu_scheduler_trace.h, a typo, apparently, occurred. Fix the typo and document the scope of the guard. Fixes: 353da3c520b4 ("drm/amdgpu: add tracepoint for scheduler (v2)") Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250218124149.118002-2-phasta@kernel.org
2025-02-24drm/sched: Move internal prototypes to internal headerTvrtko Ursulin
Now that we have a header file for internal scheduler interfaces we can move some more prototypes into it. By doing that we eliminate the chance of drivers trying to use something which was not intended to be used. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-6-tvrtko.ursulin@igalia.com
2025-02-24drm/sched: Move drm_sched_entity_is_ready to internal headerTvrtko Ursulin
Helper is for scheduler internal use so lets hide it from DRM drivers completely. At the same time we change the method of checking whethere there is anything in the queue from peeking to looking at the node count. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-5-tvrtko.ursulin@igalia.com
2025-02-24drm/sched: Add internal job peek/pop APITvrtko Ursulin
Idea is to add helpers for peeking and popping jobs from entities with the goal of decoupling the hidden assumption in the code that queue_node is the first element in struct drm_sched_job. That assumption usually comes in the form of: while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) Which breaks if the queue_node is re-positioned due to_drm_sched_job being implemented with a container_of. This also allows us to remove duplicate definitions of to_drm_sched_job. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-2-tvrtko.ursulin@igalia.com
2025-02-12drm/sched: Use struct for drm_sched_init() paramsPhilipp Stanner
drm_sched_init() has a great many parameters and upcoming new functionality for the scheduler might add even more. Generally, the great number of parameters reduces readability and has already caused one missnaming, addressed in: commit 6f1cacf4eba7 ("drm/nouveau: Improve variable name in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org
2025-01-20drm/sched: Add helper to check job dependenciesTvrtko Ursulin
Lets isolate scheduler internals from drivers such as pvr which currently walks the dependency array to look for fences. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250113103341.43914-1-tvrtko.ursulin@igalia.com
2025-01-16drm/sched: Remove weak paused submission checksTvrtko Ursulin
There is no need to check the boolean in the work item's prologues since the boolean can be set at any later time anyway. The helper which pauses submission sets it and synchronously cancels the work and helpers which queue the work check for the flag so all should be good. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250114105942.64832-1-tvrtko.ursulin@igalia.com
2025-01-13drm/sched: Delete unused update_job_creditsTvrtko Ursulin
No driver is using the update_job_credits() schduler vfunc so lets remove it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250110111301.76909-1-tvrtko.ursulin@igalia.com
2024-12-19drm/sched: Fix drm_sched_fini() docu generationBagas Sanjaya
Commit baf4afc5831438 ("drm/sched: Improve teardown documentation") added a list of drm_sched_fini()'s problems. The list triggers htmldocs warning (but renders correctly in htmldocs output): Documentation/gpu/drm-mm:571: ./drivers/gpu/drm/scheduler/sched_main.c:1359: ERROR: Unexpected indentation. Separate the list from the preceding paragraph by a blank line to fix the warning. While at it, also end the aforementioned paragraph by a colon. Fixes: baf4afc58314 ("drm/sched: Improve teardown documentation") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20241108175655.6d3fcfb7@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> [phasta: Adjust commit message] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241217034915.62594-1-bagasdotme@gmail.com
2024-12-19drm/sched: Fix drm_sched_fini() docu generationBagas Sanjaya
Commit baf4afc5831438 ("drm/sched: Improve teardown documentation") documents problems of drm_sched_fini() in form of a list. The checklist triggers htmldocs warning (but renders correctly in htmldocs output): Documentation/gpu/drm-mm:571: ./drivers/gpu/drm/scheduler/sched_main.c:1359: ERROR: Unexpected indentation. Separate the list from the preceding paragraph by a blank line to fix the warning. While at it, also end the aforementioned paragraph by a colon. Fixes: baf4afc58314 ("drm/sched: Improve teardown documentation") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20241108175655.6d3fcfb7@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> [phasta: Adjust commit message] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241217034915.62594-1-bagasdotme@gmail.com
2024-11-07drm/sched: Improve teardown documentationPhilipp Stanner
If jobs are still enqueued in struct drm_gpu_scheduler.pending_list when drm_sched_fini() gets called, those jobs will be leaked since that function stops both job-submission and (automatic) job-cleanup. It is, thus, up to the driver to take care of preventing leaks. The related function drm_sched_wqueue_stop() also prevents automatic job cleanup. Those pitfals are not reflected in the documentation, currently. Explicitly inform about the leak problem in the docstring of drm_sched_fini(). Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and hint at the consequences for automatic cleanup. Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105143137.71893-2-pstanner@redhat.com
2024-11-04Backmerge v6.12-rc6 of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Backmerge Linus tree for some drm-fixes needed for msm and xe merges. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-10-31drm/sched: Document purpose of drm_sched_{start,stop}Philipp Stanner
drm_sched_start()'s and drm_sched_stop()'s names suggest that those functions might be intended for actively starting and stopping the scheduler on initialization and teardown. They are, however, only used on timeout handling (reset recovery). The docstrings should reflect that to prevent confusion. Document those functions' purpose. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029133819.78696-2-pstanner@redhat.com
2024-10-28drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIMMatthew Brost
drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler work queue with WQ_MEM_RECLAIM to ensure forward progress during reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward progress during reclaim. v2: - Fixes tags (Philipp) - Reword commit message (Philipp) Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Philipp Stanner <pstanner@redhat.com> Cc: stable@vger.kernel.org Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq") Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023235917.1836428-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-10-25drm/sched: warn about drm_sched_job_init()'s partial initPhilipp Stanner
drm_sched_job_init()'s name suggests that after the function succeeded, parameter "job" will be fully initialized. This is not the case; some members are only later set, notably drm_sched_job.sched by drm_sched_job_arm(). Document that drm_sched_job_init() does not set all struct members. Document the lifetime of drm_sched_job.sched. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023141530.113370-2-pstanner@redhat.com
2024-10-22drm/sched: memset() 'job' in drm_sched_job_init()Philipp Stanner
drm_sched_job_init() has no control over how users allocate struct drm_sched_job. Unfortunately, the function can also not set some struct members such as job->sched. This could theoretically lead to UB by users dereferencing the struct's pointer members too early. It is easier to debug such issues if these pointers are initialized to NULL, so dereferencing them causes a NULL pointer exception. Accordingly, drm_sched_entity_init() does precisely that and initializes its struct with memset(). Initialize parameter "job" to 0 in drm_sched_job_init(). Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241021105028.19794-2-pstanner@redhat.com Reviewed-by: Christian König <christian.koenig@amd.com>
2024-10-17drm/sched: Further optimise drm_sched_entity_push_jobTvrtko Ursulin
Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-6-tursulin@igalia.com
2024-10-17drm/sched: Re-group and rename the entity run-queue lockTvrtko Ursulin
When writing to a drm_sched_entity's run-queue, writers are protected through the lock drm_sched_entity.rq_lock. This naming, however, frequently collides with the separate internal lock of struct drm_sched_rq, resulting in uses like this: spin_lock(&entity->rq_lock); spin_lock(&entity->rq->lock); Rename drm_sched_entity.rq_lock to improve readability. While at it, re-order that struct's members to make it more obvious what the lock protects. v2: * Rename some rq_lock straddlers in kerneldoc, improve commit text. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> [pstanner: Fix typo in docstring] Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-5-tursulin@igalia.com
2024-10-17drm/sched: Stop setting current entity in FIFO modeTvrtko Ursulin
It does not seem there is a need to set the current entity in FIFO mode since ot only serves as being a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-3-tursulin@igalia.com
2024-10-17drm/sched: Optimise drm_sched_entity_push_jobTvrtko Ursulin
In FIFO mode (which is the default), both drm_sched_entity_push_job() and drm_sched_rq_update_fifo(), where the latter calls the former, are currently taking and releasing the same entity->rq_lock. We can avoid that design inelegance, and also have a miniscule efficiency improvement on the submit from idle path, by introducing a new drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking to its callers. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) v3: * Improved commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-2-tursulin@igalia.com
2024-10-09Merge tag 'drm-misc-next-2024-09-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: UAPI Changes: - panthor: Add realtime group priority and priority query. Cross-subsystem Changes: - Add Vivek Kasireddy as udmabuf maintainer. - Assorted udmabuf changes. - Device tree binding updates. - dmabuf documentation fixes. - Move drm_rect to drm core module from kms helper. Core Changes: - Update scheduler documentation and concurrency fixes. - drm/ci updates. - Add memory-agnostic fbdev client and client-agnostic setup helper. - Huge driver conversion for using the above. Driver Changes: - Assorted fixes to imx, panel/nt35510, sti, accel/ivpu, v3d, vkms, host1x. - Add panel quirks for AYA NEO panels. - Make module autoloading work for bridge/it6505 and mcde. - Add huge page support to v3d using a custom shmfs. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a9b95e6f-9f35-464e-83f6-bda75b35ee0b@linux.intel.com
2024-10-09Merge tag 'drm-misc-next-2024-09-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: - Add panthor/DEV_QUERY_TIMESTAMP_INFO query. Cross-subsystem Changes: - Updated dt bindings. - Add documentation explaining default errnos for fences. - Mark dma-buf heaps creation functions as __init. Core Changes: - Split DSC helpers from DP helpers. - Clang build fixes for drm/mm test. - Remove simple pipeline support for gem-vram, no longer any users left after converting bochs. - Add erno to drm_sched_start to distinguish between GPU and queue reset. - Add drm_framebuffer testcases. - Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n. - Use read_trylock instead of read_lock in dma_fence_begin_signalling to quiesce lockdep. Driver Changes: - Assorted small fixes and updates for tegra, host1x, imagination, nouveau, panfrost, panthor, panel/ili9341, mali, exynos, panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a, bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050, panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d. - Add bridge/TI TDP158. - Assorted documentation updates. - Convert bochs from simple drm to gem shmem, and check modes against available memory. - Many VC4 fixes, most related to scaling and YUV support. - Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS. - Rockchip 4k@60 support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com
2024-10-02drm/sched: Use drm sched lockdep map for submit_wqMatthew Brost
Avoid leaking a lockdep map on each drm sched creation and destruction by using a single lockdep map for all drm sched allocated submit_wq. v2: - Use alloc_ordered_workqueue_lockdep_map (Tejun) Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241002131639.3425022-2-matthew.brost@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-10-01Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixesMaarten Lankhorst
Required for a panthor fix that broke when FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-10-01Merge tag 'drm-misc-fixes-2024-09-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size panthor: - Fix locking sched: - Update maintainers - Fix race condition whne queueing up jobs sysfb: - Disable sysfb if framebuffer parent device is unknown vbox: - Fix VLA handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240926121045.GA561653@localhost.localdomain
2024-09-30drm/sched: revert "Always increment correct scheduler score"Christian König
This reverts commit 087913e0ba2b3b9d7ccbafb2acf5dab9e35ae1d5. It turned out that the original code was correct since the rq can only change when there is no armed job for an entity. This change here broke the logic since we only incremented the counter for the first job, so revert it. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930131451.536150-1-christian.koenig@amd.com
2024-09-26drm/sched: Always increment correct scheduler scoreTvrtko Ursulin
Entities run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.9+ Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924101914.2713-4-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-26drm/sched: Always wake up correct scheduler in drm_sched_entity_push_jobTvrtko Ursulin
Since drm_sched_entity_modify_sched() can modify the entities run queue, lets make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Alternative of moving the spin_unlock to after the wake up would for now be more problematic since the same lock is taken inside drm_sched_rq_update_fifo(). v2: * Improve commit message. (Philipp) * Cache the scheduler pointer directly. (Christian) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Philipp Stanner <pstanner@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924101914.2713-3-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-26drm/sched: Add locking to drm_sched_entity_modify_schedTvrtko Ursulin
Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accesing potentially inconsitent entity->sched_list and entity->num_sched_list pair. v2: * Improve commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: Philipp Stanner <pstanner@redhat.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240913160559.49054-2-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-24drm/sched: Add locking to drm_sched_entity_modify_schedTvrtko Ursulin
Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accesing potentially inconsitent entity->sched_list and entity->num_sched_list pair. v2: * Improve commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: Philipp Stanner <pstanner@redhat.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924101914.2713-2-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-24drm/scheduler: Improve documentationShuicheng Lin
Function drm_sched_entity_push_job() doesn't have a return value, remove the return value description for it. Correct several other typo errors. v2 (Philipp): - more correction with related comments. Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240917144732.2758572-1-shuicheng.lin@intel.com
2024-09-24drm/sched: Fix dynamic job-flow control raceRob Clark
Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609 The whole premise of lockless access to a single-producer-single- consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq). This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition. Suggested-by: Asahi Lina <lina@asahilina.net> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control") Closes: https://github.com/AsahiLinux/linux/issues/309 Cc: stable@vger.kernel.org Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Janne Grunau <j@jannau.net> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240913202301.16772-1-robdclark@gmail.com
2024-09-06drm/sched: add optional errno to drm_sched_start()Christian König
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
2024-07-25drm/scheduler: remove full_recover from drm_sched_startChristian König
This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com