summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/gem
AgeCommit message (Collapse)Author
2022-07-29Merge branches 'arm/exynos', 'arm/mediatek', 'arm/msm', 'arm/smmu', ↵Joerg Roedel
'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next
2022-07-28drm/i915/gt: Batch TLB invalidationsChris Wilson
Invalidate TLB in batches, in order to reduce performance regressions. Currently, every caller performs a full barrier around a TLB invalidation, ignoring all other invalidations that may have already removed their PTEs from the cache. As this is a synchronous operation and can be quite slow, we cause multiple threads to contend on the TLB invalidate mutex blocking userspace. We only need to invalidate the TLB once after replacing our PTE to ensure that there is no possible continued access to the physical address before releasing our pages. By tracking a seqno for each full TLB invalidate we can quickly determine if one has been performed since rewriting the PTE, and only if necessary trigger one for ourselves. That helps to reduce the performance regression introduced by TLB invalidate logic. [mchehab: rebased to not require moving the code to a separate file] Cc: stable@vger.kernel.org Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org
2022-07-28drm/i915/gt: Ignore TLB invalidations on idle enginesChris Wilson
Check if the device is powered down prior to any engine activity, as, on such cases, all the TLBs were already invalidated, so an explicit TLB invalidation is not needed, thus reducing the performance regression impact due to it. This becomes more significant with GuC, as it can only do so when the connection to the GuC is awake. Cc: stable@vger.kernel.org Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/278a57a672edac75683f0818b292e95da583a5fe.1658924372.git.mchehab@kernel.org
2022-07-28drm/i915: Suppress oom warning for shmemfs object allocation failureChris Wilson
We report object allocation failures to userspace with ENOMEM, yet we still show the memory warning after failing to shrink device allocated pages. While this warning is similar to other system page allocation failures, it is superfluous to the ENOMEM provided directly to userspace. v2: Add NOWARN in few more places from where we might return ENOMEM to userspace. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4936 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220727174023.16766-1-nirmoy.das@intel.com
2022-07-22Merge tag 'drm-intel-gt-next-2022-07-13' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver uAPI changes: - All related to the Small BAR support: (and all by Matt Auld) * add probed_cpu_visible_size * expose the avail memory region tracking * apply ALLOC_GPU only by default * add NEEDS_CPU_ACCESS hint * tweak error capture on recoverable contexts Driver highlights: - Add Small BAR support (Matt) - Add MeteorLake support (RK) - Add support for LMEM PCIe resizable BAR (Akeem) Driver important fixes: - ttm related fixes (Matt Auld) - Fix a performance regression related to waitboost (Chris) - Fix GT resets (Chris) Driver others: - Adding GuC SLPC selftest (Vinay) - Fix ADL-N GuC load (Daniele) - Add platform workaround (Gustavo, Matt Roper) - DG2 and ATS-M device ID updates (Matt Roper) - Add VM_BIND doc rfc with uAPI documentation (Niranjana) - Fix user-after-free in vma destruction (Thomas) - Async flush of GuC log regions (Alan) - Fixes in selftests (Chris, Dan, Andrzej) - Convert to drm_dbg (Umesh) - Disable OA sseu config param for newer hardware (Umesh) - Multi-cast register steering changes (Matt Roper) - Add lmem_bar_size modparam (Priyanka) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Ys85pcMYLkqF/HtB@intel.com
2022-07-17drm/i915/ttm: fix 32b buildMatthew Auld
Since segment_pages is no longer a compile time constant, it looks the DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest is just to use the ULL variant, but really we should need not need more than u32 for the page alignment (also we are limited by that due to the sg->length type), so also make it all u32. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: aff1e0b09b54 ("drm/i915/ttm: fix sg_table construction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com (cherry picked from commit 9306b2b2dfce6931241ef804783692cee526599c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-07-15drm/i915: Remove unnecessary includeLu Baolu
intel-iommu.h is not needed in drm/i915 anymore. Remove its include. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://lore.kernel.org/r/20220514014322.2927339-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-13drm/i915/ttm: fix 32b buildMatthew Auld
Since segment_pages is no longer a compile time constant, it looks the DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest is just to use the ULL variant, but really we should need not need more than u32 for the page alignment (also we are limited by that due to the sg->length type), so also make it all u32. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: bc99f1209f19 ("drm/i915/ttm: fix sg_table construction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
2022-07-13Merge drm/drm-next into drm-misc-nextMaxime Ripard
I need to have some vc4 patches merged in -rc4, but drm-misc-next is only at -rc2 for now. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2022-07-12drm/i915/gem: Look for waitboosting across the whole object prior to ↵Chris Wilson
individual waits We employ a "waitboost" heuristic to detect when userspace is stalled waiting for results from earlier execution. Under latency sensitive work mixed between the gpu/cpu, the GPU is typically under-utilised and so RPS sees that low utilisation as a reason to downclock the frequency, causing longer stalls and lower throughput. The user left waiting for the results is not impressed. On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") it was observed that deinterlacing h264 on Haswell performance dropped by 2-5x. The reason being that the natural workload was not intense enough to trigger RPS (using HW evaluation intervals) to upclock, and so it was depending on waitboosting for the throughput. Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") changes the composition of dma-resv from keeping a single write fence + multiple read fences, to a single array of multiple write and read fences (a maximum of one pair of write/read fences per context). The iteration order was also changed implicitly from all-read fences then the single write fence, to a mix of write fences followed by read fences. It is that ordering change that belied the fragility of waitboosting. Currently, a waitboost is inspected at the point of waiting on an outstanding fence. If the GPU is backlogged such that we haven't yet stated the request we need to wait on, we force the GPU to upclock until the completion of that request. By changing the order in which we waited upon requests, we ended up waiting on those requests in sequence and as such we saw that each request was already started and so not a suitable candidate for waitboosting. Instead of asking whether to boost each fence in turn, we can look at whether boosting is required for the dma-resv ensemble prior to waiting on any fence, making the heuristic more robust to the order in which fences are stored in the dma-resv. Reported-by: Thomas Voegtle <tv@lio96.de> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284 Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com (cherry picked from commit 394e2b57a989113de494c52d4683444bcb02d4e1) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-07-12drm/i915/ttm: fix sg_table constructionMatthew Auld
If we encounter some monster sized local-memory page that exceeds the maximum sg length (UINT32_MAX), ensure that don't end up with some misaligned address in the entry that follows, leading to fireworks later. Also ensure we have some coverage of this in the selftests. v2(Chris): - Use round_down consistently to avoid udiv errors v3(Nirmoy): - Also update the max_segment in the selftest Fixes: f701b16d4cc5 ("drm/i915/ttm: add i915_sg_from_buddy_resource") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220711085859.24198-1-matthew.auld@intel.com (cherry picked from commit bc99f1209f19fefa3ee11e77464ccfae541f4291) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-07-12drm/i915/gem: Look for waitboosting across the whole object prior to ↵Chris Wilson
individual waits We employ a "waitboost" heuristic to detect when userspace is stalled waiting for results from earlier execution. Under latency sensitive work mixed between the gpu/cpu, the GPU is typically under-utilised and so RPS sees that low utilisation as a reason to downclock the frequency, causing longer stalls and lower throughput. The user left waiting for the results is not impressed. On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") it was observed that deinterlacing h264 on Haswell performance dropped by 2-5x. The reason being that the natural workload was not intense enough to trigger RPS (using HW evaluation intervals) to upclock, and so it was depending on waitboosting for the throughput. Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") changes the composition of dma-resv from keeping a single write fence + multiple read fences, to a single array of multiple write and read fences (a maximum of one pair of write/read fences per context). The iteration order was also changed implicitly from all-read fences then the single write fence, to a mix of write fences followed by read fences. It is that ordering change that belied the fragility of waitboosting. Currently, a waitboost is inspected at the point of waiting on an outstanding fence. If the GPU is backlogged such that we haven't yet stated the request we need to wait on, we force the GPU to upclock until the completion of that request. By changing the order in which we waited upon requests, we ended up waiting on those requests in sequence and as such we saw that each request was already started and so not a suitable candidate for waitboosting. Instead of asking whether to boost each fence in turn, we can look at whether boosting is required for the dma-resv ensemble prior to waiting on any fence, making the heuristic more robust to the order in which fences are stored in the dma-resv. Reported-by: Thomas Voegtle <tv@lio96.de> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284 Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
2022-07-11drm/i915/ttm: fix sg_table constructionMatthew Auld
If we encounter some monster sized local-memory page that exceeds the maximum sg length (UINT32_MAX), ensure that don't end up with some misaligned address in the entry that follows, leading to fireworks later. Also ensure we have some coverage of this in the selftests. v2(Chris): - Use round_down consistently to avoid udiv errors v3(Nirmoy): - Also update the max_segment in the selftest Fixes: f701b16d4cc5 ("drm/i915/ttm: add i915_sg_from_buddy_resource") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6379 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@linux.intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220711085859.24198-1-matthew.auld@intel.com
2022-07-11drm/ttm: rename and cleanup ttm_bo_initChristian König
Rename ttm_bo_init to ttm_bo_init_validate since that better matches what the function is actually doing. Remove the unused size parameter, move the function's kerneldoc to the implementation and cleanup the whole error handling. Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220707102453.3633-2-christian.koenig@amd.com Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2022-07-07drm/i915/selftests: Grab the runtime pm in shrink_thpChris Wilson
Since we are not holding a wakeref, shrinking a bound object is not guaranteed. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6370 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220706154738.235204-1-matthew.auld@intel.com
2022-07-03mm: shrinkers: provide shrinkers with namesRoman Gushchin
Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-01drm/i915/ttm: disallow CPU fallback mode for ccs pagesMatthew Auld
Falling back to memcpy/memset shouldn't be allowed if we know we have CCS state to manage using the blitter. Otherwise we are potentially leaving the aux CCS state in an unknown state, which smells like an info leak. Fixes: 48760ffe923a ("drm/i915/gt: Clear compress metadata for Flat-ccs objects") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-12-matthew.auld@intel.com
2022-07-01drm/i915/ttm: handle blitter failure on DG2Matthew Auld
If the move or clear operation somehow fails, and the memory underneath is not cleared, like when moving to lmem, then we currently fallback to memcpy or memset. However with small-BAR systems this fallback might no longer be possible. For now we use the set_wedged sledgehammer if we ever encounter such a scenario, and mark the object as borked to plug any holes where access to the memory underneath can happen. Add some basic selftests to exercise this. v2: - In the selftests make sure we grab the runtime pm around the reset. Also make sure we grab the reset lock before checking if the device is wedged, since the wedge might still be in-progress and hence the bit might not be set yet. - Don't wedge or put the object into an unknown state, if the request construction fails (or similar). Just returning an error and skipping the fallback should be safe here. - Make sure we wedge each gt. (Thomas) - Peek at the unknown_state in io_reserve, that way we don't have to export or hand roll the fault_wait_for_idle. (Thomas) - Add the missing read-side barriers for the unknown_state. (Thomas) - Some kernel-doc fixes. (Thomas) v3: - Tweak the ordering of the set_wedged, also add FIXME. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-11-matthew.auld@intel.com
2022-07-01drm/i915/selftests: ensure we reserve a fence slotMatthew Auld
We should always be explicit and allocate a fence slot before adding a new fence. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-10-matthew.auld@intel.com
2022-07-01drm/i915/selftests: skip the mman tests for stolenMatthew Auld
It's not supported, and just skips later anyway. With small-BAR things get more complicated since all of stolen is likely not even CPU accessible, hence not passing I915_BO_ALLOC_GPU_ONLY just results in the object create failing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-9-matthew.auld@intel.com
2022-07-01drm/i915/uapi: tweak error capture on recoverable contextsMatthew Auld
A non-recoverable context must be used if the user wants proper error capture on discrete platforms. In the future the kernel may want to blit the contents of some objects when later doing the capture stage. Also extend to newer integrated platforms. v2(Thomas): - Also extend to newer integrated platforms, for capture buffer memory allocation purposes. v3 (Reported-by: kernel test robot <lkp@intel.com>): - Fix build on !CONFIG_DRM_I915_CAPTURE_ERROR Testcase: igt@gem_exec_capture@capture-recoverable Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-8-matthew.auld@intel.com
2022-07-01drm/i915/uapi: add NEEDS_CPU_ACCESS hintMatthew Auld
If set, force the allocation to be placed in the mappable portion of I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory (i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the object, that way we can always spill the object into system memory if we can't make space. Testcase: igt@gem-create@create-ext-cpu-access-sanity-check Testcase: igt@gem-create@create-ext-cpu-access-big Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-6-matthew.auld@intel.com
2022-07-01drm/i915/uapi: apply ALLOC_GPU_ONLY by defaultMatthew Auld
On small BAR configurations, when dealing with I915_MEMORY_CLASS_DEVICE allocations, we assume that by default, all userspace allocations should be placed in the non-CPU visible portion. Note that dumb buffers are not included here, since these are not "GPU accelerated" and likely need CPU access. We choose to just always set GPU_ONLY, and let the backend figure out if that should be ignored or not, for example on full BAR systems. In a later patch userspace will be able to provide a hint if CPU access to the buffer is needed. v2(Thomas) - Apply GPU_ONLY on all discrete devices, but only if the BO can be placed in LMEM. Down in the depths this should be turned into a noop, where required, and as an annotation it still make some sense. If we apply it regardless of the placements then we end up needing to check the placements during exec capture. Also it's slightly inconsistent since the NEEDS_CPU_ACCESS can only be applied on objects that can be placed in LMEM. The other annoyance would be gem_create_ext vs plain gem_create, if we were to always apply GPU_ONLY. Testcase: igt@gem-create@create-ext-cpu-access-sanity-check Testcase: igt@gem-create@create-ext-cpu-access-big Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629174350.384910-5-matthew.auld@intel.com
2022-07-01Merge tag 'drm-intel-gt-next-2022-06-29' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson) - Document memory residency and Flat-CCS capability of obj (Ramalingam C) - Disable GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK on Xe_HP+ (Matt Roper) Cross-subsystem Changes: - Rename intel-gtt symbols (Lucas De Marchi) Core Changes: Driver Changes: - Support programming the EU priority in the GuC descriptor (DG2) (Matthew Brost) - DG2 HuC loading support (Daniele Ceraolo Spurio) - Fix build error without CONFIG_PM (YueHaibing) - Enable THP on Icelake and beyond (Tvrtko Ursulin) - Only setup private tmpfs mount when needed and fix logging (Tvrtko Ursulin) - Make __guc_reset_context aware of guilty engines (Umesh Nerlige Ramappa) - DG2 small bar memory probing fixes (Nirmoy Das) - Remove unnecessary GuC err capture noise (Alan Previn) - Fix i915_gem_object_ggtt_pin_ww regression on old platforms (Maarten Lankhorst) - Fix undefined behavior in GuC backend due to shift overflowing the constant (Borislav Petkov) - New DG2 workarounds (Swathi Dhanavanthri, Anshuman Gupta) - Report no hwconfig support on ADL-N (Balasubramani Vivekanandan) - Fix error_state_read ptr + offset use (Alan Previn) - Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson) - Fix memory leaks in per-gt sysfs (Ashutosh Dixit) - Fix dma_resv fence handling in multi-batch execbuf (Nirmoy Das) - Add extra registers to GPU error dump on Gen11+ (Stuart Summers) - More PVC+DG2 workarounds (Matt Roper) - Improve user experience and driver robustness under SIGINT or similar (Tvrtko Ursulin) - Don't show engine classes not present (Tvrtko Ursulin) - Improve on suspend / resume time with VT-d enabled (Thomas Hellström) - Add missing else (katrinzhou) - Don't leak lmem mapping in vma_evict (Juha-Pekka Heikkila) - Add smem fallback allocation for dpt (Juha-Pekka Heikkila) - Tweak the ordering in cpu_write_needs_clflush (Matthew Auld) - Do not access rq->engine without a reference (Niranjana Vishwanathapura) - Revert "drm/i915: Hold reference to intel_context over life of i915_request" (Niranjana Vishwanathapura) - Don't update engine busyness stats too frequently (Alan Previn) - Add additional steps for Wa_22011802037 for execlist backend (Umesh Nerlige Ramappa) - Fix a lockdep warning at error capture (Nirmoy Das) - Ponte Vecchio prep work and new blitter engines (Matt Roper, John Harrison, Lucas De Marchi) - Read correct RP_STATE_CAP register (PVC) (Matt Roper) - Define MOCS table for PVC (Ayaz A Siddiqui) - Driver refactor and support Ponte Vecchio forcewake handling (Matt Roper) - Remove additional 3D flags from PIPE_CONTROL (Ponte Vecchio) (Stuart Summers) - XEHPSDV and PVC do not use HuC (Daniele Ceraolo Spurio) - Extract stepping information from PCI revid (Ponte Vecchio) (Matt Roper) - Add initial PVC workarounds (Stuart Summers) - SSEU handling driver refactor and Ponte Vecchio support (Matt Roper) - GuC depriv applies to PVC (Matt Roper) - Add register steering (Ponte Vecchio) (Matt Roper) - Add recommended MMIO setting (Ponte Vecchio) (Matt Roper) - Move multicast register handling to a dedicated file (Matt Roper) - Cleanup interface for MCR operations (Matt Roper) - Extend i915_vma_pin_iomap() (CQ Tang) - Re-do the intel-gtt split (Lucas De Marchi) - Correct duplicated/misplaced GT register definitions (Matt Roper) - Prefer "XEHP_" prefix for registers (Matt Roper) - Don't use DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config (Tvrtko Ursulin) - Don't use DRM_DEBUG_WARN_ON for ring unexpectedly not idle (Tvrtko Ursulin) - Make drop_pages() return bool (Lucas De Marchi) - Fix CFI violation with show_dynamic_id() (Nathan Chancellor) - Use i915_probe_error instead of drm_error in GuC code (Vinay Belgaumkar) - Fix use of static in macro mismatch (Andi Shyti) - Update tiled blits selftest (Bommu Krishnaiah) - Future-proof platform checks (Matt Roper) - Only include what's needed (Jani Nikula) - remove accidental static from a local variable (Jani Nikula) - Add global forcewake request to drpc (Vinay Belgaumkar) - Fix spelling typo in comment (pengfuyuan) - Increase timeout for live_parallel_switch selftest (Akeem G Abodunrin) - Use non-blocking H2G for waitboost (Vinay Belgaumkar) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/YrwtLM081SQUG1Dc@tursulin-desk
2022-06-27drm/i915: tweak the ordering in cpu_write_needs_clflushMatthew Auld
For imported dma-buf objects we leave the object as cache_coherent = 0 across all platforms, which is reasonable given that have no clue what the memory underneath is, and its not like the driver can ever manually clflush the pages anyway (like with i915_gem_clflush_object) for such objects. However on discrete we choose to treat cache_dirty = true as a programmer error, leading to a warning. The simplest fix looks to be to just change the ordering in cpu_write_needs_clflush to prevent ever setting cache_dirty for dma-buf objects on discrete. Fixes: d028a7690d87 ("drm/i915/dmabuf: Fix prime_mmap to work when using LMEM") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5266 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220622155919.355081-1-matthew.auld@intel.com (cherry picked from commit 563aaf4a928def2d36d1b3de0a4b515e2477b4da) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2022-06-27drm/i915/gem: add missing elsekatrinzhou
Add missing else in set_proto_ctx_param() to fix coverity issue. Addresses-Coverity: ("Unused value") Fixes: d4433c7600f7 ("drm/i915/gem: Use the proto-context to handle create parameters (v5)") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: katrinzhou <katrinzhou@tencent.com> [tursulin: fixup alignment] Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220621124926.615884-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit 7482a65664c16cc88eb84d2b545a1fed887378a1) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2022-06-27drm/i915: Prefer "XEHP_" prefix for registersMatt Roper
We've been introducing new registers with a mix of "XEHP_" (architecture) and "XEHPSDV_" (platform) prefixes. For consistency, let's settle on "XEHP_" as the preferred form. XEHPSDV_RP_STATE_CAP stays with its current name since that's truly a platform-specific register and not something that applies to the Xe_HP architecture as a whole. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Caz Yokoyama <caz@caztech.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220624210328.308630-2-matthew.d.roper@intel.com
2022-06-27drm/i915: Correct duplicated/misplaced GT register definitionsMatt Roper
XEHPSDV_FLAT_CCS_BASE_ADDR, GEN8_L3_LRA_1_GPGPU, and MMCD_MISC_CTRL were duplicated between i915_reg.h and intel_gt_regs.h. These are all GT registers, so we should drop the copy from i915_reg.h. XEHPSDV_TILE0_ADDR_RANGE was defined in i915_reg.h, but really belongs in intel_gt_regs.h. Move it. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220624210328.308630-1-matthew.d.roper@intel.com
2022-06-27drm/i915: tweak the ordering in cpu_write_needs_clflushMatthew Auld
For imported dma-buf objects we leave the object as cache_coherent = 0 across all platforms, which is reasonable given that have no clue what the memory underneath is, and its not like the driver can ever manually clflush the pages anyway (like with i915_gem_clflush_object) for such objects. However on discrete we choose to treat cache_dirty = true as a programmer error, leading to a warning. The simplest fix looks to be to just change the ordering in cpu_write_needs_clflush to prevent ever setting cache_dirty for dma-buf objects on discrete. Fixes: d028a7690d87 ("drm/i915/dmabuf: Fix prime_mmap to work when using LMEM") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5266 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220622155919.355081-1-matthew.auld@intel.com
2022-06-23drm/i915/selftests: Increase timeout for live_parallel_switchAkeem G Abodunrin
With GuC submission, it takes a little bit longer switching contexts among all available engines simultaneously, when running live_parallel_switch subtest. Increase the timeout. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5885 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220622141104.334432-1-matthew.auld@intel.com
2022-06-22drm/i915/gem: add missing elsekatrinzhou
Add missing else in set_proto_ctx_param() to fix coverity issue. Addresses-Coverity: ("Unused value") Fixes: d4433c7600f7 ("drm/i915/gem: Use the proto-context to handle create parameters (v5)") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: katrinzhou <katrinzhou@tencent.com> [tursulin: fixup alignment] Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220621124926.615884-1-tvrtko.ursulin@linux.intel.com
2022-06-22drm/i915: Fix spelling typo in commentpengfuyuan
Fix spelling typo in comment. Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: pengfuyuan <pengfuyuan@kylinos.cn> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/tencent_7B226C4A9BC2B5EEB37B70C188B5015D290A@qq.com
2022-06-17drm/i915/gt: Cleanup interface for MCR operationsMatt Roper
Let's replace the assortment of intel_gt_* and intel_uncore_* functions that operate on MCR registers with a cleaner set of interfaces: * intel_gt_mcr_read -- unicast read from specific instance * intel_gt_mcr_read_any[_fw] -- unicast read from any non-terminated instance * intel_gt_mcr_unicast_write -- unicast write to specific instance * intel_gt_mcr_multicast_write[_fw] -- multicast write to all instances We'll also replace the historic "slice" and "subslice" terminology with "group" and "instance" to match the documentation for more recent platforms; these days MCR steering applies to more types of replication than just slice/subslice. v2: - Reference the new kerneldoc from i915.rst. (Jani) - Tweak the wording of the documentation for a couple functions to clarify the difference between "_fw" and non-"_fw" forms. v3: - s/read/write/ to fix copy-paste mistake in a couple comments. (Harish) Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220615001019.1821989-3-matthew.d.roper@intel.com
2022-06-17drm/i915/gt: Move multicast register handling to a dedicated fileMatt Roper
Handling of multicast/replicated registers is spread across intel_gt.c and intel_uncore.c today. As multicast handling and the related steering logic gets more complicated with the addition of new platforms and new rules it makes sense to centralize it all in one place. For now the existing functions have been moved to the new .c/.h as-is. Function renames and updates to operate in a more consistent manner will be done in subsequent patches. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220615001019.1821989-2-matthew.d.roper@intel.com
2022-06-17drm/i915: Improve user experience and driver robustness under SIGINT or similarTvrtko Ursulin
We have long standing customer complaints that pressing Ctrl-C (or to the effect of) causes engine resets with otherwise well behaving programs. Not only is logging engine resets during normal operation not desirable since it creates support incidents, but more fundamentally we should avoid going the engine reset path when we can since any engine reset introduces a chance of harming an innocent context. Reason for this undesirable behaviour is that the driver currently does not distinguish between banned contexts and non-persistent contexts which have been closed. To fix this we add the distinction between the two reasons for revoking contexts, which then allows the strict timeout only be applied to banned, while innocent contexts (well behaving) can preempt cleanly and exit without triggering the engine reset path. Note that the added context exiting category applies both to closed non- persistent context, and any exiting context when hangcheck has been disabled by the user. At the same time we rename the backend operation from 'ban' to 'revoke' which more accurately describes the actual semantics. (There is no ban at the backend level since banning is a concept driven by the scheduling frontend. Backends are simply able to revoke a running context so that is the more appropriate name chosen.) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com
2022-06-13drm/i915: Individualize fences before adding to dma_resv objNirmoy Das
_i915_vma_move_to_active() can receive > 1 fences for multiple batch buffers submission. Because dma_resv_add_fence() can only accept one fence at a time, change _i915_vma_move_to_active() to be aware of multiple fences so that it can add individual fences to the dma resv object. v6: fix multi-line comment. v5: remove double fence reservation for batch VMAs. v4: Reserve fences for composite_fence on multi-batch contexts and also reserve fence slots to composite_fence for each VMAs. v3: dma_resv_reserve_fences is not cumulative so pass num_fences. v2: make sure to reserve enough fence slots before adding. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5614 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Cc: <stable@vger.kernel.org> # v5.16+ Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220525095955.15371-1-nirmoy.das@intel.com (cherry picked from commit 420a07b841d03f6a436d8c06571c69aa5c783897) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2022-06-02drm/i915/sseu: Disassociate internal subslice mask representation from uapiMatt Roper
As with EU masks, it's easier to store subslice/DSS masks internally in a format that's more natural for the driver to work with, and then only covert into the u8[] uapi form when the query ioctl is invoked. Since the hardware design changed significantly with Xe_HP, we'll use a union to choose between the old "hsw-style" subslice masks or the newer xehp mask. HSW-style masks will be stored in an array of u8's, indexed by slice (there's never more than 6 subslices per slice on older platforms). For Xe_HP and beyond where slices no longer exist, we only need a single bitmask. However we already know that this mask is eventually going to grow too large for a simple u64 to hold, so we'll represent it in a manner that can be operated on by the utilities in linux/bitmap.h. v2: - Fix typo: BIT(s) -> BIT(ss) in gen9_sseu_device_status() v3: - Eliminate sseu->ss_stride and just calculate the stride while specifically handling uapi. (Tvrtko) - Use BITMAP_BITS() macro to refer to size of masks rather than passing I915_MAX_SS_FUSE_BITS directly. (Tvrtko) - Report compute/geometry DSS masks separately when dumping Xe_HP SSEU info. (Tvrtko) - Restore dropped range checks to intel_sseu_has_subslice(). (Tvrtko) v4: - Make the bitmap size macro check the size of the .xehp field rather than the containing union. (Tvrtko) - Don't add GEM_BUG_ON() intel_sseu_has_subslice()'s check for whether slice or subslice ID exceed sseu->max_[sub]slices; various loops in the driver are expected to exceed these, so we should just silently return 'false.' v5: - Move XEHP_BITMAP_BITS() to the header so that we can also replace a usage of I915_MAX_SS_FUSE_BITS in one of the inline functions. (Bala) - Change the local variable in intel_slicemask_from_xehp_dssmask() from u16 to 'unsigned long' to make it a bit more future-proof. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220601150725.521468-6-matthew.d.roper@intel.com
2022-05-27drm/i915: Individualize fences before adding to dma_resv objNirmoy Das
_i915_vma_move_to_active() can receive > 1 fences for multiple batch buffers submission. Because dma_resv_add_fence() can only accept one fence at a time, change _i915_vma_move_to_active() to be aware of multiple fences so that it can add individual fences to the dma resv object. v6: fix multi-line comment. v5: remove double fence reservation for batch VMAs. v4: Reserve fences for composite_fence on multi-batch contexts and also reserve fence slots to composite_fence for each VMAs. v3: dma_resv_reserve_fences is not cumulative so pass num_fences. v2: make sure to reserve enough fence slots before adding. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5614 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Cc: <stable@vger.kernel.org> # v5.16+ Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220525095955.15371-1-nirmoy.das@intel.com
2022-05-25Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "Intel have enabled DG2 on certain SKUs for laptops, AMD has started some new GPU support, msm has user allocated VA controls dma-buf: - add dma_resv_replace_fences - add dma_resv_get_singleton - make dma_excl_fence private core: - EDID parser refactorings - switch drivers to drm_mode_copy/duplicate - DRM managed mutex initialization display-helper: - put HDMI, SCDC, HDCP, DSC and DP into new module gem: - rework fence handling ttm: - rework bulk move handling - add common debugfs for resource managers - convert to kvcalloc format helpers: - support monochrome formats - RGB888, RGB565 to XRGB8888 conversions fbdev: - cfb/sys_imageblit fixes - pagelist corruption fix - create offb platform device - deferred io improvements sysfb: - Kconfig rework - support for VESA mode selection bridge: - conversions to devm_drm_of_get_bridge - conversions to panel_bridge - analogix_dp - autosuspend support - it66121 - audio support - tc358767 - DSI to DPI support - icn6211 - PLL/I2C fixes, DT property - adv7611 - enable DRM_BRIDGE_OP_HPD - anx7625 - fill ELD if no monitor - dw_hdmi - add audio support - lontium LT9211 support, i.MXMP LDB - it6505: Kconfig fix, DPCD set power fix - adv7511 - CEC support for ADV7535 panel: - ltk035c5444t, B133UAN01, NV3052C panel support - DataImage FG040346DSSWBG04 support - st7735r - DT bindings fix - ssd130x - fixes i915: - DG2 laptop PCI-IDs ("motherboard down") - Initial RPL-P PCI IDs - compute engine ABI - DG2 Tile4 support - DG2 CCS clear color compression support - DG2 render/media compression formats support - ATS-M platform info - RPL-S PCI IDs added - Bump ADL-P DMC version to v2.16 - Support static DRRS - Support multiple eDP/LVDS native mode refresh rates - DP HDR support for HSW+ - Lots of display refactoring + fixes - GuC hwconfig support and query - sysfs support for multi-tile - fdinfo per-client gpu utilisation - add geometry subslices query - fix prime mmap with LMEM - fix vm open count and remove vma refcounts - contiguous allocation fixes - steered register write support - small PCI BAR enablement - GuC error capture support - sunset igpu legacy mmap support for newer devices - GuC version 70.1.1 support amdgpu: - Initial SoC21 support - SMU 13.x enablement - SMU 13.0.4 support - ttm_eu cleanups - USB-C, GPUVM updates - TMZ fixes for RV - RAS support for VCN - PM sysfs code cleanup - DC FP rework - extend CG/PG flags to 64-bit - SI dpm lockdep fix - runtime PM fixes amdkfd: - RAS/SVM fixes - TLB flush fixes - CRIU GWS support - ignore bogus MEC signals more efficiently msm: - Fourcc modifier for tiled but not compressed layouts - Support for userspace allocated IOVA (GPU virtual address) - DPU: DSC (Display Stream Compression) support - DP: eDP support - DP: conversion to use drm_bridge and drm_bridge_connector - Merge DPU1 and MDP5 MDSS driver - DPU: writeback support nouveau: - make some structures static - make some variables static - switch to drm_gem_plane_helper_prepare_fb radeon: - misc fixes/cleanups mxsfb: - rework crtc mode setting - LCDIF CRC support etnaviv: - fencing improvements - fix address space collisions - cleanup MMU reference handling gma500: - GEM/GTT improvements - connector handling fixes komeda: - switch to plane reset helper mediatek: - MIPI DSI improvements omapdrm: - GEM improvements qxl: - aarch64 support vc4: - add a CL submission tracepoint - HDMI YUV support - HDMI/clock improvements - drop is_hdmi caching virtio: - remove restriction of non-zero blob types vmwgfx: - support for cursormob and cursorbypass 4 - fence improvements tidss: - reset DISPC on startup solomon: - SPI support - DT improvements sun4i: - allwinner D1 support - drop is_hdmi caching imx: - use swap() instead of open-coding - use devm_platform_ioremap_resource - remove redunant initializations ast: - Displayport support rockchip: - Refactor IOMMU initialisation - make some structures static - replace drm_detect_hdmi_monitor with drm_display_info.is_hdmi - support swapped YUV formats, - clock improvements - rk3568 support - VOP2 support mediatek: - MT8186 support tegra: - debugabillity improvements" * tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm: (1740 commits) drm/i915/dsi: fix VBT send packet port selection for ICL+ drm/i915/uc: Fix undefined behavior due to shift overflowing the constant drm/i915/reg: fix undefined behavior due to shift overflowing the constant drm/i915/gt: Fix use of static in macro mismatch drm/i915/audio: fix audio code enable/disable pipe logging drm/i915: Fix CFI violation with show_dynamic_id() drm/i915: Fix 'mixing different enum types' warnings in intel_display_power.c drm/i915/gt: Fix build error without CONFIG_PM drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path drm/msm/dpu: add DRM_MODE_ROTATE_180 back to supported rotations drm/msm: don't free the IRQ if it was not requested drm/msm/dpu: limit writeback modes according to max_linewidth drm/amd: Don't reset dGPUs if the system is going to s2idle drm/amdgpu: Unmap legacy queue when MES is enabled drm: msm: fix possible memory leak in mdp5_crtc_cursor_set() drm/msm: Fix fb plane offset calculation drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init drm/msm/dsi: don't powerup at modeset time for parade-ps8640 drm/rockchip: Change register space names in vop2 dt-bindings: display: rockchip: make reg-names mandatory for VOP2 ...
2022-05-24Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...
2022-05-24drm/i915: Update tiled blits selftestBommu Krishnaiah
Update the selftest to include Tile 4 mode and switch to Tile 4 on platforms that supports Tile 4 but no Tile Y and vice versa. Also switch to XY_FAST_COPY_BLT on platforms that supports it. v4: update commit message to reflect the code changes properly. v3: add a function to find X-tile availability for a platform. v2: disable Tile X for iGPU in fastblit and fix checkpath --strict warnings. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5879 Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com> Co-developed-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220516082015.32020-1-nirmoy.das@intel.com
2022-05-23Merge tag 'drm-intel-next-2022-05-20' of ↵Tvrtko Ursulin
git://anongit.freedesktop.org/drm/drm-intel into drm-intel-gt-next drm/i915 drm-intel-next -> drm-intel-gt-next cross-merge sync Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> # Conflicts: # drivers/gpu/drm/i915/gt/intel_rps.c # drivers/gpu/drm/i915/i915_vma.c From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87y1ywbh5y.fsf@intel.com
2022-05-19drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomapMaarten Lankhorst
When removing short term pins, I've changed the the batch buffer pinning for relocation to use __i915_vma_pin, because i915_gem_object_ggtt_pin_ww was destroying the old vma. This caused regressions, because the functions are not identical. Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again on ggtt-only platforms, but only if the batch can be pinned without being moved. Fixes: b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.") Cc: Matthew Auld <matthew.auld@intel.com> Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806 Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com (cherry picked from commit 451374eef622fca6f00eeeda89aaccb45a30a149) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2022-05-18drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomapMaarten Lankhorst
When removing short term pins, I've changed the the batch buffer pinning for relocation to use __i915_vma_pin, because i915_gem_object_ggtt_pin_ww was destroying the old vma. This caused regressions, because the functions are not identical. Fix the regressions by calling i915_gem_object_ggtt_pin_ww() again on ggtt-only platforms, but only if the batch can be pinned without being moved. Fixes: b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.") Cc: Matthew Auld <matthew.auld@intel.com> Reported-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5806 Link: https://patchwork.freedesktop.org/patch/msgid/20220511115219.46507-1-maarten.lankhorst@linux.intel.com
2022-05-10drm/i915/gem: Make drop_pages() return boolLucas De Marchi
Commit e4e806253003 ("drm/i915: Change shrink ordering to use locking around unbinding.") changed the return type to int without changing the return values or their meaning to "0 is success". Move it back to boolean. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220503061556.513175-1-lucas.demarchi@intel.com
2022-05-09drm/i915: Only setup private tmpfs mount when needed and fix loggingTvrtko Ursulin
If i915 does not want to use huge pages there is a) no point in setting up the private mount and b) should former fail, it is misleading to log THP support is disabled in the caller, which does not even know if callee tried to enable it. Fix both by restructuring the flow in i915_gemfs_init and at the same time note the failure to set it up in all cases. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220429100414.647857-2-tvrtko.ursulin@linux.intel.com
2022-05-09drm/i915: Enable THP on Icelake and beyondTvrtko Ursulin
We have a statement from HW designers that the GPU read regression when using 2M pages was fixed from Icelake onwards, which was also confirmed by bencharking Eero did last year: """ When IOMMU is disabled, enabling THP causes following perf changes on TGL-H (GT1): 10-15% SynMark Batch[0-3] 5-10% MemBW GPU texture, SynMark ShMapVsm 3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4 1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7] + few others -7% MemBW GPU blend In the above 3D benchmark names, * means all the variants of tests with the same prefix. For example "SynMark TexMem*", means both TexMem128 & TexMem512 tests in the synthetic (Intel internal) SynMark test suite. In the (public, but proprietary) GfxBench & GLB(enchmark) test suites, there are both onscreen and offscreen variants of each test. Unless explicitly stated otherwise, numbers are for both variants. All tests are run with FullHD monitor. All tests are fullscreen except for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest triangle is run both in fullscreen and 1/2 screen window). """ Since the only regression is MemBW GPU blend, against many more gains, it sounds it is time to enable THP on Gen11+. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> References: https://gitlab.freedesktop.org/drm/intel/-/issues/430 Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220429100414.647857-1-tvrtko.ursulin@linux.intel.com
2022-05-08i915: Call aops write_begin() and write_end() directlyMatthew Wilcox (Oracle)
pagecache_write_begin() and pagecache_write_end() are now trivial wrappers, so call the aops directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-28Merge tag 'drm-intel-gt-next-2022-04-27' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - GuC hwconfig support and query (John Harrison, Rodrigo Vivi, Tvrtko Ursulin) - Sysfs support for multi-tile devices (Andi Shyti, Sujaritha Sundaresan) - Per client GPU utilisation via fdinfo (Tvrtko Ursulin, Ashutosh Dixit) - Add DRM_I915_QUERY_GEOMETRY_SUBSLICES (Matt Atwood) Cross-subsystem Changes: - Add GSC as a MEI auxiliary device (Tomas Winkler, Alexander Usyskin) Core Changes: - Document fdinfo format specification (Tvrtko Ursulin) Driver Changes: - Fix prime_mmap to work when using LMEM (Gwan-gyeong Mun) - Fix vm open count and remove vma refcount (Thomas Hellström) - Fixup setting screen_size (Matthew Auld) - Opportunistically apply ALLOC_CONTIGIOUS (Matthew Auld) - Limit where we apply TTM_PL_FLAG_CONTIGUOUS (Matthew Auld) - Drop aux table invalidation on FlatCCS platforms (Matt Roper) - Add missing boundary check in vm_access (Mastan Katragadda) - Update topology dumps for Xe_HP (Matt Roper) - Add support for steered register writes (Matt Roper) - Add steering info to GuC register save/restore list (Daniele Ceraolo Spurio) - Small PCI BAR enabling (Matthew Auld, Akeem G Abodunrin, CQ Tang) - Add preemption changes for Wa_14015141709 (Akeem G Abodunrin) - Add logical mapping for video decode engines (Matthew Brost) - Don't evict unmappable VMAs when pinning with PIN_MAPPABLE (v2) (Vivek Kasireddy) - GuC error capture support (Alan Previn, Daniele Ceraolo Spurio) - avoid concurrent writes to aux_inv (Fei Yang) - Add Wa_22014226127 (José Roberto de Souza) - Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL (Matt Roper) - Evict and restore of compressed objects (Ramalingam C) - Update to GuC version 70.1.1 (John Harrison) - Add Wa_22011802037 force cs halt (Tilak Tangudu) - Enable Wa_22011802037 for gen12 GuC based platforms (Umesh Nerlige Ramappa) - GuC based workarounds for DG2 (Vinay Belgaumkar, John Harrison, Matthew Brost, José Roberto de Souza) - consider min_page_size when migrating (Matthew Auld) - Prep work for next GuC firmware release (John Harrison) - Support platforms with CCS engines but no RCS (Matt Roper, Stuart Summers) - Don't overallocate subslice storage (Matt Roper) - Reduce stack usage in debugfs due to SSEU (John Harrison) - Report steering details in debugfs (Matt Roper) - Refactor some x86-ism out to prepare for non-x86 builds (Michael Cheng) - add lmem_size modparam (CQ Tang) - Refactor for non-x86 driver builds (Casey Bowman) - Centralize computation of freq caps (Ashutosh Dixit) - Update dma_buf_ops.unmap_dma_buf callback to use drm_gem_unmap_dma_buf() (Gwan-gyeong Mun) - Limit the async bind to bind_async_flags (Matthew Auld) - Stop checking for NULL vma->obj (Matthew Auld) - Reduce overzealous alignment constraints for GGTT (Matthew Auld) - Remove GEN12_SFC_DONE_MAX from register defs header (Matt Roper) - Fix renamed struct field (Lucas De Marchi) - Do not return '0' if there is nothing to return (Andi Shyti) - fix i915_reg_t initialization (Jani Nikula) - move the migration sanity check (Matthew Auld) - handle more rounding in selftests (Matthew Auld) - Perf and i915 query kerneldoc updates (Matt Roper) - Use i915_probe_error instead of drm_err (Vinay Belgaumkar) - sanity check object size in the buddy allocator (Matthew Auld) - fixup selftests min_alignment usage (Matthew Auld) - tweak selftests misaligned_case (Matthew Auld) Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/i915_vma.c From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Ymkfy8FjsG2JrodK@tursulin-mobl2
2022-04-21Merge drm/drm-next into drm-intel-gt-nextRodrigo Vivi
In order to get the GSC Support merged on drm-intel-gt-next in a clean fashion we needed this ATS-M patch to avoid conflict in i915_pci.c: commit 412c942bdfae ("drm/i915/ats-m: add ATS-M platform info") -- Fixing a silent conflict on drivers/gpu/drm/i915/gt/intel_gt_gmch.c: - if (!intel_vtd_active(i915)) + if (!i915_vtd_active(i915)) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>