Age | Commit message (Collapse) | Author |
|
Update a bunch more debug prints to use the new GT based scheme.
v2: Upgrade the no node found message to a warning on the grounds of
it being quite important if the error capture can't find any register
state information.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230207050717.1833718-4-John.C.Harrison@Intel.com
|
|
Update a bunch more debug prints to use the new GT based scheme.
v2: Also change prints to use %pe for error values (MichalW).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230207050717.1833718-3-John.C.Harrison@Intel.com
|
|
Update a bunch more debug prints to use the new GT based scheme.
v2: Also change prints to use %pe for error values (MichalW).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230207050717.1833718-2-John.C.Harrison@Intel.com
|
|
INF_UNIT_LEVEL_CLKGATE is not replicated, but since it's not actually
used it can just be removed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230206165410.3056073-2-lucas.demarchi@intel.com
|
|
Register 0x9424 is not replicated on any platform, so it shouldn't be
declared with REG_MCR(). Declaring it with _MMIO() is basically
duplicate of the GEN7 version, so just remove the GEN8 and change all
the callers to use the right functions.
Old versions of the gen8 bspec page used to contain a table with MCR
registers, apparently implying 0x9400 - 0x94ff registers were
replicated. However that table went away and there is no information
related to the ranges for gen8 anymore. Moreover the current behavior of
the driver wouldn't do anything special for 0x9424 since there is no
equivalent table in intel_gt_mcr.c: the driver would just fallback to
intel_uncore_{read,write}(). Therefore, do not care about the possible
special case for gen8 and just use the register as non-MCR for all the
platforms.
One place doing read + write is also converted to intel_uncore_rmw().
v2: Reword commit message adding the justification wrt gen8
Fixes: a9e69428b1b4 ("drm/i915: Define MCR registers explicitly")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230206165410.3056073-1-lucas.demarchi@intel.com
|
|
The UNSLICE_UNIT_LEVEL_CLKGATE register programmed by this workaround
has 'BUS' style reset, indicating that it does not lose its value on
engine resets. Furthermore, this register is part of the GT forcewake
domain rather than the RENDER domain, so it should not be impacted by
RCS engine resets. As such, we should implement this on the GT
workaround list rather than an engine list.
Bspec: 19219
Fixes: 3551ff928744 ("drm/i915/gen11: Moving WAs to rcs_engine_wa_init()")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230201222831.608281-2-matthew.d.roper@intel.com
|
|
XEHPC_LNCFMISCCFGREG0 and XEHPC_L3SCRUB are both in MCR register ranges
on PVC (with HALFBSLICE and L3BANK replication respectively), so they
should be explicitly declared as MCR registers and use MCR-aware
workaround handlers.
The workarounds/tuning settings should still be applied properly on PVC
even without the MCR annotation, but readback verification on
CONFIG_DRM_I915_DEBUG_GEM builds could potentitally give false positive
"workaround lost on load" warnings on parts fused such that a unicast
read targets a terminated register instance.
Fixes: a9e69428b1b4 ("drm/i915: Define MCR registers explicitly")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230201222831.608281-1-matthew.d.roper@intel.com
|
|
__le64 and friends should go through the cpu_to_* and *_to_cpu
accessors:
drivers/gpu/drm/i915/pxp/intel_pxp_huc.c:41:35: warning: incorrect type in assignment (different base types)
drivers/gpu/drm/i915/pxp/intel_pxp_huc.c:41:35: expected restricted __le64 [assigned] [usertype] huc_base_address
drivers/gpu/drm/i915/pxp/intel_pxp_huc.c:41:35: got unsigned long long [assigned] [usertype] huc_phys_addr
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230207124026.2105442-4-jani.nikula@intel.com
|
|
Annotate intel_gt_mcr_lock() and intel_gt_mcr_unlock() to fix sparse
warnings:
drivers/gpu/drm/i915/gt/intel_gt_mcr.c:397:9: warning: context imbalance in 'intel_gt_mcr_lock' - wrong count at exit
drivers/gpu/drm/i915/gt/intel_gt_mcr.c:412:6: warning: context imbalance in 'intel_gt_mcr_unlock' - unexpected unlock
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230207124026.2105442-1-jani.nikula@intel.com
|
|
During module load the punit might still be busy with its booting
routines. During this time we try to communicate with it but we
fail because we don't receive any feedback from it and we return
immediately with a -EINVAL fatal error.
At this point the driver load is "dramatically" aborted. The
following error message notifies us about it.
i915 0000:4d:00.0: drm_WARN_ON_ONCE(timeout_base_ms > 3)
It would be enough to wait a little in order to give the punit
the chance to come up bright and shiny, ready to interact with
the driver.
Wait up 10 seconds for the punit to settle and complete any
outstanding transactions upon module load. If it still fails try
again with a longer timeout, 180s, 3 minutes. If it still fails
then return -EPROBE_DEFER, in order to give the punit a second
chance.
Even if these timers might look long, we should consider that the
punit, depending on the platforms, might need long times to
complete its routines. Besides we want to try anything possible
to move forward before deciding to abort the driver's load.
The issue has been reported in:
https://gitlab.freedesktop.org/drm/intel/-/issues/7814
The changes in this patch are valid only and uniquely during
boot. The common transactions with the punit during the driver's
normal operation are not affected.
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Co-developed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230206183236.109908-1-andi.shyti@linux.intel.com
|
|
Like we did it for GuC, introduce some helper print macros for
HuC to have unified format of messages that also include GT#.
While around improve some messages and use %pe if possible.
v2: update GSC/PXP timeout message
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230203085912.1963-1-michal.wajdeczko@intel.com
|
|
Stephen Rothwell reported htmldocs warnings:
Documentation/gpu/i915:64: drivers/gpu/drm/i915/gt/intel_workarounds.c:32: WARNING: Inline emphasis start-string without end-string.
Documentation/gpu/i915:64: drivers/gpu/drm/i915/gt/intel_workarounds.c:57: WARNING: Inline emphasis start-string without end-string.
Documentation/gpu/i915:64: drivers/gpu/drm/i915/gt/intel_workarounds.c:66: WARNING: Inline emphasis start-string without end-string.
Escape wildcards in *_ctx_workarounds_init(), *_gt_workarounds_init(), and
*_whitelist_build() to fix above warnings.
Link: https://lore.kernel.org/linux-next/20230203134622.0b6315b9@canb.auug.org.au/
Fixes: 0c3064cf33fbfa ("drm/i915/doc: Document where to implement register workarounds")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230203100215.31852-2-bagasdotme@gmail.com
|
|
Obj flags for shmem objects is not being set correctly. Fixes in setting
BO_ALLOC_USER flag which applies to shmem objs as well.
v2: Add fixes tag (Tvrtko, Matt A)
Fixes: 13d29c823738 ("drm/i915/ehl: unconditionally flush the pages on acquire")
Cc: <stable@vger.kernel.org> # v5.15+
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[tursulin: Grouped all tags together.]
Link: https://patchwork.freedesktop.org/patch/msgid/20230203135205.4051149-1-aravind.iddamsetty@intel.com
|
|
Because eb_composite_fence_create() drops the fence_array reference
after creation of the sync_file, only the sync_file holds a ref to the
fence. But fd_install() makes that reference visable to userspace, so
it must be the last thing we do with the fence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
Cc: <stable@vger.kernel.org> # v5.15+
[tursulin: Added stable tag.]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230203164937.4035503-1-robdclark@gmail.com
|
|
DSM granularity is 1MB so make sure we stick to that.
The address set by firmware in GEN12_DSMBASE in driver initialization
doesn't mean "anything above that and until end of lmem is part of DSM".
In fact, there may be a few KB that is not part of DSM on the end of
lmem. How large is that space is platform-dependent, but since it's
always less than the DSM granularity, it can be simplified by simply
aligning the size down.
v2: replace "1 * SZ_1M" with SZ_1M (Andrzej).
v3: reword commit message to explain why the round down is needed
(Lucas)
Cc: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230202180243.23637-1-nirmoy.das@intel.com
|
|
Just recently we switched over to new GuC oriented log macros but in
the meantime yet another message was added that we missed to update.
While around improve that new message by adding engine name and use
existing helpers to check for context state.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230131214413.1879-1-michal.wajdeczko@intel.com
|
|
Use sysfs_emit() and sysfs_emit_at() in show() callback
as recommended by Documentation/filesystems/sysfs.rst
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230130131358.16800-1-nirmoy.das@intel.com
|
|
Check that we invalidate the TLB cache, the updated physical addresses
are immediately visible to the HW, and there is no retention of the old
physical address for concurrent HW access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[ahajda: adjust to upstream driver, v2+]
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[tursulin: Small indentation fix.]
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230130165058.1647414-1-andrzej.hajda@intel.com
|
|
Wa_22011802037 requires waiting for an engine-specific register to
clear. A missing entry for GSC engine in the register table is flagged
as a drm_err. The drm_err was originally intended to catch missing
register entries for newer engines, however, it was later found that the
WA is only required for 'legacy' engines. So just drop the drm_err.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230124231111.1786429-1-umesh.nerlige.ramappa@intel.com
|
|
Use new macros to have common prefix that also include GT#.
v2: pass gt to print_fw_ver
v3: prefer guc_dbg in suspend/resume logs
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-9-michal.wajdeczko@intel.com
|
|
Use new macros to have common prefix that also include GT#.
v2: improve few existing messages
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-8-michal.wajdeczko@intel.com
|
|
Use new macros to have common prefix that also include GT#.
v2: drop redundant GuC strings, minor improvements
v3: more message improvements
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-7-michal.wajdeczko@intel.com
|
|
Use new macros to have common prefix that also include GT#.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-6-michal.wajdeczko@intel.com
|
|
Use new macros to have common prefix that also include GT#.
v2: drop unused helpers
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-5-michal.wajdeczko@intel.com
|
|
Use new macros to have common prefix that also include GT#.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-4-michal.wajdeczko@intel.com
|
|
Use new macros to have common prefix that also include GT#.
v2: drop now redundant "GuC" word from the message
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-3-michal.wajdeczko@intel.com
|
|
While we do have GT oriented print macros, add few more GuC
specific to have common look and feel across all messages
related to the GuC and to avoid chasing the gt pointer.
We will use these macros shortly in upcoming patches.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230128195907.1837-2-michal.wajdeczko@intel.com
|
|
A userspace with multiple threads racing I915_GEM_SET_TILING to set the
tiling to I915_TILING_NONE could trigger a double free of the bit_17
bitmask. (Or conversely leak memory on the transition to tiled.) Move
allocation/free'ing of the bitmask within the section protected by the
obj lock.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fixes: 2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Cc: <stable@vger.kernel.org> # v5.5+
[tursulin: Correct fixes tag and added cc stable.]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127200550.3531984-1-robdclark@gmail.com
|
|
The GuC specific register state entry in the error capture object was
just called 'capture'. Although the companion 'node' entry was called
'guc_capture_node'. Rename the base entry to be 'guc_capture' instead
so that it is a) more consistent and b) more obvious what it is.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-9-John.C.Harrison@Intel.com
|
|
For understanding bug reports, it can be useful to have an explicit
dmesg print when a reset notification is received from GuC. As opposed
to simply inferring that this happened from other messages.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-8-John.C.Harrison@Intel.com
|
|
Engine resets are supposed to never fail. But in the case when one
does (due to unknown reasons that normally come down to a missing
w/a), it is useful to get as much information out of the system as
possible. Given that the GuC intentionally dies on such a situation,
it is not possible to get a guilty context notification back. So do a
manual search instead. Given that GuC is dead, this is safe because
GuC won't be changing the engine state asynchronously.
v2: Change comment to be less alarming (Tvrtko)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-7-John.C.Harrison@Intel.com
|
|
A hang situation has been observed where the only requests on the
context were either completed or not yet started according to the
breaadcrumbs. However, the register state claimed a batch was (maybe)
in progress. So, allow capture of the pending request on the grounds
that this might be better than nothing.
v2: Reword 'not started' warning message (Tvrtko)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-6-John.C.Harrison@Intel.com
|
|
There was a report of error captures occurring without any hung
context being indicated despite the capture being initiated by a 'hung
context notification' from GuC. The problem was not reproducible.
However, it is possible to happen if the context in question has no
active requests. For example, if the hang was in the context switch
itself then the breadcrumb write would have occurred and the KMD would
see an idle context.
In the interests of attempting to provide as much information as
possible about a hang, it seems wise to include the engine info
regardless of whether a request was found or not. As opposed to just
prentending there was no hang at all.
So update the error capture code to always record engine information
if a context is given. Which means updating record_context() to take a
context instead of a request (which it only ever used to find the
context anyway). And split the request agnostic parts of
intel_engine_coredump_add_request() out into a seaprate function.
v2: Remove a duplicate 'if' statement (Umesh) and fix a put of a null
pointer.
v3: Tidy up request locking code flow (Tvrtko)
v4: Pull in improved info message from next patch and fix up potential
leak of GuC register state (Daniele)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> (v2)
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-5-John.C.Harrison@Intel.com
|
|
The debugfs dump of requests was confused about what state requires
the execlist lock versus the GuC lock. There was also a bunch of
duplicated messy code between it and the error capture code.
So refactor the hung request search into a re-usable function. And
reduce the span of the execlist state lock to only the execlist
specific code paths. In order to do that, also move the report of hold
count (which is an execlist only concept) from the top level dump
function to the lower level execlist specific function. Also, move the
execlist specific code into the execlist source file.
v2: Rename some functions and move to more appropriate files (Daniele).
v3: Rename new execlist dump function (Daniele)
Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-4-John.C.Harrison@Intel.com
|
|
When GuC support was added to error capture, the reference counting
around the request object was broken. Fix it up.
The context based search manages the spinlocking around the search
internally. So it needs to grab the reference count internally as
well. The execlist only request based search relies on external
locking, so it needs an external reference count but within the
spinlock not outside it.
The only other caller of the context based search is the code for
dumping engine state to debugfs. That code wasn't previously getting
an explicit reference at all as it does everything while holding the
execlist specific spinlock. So, that needs updaing as well as that
spinlock doesn't help when using GuC submission. Rather than trying to
conditionally get/put depending on submission model, just change it to
always do the get/put.
v2: Explicitly document adding an extra blank line in some dense code
(Andy Shevchenko). Fix multiple potential null pointer derefs in case
of no request found (some spotted by Tvrtko, but there was more!).
Also fix a leaked request in case of !started and another in
__guc_reset_context now that intel_context_find_active_request is
actually reference counting the returned request.
v3: Add a _get suffix to intel_context_find_active_request now that it
grabs a reference (Daniele).
v4: Split the intel_guc_find_hung_context change to a separate patch
and rename intel_context_find_active_request_get to
intel_context_get_active_request (Tvrtko).
v5: s/locking/reference counting/ in commit message (Tvrtko)
Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com
|
|
intel_guc_find_hung_context() was not acquiring the correct spinlock
before searching the request list. So fix that up. While at it, add
some extra whitespace padding for readability.
Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-2-John.C.Harrison@Intel.com
|
|
Adding the vm to the vm_xa table makes it visible to userspace, which
could try to race with us to close the vm. So we need to take our extra
reference before putting it in the table.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Fixes: 9ec8795e7d91 ("drm/i915: Drop __rcu from gem_context->vm")
Cc: <stable@vger.kernel.org> # v5.16+
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230119173321.2825472-1-robdclark@gmail.com
|
|
GAMSTLB_CTRL and GAMCNTRL_CTRL became multicast/replicated registers on
Xe_HP. They should be defined accordingly and use MCR-aware operations.
These registers have only been used for some dg2/xehpsdv workarounds, so
this fix is mostly just for consistency/future-proofing; even lacking
the MCR annotation, workarounds will always be properly applied in a
multicast manner on these platforms.
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Fixes: 58bc2453ab8a ("drm/i915: Define multicast registers as a new type")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125234159.3015385-3-matthew.d.roper@intel.com
|
|
Workaround Wa_18018781329 has applied to several recent Xe_HP-based
platforms. However there are some extra gotchas to implementing this
properly for MTL that we need to take into account:
* Due to the separation of media and render/compute into separate GTs,
this workaround needs to be implemented on each GT, not just the
primary GT. Since each class of register only exists on one of the
two GTs, we should program the appropriate registers on each GT.
* As with past Xe_HP platforms, the registers on the primary GT (Xe_LPG
IP) are multicast/replicated registers and should be handled with the
MCR-aware functions. However the registers on the media GT (Xe_LPM+
IP) are regular singleton registers and should _not_ use MCR
handling. We need to create separate register definitions for the
Xe_HP multicast form and the Xe_LPM+ singleton form and use each in
the appropriate place.
* Starting with MTL, workarounds documented by the hardware teams are
technically associated with IP versions/steppings rather than
top-level platforms. That means we should take care to check the
media IP version rather than the graphics IP version when deciding
whether the workaround is needed on the Xe_LPM+ media GT (in this
case the workaround applies to both IPs and the stepping bounds are
identical, but we should still write the code appropriately to set a
proper precedent for future workaround implementations).
* It's worth noting that the GSC register and the CCS register are
defined with the same MMIO offset (0xCF30). Since the CCS is only
relevant to the primary GT and the GSC is only relevant to the media
GT there isn't actually a clash here (the media GT automatically adds
the additional 0x380000 GSI offset). However there's currently a
glitch in the bspec where the CCS register doesn't show up at all and
the GSC register is listed as existing on both GTs. That's a known
documentation problem for several registers with shared GSC/CCS
offsets; rest assured that the CCS register really does still exist.
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Fixes: 41bb543f5598 ("drm/i915/mtl: Add initial gt workarounds")
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125234159.3015385-2-matthew.d.roper@intel.com
|
|
Register reset characteristics (i.e., whether the register maintains or
loses its value on engine reset) is an important factor that determines
which wa_list we want to add workarounds to. We recently found out that
the bspec documentation for the Xe_HP's "GAM" registers in the 0xC800 -
0xCFFF range was misleading; these registers do not actually lose their
value on engine resets as the documentation implied. This means there's
no need to re-apply workarounds touching these registers after a reset,
and the corresponding workarounds should be moved from the 'engine'
lists back to the 'gt' list.
v2:
- Don't add Wa_18018781329 to xehpsdv; the original condition didn't
include that platform. (Gustavo)
- Move the MTL code to the GT function as-is for now; we'll take care
of the additional fixes needed in a follow-up patch.
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Fixes: edf176f48d87 ("drm/i915/dg2: Move misplaced 'ctx' & 'gt' wa's to engine wa list")
Fixes: b2006061ae28 ("drm/i915/xehpsdv: Move render/compute engine reset domains related workarounds")
Fixes: 41bb543f5598 ("drm/i915/mtl: Add initial gt workarounds")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125234159.3015385-1-matthew.d.roper@intel.com
|
|
We want to idle all tiles when exiting selftests.
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230125100003.18243-1-nirmoy.das@intel.com
|
|
Lets eradicate a silent conflict between drm-intel-next and
drm-intel-gt-next which added a duplicate function (try_firmware_load).
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:
Fixes/improvements/new stuff:
- Fix workarounds on Gen2-3 (Tvrtko Ursulin)
- Fix HuC delayed load memory leaks (Daniele Ceraolo Spurio)
- Fix a BUG caused by impendance mismatch in dma_fence_wait_timeout and GuC (Janusz Krzysztofik)
- Add DG2 workarounds Wa_18018764978 and Wa_18019271663 (Matt Atwood)
- Apply recommended L3 hashing mask tuning parameters (Gen12+) (Matt Roper)
- Improve suspend / resume times with VT-d scanout workaround active (Andi Shyti, Chris Wilson)
- Silence misleading "mailbox access failed" warning in snb_pcode_read (Ashutosh Dixit)
- Fix null pointer dereference on HSW perf/OA (Umesh Nerlige Ramappa)
- Avoid trampling the ring during buffer migration (and selftests) (Chris Wilson, Matthew Auld)
- Fix DG2 visual corruption on small BAR systems by not forgetting to copy CCS aux state (Matthew Auld)
- More fixing of DG2 visual corruption by not forgetting to copy CCS aux state of backup objects (Matthew Auld)
- Fix TLB invalidation for Gen12.50 video and compute engines (Andrzej Hajda)
- Limit Wa_22012654132 to just specific steppings (Matt Roper)
- Fix userspace crashes due eviction not working under lock contention after the object locking conversion (Matthew Auld)
- Avoid double free is user deploys a corrupt GuC firmware (John Harrison)
- Fix 32-bit builds by using "%zu" to format size_t (Nirmoy Das)
- Fix a possible BUG in TTM async unbind due not reserving enough fence slots (Nirmoy Das)
- Fix potential use after free by not exposing the GEM context id to userspace too early (Rob Clark)
- Show clamped PL1 limit to the user (hwmon) (Ashutosh Dixit)
- Workaround unreliable reset on Jasperlake (Chris Wilson)
- Cover rest of SVG unit MCR registers (Gustavo Sousa)
- Avoid PXP log spam on platforms which do not support the feature (Alan Previn)
- Re-disable RC6p on Sandy Bridge to avoid GPU hangs and visual glitches (Sasa Dragic)
Future platform enablement:
- Manage uncore->lock while waiting on MCR register (Matt Roper)
- Enable Idle Messaging for GSC CS (Vinay Belgaumkar)
- Only initialize GSC in tile 0 (José Roberto de Souza)
- Media GT and Render GT share common GGTT (Aravind Iddamsetty)
- Add dedicated MCR lock (Matt Roper)
- Implement recommended caching policy (PVC) (Wayne Boyer)
- Add hardware-level lock for steering (Matt Roper)
- Check full IP version when applying hw steering semaphore (Matt Roper)
- Enable GuC GGTT invalidation from the start (Daniele Ceraolo Spurio)
- MTL GSC firmware support (Daniele Ceraolo Spurio, Jonathan Cavitt)
- MTL OA support (Umesh Nerlige Ramappa)
- MTL initial gt workarounds (Matt Roper)
Driver refactors:
- Hold forcewake and MCR lock over PPAT setup (Matt Roper)
- Acquire fw before loop in intel_uncore_read64_2x32 (Umesh Nerlige Ramappa)
- GuC filename cleanups and use submission API version number (John Harrison)
- Promote pxp subsystem to top-level of i915 (Alan Previn)
- Finish proofing the code agains object size overflows (Chris Wilson, Gwan-gyeong Mun)
- Start adding module oriented dmesg output (John Harrison)
Miscellaneous:
- Correct kerneldoc for intel_gt_mcr_wait_for_reg() (Matt Roper)
- Bump up sample period for busy stats selftest (Umesh Nerlige Ramappa)
- Make GuC default_lists const data (Jani Nikula)
- Fix table order verification to check all FW types (John Harrison)
- Remove some limited use register access wrappers (Jani Nikula)
- Remove struct_member macro (Andrzej Hajda)
- Remove hardcoded value with a macro (Nirmoy Das)
- Use helper func to find out map type (Nirmoy Das)
- Fix a static analysis warning (John Harrison)
- Consolidate VMA active tracking helpers (Andrzej Hajda)
- Do not cover all future platforms in TLB invalidation (Tvrtko Ursulin)
- Replace zero-length arrays with flexible-array members (Gustavo A. R. Silva)
- Unwind hugepages to drop wakeref on error (Chris Wilson)
- Remove a couple of superfluous i915_drm.h includes (Jani Nikula)
Merges:
- Merge drm/drm-next into drm-intel-gt-next (Rodrigo Vivi)
danvet: Fix up merge conflict in intel_uc_fw.c, we ended up with 2
copies of try_firmware_load() somehow.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y8fW2Ny1B1hZ5ZmF@tursulin-desk
|
|
Default engine map is exactly about uabi engines so no excuse not to use
the appropriate iterator to populate it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
[tursulin: Fixed up r-b tag spelling.]
Link: https://patchwork.freedesktop.org/patch/msgid/20230123185629.1593320-1-jonathan.cavitt@intel.com
|
|
That register became a multicast register as of Xe_HP and it is
currently used only for DG2. Use a proper prefix since there could be
usage of the same register for previous platforms in the future, which
would require a different definition (i.e. using _MMIO).
Note that, in its current state, the code does not cause functional
problems, since the actual application of the workaround would
implicitly use multicast mode. This fix is more toward consistency and
being future-proof uses of this register outside of workarounds.
v2:
- Add paragraph noting that this change is for consistency and
making the code future-proof. (Matt)
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Atwood <matthew.s.atwood@intel.com>
Fixes: 468a4e630c7d ("drm/i915/dg2: Introduce Wa_18018764978")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230120181423.90507-1-gustavo.sousa@intel.com
|
|
The definition of intel_selftest_modify_policy() does not match the
declaration, as gcc-13 points out:
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c:29:5: error: conflicting types for 'intel_selftest_modify_policy' due to enum/integer mismatch; have 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, u32)' {aka 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, unsigned int)'} [-Werror=enum-int-mismatch]
29 | int intel_selftest_modify_policy(struct intel_engine_cs *engine,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c:11:
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h:28:5: note: previous declaration of 'intel_selftest_modify_policy' with type 'int(struct intel_engine_cs *, struct intel_selftest_saved_policy *, enum selftest_scheduler_modify)'
28 | int intel_selftest_modify_policy(struct intel_engine_cs *engine,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change the type in the definition to match.
Fixes: 617e87c05c72 ("drm/i915/selftest: Fix hangcheck self test for GuC submission")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117163743.1003219-1-arnd@kernel.org
|
|
That register doesn't belong to a specific engine, so the proper
placement for workarounds programming it should be
general_render_compute_wa_init().
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230118155249.41551-3-gustavo.sousa@intel.com
|
|
Extend the existing documentation in gt/intel_workarounds.c to make it
clear which functions register workarounds should be implemented in
according to their types.
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230118155249.41551-2-gustavo.sousa@intel.com
|
|
Commit 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
added the engine context for Meteor Lake. In a second revision of the
patch it was believed the xcs offsets were wrong due to a tagging
issue in the spec. The first version was actually correct, as shown
by the intel_lrc_live_selftests/live_lrc_layout test:
i915: Running gt_lrc
i915: Running intel_lrc_live_selftests/live_lrc_layout
bcs0: LRI command mismatch at dword 1, expected 1108101d found 11081019
[drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:236:DP-1] disconnected
bcs0: HW register image:
[0000] 00000000 1108101d 00022244 ffff0008 00022034 00000088 00022030 00000088
...
bcs0: SW register image:
[0000] 00000000 11081019 00022244 00090009 00022034 00000000 00022030 00000000
The difference in the 2 additional dwords (0x1d vs 0x19) are the offsets
0x120 / 0x124 that are indeed part of the context image.
Bspec: 45585
Fixes: 0d0e7d1eea9e ("drm/i915/mtl: Define engine context layouts")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230111235531.3353815-2-radhakrishna.sripada@intel.com
|
|
The implementation of Wa_22011450934 introduced three new register
definitions in i915_reg.h that didn't get moved to the GT/engine
register headers when all the other registers moved; let's move them to
the appropriate headers and tidy up their definitions now for
consistency:
- STATE_ACK_DEBUG is moved to the engine register header and converted
to a parameterized definition; the workaround only needs the RCS
instance to be programmed, but there are instances on other engines
that could be used by other workarounds in the future.
- The two CULLBIT registers move to the GT register header. Since
they belong to MMIO ranges that became MCR starting with Xe_HP,
their definitions should be defined as MCR_REG() and use an Xe_HP
prefix to keep the register semantics clear.
Note that the MCR definition is just for consistency and to prevent
accidental misuse if other workarounds related to these registers show
up in the future. There's no functional change to today's driver since
the workaround that references these registers only accesses them via
MI_LRR engine instructions. Engine-initiated register accesses do not
utilize the same steering controls as CPU-initiated accesses; they
use a different steering control register (0x20CC) which is initialized
to a non-terminated DSS target by pre-OS firmware and never changed
thereafter (i915 does not touch it and userspace does not have
permission to change that register).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230117202627.4134579-1-matthew.d.roper@intel.com
|