Age | Commit message (Collapse) | Author |
|
Pull drm updates from Dave Airlie:
"As part of building up nova-core/nova-drm pieces we've brought in some
rust abstractions through this tree, aux bus being the main one, with
devres changes also in the driver-core tree. Along with the drm core
abstractions and enough nova-core/nova-drm to use them. This is still
all stub work under construction, to build the nova driver upstream.
The other big NVIDIA related one is nouveau adds support for
Hopper/Blackwell GPUs, this required a new GSP firmware update to
570.144, and a bunch of rework in order to support multiple fw
interfaces.
There is also the introduction of an asahi uapi header file as a
precursor to getting the real driver in later, but to unblock
userspace mesa packages while the driver is trapped behind rust
enablement.
Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe,
and msm being the main ones, and some changes to vsprintf.
new drivers:
- bring in the asahi uapi header standalone
- nova-drm: stub driver
rust dependencies (for nova-core):
- auxiliary
- bus abstractions
- driver registration
- sample driver
- devres changes from driver-core
- revocable changes
core:
- add Apple fourcc modifiers
- add virtio capset definitions
- extend EXPORT_SYNC_FILE for timeline syncobjs
- convert to devm_platform_ioremap_resource
- refactor shmem helper page pinning
- DP powerup/down link helpers
- extended %p4cc in vsprintf.c to support fourcc prints
- change vsprintf %p4cn to %p4chR, remove %p4cn
- Add drm_file_err function
- IN_FORMATS_ASYNC property
- move sitronix from tiny to their own subdir
rust:
- add drm core infrastructure rust abstractions
(device/driver, ioctl, file, gem)
dma-buf:
- adjust sg handling to not cache map on attach
- allow setting dma-device for import
- Add a helper to sort and deduplicate dma_fence arrays
docs:
- updated drm scheduler docs
- fbdev todo update
- fb rendering
- actual brightness
ttm:
- fix delayed destroy resv object
bridge:
- add kunit tests
- convert tc358775 to atomic
- convert drivers to devm_drm_bridge_alloc
- convert rk3066_hdmi to bridge driver
scheduler:
- add kunit tests
panel:
- refcount panels to improve lifetime handling
- Powertip PH128800T004-ZZA01
- NLT NL13676BC25-03F, Tianma TM070JDHG34-00
- Himax HX8279/HX8279-D DDIC
- Visionox G2647FB105
- Sitronix ST7571
- ZOTAC rotation quirk
vkms:
- allow attaching more displays
i915:
- xe3lpd display updates
- vrr refactor
- intel_display struct conversions
- xe2hpd memory type identification
- add link rate/count to i915_display_info
- cleanup VGA plane handling
- refactor HDCP GSC
- fix SLPC wait boosting reference counting
- add 20ms delay to engine reset
- fix fence release on early probe errors
xe:
- SRIOV updates
- BMG PCI ID update
- support separate firmware for each GT
- SVM fix, prelim SVM multi-device work
- export fan speed
- temp disable d3cold on BMG
- backup VRAM in PM notifier instead of suspend/freeze
- update xe_ttm_access_memory to use GPU for non-visible access
- fix guc_info debugfs for VFs
- use copy_from_user instead of __copy_from_user
- append PCIe gen5 limitations to xe_firmware document
amdgpu:
- DSC cleanup
- DC Scaling updates
- Fused I2C-over-AUX updates
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- SR-IOV updates
- RAS updates
- PSP 12 cleanups
- GC 9.5 updates
- SMU 13.x updates
- VCN / JPEG SR-IOV updates
amdkfd:
- Update error messages for SDMA
- Userptr updates
- XNACK fixes
radeon:
- CIK doorbell cleanup
nouveau:
- add support for NVIDIA r570 GSP firmware
- enable Hopper/Blackwell support
nova-core:
- fix task list
- register definition infrastructure
- move firmware into own rust module
- register auxiliary device for nova-drm
nova-drm:
- initial driver skeleton
msm:
- GPU:
- ACD (adaptive clock distribution) for X1-85
- drop fictional address_space_size
- improve GMU HFI response time out robustness
- fix crash when throttling during boot
- DPU:
- use single CTL path for flushing on DPU 5.x+
- improve SSPP allocation code for better sharing
- Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
- Added SAR2130P support
- Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
- DP:
- switch to new audio helpers
- better LTTPR handling
- DSI:
- Added support for SA8775P
- Added SAR2130P support
- HDMI:
- Switched to use new helpers for ACR data
- Fixed old standing issue of HPD not working in some cases
amdxdna:
- add dma-buf support
- allow empty command submits
renesas:
- add dma-buf support
- add zpos, alpha, blend support
panthor:
- fail properly for NO_MMAP bos
- add SET_LABEL ioctl
- debugfs BO dumping support
imagination:
- update DT bindings
- support TI AM68 GPU
hibmc:
- improve interrupt handling and HPD support
virtio:
- add panic handler support
rockchip:
- add RK3588 support
- add DP AUX bus panel support
ivpu:
- add heartbeat based hangcheck
mediatek:
- prepares support for MT8195/99 HDMIv2/DDCv2
anx7625:
- improve HPD
tegra:
- speed up firmware loading
* tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits)
drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr()
drm/xe: Default auto_link_downgrade status to false
drm/xe/guc: Make creation of SLPC debugfs files conditional
drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue()
drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read
drm/i915/ptl: Use everywhere the correct DDI port clock select mask
drm/nouveau/kms: add support for GB20x
drm/dp: add option to disable zero sized address only transactions.
drm/nouveau: add support for GB20x
drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle
drm/nouveau: add support for GB10x
drm/nouveau/gf100-: track chan progress with non-WFI semaphore release
drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA
drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos
drm/nouveau: add support for GH100
drm/nouveau: improve handling of 64-bit BARs
drm/nouveau/gv100-: switch to volta semaphore methods
drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES
drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY
drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM
...
|
|
For determining actual job execution time, save the current value of the
CTX_TIMESTAMP register rather than the value saved in LRC since the
current register value is the closest to the start time of the job.
v2: Define MI_STORE_REGISTER_MEM to fix compile error
v3: Place MI_STORE_REGISTER_MEM sorted by MI_INSTR (Lucas)
Fixes: 65921374c48f ("drm/xe: Emit ctx timestamp copy in ring ops")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250509161159.2173069-6-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 38b14233e5deff51db8faec287b4acd227152246)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Historically, the Vertex Fetcher unit has not been an L3 client. That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.
Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3. However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.
To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe." In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.
Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.
xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches). Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb60557d ("drm/xe: enable lite restore")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 61672806b579dd5a150a042ec9383be2bbc2ae7e)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Historically, the Vertex Fetcher unit has not been an L3 client. That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.
Since Tigerlake and later, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3. However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.
To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe." In other
words, the vertex and index buffer data that gets cached in L3 when
"L3 Bypass Disable" is set.
Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues a L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.
xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches). Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb60557d ("drm/xe: enable lite restore")
Cc: stable@vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The command streamer implements an Arithmetic Logic Unit (ALU)
which supports basic arithmetic and logical operations on two
64-bit operands. Access to this ALU is thru the MI_MATH command
and sixteen General Purpose Register (GPR) 64-bit registers,
which are used as temporary storage.
Bspec: 45737, 60236 # MI
Bspec: 45525, 60132 # ALU
Bspec: 45533, 60309 # GPR
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250304162307.1866-1-michal.wajdeczko@intel.com
|
|
The MI_LOAD_REGISTER_REG command reads value from a source register
location and writes that value to a destination register location.
Bspec: 45730, 60233
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250303173522.1822-2-michal.wajdeczko@intel.com
|
|
Xe3 adds a new 3DSTATE_COARSE_PIXEL state instruction as part of the
render engine LRC. Ensure we can recognize and report this properly in
the LRC dumps.
Bspec: 65182, 73415
Reviewed-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250307190754.678376-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
The key termination is done with a specific submission to the VCS
engine. This flow will be triggered in response to a termination
interrupt, whose handling is coming in a follow-up patch in the series.
v2: clean up defines and command emission code. (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-4-daniele.ceraolospurio@intel.com
|
|
This reverts commit 1460bb1fef9ccf7390af0d74a15252442fd6effd.
In all places the MI_STORE_DATA_IMM are not followed by a read of
the same memory address in the same batch buffer and the posted writes
are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c functions
so there is no need to set this register.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Fixes: 1460bb1fef9c ("drm/xe: Force write completion of MI_STORE_DATA_IMM")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241227183230.101334-1-jose.souza@intel.com
|
|
With Force write completion unset there is no guarantees of when the
write will be globally visible what is not the behavior wanted.
Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241217160732.46280-1-jose.souza@intel.com
|
|
Add OAR support to allow userspace to execute MI_REPORT_PERF_COUNT on
render engines. Configuration batches are used to program the OAR unit, as
well as modifying the render engine context image of a specified exec queue
(to have correct register values when that context switches in).
v2: Rename/refactor xe_oa_modify_self (Umesh)
v3: Move IS_MI_LRI_CMD() into xe_oa.c (Michal)
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-11-ashutosh.dixit@intel.com
|
|
MI_COPY_MEM_MEM GPU instructions are used to copy ctx timestamp from a
LRC registers to another location at the beginning of every jobs
execution. Add MI_COPY_MEM_MEM GPU instruction definitions.
v2:
- Include MI_COPY_MEM_MEM based on instruction order (Michal)
- Fix tabs/spaces issue (Michal)
- Use macro for DW definition (Michal)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611144053.2805091-3-matthew.brost@intel.com
|
|
All other files with commands definitions are in instructions/
folder. Move xe_gpu_commands.h also there.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com
|
|
The LRCs on some of our newer platforms appear to contain a few GPU
instructions that weren't handled in our LRC parser. Add the relevant
instruction names and opcodes so that our debugfs LRC dumps will
properly indicate what these are.
Bspec: 55866, 64848, 46931
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222184009.6857-2-matthew.d.roper@intel.com
|
|
We will need this shortly during context state preparation.
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231214185955.1791-2-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
|
|
The version is obtained via a dedicated MKHI GSC HECI command.
The compatibility version is what we want to match against for the GSC,
so we need to call the FW version checker after obtaining the version.
Since this is the first time we send a GSC HECI command via the GSCCS,
this patch also introduces common infrastructure to send such commands
to the GSC. Communication with the GSC FW is done via input/output
buffers, whose addresses are provided via a GSCCS command. The buffers
contain a generic header and a client-specific packet (e.g. PXP, HDCP);
the clients don't care about the header format and/or the GSCCS command
in the batch, they only care about their client-specific header. This
patch therefore introduces helpers that allow the callers to
automatically fill in the input header, submit the GSCCS job and decode
the output header, to make it so that the caller only needs to worry about
their client-specific input and output messages.
v3: squash of 2 separate patches ahead of merge, so that the common
functions and their first user are added at the same time
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.Com> #v1
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The GSC FW must be copied in a 4MB stolen memory allocation, whose GGTT
address is then passed as a parameter to a dedicated load instruction
submitted via the GSC engine.
Since the GSC load is relatively slow (up to 250ms), we perform it
asynchronously via a worker. This requires us to make sure that the
worker has stopped before suspending/unloading.
Note that we can't yet use xe_migrate_copy for the copy because it
doesn't work with stolen memory right now, so we do a memcpy from the
CPU side instead.
v2: add comment about timeout value, fix GSC status checking
before load (John)
Bspec: 65306, 65346
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
As with DG2/MTL, Xe2 also fails to emit instruction headers for SVG
state instructions if no explicit state has been set. The SVG part of
the LRC is nearly identical to DG2/MTL; the only change is that
3DSTATE_DRAWING_RECTANGLE has been replaced by
3DSTATE_DRAWING_RECTANGLE_FAST, so we can just re-use the same state
table and handle that single instruction when we encounter it.
Bspec: 65182
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20231025151732.3461842-8-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
When recording the default LRC, the expectation is that the hardware's
original state settings (both register and instruction) will be written
out to the LRC upon first context switch. For many 3DSTATE_* state
instructions that don't truly have "default" values, this translates to
a simple instruction header (opcodes + dword length) being written to
the LRC, followed by an appropriate number of blank dwords as a place
holder. When userspace creates a context (which starts as a copy of the
default LRC), they'll generally emit real 3DSTATE_* as part of their
initialization to select the settings they desire. If they don't emit
one of the 3DSTATE instructions, then the zeroed dwords that remain in
their LRC image generally translate to various state remaining disabled.
This will either be what userspace wants or will lead to very
reproducible and easily-debugged problems (rendering glitches, engine
hangs).
It turns out that a subset of the 3DSTATE instructions, specifically
those belonging to the SVG (State Variable - Global) unit, are not only
emitting 0's for the instruction's "body" dwords, but also for the
instruction header dword if no specific state has been explicitly set
before context switch. This means that when the hardware switches to a
context that hasn't explicitly provided an appropriate state setting,
the hardware will just see a sequence of NOOPs in the spot reserved for
that 3DSTATE instruction while executing the LRC, and the actual
hardware state setting will unintentionally inherit the configuration
used by the previously running context. Now when userspace makes a
mistake and forgets to emit an important state instruction they no
longer get consistent, easily-reproducible corruption/hangs, but rather
erratic behavior where the presence/absence of a problem depends on what
other workloads are running on the system and what order the contexts
are scheduled on the engine.
A specific example of this that came up recently related to mesh shading
The OpenGL driver was not specifically emitting a 3DSTATE_MESH_CONTROL
to disable mesh shading at context init, so on context switch, mesh
shading would either be on or off depending on what the previous context
had been doing. Vulkan apps _were_ enabling mesh shading, so running a
Vulkan app and then context switching to an OpenGL app resulted in mesh
shading still unexpectedly being enabled during OpenGL operation, and
since other Mesh-related state was not properly initialized for that
context a GPU hang was seen. Due to the specific ordering requirements
(Vulkan app runs first, followed by OpenGL app), it took additional
debug effort to track down the cause of the problem.
There are various workarounds related to this behavior, with current
implementations handled in the userspace drivers. E.g., Wa_14019789679
and Wa_22018402687. However it's been suggested that the kernel driver
can help simplify things here by emitting zeroed SVG state with proper
instruction headers as part of our default context creation (i.e., at
the same point we apply LRC workarounds). This will help ensure that
any future cases where a userspace driver does not emit an important
state setting will result in consistent behavior.
Bspec: 46261
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20231025151732.3461842-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
RCS and CCS engines include several non-register gfxpipe commands in
their LRC images. Include these in the dump output so that we can see
exactly what's inside the context snapshot.
v2:
- Include raw instruction header in output
- Add 3DSTATE_AMFS_TEXTURE_POINTERS and 3DSTATE_MONOFILTER_SIZE. The
first was supposed to be removed in Xe_HPG, and the second by
gen12, but both still show up in the RCS LRC.
v3:
- Sanity check that we don't have numdw > remaining_dw. (Lucas)
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-14-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
For non-RCS engines, nearly all of the LRC state is composed of MI
instructions (specifically MI_LOAD_REGISTER_IMM). Providing a dump
interface allows us to verify that the context image layout matches
what's documented in the bspec, and also allows us to check whether LRC
workarounds are being properly captured by the default state we record
at startup.
For now, the non-MI instructions found in the RCS and CCS engines will
dump as "unknown;" parsing of those will be added in a follow-up patch.
v2:
- Add raw instruction header as well as decoded meaning. (Lucas)
- Check that num_dw isn't greater than remaining_dw for instructions
that have a "# dwords" field. (Lucas)
- Clarify comment about skipping over ppHWSP. (Lucas)
Bspec: 64993
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Extracting the common MI_* instructions that can be used with any engine
to their own header will make it easier as we add additional engine
instructions in upcoming patches.
Also, since the majority of GPU instructions (both MI and non-MI) have
a "length" field in bits 7:0 of the instruction header, a common define
is added for that. Instruction-specific length fields are still defined
for special case instructions that have larger/smaller length fields.
v2:
- Use "instr" instead of "inst" as the short form of "instruction"
everywhere. (Lucas)
- Include xe_reg_defs.h instead of the i915 compat header. (Lucas)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|