Age | Commit message (Collapse) | Author |
|
GH100/GBxxx have significant changes to the GSP-RM boot process.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
GH100/GBxxx use a slightly different set of firmwares to boot GSP-RM.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Split handling of NV01_DEVICE (and other related objects) out into its
own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Split NV01_ROOT handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Split base RM_ALLOC handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Split base RM_CONTROL handling out into its own module.
Aside from moving the function pointers, no code change is intended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Later patches in the series add HALs around various RM APIs in order to
support a newer version of GSP-RM firmware. In order to do this, begin
by splitting the code up into "modules" that roughly represent RM's API
boundaries so they can be more easily managed.
Aside from moving the RPC function pointers, no code change is indended.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
In order to specify a channel ID to RM during channel allocation, the
channel ID is broken down into a "userd page" index and an index into
that page.
It was assumed that RM would enforce that the same physical block of
memory be used for all CHIDs within a "userd page", and the GSP paths
override NVKM's normal CHID allocation to handle this.
However, none of that turns out to be necessary.
Remove the GSP-specific code and use the regular CHID allocation path.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Though the initial upstreamed GSP-RM version in nouveau was 535.113.01,
the code was developed against earlier versions.
535.42.02 modified the mailbox value used by GSP-RM to signal shutdown
has completed, which was missed at the time.
I'm not aware of any issues caused by this, but noticed the bug while
working on GB20x support.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Some GSP RPC commands need a new reply policy: "caller don't care about
the message content but want to make sure a reply is received". To
support this case, a new reply policy is introduced.
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY is a large GSP RPC command. The actual
required policy is NVKM_GSP_RPC_REPLY_POLL. This can be observed from the
dump of the GSP message queue. After the large GSP RPC command is issued,
GSP will write only an empty RPC header in the queue as the reply.
Without this change, the policy "receiving the entire message" is used
for NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY. This causes the timeout of receiving
the returned GSP message in the suspend/resume path.
Introduce the new reply policy NVKM_GSP_RPC_REPLY_POLL, which waits for
the returned GSP message but discards it for the caller. Use the new policy
NVKM_GSP_RPC_REPLY_POLL on the GSP RPC command
NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY.
Fixes: 50f290053d79 ("drm/nouveau: support handling the return of large GSP message")
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-3-zhiw@nvidia.com
|
|
There can be multiple cases of handling the GSP RPC messages, which are
the reply of GSP RPC commands according to the requirement of the
callers and the nature of the GSP RPC commands.
The current supported reply policies are "callers don't care" and "receive
the entire message" according to the requirement of the callers. To
introduce a new policy, factor out the current RPC command reply polices.
Also, centralize the handling of the reply in a single function.
Factor out NVKM_GSP_RPC_REPLY_NOWAIT as "callers don't care" and
NVKM_GSP_RPC_REPLY_RECV as "receive the entire message". Introduce a
kernel doc to document the policies. Factor out
r535_gsp_rpc_handle_reply().
No functional change is intended for small GSP RPC commands. For large GSP
commands, the caller decides the policy of how to handle the returned GSP
RPC message.
Cc: Ben Skeggs <bskeggs@nvidia.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250227013554.8269-2-zhiw@nvidia.com
|
|
Backmerge Linux 6.14-rc4 at the request of tzimmermann so misc-next
can base on rc4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
If "rpc" is an error pointer then return directly. Otherwise it leads
to an error pointer dereference.
Fixes: 50f290053d79 ("drm/nouveau: support handling the return of large GSP message")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/b7052ac0-98e4-433b-ad58-f563bf51858c@stanley.mountain
|
|
Most kernel configs enable multiple Tegra SoC generations, causing this
typo to go unnoticed. But in the case where a kernel config is strictly
for Tegra186, this is a problem.
Fixes: 989863d7cbe5 ("drm/nouveau/pmu: select implementation based on available firmware")
Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218-nouveau-gm10b-guard-v2-1-a4de71500d48@gmail.com
|
|
Bring rc1 to start the new release dev.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
As the GSP message recv path is able to handle the return of large GSP
message, consume the return of large GSP message in the sending path.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-16-zhiw@nvidia.com
|
|
The max GSP message element size is 16 pages (including the headers). To
send a message larger than 16 pages, nvkm should split it into multiple
and send them accordingly. The first element has the expected function
number, while the rest are sent with function number as
NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD. GSP consumes the elements from
the cmdq and always writes the result back to the msgq. The result is also
formed as split elements.
However, nvkm is able to split the large GSP message and send them, but
totally not aware of handling the return of the large GSP message, which
are the split elements in the msgq. Thus, it keeps dumping the unknown RPC
messages from msgq, which is actually CONTINUATION_RECORD message,
discard them unexpectedly. Thus, the caller will not be able to consume
the result from GSP.
Introduce the handling of the return of large GSP message on the msgq path.
Slightly re-factor the low-level part of msg receiving routines. Merge the
split elements back into a large element before handling it to the upper
level. Thus, the upper-level of GSP RPC APIs don't need to be heavily
changed.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-15-zhiw@nvidia.com
|
|
Prepare for supporting receive the large GSP RPC message.
Factor out r535_gsp_msgq_recv_one_elem(). Fold its params into a data
structure of params. Move the allocation of the GSP RPC message to its
caller. Refine the variable names in the re-factor.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-14-zhiw@nvidia.com
|
|
To receive a GSP message queue element from the GSP status queue, the
driver needs to make sure there are available elements in the queue.
The previous r535_gsp_msgq_wait() consists of three functions, which is
a little too complicated for a single function:
- wait for an available element.
- peek the message element header in the queue.
- recevice the element from the queue.
Factor out r535_gsp_msgq_peek() and divide the functions in
r535_gsp_msgq_wait() into three functions.
No functional change is intended.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-13-zhiw@nvidia.com
|
|
Refine the name to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-12-zhiw@nvidia.com
|
|
The variable "msg" in r535_gsp_msg_recv() actually means the GSP RPC.
Refine the names to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-11-zhiw@nvidia.com
|
|
The variable names in r535_gsp_rpc_push() are quite confusing and some
of them are not representing what they really are.
Update the names and explanations in the decoder section of the
kernel doc. Refine the names to align with the terms in the kernel doc.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-10-zhiw@nvidia.com
|
|
There has been a GSP_MSG_MAX_SIZE which represents the max size of a GSP
message element header. Use it instead of a magic number.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-9-zhiw@nvidia.com
|
|
The macro GSP_MSG_MAX_SIZE refers to another macro that doesn't exist.
It represents the max GSP message element size.
Fix the broken marco so it can be used to replace some magic numbers in
the code.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-8-zhiw@nvidia.com
|
|
The name "argc" has different meanings in different functions.
To improve the readability, it's better to refine it to a name that
reflects what it represents.
Rename "argc" to what it represents. Add terms in the decoder section to
explain their meaning.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
[ Fix indentation. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-7-zhiw@nvidia.com
|
|
The name "argv" has different meanings in different functions.
To improve the readability, it's better to refine it to a name that
reflects what it represents.
Rename "argv" to what it represents. Wrap the long container_of() into
to_payload_header() to denote a clear meaning and make checkpatch.pl
happy.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-6-zhiw@nvidia.com
|
|
The user of *rm_alloc_push() always pass 0 in repc.
Remove unused param repc since no user actually uses it.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-5-zhiw@nvidia.com
|
|
The name "argv" has different meanings in different functions.
To improve the readability, it's better to refine it to a name that
reflects what it represents.
Rename "repc" to what it represents in the GSP message send path.
Wrap the long container_of() into to_gsp_hdr() to make checkpatch.pl
happy.
No functional change is intended.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-4-zhiw@nvidia.com
|
|
The name "repc" has different meanings in different contexts.
To improve the readability, it's better to refine it to a name that
reflects what it actually represents.
Rename "repc" to "gsp_rpc_len" in the GSP message recv path. Add an
section in the doc to explain the terms.
No functional change is intended.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-3-zhiw@nvidia.com
|
|
In order to explain the name clean-ups in GSP RPC routines, a kernel
doc to explain the memory layout and terms is required.
Add a kernel doc to introduce the GSP RPC.
Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
[ Fix bullet list indentation; add SPDX-License-Identifier. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124182958.2040494-2-zhiw@nvidia.com
|
|
A regression was caused by commit e4b5ccd392b9 ("drm/v3d: Ensure job
pointer is set to NULL after job completion"), but this commit is not
yet in next-fixes, fast-forward it.
Note that this recreates Linus merge in 96c84703f1cf ("Merge tag
'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel")
because I didn't want to backmerge a random point in the merge window.
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
|
|
Pull drm updates from Dave Airlie:
"There are two external interactions of note, the msm tree pull in some
opp tree, hopefully the opp tree arrives from the same git tree
however it normally does.
There is also a new cgroup controller for device memory, that is used
by drm, so is merging through my tree. This will hopefully help open
up gpu cgroup usage a bit more and move us forward.
There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.
Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
refactors across the board:
core:
- device memory cgroup controller added
- Remove driver date from drm_driver
- Add drm_printer based hex dumper
- drm memory stats docs update
- scheduler documentation improvements
new driver:
- amdxdna - Ryzen AI NPU support
connector:
- add a mutex to protect ELD
- make connector setup two-step
panels:
- Introduce backlight quirks infrastructure
- New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00,
- Multi-Inno Technology MI1010Z1T-1CP11
bridge:
- ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
- Provide default implementation of atomic_check for HDMI bridges
- it605: HDCP improvements, MCCS Support
xe:
- make OA buffer size configurable
- GuC capture fixes
- add ufence and g2h flushes
- restore system memory GGTT mappings
- ioctl fixes
- SRIOV PF scheduling priority
- allow fault injection
- lots of improvements/refactors
- Enable GuC's WA_DUAL_QUEUE for newer platforms
- IRQ related fixes and improvements
i915:
- More accurate engine busyness metrics with GuC submission
- Ensure partial BO segment offset never exceeds allowed max
- Flush GuC CT receive tasklet during reset preparation
- Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
- Fix DG1 power gate sequence
- Enabling uncompressed 128b/132b UHBR SST
- Handle hdmi connector init failures, and no HDMI/DP cases
- More robust engine resets on Haswell and older
i915/xe display:
- HDCP fixes for Xe3Lpd
- New GSC FW ARL-H/ARL-U
- support 3 VDSC engines 12 slices
- MBUS joining sanitisation
- reconcile i915/xe display power mgmt
- Xe3Lpd fixes
- UHBR rates for Thunderbolt
amdgpu:
- DRM panic support
- track BO memory stats at runtime
- Fix max surface handling in DC
- Cleaner shader support for gfx10.3 dGPUs
- fix drm buddy trim handling
- SDMA engine reset updates
- Fix doorbell ttm cleanup
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- DCN 3.5 updates
- Use a pm notifier to more gracefully handle VRAM eviction on
suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine
instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- GC 12.x updates
- DC FAMS updates
amdkfd:
- GG 9.5 updates
- Logging improvements
- Shader debugger fixes
- Trap handler cleanup
- Cleanup includes
- Eviction fence wq fix
msm:
- MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support
- DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two
SSPPs for a single plane)
- added CWB hardware blocks support
- DSI:
- added SM6150 (aka QCS615) support
- GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting
rcar-du:
- Add r8a779h0 Support
ivpu:
- Fix qemu crash when using passthrough
nouveau:
- expose GSP-RM logging buffers via debugfs
panfrost:
- Add MT8188 Mali-G57 MC3 support
rockchip:
- Gamma LUT support
hisilicon:
- new HIBMC support
virtio-gpu:
- convert to helpers
- add prime support for scanout buffers
v3d:
- Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
vc4:
- Add support for BCM2712
vkms:
- line-per-line compositing algorithm to improve performance
zynqmp:
- Add DP audio support
mediatek:
- dp: Add sdp path reset
- dp: Support flexible length of DP calibration data
etnaviv:
- add fdinfo memory support
- add explicit reset handling"
* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
drm/bridge: fix documentation for the hdmi_audio_prepare() callback
doc/cgroup: Fix title underline length
drm/doc: Include new drm-compute documentation
cgroup/dmem: Fix parameters documentation
cgroup/dmem: Select PAGE_COUNTER
kernel/cgroup: Remove the unused variable climit
drm/display: hdmi: Do not read EDID on disconnected connectors
drm/tests: hdmi: Add connector disablement test
drm/connector: hdmi: Do atomic check when necessary
drm/amd/display: 3.2.316
drm/amd/display: avoid reset DTBCLK at clock init
drm/amd/display: improve dpia pre-train
drm/amd/display: Apply DML21 Patches
drm/amd/display: Use HW lock mgr for PSR1
drm/amd/display: Revised for Replay Pseudo vblank control
drm/amd/display: Add a new flag for replay low hz
drm/amd/display: Remove unused read_ono_state function from Hwss module
drm/amd/display: Do not elevate mem_type change to full update
drm/amd/display: Do not wait for PSR disable on vbl enable
drm/amd/display: Remove unnecessary eDP power down
...
|
|
Fix some malformed kernel-doc comments that were added in a recent commit.
Also, kernel-doc does not support global variables, so change those
kernel-doc comments into regular comments.
Fixes: 214c9539cf2f ("drm/nouveau: expose GSP-RM logging buffers via debugfs")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412310834.jtCJj4oz-lkp@intel.com/
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250108234329.842256-1-ttabi@nvidia.com
|
|
Macbook 5,1 with MCP79 lost its backlight control since the recent
change for supporting GSP-RM; it rewrote the whole nv50 backlight
control code and each display engine is supposed to have an entry for
IOR bl callback, but it didn't cover mcp77.
This patch adds the missing bl entry initialization for mcp77 display
engine to recover the backlight control.
Fixes: 2274ce7e3681 ("drm/nouveau/disp: add output backlight control methods")
Cc: stable@vger.kernel.org
Link: https://bugzilla.suse.com/show_bug.cgi?id=1223838
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250102114944.11499-1-tiwai@suse.de
|
|
The LOGINIT, LOGINTR, LOGRM, and LOGPMU buffers are circular buffers
that have printf-like logs from GSP-RM and PMU encoded in them.
LOGINIT, LOGINTR, and LOGRM are allocated by Nouveau and their DMA
addresses are passed to GSP-RM during initialization. The buffers are
required for GSP-RM to initialize properly.
LOGPMU is also allocated by Nouveau, but its contents are updated
when Nouveau receives an NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPC from
GSP-RM. Nouveau then copies the RPC to the buffer.
The messages are encoded as an array of variable-length structures that
contain the parameters to an NV_PRINTF call. The format string and
parameter count are stored in a special ELF image that contains only
logging strings. This image is not currently shipped with the Nvidia
driver.
There are two methods to extract the logs.
OpenRM tries to load the logging ELF, and if present, parses the log
buffers in real time and outputs the strings to the kernel console.
Alternatively, and this is the method used by this patch, the buffers
can be exposed to user space, and a user-space tool (along with the
logging ELF image) can parse the buffer and dump the logs.
This method has the advantage that it allows the buffers to be parsed
even when the logging ELF file is not available to the user. However,
it has the disadvantage the debugfs entries need to remain until the
driver is unloaded.
The buffers are exposed via debugfs. If GSP-RM fails to initialize, then
Nouveau immediately shuts down the GSP interface. This would normally
also deallocate the logging buffers, thereby preventing the user from
capturing the debug logs.
To avoid this, introduce the keep-gsp-logging command line parameter. If
specified, and if at least one logging buffer has content, then Nouveau
will migrate these buffers into new debugfs entries that are retained
until the driver unloads.
An end-user can capture the logs using the following commands:
cp /sys/kernel/debug/nouveau/<path>/loginit loginit
cp /sys/kernel/debug/nouveau/<path>/logrm logrm
cp /sys/kernel/debug/nouveau/<path>/logintr logintr
cp /sys/kernel/debug/nouveau/<path>/logpmu logpmu
where (for a PCI device) <path> is the PCI ID of the GPU (e.g.
0000:65:00.0).
Since LOGPMU is not needed for normal GSP-RM operation, it is only
created if debugfs is available. Otherwise, the
NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPCs are ignored.
A simple way to test the buffer migration feature is to have
nvkm_gsp_init() return an error code.
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-2-ttabi@nvidia.com
|
|
Store the struct device pointer used to allocate the DMA buffer in
the nvkm_gsp_mem object. This allows nvkm_gsp_mem_dtor() to release
the buffer without needing the nvkm_gsp. This is needed so that
we can retain DMA buffers even after the nvkm_gsp object is deleted.
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-1-ttabi@nvidia.com
|
|
Kickstart 6.14 cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
r535_gsp_cmdq_push() waits for the available page in the GSP cmdq
buffer when handling a large RPC request. When it sees at least one
available page in the cmdq, it quits the waiting with the amount of
free buffer pages in the queue.
Unfortunately, it always takes the [write pointer, buf_size) as
available buffer pages before rolling back and wrongly calculates the
size of the data should be copied. Thus, it can overwrite the RPC
request that GSP is currently reading, which causes GSP hang due
to corrupted RPC request:
[ 549.209389] ------------[ cut here ]------------
[ 549.214010] WARNING: CPU: 8 PID: 6314 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:116 r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.225678] Modules linked in: nvkm(E+) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) mlx5_ib(E) amd64_edac(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) mxm_wmi(E) ipmi_devintf(E) rapl(E) i2c_piix4(E) wmi_bmof(E) joydev(E) ptdma(E) acpi_cpufreq(E) k10temp(E) pcspkr(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) crct10dif_pclmul(E) drm_shmem_helper(E) nvme_tcp(E) crc32_pclmul(E) ahci(E) drm_kms_helper(E) libahci(E) nvme_fabrics(E) crc32c_intel(E) nvme(E) cdc_ether(E) mlx5_core(E) nvme_core(E) usbnet(E) drm(E) libata(E) ccp(E) ghash_clmulni_intel(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E)
[ 549.225752] iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)]
[ 549.326293] CPU: 8 PID: 6314 Comm: insmod Tainted: G E 6.9.0-rc6+ #1
[ 549.334039] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 549.341781] RIP: 0010:r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.347343] Code: 08 00 00 89 da c1 e2 0c 48 8d ac 11 00 10 00 00 48 8b 0c 24 48 85 c9 74 1f c1 e0 0c 4c 8d 6d 30 83 e8 30 89 01 e9 68 ff ff ff <0f> 0b 49 c7 c5 92 ff ff ff e9 5a ff ff ff ba ff ff ff ff be c0 0c
[ 549.366090] RSP: 0018:ffffacbccaaeb7d0 EFLAGS: 00010246
[ 549.371315] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000923e28
[ 549.378451] RDX: 0000000000000000 RSI: 0000000055555554 RDI: ffffacbccaaeb730
[ 549.385590] RBP: 0000000000000001 R08: ffff8bd14d235f70 R09: ffff8bd14d235f70
[ 549.392721] R10: 0000000000000002 R11: ffff8bd14d233864 R12: 0000000000000020
[ 549.399854] R13: ffffacbccaaeb818 R14: 0000000000000020 R15: ffff8bb298c67000
[ 549.406988] FS: 00007f5179244740(0000) GS:ffff8bd14d200000(0000) knlGS:0000000000000000
[ 549.415076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 549.420829] CR2: 00007fa844000010 CR3: 00000001567dc005 CR4: 0000000000770ef0
[ 549.427963] PKRU: 55555554
[ 549.430672] Call Trace:
[ 549.433126] <TASK>
[ 549.435233] ? __warn+0x7f/0x130
[ 549.438473] ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.443426] ? report_bug+0x18a/0x1a0
[ 549.447098] ? handle_bug+0x3c/0x70
[ 549.450589] ? exc_invalid_op+0x14/0x70
[ 549.454430] ? asm_exc_invalid_op+0x16/0x20
[ 549.458619] ? r535_gsp_msgq_wait+0xd0/0x190 [nvkm]
[ 549.463565] r535_gsp_msg_recv+0x46/0x230 [nvkm]
[ 549.468257] r535_gsp_rpc_push+0x106/0x160 [nvkm]
[ 549.473033] r535_gsp_rpc_rm_ctrl_push+0x40/0x130 [nvkm]
[ 549.478422] nvidia_grid_init_vgpu_types+0xbc/0xe0 [nvkm]
[ 549.483899] nvidia_grid_init+0xb1/0xd0 [nvkm]
[ 549.488420] ? srso_alias_return_thunk+0x5/0xfbef5
[ 549.493213] nvkm_device_pci_probe+0x305/0x420 [nvkm]
[ 549.498338] local_pci_probe+0x46/0xa0
[ 549.502096] pci_call_probe+0x56/0x170
[ 549.505851] pci_device_probe+0x79/0xf0
[ 549.509690] ? driver_sysfs_add+0x59/0xc0
[ 549.513702] really_probe+0xd9/0x380
[ 549.517282] __driver_probe_device+0x78/0x150
[ 549.521640] driver_probe_device+0x1e/0x90
[ 549.525746] __driver_attach+0xd2/0x1c0
[ 549.529594] ? __pfx___driver_attach+0x10/0x10
[ 549.534045] bus_for_each_dev+0x78/0xd0
[ 549.537893] bus_add_driver+0x112/0x210
[ 549.541750] driver_register+0x5c/0x120
[ 549.545596] ? __pfx_nvkm_init+0x10/0x10 [nvkm]
[ 549.550224] do_one_initcall+0x44/0x300
[ 549.554063] ? do_init_module+0x23/0x240
[ 549.557989] do_init_module+0x64/0x240
Calculate the available buffer page before rolling back based on
the result from the waiting.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017071922.2518724-3-zhiw@nvidia.com
|
|
A GSP event message consists three parts: message header, RPC header,
message body. GSP calculates the number of pages to write from the
total size of a GSP message. This behavior can be observed from the
movement of the write pointer.
However, nvkm takes only the size of RPC header and message body as
the message size when advancing the read pointer. When handling a
two-page GSP message in the non rollback case, It wrongly takes the
message body of the previous message as the message header of the next
message. As the "message length" tends to be zero, in the calculation of
size needs to be copied (0 - size of (message header)), the size needs to
be copied will be "0xffffffxx". It also triggers a kernel panic due to a
NULL pointer error.
[ 547.614102] msg: 00000f90: ff ff ff ff ff ff ff ff 40 d7 18 fb 8b 00 00 00 ........@.......
[ 547.622533] msg: 00000fa0: 00 00 00 00 ff ff ff ff ff ff ff ff 00 00 00 00 ................
[ 547.630965] msg: 00000fb0: ff ff ff ff ff ff ff ff 00 00 00 00 ff ff ff ff ................
[ 547.639397] msg: 00000fc0: ff ff ff ff 00 00 00 00 ff ff ff ff ff ff ff ff ................
[ 547.647832] nvkm 0000:c1:00.0: gsp: peek msg rpc fn:0 len:0x0/0xffffffffffffffe0
[ 547.655225] nvkm 0000:c1:00.0: gsp: get msg rpc fn:0 len:0x0/0xffffffffffffffe0
[ 547.662532] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 547.669485] #PF: supervisor read access in kernel mode
[ 547.674624] #PF: error_code(0x0000) - not-present page
[ 547.679755] PGD 0 P4D 0
[ 547.682294] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 547.686643] CPU: 22 PID: 322 Comm: kworker/22:1 Tainted: G E 6.9.0-rc6+ #1
[ 547.694893] Hardware name: ASRockRack 1U1G-MILAN/N/ROMED8-NL, BIOS L3.12E 09/06/2022
[ 547.702626] Workqueue: events r535_gsp_msgq_work [nvkm]
[ 547.707921] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 547.713375] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05
[ 547.732119] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203
[ 547.737335] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f
[ 547.744461] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010
[ 547.751585] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0
[ 547.758707] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000
[ 547.765834] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000
[ 547.772958] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000
[ 547.781035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 547.786771] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0
[ 547.793896] PKRU: 55555554
[ 547.796600] Call Trace:
[ 547.799046] <TASK>
[ 547.801152] ? __die+0x20/0x70
[ 547.804211] ? page_fault_oops+0x75/0x170
[ 547.808221] ? print_hex_dump+0x100/0x160
[ 547.812226] ? exc_page_fault+0x64/0x150
[ 547.816152] ? asm_exc_page_fault+0x22/0x30
[ 547.820341] ? r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 547.825184] r535_gsp_msgq_work+0x42/0x50 [nvkm]
[ 547.829845] process_one_work+0x196/0x3d0
[ 547.833861] worker_thread+0x2fc/0x410
[ 547.837613] ? __pfx_worker_thread+0x10/0x10
[ 547.841885] kthread+0xdf/0x110
[ 547.845031] ? __pfx_kthread+0x10/0x10
[ 547.848775] ret_from_fork+0x30/0x50
[ 547.852354] ? __pfx_kthread+0x10/0x10
[ 547.856097] ret_from_fork_asm+0x1a/0x30
[ 547.860019] </TASK>
[ 547.862208] Modules linked in: nvkm(E) gsp_log(E) snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_timer(E) snd_seq_device(E) snd(E) soundcore(E) rfkill(E) qrtr(E) vfat(E) fat(E) ipmi_ssif(E) amd_atl(E) intel_rapl_msr(E) intel_rapl_common(E) amd64_edac(E) mlx5_ib(E) edac_mce_amd(E) kvm_amd(E) ib_uverbs(E) kvm(E) ib_core(E) acpi_ipmi(E) ipmi_si(E) ipmi_devintf(E) mxm_wmi(E) joydev(E) rapl(E) ptdma(E) i2c_piix4(E) acpi_cpufreq(E) wmi_bmof(E) pcspkr(E) k10temp(E) ipmi_msghandler(E) xfs(E) libcrc32c(E) ast(E) i2c_algo_bit(E) drm_shmem_helper(E) crct10dif_pclmul(E) drm_kms_helper(E) ahci(E) crc32_pclmul(E) nvme_tcp(E) libahci(E) nvme(E) crc32c_intel(E) nvme_fabrics(E) cdc_ether(E) nvme_core(E) usbnet(E) mlx5_core(E) ghash_clmulni_intel(E) drm(E) libata(E) ccp(E) mii(E) t10_pi(E) mlxfw(E) sp5100_tco(E) psample(E) pci_hyperv_intf(E) wmi(E) dm_multipath(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E)
[ 547.862283] iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) fuse(E) [last unloaded: gsp_log(E)]
[ 547.962691] CR2: 0000000000000020
[ 547.966003] ---[ end trace 0000000000000000 ]---
[ 549.012012] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1370499158 wd_nsec: 1370498904
[ 549.043676] pstore: backend (erst) writing error (-28)
[ 549.050924] RIP: 0010:r535_gsp_msg_recv+0x87/0x230 [nvkm]
[ 549.056389] Code: 00 8b 70 08 48 89 e1 31 d2 4c 89 f7 e8 12 f5 ff ff 48 89 c5 48 85 c0 0f 84 cf 00 00 00 48 81 fd 00 f0 ff ff 0f 87 c4 00 00 00 <8b> 55 10 41 8b 46 30 85 d2 0f 85 f6 00 00 00 83 f8 04 76 10 ba 05
[ 549.075138] RSP: 0018:ffffabe440f87e10 EFLAGS: 00010203
[ 549.080361] RAX: 0000000000000010 RBX: 0000000000000008 RCX: 000000000000003f
[ 549.087484] RDX: 0000000000000000 RSI: ffffabe4480a8030 RDI: 0000000000000010
[ 549.094609] RBP: 0000000000000010 R08: 0000000000000000 R09: ffffabe440f87bb0
[ 549.101733] R10: ffffabe440f87dc8 R11: 0000000000000010 R12: 0000000000000000
[ 549.108857] R13: 0000000000000000 R14: ffff9351df1e5000 R15: 0000000000000000
[ 549.115982] FS: 0000000000000000(0000) GS:ffff93708eb00000(0000) knlGS:0000000000000000
[ 549.124061] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 549.129807] CR2: 0000000000000020 CR3: 00000003cc220002 CR4: 0000000000770ef0
[ 549.136940] PKRU: 55555554
[ 549.139653] Kernel panic - not syncing: Fatal exception
[ 549.145054] Kernel Offset: 0x18c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 549.165074] ---[ end Kernel panic - not syncing: Fatal exception ]---
Also, nvkm wrongly advances the read pointer when handling a two-page GSP
message in the rollback case. In the rollback case, the GSP message will
be copied in two rounds. When handling a two-page GSP message, nvkm first
copies amount of (GSP_PAGE_SIZE - header) data into the buffer, then
advances the read pointer by the result of DIV_ROUND_UP(size,
GSP_PAGE_SIZE). Thus, the read pointer is advanced by 1.
Next, nvkm copies the amount of (total size - (GSP_PAGE_SIZE -
header)) data into the buffer. The left amount of the data will be always
larger than one page since the message header is not taken into account
in the first copy. Thus, the read pointer is advanced by DIV_ROUND_UP(
size(larger than one page), GSP_PAGE_SIZE) = 2.
In the end, the read pointer is wrongly advanced by 3 when handling a
two-page GSP message in the rollback case.
Fix the problems by taking the total size of the message into account
when advancing the read pointer and calculate the read pointer in the end
of the all copies for the rollback case.
BTW: the two-page GSP message can be observed in the msgq when vGPU is
enabled.
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017071922.2518724-2-zhiw@nvidia.com
|
|
Pull drm updates from Dave Airlie:
"There's a lot of rework, the panic helper support is being added to
more drivers, v3d gets support for HW superpages, scheduler
documentation, drm client and video aperture reworks, some new
MAINTAINERS added, amdgpu has the usual lots of IP refactors, Intel
has some Pantherlake enablement and xe is getting some SRIOV bits, but
just lots of stuff everywhere.
core:
- split DSC helpers from DP helpers
- clang build fixes for drm/mm test
- drop simple pipeline support for gem vram
- document submission error signaling
- move drm_rect to drm core module from kms helper
- add default client setup to most drivers
- move to video aperture helpers instead of drm ones
tests:
- new framebuffer tests
ttm:
- remove swapped and pinned BOs from TTM lru
panic:
- fix uninit spinlock
- add ABGR2101010 support
bridge:
- add TI TDP158 support
- use standard PM OPS
dma-fence:
- use read_trylock instead of read_lock to help lockdep
scheduler:
- add errno to sched start to report different errors
- add locking to drm_sched_entity_modify_sched
- improve documentation
xe:
- add drm_line_printer
- lots of refactoring
- Enable Xe2 + PES disaggregation
- add new ARL PCI ID
- SRIOV development work
- fix exec unnecessary implicit fence
- define and parse OA sync props
- forcewake refactoring
i915:
- Enable BMG/LNL ultra joiner
- Enable 10bpx + CCS scanout on ICL+, fp16/CCS on TGL+
- use DSB for plane/color mgmt
- Arrow lake PCI IDs
- lots of i915/xe display refactoring
- enable PXP GuC autoteardown
- Pantherlake (PTL) Xe3 LPD display enablement
- Allow fastset HDR infoframe changes
- write DP source OUI for non-eDP sinks
- share PCI IDs between i915 and xe
amdgpu:
- SDMA queue reset support
- SMU 13.0.6, JPEG 4.0.3 updates
- Initial runtime repartitioning support
- rework IP structs for multiple IP instances
- Fetch EDID from _DDC if available
- SMU13 zero rpm user control
- lots of fixes/cleanups
amdkfd:
- Increase event FIFO size
- add topology cap flag for per queue reset
msm:
- DPU:
- SA8775P support
- (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support
- Enable large framebuffer support
- Drop MSM8998 and SDM845
- DP:
- SA8775P support
- GPU:
- a7xx preemption support
- Adreno A663 support
ast:
- warn about unsupported TX chips
ivpu:
- add coredump
- add pantherlake support
rockchip:
- 4K@60Hz display enablement
- generate pll programming tables
panthor:
- add timestamp query API
- add realtime group priority
- add fdinfo support
etnaviv:
- improve handling of DMA address limits
- improve GPU hangcheck
exynos:
- Decon Exynos7870 support
mediatek:
- add OF graph support
omap:
- locking fixes
bochs:
- convert to gem/shmem from simpledrm
v3d:
- support big/super pages
- add gemfs
vc4:
- BCM2712 support refactoring
- add YUV444 format support
udmabuf:
- folio related fixes
nouveau:
- add panic support on nv50+"
* tag 'drm-next-2024-11-21' of https://gitlab.freedesktop.org/drm/kernel: (1583 commits)
drm/xe/guc: Fix dereference before NULL check
drm/amd: Fix initialization mistake for NBIO 7.7.0
Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
drm/amd/display: Fix failure to read vram info due to static BP_RESULT
drm/amdgpu: enable GTT fallback handling for dGPUs only
drm/amd/amdgpu: limit single process inside MES
drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X
drm/amdgpu/mes12: correct kiq unmap latency
drm/amdgpu: Support vcn and jpeg error info parsing
drm/amd : Update MES API header file for v11 & v12
drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling
drm/amdkfd: change kfd process kref count at creation
drm/amdgpu: Cleanup shift coding style
drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data
drm/amdgpu: Implement virt req_ras_err_count
drm/amdgpu: VF Query RAS Caps from Host if supported
drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry
drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support
drm/amd/display: 3.2.309
drm/amd/display: Adjust VSDB parser for replay feature
...
|
|
eb284f4b3781 drm/nouveau/dp: Honor GSP link training retry timeouts
tried to fix a problem with panel retires, however it appears
the auxch also needs the same treatment, so add the same retry
wrapper around it.
This fixes some eDP panels after a suspend/resume cycle.
Fixes: eb284f4b3781 ("drm/nouveau/dp: Honor GSP link training retry timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111034126.2028401-2-airlied@gmail.com
|
|
The upper layer transfer functions expect EBUSY as a return
for when retries should be done.
Fix the AUX error translation, but also check for both errors
in a few places.
Fixes: eb284f4b3781 ("drm/nouveau/dp: Honor GSP link training retry timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241111034126.2028401-1-airlied@gmail.com
|
|
When this code moved to non-coherent allocator the sync was put too
early for some firmwares which called the setup function, move the
sync down after the setup function.
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Fixes: 9b340aeb26d5 ("nouveau/firmware: use dma non-coherent allocator")
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241114004603.3095485-1-airlied@gmail.com
|
|
When the call to gf100_grctx_generate() fails, unlock gr->fecs.mutex
before returning the error.
Fixes smatch warning:
drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:480 gf100_gr_chan_new() warn: inconsistent returns '&gr->fecs.mutex'.
Fixes: ca081fff6ecc ("drm/nouveau/gr/gf100-: generate golden context during first object alloc")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241026173844.2392679-1-lihuafei1@huawei.com
|
|
The goal is to clean-up Linux repository from AUX file names, because
the use of such file names is prohibited on other operating systems
such as Windows, so the Linux repository cannot be cloned and
edited on them.
Signed-off-by: Benjamin Szőke <egyszeregy@freemail.hu>
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240603091558.35672-1-egyszeregy@freemail.hu
|
|
Get drm-misc-next to up v6.12-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Pull drm updates from Dave Airlie:
"This adds a couple of patches outside the drm core, all should be
acked appropriately, the string and pstore ones are the main ones that
come to mind.
Otherwise it's the usual drivers, xe is getting enabled by default on
some new hardware, we've changed the device number handling to allow
more devices, and we added some optional rust code to create QR codes
in the panic handler, an idea first suggested I think 10 years ago :-)
string:
- add mem_is_zero()
core:
- support more device numbers
- use XArray for minor ids
- add backlight constants
- Split dma fence array creation into alloc and arm
fbdev:
- remove usage of old fbdev hooks
kms:
- Add might_fault() to drm_modeset_lock priming
- Add dynamic per-crtc vblank configuration support
dma-buf:
- docs cleanup
buddy:
- Add start address support for trim function
printk:
- pass description to kmsg_dump
scheduler:
- Remove full_recover from drm_sched_start
ttm:
- Make LRU walk restartable after dropping locks
- Allow direct reclaim to allocate local memory
panic:
- add display QR code (in rust)
displayport:
- mst: GUID improvements
bridge:
- Silence error message on -EPROBE_DEFER
- analogix: Clean aup
- bridge-connector: Fix double free
- lt6505: Disable interrupt when powered off
- tc358767: Make default DP port preemphasis configurable
- lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR
- anx7625: simplify OF array handling
- dw-hdmi: simplify clock handling
- lontium-lt8912b: fix mode validation
- nwl-dsi: fix mode vsync/hsync polarity
xe:
- Enable LunarLake and Battlemage support
- Introducing Xe2 ccs modifiers for integrated and discrete graphics
- rename xe perf to xe observation
- use wb caching on DGFX for system memory
- add fence timeouts
- Lunar Lake graphics/media/display workarounds
- Battlemage workarounds
- Battlemage GSC support
- GSC and HuC fw updates for LL/BM
- use dma_fence_chain_free
- refactor hw engine lookup and mmio access
- enable priority mem read for Xe2
- Add first GuC BMG fw
- fix dma-resv lock
- Fix DGFX display suspend/resume
- Use xe_managed for kernel BOs
- Use reserved copy engine for user binds on faulting devices
- Allow mixing dma-fence jobs and long-running faulting jobs
- fix media TLB invalidation
- fix rpm in TTM swapout path
- track resources and VF state by PF
i915:
- Type-C programming fix for MTL+
- FBC cleanup
- Calc vblank delay more accurately
- On DP MST, Enable LT fallback for UHBR<->non-UHBR rates
- Fix DP LTTPR detection
- limit relocations to INT_MAX
- fix long hangs in buddy allocator on DG2/A380
amdgpu:
- Per-queue reset support
- SDMA devcoredump support
- DCN 4.0.1 updates
- GFX12/VCN4/JPEG4 updates
- Convert vbios embedded EDID to drm_edid
- GFX9.3/9.4 devcoredump support
- process isolation framework for GFX 9.4.3/4
- take IOMMU mappings into account for P2P DMA
amdkfd:
- CRIU fixes
- HMM fix
- Enable process isolation support for GFX 9.4.3/4
- Allow users to target recommended SDMA engines
- KFD support for targetting queues on recommended SDMA engines
radeon:
- remove .load and drm_dev_alloc
- Fix vbios embedded EDID size handling
- Convert vbios embedded EDID to drm_edid
- Use GEM references instead of TTM
- r100 cp init cleanup
- Fix potential overflows in evergreen CS offset tracking
msm:
- DPU:
- implement DP/PHY mapping on SC8180X
- Enable writeback on SM8150, SC8180X, SM6125, SM6350
- DP:
- Enable widebus on all relevant chipsets
- MSM8998 HDMI support
- GPU:
- A642L speedbin support
- A615/A306/A621 support
- A7xx devcoredump support
ast:
- astdp: Support AST2600 with VGA
- Clean up HPD
- Fix timeout loop for DP link training
- reorganize output code by type (VGA, DP, etc)
- convert to struct drm_edid
- fix BMC handling for all outputs
exynos:
- drop stale MAINTAINERS pattern
- constify struct
loongson:
- use GEM refcount over TTM
mgag200:
- Improve BMC handling
- Support VBLANK intterupts
- transparently support BMC outputs
nouveau:
- Refactor and clean up internals
- Use GEM refcount over TTM's
gm12u320:
- convert to struct drm_edid
gma500:
- update i2c terms
lcdif:
- pixel clock fix
host1x:
- fix syncpoint IRQ during resume
- use iommu_paging_domain_alloc()
imx:
- ipuv3: convert to struct drm_edid
omapdrm:
- improve error handling
- use common helper for_each_endpoint_of_node()
panel:
- add support for BOE TV101WUM-LL2 plus DT bindings
- novatek-nt35950: improve error handling
- nv3051d: improve error handling
- panel-edp:
- add support for BOE NE140WUM-N6G
- revert support for SDC ATNA45AF01
- visionox-vtdr6130:
- improve error handling
- use devm_regulator_bulk_get_const()
- boe-th101mb31ig002:
- Support for starry-er88577 MIPI-DSI panel plus DT
- Fix porch parameter
- edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE
NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2,
CMN N116BCP-EA2, CSW MNB601LS1-4
- himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT
- ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT
- jd9365da:
- Support Melfas lmfbx101117480 MIPI-DSI panel plus DT
- Refactor for code sharing
- panel-edp: fix name for HKC MB116AN01
- jd9365da: fix "exit sleep" commands
- jdi-fhd-r63452: simplify error handling with DSI multi-style
helpers
- mantix-mlaf057we51: simplify error handling with DSI multi-style
helpers
- simple:
- support Innolux G070ACE-LH3 plus DT bindings
- support On Tat Industrial Company KD50G21-40NT-A1 plus DT
bindings
- st7701:
- decouple DSI and DRM code
- add SPI support
- support Anbernic RG28XX plus DT bindings
mediatek:
- support alpha blending
- remove cl in struct cmdq_pkt
- ovl adaptor fix
- add power domain binding for mediatek DPI controller
renesas:
- rz-du: add support for RZ/G2UL plus DT bindings
rockchip:
- Improve DP sink-capability reporting
- dw_hdmi: Support 4k@60Hz
- vop:
- Support RGB display on Rockchip RK3066
- Support 4096px width
sti:
- convert to struct drm_edid
stm:
- Avoid UAF wih managed plane and CRTC helpers
- Fix module owner
- Fix error handling in probe
- Depend on COMMON_CLK
- ltdc:
- Fix transparency after disabling plane
- Remove unused interrupt
tegra:
- gr3d: improve PM domain handling
- convert to struct drm_edid
- Call drm_atomic_helper_shutdown()
vc4:
- fix PM during detect
- replace DRM_ERROR() with drm_error()
- v3d: simplify clock retrieval
v3d:
- Clean up perfmon
virtio:
- add DRM capset"
* tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel: (1326 commits)
drm/xe: Fix missing conversion to xe_display_pm_runtime_resume
drm/xe/xe2hpg: Add Wa_15016589081
drm/xe: Don't keep stale pointer to bo->ggtt_node
drm/xe: fix missing 'xe_vm_put'
drm/xe: fix build warning with CONFIG_PM=n
drm/xe: Suppress missing outer rpm protection warning
drm/xe: prevent potential UAF in pf_provision_vf_ggtt()
drm/amd/display: Add all planes on CRTC to state for overlay cursor
drm/i915/bios: fix printk format width
drm/i915/display: Fix BMG CCS modifiers
drm/amdgpu: get rid of bogus includes of fdtable.h
drm/amdkfd: CRIU fixes
drm/amdgpu: fix a race in kfd_mem_export_dmabuf()
drm: new helper: drm_gem_prime_handle_to_dmabuf()
drm/amdgpu/atomfirmware: Silence UBSAN warning
drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare'
drm/amd/amdgpu: apply command submission parser for JPEG v1
drm/amd/amdgpu: apply command submission parser for JPEG v2+
drm/amd/pm: fix the pp_dpm_pcie issue on smu v14.0.2/3
drm/amd/pm: update the features set on smu v14.0.2/3
...
|
|
Backmerging to get fixes from v6.12-rc7.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Thomas needs 5a498d4d06d6 ("drm/fbdev-dma: Only install deferred I/O
if necessary") in drm-misc, so start the backmerge cascade.
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
|
|
init() was removed from ramgp102 when reworking the memory detection, as
it was thought that the code was only necessary when the driver performs
mclk changes, which nouveau doesn't support on pascal.
However, it turns out that we still need to execute this on some GPUs to
restore settings after DEVINIT, so revert to the original behaviour.
v2: fix tags in commit message, cc stable
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/319
Fixes: 2c0c15a22fa0 ("drm/nouveau/fb/gp102-ga100: switch to simpler vram size detection method")
Cc: stable@vger.kernel.org # 6.6+
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240904232418.8590-1-bskeggs@nvidia.com
|