Age | Commit message (Collapse) | Author |
|
Use a literal block for the EXPORT_SYMBOL_GPL_FOR_MODULES() example to
avoid a Docutils warning about unmatched '*'. This ensures correct rendering
and keeps the source readable.
Warning:
Documentation/core-api/symbol-namespaces.rst:90: WARNING: Inline emphasis start-string without end-string. [docutils]
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Since commit 7273ad2b08f8 ("kbuild: link lib-y objects to vmlinux
forcibly when CONFIG_MODULES=y"), all objects from lib-y have been
forcibly linked to vmlinux when CONFIG_MODULES=y.
To simplify future changes, this commit makes all objects from lib-y
be linked regardless of the CONFIG_MODULES setting.
Most use cases (CONFIG_MODULES=y) are not affected by this change.
The vmlinux size with ARCH=arm allnoconfig, where CONFIG_MODULES=n,
increases as follows:
text data bss dec hex filename
1368644 835104 206288 2410036 24c634 vmlinux.before
1379440 837064 206288 2422792 24f808 vmlinux.after
We no longer benefit from using static libraries, but the impact is
mitigated by supporting CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
For example, the size of vmlinux remains almost the same with ARCH=arm
tinyconfig, where CONFIG_MODULES=n and
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.
text data bss dec hex filename
455316 93404 15472 564192 89be0 vmlinux.before
455312 93404 15472 564188 89bdc vmlinux.after
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This CONFIG option, if supported by the architecture, helps reduce the
size of vmlinux.
For example, the size of vmlinux with ARCH=arm tinyconfig decreases as
follows:
text data bss dec hex filename
631684 104500 18176 754360 b82b8 vmlinux.before
455316 93404 15472 564192 89be0 vmlinux.after
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Fix three issues introduced into device suspend/resume error paths in
the PM core by some of the recent updates.
First off, replace list_splice() with list_splice_init() in three
places in device suspend error paths to avoid attempting to use an
uninitialized list head going forward.
Second, rearrange device_resume() to avoid leaking the
power.is_suspended device PM flag to the next system suspend/resume
cycle where it can confuse rolling back after an error or early
wakeup.
Finally, add synchronization to dpm_async_resume_children() to avoid
resetting the async state mistakenly for devices whose resume
callbacks have already been queued up for asynchronous execution in
the given device resume phase, which fortunately can happen only if
the preceding system suspend transition has been aborted"
* tag 'pm-6.16-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Add locking to dpm_async_resume_children()
PM: sleep: Fix power.is_suspended cleanup for direct-complete devices
PM: sleep: Fix list splicing in device suspend error paths
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN, wireless, Bluetooth, and Netfilter.
Current release - regressions:
- Revert "kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in
all_tests", makes kunit error out if compiler is old
- wifi: iwlwifi: mvm: fix assert on suspend
- rxrpc: fix return from none_validate_challenge()
Current release - new code bugs:
- ovpn: couple of fixes for socket cleanup and UDP-tunnel teardown
- can: kvaser_pciefd: refine error prone echo_skb_max handling logic
- fix net_devmem_bind_dmabuf() stub when DEVMEM not compiled
- eth: airoha: fixes for config / accel in bridge mode
Previous releases - regressions:
- Bluetooth: hci_qca: move the SoC type check to the right place, fix
GPIO integration
- prevent a NULL deref in rtnl_create_link() after locking changes
- fix udp gso skb_segment after pull from frag_list
- hv_netvsc: fix potential deadlock in netvsc_vf_setxdp()
Previous releases - always broken:
- netfilter:
- nf_nat: also check reverse tuple to obtain clashing entry
- nf_set_pipapo_avx2: fix initial map fill (zeroing)
- fix the helper for incremental update of packet checksums after
modifying the IP address, used by ILA and BPF
- eth:
- stmmac: prevent div by 0 when clock rate is misconfigured
- ice: fix Tx scheduler handling of XDP and changing queue count
- eth: fix support for the RGMII interface when delays configured"
* tag 'net-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
calipso: unlock rcu before returning -EAFNOSUPPORT
seg6: Fix validation of nexthop addresses
net: prevent a NULL deref in rtnl_create_link()
net: annotate data-races around cleanup_net_task
selftests: drv-net: tso: make bkg() wait for socat to quit
selftests: drv-net: tso: fix the GRE device name
selftests: drv-net: add configs for the TSO test
wireguard: device: enable threaded NAPI
netlink: specs: rt-link: decode ip6gre
netlink: specs: rt-link: add missing byte-order properties
net: wwan: mhi_wwan_mbim: use correct mux_id for multiplexing
wifi: cfg80211/mac80211: correctly parse S1G beacon optional elements
net: dsa: b53: do not touch DLL_IQQD on bcm53115
net: dsa: b53: allow RGMII for bcm63xx RGMII ports
net: dsa: b53: do not configure bcm63xx's IMP port interface
net: dsa: b53: do not enable RGMII delay on bcm63xx
net: dsa: b53: do not enable EEE on bcm63xx
net: ti: icssg-prueth: Fix swapped TX stats for MII interfaces.
selftests: netfilter: nft_nat.sh: add test for reverse clash with nat
netfilter: nf_nat: also check reverse tuple to obtain clashing entry
...
|
|
This reports as a warning in linux-next along the lines of
Documentation/arch/riscv/cmodx.rst:14: WARNING: Title underline too short.
CMODX in the Kernel Space
--------------------- [docutils]
Documentation/arch/riscv/cmodx.rst:43: WARNING: Title underline too short.
CMODX in the User Space
--------------------- [docutils]
Documentation/arch/riscv/cmodx.rst:43: WARNING: Title underline too short.
CMODX in the User Space
--------------------- [docutils]
Link: https://lore.kernel.org/all/20250603154544.1602a8b5@canb.auug.org.au/
Fixes: 0e07200b2af6 ("riscv: Documentation: add a description about dynamic ftrace")
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250603172856.49925-1-palmer@dabbelt.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next
riscv patches for 6.16-rc1
* Implement atomic patching support for ftrace which finally allows to
get rid of stop_machine().
* Support for kexec_file_load() syscall
* Improve module loading time by changing the algorithm that counts the
number of plt/got entries in a module.
* Zicbop is now used in the kernel to prefetch instructions
[Palmer: There's been two rounds of surgery on this one, so as a result
it's a bit different than the PR.]
* alex-pr: (734 commits)
riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
MAINTAINERS: Update Atish's email address
riscv: hwprobe: export Zabha extension
riscv: Make regs_irqs_disabled() more clear
perf symbols: Ignore mapping symbols on riscv
RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
riscv: module: Optimize PLT/GOT entry counting
riscv: Add support for PUD THP
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
riscv: Introduce Zicbop instructions
riscv/kexec_file: Fix comment in purgatory relocator
riscv: kexec_file: Support loading Image binary file
riscv: kexec_file: Split the loading of kernel and others
riscv: Documentation: add a description about dynamic ftrace
riscv: ftrace: support direct call using call_ops
riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
riscv: ftrace: support PREEMPT
riscv: add a data fence for CMODX in the kernel mode
...
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Fix a couple of spelling issues plus some minor details on the grammar.
Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Link: https://lore.kernel.org/r/20250501130309.14803-1-mikisabate@gmail.com
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
My personal upstream email account was previously based on gmail which
has become difficult to manage upstream activities lately.
Update it to the more reliable linux.dev account.
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250505-update_email_address-v1-1-1c24db506fdb@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Alexandre Ghiti <alexghiti@rivosinc.com> says:
I found this lost series developed by Guo so here is a respin with the
comments on v2 applied.
This patch series adds Zicbop support and then enables the Linux
prefetch features.
* patches from https://lore.kernel.org/r/20250421142441.395849-1-alexghiti@rivosinc.com:
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
riscv: Introduce Zicbop instructions
Link: https://lore.kernel.org/r/20250421142441.395849-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
I am volunteering to maintain the kernel's crypto library code.
[ And Jason and Ard piped up too - Linus ]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Johannes Berg:
"The only really new thing is the long-standing seccomp work
(originally from 2021!). Wven if it still isn't enabled by default due
to security concerns it can still be used e.g. for tests.
- remove obsolete network transports
- remove PCI IO port support
- start adding seccomp-based process handling instead of ptrace"
* tag 'uml-for-linux-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (29 commits)
um: remove "extern" from implementation of sigchld_handler
um: fix unused variable warning
um: fix SECCOMP 32bit xstate register restore
um: pass FD for memory operations when needed
um: Add SECCOMP support detection and initialization
um: Implement kernel side of SECCOMP based process handling
um: Track userspace children dying in SECCOMP mode
um: Add helper functions to get/set state for SECCOMP
um: Add stub side of SECCOMP/futex based process handling
um: Move faultinfo extraction into userspace routine
um: vector: Use mac_pton() for MAC address parsing
um: vector: Clean up and modernize log messages
um: chan_kern: use raw spinlock for irqs_to_free_lock
MAINTAINERS: remove obsolete file entry in TUN/TAP DRIVER
um: Fix tgkill compile error on old host OSes
um: stop using PCI port I/O
um: Remove legacy network transport infrastructure
um: vector: Eliminate the dependency on uml_net
um: Remove obsolete legacy network transports
um/asm: Replace "REP; NOP" with PAUSE mnemonic
...
|
|
Doing misaligned access to userspace memory would make a trap on
platform where it is emulated. Latest fixes removed the kernel
capability to do unaligned accesses to userspace memory safely since
interrupts are kept disabled at all time during that. Thus doing so
would crash the kernel.
Such behavior was detected with GET_UNALIGN_CTL() that was doing
a put_user() with an unsigned long* address that should have been an
unsigned int*. Reenabling kernel misaligned access emulation is a bit
risky and it would also degrade performances. Rather than doing that,
we will try to avoid any misaligned accessed by using copy_from/to_user()
which does not do any misaligned accesses. This can be done only for
!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and thus allows to only generate
a bit more code for this config.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"We've got a couple of build fixes when using LLD, a missing TLB
invalidation and a workaround for broken firmware on SoCs with CPUs
that implement MPAM:
- Disable problematic linker assertions for broken versions of LLD
- Work around sporadic link failure with LLD and various randconfig
builds
- Fix missing invalidation in the TLB batching code when reclaim
races with mprotect() and friends
- Add a command-line override for MPAM to allow booting on systems
with broken firmware"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Add override for MPAM
arm64/mm: Close theoretical race where stale TLB entry remains valid
arm64: Work around convergence issue with LLD linker
arm64: Disable LLD linker ASSERT()s for the time being
|
|
The specification of prctl() for GET_UNALIGN_CTL states that the value is
returned in an unsigned int * address passed as an unsigned long. Change
the type to match that and avoid an unaligned access as well.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The current implementation is underperforming and in addition, it
triggers misaligned access traps on platforms which do not handle
misaligned accesses in hardware.
Use the existing assembly routines to solve both problems at once.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Pull ARM fixes from Russell King:
- Fix arch_memremap_can_ram_remap() which incorrectly passed a PFN to
memblock_is_map_memory rather than the actual address.
- Disallow kernel mode NEON when IRQs are disabled
Explanation:
"To avoid having to preserve/restore kernel mode NEON state when
such a softirq is taken softirqs are now disabled when using the
NEON from task context."
should explain that it's nested kernel mode.
In other words, softirqs from user mode are fine, because the context
will be preserved. softirqs from kernel mode may be from a context
that has already saved the user NEON state, and thus we would need to
preserve the NEON state for the parent kernel mode context, and this
we don't allow.
The problem occurs when the kernel context disables hard IRQs, and
then uses NEON. When it's finished, and restores the userspace NEON
state, we call local_bh_enable() with hard IRQs disabled, which
causes a warning.
This commit addresses that by disallowing the use of NEON with hard
IRQs disabled.
https://lore.kernel.org/all/20250516231858.27899-4-ebiggers@kernel.org/T/#m104841b6e9346b1814c8b0fb9f2340551b0cd3e8
has some further context
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9446/1: Disallow kernel mode NEON when IRQs are disabled
ARM: 9447/1: arm/memremap: fix arch_memremap_can_ram_remap()
|
|
Export Zabha through the hwprobe syscall.
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20250421141413.394444-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The return value of regs_irqs_disabled() is true or false, so change
its type to reflect that and also make it always inline.
Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250422113156.25742-1-yangtiezhu@loongson.cn
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
RISCV ELF use mapping symbols with special names $x, $d to
identify regions of RISCV code or code with different ISAs[1].
These symbols don't identify functions, so will confuse the
perf output.
The patch filters out these symbols at load time, similar to
"4886f2ca perf symbols: Ignore mapping symbols on aarch64".
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/
master/riscv-elf.adoc#mapping-symbol
Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250409025202.201046-1-haibo1.xu@intel.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Björn Töpel <bjorn@kernel.org> says:
From: Björn Töpel <bjorn@rivosinc.com>
Hi!
For over a year ago, Daniel and I was testing the V2 of Song's series.
I also promised to take the V2, that had been sitting on the lists for
too long, to rebase it on a new kernel, and re-test it.
One year later, here's the V3! ;-)
There are no changes from V2 other, than some simple checkpatch
cleanups.
Song's original cover:
| This series makes the kexec_file_load() syscall support to load
| Image binary file. At the same time, corresponding support for
| kexec-tools had been pushed to my repo[2].
|
| Now, we can leverage that kexec-tools and this series to use the
| kexec_load() or kexec_file_load() syscall to boot both vmlinux and
| Image file, as seen in these combo tests:
|
| ```
| 1. kexec -l vmlinux
| 2. kexec -l Image
| 3. kexec -s -l vmlinux
| 4. kexec -s -l Image
| ```
Notably, kexec-tools has still not made it upstream. I've prepared a
branch on my GH [3], that I indend to post ASAP. That branch is a
collection of fixes/features, including Song's userland Image loading.
The V2 is here [2], and V1 [1].
I've tested the kexec-file/Image on qemu-rv64, with following
combinations:
* ACPI/UEFI
* DT/UEFI
* DT
both "regular" kexec (-s + -e), and crashkernels (-p).
Note that there are two purgatory patches that has to be present (part
of -rc1, so all good):
commit 28093cfef5dd ("riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator")
commit 3f7023171df4 ("riscv/purgatory: 4B align purgatory_start")
[1] https://lore.kernel.org/linux-riscv/20230914020044.1397356-1-songshuaishuai@tinylab.org/
[2] https://lore.kernel.org/linux-riscv/20231016092006.3347632-1-songshuaishuai@tinylab.org/
[3] https://github.com/bjoto/kexec-tools/tree/rv-on-master
* patches from https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org:
riscv: kexec_file: Support loading Image binary file
riscv: kexec_file: Split the loading of kernel and others
Link: https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
It is the built-in command line appended to the bootloader command line,
not the bootloader command line appended to the built-in command line.
Fixes: 3aed8c43267e ("RISC-V: Update Kconfig to better handle CMDLINE")
Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com>
Link: https://lore.kernel.org/r/tencent_A93C7FB46BFD20054AD2FEF4645913FF550A@qq.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
perf reports that 99.63% of the cycles from `modprobe amdgpu` are spent
inside module_frob_arch_sections(). This is because amdgpu.ko contains
about 300000 relocations in its .rela.text section, and the algorithm in
count_max_entries() takes quadratic time.
Apply two optimizations from the arm64 code, which together reduce the
total execution time by 99.58%. First, sort the relocations so duplicate
entries are adjacent. Second, reduce the number of relocations that must
be sorted by filtering to only relocations that need PLT/GOT entries, as
done in commit d4e0340919fb ("arm64/module: Optimize module load time by
optimizing PLT counting").
Unlike the arm64 code, here the filtering and sorting is done in a
scratch buffer, because the HI20 relocation search optimization in
apply_relocate_add() depends on the original order of the relocations.
This allows accumulating PLT/GOT relocations across sections so sorting
and counting is only done once per module.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250409171526.862481-3-samuel.holland@sifive.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Andy Chiu <andybnac@gmail.com> says:
This series makes atomic code patching in ftrace possible and eliminates
the need of the stop_machine dance. The major difference of this version
is that we merge the CALL_OPS support from Puranjay [1] and make direct
calls available for practical uses such as BPF. Thanks for the time
reviewing the series and suggestions, we hope this version gets a step
closer to happening in the upstream.
Please reference the link to v3 below for more introductory view of the
implementation [2]
Added patch: 2, 4, 10, 11, 12
Modified patch: 5, 6
Unchanged patch: 1, 3, 7, 8, 9
(1, 8 has commit msg modified)
Special thanks to Björn for his efforts on testing and guiding the
series!
[1]: https://lore.kernel.org/lkml/20240306165904.108141-1-puranjay12@gmail.com/
[2]: https://lore.kernel.org/linux-riscv/20241127172908.17149-1-andybnac@gmail.com/
* patches from https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com:
riscv: Documentation: add a description about dynamic ftrace
riscv: ftrace: support direct call using call_ops
riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
riscv: ftrace: support PREEMPT
riscv: add a data fence for CMODX in the kernel mode
riscv: vector: Support calling schedule() for preemptible Vector
riscv: ftrace: do not use stop_machine to update code
riscv: ftrace: prepare ftrace for atomic code patching
kernel: ftrace: export ftrace_sync_ipi
riscv: ftrace: align patchable functions to 4 Byte boundary
riscv: ftrace factor out code defined by !WITH_ARG
riscv: ftrace: support fastcc in Clang for WITH_ARGS
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
Add the necessary page table functions to deal with PUD THP, this
enables the use of PUD pfnmap.
Link: https://lore.kernel.org/r/20250321123954.225097-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch makes use of prefetch.w to prefetch cachelines for write
prior to lr/sc loops when using the xchg_small atomic routine.
This patch is inspired by commit 0ea366f5e1b6 ("arm64: atomics:
prefetch the destination word for write prior to stxr").
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-4-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-5-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Enable Linux prefetch and prefetchw primitives using Zicbop.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-3-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-4-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Zicbop introduces cache blocks prefetching instructions, add the
necessary support for the kernel to use it in the coming commits.
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-3-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The S-type instructions are first introduced and then used to define the
encoding of the Zicbop prefetching instructions.
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-2-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Apparently sec_base doesn't mean relocated symbol value, which seems a
copy-pasting error in the comment. Assigned with the address of section
indexed by sym->st_shndx, it should represent base address of the
relevant section. Let's fix the comment to avoid possible confusion.
Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file")
Signed-off-by: Yao Zi <ziyao@disroot.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250326073450.57648-2-ziyao@disroot.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This patch creates image_kexec_ops to load Image binary file
for kexec_file_load() syscall.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-3-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This is the preparative patch for kexec_file_load Image support.
It separates the elf_kexec_load() as two parts:
- the first part loads the vmlinux (or Image)
- the second part loads other segments (e.g. initrd,fdt,purgatory)
And the second part is exported as the load_extra_segments() function
which would be used in both kexec-elf.c and kexec-image.c.
No functional change intended.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-2-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Add a section in cmodx to describe how dynamic ftrace works on riscv,
limitations, and assumptions.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-12-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
jump to FTRACE_ADDR if distance is out of reach
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The idea and most of the implementation is taken from the ARM64's
implementation of the same feature. The idea is to place a pointer to
the ftrace_ops as a literal at a fixed offset from the function entry
point, which can be recovered by the common ftrace trampoline.
We use -fpatchable-function-entry to reserve 8 bytes above the function
entry by emitting 2 4 byte or 4 2 byte nops depending on the presence of
CONFIG_RISCV_ISA_C. These 8 bytes are patched at runtime with a pointer
to the associated ftrace_ops for that callsite. Functions are aligned to
8 bytes to make sure that the accesses to this literal are atomic.
This approach allows for directly invoking ftrace_ops::func even for
ftrace_ops which are dynamically-allocated (or part of a module),
without going via ftrace_ops_list_func.
We've benchamrked this with the ftrace_ops sample module on Spacemit K1
Jupiter:
Without this patch:
baseline (Linux rivos 6.14.0-09584-g7d06015d936c #3 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
| Number of tracers | Total time (ns) | Per-call average time |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
| 0 | 0 | 1357958 | 13 | - |
| 0 | 1 | 1302375 | 13 | - |
| 0 | 2 | 1302375 | 13 | - |
| 0 | 10 | 1379084 | 13 | - |
| 0 | 100 | 1302458 | 13 | - |
| 0 | 200 | 1302333 | 13 | - |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 13677833 | 136 | 123 |
| 1 | 1 | 18500916 | 185 | 172 |
| 1 | 2 | 22856459 | 228 | 215 |
| 1 | 10 | 58824709 | 588 | 575 |
| 1 | 100 | 505141584 | 5051 | 5038 |
| 1 | 200 | 1580473126 | 15804 | 15791 |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 13561000 | 135 | 122 |
| 2 | 0 | 19707292 | 197 | 184 |
| 10 | 0 | 67774750 | 677 | 664 |
| 100 | 0 | 714123125 | 7141 | 7128 |
| 200 | 0 | 1918065668 | 19180 | 19167 |
+----------+------------+-----------------+------------+---------------+
Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.
With this patch:
v4-rc4 (Linux rivos 6.14.0-09598-gd75747611c93 #4 SMP Sat Mar 29
+-----------------------+-----------------+----------------------------+
| Number of tracers | Total time (ns) | Per-call average time |
|-----------------------+-----------------+----------------------------|
| Relevant | Irrelevant | 100000 calls | Total (ns) | Overhead (ns) |
|----------+------------+-----------------+------------+---------------|
| 0 | 0 | 1459917 | 14 | - |
| 0 | 1 | 1408000 | 14 | - |
| 0 | 2 | 1383792 | 13 | - |
| 0 | 10 | 1430709 | 14 | - |
| 0 | 100 | 1383791 | 13 | - |
| 0 | 200 | 1383750 | 13 | - |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 5238041 | 52 | 38 |
| 1 | 1 | 5228542 | 52 | 38 |
| 1 | 2 | 5325917 | 53 | 40 |
| 1 | 10 | 5299667 | 52 | 38 |
| 1 | 100 | 5245250 | 52 | 39 |
| 1 | 200 | 5238459 | 52 | 39 |
|----------+------------+-----------------+------------+---------------|
| 1 | 0 | 5239083 | 52 | 38 |
| 2 | 0 | 19449417 | 194 | 181 |
| 10 | 0 | 67718584 | 677 | 663 |
| 100 | 0 | 709840708 | 7098 | 7085 |
| 200 | 0 | 2203580626 | 22035 | 22022 |
+----------+------------+-----------------+------------+---------------+
Note: per-call overhead is estimated relative to the baseline case with
0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/3 of the
overhead prior to this series (from 122ns to 38ns). This is largely
due to permitting calls to dynamically-allocated ftrace_ops without
going through ftrace_ops_list_func.
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
[update kconfig, asm, refactor]
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-10-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Now, we can safely enable dynamic ftrace with kernel preemption.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-9-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
RISC-V spec explicitly calls out that a local fence.i is not enough for
the code modification to be visble from a remote hart. In fact, it
states:
To make a store to instruction memory visible to all RISC-V harts, the
writing hart also has to execute a data FENCE before requesting that all
remote RISC-V harts execute a FENCE.I.
Although current riscv drivers for IPI use ordered MMIO when sending IPIs
in order to synchronize the action between previous csd writes, riscv
does not restrict itself to any particular flavor of IPI. Any driver or
firmware implementation that does not order data writes before the IPI
may pose a risk for code-modifying race.
Thus, add a fence here to order data writes before making the IPI.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-8-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Each function entry implies a call to ftrace infrastructure. And it may
call into schedule in some cases. So, it is possible for preemptible
kernel-mode Vector to implicitly call into schedule. Since all V-regs
are caller-saved, it is possible to drop all V context when a thread
voluntarily call schedule(). Besides, we currently don't pass argument
through vector register, so we don't have to save/restore V-regs in
ftrace trampoline.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-7-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Now it is safe to remove dependency from stop_machine() for us to patch
code in ftrace.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-6-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
instruction fetch can break down to 4 byte at a time, it is impossible
to update two instructions without a race. In order to mitigate it, we
initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
patching can change NOP4 to JALR to eable/disable ftrcae from a
function. This limits the reach of each ftrace entry to +-2KB displacing
from ftrace_caller.
Starting from the trampoline, we add a level of indirection for it to
reach ftrace caller target. Now, it loads the target address from a
memory location, then perform the jump. This enable the kernel to update
the target atomically.
The new don't-stop-the-world text patching on change only one RISC-V
instruction:
| -8: &ftrace_ops of the associated tracer function.
| <ftrace enable>:
| 0: auipc t0, hi(ftrace_caller)
| 4: jalr t0, lo(ftrace_caller)
|
| -8: &ftrace_nop_ops
| <ftrace disable>:
| 0: auipc t0, hi(ftrace_caller)
| 4: nop
This means that f+0x0 is fixed, and should not be claimed by ftrace,
e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
offset and MCOUNT_INSN_SIZE accordingly.
[ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ]
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The following ftrace patch for riscv uses a data store to update ftrace
function. Therefore, a romote fence is required to order it against
function_trace_op updates. The mechanism is similar to the fence between
function_trace_op and update_ftrace_func in the generic ftrace, so we
leverage the same ftrace_sync_ipi function.
[ alex: Fix build warning when !CONFIG_DYNAMIC_FTRACE ]
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-4-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
We are changing ftrace code patching in order to remove dependency from
stop_machine() and enable kernel preemption. This requires us to align
functions entry at a 4-B align address.
However, -falign-functions on older versions of GCC alone was not strong
enoungh to align all functions. In fact, cold functions are not aligned
after turning on optimizations. We consider this is a bug in GCC and
turn off guess-branch-probility as a workaround to align all functions.
GCC bug id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
The option -fmin-function-alignment is able to align all functions
properly on newer versions of gcc. So, we add a cc-option to test if
the toolchain supports it.
Suggested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-3-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
DYNAMIC_FTRACE selects DYNAMIC_FTRACE_WITH_ARGS and mcount-dyn.S in
riscv, so we can remove ifdef jargons of WITH_ARG when it is known that
DYNAMIC_FTRACE is true.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-2-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.
- [1]: https://reviews.llvm.org/D68559
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS")
Acked-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
There is unmatched xe_vm_unlock() in the __xe_exec_queue_init().
Leftover from commit fbeaad071a98 ("drm/xe: Create LRC BO without VM")
Fixes: 2b0a0ce0c20b ("drm/xe: Create LRC BO without VM")
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://lore.kernel.org/r/20250530135627.2821612-1-maciej.patelczyk@intel.com
(cherry picked from commit 28b996ce73982a44fa86736ca0e3684cb1ae8b24)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Specifying VM during lrc->bo creation requires VM's reference
to be held for the lifetime of lrc->bo as it will use VM's dma
reservation object. Using VM's dma reservation object for
lrc->bo doesn't provide any advantage. Hence do not pass VM
while creating lrc->bo.
v2: Use xe_bo_unpin_map_no_vm (Matthew Brost)
Fixes: 264eecdba211 ("drm/xe: Decouple xe_exec_queue and xe_lrc")
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit fbeaad071a98fef87deccee81d564de1c8e8e16d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Daniele noticed that the fix in commit 2d2be279f1ca ("drm/xe: fix UAF
around queue destruction") looks to have been unintentionally removed as
part of handling a conflict in some past merge commit. Add it back.
Fixes: ac44ff7cec33 ("Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes")
Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250603174213.1543579-2-matthew.auld@intel.com
(cherry picked from commit 9d9fca62dc49d96f97045b6d8e7402a95f8cf92a)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
(cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
The define of the extension type was accidentally used instead of the
one of the property itself. They're both zero, so no functional issue,
but we should use the correct define for code correctness.
Fixes: 41a97c4a1294 ("drm/xe/pxp/uapi: Add API to mark a BO as using PXP")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250522225401.3953243-6-daniele.ceraolospurio@intel.com
(cherry picked from commit 1d891ee820fd0fbb4101eacb0d922b5050a24933)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=0, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 1:0x1, gt=1, width=1, guc_id=1, guc_state=0x0, flags=0x3
xe_exec_queue_stop: dev=0000:00:02.0, 4:0x1, gt=1, width=1, guc_id=2, guc_state=0x0, flags=0x3
xe_exec_queue_resubmit: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
xe_sched_job_run: dev=0000:00:02.0, fence=0xffff888276cc8540, seqno=0, lrc_seqno=0, gt=0, guc_id=0, batch_addr=0x000000146910 ...
.....
xe_exec_queue_memory_cat_error: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x3, flags=0x13
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: William Tseng <william.tseng@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
(cherry picked from commit 38fafa9f392f3110d2de431432d43f4eef99cd1b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|