| author | Jisheng Zhang <jszhang@kernel.org> | 2024-03-25 19:10:38 +0800 | 
|---|---|---|
| committer | Palmer Dabbelt <palmer@rivosinc.com> | 2024-04-24 12:57:49 -0700 | 
| commit | 79d6e4eae9662b9103fecf94d52b44deca56743c (patch) | |
| tree | 23e09e4534c78b161086b2b1edd773defb04add9 /scripts/clang-tools/gen_compile_commands.py | |
| parent | eb1e5037294652ddf1437f62292c0727183f11ae (diff) | |
riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
After selecting ARCH_USE_CMPXCHG_LOCKREF, one straightforward further
optimization is to implement arch_cmpxchg64_relaxed(), because the
lockref code does not need the cmpxchg to have barrier semantics. At
the same time, implement arch_cmpxchg64_acquire() and
arch_cmpxchg64_release() as well.
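As a rough user-space illustration of the three orderings (not the RISC-V
implementation, which is written with lr.d/sc.d inline assembly), the
semantics can be sketched with C11 atomics. All function names below are
hypothetical, and the packing of a lock word and a count into one 64-bit
value is an assumption made only for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for arch_cmpxchg64_{relaxed,acquire,release}.
 * Like the kernel helpers, each returns the value found at *p; the swap
 * happened only if that value equals 'old'.
 */
static inline uint64_t cmpxchg64_relaxed_sketch(_Atomic uint64_t *p,
                                                uint64_t old, uint64_t new_val)
{
    /* No ordering against surrounding memory accesses is requested. */
    atomic_compare_exchange_strong_explicit(p, &old, new_val,
                                            memory_order_relaxed,
                                            memory_order_relaxed);
    return old; /* on failure, C11 stores the observed value into 'old' */
}

static inline uint64_t cmpxchg64_acquire_sketch(_Atomic uint64_t *p,
                                                uint64_t old, uint64_t new_val)
{
    /* Later accesses may not be reordered before a successful swap. */
    atomic_compare_exchange_strong_explicit(p, &old, new_val,
                                            memory_order_acquire,
                                            memory_order_relaxed);
    return old;
}

static inline uint64_t cmpxchg64_release_sketch(_Atomic uint64_t *p,
                                                uint64_t old, uint64_t new_val)
{
    /* Earlier accesses may not be reordered after a successful swap. */
    atomic_compare_exchange_strong_explicit(p, &old, new_val,
                                            memory_order_release,
                                            memory_order_relaxed);
    return old;
}

/*
 * Lockref-style fast path (assumed layout: low 32 bits = lock word,
 * high 32 bits = reference count). The count is bumped only while the
 * lock word reads as unlocked (0), so the relaxed variant is enough here.
 * Returns false when the caller would have to fall back to taking the lock.
 */
static bool lockref_get_sketch(_Atomic uint64_t *lockref)
{
    uint64_t old = atomic_load_explicit(lockref, memory_order_relaxed);

    for (int retry = 0; retry < 100; retry++) {
        if ((uint32_t)old != 0)
            return false;                      /* lock is held: slow path */

        uint64_t new_val = old + (1ULL << 32); /* count + 1, lock untouched */
        uint64_t prev = cmpxchg64_relaxed_sketch(lockref, old, new_val);
        if (prev == old)
            return true;                       /* swap succeeded */
        old = prev;                            /* lost a race: retry */
    }
    return false;
}
```

Calling lockref_get_sketch() on a word whose lock half is zero increments
the count half without any fence; on a word whose lock half is non-zero
it reports failure and leaves the word unchanged.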
However, on both the TH1520 and JH7110 platforms, I didn't see an obvious
performance improvement with Linus' test case [1]. IMHO, this may
be related to the fence and lr.d/sc.d hardware implementations. In theory,
lr/sc without a fence could perform better than lr/sc plus a
fence, so add the code here to leave room for performance improvement on
newer hardware platforms.
Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1]
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240325111038.1700-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
