path: root/Documentation/atomic_t.txt
AgeCommit message (Collapse)Author
2020-06-29Documentation/litmus-tests/atomic: Add a test for smp_mb__after_atomic()Boqun Feng
We already use a litmus test in atomic_t.txt to describe atomic RMW + smp_mb__after_atomic() is stronger than acquire (both the read and the write parts are ordered). So make it a litmus test in atomic-tests directory, so that people can access the litmus easily. Additionally, change the processor numbers "P1, P2" to "P0, P1" in atomic_t.txt for the consistency with the processor numbers in the litmus test, which herd can handle. Acked-by: Alan Stern <> Acked-by: Andrea Parri <> Signed-off-by: Boqun Feng <> Reviewed-by: Joel Fernandes (Google) <> Signed-off-by: Paul E. McKenney <>
2020-06-29Documentation/litmus-tests/atomic: Add a test for atomic_set()Boqun Feng
We already use a litmus test in atomic_t.txt to describe the behavior of an atomic_set() with the an atomic RMW, so add it into atomic-tests directory to make it easily accessible for anyone who cares about the semantics of our atomic APIs. Besides currently the litmus test "atomic-set" in atomic_t.txt has a few things to be improved: 1) The CPU/Processor numbers "P1,P2" are not only inconsistent with the rest of the document, which uses "CPU0" and "CPU1", but also unacceptable by the herd tool, which requires processors start at "P0". 2) The initialization block uses a "atomic_set()", which is OK, but it's better to use ATOMIC_INIT() to make clear this is an initialization. 3) The return value of atomic_add_unless() is discarded inexplicitly, which is OK for C language, but it will be helpful to the herd tool if we use a void cast to make the discard explicit. 4) The name and the paragraph describing the test need to be more accurate and aligned with our wording in LKMM. Therefore fix these in both atomic_t.txt and the new added litmus test. Acked-by: Andrea Parri <> Acked-by: Alan Stern <> Signed-off-by: Boqun Feng <> Reviewed-by: Joel Fernandes (Google) <> Signed-off-by: Paul E. McKenney <>
2019-07-08Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git:// Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git:// (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-06-19Documentation: atomic_t.txt: Explain ordering provided by ↵Alan Stern
smp_mb__{before,after}_atomic() The description of smp_mb__before_atomic() and smp_mb__after_atomic() in Documentation/atomic_t.txt is slightly terse and misleading. It does not clearly state which other instructions are ordered by these barriers. This improves the text to make the actual ordering implications clear, and also to explain how these barriers differ from a RELEASE or ACQUIRE ordering. Signed-off-by: Alan Stern <> Cc: Jonathan Corbet <> Cc: Peter Zijlstra <> Acked-by: Andrea Parri <> Signed-off-by: Paul E. McKenney <>
2019-06-17x86/atomic: Fix smp_mb__{before,after}_atomic()Peter Zijlstra
Recent probing at the Linux Kernel Memory Model uncovered a 'surprise'. Strongly ordered architectures where the atomic RmW primitive implies full memory ordering and smp_mb__{before,after}_atomic() are a simple barrier() (such as x86) fail for: *x = 1; atomic_inc(u); smp_mb__after_atomic(); r0 = *y; Because, while the atomic_inc() implies memory order, it (surprisingly) does not provide a compiler barrier. This then allows the compiler to re-order like so: atomic_inc(u); *x = 1; smp_mb__after_atomic(); r0 = *y; Which the CPU is then allowed to re-order (under TSO rules) like: atomic_inc(u); r0 = *y; *x = 1; And this very much was not intended. Therefore strengthen the atomic RmW ops to include a compiler barrier. NOTE: atomic_{or,and,xor} and the bitops already had the compiler barrier. Signed-off-by: Peter Zijlstra (Intel) <> Cc: Linus Torvalds <> Cc: Peter Zijlstra <> Cc: Thomas Gleixner <> Signed-off-by: Ingo Molnar <>
2019-06-03Documentation/atomic_t.txt: Clarify pure non-rmw usagePeter Zijlstra
Clarify that pure non-RMW usage of atomic_t is pointless, there is nothing 'magical' about atomic_set() / atomic_read(). This is something that seems to confuse people, because I happen upon it semi-regularly. Signed-off-by: Peter Zijlstra (Intel) <> Reviewed-by: Greg Kroah-Hartman <> Acked-by: Will Deacon <> Cc: Linus Torvalds <> Cc: Peter Zijlstra <> Cc: Thomas Gleixner <> Link: Signed-off-by: Ingo Molnar <>
2019-03-18Documentation/atomic_t: Clarify signed vs unsignedPeter Zijlstra
Clarify the whole signed vs unsigned issue for atomic_t. There has been enough confusion on this topic to warrant a few explicit words I feel. Signed-off-by: Peter Zijlstra (Intel) <> Acked-by: Will Deacon <> Acked-by: Boqun Feng <> Signed-off-by: Paul E. McKenney <>
2017-08-25Documentation/locking/atomic: Finish the document...Peter Zijlstra
Julia reported that the document looked unfinished, and it is. I forgot to include the example cooked up by Paul here: and I added an explicit example showing how, while it is an ACQUIRE pattern, it really does provide an MB. Reported-by: Julia Cartwright <> Signed-off-by: Peter Zijlstra (Intel) <> Cc: Boqun Feng <> Cc: Linus Torvalds <> Cc: Paul E. McKenney <> Cc: Peter Zijlstra <> Cc: Thomas Gleixner <> Cc: Will Deacon <> Signed-off-by: Ingo Molnar <>
2017-08-10Documentation/locking/atomic: Add documents for new atomic_t APIsPeter Zijlstra
Since we've vastly expanded the atomic_t interface in recent years the existing documentation is woefully out of date and people seem to get confused a bit. Start a new document to hopefully better explain the current state of affairs. The old atomic_ops.txt also covers bitmaps and a few more details so this is not a full replacement and we'll therefore keep that document around until such a time that we've managed to write more text to cover its entire. Also please, ReST people, go away. Signed-off-by: Peter Zijlstra (Intel) <> Cc: Boqun Feng <> Cc: Linus Torvalds <> Cc: Paul McKenney <> Cc: Peter Zijlstra <> Cc: Randy Dunlap <> Cc: Thomas Gleixner <> Cc: Will Deacon <> Signed-off-by: Ingo Molnar <>