summaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)Author
2023-01-03srcu: Update comment after the index flipPaul E. McKenney
Because there is not guaranteed to be a full memory barrier between the ->srcu_unlock_count increment of an srcu_read_unlock() and the ->srcu_lock_count increment of the next srcu_read_lock(), this next srcu_read_lock() is not guaranteed to see the effect of the index flip just prior to this comment. However, this next srcu_read_lock() will execute a full memory barrier, so the srcu_read_lock() after that is guaranteed to see that index flip. This guarantee is illustrated by the following diagram of events and the litmus test following that. ------------------------------------------------------------------------ READER UPDATER ------------- ---------- // idx is initially 0. srcu_flip() { smp_mb(); // RSCS srcu_read_unlock() { smp_mb(); idx++; // P smp_mb(); // QQ } srcu_readers_unlock_idx(0) { ,--counted------------ count all unlock[0]; // Q | unlock[0]++; // X } smp_mb(); srcu_read_lock() { READ(idx) = 0; ,---- count all lock[0]; // contributes imbalance of 1. lock[0]++; ----counted | smp_mb(); // PP } | } | | // RSCS not going to effect above scan | srcu_read_unlock() { | smp_mb(); | unlock[0]++; | } | / / srcu_read_lock() { | READ(idx); // Y -----cannot be counted because of P (has to sample idx as 1) lock[1]++; ... } ------------------------------------------------------------------------ This makes it similar to the store buffer pattern. Using X, Y, P and Q annotated above, we get: ------------------------------------------------------------------------ READER UPDATER X (write) P (write) smp_mb(); //PP smp_mb(); //QQ Y (read) Q (read) ------------------------------------------------------------------------ ASCII art courtesy of Joel Fernandes. Reported-by: Joel Fernandes <joel@joelfernandes.org> Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03srcu: Yet more detail for srcu_readers_active_idx_check() commentsPaul E. McKenney
The comment in srcu_readers_active_idx_check() following the smp_mb() is out of date, hailing from a simpler time when preemption was disabled across the bulk of __srcu_read_lock(). The fact that preemption was disabled meant that the number of tasks that had fetched the old index but not yet incremented counters was limited by the number of CPUs. In our more complex modern times, the number of CPUs is no longer a limit. This commit therefore updates this comment, additionally giving more memory-ordering detail. [ paulmck: Apply Nt->Nc feedback from Joel Fernandes. ] Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Reported-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> Reported-by: Neeraj Upadhyay <neeraj.iitr10@gmail.com> Reported-by: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03srcu: Remove needless rcu_seq_done() check while holding read lockPingfan Liu
The srcu_gp_start_if_needed() function now read-holds the srcu_struct whose grace period is being started, which means that the corresponding SRCU grace period cannot end. This in turn means that the SRCU grace-period sequence number returned by rcu_seq_snap() cannot expire during this time. And that means that the calls to rcu_seq_done() in srcu_funnel_exp_start() and srcu_funnel_gp_start() can never return true. This commit therefore removes these rcu_seq_done() checks, but adds checks in kernels built with CONFIG_PROVE_RCU=y that splats if rcu_seq_done() does somehow return true. [ paulmck: Rearrange checks to handle kernels built with lockdep. ] Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> To: rcu@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Add test code for semaphore-like SRCU readersPaul E. McKenney
This commit adds trivial test code for srcu_down_read() and srcu_up_read(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03srcu: Fix the comparision in srcu_invl_snp_seq()Pingfan Liu
A grace-period sequence number contains two fields: counter and state. SRCU_SNP_INIT_SEQ provides a guaranteed invalid value for grace-period sequence numbers in newly allocated srcu_node structures' ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp fields. The point of the comparison in srcu_invl_snp_seq() is not to detect invalid grace-period sequence numbers in general, but rather to detect a newly allocated srcu_node structure whose ->srcu_have_cbs[] and ->srcu_gp_seq_needed_exp fields need to be brought into line with the srcu_struct structure's ->srcu_gp_seq field. This commit therefore causes srcu_invl_snp_seq() to compare both fields of the specified grace-period sequence number. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: <rcu@vger.kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03srcu: Delegate work to the boot cpu if using SRCU_SIZE_SMALLPingfan Liu
Commit 994f706872e6 ("srcu: Make Tree SRCU able to operate without snp_node array") assumes that cpu 0 is always online. However, there really are situations when some other CPU is the boot CPU, for example, when booting a kdump kernel with the maxcpus=1 boot parameter. On PowerPC, the kdump kernel can hang as follows: ... [ 1.740036] systemd[1]: Hostname set to <xyz.com> [ 243.686240] INFO: task systemd:1 blocked for more than 122 seconds. [ 243.686264] Not tainted 6.1.0-rc1 #1 [ 243.686272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.686281] task:systemd state:D stack:0 pid:1 ppid:0 flags:0x00042000 [ 243.686296] Call Trace: [ 243.686301] [c000000016657640] [c000000016657670] 0xc000000016657670 (unreliable) [ 243.686317] [c000000016657830] [c00000001001dec0] __switch_to+0x130/0x220 [ 243.686333] [c000000016657890] [c000000010f607b8] __schedule+0x1f8/0x580 [ 243.686347] [c000000016657940] [c000000010f60bb4] schedule+0x74/0x140 [ 243.686361] [c0000000166579b0] [c000000010f699b8] schedule_timeout+0x168/0x1c0 [ 243.686374] [c000000016657a80] [c000000010f61de8] __wait_for_common+0x148/0x360 [ 243.686387] [c000000016657b20] [c000000010176bb0] __flush_work.isra.0+0x1c0/0x3d0 [ 243.686401] [c000000016657bb0] [c0000000105f2768] fsnotify_wait_marks_destroyed+0x28/0x40 [ 243.686415] [c000000016657bd0] [c0000000105f21b8] fsnotify_destroy_group+0x68/0x160 [ 243.686428] [c000000016657c40] [c0000000105f6500] inotify_release+0x30/0xa0 [ 243.686440] [c000000016657cb0] [c0000000105751a8] __fput+0xc8/0x350 [ 243.686452] [c000000016657d00] [c00000001017d524] task_work_run+0xe4/0x170 [ 243.686464] [c000000016657d50] [c000000010020e94] do_notify_resume+0x134/0x140 [ 243.686478] [c000000016657d80] [c00000001002eb18] interrupt_exit_user_prepare_main+0x198/0x270 [ 243.686493] [c000000016657de0] [c00000001002ec60] syscall_exit_prepare+0x70/0x180 [ 243.686505] [c000000016657e10] [c00000001000bf7c] system_call_vectored_common+0xfc/0x280 [ 243.686520] --- interrupt: 3000 at 0x7fffa47d5ba4 [ 243.686528] NIP: 00007fffa47d5ba4 LR: 0000000000000000 CTR: 0000000000000000 [ 243.686538] REGS: c000000016657e80 TRAP: 3000 Not tainted (6.1.0-rc1) [ 243.686548] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 42044440 XER: 00000000 [ 243.686572] IRQMASK: 0 [ 243.686572] GPR00: 0000000000000006 00007ffffa606710 00007fffa48e7200 0000000000000000 [ 243.686572] GPR04: 0000000000000002 000000000000000a 0000000000000000 0000000000000001 [ 243.686572] GPR08: 000001000c172dd0 0000000000000000 0000000000000000 0000000000000000 [ 243.686572] GPR12: 0000000000000000 00007fffa4ff4bc0 0000000000000000 0000000000000000 [ 243.686572] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 243.686572] GPR20: 0000000132dfdc50 000000000000000e 0000000000189375 0000000000000000 [ 243.686572] GPR24: 00007ffffa606ae0 0000000000000005 000001000c185490 000001000c172570 [ 243.686572] GPR28: 000001000c172990 000001000c184850 000001000c172e00 00007fffa4fedd98 [ 243.686683] NIP [00007fffa47d5ba4] 0x7fffa47d5ba4 [ 243.686691] LR [0000000000000000] 0x0 [ 243.686698] --- interrupt: 3000 [ 243.686708] INFO: task kworker/u16:1:24 blocked for more than 122 seconds. [ 243.686717] Not tainted 6.1.0-rc1 #1 [ 243.686724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.686733] task:kworker/u16:1 state:D stack:0 pid:24 ppid:2 flags:0x00000800 [ 243.686747] Workqueue: events_unbound fsnotify_mark_destroy_workfn [ 243.686758] Call Trace: [ 243.686762] [c0000000166736e0] [c00000004fd91000] 0xc00000004fd91000 (unreliable) [ 243.686775] [c0000000166738d0] [c00000001001dec0] __switch_to+0x130/0x220 [ 243.686788] [c000000016673930] [c000000010f607b8] __schedule+0x1f8/0x580 [ 243.686801] [c0000000166739e0] [c000000010f60bb4] schedule+0x74/0x140 [ 243.686814] [c000000016673a50] [c000000010f699b8] schedule_timeout+0x168/0x1c0 [ 243.686827] [c000000016673b20] [c000000010f61de8] __wait_for_common+0x148/0x360 [ 243.686840] [c000000016673bc0] [c000000010210840] __synchronize_srcu.part.0+0xa0/0xe0 [ 243.686855] [c000000016673c30] [c0000000105f2c64] fsnotify_mark_destroy_workfn+0xc4/0x1a0 [ 243.686868] [c000000016673ca0] [c000000010174ea8] process_one_work+0x2a8/0x570 [ 243.686882] [c000000016673d40] [c000000010175208] worker_thread+0x98/0x5e0 [ 243.686895] [c000000016673dc0] [c0000000101828d4] kthread+0x124/0x130 [ 243.686908] [c000000016673e10] [c00000001000cd40] ret_from_kernel_thread+0x5c/0x64 [ 366.566274] INFO: task systemd:1 blocked for more than 245 seconds. [ 366.566298] Not tainted 6.1.0-rc1 #1 [ 366.566305] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 366.566314] task:systemd state:D stack:0 pid:1 ppid:0 flags:0x00042000 [ 366.566329] Call Trace: ... The above splat occurs because PowerPC really does use maxcpus=1 instead of nr_cpus=1 in the kernel command line. Consequently, the (quite possibly non-zero) kdump CPU is the only online CPU in the kdump kernel. SRCU unconditionally queues a sdp->work on cpu 0, for which no worker thread has been created, so sdp->work will be never executed and __synchronize_srcu() will never be completed. This commit therefore replaces CPU ID 0 with get_boot_cpu_id() in key places in Tree SRCU. Since the CPU indicated by get_boot_cpu_id() is guaranteed to be online, this avoids the above splat. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> To: rcu@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03srcu: Release early_srcu resources when no longer in useZqiang
Kernels built with the CONFIG_TREE_SRCU Kconfig option set and then booted with rcupdate.rcu_self_test=1 and srcutree.convert_to_big=1 will test Tree SRCU during early boot. The early_srcu structure's srcu_node array will be allocated when init_srcu_struct_fields() is invoked, but after the test completes this early_srcu structure will not be used. This commit therefore invokes cleanup_srcu_struct() to free that srcu_node structure. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Split ready for reclaim objects from a batchUladzislau Rezki (Sony)
This patch splits the lists of objects so as to avoid sending any through RCU that have already been queued for more than one grace period. These long-term-resident objects are immediately freed. The remaining short-term-resident objects are queued for later freeing using queue_rcu_work(). This change avoids delaying workqueue handlers with synchronize_rcu() invocations. Yes, workqueue handlers are designed to handle blocking, but avoiding blocking when unnecessary improves performance during low-memory situations. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Carefully reset number of objects in krcpUladzislau Rezki (Sony)
The schedule_delayed_monitor_work() function relies on the count of objects queued into any given kfree_rcu_cpu structure. This count is used to determine how quickly to schedule passing these objects to RCU. There are three pipes where pointers can be placed. When any pipe is offloaded, the kfree_rcu_cpu structure's ->count counter is set to zero, which is wrong because the other pipes might still be non-empty. This commit therefore maintains per-pipe counters, and introduces a krc_count() helper to access the aggregate value of those counters. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Use READ_ONCE() when access to krcp->headUladzislau Rezki (Sony)
The need_offload_krc() function is now lock-free, which gives the compiler freedom to load old values from plain C-language loads from the kfree_rcu_cpu struture's ->head pointer. This commit therefore applied READ_ONCE() to these loads. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Use a polled API to speedup a reclaim processUladzislau Rezki (Sony)
Currently all objects placed into a batch wait for a full grace period to elapse after that batch is ready to send to RCU. However, this can unnecessarily delay freeing of the first objects that were added to the batch. After all, several RCU grace periods might have elapsed since those objects were added, and if so, there is no point in further deferring their freeing. This commit therefore adds per-page grace-period snapshots which are obtained from get_state_synchronize_rcu(). When the batch is ready to be passed to call_rcu(), each page's snapshot is checked by passing it to poll_state_synchronize_rcu(). If a given page's RCU grace period has already elapsed, its objects are freed immediately by kvfree_rcu_bulk(). Otherwise, these objects are freed after a call to synchronize_rcu(). This approach requires that the pages be traversed in reverse order, that is, the oldest ones first. Test example: kvm.sh --memory 10G --torture rcuscale --allcpus --duration 1 \ --kconfig CONFIG_NR_CPUS=64 \ --kconfig CONFIG_RCU_NOCB_CPU=y \ --kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \ --kconfig CONFIG_RCU_LAZY=n \ --bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 \ rcuscale.holdoff=20 rcuscale.kfree_loops=10000 \ torture.disable_onoff_at_boot" --trust-make Before this commit: Total time taken by all kfree'ers: 8535693700 ns, loops: 10000, batches: 1188, memory footprint: 2248MB Total time taken by all kfree'ers: 8466933582 ns, loops: 10000, batches: 1157, memory footprint: 2820MB Total time taken by all kfree'ers: 5375602446 ns, loops: 10000, batches: 1130, memory footprint: 6502MB Total time taken by all kfree'ers: 7523283832 ns, loops: 10000, batches: 1006, memory footprint: 3343MB Total time taken by all kfree'ers: 6459171956 ns, loops: 10000, batches: 1150, memory footprint: 6549MB After this commit: Total time taken by all kfree'ers: 8560060176 ns, loops: 10000, batches: 1787, memory footprint: 61MB Total time taken by all kfree'ers: 8573885501 ns, loops: 10000, batches: 1777, memory footprint: 93MB Total time taken by all kfree'ers: 8320000202 ns, loops: 10000, batches: 1727, memory footprint: 66MB Total time taken by all kfree'ers: 8552718794 ns, loops: 10000, batches: 1790, memory footprint: 75MB Total time taken by all kfree'ers: 8601368792 ns, loops: 10000, batches: 1724, memory footprint: 62MB The reduction in memory footprint is well in excess of an order of magnitude. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Move need_offload_krc() out of krcp->lockUladzislau Rezki (Sony)
The need_offload_krc() function currently holds the krcp->lock in order to safely check krcp->head. This commit removes the need for this lock in that function by updating the krcp->head pointer using WRITE_ONCE() macro so that readers can carry out lockless loads of that pointer. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Move bulk/list reclaim to separate functionsUladzislau Rezki (Sony)
The kvfree_rcu() code maintains lists of pages of pointers, but also a singly linked list, with the latter being used when memory allocation fails. Traversal of these two types of lists is currently open coded. This commit simplifies the code by providing kvfree_rcu_bulk() and kvfree_rcu_list() functions, respectively, to traverse these two types of lists. This patch does not introduce any functional change. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu/kvfree: Switch to a generic linked list APIUladzislau Rezki (Sony)
This commit improves the readability and maintainability of the kvfree_rcu() code by switching from an open-coded linked list to the standard Linux-kernel circular doubly linked list. This patch does not introduce any functional change. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Refactor kvfree_call_rcu() and high-level helpersUladzislau Rezki (Sony)
Currently a kvfree_call_rcu() takes an offset within a structure as a second parameter, so a helper such as a kvfree_rcu_arg_2() has to convert rcu_head and a freed ptr to an offset in order to pass it. That leads to an extra conversion on macro entry. Instead of converting, refactor the code in way that a pointer that has to be freed is passed directly to the kvfree_call_rcu(). This patch does not make any functional change and is transparent to all kvfree_rcu() users. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Allow expedited RCU CPU stall warnings to dump task stacksPaul E. McKenney
This commit introduces the rcupdate.rcu_exp_stall_task_details kernel boot parameter, which cause expedited RCU CPU stall warnings to dump the stacks of any tasks blocking the current expedited grace period. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Test synchronous RCU grace periods at the end of rcu_init()Paul E. McKenney
This commit tests synchronize_rcu() and synchronize_rcu_expedited() at the end of rcu_init(), in addition to the test already at the beginning of that function. These tests are run only in kernels built with CONFIG_PROVE_RCU=y. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Make rcu_blocking_is_gp() stop early-boot might_sleep()Zqiang
Currently, rcu_blocking_is_gp() invokes might_sleep() even during early boot when interrupts are disabled and before the scheduler is scheduling. This is at best an accident waiting to happen. Therefore, this commit moves that might_sleep() under an rcu_scheduler_active check in order to ensure that might_sleep() is not invoked unless sleeping might actually happen. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait()Paul E. McKenney
The normal grace period's RCU CPU stall warnings are invoked from the scheduling-clock interrupt handler, and can thus invoke smp_processor_id() with impunity, which allows them to directly invoke dump_cpu_task(). In contrast, the expedited grace period's RCU CPU stall warnings are invoked from process context, which causes the dump_cpu_task() function's calls to smp_processor_id() to complain bitterly in debug kernels. This commit therefore causes synchronize_rcu_expedited_wait() to disable preemption around its call to dump_cpu_task(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Upgrade header comment for poll_state_synchronize_rcu()Paul E. McKenney
This commit emphasizes the possibility of concurrent calls to synchronize_rcu() and synchronize_rcu_expedited() causing one or the other of the two grace periods being lost from the viewpoint of poll_state_synchronize_rcu(). If you cannot afford to lose grace periods this way, you should instead use the _full() variants of the polled RCU API, for example, poll_state_synchronize_rcu_full(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Throttle callback invocation based on number of ready callbacksPaul E. McKenney
Currently, rcu_do_batch() sizes its batches based on the total number of callbacks in the callback list. This can result in some strange choices, for example, if there was 12,800 callbacks in the list, but only 200 were ready to invoke, RCU would invoke 100 at a time (12,800 shifted down by seven bits). A more measured approach would use the number that were actually ready to invoke, an approach that has become feasible only recently given the per-segment ->seglen counts in ->cblist. This commit therefore bases the batch limit on the number of callbacks ready to invoke instead of on the total number of callbacks. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-03rcu: Consolidate initialization and CPU-hotplug codePaul E. McKenney
This commit consolidates the initialization and CPU-hotplug code at the end of kernel/rcu/tree.c. This is strictly a code-motion commit. No functionality has changed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-12-21Merge tag 'rcu-urgent.2022.12.17a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This fixes a lockdep false positive in synchronize_rcu() that can otherwise occur during early boot. The fix simply avoids invoking lockdep if the scheduler has not yet been initialized, that is, during that portion of boot when interrupts are disabled" * tag 'rcu-urgent.2022.12.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Don't assert interrupts enabled too early in boot
2022-12-17rcu: Don't assert interrupts enabled too early in bootPaul E. McKenney
The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check that interrupts are enabled, as they normally should be when waiting for an RCU grace period. Except that it is legal to wait for grace periods during early boot, before interrupts have been enabled for the first time, and polling for grace periods is required to work during this time. This can result in false-positive lockdep splats in the presence of boot-time-initiated tracing. This commit therefore conditions those interrupts-enabled checks on rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by which time interrupts have been enabled. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-13Merge tag 'net-next-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Allow live renaming when an interface is up - Add retpoline wrappers for tc, improving considerably the performances of complex queue discipline configurations - Add inet drop monitor support - A few GRO performance improvements - Add infrastructure for atomic dev stats, addressing long standing data races - De-duplicate common code between OVS and conntrack offloading infrastructure - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements - Netfilter: introduce packet parser for tunneled packets - Replace IPVS timer-based estimators with kthreads to scale up the workload with the number of available CPUs - Add the helper support for connection-tracking OVS offload BPF: - Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF - Make cgroup local storage available to non-cgroup attached BPF programs - Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers - A relevant bunch of BPF verifier fixes and improvements - Veristat tool improvements to support custom filtering, sorting, and replay of results - Add LLVM disassembler as default library for dumping JITed code - Lots of new BPF documentation for various BPF maps - Add bpf_rcu_read_{,un}lock() support for sleepable programs - Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs - Add support storing struct task_struct objects as kptrs in maps - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions Protocols: - TCP: implement Protective Load Balancing across switch links - TCP: allow dynamically disabling TCP-MD5 static key, reverting back to fast[er]-path - UDP: Introduce optional per-netns hash lookup table - IPv6: simplify and cleanup sockets disposal - Netlink: support different type policies for each generic netlink operation - MPTCP: add MSG_FASTOPEN and FastOpen listener side support - MPTCP: add netlink notification support for listener sockets events - SCTP: add VRF support, allowing sctp sockets binding to VRF devices - Add bridging MAC Authentication Bypass (MAB) support - Extensions for Ethernet VPN bridging implementation to better support multicast scenarios - More work for Wi-Fi 7 support, comprising conversion of all the existing drivers to internal TX queue usage - IPSec: introduce a new offload type (packet offload) allowing complete header processing and crypto offloading - IPSec: extended ack support for more descriptive XFRM error reporting - RXRPC: increase SACK table size and move processing into a per-local endpoint kernel thread, reducing considerably the required locking - IEEE 802154: synchronous send frame and extended filtering support, initial support for scanning available 15.4 networks - Tun: bump the link speed from 10Mbps to 10Gbps - Tun/VirtioNet: implement UDP segmentation offload support Driver API: - PHY/SFP: improve power level switching between standard level 1 and the higher power levels - New API for netdev <-> devlink_port linkage - PTP: convert existing drivers to new frequency adjustment implementation - DSA: add support for rx offloading - Autoload DSA tagging driver when dynamically changing protocol - Add new PCP and APPTRUST attributes to Data Center Bridging - Add configuration support for 800Gbps link speed - Add devlink port function attribute to enable/disable RoCE and migratable - Extend devlink-rate to support strict prioriry and weighted fair queuing - Add devlink support to directly reading from region memory - New device tree helper to fetch MAC address from nvmem - New big TCP helper to simplify temporary header stripping New hardware / drivers: - Ethernet: - Marvel Octeon CNF95N and CN10KB Ethernet Switches - Marvel Prestera AC5X Ethernet Switch - WangXun 10 Gigabit NIC - Motorcomm yt8521 Gigabit Ethernet - Microchip ksz9563 Gigabit Ethernet Switch - Microsoft Azure Network Adapter - Linux Automation 10Base-T1L adapter - PHY: - Aquantia AQR112 and AQR412 - Motorcomm YT8531S - PTP: - Orolia ART-CARD - WiFi: - MediaTek Wi-Fi 7 (802.11be) devices - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices - Bluetooth: - Broadcom BCM4377/4378/4387 Bluetooth chipsets - Realtek RTL8852BE and RTL8723DS - Cypress.CYW4373A0 WiFi + Bluetooth combo device Drivers: - CAN: - gs_usb: bus error reporting support - kvaser_usb: listen only and bus error reporting support - Ethernet NICs: - Intel (100G): - extend action skbedit to RX queue mapping - implement devlink-rate support - support direct read from memory - nVidia/Mellanox (mlx5): - SW steering improvements, increasing rules update rate - Support for enhanced events compression - extend H/W offload packet manipulation capabilities - implement IPSec packet offload mode - nVidia/Mellanox (mlx4): - better big TCP support - Netronome Ethernet NICs (nfp): - IPsec offload support - add support for multicast filter - Broadcom: - RSS and PTP support improvements - AMD/SolarFlare: - netlink extened ack improvements - add basic flower matches to offload, and related stats - Virtual NICs: - ibmvnic: introduce affinity hint support - small / embedded: - FreeScale fec: add initial XDP support - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood - TI am65-cpsw: add suspend/resume support - Mediatek MT7986: add RX wireless wthernet dispatch support - Realtek 8169: enable GRO software interrupt coalescing per default - Ethernet high-speed switches: - Microchip (sparx5): - add support for Sparx5 TC/flower H/W offload via VCAP - Mellanox mlxsw: - add 802.1X and MAC Authentication Bypass offload support - add ip6gre support - Embedded Ethernet switches: - Mediatek (mtk_eth_soc): - improve PCS implementation, add DSA untag support - enable flow offload support - Renesas: - add rswitch R-Car Gen4 gPTP support - Microchip (lan966x): - add full XDP support - add TC H/W offload via VCAP - enable PTP on bridge interfaces - Microchip (ksz8): - add MTU support for KSZ8 series - Qualcomm 802.11ax WiFi (ath11k): - support configuring channel dwell time during scan - MediaTek WiFi (mt76): - enable Wireless Ethernet Dispatch (WED) offload support - add ack signal support - enable coredump support - remain_on_channel support - Intel WiFi (iwlwifi): - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support - RealTek WiFi (rtw89): - new dynamic header firmware format support - wake-over-WLAN support" * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits) ipvs: fix type warning in do_div() on 32 bit net: lan966x: Remove a useless test in lan966x_ptp_add_trap() net: ipa: add IPA v4.7 support dt-bindings: net: qcom,ipa: Add SM6350 compatible bnxt: Use generic HBH removal helper in tx path IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver selftests: forwarding: Add bridge MDB test selftests: forwarding: Rename bridge_mdb test bridge: mcast: Support replacement of MDB port group entries bridge: mcast: Allow user space to specify MDB entry routing protocol bridge: mcast: Allow user space to add (*, G) with a source list and filter mode bridge: mcast: Add support for (*, G) with a source list and filter mode bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source bridge: mcast: Add a flag for user installed source entries bridge: mcast: Expose __br_multicast_del_group_src() bridge: mcast: Expose br_multicast_new_group_src() bridge: mcast: Add a centralized error path bridge: mcast: Place netlink policy before validation functions bridge: mcast: Split (*, G) and (S, G) addition into different functions bridge: mcast: Do not derive entry type from its filter mode ...
2022-12-12Merge tag 'printk-for-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of this_cpu_inc() on strong load-store architectures. - Introduce new console_list_lock to synchronize a manipulation of the list of registered consoles and their flags. This is a first step in removing the big-kernel-lock-like behavior of console_lock(). This semaphore still serializes console->write() calbacks against: - each other. It primary prevents potential races between early and proper console drivers using the same device. - suspend()/resume() callbacks and init() operations in some drivers. - various other operations in the tty/vt and framebufer susbsystems. It is likely that console_lock() serializes even operations that are not directly conflicting with the console->write() callbacks here. This is the most complicated big-kernel-lock aspect of the console_lock() that will be hard to untangle. - Introduce new console_srcu lock that is used to safely iterate and access the registered console drivers under SRCU read lock. This is a prerequisite for introducing atomic console drivers and console kthreads. It will reduce the complexity of serialization against normal consoles and console_lock(). Also it should remove the risk of deadlock during critical situations, like Oops or panic, when only atomic consoles are registered. - Check whether the console is registered instead of enabled on many locations. It was a historical leftover. - Cleanly force a preferred console in xenfb code instead of a dirty hack. - A lot of code and comment clean ups and improvements. * tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits) printk: htmldocs: add missing description tty: serial: sh-sci: use setup() callback for early console printk: relieve console_lock of list synchronization duties tty: serial: kgdboc: use console_list_lock to trap exit tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console() tty: serial: kgdboc: use console_list_lock for list traversal tty: serial: kgdboc: use srcu console list iterator proc: consoles: use console_list_lock for list iteration tty: tty_io: use console_list_lock for list synchronization printk, xen: fbfront: create/use safe function for forcing preferred netconsole: avoid CON_ENABLED misuse to track registration usb: early: xhci-dbc: use console_is_registered() tty: serial: xilinx_uartps: use console_is_registered() tty: serial: samsung_tty: use console_is_registered() tty: serial: pic32_uart: use console_is_registered() tty: serial: earlycon: use console_is_registered() tty: hvc: use console_is_registered() efi: earlycon: use console_is_registered() tty: nfcon: use console_is_registered() serial_core: replace uart_console_enabled() with uart_console_registered() ...
2022-12-12Merge tag 'rcu.2022.12.02a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates. This is the second in a series from an ongoing review of the RCU documentation. - Miscellaneous fixes. - Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument CPU lists, causes call_rcu() to introduce delays. These delays result in significant power savings on nearly idle Android and ChromeOS systems. These savings range from a few percent to more than ten percent. This series also includes several commits that change call_rcu() to a new call_rcu_hurry() function that avoids these delays in a few cases, for example, where timely wakeups are required. Several of these are outside of RCU and thus have acks and reviews from the relevant maintainers. - Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe() for architectures that support NMIs, but which do not provide NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required by the upcoming lockless printk() work by John Ogness et al. - Changes providing minor but important increases in torture test coverage for the new RCU polled-grace-period APIs. - Changes to torturescript that avoid redundant kernel builds, thus providing about a 30% speedup for the torture.sh acceptance test. * tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits) net: devinet: Reduce refcount before grace period net: Use call_rcu_hurry() for dst_release() workqueue: Make queue_rcu_work() use call_rcu_hurry() percpu-refcount: Use call_rcu_hurry() for atomic switch scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu() rcu/rcutorture: Use call_rcu_hurry() where needed rcu/rcuscale: Use call_rcu_hurry() for async reader test rcu/sync: Use call_rcu_hurry() instead of call_rcu rcuscale: Add laziness and kfree tests rcu: Shrinker for lazy rcu rcu: Refactor code a bit in rcu_nocb_do_flush_bypass() rcu: Make call_rcu() lazy to save power rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC srcu: Debug NMI safety even on archs that don't require it srcu: Explain the reason behind the read side critical section on GP start srcu: Warn when NMI-unsafe API is used in NMI arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state() rcu-tasks: Make grace-period-age message human-readable ...
2022-12-01srcu: Make Tiny synchronize_srcu() check for readersZqiang
This commit adds lockdep checks for illegal use of synchronize_srcu() within same-type SRCU read-side critical sections and within normal RCU read-side critical sections. It also makes synchronize_srcu() be a no-op during early boot. These changes bring Tiny synchronize_srcu() into line with both Tree synchronize_srcu() and Tiny synchronize_rcu(). Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: John Ogness <john.ogness@linutronix.de>
2022-11-30Merge branches 'doc.2022.10.20a', 'fixes.2022.10.21a', 'lazy.2022.11.30a', ↵Paul E. McKenney
'srcunmisafe.2022.11.09a', 'torture.2022.10.18c' and 'torturescript.2022.10.20a' into HEAD doc.2022.10.20a: Documentation updates. fixes.2022.10.21a: Miscellaneous fixes. lazy.2022.11.30a: Lazy call_rcu() and NOCB updates. srcunmisafe.2022.11.09a: NMI-safe SRCU readers. torture.2022.10.18c: Torture-test updates. torturescript.2022.10.20a: Torture-test scripting updates.
2022-11-29rcu: Make SRCU mandatoryPaul E. McKenney
Kernels configured with CONFIG_PRINTK=n and CONFIG_SRCU=n get build failures. This causes trouble for deep embedded systems. But given that there are more than 25 instances of "select SRCU" in the kernel, it is hard to believe that there are many kernels running in production without SRCU. This commit therefore makes SRCU mandatory. The SRCU Kconfig option remains for backwards compatibility, and will be removed when it is no longer used. [ paulmck: Update per kernel test robot feedback. ] Reported-by: John Ogness <john.ogness@linutronix.de> Reported-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: <linux-arch@vger.kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: John Ogness <john.ogness@linutronix.de>
2022-11-29rcu/rcutorture: Use call_rcu_hurry() where neededJoel Fernandes (Google)
call_rcu() changes to save power will change the behavior of rcutorture tests. Use the call_rcu_hurry() API instead which reverts to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29rcu/rcuscale: Use call_rcu_hurry() for async reader testJoel Fernandes (Google)
rcuscale uses call_rcu() to queue async readers. With recent changes to save power, the test will have fewer async readers in flight. Use the call_rcu_hurry() API instead to revert to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29rcu/sync: Use call_rcu_hurry() instead of call_rcuJoel Fernandes (Google)
call_rcu() changes to save power will slow down rcu sync. Use the call_rcu_hurry() API instead which reverts to the old behavior. [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29rcuscale: Add laziness and kfree testsJoel Fernandes (Google)
This commit adds 2 tests to rcuscale. The first one is a startup test to check whether we are not too lazy or too hard working. The second one causes kfree_rcu() itself to use call_rcu() and checks memory pressure. Testing indicates that the new call_rcu() keeps memory pressure under control roughly as well as does kfree_rcu(). [ paulmck: Apply checkpatch feedback. ] Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29rcu: Shrinker for lazy rcuVineeth Pillai
The shrinker is used to speed up the free'ing of memory potentially held by RCU lazy callbacks. RCU kernel module test cases show this to be effective. Test is introduced in a later patch. Signed-off-by: Vineeth Pillai <vineeth@bitbyteword.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()Joel Fernandes (Google)
This consolidates the code a bit and makes it cleaner. Functionally it is the same. Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-11-29rcu: Make call_rcu() lazy to save powerJoel Fernandes (Google)
Implement timer-based RCU callback batching (also known as lazy callbacks). With this we save about 5-10% of power consumed due to RCU requests that happen when system is lightly loaded or idle. By default, all async callbacks (queued via call_rcu) are marked lazy. An alternate API call_rcu_hurry() is provided for the few users, for example synchronize_rcu(), that need the old behavior. The batch is flushed whenever a certain amount of time has passed, or the batch on a particular CPU grows too big. Also memory pressure will flush it in a future patch. To handle several corner cases automagically (such as rcu_barrier() and hotplug), we re-use bypass lists which were originally introduced to address lock contention, to handle lazy CBs as well. The bypass list length has the lazy CB length included in it. A separate lazy CB length counter is also introduced to keep track of the number of lazy CBs. [ paulmck: Fix formatting of inline call_rcu_lazy() definition. ] [ paulmck: Apply Zqiang feedback. ] [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ] Suggested-by: Paul McKenney <paulmck@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/linux/net.h a5ef058dc4d9 ("net: introduce and use custom sockopt socket flag") e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-21srcu: Debug NMI safety even on archs that don't require itFrederic Weisbecker
Currently the NMI safety debugging is only performed on architectures that don't support NMI-safe this_cpu_inc(). Reorder the code so that other architectures like x86 also detect bad uses. [ paulmck: Apply kernel test robot, Stephen Rothwell, and Zqiang feedback. ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21srcu: Explain the reason behind the read side critical section on GP startFrederic Weisbecker
Tell about the need to protect against concurrent updaters who may overflow the GP counter behind the current update. Reported-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21srcu: Warn when NMI-unsafe API is used in NMIFrederic Weisbecker
Using the NMI-unsafe reader API from within an NMI handler is very likely to be buggy for three reasons: 1) NMIs aren't strictly re-entrant (a pending nested NMI will execute at the end of the current one) so it should be fine to use a non-atomic increment here. However, breakpoints can still interrupt NMIs and if a breakpoint callback has a reader on that same ssp, a racy increment can happen. 2) If the only reader site for a given srcu_struct structure is in an NMI handler, then RCU should be used instead of SRCU. 3) Because of the previous reason (2), an srcu_struct structure having an SRCU read side critical section in an NMI handler is likely to have another one from a task context. For all these reasons, warn if an NMI-unsafe reader API is used from an NMI handler. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()Zqiang
Running rcutorture with non-zero fqs_duration module parameter in a kernel built with CONFIG_PREEMPTION=y results in the following splat: BUG: using __this_cpu_read() in preemptible [00000000] code: rcu_torture_fqs/398 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 398 Comm: rcu_torture_fqs Not tainted 6.0.0-rc1-yoctodev-standard+ Call Trace: <TASK> dump_stack_lvl+0x5b/0x86 dump_stack+0x10/0x16 check_preemption_disabled+0xe5/0xf0 __this_cpu_preempt_check+0x13/0x20 rcu_force_quiescent_state.part.0+0x1c/0x170 rcu_force_quiescent_state+0x1e/0x30 rcu_torture_fqs+0xca/0x160 ? rcu_torture_boost+0x430/0x430 kthread+0x192/0x1d0 ? kthread_complete_and_exit+0x30/0x30 ret_from_fork+0x22/0x30 </TASK> The problem is that rcu_force_quiescent_state() uses __this_cpu_read() in preemptible code instead of the proper raw_cpu_read(). This commit therefore changes __this_cpu_read() to raw_cpu_read(). Signed-off-by: Zqiang <qiang1.zhang@intel.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21rcu-tasks: Make grace-period-age message human-readablePaul E. McKenney
This commit adds a few words to the informative message that appears every ten seconds in RCU Tasks and RCU Tasks Trace grace periods. This message currently reads as follows: rcu_tasks_wait_gp: rcu_tasks grace period 1046 is 10088 jiffies old. After this change, it provides additional context, instead reading as follows: rcu_tasks_wait_gp: rcu_tasks grace period number 1046 (since boot) is 10088 jiffies old. Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-21rcu: Remove rcu_is_idle_cpu()Yipeng Zou
The commit 3fcd6a230fa7 ("x86/cpu: Avoid cpuinfo-induced IPIing of idle CPUs") introduced rcu_is_idle_cpu() in order to identify the current CPU idle state. But commit f3eca381bd49 ("x86/aperfmperf: Replace arch_freq_get_on_cpu()") switched to using MAX_SAMPLE_AGE, so rcu_is_idle_cpu() is no longer used. This commit therefore removes it. Fixes: f3eca381bd49 ("x86/aperfmperf: Replace arch_freq_get_on_cpu()") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-20rcu: Keep synchronize_rcu() from enabling irqs in early bootPaul E. McKenney
Making polled RCU grace periods account for expedited grace periods required acquiring the leaf rcu_node structure's lock during early boot, but after rcu_init() was called. This lock is irq-disabled, but the code incorrectly assumes that irqs are always disabled when invoking synchronize_rcu(). The exception is early boot before the scheduler has started, which means that upon return from synchronize_rcu(), irqs will be incorrectly enabled. This commit fixes this bug by using irqsave/irqrestore locking primitives. Fixes: bf95b2bc3e42 ("rcu: Switch polled grace-period APIs to ->gp_seq_polled") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-20srcu: Check for consistent global per-srcu_struct NMI safetyPaul E. McKenney
This commit adds runtime checks to verify that a given srcu_struct uses consistent NMI-safe (or not) read-side primitives globally, but based on the per-CPU data. These global checks are made by the grace-period code that must scan the srcu_data structures anyway, and are done only in kernels built with CONFIG_PROVE_RCU=y. Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com>
2022-10-20srcu: Check for consistent per-CPU per-srcu_struct NMI safetyPaul E. McKenney
This commit adds runtime checks to verify that a given srcu_struct uses consistent NMI-safe (or not) read-side primitives on a per-CPU basis. Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com>
2022-10-20srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()Paul E. McKenney
On strict load-store architectures, the use of this_cpu_inc() by srcu_read_lock() and srcu_read_unlock() is not NMI-safe in TREE SRCU. To see this suppose that an NMI arrives in the middle of srcu_read_lock(), just after it has read ->srcu_lock_count, but before it has written the incremented value back to memory. If that NMI handler also does srcu_read_lock() and srcu_read_lock() on that same srcu_struct structure, then upon return from that NMI handler, the interrupted srcu_read_lock() will overwrite the NMI handler's update to ->srcu_lock_count, but leave unchanged the NMI handler's update by srcu_read_unlock() to ->srcu_unlock_count. This can result in a too-short SRCU grace period, which can in turn result in arbitrary memory corruption. If the NMI handler instead interrupts the srcu_read_unlock(), this can result in eternal SRCU grace periods, which is not much better. This commit therefore creates a pair of new srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe() functions, which allow SRCU readers in both NMI handlers and in process and IRQ context. It is bad practice to mix the existing and the new _nmisafe() primitives on the same srcu_struct structure. Use one set or the other, not both. Just to underline that "bad practice" point, using srcu_read_lock() at process level and srcu_read_lock_nmisafe() in your NMI handler will not, repeat NOT, work. If you do not immediately understand why this is the case, please review the earlier paragraphs in this commit log. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Apply feedback from Randy Dunlap. ] [ paulmck: Apply feedback from John Ogness. ] [ paulmck: Apply feedback from Frederic Weisbecker. ] Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com>
2022-10-18rcutorture: Verify NUM_ACTIVE_RCU_POLL_OLDSTATEPaul E. McKenney
This commit adds code to the RTWS_POLL_GET case of rcu_torture_writer() to verify that the value of NUM_ACTIVE_RCU_POLL_OLDSTATE is sufficiently large Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-10-18rcutorture: Verify NUM_ACTIVE_RCU_POLL_FULL_OLDSTATEPaul E. McKenney
This commit adds code to the RTWS_POLL_GET_FULL case of rcu_torture_writer() to verify that the value of NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE is sufficiently large. [ paulmck: Fix whitespace issue located by checkpatch.pl. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>