summaryrefslogtreecommitdiff
path: root/scripts/lib/kdoc/kdoc_files.py
diff options
context:
space:
mode:
authorFrederic Weisbecker <frederic@kernel.org>2025-04-28 13:11:47 +0200
committerPeter Zijlstra <peterz@infradead.org>2025-05-08 21:50:19 +0200
commitd20eb2d5fe8f8818abcfdadf5ac5109938f1318e (patch)
treeeda8bc01657e45c0b14c6aeb602ef9c70f2df5e9 /scripts/lib/kdoc/kdoc_files.py
parent22d38babb3adcb1227ecfb91d9423008a46548fe (diff)
perf: Fix irq work dereferencing garbage
The following commit: da916e96e2de ("perf: Make perf_pmu_unregister() useable") has introduced two significant event's parent lifecycle changes: 1) An event that has exited now has EVENT_TOMBSTONE as a parent. This can result in a situation where the delayed wakeup irq_work can accidentally dereference EVENT_TOMBSTONE on: CPU 0 CPU 1 ----- ----- __schedule() local_irq_disable() rq_lock() <NMI> perf_event_overflow() irq_work_queue(&child->pending_irq) </NMI> perf_event_task_sched_out() raw_spin_lock(&ctx->lock) ctx_sched_out() ctx->is_active = 0 event_sched_out(child) raw_spin_unlock(&ctx->lock) perf_event_release_kernel(parent) perf_remove_from_context(child) raw_spin_lock_irq(&ctx->lock) // Sees !ctx->is_active // Removes from context inline __perf_remove_from_context(child) perf_child_detach(child) event->parent = EVENT_TOMBSTONE raw_spin_rq_unlock_irq(rq); <IRQ> perf_pending_irq() perf_event_wakeup(child) ring_buffer_wakeup(child) rcu_dereference(child->parent->rb) <--- CRASH This also concerns the call to kill_fasync() on parent->fasync. 2) The final parent reference count decrement can now happen before the the final child reference count decrement. ie: the parent can now be freed before its child. On PREEMPT_RT, this can result in a situation where the delayed wakeup irq_work can accidentally dereference a freed parent: CPU 0 CPU 1 CPU 2 ----- ----- ------ perf_pmu_unregister() pmu_detach_events() pmu_get_event() atomic_long_inc_not_zero(&child->refcount) <NMI> perf_event_overflow() irq_work_queue(&child->pending_irq); </NMI> <IRQ> irq_work_run() wake_irq_workd() </IRQ> preempt_schedule_irq() =========> SWITCH to workd irq_work_run_list() perf_pending_irq() perf_event_wakeup(child) ring_buffer_wakeup(child) event = child->parent perf_event_release_kernel(parent) // Not last ref, PMU holds it put_event(child) // Last ref put_event(parent) free_event() call_rcu(...) rcu_core() free_event_rcu() rcu_dereference(event->rb) <--- CRASH This also concerns the call to kill_fasync() on parent->fasync. The "easy" solution to 1) is to check that event->parent is not EVENT_TOMBSTONE on perf_event_wakeup() (including both ring buffer and fasync uses). The "easy" solution to 2) is to turn perf_event_wakeup() to wholefully run under rcu_read_lock(). However because of 2), sanity would prescribe to make event::parent an __rcu pointer and annotate each and every users to prove they are reliable. Propose an alternate solution and restore the stable pointer to the parent until all its children have called _free_event() themselves to avoid any further accident. Also revert the EVENT_TOMBSTONE design that is mostly here to determine which caller of perf_event_exit_event() must perform the refcount decrement on a child event matching the increment in inherit_event(). Arrange instead for checking the attach state of an event prior to its removal and decrement the refcount of the child accordingly. Fixes: da916e96e2de ("perf: Make perf_pmu_unregister() useable") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Diffstat (limited to 'scripts/lib/kdoc/kdoc_files.py')
0 files changed, 0 insertions, 0 deletions