| author | Petr Mladek <pmladek@suse.com> | 2025-09-26 14:49:10 +0200 |
|---|---|---|
| committer | Petr Mladek <pmladek@suse.com> | 2025-10-30 12:10:16 +0100 |
| commit | c41c0ebfa1e0eb40cfb11846a7a579eb8d9dfb5f (patch) | |
| tree | c684363fca882f7447c6189d80caefbc211ad1a3 /tools/lib/python/kdoc/kdoc_parser.py | |
| parent | 48e3694ae7fae347c1193c84f384f4ea41086075 (diff) | |
printk/nbcon: Block printk kthreads when any CPU is in an emergency context

In emergency contexts, printk() tries to flush messages directly even
on nbcon consoles. It is also allowed to take over the console ownership
and interrupt the printk kthread in the middle of a message.

Only one takeover and one repeated message should be enough in most
situations. The first emergency message flushes the backlog and the printk
kthreads go to sleep. Subsequent emergency messages are flushed directly
and printk() does not wake up the kthreads.

However, a single takeover is not guaranteed. Any printk() in normal
context on another CPU could wake up the kthreads. Or a new emergency
message might be added before the kthreads go to sleep. Note that
the interrupted .write_thread() callbacks usually have to call
nbcon_reacquire_nobuf() and restore the original device setting
before checking for pending messages.

The risk of repeated takeovers will become even bigger because
__nbcon_atomic_flush_pending_con() is going to release the console
ownership after each emitted record. This will be needed to prevent
hardlockup reports on other CPUs that are busy-waiting for
the context ownership, for example, in nbcon_reacquire_nobuf() or
__uart_port_nbcon_acquire().

The repeated takeovers break the output, for example:
[ 5042.650211][ T2220] Call Trace:
[ 5042.6511
** replaying previous printk message **
[ 5042.651192][ T2220] <TASK>
[ 5042.652160][ T2220] kunit_run_
** replaying previous printk message **
[ 5042.652160][ T2220] kunit_run_tests+0x72/0x90
[ 5042.653340][ T22
** replaying previous printk message **
[ 5042.653340][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5042.654628][ T2220] ? stack_trace_save+0x4d/0x70
[ 5042.6553
** replaying previous printk message **
[ 5042.655394][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
[ 5042.656713][ T2220] ? save_trace+0x5b/0x180
A more robust solution is to block the printk kthread entirely whenever
*any* CPU enters an emergency context. This ensures that critical messages
can be flushed without contention from the normal, non-atomic printing
path.
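
The idea can be illustrated with a minimal userspace sketch. It is not
the code from this patch: emergency_cnt, emergency_enter(),
emergency_exit() and kthread_may_print() are hypothetical names, and
C11 atomics stand in for the kernel's atomic API. It assumes a global
counter of CPUs that are currently in an emergency context, which the
printing thread checks before it is allowed to run.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter: number of CPUs currently in an emergency context. */
static atomic_int emergency_cnt;

/* Hypothetical helper: a CPU enters an emergency context. */
static void emergency_enter(void)
{
	atomic_fetch_add(&emergency_cnt, 1);
}

/* Hypothetical helper: a CPU leaves an emergency context. */
static void emergency_exit(void)
{
	int prev = atomic_fetch_sub(&emergency_cnt, 1);

	/* An exit without a matching enter would be a bug. */
	assert(prev > 0);
}

/*
 * Hypothetical wakeup check for the printing thread. The thread stays
 * blocked while any CPU is in an emergency context, even when records
 * are pending, so that the emergency path can flush them directly.
 */
static bool kthread_may_print(bool records_pending)
{
	if (atomic_load(&emergency_cnt) > 0)
		return false;

	return records_pending;
}
```

With nested or concurrent emergency sections, the thread stays blocked
until the last emergency_exit() brings the counter back to zero. A
counter is used instead of a boolean flag in this sketch because more
than one CPU can be in an emergency context at the same time.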
Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20250926124912.243464-2-pmladek@suse.com
[pmladek@suse.com: Added changes proposed by John Ogness]
Signed-off-by: Petr Mladek <pmladek@suse.com>
