zap_pid_ns_processes() gets stuck in a busy loop when zombie processes are in namespace

Bug #2077044 reported by Matthew Ruffell
Affects               Status         Importance  Assigned to      Milestone
linux (Ubuntu)        Fix Released   Undecided   Unassigned
linux (Ubuntu) Jammy  Fix Committed  Medium      Matthew Ruffell
linux (Ubuntu) Noble  Fix Committed  Medium      Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/2077044

[Impact]

zap_pid_ns_processes() can get stuck in a busy loop, which can hang the system because RCU never makes progress.

zap_pid_ns_processes() busy-loops calling kernel_wait4() on the children of the namespace init task, waiting for them to exit. The problem is that it clears TIF_SIGPENDING but not TIF_NOTIFY_SIGNAL, so signal_pending() keeps returning true and kernel_wait4() returns immediately instead of sleeping. The loop then spins forever: a child is asleep in synchronize_rcu() and is never woken, because the parent is stuck in the busy loop and never calls schedule() or rcu_note_context_switch(), so the RCU grace period never completes.

The resulting soft lockup report is:

Watchdog: BUG: soft lockup - CPU#3 stuck for 276s! [rcudeadlock:1836]
CPU: 3 PID: 1836 Comm: rcudeadlock Tainted: G L 5.15.0-117-generic #127-Ubuntu
RIP: 0010:_raw_read_lock+0xe/0x30
Code: f0 0f b1 17 74 08 31 c0 5d c3 cc cc cc cc b8 01 00 00 00 5d c3 cc cc cc cc 0f 1f 00 0f 1f 44 00 00 b8 00 02 00 00 f0 0f c1 07 <a9> ff 01 00 00 75 05 c3 cc cc cc cc 55 48 89 e5 e8 4d 79 36 ff 5d
CR2: 000000c0002b0000
Call Trace:
 <IRQ>
 ? show_trace_log_lvl+0x1d6/0x2ea
 ? show_trace_log_lvl+0x1d6/0x2ea
 ? kernel_wait4+0xaf/0x150
 ? show_regs.part.0+0x23/0x29
 ? show_regs.cold+0x8/0xd
 ? watchdog_timer_fn+0x1be/0x220
 ? lockup_detector_update_enable+0x60/0x60
 ? __hrtimer_run_queues+0x107/0x230
 ? read_hv_clock_tsc_cs+0x9/0x30
 ? hrtimer_interrupt+0x101/0x220
 ? hv_stimer0_isr+0x20/0x30
 ? __sysvec_hyperv_stimer0+0x32/0x70
 ? sysvec_hyperv_stimer0+0x7b/0x90
 </IRQ>
 <TASK>
 ? asm_sysvec_hyperv_stimer0+0x1b/0x20
 ? _raw_read_lock+0xe/0x30
 ? do_wait+0xa0/0x310
 kernel_wait4+0xaf/0x150
 ? thread_group_exited+0x50/0x50
 zap_pid_ns_processes+0x111/0x1a0
 forget_original_parent+0x348/0x360
 exit_notify+0x4a/0x210
 do_exit+0x24f/0x3c0
 do_group_exit+0x3b/0xb0
 get_signal+0x150/0x900
 arch_do_signal_or_restart+0xde/0x100
 ? __x64_sys_futex+0x78/0x1e0
 exit_to_user_mode_loop+0xc4/0x160
 exit_to_user_mode_prepare+0xa3/0xb0
 syscall_exit_to_user_mode+0x27/0x50
 ? x64_sys_call+0x1022/0x1fa0
 do_syscall_64+0x63/0xb0
 ? __io_uring_add_tctx_node+0x111/0x1a0
 ? fput+0x13/0x20
 ? __do_sys_io_uring_enter+0x10d/0x540
 ? __smp_call_single_queue+0x59/0x90
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x2c/0x50
 ? x64_sys_call+0x1819/0x1fa0
 ? do_syscall_64+0x63/0xb0
 ? try_to_wake_up+0x200/0x5a0
 ? wake_up_q+0x50/0x90
 ? futex_wake+0x159/0x190
 ? do_futex+0x162/0x1f0
 ? __x64_sys_futex+0x78/0x1e0
 ? switch_fpu_return+0x4e/0xc0
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x2c/0x50
 ? x64_sys_call+0x1022/0x1fa0
 ? do_syscall_64+0x63/0xb0
 ? do_user_addr_fault+0x1e7/0x670
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? irqentry_exit_to_user_mode+0xe/0x20
 ? irqentry_exit+0x1d/0x30
 ? exc_page_fault+0x89/0x170
 entry_SYSCALL_64_after_hwframe+0x6c/0xd6
 </TASK>

There is no known workaround.

[Fix]

This was fixed upstream in 6.10-rc5 by the commit below:

commit 7fea700e04bd3f424c2d836e98425782f97b494e
Author: Oleg Nesterov <email address hidden>
Date: Sat Jun 8 14:06:16 2024 +0200
Subject: zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fea700e04bd3f424c2d836e98425782f97b494e

This patch has made its way to upstream stable, and is already applied to Ubuntu
kernels.
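For reference, the upstream change amounts to a one-line addition to the kernel_wait4() retry loop in zap_pid_ns_processes() (a paraphrased sketch of the commit above, not the verbatim patch):

```diff
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ zap_pid_ns_processes()
 	do {
 		clear_thread_flag(TIF_SIGPENDING);
+		clear_thread_flag(TIF_NOTIFY_SIGNAL);
 		rc = kernel_wait4(-1, NULL, __WALL, NULL);
 	} while (rc != -ECHILD);
```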

[Testcase]

There are two reproducers for this issue, both courtesy of Rachel Menge, from her GitHub repo:

https://github.com/rlmenge/rcu-soft-lock-issue-repro

Start a Jammy or Noble VM on Azure; a D8sV3 instance is plenty.

$ git clone https://github.com/rlmenge/rcu-soft-lock-issue-repro.git

npm repro:

Install Docker.

$ sudo docker run telescope.azurecr.io/issue-repro/zombie:v1.1.11
$ ./rcu-npm-repro.sh

go repro:

$ go mod init rcudeadlock.go
$ go mod tidy
$ CGO_ENABLED=0 go build -o ./rcudeadlock ./
$ sudo ./rcudeadlock

Watch dmesg. After a few minutes, you should see the soft lockup report from the [Impact] section.

[Where problems can occur]

The fix clears TIF_NOTIFY_SIGNAL (in addition to TIF_SIGPENDING) in the exiting namespace init task, so that signal_pending() returns false and kernel_wait4() can block instead of busy-waiting.
This change should work as intended.

If a regression were to occur, it could potentially affect all processes in namespaces.

[Other Info]

Upstream mailing list discussion:
https://lore<email address hidden>/T/

Tags: sts
Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Jammy):
status: New → Fix Committed
Changed in linux (Ubuntu Noble):
status: New → Fix Committed
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Changed in linux (Ubuntu Noble):
importance: Undecided → Medium
Changed in linux (Ubuntu Jammy):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in linux (Ubuntu Noble):
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: sts
Revision history for this message
Matthew Ruffell (mruffell) wrote :

This should land in 5.15.0-121-generic and 6.8.0-44-generic.
