[xenial/armhf] lxc-stop --kill hangs forever, container pid 1 in 'D' state

Bug #1536021 reported by Martin Pitt
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
lxcfs (Ubuntu) | Fix Released | High | Unassigned |

Bug Description

Since I upgraded our armhf autopkgtest boxes from wily to xenial, I very often get eternal hangs on lxc-stop:

adt-virt-lxc-egctlo RUNNING 10.0.3.154 - - NO

root 15766 0.0 0.0 5044 1488 ? S Jan19 0:00 lxc-stop --kill --name adt-virt-lxc-egctlo

I can still attach to the container, and it seems pid1 is in some "uninterruptible deep kernel sleep":

$ sudo lxc-attach -n adt-virt-lxc-egctlo ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 5060 2344 ? Ds Jan19 0:00 /sbin/init
root 230 0.0 0.1 12112 2224 ? Ss Jan19 0:00 /lib/systemd/systemd-journald
root 263 0.0 0.0 3372 1060 ? Ss Jan19 0:00 /sbin/dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp
root 321 0.0 0.0 4896 912 ? Ss Jan19 0:00 /usr/sbin/cron -f
syslog 329 0.0 0.0 31148 1424 ? Ssl Jan19 0:00 /usr/sbin/rsyslogd -n
message+ 349 0.0 0.0 4860 1540 ? Ss Jan19 0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopid
root 358 0.0 0.0 0 0 ? Zs Jan19 0:00 [systemd-logind] <defunct>
root 384 0.0 0.0 3848 692 pts/3 Ss+ Jan19 0:00 /sbin/agetty --noclear --keep-baud pts/3 115200 38400 9600 vt220
root 386 0.0 0.0 3848 692 pts/0 Ss+ Jan19 0:00 /sbin/agetty --noclear --keep-baud pts/0 115200 38400 9600 vt220
root 389 0.0 0.0 3848 692 pts/1 Ss+ Jan19 0:00 /sbin/agetty --noclear --keep-baud pts/1 115200 38400 9600 vt220
root 391 0.0 0.0 0 0 ? Zs Jan19 0:00 [agetty] <defunct>
root 393 0.0 0.0 5064 1028 ? Ss Jan19 0:00 (agetty)
root 4907 0.0 0.0 5652 1176 ? S Jan19 0:00 reboot
root 4917 0.0 0.0 0 0 ? Zs Jan19 0:00 [ondemand] <defunct>
root 5747 0.0 0.0 0 0 ? Z Jan19 0:00 [bash] <defunct>
root 5748 0.0 0.0 0 0 ? Z Jan19 0:00 [bash] <defunct>
root 7168 0.0 0.0 0 0 ? Z Jan19 0:00 [dkms] <defunct>
root 8516 0.0 0.0 0 0 ? Z Jan19 0:00 [dkms] <defunct>
root 21174 0.0 0.0 6788 1304 pts/3 R+ 07:20 0:00 ps aux

The journal in the container still works, but does not show anything interesting. systemctl hangs because pid 1 is stuck in this 'D' state, and for the same reason stracing pid 1 is useless.
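
Since strace cannot help with a task in uninterruptible sleep, the wchan column of ps is one place to look for the kernel function a task is blocked in. The following is only a minimal sketch, assuming procps ps on the host; filter_dstate and list_dstate are hypothetical helper names, not part of lxc or procps, and this was not run as part of this report:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from lxc or procps.

# Keep only rows whose second column (STAT) starts with "D",
# i.e. tasks in uninterruptible sleep. The header row is dropped
# because its second column is the literal word "STAT".
filter_dstate() {
    awk '$2 ~ /^D/'
}

# List D-state tasks with the kernel wait channel they are sleeping in.
# Run on the host so that container tasks are visible as well.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Calling list_dstate as root on the host while the hang is in progress should print one line per blocked task. Whether the wchan column resolves to a useful symbol depends on the kernel configuration.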

These boxes are still running the trusty kernel 3.13, as newer kernels don't boot on those boxes (the block devices are missing, probably a missing block driver?), so this regression is not due to a kernel change.

So this is somewhere between lxc, lxcfs, systemd, and cgmanager. I'll bisect these packages over the next few days to find out which, as so far I don't have a way to reproduce this reliably.

Martin Pitt (pitti) wrote :

As another data point, I don't see this on the s390x boxes which are also running xenial userspace with the same LXC setup, but a 4.3 kernel.

summary: - [xenial/armhf] lxc-stop --kill hangs forever
+ [xenial/armhf] lxc-stop --kill hangs forever, container pid 1 in 'D' state
Martin Pitt (pitti) wrote :

Restarting lxcfs helps: the hanging container pid 1 and lxc-stop finally finish, and the container stops.

affects: lxc (Ubuntu) → lxcfs (Ubuntu)
Changed in lxcfs (Ubuntu):
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

I assume this has been fixed by now by the lxcfs restart ability. Closing.

Changed in lxcfs (Ubuntu):
status: New → Fix Released