Comment 27 for bug 1680502

Arlie Stephens (arlie) wrote :

Today's "hang" involved a zombie compiz consuming 100% of a CPU, along with an emacs instance consuming another 100%. Load average is around 11, and climbing. There are 22 zombies currently, up from 4 when I managed to get in with ssh.

I was in the process of installing software updates, using the GUI tool (rather than direct use of apt-get from the shell) when this happened.

Parts of the update still seem to be running.

arlie@ansuz$ ps -Fa -p1 -www
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 1 0 0 30034 4656 2 Apr28 ? 00:00:08 /sbin/init splash
root 25826 25775 0 1127 1712 0 07:57 pts/18 00:00:00 /bin/sh -e /var/lib/dpkg/info/udev.postrm upgrade 229-4ubuntu17
root 25843 25826 0 6542 1352 0 07:57 pts/18 00:00:00 systemctl --system daemon-reload
arlie 25846 22284 0 9342 3232 2 07:57 pts/4 00:00:00 ps -Fa -p1 -www

I'm wondering now whether my first guess of a kernel issue is dead wrong, and the root cause is actually compiz. Or perhaps we have multiple causes for the same basic symptom.

Here's the current crop of defunct processes:

arlie@ansuz$ ps aux | grep defunct
arlie 2488 0.0 0.0 0 0 ? Z<l Apr28 0:00 [pulseaudio] <defunct>
arlie 2503 0.8 0.0 0 0 ? Zsl Apr28 55:08 [compiz] <defunct>
arlie 2692 0.0 0.0 0 0 ? Z Apr28 0:00 [gconf-helper] <defunct>
root 22212 0.0 0.0 0 0 ? Z 07:42 0:00 [check-new-relea] <defunct>
sshd 24480 0.0 0.0 0 0 ? Z 07:52 0:00 [sshd] <defunct>
sshd 24489 0.0 0.0 0 0 ? Z 07:52 0:00 [sshd] <defunct>
sshd 24491 0.0 0.0 0 0 ? Z 07:52 0:00 [sshd] <defunct>
sshd 24494 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24496 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24500 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24504 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24508 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24510 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24514 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24518 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24523 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24532 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24538 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 24541 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 24543 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 25708 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 25711 0.0 0.0 0 0 ? Z 07:56 0:00 [sshd] <defunct>
arlie 26946 0.0 0.0 14228 964 pts/4 S+ 08:00 0:00 grep defunct
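
The listing above does not show parent PIDs, and a zombie stays around until its parent reaps it, so it may help to group the zombies by parent. A minimal sketch, assuming procps `ps` and POSIX `awk`; the function names `filter_zombies` and `count_by_parent` are made up for illustration:

```shell
# filter_zombies: reads "PID PPID STAT COMM" rows on stdin and keeps only
# rows whose state starts with Z, printing "PID PPID COMM".
# (Hypothetical helper name, not part of any tool.)
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# count_by_parent: reads "PID PPID COMM" rows on stdin and prints
# "PPID COUNT" for each parent, sorted numerically by PPID.
# (Hypothetical helper name, not part of any tool.)
count_by_parent() {
    awk '{ n[$2]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# On a live system (the trailing "=" suppresses the header row):
#   ps -eo pid=,ppid=,stat=,comm= | filter_zombies | count_by_parent
```

If most of the sshd zombies turn out to share one parent, and that parent is itself stuck, that would point at the parent rather than at each child.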

Systemd is in top's state "D" - just like last time. That's an uninterruptible sleep. It does not appear to have accumulated any CPU time since I got in via ssh.

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 2503 arlie 20 0 0 0 0 Z 100.0 0.0 61:12.34 compiz
    7 root 20 0 0 0 0 S 0.3 0.0 0:57.09 rcu_sched
    1 root 20 0 120136 4656 3204 D 0.0 0.1 0:08.94 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd

So the root cause might be systemd blocking on something.
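
One way to narrow down what a "D" state process is blocked on is to read its entries under /proc. The following is a rough sketch, not a definitive diagnostic: `show_blocked` is a made-up helper name, the second argument exists only so the function can be pointed at a different proc root, and the `stack` file is normally readable only by root and only on kernels that expose it.

```shell
# show_blocked PID [PROC_ROOT]
# Prints the state letter and wait channel of a process, then its kernel
# stack if that file is readable. Defaults: PID 1, /proc.
# (Hypothetical helper name, for illustration only.)
show_blocked() {
    pid="${1:-1}"
    root="${2:-/proc}"
    # State letter from the status file, e.g. "D" for uninterruptible sleep.
    state=$(awk '/^State:/ { print $2 }' "$root/$pid/status")
    # wchan names the kernel function the task is sleeping in, when the
    # kernel exposes it.
    wchan=$(cat "$root/$pid/wchan" 2>/dev/null)
    echo "pid=$pid state=$state wchan=${wchan:-unknown}"
    # Kernel stack: usually root-only, and absent on some kernels.
    if [ -r "$root/$pid/stack" ]; then
        cat "$root/$pid/stack"
    fi
}

# Example on a live system, run as root to get the stack as well:
#   show_blocked 1
```

Running this a few times and getting the same wait channel each time would suggest systemd is stuck in one place rather than making slow progress.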