Today's "hang" involved a zombie compiz consuming 100% of a CPU, along with an emacs instance consuming another 100%. Load average is around 11, and climbing. There are only 22 zombies currently, but there were just 4 when I first managed to get in via ssh.
I was in the process of installing software updates, using the GUI tool (rather than direct use of apt-get from the shell) when this happened.
Parts of the update still seem to be running.
arlie@ansuz$ ps -Fa -p1 -www
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 1 0 0 30034 4656 2 Apr28 ? 00:00:08 /sbin/init splash
root 25826 25775 0 1127 1712 0 07:57 pts/18 00:00:00 /bin/sh -e /var/lib/dpkg/info/udev.postrm upgrade 229-4ubuntu17
root 25843 25826 0 6542 1352 0 07:57 pts/18 00:00:00 systemctl --system daemon-reload
arlie 25846 22284 0 9342 3232 2 07:57 pts/4 00:00:00 ps -Fa -p1 -www
I'm wondering now whether my first guess of a kernel issue is dead wrong, and the root cause is actually compiz. Or perhaps we have multiple causes, for the same basic symptom.
Here's the current crop of defunct processes:
arlie@ansuz$ ps aux | grep defunct
arlie 2488 0.0 0.0 0 0 ? Z<l Apr28 0:00 [pulseaudio] <defunct>
arlie 2503 0.8 0.0 0 0 ? Zsl Apr28 55:08 [compiz] <defunct>
arlie 2692 0.0 0.0 0 0 ? Z Apr28 0:00 [gconf-helper] <defunct>
root 22212 0.0 0.0 0 0 ? Z 07:42 0:00 [check-new-relea] <defunct>
sshd 24480 0.0 0.0 0 0 ? Z 07:52 0:00 [sshd] <defunct>
sshd 24489 0.0 0.0 0 0 ? Z 07:52 0:00 [sshd] <defunct>
sshd 24491 0.0 0.0 0 0 ? Z 07:52 0:00 [sshd] <defunct>
sshd 24494 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24496 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24500 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24504 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24508 0.0 0.0 0 0 ? Z 07:53 0:00 [sshd] <defunct>
sshd 24510 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24514 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24518 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24523 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24532 0.0 0.0 0 0 ? Z 07:54 0:00 [sshd] <defunct>
sshd 24538 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 24541 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 24543 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 25708 0.0 0.0 0 0 ? Z 07:55 0:00 [sshd] <defunct>
sshd 25711 0.0 0.0 0 0 ? Z 07:56 0:00 [sshd] <defunct>
arlie 26946 0.0 0.0 14228 964 pts/4 S+ 08:00 0:00 grep defunct
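A zombie lingers until its parent reaps it with wait(), so grouping the defunct entries by parent PID should show which parent has stopped reaping. The following is only a minimal sketch, assuming procps ps and awk are available; the function name zombies_by_parent is made up for illustration and is not part of any tool.

```shell
# Count zombie processes per parent PID.
# Input (stdin): lines of "PPID STAT COMMAND", e.g. from
#   ps -eo ppid=,stat=,comm=
# Output: "COUNT PPID" lines, largest count first.
zombies_by_parent() {
    awk '$2 ~ /^Z/ { count[$1]++ }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Usage (hypothetical helper name, real ps options):
#   ps -eo ppid=,stat=,comm= | zombies_by_parent
```

If most of the zombies turn out to share PID 1 as their parent, that would fit the idea that init itself is stuck rather than each daemon misbehaving separately.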
Systemd is in top's state "D" - just like last time. That's an uninterruptible sleep. It does not appear to have accumulated any CPU time since I got in via ssh.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2503 arlie 20 0 0 0 0 Z 100.0 0.0 61:12.34 compiz
7 root 20 0 0 0 0 S 0.3 0.0 0:57.09 rcu_sched
1 root 20 0 120136 4656 3204 D 0.0 0.1 0:08.94 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
So the root cause might be systemd blocking on something.
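To narrow down what a "D" process is blocked on, ps can report the kernel wait channel (the wchan column). This is a rough sketch, assuming procps ps; blocked_procs is a hypothetical helper name, and the wchan value may show up as "-" or be unhelpful on some kernels.

```shell
# List processes in uninterruptible sleep with their wait channel.
# Input (stdin): output of
#   ps -eo pid,stat,wchan:32,comm
# including its header line.
# Output: "PID WCHAN COMMAND" for every process whose state starts with D.
# Note: only the first word of the command name is printed.
blocked_procs() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}

# Usage (hypothetical helper name, real ps options):
#   ps -eo pid,stat,wchan:32,comm | blocked_procs
```

Where the kernel exposes it, reading /proc/1/stack as root may give a fuller kernel stack for PID 1 than the single wchan symbol.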