system freezes occasionally for several minutes

Bug #1712858 reported by Bruno Haible
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Expired
Undecided
Unassigned
pulseaudio (Ubuntu)
Expired
Undecided
Unassigned

Bug Description

Occasionally, the system freezes: display does not show mouse move, clock display does not update itself, ssh impossible. But the machine remains pingable. This lasts for 2 to 20 minutes.

This is with Ubuntu 16.04.2, now upgraded to kernel version 4.8.0-46-generic.

At the end of the freeze, a bunch of messages get written to /var/log/syslog. Examples:

Aug 19 00:03:48 quimper rtkit-daemon[2276]: The canary thread is apparently starving. Taking action.
Aug 19 00:03:48 quimper rtkit-daemon[2276]: Demoting known real-time threads.
Aug 19 00:03:48 quimper rtkit-daemon[2276]: Successfully demoted thread 2607 of process 2275 (n/a).
Aug 19 00:03:48 quimper rtkit-daemon[2276]: Successfully demoted thread 2606 of process 2275 (n/a).
Aug 19 00:03:48 quimper rtkit-daemon[2276]: Successfully demoted thread 2604 of process 2275 (n/a).
Aug 19 00:03:48 quimper rtkit-daemon[2276]: Successfully demoted thread 2275 of process 2275 (n/a).
Aug 19 00:03:48 quimper rtkit-daemon[2276]: Demoted 4 threads.

Aug 19 08:06:19 quimper rtkit-daemon[2276]: The canary thread is apparently starving. Taking action.
Aug 19 08:06:19 quimper rtkit-daemon[2276]: Demoting known real-time threads.
Aug 19 08:06:19 quimper rtkit-daemon[2276]: Successfully demoted thread 2607 of process 2275 (n/a).
Aug 19 08:06:19 quimper rtkit-daemon[2276]: Successfully demoted thread 2606 of process 2275 (n/a).
Aug 19 08:06:19 quimper rtkit-daemon[2276]: Successfully demoted thread 2604 of process 2275 (n/a).
Aug 19 08:06:19 quimper rtkit-daemon[2276]: Successfully demoted thread 2275 of process 2275 (n/a).
Aug 19 08:06:19 quimper rtkit-daemon[2276]: Demoted 4 threads.
Aug 19 08:06:20 quimper systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Aug 19 08:06:20 quimper systemd[1]: systemd-udevd.service: Main process exited, code=killed, status=6/ABRT
Aug 19 08:06:20 quimper systemd[1]: systemd-udevd.service: Unit entered failed state.
Aug 19 08:06:20 quimper systemd[1]: systemd-udevd.service: Failed with result 'signal'.
Aug 19 08:06:20 quimper systemd[1]: systemd-udevd.service: Service has no hold-off time, scheduling restart.
Aug 19 08:06:20 quimper systemd[1]: Stopped udev Kernel Device Manager.
Aug 19 08:06:20 quimper systemd[1]: Starting udev Kernel Device Manager...
Aug 19 08:06:24 quimper systemd[1]: Started udev Kernel Device Manager.

Aug 19 15:04:55 quimper rtkit-daemon[2276]: The canary thread is apparently starving. Taking action.
Aug 19 15:04:55 quimper rtkit-daemon[2276]: Demoting known real-time threads.
Aug 19 15:04:55 quimper rtkit-daemon[2276]: Successfully demoted thread 2607 of process 2275 (n/a).
Aug 19 15:04:55 quimper rtkit-daemon[2276]: Successfully demoted thread 2606 of process 2275 (n/a).
Aug 19 15:04:55 quimper rtkit-daemon[2276]: Successfully demoted thread 2604 of process 2275 (n/a).
Aug 19 15:04:55 quimper rtkit-daemon[2276]: Successfully demoted thread 2275 of process 2275 (n/a).
Aug 19 15:04:55 quimper rtkit-daemon[2276]: Demoted 4 threads.
Aug 19 15:04:55 quimper systemd[1]: systemd-timesyncd.service: Watchdog timeout (limit 3min)!
Aug 19 15:04:59 quimper systemd[1]: systemd-timesyncd.service: Main process exited, code=killed, status=6/ABRT
Aug 19 15:04:59 quimper systemd[1]: systemd-timesyncd.service: Unit entered failed state.
Aug 19 15:04:59 quimper systemd[1]: systemd-timesyncd.service: Failed with result 'signal'.
Aug 19 15:05:00 quimper systemd[1]: systemd-timesyncd.service: Service has no hold-off time, scheduling restart.
Aug 19 15:05:00 quimper systemd[1]: Stopped Network Time Synchronization.
Aug 19 15:05:00 quimper systemd[1]: Starting Network Time Synchronization...
Aug 19 15:05:00 quimper systemd[1]: Started Network Time Synchronization.
Aug 19 15:05:00 quimper systemd-timesyncd[26894]: Synchronized to time server 91.189.89.198:123 (ntp.ubuntu.com).

Aug 20 06:03:28 quimper rtkit-daemon[2276]: The canary thread is apparently starving. Taking action.
Aug 20 06:03:28 quimper rtkit-daemon[2276]: Demoting known real-time threads.
Aug 20 06:03:28 quimper rtkit-daemon[2276]: Successfully demoted thread 2607 of process 2275 (n/a).
Aug 20 06:03:28 quimper rtkit-daemon[2276]: Successfully demoted thread 2606 of process 2275 (n/a).
Aug 20 06:03:28 quimper rtkit-daemon[2276]: Successfully demoted thread 2604 of process 2275 (n/a).
Aug 20 06:03:28 quimper rtkit-daemon[2276]: Successfully demoted thread 2275 of process 2275 (n/a).
Aug 20 06:03:28 quimper rtkit-daemon[2276]: Demoted 4 threads.

Aug 20 15:09:22 quimper rtkit-daemon[2276]: The canary thread is apparently starving. Taking action.
Aug 20 15:09:26 quimper systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Aug 20 15:09:26 quimper rtkit-daemon[2276]: Demoting known real-time threads.
Aug 20 15:09:26 quimper systemd[1]: systemd-udevd.service: Main process exited, code=killed, status=6/ABRT
Aug 20 15:09:26 quimper rtkit-daemon[2276]: Successfully demoted thread 2607 of process 2275 (n/a).
Aug 20 15:09:26 quimper systemd[1]: systemd-udevd.service: Unit entered failed state.
Aug 20 15:09:26 quimper rtkit-daemon[2276]: Successfully demoted thread 2606 of process 2275 (n/a).
Aug 20 15:09:26 quimper systemd[1]: systemd-udevd.service: Failed with result 'signal'.
Aug 20 15:09:26 quimper rtkit-daemon[2276]: Successfully demoted thread 2604 of process 2275 (n/a).
Aug 20 15:09:26 quimper systemd[1]: systemd-udevd.service: Service has no hold-off time, scheduling restart.
Aug 20 15:09:26 quimper rtkit-daemon[2276]: Successfully demoted thread 2275 of process 2275 (n/a).
Aug 20 15:09:26 quimper systemd[1]: Stopped udev Kernel Device Manager.
Aug 20 15:09:26 quimper rtkit-daemon[2276]: Demoted 4 threads.
Aug 20 15:09:26 quimper systemd[1]: Starting udev Kernel Device Manager...
Aug 20 15:09:27 quimper systemd[1]: Started udev Kernel Device Manager.

The process the rtkit-daemon talks about is, in all cases, /usr/bin/pulseaudio.

$ ps auxw | grep 2275
haible 2275 0.0 0.0 519164 8576 ? Sl Jul11 3:47 /usr/bin/pulseaudio --start --log-target=syslog

I have not used 'pavucontrol' on this machine in 3 months.

Info:
$ lsb_release -rd
Description: Ubuntu 16.04.2 LTS
Release: 16.04
$ apt-cache policy pulseaudio
pulseaudio:
  Installed: 1:8.0-0ubuntu3.2
  Candidate: 1:8.0-0ubuntu3.3
  Version table:
     1:8.0-0ubuntu3.3 500
        500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
 *** 1:8.0-0ubuntu3.2 100
        100 /var/lib/dpkg/status
     1:8.0-0ubuntu3 500
        500 http://de.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

It appears you are missing a few months of updates. In particular, missing a bug fix update:
  pulseaudio 1:8.0-0ubuntu3.3

Please update your system using these commands:
  sudo apt update
  sudo apt full-upgrade

After you have updated and rebooted, check if the problem continues. If the problem persists then please run this command to send more information:
  apport-collect 1712858

Changed in pulseaudio (Ubuntu):
status: New → Incomplete
Revision history for this message
Bruno Haible (bruno-clisp) wrote :

I have upgraded to pulseaudio 1:8.0-0ubuntu3.3.
At the same time, my system got upgraded to use a 4.10 kernel (instead of a 4.8 kernel).

The effect: The lockups got worse:
  - machine not pingable during the lockup,
  - freeze for 2 hours or more; I decided to reboot.

Now I downgraded to a 4.8 kernel.

The effect: The lockups are still there, but are temporary:
  - machine remains pingable,
  - freeze for only a couple of minutes.

The latest such freeze (at 18:57) led to these syslog messages:

Sep 11 18:58:55 rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
Sep 11 18:58:55 rtkit-daemon[2231]: Demoting known real-time threads.
Sep 11 18:58:56 systemd[1]: systemd-timesyncd.service: Watchdog timeout (limit 3min)!
Sep 11 18:58:56 rtkit-daemon[2231]: Successfully demoted thread 2612 of process 2230 (n/a).
Sep 11 18:58:56 rtkit-daemon[2231]: Successfully demoted thread 2611 of process 2230 (n/a).
Sep 11 18:58:56 rtkit-daemon[2231]: Successfully demoted thread 2610 of process 2230 (n/a).
Sep 11 18:58:56 rtkit-daemon[2231]: Successfully demoted thread 2230 of process 2230 (n/a).
Sep 11 18:58:56 rtkit-daemon[2231]: Demoted 4 threads.
Sep 11 18:58:57 systemd[1]: systemd-timesyncd.service: Main process exited, code=killed, status=6/ABRT
Sep 11 18:58:57 systemd[1]: systemd-timesyncd.service: Unit entered failed state.
Sep 11 18:58:57 systemd[1]: systemd-timesyncd.service: Failed with result 'signal'.
Sep 11 18:58:57 systemd[1]: systemd-timesyncd.service: Service has no hold-off time, scheduling restart.
Sep 11 18:58:57 systemd[1]: Stopped Network Time Synchronization.
Sep 11 18:58:58 systemd[1]: Starting Network Time Synchronization...
Sep 11 18:59:00 systemd[1]: Started Network Time Synchronization.

Process 2230 is (again) pulseaudio.

Changed in pulseaudio (Ubuntu):
status: Incomplete → New
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This sounds like it might be the pulseaudio process getting stuck in a system call that's particular to your sound hardware/driver. Please try running 'lspci -k' and pasting the output here.

Also, the problem might not be audio-related at all. It might be some other part of your kernel that's hanging and pulseaudio is just mentioned in the syslog because it's the only one with realtime threads. That doesn't mean those threads caused the problem necessarily.

Please also try the latest live image (just boot from USB and leave it running for a while):
  http://cdimage.ubuntu.com/daily-live/current/

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in pulseaudio (Ubuntu):
status: New → Incomplete
Revision history for this message
Bruno Haible (bruno-clisp) wrote :

> the problem might not be audio-related at all. It might be some other part of your kernel that's hanging
This is quite possible indeed.

Revision history for this message
Bruno Haible (bruno-clisp) wrote :
Download full text (3.6 KiB)

A measure to reduce the frequency of the hangs is to use little swap space.

Here's an extract of my system log:
syslog.7.gz:Sep 28 19:20:00 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.7.gz:Sep 28 22:19:03 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.7.gz:Sep 29 04:47:32 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.7.gz:Sep 29 07:16:44 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 29 13:35:04 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 29 13:35:04 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 29 17:27:41 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 29 21:43:55 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 29 21:43:55 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 30 03:30:53 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.6.gz:Sep 30 07:50:10 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.5.gz:Sep 30 12:48:52 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.5.gz:Sep 30 12:48:55 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.5.gz:Sep 30 22:07:09 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.5.gz:Oct 1 04:49:01 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.4.gz:Oct 1 11:07:24 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.4.gz:Oct 1 13:33:58 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 2 10:59:56 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 2 13:52:15 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 2 13:52:16 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 2 21:44:36 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 3 01:08:25 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 3 09:26:54 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 3 13:30:52 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 3 17:25:08 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.
syslog.3.gz:Oct 3 21:44:55 quimper rtkit-daemon[2231]: The canary thread is apparently starving. Taking action.

Up until Oct 3, I got 2 to 6 hangs per day (each lasting minutes or hours). In these times I was using between 6 and 10 GB of swap space (in additio...

Read more...

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Yes, swapping is the responsibility of the kernel. So if your system occasionally thrashing then that could cause this bug per comment #3. It would also explain why the system is able to recover after some minutes...

Please continue testing but it sounds like the memory hog is the real bug (plasmashell or others). But that said, using several GB of "virtual memory" (VSZ in ps) is not a bug. That's just address space and does not imply any real memory or swap usage. Please monitor RSS (resident set size) instead to see what's impacting the hardware most (run 'ps aux').

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for pulseaudio (Ubuntu) because there has been no activity for 60 days.]

Changed in pulseaudio (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.