desktop freeze, htop shows kernel thread at 100% cpu

Bug #1526657 reported by tojb
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

Every few hours the desktop freezes. I can still log in via ssh, and htop shows one process running at 100% CPU. The process is shown in red, indicating a kernel thread. There is no response to ctrl-alt-f1 or to service lightdm restart.

Graphics card is NVIDIA NVS 310.

I've tried the NVIDIA drivers in various versions up to nvidia-current, and also nouveau (current bug report is with logfiles etc generated after a crash while running nouveau).

Xorg.log shows a seg fault:

[171809.988] (EE)
[171809.988] (EE) Backtrace:
[171809.988] (EE) 0: /usr/bin/X (xorg_backtrace+0x48) [0x7fc8b9ccd848]
[171809.988] (EE) 1: /usr/bin/X (0x7fc8b9b24000+0x1ad539) [0x7fc8b9cd1539]
[171809.988] (EE) 2: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fc8b8c20000+0x10340) [0x7fc8b8c30340]
[171809.988] (EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (0x7fc8b763f000+0x97ff0) [0x7fc8b76d6ff0]
[171809.988] (EE) 4: /usr/lib/xorg/modules/libexa.so (0x7fc8b295f000+0x5356) [0x7fc8b2964356]
[171809.988] (EE) 5: /usr/lib/xorg/modules/libexa.so (0x7fc8b295f000+0x57fb) [0x7fc8b29647fb]
[171809.988] (EE) 6: /usr/lib/xorg/modules/libexa.so (0x7fc8b295f000+0x7e22) [0x7fc8b2966e22]
[171809.988] (EE) 7: /usr/lib/xorg/modules/libexa.so (0x7fc8b295f000+0x1186a) [0x7fc8b297086a]
[171809.988] (EE) 8: /usr/lib/xorg/modules/libexa.so (0x7fc8b295f000+0xe236) [0x7fc8b296d236]
[171809.988] (EE) 9: /usr/lib/xorg/modules/libexa.so (0x7fc8b295f000+0xc028) [0x7fc8b296b028]
[171809.988] (EE) 10: /usr/bin/X (0x7fc8b9b24000+0x133f56) [0x7fc8b9c57f56]
[171809.988] (EE) 11: /usr/bin/X (0x7fc8b9b24000+0x12a3fe) [0x7fc8b9c4e3fe]
[171809.988] (EE) 12: /usr/bin/X (0x7fc8b9b24000+0x55f0e) [0x7fc8b9b79f0e]
[171809.988] (EE) 13: /usr/bin/X (0x7fc8b9b24000+0x59d9a) [0x7fc8b9b7dd9a]
[171809.988] (EE) 14: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7fc8b7660ec5]
[171809.988] (EE) 15: /usr/bin/X (0x7fc8b9b24000+0x451ee) [0x7fc8b9b691ee]
[171809.988] (EE)
[171809.988] (EE) Segmentation fault at address 0x0
[171809.988] (EE)
Fatal server error:
[171809.988] (EE) Caught signal 11 (Segmentation fault). Server aborting
[171809.988] (EE)

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-71-generic 3.13.0-71.114
ProcVersionSignature: Ubuntu 3.13.0-71.114-generic 3.13.11-ckt29
Uname: Linux 3.13.0-71-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/pcmC1D7p', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Dec 16 08:30:46 2015
DistributionChannelDescriptor:
 # This is a distribution channel descriptor
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-precise-amd64-20130203-1
InstallationDate: Installed on 2015-10-17 (59 days ago)
InstallationMedia: Ubuntu 12.04 "Precise" - Build amd64 LIVE Binary 20130203-13:50
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
MachineType: Dell Inc. Precision Tower 7910
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-71-generic root=UUID=8fb5292d-6961-420e-ac9b-3f468520a7de ro plymouth:debug splash quiet drm.debug=0xe
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-71-generic N/A
 linux-backports-modules-3.13.0-71-generic N/A
 linux-firmware 1.127.19
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to trusty on 2015-11-02 (43 days ago)
dmi.bios.date: 09/26/2015
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A09
dmi.board.name: 0215PR
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA09:bd09/26/2015:svnDellInc.:pnPrecisionTower7910:pvr:rvnDellInc.:rn0215PR:rvrA03:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision Tower 7910
dmi.sys.vendor: Dell Inc.

tojb (the-real-josh-berryman) wrote :

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc5-wily

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete

tojb (the-real-josh-berryman) wrote : Re: [Bug 1526657] Re: desktop freeze, htop shows kernel thread at 100% cpu

Did this issue start happening after an update/upgrade? Was there a
prior kernel version where you were not having this particular problem?
>>It's a new machine.
>>I located another machine with exactly the same hardware, Ubuntu build and
drivers which shows no problems: therefore this is a hardware bug
or, at worst, a bad interaction of the OS with damaged hardware.


Alberto Sentieri (22t) wrote :

I have exactly the same problem on both an Ubuntu 14.04 system that I use at work and a Debian Jessie system that I use at home, and the software is up to date. Both systems have a "VGA compatible controller: NVIDIA Corporation GF119 [NVS 310] (rev a1)" and two monitors. After the problem shows up, I can log in to my system through ssh, but the console is dead.

The problem never happens when I am using the system, only when the screen saver takes control and the monitors turn off. I am not familiar with the guts of the screen saver or with why the monitors turn off: one possibility is the video signal going away during the screen saver operation; another is the computer explicitly asking the monitor to turn off. Either way, the fact is that the monitor turns off and that may cause the problem.

I found a workaround at work which virtually made the problem go away: the special monitors I use at work have an option to never turn off, even when there is no video signal coming from the computer. When I activate this monitor feature, the problem goes away. At home, however, my monitors do not have this feature and I have the dead console problem every few days. At work, in the last 6 months (since I stopped letting my monitors turn off) I have had the dead console problem just once, and that could be a different problem.

So, my conclusion is that the problem happens exactly when the screen saver takes control and the monitor turns off. Hope that helps you guys solve the problem. My current Debian kernel version is “Linux version 3.16.0-4-amd64 (<email address hidden>) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt20-1+deb8u2 (2016-01-02)”
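
One way to test the screen-blanking hypothesis above on monitors that lack a "never turn off" option might be to stop X from blanking the screen or requesting monitor power saving. The following is only a sketch, assuming the standard xset utility is installed and an X session is reachable via DISPLAY; the function name disable_blanking is hypothetical, and a desktop environment may re-enable these settings on its own.

```python
import subprocess

# Assumption: the standard `xset` utility is installed and an X session is
# reachable through the DISPLAY environment variable.
XSET_COMMANDS = [
    ["xset", "s", "off"],      # disable the X screen saver timeout
    ["xset", "s", "noblank"],  # prefer a pattern over blanking the video device
    ["xset", "-dpms"],         # disable DPMS (monitor power saving)
]


def disable_blanking(run=subprocess.run):
    """Run each xset command in turn; return the commands that were issued.

    `run` is injectable so the function can be exercised without an X server.
    """
    issued = []
    for cmd in XSET_COMMANDS:
        run(cmd, check=True)  # raises CalledProcessError if xset fails
        issued.append(" ".join(cmd))
    return issued
```

Calling disable_blanking() from inside the desktop session and then leaving the machine idle would show whether the freeze still occurs when the monitors stay on. It does not address the underlying fault.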

Alberto Sentieri (22t) wrote :

I should also mention that I have frequently updated both Debian and Ubuntu and that I have seen the same problem with all the updates.

tojb (the-real-josh-berryman) wrote :

>>So, my conclusion is that the problem happens exactly when the screen saver takes control and the monitor turns off.

For me it was sporadically associated with any graphics-related operation, e.g. zooming in Google Earth or opening a 3D molecule viewer. I have swapped out the NVS 310 graphics card for an older FX 1800 and have had only one crash of this type since then, instead of the many per day that I was getting before.

Alberto Sentieri (22t) wrote :

Well, I am not a heavy user of graphical software. Most of the time I am using gvim to edit program files. One of the times I had the problem I captured dmesg. Below you will see what I got (through ssh, I guess).

[Sat Nov 21 12:24:38 2015] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x00024e0000 [PAGE_NOT_PRESENT] from PGRAPH/GPC0/TEX on channel 0x001fb0e000 [gnome-shell[1611]]
[Sat Nov 21 12:24:38 2015] nouveau E[ PFIFO][0000:01:00.0] PGRAPH engine fault on channel 5, recovering...
[Sat Nov 21 12:24:38 2015] nouveau E[ PGRAPH][0000:01:00.0] TRAP ch 5 [0x001fb0e000 gnome-shell[1611]]
[Sat Nov 21 12:24:38 2015] nouveau E[ PGRAPH][0000:01:00.0] GPC0/TPC0/TEX: 0x80000049
[Sat Nov 21 12:26:22 2015] nouveau E[Xorg[1082]] failed to idle channel 0xcccc0000 [Xorg[1082]]
[Sat Nov 21 12:26:37 2015] nouveau E[Xorg[1082]] failed to idle channel 0xcccc0000 [Xorg[1082]]
[Sat Nov 21 12:26:37 2015] nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000001b000 [PAGE_NOT_PRESENT] from PFIFO/BAR_READ on channel 0x001fc7e000 [unknown]
[Sat Nov 21 12:27:26 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0002 [Xorg[13885]]
[Sat Nov 21 12:27:41 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0002 [Xorg[13885]]
[Sat Nov 21 12:27:56 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0002 [Xorg[13885]]
[Sat Nov 21 12:27:57 2015] nouveau E[gnome-shell[1611]] failed to idle channel 0xcccc0000 [gnome-shell[1611]]
[Sat Nov 21 12:28:11 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0002 [Xorg[13885]]
[Sat Nov 21 12:28:12 2015] nouveau E[gnome-shell[1611]] failed to idle channel 0xcccc0000 [gnome-shell[1611]]
[Sat Nov 21 12:28:26 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0002 [Xorg[13885]]
[Sat Nov 21 12:28:41 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0002 [Xorg[13885]]
[Sat Nov 21 12:28:56 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0001 [Xorg[13885]]
[Sat Nov 21 12:29:11 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0001 [Xorg[13885]]
[Sat Nov 21 12:29:26 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0001 [Xorg[13885]]
[Sat Nov 21 12:29:41 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0000 [Xorg[13885]]
[Sat Nov 21 12:29:56 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0000 [Xorg[13885]]
[Sat Nov 21 12:30:11 2015] nouveau E[Xorg[13885]] failed to idle channel 0xcccc0000 [Xorg[13885]]
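
To compare captures like the one above across machines, it may help to reduce the dmesg output to the first PFIFO read fault and a per-process count of the "failed to idle channel" messages that follow it. This is a minimal sketch that assumes the line format shown in this excerpt; the function names are hypothetical and other nouveau versions may log differently.

```python
import re
from collections import Counter

# Matches lines such as:
# [Sat Nov 21 12:26:22 2015] nouveau E[Xorg[1082]] failed to idle channel 0xcccc0000 [Xorg[1082]]
IDLE_RE = re.compile(
    r"nouveau E\[(?P<proc>[\w-]+)\[(?P<pid>\d+)\]\] "
    r"failed to idle channel (?P<chan>0x[0-9a-f]+)"
)


def count_idle_failures(lines):
    """Count 'failed to idle channel' messages per (process name, pid)."""
    counts = Counter()
    for line in lines:
        m = IDLE_RE.search(line)
        if m:
            counts[(m.group("proc"), int(m.group("pid")))] += 1
    return counts


def first_fault(lines):
    """Return the first line reporting a PFIFO read fault, or None."""
    for line in lines:
        if "PFIFO" in line and "read fault" in line:
            return line
    return None
```

Feeding it the lines of a saved capture (for example open("dmesg.txt").read().splitlines(), where dmesg.txt is a hypothetical file name) gives the process that triggered the first fault and which processes were stuck afterwards.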

tojb (the-real-josh-berryman) wrote :

It would be nice to get a /var/log/Xorg.log capture, just to show that it is really the same thing.

Alberto Sentieri (22t) wrote :

Below is the information from Xorg.0.log.old (for the lockup described above, on Nov. 21st):

[141735.156] nouveau_exa_download_from_screen:295 - falling back to memcpy ignores tiling
[141735.188] (EE)
[141735.188] (EE) Backtrace:
[141735.198] (EE) 0: /usr/bin/Xorg (xorg_backtrace+0x56) [0x7fde93b52d46]
[141735.198] (EE) 1: /usr/bin/Xorg (0x7fde9399c000+0x1baf29) [0x7fde93b56f29]
[141735.198] (EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (0x7fde91690000+0x35180) [0x7fde916c5180]
[141735.198] (EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (0x7fde91690000+0x90720) [0x7fde91720720]
[141735.198] (EE) 4: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0x533e) [0x7fde8d79933e]
[141735.198] (EE) 5: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0x5803) [0x7fde8d799803]
[141735.198] (EE) 6: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0x7e9f) [0x7fde8d79be9f]
[141735.198] (EE) 7: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0x11a46) [0x7fde8d7a5a46]
[141735.198] (EE) 8: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0xe870) [0x7fde8d7a2870]
[141735.198] (EE) 9: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0xecf3) [0x7fde8d7a2cf3]
[141735.198] (EE) 10: /usr/bin/Xorg (0x7fde9399c000+0x13d3b1) [0x7fde93ad93b1]
[141735.198] (EE) 11: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0xc2a0) [0x7fde8d7a02a0]
[141735.198] (EE) 12: /usr/bin/Xorg (0x7fde9399c000+0x13d643) [0x7fde93ad9643]
[141735.198] (EE) 13: /usr/bin/Xorg (0x7fde9399c000+0x133587) [0x7fde93acf587]
[141735.199] (EE) 14: /usr/bin/Xorg (0x7fde9399c000+0x573f7) [0x7fde939f33f7]
[141735.199] (EE) 15: /usr/bin/Xorg (0x7fde9399c000+0x5b596) [0x7fde939f7596]
[141735.199] (EE) 16: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf5) [0x7fde916b1b45]
[141735.199] (EE) 17: /usr/bin/Xorg (0x7fde9399c000+0x4590e) [0x7fde939e190e]
[141735.199] (EE)
[141735.200] (EE) Segmentation fault at address 0x0
[141735.200] (EE)
Fatal server error:
[141735.200] (EE) Caught signal 11 (Segmentation fault). Server aborting
[141735.200] (EE)
[141735.201] (EE)
Please consult the The X.Org Foundation support
     at http://wiki.x.org
 for help.
[141735.201] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[141735.201] (EE)
[141735.202] (II) AIGLX: Suspending AIGLX clients for VT switch
[141735.203] (II) NOUVEAU(0): NVLeaveVT is called.
[141735.260] (EE) Server terminated with error (1). Closing log file.
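
To check whether two Xorg logs really show the same crash, one option is to collapse each backtrace into runs of consecutive frames per module and compare the shapes. In both traces in this report the fault is reached from libexa.so through libc, which is consistent with (but does not prove) a null pointer being handed from EXA to a libc routine. The sketch below assumes the "(EE) N: module (where) [addr]" frame format shown above; the function names are hypothetical.

```python
import re
from itertools import groupby

# Matches frames such as:
# [141735.198] (EE) 4: /usr/lib/xorg/modules/libexa.so (0x7fde8d794000+0x533e) [0x7fde8d79933e]
FRAME_RE = re.compile(
    r"\(EE\)\s+(?P<idx>\d+):\s+(?P<module>\S+)\s+\((?P<where>[^)]*)\)"
)


def parse_frames(lines):
    """Return (frame index, module path) pairs for each backtrace frame."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group("idx")), m.group("module")))
    return frames


def collapse(frames):
    """Collapse consecutive frames in the same module into (module, count) runs."""
    modules = (module for _, module in frames)
    return [(module, len(list(run))) for module, run in groupby(modules)]
```

Two logs whose collapsed traces start with the X server's signal handler frames, then libc, then a run of libexa.so frames are likely the same failure path, even if the exact offsets differ between builds.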

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired