X hard lockup in raring, can't even switch VTs

Bug #1117563 reported by Steve Langasek
This bug affects 1 person
Affects: xserver-xorg-video-intel (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

After an upgrade to raring, I've experienced a bug where firefox and X are each consuming a CPU doing nothing useful. When this happens, I can still log into the machine remotely with no problems, but the console is completely frozen. The X cursor doesn't move, unity doesn't respond, windows are not refreshing, and I can't even change VTs (from the keyboard, or over ssh - if I run 'sudo chvt 1', it hangs).

It appears that X and firefox are both busy, but strace of firefox shows no output. kill -9 of the firefox process also has no effect (bug #1117499). There is nothing logged in dmesg or in the X log that seems relevant.

After I had been debugging this for about an hour, the firefox process spontaneously died, at which point X also stopped monopolizing the CPU, but the console was still frozen. A few minutes later, the VT switch request seems to have registered and I wound up on VT1. However, switching back to VT7 does not result in the screen being refreshed, and 'sudo chvt 1' again hangs.
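
When a busy process produces no strace output and ignores kill -9, one generic thing to check over ssh is its scheduler state and how its CPU time splits between user and kernel mode. The sketch below is not from this report; it assumes a Linux /proc layout, and the function names and the `proc_root` parameter are made up for illustration. It reads field 3 (state), field 14 (utime) and field 15 (stime) of /proc/<pid>/stat.

```python
import os


def parse_proc_stat(stat_text):
    """Return (state, utime_ticks, stime_ticks) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so cut at the last ')' instead of splitting on whitespace.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); fields 14 and 15 are utime and stime.
    return rest[0], int(rest[11]), int(rest[12])


def describe_process(pid, proc_root="/proc"):
    """Read one process's state and CPU tick counters (hypothetical helper)."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        state, utime, stime = parse_proc_stat(f.read())
    return {"pid": pid, "state": state, "utime": utime, "stime": stime}
```

Calling `describe_process(pid)` twice a few seconds apart shows which counter is growing. A state of D (uninterruptible sleep) or a fast-growing stime would point at the kernel side rather than at firefox's own code, though that alone does not identify the cause.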
---
ApportVersion: 2.8-0ubuntu4
Architecture: amd64
DistroCodename: raring
DistroRelease: Ubuntu 13.04
DistroVariant: ubuntu
InstallationDate: Installed on 2010-09-24 (866 days ago)
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.1)
MarkForUpload: True
Package: xorg-server (not installed)
ProcVersionSignature: Ubuntu 3.8.0-4.8-generic 3.8.0-rc6
Tags: raring running-unity ubuntu
Uname: Linux 3.8.0-4-generic x86_64
UpgradeStatus: Upgraded to raring on 2013-01-25 (12 days ago)
UserGroups: adm admin cdrom dialout libvirtd lpadmin mythtv plugdev sambashare src sudo
---
ApportVersion: 2.8-0ubuntu4
Architecture: amd64
CompizPlugins: [core,composite,opengl,compiztoolbox,decor,imgpng,snap,place,grid,resize,regex,mousepoll,gnomecompat,unitymtgrabhandles,vpswitch,move,animation,expo,session,wall,ezoom,staticswitcher,workarounds,fade,scale,unityshell]
DistUpgraded: 2013-01-25 08:16:10,776 WARNING no activity on terminal for 300 seconds (Configuring texlive)
DistroCodename: raring
DistroRelease: Ubuntu 13.04
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GraphicsCard:
 Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:215a]
InstallationDate: Installed on 2010-09-24 (866 days ago)
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.1)
MachineType: LENOVO 3249CTO
MarkForUpload: True
Package: xorg 1:7.7+1ubuntu4
PackageArchitecture: amd64
PlymouthDebug: Error: [Errno 13] Permission denied: u'/var/log/plymouth-debug.log'
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.8.0-4-generic root=/dev/mapper/hostname-root ro quiet splash --verbose vt.handoff=7
ProcVersionSignature: Ubuntu 3.8.0-4.8-generic 3.8.0-rc6
Tags: raring running-unity ubuntu single-occurrence
Uname: Linux 3.8.0-4-generic x86_64
UpgradeStatus: Upgraded to raring on 2013-01-25 (12 days ago)
UserGroups: adm admin cdrom dialout libvirtd lpadmin mythtv plugdev sambashare src sudo
dmi.bios.date: 08/23/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6QET52WW (1.22 )
dmi.board.name: 3249CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6QET52WW(1.22):bd08/23/2010:svnLENOVO:pn3249CTO:pvrThinkPadX201:rvnLENOVO:rn3249CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 3249CTO
dmi.product.version: ThinkPad X201
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.9~daily13.01.25-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.41-0ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.0.2-0ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.0.2-0ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.13.2-0ubuntu1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.1.0-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.20.19-0ubuntu3
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.6-0ubuntu2
xserver.bootTime: Mon Feb 4 17:31:16 2013
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.13.2-0ubuntu1

Steve Langasek (vorlon) wrote : ProcEnviron.txt

apport information

Changed in xorg-server (Ubuntu):
importance: Undecided → Critical
tags: added: apport-collected raring running-unity ubuntu
description: updated
affects: xorg-server (Ubuntu) → xorg (Ubuntu)
tags: added: single-occurrence
description: updated
Steve Langasek (vorlon) wrote : BootDmesg.txt, BootLog.txt, CurrentDmesg.txt, Dependencies.txt, DpkgLog.txt, GconfCompiz.txt, LightdmDisplayLog.txt, LightdmGreeterLog.txt, LightdmGreeterLogOld.txt, LightdmLog.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, XorgConf.txt, XorgLog.txt, XorgLogOld.txt

apport information (one attachment per comment)

Bryce Harrington (bryce) wrote :

For GPU lockup bugs with Intel graphics, you need to collect the output of 'dmesg' and your /sys/kernel/debug/dri/0/i915_error_state file. Both of these must be collected while the machine is locked up (e.g. by sshing into the sick machine over ethernet). See https://wiki.ubuntu.com/X/Troubleshooting/Freeze for additional info.
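
A minimal sketch of that collection step, meant to be run as root from the ssh session while the machine is still locked up. The function name, the output file names and the `dmesg_cmd` parameter are assumptions for illustration, not part of apport or any Ubuntu tool. Failures are recorded instead of raised so that one unreadable source does not lose the other.

```python
import os
import subprocess

ERROR_STATE = "/sys/kernel/debug/dri/0/i915_error_state"


def collect_lockup_state(out_dir, error_state=ERROR_STATE, dmesg_cmd=("dmesg",)):
    """Save dmesg output and the i915 error state into out_dir.

    Returns a dict mapping each item to "ok" or a failure description.
    """
    os.makedirs(out_dir, exist_ok=True)
    results = {}

    # Kernel log: capture whatever the command prints, even on a non-zero exit.
    try:
        proc = subprocess.run(list(dmesg_cmd), capture_output=True, text=True)
        with open(os.path.join(out_dir, "dmesg.txt"), "w") as f:
            f.write(proc.stdout)
        results["dmesg"] = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
    except OSError as e:
        results["dmesg"] = f"failed: {e}"

    # debugfs file: reading normally requires root and a mounted debugfs.
    try:
        with open(error_state, "rb") as src:
            data = src.read()
        with open(os.path.join(out_dir, "i915_error_state"), "wb") as dst:
            dst.write(data)
        results["i915_error_state"] = "ok"
    except OSError as e:
        results["i915_error_state"] = f"failed: {e}"

    return results
```

For example, `collect_lockup_state("/tmp/lockup-report")` would leave both files in that directory for attaching to the bug; the directory name here is arbitrary.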

affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Critical → High
status: New → Incomplete
Steve Langasek (vorlon) wrote :

Hi Bryce,

> For GPU lockup bugs with Intel graphics, you need to collect the output
> of 'dmesg' and your /sys/kernel/debug/dri/0/i915_error_state file.

The dmesg output is included in the other kernel bug report I mentioned (bug #1117499). There was nothing relevant shown.

I didn't know to check /sys/kernel/debug/dri/0/i915_error_state. Is there a reason apport-collect doesn't gather this automatically if needed?

I'll wait and see if it happens again. I had a similar "hang" yesterday, but it resolved itself after a few minutes, so I'm not sure if that's the same issue as this one which required a reboot to recover X.

description: updated
Chris Wilson (ickle) wrote :

Sounds like a page-fault-of-doom scenario. Is it possible to reproduce? And if you can, can you grab a few stacktraces of Xorg and firefox, and also the kernel stacks from /proc/`pidof Xorg`/stack. You will need to use a ssh login.
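
The kernel-stack part of that request can be sketched as below. It assumes a Linux /proc layout and root privileges for reading /proc/<pid>/stack; the function names and the `proc_root` parameter are hypothetical. Matching on /proc/<pid>/comm is only an approximation of what pidof does.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited between listdir() and open()
    return sorted(pids)


def kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack text for pid, as exposed by /proc/<pid>/stack."""
    with open(os.path.join(proc_root, str(pid), "stack")) as f:
        return f.read()
```

Usage would be along the lines of `for pid in pids_by_name("Xorg"): print(kernel_stack(pid))`, repeated a few times to see whether the stack changes between reads.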

bugbot (bugbot)
tags: added: freeze
Steve Langasek (vorlon) wrote : Re: [Bug 1117563] Re: X hard lockup in raring, can't even switch VTs

On Thu, Feb 07, 2013 at 12:38:47PM -0000, Chris Wilson wrote:
> Sounds like a page-fault-of-doom scenario. Is it possible to reproduce?

I am loath to reproduce it. But if it happens again, I will certainly
follow up.

> And if you can, can you grab a few stacktraces of Xorg and firefox,

Yes, if this happens again I'll fire up gdb. I only thought of doing that
after firefox had managed to die off on its own.

Bryce Harrington (bryce) wrote :

There is udev code to detect gpu lockups which will capture i915_error_state in that case, but the general apport-collect script doesn't (we haven't needed it up til now, but probably can't hurt to collect). Since it sounds like that didn't happen here, it suggests you're experiencing something different than a "classic" gpu lockup. In IRC you mention seeing this:

<slangasek> $ sudo cat /sys/kernel/debug/dri/0/i915_error_state
<slangasek> cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

Steve Langasek (vorlon) wrote :

On Fri, Feb 08, 2013 at 09:20:11PM -0000, Bryce Harrington wrote:
> There is udev code to detect gpu lockups which will capture
> i915_error_state in that case, but the general apport-collect script
> doesn't (we haven't needed it up til now, but probably can't hurt to
> collect). Since it sounds like that didn't happen here, it suggests
> you're experiencing something different than a "classic" gpu lockup. In
> IRC you mention seeing this:

> <slangasek> $ sudo cat /sys/kernel/debug/dri/0/i915_error_state
> <slangasek> cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

Yeah, though that happened as part of a different crash with different
first-order symptoms, so I wouldn't draw any conclusions from it in
connection with this particular bug.

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Triaged
Bryce Harrington (bryce) wrote :

Looks like you got apport to pick up the gpu lockup after all: bug #1119793

Steve Langasek (vorlon) wrote :

I believe I've reproduced this bug here. Steps to reproduce:

 - load something javascripty that angers firefox and makes it very, very laggy
 - watch compiz notice this lagginess and helpfully animate the window as non-responsive (shading), making everything even more laggy
 - switch workspaces
 - watch your control go away

This time, I was able to remotely kill firefox and recover control of the system.

Chris Wilson (ickle) wrote :

What style of lag is this? Since you mention that compiz rendering makes it worse, I imagine it to be render latency. Can you paste an example URL?

Steve Langasek (vorlon) wrote :

On Wed, Feb 13, 2013 at 05:51:34PM -0000, Chris Wilson wrote:
> What style of lag is this?

The style where firefox is running some javascript of indeterminate origin
instead of responding to input.

> Since you mention that compiz rendering makes it worse, I imagine it to be
> render latency.

Compiz rendering makes the *CPU contention* worse, thus making it harder for
firefox to process whatever it's doing.

> Can you paste an example URL?

No, it's not associated with a single URL, but with firefox's overall state.
I have many tabs/windows open, and the problem occurs intermittently when
loading the same page. It seems to happen fairly frequently with launchpad
bug pages.

Chris Wilson (ickle) wrote :

Steve, could you either watch 'sudo perf top' or grab a few stacktraces of Xorg to see where the cycles are going?
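
If perf is not available, "a few stacktraces" can be turned into a crude sampling profile by reading the stack repeatedly and counting the innermost frame. The sketch below is an illustration only and no substitute for `perf top`. It assumes stack lines of the form `[<address>] symbol+0xoff/0xlen` as printed by /proc/<pid>/stack, and `read_stack` stands for any caller-supplied function returning one stack dump as text.

```python
import re
import time
from collections import Counter

# Matches the symbol name that follows the bracketed address on a stack line.
FRAME = re.compile(r"\]\s+([^\s+]+)")


def top_frame(stack_text):
    """Return the symbol of the first (innermost) frame, or None if there is none."""
    for line in stack_text.splitlines():
        match = FRAME.search(line)
        if match:
            return match.group(1)
    return None


def sample_top_frames(read_stack, samples=20, interval=0.1):
    """Call read_stack() repeatedly and count how often each top frame appears."""
    counts = Counter()
    for _ in range(samples):
        frame = top_frame(read_stack())
        if frame is not None:
            counts[frame] += 1
        time.sleep(interval)
    return counts
```

`sample_top_frames(...).most_common(5)` then lists the frames seen most often. A frame that dominates the samples is a hint about where the time goes, not proof.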

Steve Langasek (vorlon) wrote :

On Tue, Feb 19, 2013 at 05:14:17PM -0000, Chris Wilson wrote:
> Steve, could you either watch 'sudo perf top' or grab a few stacktraces
> of Xorg to see where the cycles are going?

If and when I reproduce the problem again, sure.

Chris Wilson (ickle) wrote :

A month has gone by without reproduction, so let's give it another couple of months and then retire the bug. The heart of the issue, I believe, is a race with GPU reset, for which more patches have gone into v3.9.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Incomplete
Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → New
status: New → Incomplete
bugbot (bugbot) wrote :

We're closing this bug since there has not been a response from the original reporter. However, if the issue still exists, please feel free to reopen it with the requested information. If you're not the original reporter, we'd prefer you file a new bug report.

Some tips:

  * Report X.org bugs via the command: `ubuntu-bug xorg`

  * Test against the latest development Ubuntu. http://cdimage.ubuntu.com/daily-live/
    Bugs marked as affecting the development version tend to get priority attention.

  * The `xdiagnose` utility has functionality for enabling debugging and
    analyzing a few common X problems.

  * Tag your bugs with the Ubuntu versions you have reproduced the issue in.

  * See https://wiki.ubuntu.com/X/Reporting for tips on writing good bug reports.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Expired