[i965gm] GPU lockup EIR:0x00000010, followed by PLL state assertion failure

Bug #850939 reported by Damian Christey
This bug affects 5 people
Affects            Status         Importance   Assigned to   Milestone
xf86-video-intel   Fix Released   High
linux (Ubuntu)     Fix Released   High         Unassigned
  Precise          Fix Released   Undecided    Unassigned

Bug Description

First time I've seen this one.

[14585.553826] PM: resume of drv:sd dev:2:0:0:0 complete after 485.400 msecs
[14585.553863] PM: resume of drv:scsi_disk dev:2:0:0:0 complete after 141.803 msecs
[14585.554026] PM: resume of drv:scsi_device dev:2:0:0:0 complete after 485.560 msecs
[14585.640171] firewire_core: rediscovered device fw0
[14585.700866] thinkpad_acpi: ACPI backlight control delay disabled
[14585.700874] render error detected, EIR: 0x00000010
[14585.700877] page table error
[14585.700878] PGTBL_ER: 0x00000100
[14585.700881] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[14585.700888] fixme: max PWM is zero.
[14585.700956] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[14585.702068] PM: resume of devices complete after 651.237 msecs
[14585.702248] PM: resume devices took 0.650 seconds
[14585.702295] PM: Finishing wakeup.
[14585.702297] Restarting tasks ... done.
[14585.716731] video LNXVIDEO:00: Restoring backlight state
[14585.860327] [drm] Changing LVDS panel from (+hsync, +vsync) to (-hsync, -vsync)
[14585.860483] ------------[ cut here ]------------
[14585.860518] WARNING: at /build/buildd/linux-3.0.0/drivers/gpu/drm/i915/intel_display.c:791 intel_enable_pipe+0x14a/0x150 [i915]()
[14585.860522] Hardware name: 6465CTO
[14585.860523] PLL state assertion failure (expected on, current off)
[14585.860525] Modules linked in: rfcomm bnep parport_pc ppdev snd_hda_codec_analog joydev thinkpad_acpi snd_seq_midi binfmt_misc snd_rawmidi snd_hda_intel snd_hda_codec snd_hwdep pcmcia arc4 snd_seq_midi_event snd_seq btusb iwl3945 iwl_legacy yenta_socket pcmcia_rsrc r852 sm_common nand nand_ids nand_bch mac80211 snd_pcm psmouse bluetooth bch nand_ecc pcmcia_core mtd i915 snd_seq_device serio_raw drm_kms_helper cfg80211 drm snd_timer snd_page_alloc snd i2c_algo_bit soundcore nvram wmi video tpm_tis firewire_sbp2 lp parport sdhci_pci sdhci firewire_ohci firewire_core crc_itu_t ahci libahci e1000e
[14585.860571] Pid: 7980, comm: kworker/u:31 Tainted: G W 3.0.0-11-generic #17-Ubuntu
[14585.860574] Call Trace:
[14585.860583] [<ffffffff8105e82f>] warn_slowpath_common+0x7f/0xc0
[14585.860587] [<ffffffff8105e926>] warn_slowpath_fmt+0x46/0x50
[14585.860600] [<ffffffffa01b27aa>] intel_enable_pipe+0x14a/0x150 [i915]
[14585.860613] [<ffffffffa01b4a41>] i9xx_crtc_mode_set+0x6f1/0xd70 [i915]
[14585.860622] [<ffffffffa018a893>] ? i915_enable_vblank+0x33/0x100 [i915]
[14585.860634] [<ffffffffa01a8d2f>] intel_crtc_mode_set+0x6f/0xa0 [i915]
[14585.860642] [<ffffffffa0171ff9>] drm_crtc_helper_set_mode+0x2a9/0x420 [drm_kms_helper]
[14585.860649] [<ffffffffa01721e3>] drm_helper_resume_force_mode+0x73/0x160 [drm_kms_helper]
[14585.860665] [<ffffffffa00babb3>] ? drm_irq_install+0x143/0x210 [drm]
[14585.860674] [<ffffffffa0181146>] i915_reset+0x296/0x440 [i915]
[14585.860683] [<ffffffffa0185ae0>] ? notify_ring+0xf0/0xf0 [i915]
[14585.860692] [<ffffffffa0185ba8>] i915_error_work_func+0xc8/0x110 [i915]
[14585.860699] [<ffffffff8107c05a>] process_one_work+0x11a/0x480
[14585.860702] [<ffffffff8107cd25>] worker_thread+0x165/0x370
[14585.860705] [<ffffffff8107cbc0>] ? manage_workers.isra.30+0x130/0x130
[14585.860709] [<ffffffff810811cc>] kthread+0x8c/0xa0
[14585.860714] [<ffffffff815f32e4>] kernel_thread_helper+0x4/0x10
[14585.860717] [<ffffffff81081140>] ? flush_kthread_worker+0xa0/0xa0
[14585.860720] [<ffffffff815f32e0>] ? gs_change+0x13/0x13
[14585.860722] ---[ end trace eaf1667dd518c105 ]---
[14585.980038] ------------[ cut here ]------------
[14585.980071] WARNING: at /build/buildd/linux-3.0.0/drivers/gpu/drm/i915/intel_display.c:915 assert_pipe+0x75/0x80 [i915]()
[14585.980074] Hardware name: 6465CTO
[14585.980076] pipe B assertion failure (expected on, current off)
[14585.980078] Modules linked in: rfcomm bnep parport_pc ppdev snd_hda_codec_analog joydev thinkpad_acpi snd_seq_midi binfmt_misc snd_rawmidi snd_hda_intel snd_hda_codec snd_hwdep pcmcia arc4 snd_seq_midi_event snd_seq btusb iwl3945 iwl_legacy yenta_socket pcmcia_rsrc r852 sm_common nand nand_ids nand_bch mac80211 snd_pcm psmouse bluetooth bch nand_ecc pcmcia_core mtd i915 snd_seq_device serio_raw drm_kms_helper cfg80211 drm snd_timer snd_page_alloc snd i2c_algo_bit soundcore nvram wmi video tpm_tis firewire_sbp2 lp parport sdhci_pci sdhci firewire_ohci firewire_core crc_itu_t ahci libahci e1000e
[14585.980122] Pid: 7980, comm: kworker/u:31 Tainted: G W 3.0.0-11-generic #17-Ubuntu
[14585.980124] Call Trace:
[14585.980132] [<ffffffff8105e82f>] warn_slowpath_common+0x7f/0xc0
[14585.980136] [<ffffffff8105e926>] warn_slowpath_fmt+0x46/0x50
[14585.980148] [<ffffffffa01a9d05>] assert_pipe+0x75/0x80 [i915]
[14585.980161] [<ffffffffa01b27ea>] intel_enable_plane+0x3a/0x80 [i915]
[14585.980173] [<ffffffffa01b4a7d>] i9xx_crtc_mode_set+0x72d/0xd70 [i915]
[14585.980183] [<ffffffffa018a893>] ? i915_enable_vblank+0x33/0x100 [i915]
[14585.980194] [<ffffffffa01a8d2f>] intel_crtc_mode_set+0x6f/0xa0 [i915]
[14585.980201] [<ffffffffa0171ff9>] drm_crtc_helper_set_mode+0x2a9/0x420 [drm_kms_helper]
[14585.980208] [<ffffffffa01721e3>] drm_helper_resume_force_mode+0x73/0x160 [drm_kms_helper]
[14585.980223] [<ffffffffa00babb3>] ? drm_irq_install+0x143/0x210 [drm]
[14585.980232] [<ffffffffa0181146>] i915_reset+0x296/0x440 [i915]
[14585.980241] [<ffffffffa0185ae0>] ? notify_ring+0xf0/0xf0 [i915]
[14585.980249] [<ffffffffa0185ba8>] i915_error_work_func+0xc8/0x110 [i915]
[14585.980256] [<ffffffff8107c05a>] process_one_work+0x11a/0x480
[14585.980259] [<ffffffff8107cd25>] worker_thread+0x165/0x370
[14585.980262] [<ffffffff8107cbc0>] ? manage_workers.isra.30+0x130/0x130
[14585.980265] [<ffffffff810811cc>] kthread+0x8c/0xa0
[14585.980270] [<ffffffff815f32e4>] kernel_thread_helper+0x4/0x10
[14585.980273] [<ffffffff81081140>] ? flush_kthread_worker+0xa0/0xa0
[14585.980276] [<ffffffff815f32e0>] ? gs_change+0x13/0x13
[14585.980278] ---[ end trace eaf1667dd518c106 ]---

ProblemType: Crash
DistroRelease: Ubuntu 11.10
Package: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
ProcVersionSignature: Ubuntu 3.0.0-11.18-generic 3.0.4
Uname: Linux 3.0.0-11-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 1.23-0ubuntu1
Architecture: amd64
Chipset: i965gm
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
Date: Tue Sep 13 14:12:51 2011
DistUpgraded: Log time: 2011-06-05 21:41:42.955296
DistroCodename: oneiric
DistroVariant: ubuntu
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GraphicsCard:
 Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) [8086:2a02] (rev 0c) (prog-if 00 [VGA controller])
   Subsystem: Lenovo T61 [17aa:20b5]
   Subsystem: Lenovo T61 [17aa:20b5]
InterpreterPath: /usr/bin/python2.7
MachineType: LENOVO 6465CTO
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-11-generic root=UUID=eda8e981-7ccb-44f8-bb03-01dbd64009e4 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6+7ubuntu7
 libdrm2 2.4.26-1ubuntu1
 xserver-xorg-video-intel 2:2.15.901-1ubuntu2
SourcePackage: xserver-xorg-video-intel
Title: [i965gm] False GPU lockup
UpgradeStatus: Upgraded to oneiric on 2011-06-06 (101 days ago)
UserGroups:

dmi.bios.date: 04/25/2008
dmi.bios.vendor: LENOVO
dmi.bios.version: 7LETB7WW (2.17 )
dmi.board.name: 6465CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr7LETB7WW(2.17):bd04/25/2008:svnLENOVO:pn6465CTO:pvrThinkPadT61:rvnLENOVO:rn6465CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 6465CTO
dmi.product.version: ThinkPad T61
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.5.94+bzr2803-0ubuntu3
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.26-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu3
version.xserver-xorg: xserver-xorg 1:7.6+7ubuntu7
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110411+8378443-1

Damian Christey (damian-christey) wrote :
In , Bryce Harrington (bryce) wrote :

Forwarding this bug from Ubuntu reporter Damian Christey:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/850939

[Problem]
Initially this looks like a false GPU lockup, because it seems to mask the error, but then there is a 'fixme', followed immediately by a render ring initialization failure. This appears to be going on during a resume from suspend.
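
The sequence described above (a stuck EIR, then a render ring initialization failure, then assertion warnings) can be pulled out of a long kernel log mechanically. The following is a minimal sketch, not a tool used in this report: the function name `triage` and the event labels are hypothetical, and the regular expressions are derived only from the messages quoted in this bug.

```python
import re

# Patterns derived from the kernel messages quoted in this report.
# The event labels on the left are hypothetical names for illustration.
PATTERNS = {
    "eir_error": re.compile(r"render error detected, EIR: (0x[0-9a-fA-F]+)"),
    "eir_stuck": re.compile(r"EIR stuck: (0x[0-9a-fA-F]+), masking"),
    "ring_init_failed": re.compile(r"render ring initialization failed"),
    "assertion": re.compile(r"(PLL state|pipe [A-Z]) assertion failure"),
}
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def triage(log_text):
    """Return (timestamp, event, detail) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue  # skip lines without a dmesg-style timestamp
        for name, pattern in PATTERNS.items():
            found = pattern.search(line)
            if found:
                detail = found.group(1) if found.groups() else ""
                events.append((float(stamp.group(1)), name, detail))
    return events


# Four lines copied from the log in this report.
sample = "\n".join([
    "[14585.700874] render error detected, EIR: 0x00000010",
    "[14585.700881] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking",
    "[14585.700956] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000",
    "[14585.860523] PLL state assertion failure (expected on, current off)",
])

for event in triage(sample):
    print(event)
```

Running it over a saved dmesg gives a short timeline that makes it easier to see whether the ring failure follows the EIR error within the same resume.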

[Original Description]
First time I've seen this one.

[14585.553826] PM: resume of drv:sd dev:2:0:0:0 complete after 485.400 msecs
[14585.553863] PM: resume of drv:scsi_disk dev:2:0:0:0 complete after 141.803 msecs
[14585.554026] PM: resume of drv:scsi_device dev:2:0:0:0 complete after 485.560 msecs
[14585.640171] firewire_core: rediscovered device fw0
[14585.700866] thinkpad_acpi: ACPI backlight control delay disabled
[14585.700874] render error detected, EIR: 0x00000010
[14585.700877] page table error
[14585.700878] PGTBL_ER: 0x00000100
[14585.700881] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[14585.700888] fixme: max PWM is zero.
[14585.700956] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[14585.702068] PM: resume of devices complete after 651.237 msecs
[14585.702248] PM: resume devices took 0.650 seconds
[14585.702295] PM: Finishing wakeup.
[14585.702297] Restarting tasks ... done.
[14585.716731] video LNXVIDEO:00: Restoring backlight state
[14585.860327] [drm] Changing LVDS panel from (+hsync, +vsync) to (-hsync, -vsync)
[14585.860483] ------------[ cut here ]------------
[14585.860518] WARNING: at /build/buildd/linux-3.0.0/drivers/gpu/drm/i915/intel_display.c:791 intel_enable_pipe+0x14a/0x150 [i915]()
[14585.860522] Hardware name: 6465CTO
[14585.860523] PLL state assertion failure (expected on, current off)
[14585.860525] Modules linked in: rfcomm bnep parport_pc ppdev snd_hda_codec_analog joydev thinkpad_acpi snd_seq_midi binfmt_misc snd_rawmidi snd_hda_intel snd_hda_codec snd_hwdep pcmcia arc4 snd_seq_midi_event snd_seq btusb iwl3945 iwl_legacy yenta_socket pcmcia_rsrc r852 sm_common nand nand_ids nand_bch mac80211 snd_pcm psmouse bluetooth bch nand_ecc pcmcia_core mtd i915 snd_seq_device serio_raw drm_kms_helper cfg80211 drm snd_timer snd_page_alloc snd i2c_algo_bit soundcore nvram wmi video tpm_tis firewire_sbp2 lp parport sdhci_pci sdhci firewire_ohci firewire_core crc_itu_t ahci libahci e1000e
[14585.860571] Pid: 7980, comm: kworker/u:31 Tainted: G W 3.0.0-11-generic #17-Ubuntu
[14585.860574] Call Trace:
[14585.860583] [<ffffffff8105e82f>] warn_slowpath_common+0x7f/0xc0
[14585.860587] [<ffffffff8105e926>] warn_slowpath_fmt+0x46/0x50
[14585.860600] [<ffffffffa01b27aa>] intel_enable_pipe+0x14a/0x150 [i915]
[14585.860613] [<ffffffffa01b4a41>] i9xx_crtc_mode_set+0x6f1/0xd70 [i915]
[14585.860622] [<ffffffffa018a893>] ? i915_enable_vblank+0x33/0x100 [i915]
[14585.860634] [<ffffffffa01a8d2f>] intel_crtc_mode_set+0x6f/0xa0 [i915]
[14585.860642] [<ffffffffa0171ff9>] drm_crtc_helper_set_mode+0x2a9/0x420 [drm_kms_helper]
[14585.860649] [<ffffffffa01721e3>] drm_helper_resume_force_mode+0x73/0x160 [drm_kms_helper]
[14585.860665] [<ffffffffa00babb3...

In , Bryce Harrington (bryce) wrote :

Created attachment 51466
BootDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 51467
CurrentDmesg.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 51468
i915_error_state.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 51469
XorgConf.txt

In , Bryce Harrington (bryce) wrote :

Created attachment 51470
XorgLog.txt

Bryce Harrington (bryce)
description: updated
summary: - [i965gm] False GPU lockup
+ [i965gm] GPU lockup EIR:0x00000010, followed by PLL state assertion
+ failure
Bryce Harrington (bryce) wrote :

Damian Christey - I've forwarded this bug upstream to http://bugs.freedesktop.org/show_bug.cgi?id=41092 - please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Triaged
Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → High
Changed in xserver-xorg-video-intel:
importance: Unknown → High
status: Unknown → Confirmed
Bryce Harrington (bryce) wrote :

Hi, just checking back in for status. Have you seen more of these lockups since the release or within the last few weeks?

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Incomplete
Daniel Gnoutcheff (gnoutchd) wrote : Re: [Bug 850939] Re: [i965gm] GPU lockup EIR:0x00000010, followed by PLL state assertion failure

On 10/25/2011 06:29 PM, Bryce Harrington wrote:
> Hi, just checking back in for status. Have you seen more of these
> lockups since the release or within the last few weeks?

Yes, I had another one of these lockups just now.

xserver-xorg-video-intel 2:2.15.901-1ubuntu2
linux-image-3.0.0-12-generic 3.0.0-12.20 (amd64)

Robert Hooker (sarvatt)
affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Robert Hooker (sarvatt)
tags: added: precise
removed: false-gpu-hang single-occurrence
In , Daniel Gnoutcheff (gnoutchd) wrote :

This bug seems to be the crash that I sometimes get when I resume my laptop from suspend. Whenever I resume, the display will always turn on initially, usually with a still image from some framebuffer console. Usually, the X server will then come to life after a short delay (e.g. with the unlock dialog from gnome-screensaver). Other times, however, there is a longer delay, after which the display shuts off (backlight and all). The rest of the system remains functioning, and when I log in via ssh, I'll find the reported error messages in the kernel log. The display remains inoperable even if I do an Xserver restart, a VT switch, another suspend/resume, or a hibernate/thaw. A soft reboot, however, does reset the display.

distro: Ubuntu 11.10 amd64
kernel 3.0.0-16-generic
libdrm 2.4.26-1ubuntu1
mesa 7.11-0ubuntu3
xorg-server 2:1.10.4-1ubuntu4.2
xf86-video-intel 2:2.15.901-1ubuntu2.1
machine: Lenovo Thinkpad R61 7733A82

I've had about fourteen of these crashes in the past month and a half, and I've managed to collect debugging logs on most of them. However, this bug has defeated all my attempts to deliberately reproduce it. Sometimes I go for almost three weeks with no hangs, other times I get several hangs in one day. This may be related to changes in my usage patterns (how frequently I use suspend, how much time I waste on Youtube videos, etc.), but I haven't found any correlation yet, nor do I have any real idea of what to look for.

Some observations I have made so far:
-) Doing many suspend/resume cycles in a row (e.g. running pm-suspend 50 times in a script) is *not* sufficient to reproduce this bug.
-) I once attempted to reproduce this bug by opening a large number of windows/applications at once, with the hope that this would stress the graphics system. However, it had no apparent effect on suspend/resume performance.
-) This bug does not seem to depend on the kernel version. I've encountered these hangs with the 3.0.0 kernel shipped with 11.10, and I've also seen it with vanilla 2.6.38, 2.6.39, and 3.2.0 kernels. However, I never encountered this bug when running Ubuntu 11.04 and its 2.6.38 kernel. It would seem that userspace changes are key.
-) This bug does not seem to be affected by the use of OpenGL compositing managers: the crashes continued even when I switched from Unity to Unity2D.
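
The repeated suspend/resume test mentioned in the first observation can be sketched as follows. This is only an illustration under stated assumptions: the function name `run_suspend_cycles` is hypothetical, the suspend command is passed in rather than hard-coded, and it assumes the command blocks until the machine has resumed and that something (a timer or a key press) wakes the machine each time.

```python
import subprocess
import time


def run_suspend_cycles(count, suspend_cmd, settle_seconds=10.0,
                       run=subprocess.call, sleep=time.sleep):
    """Run suspend_cmd `count` times and return the exit status of each run.

    `run` and `sleep` are injectable so the loop can be exercised without
    actually suspending the machine.
    """
    statuses = []
    for _ in range(count):
        statuses.append(run(suspend_cmd))
        sleep(settle_seconds)  # give the desktop time to settle after resume
    return statuses


# Example (needs root and a wake source, so it is left commented out):
# run_suspend_cycles(50, ["pm-suspend"])
```

As the observation notes, a loop like this was not enough to trigger the hang, so it is useful mainly for ruling out the simple case.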

I'll attach some logs from one of my most recent hangs. I've added drm.debug=0x06 to my default kernel command line, which wastes syslog space but ensures that verbose logging is on when a hang hits. The kernel messages are from my syslog (from which I've pulled out irrelevant non-kernel messages). All of these logs were saved shortly after the hang (while I was logged in via ssh).

I've also set up some small scripts to collect logs and statistics with every suspend/resume, with the hope that this would help find what makes the difference between successful and unsuccessful suspends. Right now I only save dumps from xwininfo and xrestop, for lack of a better idea.
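
A per-suspend collection script of the kind described above could look roughly like this. It is a sketch, not the script actually used here: the function name `save_snapshots`, the output layout, and the label scheme are all hypothetical, and the exact command lines to capture are left to the caller.

```python
import os
import subprocess


def save_snapshots(commands, out_dir, label):
    """Run each command and save its output to out_dir/<label>-<name>.txt.

    `commands` maps a short name to an argv list. stderr is folded into the
    saved output so that failures are recorded as well.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, argv in commands.items():
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, check=False)
        path = os.path.join(out_dir, "%s-%s.txt" % (label, name))
        with open(path, "wb") as handle:
            handle.write(result.stdout)
        written.append(path)
    return written


# Example (assumed invocations; check the xrestop flags against its man page):
# save_snapshots({"xwininfo": ["xwininfo", "-root", "-tree"],
#                 "xrestop": ["xrestop", "-b", "-m", "1"]},
#                "/var/tmp/suspend-logs", "pre-suspend")
```

Calling it once before suspend and once after resume, with different labels, gives pairs of files that can be diffed between good and bad resumes.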

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 57259
gnoutchd-KernelLog.txt

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 57260
gnoutchd-i915_error_state.txt

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 57261
gnoutchd-intel_reg_dumper.txt

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 57262
gnoutchd-XorgLog.txt

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 57263
gnoutchd-vbios.dump

In , Daniel Gnoutcheff (gnoutchd) wrote :

FWIW, I disabled DRI with
  Option "DRI" "false"
in xorg.conf, but I still got a hang just now. That would seem to confirm that this regression is not related to the mesa driver.

This is still with the xserver-xorg-video-intel package shipped with Ubuntu oneiric (2:2.15.901-1ubuntu2.1). I'll see about re-testing with updated versions of everything.

In , Daniel Gnoutcheff (gnoutchd) wrote :

This bug is still present on Ubuntu precise (12.04) beta. The relevant package versions are:

kernel: 3.2.0-18.29
libdrm2: 2.4.30-1ubuntu1
libgl1-mesa-glx: 8.0.1-0ubuntu2
xserver-xorg: 1:7.6+10ubuntu1
xserver-xorg-video-intel: 2:2.17.0-1ubuntu4

In , Chris Wilson (ickle) wrote :

I believe these are all related to the underlying bug:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <email address hidden>
Date: Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish

    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.

    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.

    Such as:

      https://bugs.freedesktop.org/show_bug.cgi?id=29046
      https://bugs.freedesktop.org/show_bug.cgi?id=35863
      https://bugs.freedesktop.org/show_bug.cgi?id=38952
      https://bugs.freedesktop.org/show_bug.cgi?id=40282
      https://bugs.freedesktop.org/show_bug.cgi?id=41098
      https://bugs.freedesktop.org/show_bug.cgi?id=41102
      https://bugs.freedesktop.org/show_bug.cgi?id=41284
      https://bugs.freedesktop.org/show_bug.cgi?id=42141

    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.

    Signed-off-by: Chris Wilson <email address hidden>
    Cc: <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    Reviewed-by: Eugeni Dodonov <email address hidden>
    Signed-off-by: Daniel Vetter <email address hidden>
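
The ordering problem that this commit message describes can be illustrated with a toy model. This is not the kernel code: every class and function name below is hypothetical, and the "GPU" is reduced to a single busy flag. It only shows why clearing the read domains before an interruptible wait lets a retried wait be skipped.

```python
class Interrupted(Exception):
    """Stands in for a wait that is cut short by a signal."""


class Buffer:
    def __init__(self):
        self.gpu_read_domains = 1  # non-zero: the GPU may still be reading
        self.gpu_busy = True       # ground truth the driver cannot see directly


def wait_for_gpu(buf, interrupted):
    if interrupted:
        raise Interrupted()
    buf.gpu_busy = False


def finish_gpu_buggy(buf, interrupted=False):
    if buf.gpu_read_domains == 0:
        return                     # believed idle, so the wait is skipped
    buf.gpu_read_domains = 0       # cleared before the wait has succeeded
    wait_for_gpu(buf, interrupted)


def finish_gpu_fixed(buf, interrupted=False):
    if buf.gpu_read_domains == 0:
        return
    wait_for_gpu(buf, interrupted)
    buf.gpu_read_domains = 0       # cleared only after a successful wait


def retry_after_signal(finish):
    """Interrupt the first wait, retry, and report whether the buffer is still busy."""
    buf = Buffer()
    try:
        finish(buf, interrupted=True)
    except Interrupted:
        pass                       # userspace handles the signal
    finish(buf)                    # and then retries the operation
    return buf.gpu_busy


print("buggy ordering leaves buffer busy:", retry_after_signal(finish_gpu_buggy))
print("fixed ordering leaves buffer busy:", retry_after_signal(finish_gpu_fixed))
```

In the buggy ordering the retry returns while the buffer is still in use, which matches the "reuse of active buffers" symptom the commit message refers to.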

In , Gordon Jin (gordon-jin) wrote :

to mark dup to show relationship

In , Gordon Jin (gordon-jin) wrote :

*** This bug has been marked as a duplicate of bug 29046 ***

Changed in xserver-xorg-video-intel:
status: Confirmed → Invalid
In , Daniel Gnoutcheff (gnoutchd) wrote :

NAK, it appears that the referenced commit does *not* fix this bug. I just had another one of these crashes with v3.4-rc1 despite the fact that it contains this commit.

In , Chris Wilson (ickle) wrote :

Without the recent error-state I can't confirm that you experienced the same crash.

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 59973
gnoutchd-v3.4-rc1+-i915_error_state

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 59974
gnoutchd-v3.4-rc1+-KernelLog.txt

In , Daniel Gnoutcheff (gnoutchd) wrote :

Created attachment 59975
gnoutchd-v3.4-rc1+-intel_reg_dumper.txt

In , Daniel Gnoutcheff (gnoutchd) wrote :

error-state and logs attached as requested. The running kernel was compiled from commit v3.4-rc1-271-g5d32c88

In , Chris Wilson (ickle) wrote :

Created attachment 59976
Catch writes to address 0

Ok, somebody overwrote the ring buffer. Just check that you don't have the gallium-965 driver installed, and try this patch, which will hopefully catch the culprit.

In , Chris Wilson (ickle) wrote :

Wait a sec... Does this hang almost immediately upon boot? I'd been assuming the values were reset due to a failure during the hang, but on the other hand, we might not be flushing the ringbuffer correctly.

In , Chris Wilson (ickle) wrote :

Can you try:

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/inte
index 041f144..8e632a5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -928,6 +928,10 @@ int intel_init_ring_buffer(struct drm_device *dev,
        if (ret)
                goto err_unref;

+       ret = i915_gem_object_set_to_gtt_domain(obj, true);
+       if (ret)
+               goto err_unpin;
+
        ring->map.size = ring->size;
        ring->map.offset = dev->agp->base + obj->gtt_offset;
        ring->map.type = 0;

In , Daniel Gnoutcheff (gnoutchd) wrote :

--- Comment #24 from Chris Wilson<email address hidden> 2012-04-14
08:46:59 PDT ---
> Wait a sec... Does this hang almost immediately upon boot?

As I described in comment 6, this is a hang that occasionally happens
immediately after resume from suspend.

I've applied the patch in comment 25 on top of v3.4-rc2-333-g668ce0a and
I am trying it now. It doesn't seem to have broken anything, but since
this hang is rare, unpredictable, and (at present) not reliably
reproducible, I can't say with confidence that the bug is fixed until
I've run it for at least a month.

All of my remarks in comment 6 still hold true. I'm still hoping to
find a way to quickly reproduce this, but at present, I know nothing at
all about computer graphics, so I'm stuck behind a huge learning curve.

Changed in xserver-xorg-video-intel:
status: Invalid → Incomplete
In , Daniel Gnoutcheff (gnoutchd) wrote :

OK, I am reasonably confident that the patch from comment 25 fixes this bug.

From April 14th to May 22nd, I made sure to use a kernel that contained
that patch, and I never had any post-resume hangs during that period.
As an experiment, I then reverted the patch, and it didn't take long to
get another post-resume hang. So the application of that patch strongly
correlates with the absence of these hangs.
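
A rough way to put a number on "strongly correlates" is to ask how likely a hang-free patched period would be by chance. The sketch below assumes the hangs follow a Poisson process at the earlier average rate, which is a simplification: comment 6 reports that the hangs came in bursts, so treat the result as an order-of-magnitude estimate only. The figure of 45 days for "a month and a half" is also an assumption, and the year 2012 is taken from the comment dates in this thread.

```python
import math
from datetime import date

# Figures quoted in this thread.
hangs_before = 14   # "about fourteen" hangs
days_before = 45    # "a month and a half", assumed to be about 45 days
patched_days = (date(2012, 5, 22) - date(2012, 4, 14)).days

rate_per_day = hangs_before / days_before

# Assumption: hangs arrive as a Poisson process at the earlier average rate,
# so the chance of seeing none in the patched period is exp(-rate * days).
p_no_hangs = math.exp(-rate_per_day * patched_days)

print("patched period: %d days" % patched_days)
print("P(no hangs by chance) = %.1e" % p_no_hangs)
```

Under these assumptions the probability comes out far below one percent, which is consistent with the conclusion that the patch, and not luck, explains the quiet period.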

Looks good to me. :D

In , Marcos Dione (mdione) wrote :

I manage to constantly hit a similar bug with the following setup:

(mostly current Debian Sid versions)

kernels: 2.6.39-1/2/3, 3.0.0-2, 3.1.8-2, 3.2.18, 3.2.19
xserver-xorg-video-intel: 2.18.0-2+b1
libdrm2/libdrm-intel1: 2.4.33-1
ufoai: 2.3.1-1~getdeb1 [from getdeb.net]

How to reproduce: launch ufo, select single player game, skirmish, start game. The 3D terrain shows up, but after a few seconds (I would say 2-5) it hangs, the screen goes black, and one or two seconds later the music stops. The machine is still reachable by ssh.

dmesg says:

[ 151.976045] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 151.976052] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 151.977046] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 4402 at 4395, next 4419)
[ 151.977213] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[ 151.977542] [drm] Changing LVDS panel from (+hsync, +vsync) to (-hsync, -vsync)
[ 151.977695] ------------[ cut here ]------------
[ 151.977717] WARNING: at /build/buildd-linux-2.6_2.6.39-1-i386-RRBuT6/linux-2.6-2.6.39/debian/build/source_i386_none/drivers/gpu/drm/i915/intel_display.c:1079 intel_enable_pipe+0x59/0xfa [i915]()
[ 151.977720] Hardware name: Inspiron 1420
[ 151.977722] PLL state assertion failure (expected on, current off)
[ 151.977724] Modules linked in: cryptd aes_i586 aes_generic ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse coretemp firewire_sbp2 loop sr_mod cdrom usbhid hid ata_generic snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm option usb_wwan snd_page_alloc r852 sm_common usbserial snd_seq_midi nand snd_seq_midi_event ata_piix nand_ecc i915 uhci_hcd ehci_hcd usbcore snd_rawmidi nand_ids snd_seq arc4 ecb mtd iwl3945 tg3 drm_kms_helper snd_seq_device snd_timer firewire_ohci snd battery ac dell_wmi sparse_keymap joydev pcspkr drm i2c_i801 iwl_legacy i2c_algo_bit dell_laptop mac80211 firewire_core i2c_core crc_itu_t cfg80211 power_supply soundcore wmi rfkill psmouse processor video button sdhci_pci iTCO_wdt libphy sdhci serio_raw mmc_core dcdbas iTCO_vendor_support evdev ext3 mbcache jbd raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod sd_mod crc_t10dif ahci libahci libata scsi_mod thermal thermal_sys
[ 151.977811] Pid: 153, comm: kworker/u:1 Not tainted 2.6.39-1-686-pae #1
[ 151.977813] Call Trace:
[ 151.977819] [<c1036b3d>] ? warn_slowpath_common+0x6a/0x7b
[ 151.977830] [<f85e6292>] ? intel_enable_pipe+0x59/0xfa [i915]
[ 151.977833] [<c1036bb4>] ? warn_slowpath_fmt+0x28/0x2c
[ 151.977844] [<f85e6292>] ? intel_enable_pipe+0x59/0xfa [i915]
[ 151.977857] [<f85ea4a7>] ? intel_crtc_mode_set+0x14bb/0x157e [i915]
[ 151.977866] [<f85d229f>] ? i915_read32+0x18/0x21 [i915]
[ 151.977877] [<f85e31e1>] ? i915_write32+0x1b/0x27 [i915]
[ 151.977886] [<f875757c>] ? drm_crtc_helper_set_mode+0x1e3/0x31a [drm_kms_helper]
[ 151.977894] [<f8757701>] ? drm...

In , Daniel Gnoutcheff (gnoutchd) wrote :

--- Comment #28 from Marcos Dione<email address hidden> 2012-06-02
08:30:32 PDT ---
> how to reproduce: launch ufo, select single player game, skirmish, start game.
> the 3d terrain shows up, but after a few seconds (I woulds say 2-5) it hangs,
> creen goes black and one or two seconds after music stops. the machine is still
> reachable by ssh.

Thanks for the report, Marcos. Does the hang happen every time you run
the game? I was not able to reproduce the hang that way, even without
the patch from comment 25. Of course, I have different userspace
(Ubuntu 12.04 amd64 with playdeb's ufoai) and different hardware (Lenovo
ThinkPad R61 7733A82).

In , Chris Wilson (ickle) wrote :

(In reply to comment #28)
> I manage to constantly hit a similar bug with following setup:
>
> (mostly current Debian Sid versions)
>
> kernels: 2.6.39-1/2/3, 3.0.0-2, 3.1.8-2, 3.2.18, 3.2.19
> xserver-xorg-video-intel: 2.18.0-2+b1
> libdrm2/libdrm-intel1: 2.4.33-1
> ufoai: 2.3.1-1~getdeb1 [from getdeb.net]
>
> how to reproduce: launch ufo, select single player game, skirmish, start game.
> the 3d terrain shows up, but after a few seconds (I woulds say 2-5) it hangs,
> creen goes black and one or two seconds after music stops. the machine is still
> reachable by ssh.

Marcos, I expect that you have a distinct bug caused by a defect in mesa that is triggered by the game. Can you please open a new bug and attach your dmesg, Xorg.0.log and, critically, the /sys/kernel/debug/dri/0/i915_error_state following the hang?

Changed in xserver-xorg-video-intel:
status: Incomplete → In Progress
In , Daniel-ffwll (daniel-ffwll) wrote :

Patch merged to -fixes as:

commit 3eef8918ff440837f6af791942d8dd07e1a268ee
Author: Chris Wilson <email address hidden>
Date: Mon Jun 4 17:05:40 2012 +0100

    drm/i915: Mark the ringbuffers as being in the GTT domain

Thanks a lot for reporting this bug.

Changed in xserver-xorg-video-intel:
status: In Progress → Fix Released
Bryce Harrington (bryce) wrote :

Upstream has backported the patch to their -fixes tree. This is probably ripe for kernel team to pull into the ubuntu kernel.

tags: added: kernel-handoff-graphics
Bryce Harrington (bryce) wrote :

Patch merged to -fixes as:

commit 3eef8918ff440837f6af791942d8dd07e1a268ee
Author: Chris Wilson <email address hidden>
Date: Mon Jun 4 17:05:40 2012 +0100

    drm/i915: Mark the ringbuffers as being in the GTT domain

Thanks a lot for reporting this bug.

Bryce Harrington (bryce)
no longer affects: linux (Ubuntu Oneiric)
In , Björn Ruberg (bjoern-ruberg-wegener) wrote :

I'm seeing this bug on a Compaq 6720s laptop after having upgraded the CPU from celeron-550 to core2duo t7100 half a year ago. It nearly always makes the laptop unusable after logging in to a desktop environment (no matter which one).

I'm running Fedora 18, specifically with kernel 3.9.6, xorg-x11-drv-intel 2.21.8, libdrm 2.4.45, and xorg-x11-server-Xorg 1.13.3.

Find i915_error_state attached, dmesg follows.

[ 37.587352] cgroup: libvirtd (791) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[ 37.587357] cgroup: "memory" requires setting use_hierarchy to 1 on the root.
[ 37.587428] cgroup: libvirtd (791) created nested cgroup for controller "devices" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[ 37.587496] cgroup: libvirtd (791) created nested cgroup for controller "blkio" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[ 45.093808] fuse init (API version 7.21)
[ 64.704072] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 64.704107] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state
[ 64.710181] ------------[ cut here ]------------
[ 64.710263] WARNING: at drivers/gpu/drm/i915/intel_display.c:1051 intel_enable_pipe+0xe7/0x190 [i915]()
[ 64.710270] Hardware name: HP Compaq 6720s
[ 64.710274] PLL state assertion failure (expected on, current off)
[ 64.710278] Modules linked in: fuse ebtable_nat nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables dm_crypt snd_hda_codec_analog snd_hda_intel vhost_net acpi_cpufreq mperf coretemp iTCO_wdt iTCO_vendor_support snd_hda_codec e1000e snd_hwdep lib80211_crypt_tkip snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd tun macvtap macvlan hp_wmi sparse_keymap soundcore wl(POF) nfsd auth_rpcgss nfs_acl kvm_intel lockd kvm cfg80211 rfkill lib80211 microcode serio_raw lpc_ich mfd_core ptp sunrpc pps_core uinput i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video
[ 64.710401] Pid: 1372, comm: upowerd Tainted: PF O 3.9.6-200.fc18.x86_64 #1
[ 64.710406] Call Trace:
[ 64.710423] [<ffffffff8105ef85>] warn_slowpath_common+0x75/0xa0
[ 64.710432] [<ffffffff8105f066>] warn_slowpath_fmt+0x46/0x50
[ 64.710481] [<ffffffffa00b2e97>] intel_enable_pipe+0xe7/0x190 [i915]
[ 64.710523] [<ffffffffa00b7157>] i9xx_crtc_mode_set+0xb37/0x13a0 [i915]
[ 64.710565] [<ffffffffa00b588e>] __intel_set_mode+0x5be/0x970 [i915]
[ 64.710610] [<ffffffffa00bca56>] intel_set_mode+0x16/0x30 [i915]
[ 64.710653] [<ffffffffa00bdc79>] intel_get_load_detect_pipe+0x269/0x420 [i915]
[ 64.710702] [<ffffffffa00db574>] intel_tv_detect+0x124/0x4e0 [i915]
[ 64.710744] [<ffffffffa0032b06>] status_show+0x46/0x90 [drm]
[ 64.710757] [<ffffffff813f5eb0>] dev_attr_show+0x20/0x60
[ 64.710767] [<ffffffff81214be5>] sysfs_r...

Björn Ruberg (bjoern-ruberg-wegener) wrote :

Created attachment 81216
i915_error_state on compaq 6720s

Chris Wilson (ickle) wrote :

Bjorn, that is a completely different bug and looks just like a common mesa bug.

penalvch (penalvch) wrote :

Damian Christey, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder, but the one all the way at the bottom) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
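
As a rough illustration of the tagging scheme described above, the following Python sketch builds the tag names for a given test result. The function name `upstream_test_tags` is hypothetical and nothing here talks to Launchpad; it only assembles the strings named in this comment.

```python
def upstream_test_tags(version, fixed):
    """Return (tags_to_add, tags_to_remove) for a tested mainline kernel.

    `version` is the tested kernel version as written in the tag, e.g. "v3.12".
    `fixed` is True if the bug no longer occurs on that kernel.
    """
    # The two prefixes come straight from the instructions in this comment.
    prefix = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    tags_to_add = [prefix, f"{prefix}-{version}"]
    # In both cases the comment asks for this tag to be removed.
    tags_to_remove = ["needs-upstream-testing"]
    return tags_to_add, tags_to_remove
```

For example, `upstream_test_tags("v3.12", True)` yields `kernel-fixed-upstream` and `kernel-fixed-upstream-v3.12` to add, and `needs-upstream-testing` to remove.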

tags: added: bios-outdated-2.30
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Gnoutcheff (gnoutchd) wrote :

I have not encountered this bug in a long time, and I believe that it is fixed in the kernel as of upstream commit 3eef8918, which is included in v3.5 and later. The fix was backported to v3.2.y kernels with commit 71583724, which is included in v3.2.21 and later.

So, quantal and later have the fix because they use kernel v3.5 and later, precise has this fix because it tracks the upstream v3.2.y stable kernel, and if my memory is correct, lucid was never affected by this bug. So, this bug no longer affects any supported version of Ubuntu.
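
The version thresholds above can be expressed as a small check. This is only a sketch: the function names are hypothetical, and it encodes nothing beyond what this comment states (fix in v3.5 and later, and in v3.2.21 and later on the v3.2.y branch). Versions the comment does not cover, such as 3.3.y, 3.4.y, or older series, return `None` rather than a guess.

```python
import re

# Thresholds taken from the comment above: mainline v3.5 and stable v3.2.21.
MAINLINE_FIX = (3, 5)
STABLE_32_FIX_PATCH = 21


def parse_kernel_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def has_ringbuffer_fix(text):
    """True or False where the comment gives an answer, None where it does not."""
    version = parse_kernel_version(text)
    if version[:2] >= MAINLINE_FIX:
        return True
    if version[:2] == (3, 2):
        # A missing patch level (plain "3.2") is treated as 3.2.0.
        patch = version[2] if len(version) > 2 else 0
        return patch >= STABLE_32_FIX_PATCH
    return None  # not covered by the comment (e.g. 3.3.y, 3.4.y, 2.6.32)
```

Note that this applies to upstream version numbers. Distribution version strings may not carry the upstream stable patch level, so a distro kernel reporting `3.2.0` could still include the backport.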

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Precise):
status: New → Fix Released