[i945g] GPU hang and consecutive loss of dri/drm capability [lucid]

Bug #593463 reported by gene
This bug affects 4 people
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Expired  Undecided   Unassigned

Bug Description

Binary package hint: xserver-xorg-video-intel

After a recent upgrade to Lucid, I get intermittent X server crashes after a few days of uptime. It usually happens when the power manager kicks in to blank the screen. I get these messages in kern.log:
Jun 13 01:21:22 domus kernel: [392077.221266] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 13 01:21:22 domus kernel: [392077.221276] render error detected, EIR: 0x00000000
Jun 13 01:21:22 domus kernel: [392077.221293] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 18764775 at 18764774)
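To check whether a given crash matches this report, a minimal sketch that pulls these signatures out of a kernel log might look like the following. It assumes the syslog format shown above; `find_gpu_hangs` and `count_gpu_hangs` are hypothetical helpers, not part of any Ubuntu tool. (The `-5` returned by `i915_do_wait_request` is `-EIO`.)

```shell
#!/bin/sh
# Hypothetical helpers (not part of any Ubuntu package): scan a kernel log
# for the i915 GPU-hang signature quoted in this report.

# Print every line belonging to the hang signature.
find_gpu_hangs() {
    log="${1:-/var/log/kern.log}"
    grep -E 'i915_hangcheck_elapsed|i915_do_wait_request|render error detected' "$log"
}

# Print how many hangcheck timeouts the log contains (prints 0 if none).
count_gpu_hangs() {
    log="${1:-/var/log/kern.log}"
    grep -c 'i915_hangcheck_elapsed' "$log"
}
```

For example, `count_gpu_hangs /var/log/kern.log` after an uptime of several days would show how often the hangcheck timer fired.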

The kernel itself seems to be intact: other services, such as the web server, keep working. I get a message saying that low-graphics mode should be enabled, and the DRI/DRM capabilities are lost until I reboot the machine.
Messages similar to the above appear in a few other bug reports, but those issues are not identical to the one I experience.
This may also be related to i915 memory handling, because memory seems to get "more abused" on Lucid (I also see this on an ATI chip). Is there a memory leak in the X server code? I see "(II) intel(0): No memory allocations" in Xorg.log.

Along with the kern.log messages, I see the following in my
Xorg.[012].log files:

Xorg.[02].log

(==) intel(0): Silken mouse enabled
(II) intel(0): Initializing HW Cursor
(II) intel(0): No memory allocations

Xorg.1.log:

(EE) FBDEV(0): FBIOPUTCMAP: Invalid argument

The last message floods the log file. I get:

grep '(EE) FBDEV' /var/log/Xorg.1.log | wc -l
144356
!!!!!!!!
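The `grep | wc -l` count shows how big the flood is. To see which error lines dominate a log, a minimal sketch could collapse duplicates with `sort | uniq -c`. The function name is a hypothetical helper, and the `sed` step is an assumption added to cover X servers that prefix each log line with an uptime stamp such as `[    20.245]`.

```shell
#!/bin/sh
# Hypothetical helper: rank the distinct (EE) lines in an Xorg log by count.
summarise_xorg_errors() {
    log="${1:-/var/log/Xorg.0.log}"
    grep '(EE)' "$log" |
        sed 's/^\[ *[0-9.]*] //' |    # drop an optional "[  12.345] " prefix
        sort | uniq -c | sort -rn     # most frequent message first
}
```

Run as `summarise_xorg_errors /var/log/Xorg.1.log`, the FBIOPUTCMAP line would be expected at the top.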

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: xserver-xorg-video-intel 2:2.9.1-3ubuntu5
Uname: Linux 2.6.34-020634-generic x86_64
Architecture: amd64
Date: Sun Jun 13 17:31:13 2010
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: xserver-xorg-video-intel
system:
 distro: Ubuntu
 codename: lucid
 architecture: x86_64
 kernel: Linux 2.6.32-22-generic #36-Ubuntu SMP Thu Jun 3 19:31:57 UTC 2010 x86_64 GNU/Linux

gene (eugenios) wrote :

I am not on a *34 kernel version. The issue occurred with the latest Ubuntu kernel version:
uname -a: Linux 2.6.32-22-generic #36-Ubuntu SMP Thu Jun 3 19:31:57 UTC 2010 x86_64 GNU/Linux
As mentioned above, the problem is similar to a few other Intel-related ones, e.g., https://bugs.launchpad.net/bugs/528467

description: updated
description: updated
gene (eugenios) wrote :

The behavior of the system at the time of the crash differs considerably from that reported in #528467 et al. In my case I only lose DRM/DRI; everything else keeps functioning as it should. This makes it a distinct bug.

gene (eugenios) wrote :

One of the Xorg.log files. It gets flooded with the error message I mentioned above. I guess it results from unsuccessful attempts to revive gdm.

gene (eugenios) wrote :

Oops, sorry, gdm has nothing to do with Xorg.1.log. I tried running
xinit -- :1
and that is how this log file got created.

Bryce Harrington (bryce)
tags: added: crash
Geir Ove Myhr (gomyhr)
summary: - xsession crash and consecutive loss of dri/drm capability on i915
+ [i945g] xsession crash and consecutive loss of dri/drm capability on
+ i915
tags: added: 945g
Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
gene (eugenios) wrote : 2.6.34* kernel

I grabbed the kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-lucid/
So
$ uname -a;uptime
Linux domus 2.6.34-020634-generic #020634 SMP Mon May 17 19:27:49 UTC 2010 x86_64 GNU/Linux
 11:09:47 up 10 days, 3:27, 7 users, load average: 0.00, 0.05, 0.07
I have not had this problem so far...
--
Please do not send me Microsoft Office/Apple iWork documents. Send OpenDocument instead! http://fsf.org/campaigns/opendocument

gene (eugenios) wrote : bug still persists

It seems that this bug still persists for me. I got an X server "hiccup" once again and lost X for this boot session, so I have to compose this from emacs-nox.

uname: Linux 2.6.32-24-generic #41-Ubuntu SMP Thu Aug 19 01:38:40 UTC 2010 x86_64 GNU/Linux

I am pretty sure that I disabled the power manager. The last time I had this crash, on a recent Maverick kernel (it happened only once for me), I saw that the power manager had somehow been enabled. With Maverick kernels I tended to have longer uptimes (10-15 days).

uptime : 22:42:08 up 3 days, 12:52, 2 users, load average: 0.19, 0.33, 0.69

This time I was running Stellarium, which is pretty CPU-intensive.

It is a pretty annoying bug that affects most i915-based chips. Does anyone know if there is any light at the end of the tunnel?
In the meantime, I will get an updated Maverick kernel and keep testing.
gene (eugenios) wrote : Re: [i945g] xsession crash and consecutive loss of dri/drm capability on i915

The most recent Maverick kernel, 2.6.36-020636rc2-generic, was too rough for me and froze immediately when gdm started. I will continue testing the Lucid kernel 2.6.32-24-generic #41-Ubuntu SMP Thu Aug 19 01:38:40 UTC 2010 instead, until it fails again or I find out which kernel versions are safe to use with respect to the security issue from a few days ago.

gene (eugenios) wrote :

I recently built 2.6.37-rc3 from kernel.org with Mike Galbraith's patch applied. It looks stable so far, apart from what seemed to be a keyboard lock-up shortly after reaching the gdm login window; the "lock-up" lasted a few seconds and did not come back. The bug was still present in 2.6.35.4, which I tested for quite a while, although the issue in question occurred only once there.
The following workaround is shaping up:
1) Turn off gnome-power-manager and use the "xset dpms 300 300 300" setting for the X session instead.
2) If a buggy application (such as npviewer, the Adobe Flash Player wrapper) uses i915, issue
the command "xset dpms 0 0 0" to disable display sleep.
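The two steps above could be combined into a small wrapper. This is a minimal sketch, assuming gnome-power-manager has already been turned off as in step 1. `run_without_dpms` and the `XSET` override (useful for a dry run) are hypothetical; `xset dpms <standby> <suspend> <off>` is the standard xset syntax, where a value of 0 disables that mode.

```shell
#!/bin/sh
# Hypothetical wrapper: run a GPU-heavy or buggy program with display sleep
# disabled, then restore the 300-second DPMS timeouts from step 1.
# Set XSET=echo to print the xset arguments instead of running xset.
run_without_dpms() {
    xset_cmd="${XSET:-xset}"
    "$xset_cmd" dpms 0 0 0          # 0 disables standby, suspend and off
    "$@"                            # run the application itself
    rc=$?                           # remember its exit status
    "$xset_cmd" dpms 300 300 300    # standby/suspend/off after 300 s again
    return "$rc"
}
```

For example, `run_without_dpms stellarium`. The timeouts are restored even if the application exits with an error, and its exit status is passed through.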

The only time the kernel dropped DRI/DRM was on 2.6.35.4, while the infamous Adobe Flash Player was playing a video. Apparently Flash Player ignores the display sleep setting. It also uses far too much CPU time.

As far as 2.6.37-rc3 is concerned, things might have improved: when the Adobe Flash Player played a video (with gnome-power-manager on), the display went to sleep and could not be woken by any means other than the "SysRq+K" magic key combination, but the kernel did NOT drop DRI/DRM capabilities this time!

gene (eugenios) wrote :

Here's an update:
uname -a :
Linux 2.6.37-rc3-mine #1 SMP Sat Nov 27 19:08:07 CST 2010 x86_64 GNU/Linux
No DRM/DRI dropping so far; however, the GNOME session gets killed under a certain CPU/memory/disk load. When I tried to add media to Amarok, I lost my GNOME session and was dropped to the gdm login. After logging in, Metacity was in use, and I had to run "compiz --replace", which worked.

Here is what I get in the kern.log:

Dec 2 11:57:14 my kernel: [397881.555436] 11:3:1: cannot get freq at ep 0x84
Dec 2 11:57:14 my kernel: [397881.579563] 11:3:1: cannot get freq at ep 0x84
Dec 2 12:54:18 my kernel: [401305.493365] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 12:54:19 my kernel: [401306.144713] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 12:54:19 my kernel: [401306.180669] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 12:54:19 my kernel: [401306.264823] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind a purgeable object
Dec 2 12:55:06 my kernel: [401353.309903] 11:3:1: cannot get freq at ep 0x84
Dec 2 12:55:06 my kernel: [401353.345028] 11:3:1: cannot get freq at ep 0x84
Dec 2 12:58:43 my kernel: [401571.028764] 11:3:1: cannot get freq at ep 0x84
Dec 2 12:58:43 my kernel: [401571.066888] 11:3:1: cannot get freq at ep 0x84
Dec 2 13:05:51 my kernel: [401998.422719] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:07:34 my kernel: [402101.878458] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:07:34 my kernel: [402101.878595] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:07:34 my kernel: [402101.878641] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:07:34 my kernel: [402101.878688] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:07:34 my kernel: [402101.878736] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.435529] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.437516] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.437594] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.437636] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.437676] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.438307] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.438435] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
Dec 2 13:09:58 my kernel: [402245.438566] [drm:i915_gem_mmap_gtt_ioctl] *ERROR* Attempting to mmap a purgeable buffer
The Xorg.0.log contains this:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X (xorg_backtrace+0...
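The kern.log excerpt above is dominated by a few repeated DRM messages. A minimal sketch for tallying them per driver function, assuming the `[drm:<function>]` tag format seen here (`count_drm_messages` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: count kernel DRM messages per "[drm:<function>]" tag,
# most frequent first.
count_drm_messages() {
    log="${1:-/var/log/kern.log}"
    grep -o '\[drm:[A-Za-z0-9_]*]' "$log" | sort | uniq -c | sort -rn
}
```

Lines without a `[drm:...]` tag, such as the `cannot get freq at ep 0x84` messages, are ignored.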


gene (eugenios) wrote :

The stable 2.6.36.1 kernel seems to behave better: no complaints from the kernel like "drm:i915_hangcheck_elapsed".
uname -a
Linux domus 2.6.36.1-mine #1 SMP Sat Dec 4 12:56:31 CST 2010 x86_64 GNU/Linux

Well, that might also be because I recently installed a newer version of the Intel X driver, xf86-video-intel 2.9.99.

gene (eugenios) wrote :

uname -a
Linux 2.6.35.13-mine #1 SMP Fri May 6 00:20:57 CDT 2011 x86_64 GNU/Linux

I still experience this issue, but only when a buggy and/or GPU-heavy application is run without first disabling the screen's DPMS function. The crash occurs right after the screen goes to sleep; it happened with both Flash Player (culprit #1) and Stellarium.
The whole issue might be related to bug #477256. As soon as I get the source of the patch suggested there, I will set out to test it.

Bryce Harrington (bryce)
summary: - [i945g] xsession crash and consecutive loss of dri/drm capability on
- i915
+ [i945g] GPU hang and consecutive loss of dri/drm capability
summary: - [i945g] GPU hang and consecutive loss of dri/drm capability
+ [i945g] GPU hang and consecutive loss of dri/drm capability [lucid]
affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Confirmed → New
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 593463

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
